0x00 - Understanding Format String


In this blog post, we will develop our own printf function, with some limitations of course. The goal of creating our own printf function is to get a better grasp of how it works and to have more control over the output of our applications.

Alert 🚨

While it is feasible to write a printf-like function without using printf(), it is a time-consuming effort, and the existing printf() function provides a reliable, well-tested implementation.

Before we look at format string vulnerabilities, let's understand why format strings exist in the first place.

What is a format string in C

A format string is a string, made of ordinary text and format specifiers such as %d or %s, that formatted input and output functions like printf() and scanf() use to decide how to read or print their remaining arguments. String formatting plays an important role in computer programming because it lets data be represented efficiently and readably. C provides several built-in functions for formatting strings, and the essential one is printf(), declared in the <stdio.h> header, which displays variables, text, and structured data on standard output [1]. printf() is a variadic ANSI C function: it takes a variable number of arguments, the first of which is the format string.
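As a quick illustration (not from the original post), the sketch below uses the standard snprintf(), which accepts the same format strings as printf() but writes into a buffer so the result can be inspected. describe is a hypothetical helper: ordinary characters are copied as they are, and each specifier consumes the next argument in order, %s taking the char * and %d taking the int.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a name and an age into out.
   snprintf() understands the same format strings as printf(), but writes
   into a buffer instead of standard output. Ordinary characters are copied
   as they are; %s and %d are replaced by the next arguments, in order. */
int describe(char *out, size_t size, const char *name, int age) {
    assert(out != NULL && size > 0);
    return snprintf(out, size, "%s is %d years old", name, age);
}
```

For example, describe(buf, sizeof buf, "Alice", 30) leaves "Alice is 30 years old" in buf and returns 21, the number of characters written.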

Creating a basic printf

Now that we understand the basic functionality of format strings, we can start creating our custom printf function. Before we jump into it, here are the test cases for the format specifiers we will support:

  • _printf("%c\n", 'a');
  • _printf("%s\n", "abc");
  • _printf("%d\n", 123);
  • _printf("%d\n", -123);
  • _printf("%x\n", 123);
  • _printf("123459%n\n", &overwrite_int);

Note that we will not cover:

  • two's complement output for negative hex values
  • precision, width, and length modifiers
  • sizes other than int (no long or short), which is why %p cannot be implemented

Handling variable arguments

printf can take any number of arguments, and that number varies from call to call. To handle this, we can write a variadic function, which is a C function that takes a variable number of arguments [2]. This is useful when the number of arguments is not known in advance. To access those arguments, we need to include the <stdarg.h> header, which provides the following type and macros:

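  • va_list: a type that holds the state of the argument list
  • va_start: initializes the argument list, starting after the last named parameter
  • va_arg: fetches the next argument, using the type given as its second parameter
  • va_end: cleans up the va_list variable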
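Here is a minimal variadic function, written for illustration and not part of the post: sum_ints (a hypothetical name) adds up count int arguments using all four items above. Notice that the function has no way to discover how many arguments were passed, so the caller must tell it. printf() solves the same problem by reading the number and types of its arguments from the format string.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: returns the sum of `count` int arguments.
   The caller must say how many arguments follow, because va_arg
   cannot detect where the argument list ends. */
int sum_ints(int count, ...) {
    va_list args;              /* holds the state of the argument list */
    int total = 0;

    assert(count >= 0);
    va_start(args, count);     /* start right after the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* fetch the next argument as an int */
    }
    va_end(args);              /* clean up before returning */
    return total;
}
```

Calling sum_ints(3, 1, 2, 3) returns 6.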

Start Building Code

First, we start the function by initializing the argument list.

int _printf(const char *format, ...) {
    // ...
    va_list args; // hold arguments
    va_start(args, format); // Initialize arguments list

The function takes two parameters:

  • format: a string containing format specifiers
  • ...: an ellipsis, which tells the compiler that _printf accepts any number of additional arguments, since we cannot know in advance how many variables the user will pass

Next, we loop over every character in the format string, collecting the output in the local buffer print_buf.

    char print_buf[1024];
    // ...
    for(str = print_buf; *format; format++) {

Then we check whether the current character is '%'. If it is not, we store the character in the buffer and move on. If it is, we move to the next character to see which format specifier follows.

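if(*format != '%') { // ordinary character: copy it into the buffer
    *str++ = *format;

    continue;
}

++format; // if there is '%', move to the next character
if(*format == '\0') { // a lone '%' at the end of the string: print it and stop,
    *str++ = '%';     // otherwise format++ in the loop header would step
    break;            // past the string terminator
}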

What do we do after '%'? We check whether the next character is a valid format specifier, such as the s in %s or the d in %d.
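Because everything after a '%' is driven by the format string, the format string alone decides how many arguments _printf will try to fetch. The sketch below is not part of the original code: count_conversions is a hypothetical helper that walks a format string the same way our loop does and counts the specifiers that consume an argument (c, s, d, i, x, n). If that count is higher than the number of arguments the caller actually passed, as in _printf("it must print %x\n") at the end of the full source code, va_arg ends up reading values that were never passed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how many arguments our _printf would fetch
   for a given format string. It walks the string like the main loop:
   ordinary characters are skipped, and after a '%' the next character
   decides what happens. */
int count_conversions(const char *format) {
    int count = 0;

    assert(format != NULL);
    for (; *format; format++) {
        if (*format != '%')
            continue;               /* ordinary character */
        ++format;                   /* look at the character after '%' */
        if (*format == '\0')
            break;                  /* lone '%' at the end of the string */
        switch (*format) {
        case 'c': case 's': case 'd': case 'i': case 'x': case 'n':
            count++;                /* each of these consumes one argument */
            break;
        default:
            break;                  /* unsupported: printed as-is, no argument */
        }
    }
    return count;
}
```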

Case 1: %c

If the specifier is c, we fetch the next argument with va_arg(args, int) and store it in the buffer. We fetch it as an int rather than a char because of default argument promotion: a char passed through ... is converted to int.

switch (*format) {
    case 'c':
        *str++ = va_arg(args, int); // take next variable as an integer
        continue;
}
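The promotion rule is easy to see in a small variadic function. join_chars is a hypothetical helper, not from the post: it receives character arguments through ... and fetches each one with va_arg(args, int). In C, a character constant such as 'a' already has type int, and a char variable is promoted to int when passed to a variadic function, so asking va_arg for a char would be undefined behavior.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: copies `count` character arguments into out and
   NUL-terminates it. out must hold at least count + 1 bytes. */
void join_chars(char *out, int count, ...) {
    va_list args;

    assert(out != NULL && count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++) {
        /* chars arrive promoted to int, so fetch an int and narrow it back */
        out[i] = (char)va_arg(args, int);
    }
    out[count] = '\0';
    va_end(args);
}
```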

Case 2: %s

%s prints a string. This case is also pretty straightforward, except that we need to loop over the string and copy its characters into the buffer one by one.

case 's':
    s = va_arg(args, char *); // fetch next arguments as char*
    // printed++;
    while(*s) {
        *str++ = *s++;
    }
    continue;

Case 3: %d or %i

Integers are a little more complex. First, we fetch the next argument with va_arg. Then we need to convert the integer into a string of digits, because the output is made of characters. For example, to print a single character we can use putchar(). The problem is that putchar() takes an int and writes it as a character code, so passing the integer 65 prints 'A' (65 is the ASCII code for 'A') instead of "65".

❯ cat int.c
#include <stdio.h>

int main() {
    int n = 65;
    putchar(n);
    return 0;
}

❯ gcc int.c -o int && ./int
A

To solve this problem, I took an itoa() implementation found online and tweaked it to accept hexadecimal values as well (we'll get into that later).

char *itoa(int value, char *str, char *buff, int base) {
    // buff is not needed yet; the full version uses it to flush the buffer
    char tmpBuffer[32]; // digits, least significant first
    int len = 0;
    int tmp_value;

    do {
        tmp_value = value;
        value /= base;
        // The lookup string is centered on '0' (index 35), so a negative
        // remainder such as -4 still maps to '4'. This avoids negating
        // value, which would overflow for INT_MIN.
        tmpBuffer[len++] = "zyxwvutsrqponmlkjihgfedcba9876543210123456789abcdefghijklmnopqrstuvwxyz" [35 + (tmp_value - value * base)];
    } while(value);

    // apply negative sign (tmp_value has the same sign as the original value)
    if(tmp_value < 0) {
        *str++ = '-';
        if(base == 16) {
            *str++ = '0';
            *str++ = 'x';
        }
    }

    // copy the digits back in the right order
    while(len > 0) {
        *str++ = tmpBuffer[--len];
    }

    return str;
}

case 'd': // int-sized arguments only
case 'i':
    n = va_arg(args, int); // fetch next argument as a signed int
    str = itoa(n, str, print_buf, 10); // convert the integer into a decimal string
    continue;
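The heart of any itoa() is repeated division: value % base gives the last digit and value / base drops it, so the digits come out in reverse order and must be flipped. The post's itoa() avoids negating a negative value by indexing into a lookup string mirrored around '0'. The sketch below shows a common alternative, not the author's code: to_base (a hypothetical name) first converts the value to an unsigned magnitude, because -INT_MIN overflows an int while unsigned arithmetic wraps in a well-defined way.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: writes value in the given base (2..16) into out and
   returns out. out must hold at least sizeof(unsigned) * CHAR_BIT + 2 bytes
   (worst case: base 2, a sign, and the terminating '\0'). */
char *to_base(int value, char *out, unsigned base) {
    char digits[sizeof(unsigned) * CHAR_BIT]; /* enough digits even for base 2 */
    int len = 0;
    char *p = out;
    /* 0u - (unsigned)value is well defined, even for INT_MIN */
    unsigned magnitude = value < 0 ? 0u - (unsigned)value : (unsigned)value;

    assert(base >= 2 && base <= 16);
    do {
        digits[len++] = "0123456789abcdef"[magnitude % base]; /* last digit first */
        magnitude /= base;
    } while (magnitude != 0);

    if (value < 0)
        *p++ = '-';
    while (len > 0)
        *p++ = digits[--len];  /* emit the digits in reverse */
    *p = '\0';
    return out;
}
```

For example, to_base(-1234, buf, 10) produces "-1234" and to_base(123, buf, 16) produces "7b".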

Case 4: %x

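If the next character is 'x', the argument is printed in hexadecimal. The flow is the same as for %d, but with base 16. Note that, unlike the standard printf(), which prints the two's complement bit pattern of a negative value (fffffb2e for -1234 with a 32-bit int), our itoa() prints a minus sign and a 0x prefix, for example -0x4d2.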

case 'x':
    n = va_arg(args, int); // fetch next variable as int

    str = itoa(n, str, print_buf, 16); // convert the integer into hex string
    continue;
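For comparison, the standard printf() treats the %x argument as an unsigned int, so a negative value is printed as its two's complement bit pattern. The sketch below, which is not the post's implementation, shows that behavior: to_hex_unsigned is a hypothetical helper that converts the value to unsigned int (a well-defined conversion in C) and then applies the same repeated division by 16.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: formats value the way the standard %x does, by
   reinterpreting it as unsigned int. Negative values therefore come out as
   their two's complement bit pattern instead of with a '-' sign.
   out must hold at least sizeof(unsigned) * CHAR_BIT / 4 + 1 bytes. */
char *to_hex_unsigned(int value, char *out) {
    char digits[sizeof(unsigned) * CHAR_BIT / 4 + 1];
    int len = 0;
    char *p = out;
    unsigned u = (unsigned)value;  /* wraps modulo UINT_MAX + 1 */

    assert(out != NULL);
    do {
        digits[len++] = "0123456789abcdef"[u % 16];
        u /= 16;
    } while (u != 0);
    while (len > 0)
        *p++ = digits[--len];
    *p = '\0';
    return out;
}
```

With a 32-bit int, to_hex_unsigned(-1234, buf) yields fffffb2e, which is what printf("%x", (unsigned)-1234) prints, whereas our _printf prints -0x4d2.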

Case 5: %n

If the next character is 'n', nothing is printed. In C, %n prints nothing; instead, it writes the number of characters printed so far into the int that the corresponding argument points to. This Stack Overflow post has a great explanation of how %n is used. To handle %n, we fetch the next argument as a pointer to int and store the number of characters written so far at that address.

case 'n': { // braces needed: before C23, a declaration cannot directly follow a case label
    int *ip = va_arg(args, int *); // fetch the next argument as a pointer to int
    *ip = (int)(str - print_buf);  // store the number of characters written so far
    continue;
}
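To see what %n does in the standard library, the sketch below (not part of the original post) feeds the post's test case "123459%n\n" to the standard snprintf(). format_with_n is a hypothetical helper. Because six characters precede %n, it stores 6. Be aware that some C libraries restrict %n for security reasons; for example, Microsoft's C runtime disables it by default.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: runs the post's %n test case through the standard
   snprintf() and returns the value that %n stored. */
int format_with_n(char *out, size_t size) {
    int written = -1;

    assert(out != NULL && size > 0);
    /* %n outputs nothing; it stores the number of characters produced so
       far (the 6 characters of "123459") in the int that &written points to. */
    snprintf(out, size, "123459%n\n", &written);
    return written;
}
```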

Case 6: Anything except c, d, i, s, x, n

If the user supplies any other specifier, such as %a, we print the '%' and the following character as they are.

default:
    *str++ = '%'; // if after '%' does not have valid identifier, print as usual
    if(*format) {
        *str++ = *format;
    }

Finally, we clean up the argument list with the va_end macro.

    // ...
    va_end(args);
    return 0;
}

Now we need to print the buffer to the screen, so we need a function that prints every character between the start of the buffer and the current write position str:

void print_buffer(char buffer[], char *str) {
    for(int i = 0; i < str - buffer; i++) {
        putchar(buffer[i]);
    }
}

We are almost done, good job. But there is one more thing to consider. Every case stores its output in the buffer, and the buffer has a fixed size of 1024 bytes. If the output is longer than that, we would write past the end of print_buf, a buffer overflow that corrupts whatever memory follows it on the stack. To avoid this, we flush the buffer (print its contents and start again from the beginning) as soon as it is full. This blog has a better explanation of why we need to flush the buffer to the output stream.

Let's create a function that checks whether the buffer has reached its size limit, so that the caller can print it out and start over.

int check_size(char *str, char *print_buff) { // check buffer size
    if(str - print_buff >= 1024) { // full: all 1024 bytes are in use
        return str - print_buff;
    }

    return 0;
}

Below is the basic flushing pattern, which we repeat after every write into the buffer:

if(check_size(str, print_buf)) {
    print_buffer(print_buf, str);
    str = print_buf; // reset buffer pointer
}
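One limitation remains in this design: each flush resets str to print_buf, so the %n case, which stores str - print_buf, undercounts once earlier output has been flushed. The sketch below is one way to address that and is not part of the original code; the struct out_buf type and the out_flush, out_putc, and out_count functions are hypothetical names. It also flushes before writing rather than after, so the buffer can never be written past its end.

```c
#include <assert.h>
#include <stdio.h>

#define OUT_BUF_SIZE 1024   /* same capacity as print_buf */

/* Hypothetical buffered writer: tracks both the bytes still buffered and
   the bytes already flushed, so the total stays correct across flushes. */
struct out_buf {
    char data[OUT_BUF_SIZE];
    size_t len;       /* bytes currently in data */
    size_t flushed;   /* bytes written to stream by earlier flushes */
    FILE *stream;     /* destination, e.g. stdout */
};

/* Write out everything buffered and start again at the beginning. */
void out_flush(struct out_buf *b) {
    fwrite(b->data, 1, b->len, b->stream);
    b->flushed += b->len;
    b->len = 0;
}

/* Append one character, flushing first if the buffer is already full,
   so data[OUT_BUF_SIZE] is never touched. */
void out_putc(struct out_buf *b, char c) {
    assert(b->len <= OUT_BUF_SIZE);
    if (b->len == OUT_BUF_SIZE)
        out_flush(b);
    b->data[b->len++] = c;
}

/* Total characters produced so far: the value %n should store. */
size_t out_count(const struct out_buf *b) {
    return b->flushed + b->len;
}
```

With this structure, the %n case would store (int)out_count(&b) instead of str - print_buf.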

Full Source Code

#include <stdio.h>
#include <stdarg.h>

void print_buffer(char buffer[], char *str);
int check_size(char *str, char *print_buff);
char *itoa(int value, char *str, char *buff, int base);

void print_buffer(char buffer[], char *str) {
    for(int i = 0; i < str - buffer; i++) {
        putchar(buffer[i]);
    }
}

int check_size(char *str, char *print_buff) { // check buffer size
    if(str - print_buff >= 1024) { // full: all 1024 bytes are in use
        return str - print_buff;
    }

    return 0;
}

char *itoa(int value, char *str, char *buff, int base) {
    char tmpBuffer[32]; // digits, least significant first
    char out[36];       // sign, "0x" prefix and digits, in print order
    int len = 0, n = 0;
    int tmp_value;

    do {
        tmp_value = value;
        value /= base;
        // the lookup string is centered on '0' (index 35), so negative
        // remainders map to digits without negating value (which overflows for INT_MIN)
        tmpBuffer[len++] = "zyxwvutsrqponmlkjihgfedcba9876543210123456789abcdefghijklmnopqrstuvwxyz" [35 + (tmp_value - value * base)];
    } while(value);

    // apply negative sign (tmp_value has the same sign as the original value)
    if(tmp_value < 0) {
        out[n++] = '-';
        if(base == 16) {
            out[n++] = '0';
            out[n++] = 'x';
        }
    }

    // reverse the digits into print order
    while(len > 0) {
        out[n++] = tmpBuffer[--len];
    }

    // copy into the print buffer, checking the size after every character
    for(int i = 0; i < n; i++) {
        *str++ = out[i];
        if(check_size(str, buff)) {
            print_buffer(buff, str);
            str = buff;
        }
    }

    return str;
}

int _printf(const char *format, ...) {
    char print_buf[1024];
    int n;
    char *str = NULL;
    const char *s = NULL;

    va_list args; // hold arguments
    va_start(args, format); // Initialize arguments list

    for(str = print_buf; *format; format++) {
        if(*format != '%') { // ordinary character: copy it into the buffer
            *str++ = *format;

            if(check_size(str, print_buf)) {
                print_buffer(print_buf, str);
                str = print_buf; // reset buffer pointer
            }

            continue;
        }

        ++format; // if there is '%', move to the next character

        if(*format == '\0') { // lone '%' at the end: print it and stop, otherwise
            *str++ = '%';     // format++ would step past the string terminator
            break;
        }

        switch (*format) {
        case 'c':
            *str++ = va_arg(args, int); // char arguments are promoted to int
            if(check_size(str, print_buf)) {
                print_buffer(print_buf, str);
                str = print_buf; // reset buffer pointer
            }
            continue;
        case 's':
            s = va_arg(args, char *); // fetch next argument as char*
            while(*s) {
                *str++ = *s++;
                if(check_size(str, print_buf)) {
                    print_buffer(print_buf, str);
                    str = print_buf; // reset buffer pointer
                }
            }
            continue;
        case 'd': // int-sized arguments only
        case 'i':
            n = va_arg(args, int); // fetch next argument as a signed int
            str = itoa(n, str, print_buf, 10);
            continue;
        case 'x':
            n = va_arg(args, int);
            str = itoa(n, str, print_buf, 16);
            continue;
        case 'n': { // store the number of characters written so far at the given address
            // note: this only counts characters still in the buffer, so the
            // value is too small once earlier output has been flushed
            int *ip = va_arg(args, int *);
            *ip = (int)(str - print_buf);
            continue;
        }
        default: // unsupported specifier: print '%' and the character as-is
            *str++ = '%';
            if(check_size(str, print_buf)) {
                print_buffer(print_buf, str);
                str = print_buf; // reset buffer pointer
            }
            *str++ = *format;
            if(check_size(str, print_buf)) {
                print_buffer(print_buf, str);
                str = print_buf; // reset buffer pointer
            }
        }
    }

    va_end(args);
    print_buffer(print_buf, str); // print whatever is left in the buffer
    return 0;
}

int main() {
    char character = 'A';
    char string[2048] = "shauqi!";
    int positive_num = 1234;
    int negative_num = -1234;
    int integer_num = 345543;
    int overwrite_int = 1;

    _printf("123459%n\n", &overwrite_int);
    _printf("Printing overwrite_int = %d\n", overwrite_int);
    _printf("Hex number = %x\n", negative_num);
    _printf("Positive number = %d xxxx\n", positive_num);
    _printf("Negative number = %d xxxx\n", negative_num);
    _printf("My name is = %s\n", string);
    _printf("First letter is = %c\n", character);
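    _printf("it must print %a\n"); // unsupported specifier: printed as-is
    // No argument is passed for %x, so this is undefined behavior: va_arg
    // reads whatever value happens to sit where the next argument would be.
    // This is exactly how format string bugs leak memory.
    _printf("it must print %x\n");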

    return 0;
}

References

  1. https://www.studysmarter.co.uk/explanations/computer-science/computer-programming/string-formatting-c/
  2. https://onepunchcoder.medium.com/variadic-functions-explained-fd3b4ab6fd84
  3. https://medium.com/@noransaber685/creating-a-custom-printf-function-in-c-a-step-by-step-guide-432fd2ecf48a