Type safe printf

20 Oct 2008

Current compilers check the "format strings" passed to printf-style functions, but none, to my knowledge, makes them truly type safe. This can result in program errors and possibly core dumps. The following proposes a pre-processing step that performs a minimal (and reversible) program transformation that allows gcc to do strict type checking. The transformed program is 100% compatible with the original and compiles into an identical binary. Further enhancements would allow alternative, higher-performance printf implementations or automatic conversion to C++ streams.

A sample transformer is at http://code.google.com/p/typesafeprintf/


We all know that printf-style functions can have security problems, and the right mismatch of arguments (or lack of them) can cause core dumps. The more horrible cases can be caught using gcc with -Werror -Wformat=2, but this will not catch application errors. For instance:

#include <stdio.h>

int main() {
  int a = -1;
  printf("%d %u %hhd\n", a, a, a);

  unsigned int b = 512;
  printf("%d %u %hhd\n", b, b, b);

  double c = 2.0;
  printf("%d %f %u %hhd\n", c, c, c, c);

  return 0;
}

Running this gives us:

$ ./a.out
-1 4294967295 -1
512 512 0
0 0.000000 1073741824 0

which is probably not what you would expect. This is hard to unit test for, since the wrong output only shows up for some input values.
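To see why only some values go wrong, it can help to spell out the conversions that the mismatched specifiers effectively apply. The following is a minimal sketch, not code from the transformer: the helper names seen_by_hhd and seen_by_u are made up for illustration, and it assumes a typical platform (32-bit int, two's complement, and gcc-style wrapping when narrowing to signed char).

```c
/* Illustrative helpers (hypothetical names, not part of any library). */

/* Roughly what %hhd prints for an int argument: the value is first
 * converted to signed char. Out-of-range results are implementation-
 * defined; gcc and clang wrap modulo 256. */
int seen_by_hhd(int value) {
    return (signed char)value;
}

/* Roughly what %u prints when handed an int in practice: the same bits
 * read as an unsigned value. */
unsigned int seen_by_u(int value) {
    return (unsigned int)value;
}
```

Under those assumptions seen_by_hhd(100) is still 100 while seen_by_hhd(512) is 0, and seen_by_u(-1) is the largest unsigned int. A test that happens to use small positive values would pass even though the format string is wrong.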

As mentioned, gcc has a few options for checking format strings. They catch some security issues and a wrong number of arguments. Unfortunately, they don't catch sign conversions or shortening conversions (32-bit integer to 8-bit integer).

$ gcc -Wall -Wextra -Wconversion -Wformat=2 main3.c
main3.c: In function ‘main’:
main3.c:12: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘double’
main3.c:12: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘double’
main3.c:12: warning: format ‘%hhd’ expects type ‘int’, but argument 5 has type ‘double’

This only caught the float-to-integer conversions but not the others. Your application may emit wrong data. In addition, I'm quite sure that the "right combination" of data and incorrect format string will still core dump, especially on 64-bit platforms. This may be a libc bug and maybe only on some CPUs.

My favorite scenario is some custom logger that uses printf-style varargs: some error condition happens and the program starts to log something, but the format is broken, and it core dumps. Good times.
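For such a logger, gcc can at least apply its usual -Wformat checks if the function is annotated with the format attribute. The sketch below is an assumption about what such a logger might look like; log_msg and its buffer-based signature are hypothetical. Note that this only gives the logger the same checks printf already gets, so the sign and shortening mismatches shown above would still slip through.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logger. The attribute tells gcc that argument 3 is a
 * printf-style format string and that the variadic arguments start at
 * position 4, so calls are checked like calls to printf. */
int log_msg(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int log_msg(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    /* vsnprintf never writes more than `size` bytes, terminator included. */
    int written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

With the attribute in place, a call such as log_msg(buf, sizeof buf, "%d", 2.0) should produce the same kind of warning as the double-for-%d case above when compiled with -Wformat.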

So how can you make printf-style functions truly safe and type-safe?

Fix the compilers

I'm not touching GCC, and I'm not sure it can handle shortening conversions due to the way its parser works.

I took a look at LLVM clang. To do this you'd have to rewrite the current format checker to really parse the format string. This is somewhat hard. Also, it's not so straightforward to figure out the type being passed in. For instance:

bool foo = ???;
int a = 1;
float b = 100.0;
printf("%d",  foo ? a : b);

clang is super-cool but probably won't be ready for a good year for use in production, anyways.
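For the conditional expression in that snippet, the C rules do give a single static type: int and float are balanced to float by the usual arithmetic conversions, whichever branch is taken, and the float is then promoted to double when passed through varargs. A small sketch using C11 _Generic can confirm the first part; the TYPE_NAME macro and the ternary_type function are hypothetical names, and a C11 compiler is assumed.

```c
#include <stdbool.h>

/* Hypothetical helper macro: names the static type of an expression. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    float: "float",                \
    double: "double",              \
    default: "other")

/* Returns the type name of `foo ? a : b` for an int `a` and a float `b`. */
const char *ternary_type(bool foo) {
    int a = 1;
    float b = 100.0f;
    return TYPE_NAME(foo ? a : b);
}
```

The result is "float" for both values of foo, so "%d" is a mismatch regardless of the condition. A checker still has to compute that type itself, which is the hard part described above.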

Switch to C++ streams

This is clearly not practical for many applications. And C++ streams come with a lot of baggage.

Program Transformation

The fact that any of this is a problem at all is a surprise. At compile time we have the format and the arguments, yet printf is a regular function that parses the format string every time it is called. That doesn't seem so smart. It seems that somehow we could transform the printf call into another form.

I took a look at the Rose Compiler but I could not get it to compile correctly. It's a monster and seems a bit of overkill for this project.

I also took a look at dumping the C syntax tree (the AST) for both gcc and clang, but that was not successful.

Full-on parsing of C or C++ is not trivial at all.

However, transforming each printf call into an individual function lets the compiler apply its normal function-call type checks, enabled with the -Wconversion flag. This turns

printf("%d", 1);

into

static int printf_1(const char* format,  int a) {
   return printf("%d", a);
}

printf_1("%d", 1);
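The core of such a transformer is deriving the wrapper's parameter types from the format string. The following is only a sketch of that step under stated assumptions, not the code of the actual tool: wrapper_param_types is a hypothetical name, it handles just a handful of conversions (%d, %u, %f, %s, %hhd, and %%), it does not support '*' widths, and it maps %hhd to char only because that is the type used in the generated wrappers in this post.

```c
#include <stddef.h>
#include <string.h>

/* Fills `types` with the C parameter type for each conversion in `fmt`.
 * Returns the number of conversions found, or -1 if a conversion is not
 * supported or more than `max_types` conversions are present. */
int wrapper_param_types(const char *fmt, const char *types[], int max_types) {
    int n = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') continue;
        p++;
        if (*p == '%') continue;  /* "%%" is a literal percent sign */

        /* Skip flags, width, and precision. */
        while (*p != '\0' && strchr("-+ #0123456789.", *p) != NULL) p++;

        const char *type = NULL;
        if (strncmp(p, "hhd", 3) == 0) { type = "char"; p += 2; }
        else if (*p == 'd') type = "int";
        else if (*p == 'u') type = "unsigned int";
        else if (*p == 'f') type = "double";
        else if (*p == 's') type = "const char *";

        if (type == NULL || n >= max_types) return -1;
        types[n++] = type;
    }
    return n;
}
```

For the format "%d %f %u %hhd\n" this yields int, double, unsigned int, char, which is the parameter list a generated wrapper would declare after the format argument.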

This is easy to "undo" as well. Using the example from above, our new source file becomes:

#include <stdio.h>

/* VARARG TRANSFORMATION START */
/* This is autogenerated */

static void printf_1(const char* format __attribute__((unused)), int a0, unsigned int a1, char a2) {
   printf("%d %u %hhd\n", a0, a1, a2);
}


static void printf_2(const char* format __attribute__((unused)), int a0, unsigned int a1, char a2) {
   printf("%d %u %hhd\n", a0, a1, a2);
}


static void printf_3(const char* format __attribute__((unused)), int a0, double a1, unsigned int a2, char a3) {
   printf("%d %f %u %hhd\n", a0, a1, a2, a3);
}
/* VARARG TRANSFORMATION END */

int main() {
  int a = -1;
  printf_1("%d %u %hhd\n", a, a, a);

  unsigned int b = 512;
  printf_2("%d %u %hhd\n", b, b, b);

  double c = 2.0;
  printf_3("%d %f %u %hhd\n", c, c, c, c);

  return 0;
}

Compiling this:

$ gcc -Wconversion main4.c
main4.c: In function ‘main’:
main4.c:26: warning: passing argument 3 of ‘printf_1’ as unsigned due to prototype
main4.c:26: warning: passing argument 4 of ‘printf_1’ with different width due to prototype
main4.c:29: warning: passing argument 2 of ‘printf_2’ as signed due to prototype
main4.c:29: warning: passing argument 4 of ‘printf_2’ with different width due to prototype
main4.c:32: warning: passing argument 2 of ‘printf_3’ as integer rather than floating due to prototype
main4.c:32: warning: passing argument 4 of ‘printf_3’ as integer rather than floating due to prototype
main4.c:32: warning: passing argument 5 of ‘printf_3’ as integer rather than floating due to prototype

Now it catches 100% of the type errors in the example.
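As for undoing the transformation, the START/END markers and the numbered printf_N names carry everything needed. The sketch below shows one way it could be done, assuming the generated block is delimited exactly by the two marker comments and that lines are shorter than 1024 bytes. The function names are hypothetical, and the matching is deliberately naive: it would also rewrite a name such as fprintf_1(.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MARK_START "/* VARARG TRANSFORMATION START */"
#define MARK_END   "/* VARARG TRANSFORMATION END */"

/* Copies `line` to `out`, turning each "printf_<digits>(" back into
 * "printf(". `out` must hold at least strlen(line) + 1 bytes. */
void restore_calls(const char *line, char *out) {
    size_t o = 0;
    const char *p = line;
    while (*p != '\0') {
        if (strncmp(p, "printf_", 7) == 0 && isdigit((unsigned char)p[7])) {
            const char *q = p + 7;
            while (isdigit((unsigned char)*q)) q++;
            if (*q == '(') {
                memcpy(out + o, "printf", 6);
                o += 6;
                p = q;  /* resume at the opening parenthesis */
                continue;
            }
        }
        out[o++] = *p++;
    }
    out[o] = '\0';
}

/* Streams `in` to `out`, dropping the generated block and restoring the
 * original call names on every other line. */
void undo_transform(FILE *in, FILE *out) {
    char line[1024], fixed[1024];
    int in_generated = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, MARK_START) != NULL) { in_generated = 1; continue; }
        if (strstr(line, MARK_END) != NULL)   { in_generated = 0; continue; }
        if (in_generated) continue;
        restore_calls(line, fixed);
        fputs(fixed, out);
    }
}
```

Calling undo_transform(stdin, stdout) on the transformed file should give back the original source, apart from any blank lines left around the removed block.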


Comment 2010-04-11 by None

This is a really cool concept. Do you mind if I cite this in an article that I'm working on?

I'll probably have some extensions to printf.py. Would you mind if I submit these as patches to your typesafeprintf project on Google Code?

You can contact me at joshkel at gmail dot com.


Comment 2010-04-20 by None

I forgot to mention another option: transforming printf calls into code that uses functions that are smarter than printf.

Say, like the numtoa code in stringencoders
http://code.google.com/p/stringencoders/
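A rough sketch of that idea follows. It does not use the stringencoders API; int_to_ascii is a hypothetical stand-in for a numtoa-style routine, and a 32-bit int is assumed so that 12 bytes are enough for the sign, ten digits, and the terminator.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical numtoa-style helper: writes the decimal form of `value`
 * into `buf` (at least 12 bytes) and returns the string length. */
size_t int_to_ascii(int value, char *buf) {
    char tmp[12];
    size_t n = 0;
    /* Work in unsigned so that the most negative int negates safely. */
    unsigned int u = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;
    do {
        tmp[n++] = (char)('0' + u % 10);  /* digits come out reversed */
        u /= 10;
    } while (u != 0);

    size_t len = 0;
    if (value < 0) buf[len++] = '-';
    while (n > 0) buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* What a transformed printf("%d\n", a) could become: no format string
 * is parsed at run time. */
void print_int_line(int a) {
    char buf[12];
    int_to_ascii(a, buf);
    fputs(buf, stdout);
    fputc('\n', stdout);
}
```

Because each conversion becomes a call to a function with a fixed prototype, the compiler's ordinary argument checks apply here as well.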