written on November 24, 2010
Inspired by Minecraft (like countless other people out there) I decided to give a clone a try to see how this can be accomplished. My last few adventures into the 3D world went through C++ but that always presented me with a problem: The C++ code I write does not scale well to computer games. This time I thought I can probably try to work around the problem by forcing myself to simple C.
This worked really well up to the point where I needed two kinds of lists. One that accepted floats and another one that works with arbitrary pointers.
The C++ way for this is obvious: use templates. That's also one of the best features in C++ and works surprisingly well for the (I imagine) great hack it is. However in C we don't have that but the closest to templates in C (or code generation in general) is the preprocessor.
A warning upfront: all these examples use very generic names. This is a
very bad idea. If you want to use something like this in your own code,
make sure to prefix all macros, functions, types etc. with a unique
prefix. (For instance instead of CAT
name it MYLIB_CAT
).
Generally there are two ways to do code generation for C. One involves external tools that create new C files, the other involves the C preprocessor which is normally intended to expand macros, include other files or remove comments before the compiler goes over your code.
I normally like to avoid tools that generate new C files for the very simple reason that these usually generate ugly looking code I then have to look at which is annoying or at least require yet another tool in my toolchain which causes headaches when compiling code on more than on operating system. A simple Python script that generates a C file sounds simple, but it stops being simple if you also want that thing to be part of your windows installation where Python is usually not available or works differently.
So for my collections I went straight with the C preprocessor. When you limit yourself to the C preprocessor (CCAT) there are two common usage patterns:
The second downside of this approach is that macros generally are really unfriendly to write. They must be in a single line and when you need more than one you have to backslash escape the newline. If you forget about that, welcome to interesting compiler errors.
Macros are fine if they are small, bug looking at endless lines of generated C code from the preprocessor can be hairy and frustrating.
This is what I ended up doing for my collection classes.
Before we head over to the details of the implementation, let's have a look at some of the features of the C preprocssor.
#include "file"
This is the best known feature of the preprocessor. It includes
another file from the search path and inserts it at the same location.
Unless you protect your file from multiple inclusions with include
guards (macros that are set and detected) you can include a file more
than once. This is actually quite helpful in our case.
#error "message"
With this directive you can emit an error message that will abort the
compilation. You can combine it with preprocessor conditionals to
notify the user about unsupported configurations, platforms etc. In
our case it can help us giving the user feedback about missing
defines.
#ifdef / #ifndef
Executes code only if a macro of a given name was defined or not.
This does not support checking for more than one macro, so often it
makes more sense to use #if defined(MACRO)
.
#if / #elif / #else
Can test arbitrary preprocessor conditions. It can perform basic
arithmetic and check if other macros are defined (defined(X)
).
#define MACRO value
Defines a new simple macro. From that point onwards each occurrence
of MACRO
will be replaced with value
. In fact, after MACRO
more
than one C token of any form can be placed if necessary. You can let
a macro be replaced by a full function definition.
#define MACRO(X) X
Macros can also have parameters that are simple tokens. Whenever that
token appears in the list of tokens that will act as replacement, that
token is replaced with the actual token that was passed to the macro.
For example this is a very simple and stupid way to specify an abs()
macro that will take a value and return the absolute value:
#define ABS(X) ((X) < 0 ? -(X) : (X))
The preprocessor will then expand the macro upon usage:
int x = ABS(-42);
/* this is expanded to this: */
int x = ((-42) < 0 ? -(-42) : (-42))
The additional parentheses are there to avoid ambuiguities in case there are operators involved in the passed expression.
Because macro arguments work by replacing tokens I always use uppercase letters as first letter of a macro argument. The reason for this is that nothing in my C code is written in camelcase and thus there is no way this could clash with an actual token that might be in use.
#
Inside macro expressions the #
operator can be used to convert the
following macro argument token passed into a string. Please keep in
mind that this only works for macro arguments, not arbitrary tokens.
This is very helpful if you want to implement things like assert()
and have helpful error messages:
#define assert(Expr) do { \
if (!(Expr)) fail_with_message("Assertion failed: " #Expr, \
__LINE__, __FILE__); \
} while (0)
This also showcases two other things you have to keep in mind when using the preprocessor:
The macro might be used in the body of an if expression and using
a sole if
there might cause the dangling else problem. As a
simple workaround, always wrap your macros in a loop that only
runs once (do { ... } while (0)
). Also make sure to not
include a trailing semicolon. The user of the macro should add
the semicolon, not the author of the macro.
If a macro spans more than one line you have to escape the newlines by adding a backslash before them. Also be sure not to add any other whitespace before the newline or this will break.
##
The ##
operator can be used to concatenate a macro argument token
with any other token. Again, this only works if a macro argument
token is involved, it will not work on arbitrary tokens.
This can for example be used to dynamically generate functions that are prefixed with something else:
#define TEST(TestName) int mylib_##TestName(void)
TEST(foo)
{
assert(foo == 42);
}
Now that we know the basics of the preprocessor we can also infer what
problems might exist. Mainly the interesting operators for code
generation (#
and ##
) can only operate on macro arguments. This
is not a problem for the former, but it will become somewhat of a
limitation in case of the latter. Thankfully this can be countered
nicely with another macro
#define _CAT(A, B) A##B
#define CAT(A, B) _CAT(A, B)
Why do we need two macros here? Wouldn't the first macro be enough to concatenate macros? Unfortunately not because when a macro argument is another macro argument it wouldn't be expanded. Look here:
#define CAT(A, B) A##B
int
main(void)
{
int CAT(foo, CAT(bar, baz));
}
This would generate the following C code:
#define CAT(A, B) A##B
int
main(void)
{
int fooCAT(bar, baz);
}
The extra indirection solves this problem nicely.
The second macro I like to declare for code generation is an UCAT
macro that concatenates two tokens with an underscore instead of
concatenating them directly:
#define UCAT(A, B) CAT(A, CAT(_, B))
Now we have everything to get started implementing a simple list type. For this we first create a header where we declare all list types we want to use. In my case I am interested in a list for pointers and floats. The header looks like this:
#ifndef _INC_LIST_H_
#define _INC_LIST_H_
/* list of pointers */
#define _COLLECTION_TYPE void *
#define _COLLECTION_NAME list
#include "_list.h"
/* list of floats */
#define _COLLECTION_TYPE float
#define _COLLECTION_NAME floatlist
#include "_list.h"
#endif
As you can see we have a standard include guard and then we include another header in there twice (once for each list type we want to have). Before including that header, we also define the type for the list and the name we want to use.
That header then declares the struct for the list and the methods we want
to have. For this to work we will need another header that is used both
by this header as well as the implementation C file. Let's call this
header _collection_pre.inc
. Because we have a pre
header we will also need
a post
header (_collection_post.inc
). The purpose of the pre
header is
to declare some helper macros that return function names prefixed with the
necessary name and the idea of the post
header is to get rid of these
macros again to allow the inclusion of this header another time (for the
next type).
This is what these headers look like:
_collection_pre.inc
:
/* include the header that declares CAT and UCAT */
#include "pputils.h"
/* ensure that the includer set type and name */
#if !defined(_COLLECTION_TYPE) || !defined(_COLLECTION_NAME)
# error "Includer has to set _COLLECTION_TYPE and _COLLECTION_NAME"
#endif
/* helper macros to declare types and methods */
#define _COLLECTION_TYPENAME SC_PP_UCAT(_COLLECTION_NAME, t)
#define _COLLECTION_METHOD(Name) SC_PP_UCAT(_COLLECTION_NAME, Name)
_collection_post.inc
:
/* get rid of everything declared in _collection_pre.h and the includer */
#undef _COLLECTION_NAME
#undef _COLLECTION_TYPE
#undef _COLLECTION_TYPENAME
#undef _COLLECTION_METHOD
Now we finally have everything in place to implement our _list.h
header
that declares types and methods. This is how it can look like:
#include "_collection_pre.inc"
typedef struct {
size_t size;
size_t allocated;
_COLLECTION_TYPE *items;
} _COLLECTION_TYPENAME;
/* creates a new list */
_COLLECTION_TYPENAME *_COLLECTION_METHOD(new)(void);
/* frees the list */
void _COLLECTION_METHOD(free)(_COLLECTION_TYPENAME *self);
/* appends a new item to the list */
int _COLLECTION_METHOD(append)(_COLLECTION_TYPENAME *self, _COLLECTION_TYPE item);
/* removes the last item from the list */
_COLLECTION_TYPE _COLLECTION_METHOD(pop)(_COLLECTION_TYPENAME *self);
#include "_collection_post.inc"
The preprocessor will then use this to generate a list_t
, floatlist_t
,
list_new()
, floatlist_new()
etc.
The actual implementation of the list (list.c
) looks similar to our
list.h
header, just that we are including _list.inc
instead of
_list.h
. In both cases however we are using the same tricks as we did
with our header files:
list.c
:
/* list of pointers */
#define _COLLECTION_TYPE void *
#define _COLLECTION_NAME list
#include "_list.inc"
/* list of floats */
#define _COLLECTION_TYPE float
#define _COLLECTION_NAME floatlist
#include "_list.inc"
_list.inc
:
#include "_collection_pre.inc"
_COLLECTION_TYPENAME *
_COLLECTION_METHOD(new)(void)
{
_COLLECTION_TYPENAME *rv = malloc(sizeof(_COLLECTION_TYPENAME));
if (!rv)
return NULL;
rv->size = 0;
rv->allocated = 32;
rv->items = malloc(sizeof(_COLLECTION_TYPE) * rv->allocated);
if (!rv->items) {
free(rv);
return NULL;
}
return rv;
}
void
_COLLECTION_METHOD(free)(_COLLECTION_TYPENAME *self)
{
if (!self)
return;
free(self->items);
free(self);
}
int
_COLLECTION_METHOD(append)(_COLLECTION_TYPENAME *self, _COLLECTION_TYPE item)
{
if (self->size >= self->allocated) {
size_t new_size = (size_t)(self->allocated * 1.33f);
_COLLECTION_TYPE *rv = realloc(self->items,
sizeof(_COLLECTION_TYPE) * new_size);
if (!rv)
return 0;
self->allocated = new_size;
self->items = rv;
}
self->items[self->size++] = item;
return 1;
}
_COLLECTION_TYPE
_COLLECTION_METHOD(pop)(_COLLECTION_TYPENAME *self)
{
return self->items[--self->size];
}
#include "_collection_post.inc"
And this is then how you would use that list:
#include "list.h"
int
main(void)
{
floatlist_t *list = floatlist_new();
floatlist_append(list, 42.0f);
floatlist_append(list, 23.0f);
assert(list->size == 2);
assert(list->items[0] == 42.0f);
assert(list->items[1] == 23.0f);
assert(floatlist_pop(list) == 23.0f);
floatlist_free(list);
}
On top of that general concept you can then implement arbitrary data structures. The main problem with this over the template system from C++ is not only that it needs more files or does not have virtual functions, but that it requires you to explicitly specify the types you want in the header and implementation files and then generate specific typedefs and functions for it. There is really nothing you can do to change this, this is how the language works.
Another problem is that you can't use the preprocessor to generate other macros. So if you want to declare a type specific macro that returns an item from the list after doing an size assertion, you are out of luck. However all modern compilers do support inlines, so what you want is to create a static, inline function in the header instead of a macro.
Generally speaking though, this is probably good enough to cover the majority of use cases and small applications. It did the trick for me at least.