What is your favorite C programming trick? (2009) (stackoverflow.com)
227 points by Fiveplus 2 days ago | 175 comments





Named parameters using a struct.

    struct calculate_args {
        int x, y;
        enum {add=0, sub} operator;
    };

    int calculate_func(struct calculate_args args) {
        if (args.operator == add)
            return args.x + args.y;
        else
            return args.x - args.y;
    }

    #define calculate(...) calculate_func((struct calculate_args){__VA_ARGS__})
Now you can combine positional and named parameters or omit them.

    calculate(1, 3); // 4
    calculate(8, 3, .operator=sub); // 5
    calculate(.operator=sub, .y=7); // -7
Works very well in cases where there are a lot of parameters that default to 0. Keep in mind that you still need to know how structs work and you lose compile-time error detection.

In a similar spirit, overloading functions based on the number of parameters. Even works with tcc.

    #include <stdio.h>
    #define CONCAT2(a, b) a##b
    #define CONCAT(a, b) CONCAT2(a, b)
    #define COUNT_ARGS2(a0, a1, a2, a3, a4, a5, a6, a7, a8, N, ...) N
    #define COUNT_ARGS(...) COUNT_ARGS2(__VA_ARGS__, 9, 8, 7, 6, 5, 4, 3, 2, 1)
    #define foo(...) CONCAT(foo_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)

    void foo_1(const char *a){
        printf("one arg: %s\n", a);
    }

    void foo_2(int a, int b){
        printf("two args: %i and %i\n", a, b);
    }

    int main(){
        foo("asdf");
        foo(314159, 271828);
    }
Small gotcha: Does not work with zero parameters, but functions with zero parameters probably mess with global state and that's evil anyway.

If other people use and debug your code, please consider NOT using “#define” metaprogramming. It is a nightmare to debug.

This!

I just wanted to post it, but you already did. :-)


Here's a trick that will actually help produce more secure and reliable programs.

    *outArg = myPtr; myPtr = NULL;
    free(aPtr); aPtr = NULL;
Set your pointers to null when you free them! Set them to null when you transfer ownership! Stop leaving dangling pointers everywhere!

Some people say they like dangling pointers because they want their program to crash if something is freed when they don't expect it to be. Good! Do this:

    assert(ptr);
There are also many more tricks you can do once you start nulling pointers. You can use const to mark pointers that you don't own and thus can't free. You can check that buffers are all zero before you free them to catch memory leaks (this requires zeroing out other fields too of course).
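
A minimal sketch of the "check that it's all zero before freeing" idea. The type `struct buffer` and the helpers `all_zero`, `buffer_new` and `buffer_free` are hypothetical names made up for illustration. It also assumes a null pointer is stored as all-zero bytes, which is true on common platforms but not guaranteed by the C standard:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example type: owns one heap allocation. */
struct buffer {
    char *data;
    size_t len;
};

/* Returns 1 if every byte of the object is zero, 0 otherwise. */
static int all_zero(const void *p, size_t n) {
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}

/* calloc() zeroes the whole struct, padding included. */
static struct buffer *buffer_new(size_t len) {
    struct buffer *buf = calloc(1, sizeof *buf);
    if (!buf)
        return NULL;
    buf->data = calloc(len, 1);
    if (!buf->data) {
        free(buf);
        return NULL;
    }
    buf->len = len;
    return buf;
}

/* Takes a pointer-to-pointer so the caller's pointer is nulled too. */
static void buffer_free(struct buffer **bufptr) {
    struct buffer *buf = *bufptr;
    if (!buf)
        return;
    free(buf->data); buf->data = NULL;
    buf->len = 0;
    /* Debugging aid: a field we forgot to release would still be nonzero. */
    assert(all_zero(buf, sizeof *buf));
    free(buf);
    *bufptr = NULL;
}
```

If a new owning field is added to the struct later and `buffer_free` is not updated, the assertion fires in debug builds instead of the memory leaking silently.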

Please, null out your pointers and stop writing (most) use-after-free bugs!


Better, then:

    #define ZFREE(p) do { free(p); p = NULL; } while(0)


You can even do

  #define free(p) do { free(p); p = NULL; } while(0)
And when you want to call the original free:

  (free)(p);
The preprocessor will not substitute this occurrence of free.

C n00b here - why the `do {} while(0)`? Couldn't you just do something like `#define free(p) { free(p); p = NULL; }`?

Semi-experienced C user here, I believe the anonymous block is perfectly adequate here. No idea why they are wrapping it in a single instance do loop, unless they’re unaware of block scoping or I’m unaware of some UB here.

do {} while(0) is a common idiom for macros in C, because it consumes the trailing semicolon, which a bare {} block doesn't do.

    if(x) MACRO();
    else something();
expands to

    if(x) { ... }; // Error!
    else something();
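
To make that concrete, here is a small sketch that compiles only because the macro expands to a single statement. `release_if` is a hypothetical helper invented for this example, and the macro body is the ZFREE one quoted above with parentheses added around `p`:

```c
#include <stdlib.h>

/* One statement once the caller adds the trailing semicolon. */
#define ZFREE(p) do { free(p); (p) = NULL; } while (0)

/* Frees and nulls *slot when drop is nonzero; returns 1 if it did. */
static int release_if(int drop, char **slot) {
    if (drop)
        ZFREE(*slot);   /* with a bare { ... } body, the ';' here would orphan the else */
    else
        return 0;
    return 1;
}
```

Swapping the macro body for a plain `{ free(p); (p) = NULL; }` block turns the `else` into a syntax error, which is the failure shown in the expansion above.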

Here's the actual macro I (sometimes) use:

    #define FREE(ptrptr) do { \
        __typeof__(ptrptr) const __x = (ptrptr); \
        free(*__x); *__x = NULL; \
    } while(0)
There might be a better way of doing it though. Also, __typeof__() obviously isn't standard C.

Edit to add: I've honestly been moving away from using a macro and just putting both statements on one line like in the OP. For something so simple, using a macro seems like overkill.


What's the benefit of that over nn3's version?

It'll only evaluate the pointer once. It's possible to make this a function though, that might be preferable
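
A sketch of why single evaluation matters, assuming GCC or Clang for `__typeof__`. The helper `free_all` is hypothetical, and the temporary is renamed to `x_` here to stay out of the reserved double-underscore namespace:

```c
#include <stdlib.h>

/* Pointer-to-pointer variant: the argument expression is evaluated once. */
#define FREE(ptrptr) do { \
    __typeof__(ptrptr) const x_ = (ptrptr); \
    free(*x_); *x_ = NULL; \
} while (0)

/* Frees slots[0..n-1] using an argument with a side effect.
 * Returns the final index, which is n if i++ ran once per slot. */
static size_t free_all(char **slots, size_t n) {
    size_t i = 0;
    while (i < n)
        FREE(&slots[i++]);
    return i;
}
```

With a macro that mentions its argument twice, such as `free(p); p = NULL;`, passing `slots[i++]` would increment `i` twice per call, freeing one slot and nulling a different one.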

Good point. But it seems like it would require usage like this:

    int* p = malloc(sizeof(int));
    FREE(&p);
What if we instead define the macro like this:

    #define FREE(ptr) do { \
        __typeof__(ptr)* const __x = &(ptr); \
        free(*__x); *__x = NULL; \
    } while(0)
That makes usage slightly shorter, as well as more similar to free():

    int* p = malloc(sizeof(int));
    FREE(p);

Taking a pointer-to-pointer is intentional to make it clear that the pointer will be modified. That's actually the most important difference from nn3's version IMHO.

I tried making it a plain function at one point but ran into some weirdness around using void * * with certain arguments (const buffers?). You don't want to accept plain void * because it's too easy to pass a pointer instead of a pointer to a pointer. Using a macro is (ironically) more type safe.

Maybe someone else could figure out how to do it properly, since I'd definitely prefer a function.


Your approach requires extra checks, though, which are easy to forget. Also, NULL is not guaranteed to be stored as zeros, plus padding is going to make your life annoying.

Well, dangling pointers are also easy to forget... Yes, it requires some discipline. Good code requires discipline, doesn't it?

The trick of checking that buffers are zeroed is purely a debugging tool, so it's okay if it doesn't work on some platforms. And if you allocate with calloc(), the padding will be zeroed for you. It's actually very rare that you will have to call memset() with this technique.


> Good code requires discipline, doesn't it?

This is like the most clichéd way of saying “my code has security vulnerabilities” that there is. I have yet to see code that has remained secure solely on the “discipline” of programmers remembering to check things.

> The trick of checking that buffers are zeroed is purely a debugging tool, so it's okay if it doesn't work on some platforms.

Fair.

> And if you allocate with calloc(), the padding will be zeroed for you.

It might get unzeroed if you work with the memory.


All code is full of vulnerabilities. If you say your code isn't, then I'm sure it is. I just do the best I can to keep the error rate as low as possible. But it's a rate, and it's never zero.

Also, it's not just about vulns in security-critical code. It's also about ordinary bugs. Why not be a little more careful? It won't hurt.

> It might get unzeroed if you work with the memory.

Maybe, but it isn't very common. I'm not sure when the C standard allows changing padding bytes, but in practice the compilers I've used don't seem to do it. And again, it's just a debugging aid, if it causes too much trouble on some platform, just turn it off.


It’s better to have automatic checks than rely on programmers being careful enough to remember to add them. For padding: this probably happens more on architectures that don’t do unaligned accesses very well.

Help me out here, because I'm really trying to understand. Are you saying that dangling pointers that blow up if you double-free them is an "automatic check"? If not, what kind of automatic check are you talking about?

If the extra code is really that bothersome, just use a macro or wrapper function.


It's a much better situation than NULLing them out, because that hides bugs and makes tools like Address Sanitizer useless. A dangling pointer, when freed, will often throw an assert in your allocator; here's an example of what this looks like on my computer:

  $ clang -x c -
  #include <stdlib.h>
  
  int main(int argc, char **argv) {
      char *foo = malloc(10);
      free(foo);
      free(foo);
  }
  $ ./a.out
  a.out(14391,0x1024dfd40) malloc: *** error for object 0x11fe06a30: pointer being freed was not allocated
  a.out(14391,0x1024dfd40) malloc: *** set a breakpoint in malloc_error_break to debug
  Abort trap
As you turn up your (automatic) checking this will be caught more and more often. Setting the pointer to NULL will silently hide the error as free(NULL) is a no-op and nothing will catch it. Thus, the suggestion here was

1. advocating adding additional code, which has historically been hard to actually do in practice, and

2. providing a suggestion that is generally worse.


Good points, thank you for explaining!

I can see an argument for wrapping it in a macro so you can turn off nulling in debug builds (ASan might even have hooks so you can automate this, I know Valgrind does). But use-after-free is worse than just double-frees, and if you read a dangling pointer in production there's no real way to catch it AFAIK. Last I heard (admittedly been a few years since I checked), you're not supposed to deploy ASan builds because they actually increase the attack surface.

So, your program's memory is full of these dangling pointers, and at some point you will have a bug you didn't catch and use one. And you can't even write an assertion to check that it's valid. What do you propose?

And again to clarify, I'm not trying to advocate for hiding bugs. I want to catch them early (e.g. with assertions), but I also want to avoid reading garbage at runtime at all costs, because that's how programs get pwn'd.
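
One possible middle ground, sketched under the assumption that every pointer passed in is expected to be live: assert before freeing, so a second free trips the assertion in debug builds instead of silently becoming `free(NULL)`. `FREE_ONCE` is a hypothetical name, not something from this thread:

```c
#include <assert.h>
#include <stdlib.h>

/* Frees p exactly once, then nulls it. In debug builds a second call
 * on the same variable fails the assertion. Note that p is evaluated
 * more than once, so don't pass expressions with side effects. */
#define FREE_ONCE(p) do { \
    assert((p) != NULL && "double free or never allocated"); \
    free(p); \
    (p) = NULL; \
} while (0)
```

This only covers the variable that was nulled. Any other copies of the pointer are still dangling, so it complements ASan in testing rather than replacing it.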


> But use-after-free is worse than just double-frees

From an exploitability point of view they are largely equivalent.

As for the rest of your comment: my point of view is largely "you should catch these with Address Sanitizer in debug", so I don't usually write code like "I should assert if I marked this as freed by NULLing it out". If I actually need to check this for program logic, then of course I'll add something like this.

The macro you suggest would alleviate my concerns, I suppose, and it wouldn't really be fair for me to shoot that solution down solely because I personally don't like these kinds of assertions in production. So it's not a bad option by any means, other than my top-level comment of this requiring extra code. I know some libraries like to take a pointer-to-a-pointer so they can NULL it out for you, so that is an option for your wrapper. And a double-free that doesn't crash can sometimes open up exploitable bugs too since it messes with program invariants that you didn't expect. But these are much rarer than the typical "attacker controlled uninitialized memory ended up where it shouldn't" so it's not a big deal.


Very reasonable! Thank you for the discussion :)

> I have yet to see code that has remained secure solely on the “discipline” of programmers remembering to check things.

That's not what the parent comment said.


I’m not sure what it could have said after saying that programmers should have discipline after I mentioned that their thing required extra checks to work.

Parent said: "Good code requires discipline, doesn't it?"

You retort: "I have yet to see code that has remained secure solely on the “discipline” of programmers remembering to check things."

I think that is a dishonest misrepresentation of what the parent comment said, isn't it?


The “discipline” in this case (see the whole thread) is “have programmers remember to insert checks”, which has historically been a good way to have security holes crop up. So I’m not sure what was dishonest about it?

They argued that discipline is necessary, not sufficient, to produce good code. You represented the argument as: "discipline is sufficient for secure (good) code"

You took the original argument, changed it to be fallacious, and used it as a strawman. That's what was dishonest about it.


I appreciate you defending me, but I don't think he was trying to be dishonest.

I don't think that's fair in this case because nulling out pointers isn't the first line of defense. If you forget to do it once, it's not going to cause a bug in and of itself. You can easily grep the code periodically to find any cases you missed.

I think that's the misunderstanding, then, because to me it seemed to be a defensive coding practice (I think it was certainly presented as such in the top comment). My "you need extra checks" claim was mostly aimed at the additional things you add on to your code assuming that you are now zeroing out freed pointers, which I think can lead to dangerous situations where you may come to rely on this being done consistently when it's a manual process that is easy to forget.

Left unsaid due to the fact I was out doing groceries this morning when I posted that was that I don't think this is even a very good practice in general, as I explained in more detail in other comments here.


Indeed, it shouldn't be a first line of defense (nulling + an assert seems reasonable, fwiw), and accessing a nulled out pointer is just as UB as any other UB. It's probably more likely to crash immediately in practice, but it's also easier for an optimizer to "see through", so you may get surprising optimizations if you get it wrong.

Honestly, unless you really cannot afford it time-budget wise, I would just ship everything with ASAN, UBSAN, etc. and deal with the crash reports.


Shipping code with Address Sanitizer enabled is generally not advisable; it has fairly high overhead. You should absolutely use it during testing, though!

> NULL is not guaranteed to be stored as zeros

Is that a real issue, though?

> padding is going to make your life annoying

Just memset?


>> NULL is not guaranteed to be stored as zeros

> Is that a real issue, though?

Of course, it's not, but that's one of those factoids that everyone learns at some point and feels like needing to rub it into everyone else's face assuming that these poor schmucks are as oblivious to it as they once were. A circle of life and all that.


Forgive me for encouraging the adoption of portable, compliant code to those who may not otherwise be aware of it. If you want to assume all the world’s an x86 that’s great but you should at least know what part of your code is going to be wrong elsewhere.

AMD GPUs use a non-zero NULL pointer[0].

[0] https://reviews.llvm.org/D26196


Interesting. And it isn't even always non-zero; sometimes it's 32-bit -1, sometimes it's 32-bit 0 and sometimes it's 64-bit 0:

https://llvm.org/docs/AMDGPUUsage.html#address-spaces


NULL is required to be stored as all zeros on POSIX systems.

Please don’t get me wrong but these precautions sound like you are sweeping problems under the carpet which will come out one day back again. It sounds like you have ownership issues in the design and trying to hide ‘possible future bugs’.

Do you use sanitizers for use-after free bugs? I see many people still don’t use them even though sanitizers have become very good in the last 5-6 years


It's defensive coding. Do you think defensive driving is 'sweeping problems under the carpet'? (It is, but it's still useful...)

I use every tool at my disposal. Sanitizers, static analyzers... and also not leaving dangling pointers in the first place. Why would I do anything less? It doesn't cost anything except a little effort.

Take a look at this recent HN link: https://www.radsix.com/dashboard1/ . Look at all those use-after-free bugs. Even if it only happens 1% or 0.01% of the time... It's a huge class of bugs in C code. Why not take such a simple step?


If it works for you, then it is okay. It is not ‘a little effort’ for me to worry that someone else might use this pointer mistakenly, so I'd need to think about that all the time. It shifts my focus from problem solving to preventing future undefined behavior bugs. As for the bugs in the link: I don’t know C++, it is a big language which does a lot of things automatically, so it is already scary for me :) Maybe that is it; I mostly write C server-side code (database) with very well defined ownership rules. Things are a bit more straightforward compared to any C++ project, I believe. I just checked again: we don’t have any use-after-free bugs in the bug history, probably because of the 100% branch coverage test suite + fuzzing + sanitizers. So I'd rather add another test to the suite than do defensive programming. It is a personal choice, I guess.

Generally, it is considered preferable to find problems as early as possible. If a program fails to compile or quickly crashes (because of a failed assertion), then I consider that better than having to unit test and fuzz test your code to find that particular problem.

As an added benefit the code also becomes more robust in the production environment, if there are use cases you failed to consider -- 100% branch coverage does not guarantee that there are none!


> Generally, it is considered preferable to find problems as early as possible.

Wholeheartedly agree.

> If a program fails to compile or quickly crashes (because of a failed assertion), then I consider that better than having to unit test and fuzz test your code to find that particular problem.

This confuses me. My typical order would be:

fails to compile > unit test > quick crash at runtime > slow crash at runtime (fuzzing)

I am curious to understand why we differ there.


Every problem can be solved in many different ways. If you think you've already got use-after-free bugs under control, then more power to you! You absolutely have to concentrate your effort on whatever your biggest problems are.

But I'll also say that if you don't have any use-after-free bugs in the history of a large C codebase... you might not even be on the lookout for them? I still have them sometimes, mainly when it comes to multiple ownership. And those are just the ones I found eventually.

So yes, different strokes for different folks, but if you make the effort to incorporate tricks like this into your "unconscious" coding style, the ongoing effort is pretty minimal. Even if you decide this trick isn't worth it, there are countless others that you might find worthwhile. I'm always on the lookout for better ways of doing things.


I meant no use-after-free bugs in production; we do find a lot in development with daily tests etc., but it looks like we catch them pretty effectively. It works well for us, but that doesn’t mean it’ll work for all other projects, so yeah, I can imagine myself applying such tricks to a project some time, especially when you jump to another project which has messy code, you become paranoid and start to ‘fix’ possible crash scenarios proactively :)

Big reason for defensive coding like nulling pointers is to make the code fail hard when someone messes up when they make a change. One can imagine the sort of hell unleashed if later the code is changed to make use of a dangling pointer. That's often the type of bug that slips through testing and ends up causing rare unexplained crashes/corruption in shipped code. Worse it can take multiple iterations of changes to finally expose the bug.

This makes UAF easier to detect but double-free impossible to detect. I would consider that to be worse than not doing anything at all, especially since it isn't amenable to modern tooling that is much better at catching these issues than hand-rolled defensive coding.

Here is an assert that is static if possible:

  #ifndef __clang__
  #define assume(expr) \
      __extension__ ({ \
          static_assert(__builtin_choose_expr( \
              __builtin_constant_p(expr), expr, true), #expr); \
          assert(__builtin_choose_expr( \
              __builtin_constant_p(expr), true, expr)); \
      })
  #else
  #define assume(expr) \
      __extension__ ({ \
          static_assert(__builtin_constant_p(expr) ? (expr) : true, #expr); \
          assert(__builtin_choose_expr( \
              __builtin_constant_p(expr), true, expr)); \
      })
  #endif
Note that the clang semantics of language extensions sometimes differ from GCC.

I also like to use __auto_type very much:

  #define var __auto_type
  #define let __auto_type const
As in:

  #define m_max(a, b) \
      __extension__ ({ \
          let _a = (a); \
          let _b = (b); \
          _a > _b ? _a : _b; \
      })

What is the advantage of your assume macro over regular C11 static_assert / assert? Apologies if it is immediately obvious.

And in case anyone else is wondering, the advantage of that m_max macro is that it evaluates each of its arguments only once.
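
A small sketch of the difference, assuming GCC or Clang (statement expressions and `__auto_type` are GNU extensions). `NAIVE_MAX`, `calls` and `next_value` are hypothetical names added for the comparison, and the `let` alias is written out in full:

```c
/* Textbook macro: the winning argument is evaluated twice. */
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

/* GNU C version: each argument is copied into a local exactly once. */
#define m_max(a, b) \
    __extension__ ({ \
        __auto_type const a_ = (a); \
        __auto_type const b_ = (b); \
        a_ > b_ ? a_ : b_; \
    })

static int calls;   /* how many times next_value() has run */

/* Side-effecting argument: returns 1, 2, 3, ... on successive calls. */
static int next_value(void) { return ++calls; }
```

Starting from `calls == 0`, `m_max(next_value(), 0)` calls the function once and yields 1, while `NAIVE_MAX(next_value(), 0)` calls it twice and yields 2.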


The types of things that _Static_assert takes is substantially more limited than this construct, as it can only take an "integral constant expression" which is in practice basic integer arithmetic and nothing else. This construct works with more complicated things that are nonetheless known at compile time, such as "asdf"[4] (should be 0).

Declaring, instancing and initializing a global state struct in one go:

    static struct {
        int a;
        float b;
        const char* c;
        struct {
            float x, y, z;
        } nested;
    } state = {
        .a = 1,
        .b = 2.0f,
        .c = "Hello World!",
        .nested = {
            .x = 1.0f, .z = 3.0f
        },
    };
Careful if the struct contains big arrays though, this will let the executable size explode because a copy of the struct content is placed in the executable.

Arrays of anonymous structs are also useful for non-global data:

    static struct {
        const char *key;
        const char *value;
    } elems[] = {
        { "k1", "v1" },
        { "k2", "v2" },
        { "k3", "v3" }
    };
    for (size_t i = 0; i < sizeof(elems)/sizeof(*elems); i++) {
        printf("%s => %s\n", elems[i].key, elems[i].value);
    }

X Macros! Mostly because it's one of the more understandable and funky things you can do with the preprocessor. You can do some crazy stuff with them :)

https://blog.gboards.ca/2020/02/adventures-in-obscure-c-feat...

https://en.m.wikipedia.org/wiki/X_Macro


Oh, that has a name. In C, often used with an include file instead of a body macro, and often (?) used where code wants multiple internal representations of some table of data.
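
For anyone who hasn't seen the pattern, a minimal body-macro sketch. The color list and all names here are made up for illustration: one list macro generates both an enum and a matching string table, so the two cannot drift apart.

```c
#include <stddef.h>

/* The single source of truth: one X(...) entry per item. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* First expansion: enum constants COLOR_RED, COLOR_GREEN, COLOR_BLUE. */
#define AS_ENUM(name) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Second expansion: the same names as strings, in the same order. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

/* Looks up the name, with a fallback for out-of-range values. */
static const char *color_name(enum color c) {
    return (unsigned)c < COLOR_COUNT ? color_names[c] : "?";
}
```

The include-file variant mentioned above works the same way, except the list lives in its own file that gets `#include`d after defining `X` each time.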

The best C programming trick is to use no tricks. Making your code as boring and plain as possible greatly reduces bugs and increases productivity.

  > The best C programming trick is to use no tricks.
Yes! C's monotony is its killer feature. Take away power when you are coding and you have more power to share the code and debug it.

I agree wholeheartedly. I may be called a hater and I may be raining on everyone's parade, but I opened this thread expecting to find horrors and I did (plenty of metaprogramming).

The very notion of there being tricks and that knowing them makes you better is something I hate about C and C++. Most tricks I read here are bandaids over usability issues the languages have. Yes, they alleviate an issue but may introduce unexpected consequences and distance your dialect from the rest of the community.

I am so very thankful that C and C++ are no longer the only options for low level, non garbage collected programming.


As an embedded C developer, I'm interested. What do you recommend looking into as other "low level, non garbage collected" options?

I imagine they are referring to Rust.

True, but I also consider interesting combinations of standard features, especially C99+ features useful "tricks", because C99 features which make life so much easier are usually little known in predominantly C++ circles, because C++ only supports an outdated and non-standard subset of C.

E.g. this 'named, optional arguments' Easter egg is a useful trick which also improves readability:

    my_func((my_struct) { .bla = 1, .blub = 2, });

I've tried something similar previously.

    typedef struct pixel {
        int r;
        int g;
        int b;
    } pixel_t;

    #define PIXEL(red, green, blue) \
    (pixel_t)                       \
    {                               \
      .r = (red),                   \
      .g = (green),                 \
      .b = (blue)                   \
    }
Then I could invoke the function

    void display_pixel(pixel_t pixel);
by calling,

    display_pixel(PIXEL(255, 255, 255));
And you could even,

    return PIXEL(255, 255, 255);
Like magic. In other words, how do you pretend that you have syntax-level OOP in C... In retrospect, the macro could reduce readability; writing the argument explicitly may be better.

    display_pixel((pixel_t) {.r = 255, .g = 255, .b = 255});

I would recommend not doing that. Just use the initializer syntax, or write an inline function.

C++ supports C features to the extent that they don't overlap with C++ ones.

The compatibility with the C library is up to C11 for example.


I think what GP is referring to is C++ not supporting C's designated initializers, restrict qualifiers, or flexible array members features. These are roughly the only C features not in C++ that are worth supporting (the STL in C++ works better than VLAs, and templates work better than type-generic macros). All of the other new C features are either backported C++ features with different spellings, or new functionality (mostly library) that C++ adopts.

VLAs are gone, were made optional in C11, thus every compiler that jumped from C89 to C11 never bothered with them.

Google even sponsored a massive effort to clean the Linux kernel of their use.

A subset of designated initializers is supported in C++20.


There's a surprising amount of perfectly valid C code that's not valid C++ (not even taking regrettable design warts like VLAs into account). The "common subset" of C and C++ is both a subset of C, and a subset of C++, i.e. C++ has forked C and turned its C subset into a non-standard dialect.

It's interesting that the other C descendant Objective-C has decided to "respect" its C subset instead of messing with it, with the result that new C standards are automatically supported in ObjC.


A descendant that would be dead by now had Apple not decided to buy NeXT.

VLAs are dead, a broken design fixed in C11 by removing it from the standard, no idea why everyone keeps referring to them.


The one good use of VLAs I know of is casting a buffer into a multidimensional (variable-length) array to allow for subscripting to work.
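
Roughly what that looks like, as a sketch that assumes the compiler supports variably modified types (it refuses to build when `__STDC_NO_VLA__` is defined). `matrix_at` is a hypothetical helper; no array is allocated on the stack, only the pointer type is variable:

```c
#include <stddef.h>

#ifdef __STDC_NO_VLA__
#error "this sketch needs variably modified types"
#endif

/* Reads element [r][c] of a matrix with `cols` columns stored in a flat buffer. */
static int matrix_at(const int *flat, size_t cols, size_t r, size_t c) {
    /* Pointer to rows of `cols` ints; the row length is a run-time value. */
    const int (*m)[cols] = (const int (*)[cols])flat;
    return m[r][c];   /* same as flat[r * cols + c], without the manual math */
}
```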

Whatever good uses VLAs might actually have had, they are gone with C11.

Well, they're a conditional feature now, you are supposed to check __STDC_NO_VLA__.

Which basically means that outside gcc- and clang-based compilers, __STDC_NO_VLA__ will be defined (that is, VLAs are unavailable), for example on MSVC or IAR Embedded Workbench.

> A subset of designated initializers is supported in C++20.

Took them only 21 years... And this standard being so recent, most projects won't have that for years to come.


Doesn't matter, the point was what C subset does C++ support today.

https://godbolt.org/z/GW59rM


C++20 designated initializers must be specified in definition order, whereas they don't in C99.

In your example,

    Point point = {.z = 2.0, .x = 1.0};
produces a compilation error. This is annoying, but workable. And at least it's a compile-time error, rather than a bug that perniciously sneaks into production.

Why do you think I wrote "A subset of designated initializers is supported in C++20."?

By far my favorite trick is the one described here https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Effectively generators a la python or javascript, implemented using a set of macros that expand to a Duff's device control structure.

    #include <stdio.h>

    /* Resume at the saved position; `state` must start out as 0. */
    #define crBegin(state) \
        int *__state_ptr__ = &(state); \
        switch(*__state_ptr__) { case 0:
    /* Remember this line, return x, and continue from here on the next call. */
    #define yield(x) \
        do { \
            *__state_ptr__ = __LINE__; \
            return x; \
            case __LINE__:; \
        } while (0)
    #define crFinish }
    
    struct counter_frame {
        int state;
        int i;
    };
    
    int counter(struct counter_frame *frame) {
        crBegin(frame->state);
        frame->i = 0;
        while (1) {
            yield(frame->i++);
        }
        crFinish;
    }
    
    int main() {
        struct counter_frame my_counter = {0}; /* state 0 = start from the top */
        while (1) {
            printf("%d\n", counter(&my_counter));
        }
        return 0;
    }

Using asserts AND keeping them in production builds.

This really helps whipping people into shape and making them pay attention to what they write. Also helps documenting the code in a very succinct way.


And in terms of performance it costs next to nothing. Just don't put them into your critical loops.

You can also print backtraces when an assertion fails: https://stackoverflow.com/questions/77005/how-to-automatical...


Depends on what’s in your asserts. If, for example, you check validity of your custom data structure in an assert, even a single assert can cost a lot more than nothing.

(Whether you should use assert for that kind of checks that probably only should run in tests is debatable, but it sometimes happens)


What really costs nothing is putting assert(0) in any place that should be unreachable. For example:

  switch (...) {
    case A:
      ...
    case B:
      ...
    ...
    default: // all possible cases are dealt with above
      assert(0);
  }

One ugly trick for unreachable branches is this:

    assert(!"You gave an invalid letter");
The string literal is a pointer, so !pointer is false. And then you get a nice explanation message when the assertion fails. I wish all assert gave optional explanation messages.

I often do assert(condition && "String explanation") myself.
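
For example, in a sketch with a hypothetical `checked_div` function: the string literal is a non-null pointer, so `&&`-ing it onto the condition never changes the result, but the text shows up in the failure message because assert prints the whole expression.

```c
#include <assert.h>

/* Integer division with the precondition spelled out in the assert text. */
static int checked_div(int num, int den) {
    assert(den != 0 && "checked_div: denominator must be non-zero");
    return num / den;
}
```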

For anyone following along. Don't do this if the code base is already hotshit. This will only make things worse and not prevent errors, just cause more outages.

A friend of mine showed me this in some code he used for programming an educational operating system. If you have a pointer to a member of a struct, you can use the macro container_of to retrieve a pointer to the enclosing struct.

  /* Return the offset of 'member' relative to the beginning of a struct type */
  #define offsetof(type, member)  ((size_t) (&((type*)0)->member))
  
  #define container_of(ptr, type, member) \
   ((type *)((char *)(ptr) - offsetof(type, member)))

No need to define offsetof, it’s part of <stddef.h>

Still interesting to see the trick behind it tho.


It was added to the standard because the macro version has undefined behavior (yet it still works on any compiler without a problem).

I was about to ask if it was legal to dereference a null pointer and then take the address of it... I presume it is not, but I'm surprised compilers don't complain at compilation time.

Dereferencing a null pointer and then immediately "undoing" it by taking its address is actually legal, I believe. I think the undefined behavior here is the member access instead of the magic sequence &* which is supposed to cancel out.

It is used in production os like linux kernel, mostly in intrusive data structures.

wow that is cool, do you have any pointers (no pun intended) for usage of this technique in practical terms?

This is used by the Ganesha project (userspace NFS server). Look for the symbol "container_of" and usages of it in https://github.com/nfs-ganesha/nfs-ganesha/ (disclaimer: I'm a minor contributor).

The way it's used is that Ganesha supports defining of alternate filesystem backends and serving them as NFS shares. Handles to objects (e.g. files) would exist as pointers which live inside the struct of the backend's handle struct. i.e.:

  struct my_file_data {
    struct ganesha_file_data {
      // generic data
    };

    // data specific to my module
  };
The "my" module would take pointers to ganesha_file_data when the NFS core code calls it. The "my" module then uses container_of to convert ganesha_file_data ptr to my_file_data ptr.
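
A stripped-down sketch of that pattern. The struct and function names below are hypothetical stand-ins, not Ganesha's real types, and the standard `offsetof` from `<stddef.h>` is used instead of a hand-rolled one:

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* What the generic core code knows about. */
struct generic_handle {
    int refcount;
};

/* What the backend module actually allocates. */
struct my_handle {
    int fd;                        /* data specific to the module */
    struct generic_handle base;    /* embedded generic part */
};

/* The core hands back a pointer to `base`; recover the enclosing struct. */
static struct my_handle *my_from_generic(struct generic_handle *g) {
    return container_of(g, struct my_handle, base);
}
```

The embedded member does not have to come first in the struct, which is what makes this more flexible than simply casting the pointer.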

Stretchy buffers in the stb header-only library:

https://github.com/nothings/stb/blob/master/stretchy_buffer....

This basically allows you to use std::vector<T>-like vectors in C, with the added benefit that you can subscript the vector like arr[3] rather than using unwieldy functions like vector_get(arr, 3) or vector_put(arr, 3, value).
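
The general idea can be sketched like this. This is not stb's actual code or API: every name below (`vec_hdr`, `vec_push`, `vec_len`, `vec_free`, `vec_grow_`) is made up. It assumes element types need no stricter alignment than two `size_t`s and it simply aborts when allocation fails:

```c
#include <stdlib.h>

/* Bookkeeping stored just before the element data. */
struct vec_hdr { size_t len, cap; };

#define vec_hdr_(v)    ((struct vec_hdr *)(void *)(v) - 1)
#define vec_len(v)     ((v) ? vec_hdr_(v)->len : (size_t)0)
#define vec_push(v, x) ((v) = vec_grow_((v), sizeof *(v)), \
                        (v)[vec_hdr_(v)->len++] = (x))
#define vec_free(v)    do { if (v) free(vec_hdr_(v)); (v) = NULL; } while (0)

/* Ensures room for one more element; returns the (possibly moved) data pointer. */
static void *vec_grow_(void *v, size_t elem_size) {
    struct vec_hdr *h = v ? (struct vec_hdr *)v - 1 : NULL;
    if (h && h->len < h->cap)
        return v;
    size_t cap = h ? h->cap * 2 : 8;
    size_t len = h ? h->len : 0;
    h = realloc(h, sizeof *h + cap * elem_size);
    if (!h)
        abort();        /* sketch only: no graceful out-of-memory handling */
    h->len = len;
    h->cap = cap;
    return h + 1;       /* callers see only the element array */
}
```

Because the caller holds a plain `T *` pointing at the elements, `v[3]` works directly. The price is that `vec_push` may move the buffer, so the pointer has to be passed as an lvalue.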


Foreach macros. Nice when you have a list of constants that you need for declaring a lot of tables or enumerations. Here is an example with ISO 639 language codes.

Example with 3 "values". This is the base definition from which all the tables and enums are produced.

  #define FOREACH_LAN(LAN)\
    LAN(GA, IE, C_ANSI    )  /**< Irish Gaelic   :  0 */ \
    LAN(DE, DE, C_ANSI    )  /**< German         :  1 */ \
    LAN(DA, DA, C_ANSI    )  /**< Danish         :  2 */ \
    LAN(EL, EL, C_GREEK   )  /**< Greek          :  3 */ \
    LAN(EN, GB, C_ANSI    )  /**< English        :  4 */ \
    LAN(ES, ES, C_ANSI    )  /**< Spanish        :  5 */ 
   etc.
Let's define an enum indexing with these languages

    #define GENERATE_LANIDX(lan,country,codepage) LANIDX_ ## lan,

    typedef enum {
      LANIDX_UNDEFINED = -1,        // 
      
    FOREACH_LAN(GENERATE_LANIDX)  

      LANIDX_MAX                    // Automatically get the upper bound
    } LANIDX_TYPE;
This is equivalent to

    typedef enum {
      LANIDX_UNDEFINED = -1,        // 
      LANIDX_GA,      
      LANIDX_DE,      
      LANIDX_DA,      
      LANIDX_EL,      
      LANIDX_EN,      
      LANIDX_ES,      
      LANIDX_MAX                    // Automatically get the upper bound
    } LANIDX_TYPE;
but I didn't need to repeat all the language codes, the macro did it for us.

Now in the module I can define tables, also without needing to repeat the codes

   const char *LanIdx2LanTable[] = {
    #define GENERATE_LANIDDX_2_LAN(lan,country,codepage)  [1+ LANIDX_ ## lan]={#lan},
    FOREACH_LAN(GENERATE_LANIDDX_2_LAN)
   };

   const char *LanIdx2CodePageTable[] = {
     #define GENERATE_LANCP(lan,country,codepage)  [1+LANIDX_ ## lan]=codepage,
     FOREACH_LAN(GENERATE_LANCP)
   };
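For anyone who wants to try the pattern in isolation, here is a small self-contained sketch. The colour list and all names in it (FOREACH_COLOR, color_t, color_name) are hypothetical and stand in for the language table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list: each entry is (identifier, display string). */
#define FOREACH_COLOR(X) \
    X(RED,   "red")      \
    X(GREEN, "green")    \
    X(BLUE,  "blue")

/* Expansion 1: the enum. */
#define GENERATE_ENUM(id, text) COLOR_##id,
typedef enum {
    FOREACH_COLOR(GENERATE_ENUM)
    COLOR_MAX  /* number of entries */
} color_t;

/* Expansion 2: a name table indexed by the enum (C99 designated initializers). */
#define GENERATE_NAME(id, text) [COLOR_##id] = text,
static const char *const color_names[] = {
    FOREACH_COLOR(GENERATE_NAME)
};

static const char *color_name(color_t c)
{
    assert((unsigned)c < COLOR_MAX);
    return color_names[c];
}
```

Adding a colour means touching only the FOREACH_COLOR list; the enum and the table stay in sync automatically.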

It's been a while since I've done any pure C, and I'm sure I'll be outshone by others, but I've always liked RAII in C: https://vilimpoc.org/research/raii-in-c/

Function typedefs to untangle function pointers.

This is from APUE, 3rd Edition.

  void (*signal(int signo, void (*func)(int)))(int);
vs

  typedef void Sigfunc(int);
  Sigfunc *signal(int, Sigfunc *);

Comment toggle:

    foo();
    //*
    some();
    code();
    which_you_want_to_comment_out();
    // */
    other_code();
The code between the comments is 'enabled' for now, and can be commented out by simply removing a slash from the '//*'.

It is very common to use #if 0 ... #endif for this.

Even better, #ifdef NOTYET or similar, to help future readers (which may include "you, tomorrow") understand why it's not happening.

First example I’ve seen in this entire post that I really like.

A lot of the rest of these look like they’re just trying to be clever for the sake of it.


There's also:

  /** /
  do_thing();
  /*/
  do_other();
  //*/
which allows quickly toggling between `thing` and `other` while debugging.

Ha, that's my question from almost 12 years ago.

The ‘cleanup’ variable attribute¹:

  __attribute__((cleanup(cleanup_function)))
It allows you to create, in essence, destructors for pure C code.

1. https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Common-Variabl...
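A rough sketch of how this can look in practice, assuming GCC or Clang (the attribute is an extension, not ISO C). The names free_buffer, use_buffer and buffers_freed are hypothetical; the counter exists only so the effect is observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter so the effect of the cleanup handler is observable. */
static int buffers_freed = 0;

/* The handler receives a pointer to the variable that is going out of scope. */
static void free_buffer(char **p)
{
    free(*p);
    buffers_freed++;
}

static int use_buffer(size_t n)
{
    assert(n > 0);
    char *buf __attribute__((cleanup(free_buffer))) = malloc(n);
    if (buf == NULL)
        return -1;      /* handler still runs; free(NULL) is a no-op */
    buf[0] = 'x';
    return buf[0];      /* return value is computed before the handler runs */
}
```

Every exit path from use_buffer releases the buffer without an explicit free() or a goto-based cleanup label.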


That is GCC C, not ISO C. Not pure at all.

I meant “pure C” as in “without using C++ or Objective C”.

Quick-n-dirty log level:

    debug && printf("counter: %d\n", i);

Rarely used feature that I quite like is defining function types.

   typedef bool foo_fn(int, char);
This declares a type of a function.

   foo_fn  foo1, foo2, foo3, foo4;
This declares 4 functions of the same type. It's equivalent to

   bool foo1(int, char);
   bool foo2(int, char);
   bool foo3(int, char);
   bool foo4(int, char);
Unfortunately the type cannot be used in a function definition, but that is not where function types are interesting. They are neat for function pointers, especially those that require casting.

   void function_taking_foo(int, foo_fn *);

   function_taking_foo(1, foo1);   // no cast necessary as type is identical and even
instead of

   function_taking_foo(1, (bool(*)(int,char))foo1);
When you have a lot of function pointers it is incredibly more readable than the usual syntax.
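A compact sketch that puts the pieces together. The names predicate_fn, is_even, is_positive and count_if are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A function type (not a pointer type). */
typedef bool predicate_fn(int);

/* Declares two functions of that type in one line. */
predicate_fn is_even, is_positive;

/* The definitions still have to spell out the full signature. */
bool is_even(int v)     { return v % 2 == 0; }
bool is_positive(int v) { return v > 0; }

/* A pointer to the function type reads like an ordinary pointer declaration. */
static size_t count_if(const int *a, size_t n, predicate_fn *pred)
{
    assert(a != NULL && pred != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (pred(a[i]))
            hits++;
    return hits;
}
```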

I rarely see the comma operator used, but it can be quite handy, eg:

  if (turn_on() || reset(), turn_on()) {  // try to turn it on. if fail, reset and try again
    // it's on
  }

> if fail, reset and try again

It's always going to try again, though, even if successful the first time. Perhaps you forgot parentheses?

  if (turn_on() || (reset(), turn_on()))
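A small sketch of the parenthesized version with a made-up device that only turns on after a reset (turn_on, reset and the two state variables are hypothetical stand-ins), so the number of calls can be checked:

```c
#include <stdbool.h>

/* Hypothetical device state: it only turns on after a reset. */
static bool was_reset = false;
static int  turn_on_calls = 0;

static bool turn_on(void) { turn_on_calls++; return was_reset; }
static void reset(void)   { was_reset = true; }

static bool turn_on_with_retry(void)
{
    /* (reset(), turn_on()) evaluates reset(), discards its result, then
       yields turn_on(). The parentheses keep the comma operator inside the
       right-hand side of ||, so the retry only happens after a failure. */
    return turn_on() || (reset(), turn_on());
}
```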

The "arrow decrement" is fun:

    int x = 10;
    while (x --> 0);


Apropos of this, the idiomatic (!= obvious, sadly) way to iterate backwards over an array:

  for(size_t i=N; i-- > 0 ;) do_stuff_with(&a[i]);
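Here is the same loop shape in a self-contained helper; reverse_copy is a hypothetical name chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst in reverse order, visiting indices n-1 down to 0. */
static void reverse_copy(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t out = 0;
    /* i-- > 0 tests i before decrementing, so the body sees n-1 ... 0 and
       never sees a wrapped-around index, even though size_t is unsigned. */
    for (size_t i = n; i-- > 0; )
        dst[out++] = src[i];
}
```

The more obvious `for (size_t i = n - 1; i >= 0; i--)` never terminates, because an unsigned i is always >= 0.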

In general, pointer math. When I learned that myArray[10] would produce the exact same result as 10[myArray], it forced me to dig deeper into the whole C pointer model and respect the architecture even more.

This is really a handout to bad parsers, not really anything that should really exist.

When I understood that, it made the hate side of my love/hate relationship with C stronger.


    #ifdef _DEBUG
    #define __REVIEW_ME      / ## / 
    #else
    #define __REVIEW_ME      $review_me$
    #endif
When reworking a large amount of code, this can be used to tag places that may need another pass or a review. The debug version will build fine, but the release won't until all review tags are removed.

This also allows adding free form comments if needed:

    foo += 13;    __REVIEW_ME -- why 13 ?!

Interesting -- which compilers accept this? gcc and clang reject it in C and C++ modes with a message like "error: pasting formed '//', an invalid preprocessing token".

Ha, indeed. I had no need to use it with either, so I didn't bother to check. MSVC was the compiler.

Zero-length arrays, used to implement a variable-sized structure with header: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html. Although apparently it’s a GCC extension, which I didn’t realize until now.

C99 made it official with "type foo[];" syntax. Better than using "type foo[1];" and potentially adding unnecessary padding.

My favorite C trick is not related to programming - but to debugging and disassembly. Unlike C++, there is no name mangling, so stack traces are a breeze to read, especially with -fno-inline-functions and -O0 or -O1. There is no implicit action at all (i.e. no exception handling or destructors) so there is a simple mapping between the assembly and source code.

My favorite "programming" trick only applies if I'm not sharing my code. I just forego header files entirely, and just #include the .c source files. Also, put `#pragma once` in all the .c files to avoid double-inclusion without the hassle of #if ... etc. This requires a bit more diligence since you can't have mutual recursion between source files.


  switch (fruit) {
    case APPLE:
      if (true) {
        pick_apple();
      } else
    case BANANA:
      {
        pick_banana();
      }
      box_fruit();
  }

Use arrays of 1 to avoid using '.' and '&'. So if you have a structure, say:

    struct foo { int a, b; };
A function that works on the structure looks like this, you use '->':

    void pr(struct foo *n) { printf("%d %d\n", n->a, n->b); }
But when you make an instance of this structure, you have to use this different syntax:

    struct foo p;
    p.a = 1;  // Yuck!
    pr(&p); // Bad!
But you can do this instead:

    struct foo p[1];
    p->a = 1; // Very nice..
    pr(p); // Yes!

Curious, why don’t you like ‘.’ and ‘&’?

Why does C even have two member selection operators?

Anyway it means you can't cut and paste code from one place to another without changing '.' to '->' or vice-versa.


A '->' is always a runtime indirection involving an extra memory access, while a '.' is always resolved into a single offset at compile time.

E.g.:

    int x = a->b->c->d;

means there are 3 memory accesses, while

    int x = a.b.c.d;

means there's one memory access for the whole expression.

Also consider this:

    int x = a->b.c->d;

I can immediately see where pointer indirections are happening.

...unless you're in C++ of course which messed up this simple rule when references were added to the language.


>A '->' is always a runtime indirection

It's not true if the compiler can figure out that the left side is a constant, consider:

    struct foo { int a; };
    struct foo z = { 7 };
    struct foo *const p = &z;
Then in p->a, no indirection is necessary. GCC -O2 makes this:

    int fred() { return p->a; }

    fred:
        movl    z(%rip), %eax
        ret

    p:
        .quad   z

    z:
        .long   7

That's (usually) only true for the very first '->' in a chain and as you said, depends on the compiler figuring out if the pointer indirection can be resolved at compile time.

A chain of '.' on the other hand is always guaranteed to be resolved into a single offset at compile time.


> Why does C even have a two member selection operators?

Because using `(*ptr).member` everywhere is annoying. There's plenty of times you want or need to have direct access to a member rather than always dereferencing a pointer.


It's too bad C's pointer-deref operator is prefix instead of postfix. In Pascal it's ^ so you write ptr^.member and there's no special -> operator. Even better, declarations and expressions would read intuitively left-to-right instead of spiraling out through stars on the left and brackets on the right.

So that's where the lens notation comes from...

https://github.com/ekmett/lens/wiki/Examples

> ("hello","world")^._2


In C you can use [0] for postfix pointer dereferencing.

Alas, that's clumsy, and for declarations not possible. I have used it in expressions at times.

Here's a variation that seems plausible: make postfix p^ be like C's p[0], and infix p^i like C's p[i]. (With a tighter binding for ^ than C has.)


ptr.member should work here. I mean, the compiler knows the left side is a pointer, so it should automatically dereference it.

C was practically a portable assembler when it was designed, and it was likely helpful for performance reasoning that all indirections were clearly visible.

Writing boring clean code, without clever tricks that force everyone to keep their copy of ISO C and compiler extensions manuals open.

not a 'favorite-c-trick' but, i really find this quite cool: https://ccodearchive.net/list.html

Unfortunately, one can't see the code, without downloading a tarball.

Unions for sure. You can create tagged unions with different data types and shapes (even function code to execute dynamically).

Unions are always better than casts. It would be better if casts in C looked like union deselection, because parentheses are ugly. There really should be an infix casting operator.

An example of where this matters is an AST forest, like this:

    struct node { int tag; }; // Generic node
    struct infixnode { int tag; struct node *l, *r; };
    struct intnode { int tag; int val; };
But it's really ugly to use. If you have an expression represented in this AST like 'a(b(c+d))' and you want to access 'd', you need to do this:

    struct node *n;
    int d = ((struct intnode *)((struct infixnode *)((struct infixnode *)((struct infixnode *)n)->r)->r)->r)->val;
But if you use unions:

    union node { struct infixnode infix; struct floatnode floatval; struct intnode intval; };
Then you can say this:

    union node *n;
    int val = n->infix.r->infix.r->infix.r->intval.val;
The same holds for C++. In C++ you could make a class hierarchy for your AST. But you still have the casting to convert to the derived types, which is just as ugly. But you can instead make inline access functions in the base class whose sole purpose is to do this casting, and you end up with something like this:

    node *n;
    int val = n->infix()->r->infix()->r->infix()->r->intval()->val;
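For the C union variant, a sketch that actually compiles might look like the following. It deviates from the snippet above in a few assumed details: the tag is an enum, the generic struct node is dropped, the child pointers point at the union itself, and rightmost_int is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

enum node_tag { NODE_INT, NODE_INFIX };

union node;  /* forward declaration so the variants can point at it */

struct infixnode { enum node_tag tag; union node *l, *r; };
struct intnode   { enum node_tag tag; int val; };

/* Every variant starts with the tag, so it can be read through any member. */
union node {
    enum node_tag    tag;
    struct infixnode infix;
    struct intnode   intval;
};

/* Follows right-hand children until it reaches an integer leaf. */
static int rightmost_int(const union node *n)
{
    assert(n != NULL);
    while (n->tag == NODE_INFIX)
        n = n->infix.r;
    return n->intval.val;
}
```

With this layout an access like n->infix.r->infix.r->intval.val needs no casts, at the cost of every node being as large as the biggest variant.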

Pack your structs, use tcpip host to network, and network to host to marshal data in and out, save state as ascii, never trust a float and don’t get creative and it’ll “just work“.

I've made a blog post about this some time ago[1]!

Definitely not as exhaustive as this thread, though.

[1] https://ne02ptzero.me/blog/c-tricks-that-i-use-everyday.html


1. Finding the size of a struct member (lifted from the Linux kernel)

#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))

2. Not strictly C: using `pahole` to analyze how structs are laid out in memory

https://linux.die.net/man/1/pahole
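A quick illustration of the first item, using a made-up struct packet. Since sizeof does not evaluate its operand, the null pointer is never actually dereferenced, and the result is a compile-time constant.

```c
#include <stddef.h>

#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Hypothetical struct used only to demonstrate the macro. */
struct packet {
    char  tag;
    short port;
    char  name[16];
};

/* Usable wherever a constant expression is required, e.g. an enum value. */
enum { PACKET_NAME_LEN = FIELD_SIZEOF(struct packet, name) };
```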


Lots of people reply with obscure, land-mine “#define” hacks. Am I the only one that avoids meta programming with “#define” like the plague?

One might argue it’s not even C..


Share your own "trick" then :) one that doesn't use #define

I like (v&(v-1))==0 (is v an exact power of 2?).
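One caveat worth spelling out: the bare expression is also true for v == 0, which is not a power of two. A sketch with the extra check (is_power_of_two is just an illustrative name):

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t v)
{
    /* v & (v - 1) clears the lowest set bit, so the result is zero only
       when at most one bit was set; the v != 0 test rejects zero. */
    return v != 0 && (v & (v - 1)) == 0;
}
```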

Use VLA pointer arithmetic for invisible multiplication.

Also, anything involving arrays of pointers to functions is cool by definition.


I've been missing duff's device in Rust. Maybe I'm crazy.

With modern compilers (and hardware, for that matter), Duff's device is probably a deoptimization rather than an optimization. Duff's device creates an unstructured control-flow graph (a loop with multiple entry points), which is going to cause several optimizations to bail out, and will absolutely prevent any loop optimizations (such as vectorization) from kicking in for that loop. In hardware terms, for tight loops, the entire loop in regular terms is probably going to be in a hardware loop µop cache, and the loop branch predictor will probably predict the loop exit condition with 100% accuracy.

Inline asm?

I remember a good 20 years ago a friend teaching me: !!

(This "narrows" an integral value into a boolean 0/1, for those curious.)
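A tiny example of where that normalization is handy; count_set_flags is a hypothetical helper.

```c
/* Counts how many of the three arguments are non-zero. Each !!x is exactly
   0 or 1 regardless of the original value, so the results can be summed. */
static int count_set_flags(int a, int b, int c)
{
    return !!a + !!b + !!c;
}
```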

struct foo { int a; char star b; }

struct foo star p = malloc(sizeof (struct foo) + 100); p->b[50] = 'x';


This will crash, because 'b' will be uninitialized and pointing at some random memory spot.

A conventional and working form of the same is

     struct foo { int a; char b[1]; };
     struct foo * p = malloc(sizeof (struct foo) + 99); 
     p->b[50] = 'x';
or, if the compiler allows it,

     struct foo { int a; char b[0]; };
     struct foo * p = malloc(sizeof (struct foo) + 100); 
     p->b[50] = 'x';

This is undefined behaviour except when using msvc. For GCC and clang, you should use [] (empty square brackets) to denote a variable length array at the end of a struct. For msvc, [1] is used all over win32 and is explicitly supported for that purpose.

Edit: it's UB because you're accessing past the end of an array with size given at compile time.


[] syntax in the last struct member is a "flexible array member" not a VLA. The former hasn't been deprecated in C11.
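For reference, a sketch of the C99 flexible array member form. The struct message type and the message_new helper are hypothetical names for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type: fixed header followed by len payload bytes. */
struct message {
    size_t len;
    char   data[];   /* flexible array member: must be the last member */
};

static struct message *message_new(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text) + 1;               /* include the terminator */
    struct message *m = malloc(sizeof *m + len); /* header plus payload */
    if (m == NULL)
        return NULL;
    m->len = len;
    memcpy(m->data, text, len);
    return m;
}
```

A single free(m) releases header and payload together, since they live in one allocation.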

Right, yeah, I meant the middle one with char b[1]. I was too busy fighting with the asterisks being deleted to notice the mistype. This method is used a lot in OS programming.

You can quote your code (by preceding it with four spaces) so that your stars become visible:

    struct foo { int a; char *b; }
    struct foo *p = malloc(sizeof (struct foo) + 100);
    p->b[50] = 'x'; 
    ...
    free(p);

Isn't it two spaces at least?

  Test
Upd: yes it is.

I think you have an initialized read in your code…

* uninitialized :/

Duff's device

I was excited then annoyed. I hate the SO moderators. It makes no difference to them how many people seek the answer, if they deem it off topic all those people are out of luck.

As the original asker of the question, I agree. This question was from 12 years ago, back when Stackoverflow was a lot more fun. I asked the question because while I knew a few neat little things, I knew other people also knew cool things that I didn't know, and I wanted to find out what they might be.

I don't get that exclusionist attitude either. If they think it's subjective, why not just tag it accordingly, or move it into some sub-forum instead of outright killing the discussion?

Because they'd have to exclude it from search because it'd affect the results

People might like it or not, but SO is a tool that provides fast access to answers to technical questions because people want to solve their problems (and learn). It's Q&A.

If you want to discuss stuff then there are forums, reddits, discords or even HN.


How many times have I googled a question only to arrive at a closed SO question- too many times. It's truly annoying. Plainly people do want to discuss things on SO.

it's not meant to be a discussion forum, it's more a database of questions and answers ideally written to Wikipedia type quality. It gets a lot of hate for its quite strong moderation, however, it's ended up one of the best resources to find answers to programming questions, so they are doing something right.

Actually you can regard "closed" questions as a way to find the best questions.

C has tricks, serious languages have coding patterns

How is C not a serious language?

Memory corruptions?


