kaashif's blog

Programming, with some mathematics on the side

Is implementing alloca(3) in C really impossible?

2023-07-13

alloca is a function provided by several C libraries (e.g. glibc) that lets you allocate a block of memory that will be freed when the calling function exits. It's usually done by allocating memory on the stack.

But here are a couple of questions:

  • No C standard or POSIX standard mentions alloca, so what "should" it really do?

  • Given that no C standard mentions the stack, is it even possible to implement alloca in C, or do you need assembly to move the stack pointer?

  • Given that compiling code with -fomit-frame-pointer usually results in addresses being expressed as relative to the stack pointer rather than the frame pointer, is it safe to move the stack pointer ourselves?

TL;DR: The answer is that you need special support from the compiler to implement alloca and you can't do it yourself, in C or assembly.

What should alloca do?

There's no standard to refer to, so let's look at the man pages. From Linux: https://man7.org/linux/man-pages/man3/alloca.3.html

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

From OpenBSD: https://man.openbsd.org/alloca

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed on return.

gnulib says something interesting: https://www.gnu.org/software/gnulib/manual/html_node/alloca.html

The alloca module provides for a function alloca which allocates memory on the stack, where the system allows it. A memory block allocated with alloca exists only until the function that calls alloca returns or exits abruptly.

There are a few systems where this is not possible: HP-UX systems, and some other platforms when the C++ compiler is used. On these platforms the alloca module provides a malloc based emulation. This emulation will not free a memory block immediately when the calling function returns, but rather will wait until the next alloca call from a function with the same or a shorter stack length. Thus, in some cases, a few memory blocks will be kept although they are not needed any more.

That's weird, OpenBSD and Linux both support HP PA-RISC but never said anything about a stack based alloca being impossible. That must be a quirk of HP-UX i.e. the OS rather than the hardware. I don't really have an explanation for that, since HP-UX actually does have alloca, and it does mention the stack: https://www.unix.com/man-page/hpux/3C/alloca/

Allocates space from the stack of the caller

Note that the stack on PA-RISC grows the opposite way to x86.

So alloca is consistent across a lot of Unices - allocates memory on the stack and frees it when the calling function exits.

Why should it be difficult to implement in C? Let's try it.

Can alloca be implemented in C?

No C standard mentions the stack, so if we take the stack-based definition of alloca seriously, it's completely impossible to implement in standards-compliant, platform-independent, compiler-independent C.

Let's forget about the stack.

Our goal is to be able to do this:

void f(size_t size) {
    char* arr = alloca(size);
    // ...
}

and for arr to be automatically freed at the end of f, that's it.

If we restrict ourselves to GCC and clang, and allow __attribute__((cleanup)) https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html, we can kind of do it:

void free_memory(void* ptr) {
    free(*(void**)ptr);
}

void f(size_t size) {
    __attribute__((cleanup(free_memory))) char* arr = malloc(size);
    // ...
}

This is okay, but isn't exactly what we're looking for. We can define some macros to make it nicer, but (1) this is on the heap, (2) we'd still need to add something to the start of our declarations to get this to work.

I don't think there's any way to get this to work.

Variable length arrays (VLAs) let us allocate arrays of variable length on the stack, but that is done through the compiler doing magic. Let's at least try to avoid that for the moment.

Can alloca be implemented in x86 assembly?

We can't implement alloca as a normal function, since functions get their own stacks. Using the System V x86_64 calling convention, rsp is callee preserved - we're not allowed to change the size of the calling function's stack.

Let's do it using inline assembly:

#define grow_stack(s) asm("sub rsp, " #s)

This doesn't work because although it does grow the stack, and we can store things there, we aren't restoring the original stack pointer before we return.

This means code as simple as this will segfault:

#define grow_stack(s) asm("sub rsp, " #s)

int main() {
    grow_stack(1);
    return 0;
}

The compiler knows nothing about our stack pointer manipulation:

int main() {
    size_t s = 1;
    grow_stack(1);
    printf("%zu", s);
    return 0;
}

Although we grew the stack, when we get to the printf, the compiler still believes s is at the top of the stack, printing s will print garbage if the compiler is using offsets from rsp. It might work otherwise. This depends on the optimization level, which is a hint that we shouldn't be doing this.

How is alloca actually implemented?

We need help from the compiler. The compiler needs to know when we call alloca so that it can adjust all references to the stack pointer after the "call" to alloca.

The glibc "implementation" of alloca is:

#ifdef    __GNUC__
# define alloca(size)  __builtin_alloca (size)
#endif /* GCC.  */

But in gnulib (the GNU portability library, intended to work with lots of compilers), there's an actual implementation of alloca in C. It works very differently and has different semantics to the normal alloca: https://github.com/coreutils/gnulib/blob/master/lib/alloca.c

/* (Mostly) portable implementation -- D A Gwyn

   This implementation of the PWB library alloca function,
   which is used to allocate space off the run-time stack so
   that it is automatically reclaimed upon procedure exit,
   was inspired by discussions with J. Q. Johnson of Cornell.
   J.Otto Tennant <[email protected]> contributed the Cray support.

   There are some preprocessor constants that can
   be defined when compiling for your specific system, for
   improved efficiency; however, the defaults should be okay.

   The general concept of this implementation is to keep
   track of all alloca-allocated blocks, and reclaim any
   that are found to be deeper in the stack than the current
   invocation.  This heuristic does not reclaim storage as
   soon as it becomes invalid, but it will do so eventually.

   As a special case, alloca(0) reclaims storage without
   allocating any.  It is a good idea to use alloca(0) in
   your main control loop, etc. to force garbage collection.  */

This doesn't reclaim on caller exit, it reclaims garbage on the next call to alloca. If you read through the implementation, you'll also notice it uses malloc and free, not the stack.

If GNU couldn't make alloca without cheating work, then we can't either.

Final word

There's a page here about the advantages of alloca: https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html

But I don't buy it - the unavoidable downside of alloca is that if you try to allocate too much memory, it'll fail. And it's very hard to tell if it failed, you don't just get back NULL like with malloc, your program will likely crash in some way.

This why every alloca man page warns against its use. From the OpenBSD man page:

The alloca() function is unsafe because it cannot ensure that the pointer returned points to a valid and usable block of memory. The allocation made may exceed the bounds of the stack, or even go further into other objects in memory, and alloca() cannot determine such an error. Avoid alloca() with large unbounded allocations.

Don't use alloca.