
The stack

Pub. 2020 Sep 3

I was thinking a little the other day about the stack. You know, with the stack pointer and whatnot. The stack is one of those abstractions that seems to contradict some of my own beliefs about good software practices.

But despite that, the stack seems to work very well. Why, and what can I learn from it?

Fixed-size buffer

Buffer sizes are a big deal to people; the design of most modern languages should make that clear enough. And hell, they're a big deal to me too, probably more than they should be. But the stack being a fixed size doesn't bother me, or really anyone else from what I can tell.

The first reason this doesn't bother me is that, in the absence of recursive functions, I know that if my testing runs every code path once, there won't be a problem. Stack frames are mostly set up in function prologues, and I know that for debug builds on Windows I can rely on __chkstk() to instantly crash.
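To illustrate (my own sketch, not from any real codebase): as I understand it, MSVC emits a __chkstk probe in the prologue of any function whose locals exceed a page, so a single run of the path below is enough to find out whether the frame fits.

    #include <string.h>

    /* Hypothetical example: a local well past the 4 KB page size makes MSVC
       emit a __chkstk stack probe in the prologue. If the stack can't hold
       the frame, the very first call faults immediately instead of silently
       scribbling past the guard page. */
    static void big_frame(void)
    {
        char scratch[64 * 1024];
        memset(scratch, 0, sizeof scratch); /* touch it so it isn't optimized out */
    }

    int main(void)
    {
        big_frame(); /* one run of this path exercises the probe */
        return 0;
    }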

The second reason I feel okay with fixed-size stacks is that they grow so slowly. Personal experience tells me that the average stack depth of a (non-recursive) program should be around O(log(size_of_program)). If that's right, doubling the size of the program only adds about one frame to the average. This says nothing about the maximum stack size, but consider how being 8 or 9 stack frames deeper than the average isn't that bad when growing the average depth by 1 takes O(size_of_program) more code.

The first observation is something I already try to emulate. I've come to the conclusion that provably overflow-free code is a nice goal, but IMO the cons outweigh the pros, so my personal style is to defensively assert(alloc.write_tail - alloc.write_head >= alloc_size) in the handful of allocation functions I have.
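To make that concrete, here's a minimal sketch of the kind of guarded allocation function I mean. The Alloc struct and alloc_bytes are hypothetical names; only the assert mirrors the snippet above.

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical bump allocator: write_head grows toward write_tail. */
    typedef struct {
        uint8_t *write_head; /* next free byte */
        uint8_t *write_tail; /* one past the end of the buffer */
    } Alloc;

    static void *alloc_bytes(Alloc *alloc, size_t alloc_size)
    {
        /* The defensive check: crash loudly in debug builds instead of
           trying to prove overflow-freedom statically. */
        assert(alloc->write_tail - alloc->write_head >= (ptrdiff_t)alloc_size);
        void *p = alloc->write_head;
        alloc->write_head += alloc_size;
        return p;
    }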

I think the second would be hard to mimic. The stack is able to have that nice asymptotic property because every time a frame exits, the memory it stores is discarded. In most situations where buffers are used to store information, they tend to be either very temporary (basically stack data, but usually not allocated on the stack due to the whole fixed-size thing) or very long-lived. For short-lived data, I prefer slab allocators anyways, so that's not really an issue. For long-lived data that gets added to over time... I have no idea. The log term is very suggestive of storing your long-lived data in a compressed form, which for some situations probably wouldn't be that bad, but I'd have to try it out to say more.
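For the short-lived case, one way to get the discard-everything-on-exit behavior (not necessarily how a slab allocator does it) is a mark-and-reset scheme. A sketch, reusing the hypothetical Alloc from above:

    #include <stdint.h>

    typedef struct { uint8_t *write_head, *write_tail; } Alloc;

    static void with_scratch(Alloc *alloc)
    {
        uint8_t *mark = alloc->write_head; /* "push a frame" */
        /* ... carve temporaries out of the arena and use them here ... */
        alloc->write_head = mark;          /* "pop the frame": all of it discarded */
    }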

Trees

How many billions of man-hours have been spent globally by people re-organizing their music collection from artist/album/song to genre/year/album/song to subgenres/artist/year/album/song, etc.? If that's not you, then maybe it's your picture library. Oh I know, maybe it's your codebases! Does common/, util/, third-party/, third-party-common/ sound familiar? Aspies are drawn to hierarchical categorization like flies to rotting food.

On top of the obvious problems with trying to fit tag-like information into a tree, pointer chasing is a huge issue for performance. This is an inherent property of trees; even technically-pointer-free trees like heaps can have long dependency chains at the microarchitectural level. For some problem domains, ones where random-access lookups have to be very fast but inserts and deletes happen very frequently, trees are a necessary evil. However, I think it's good practice to show restraint.
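As an illustration (my own example, not from the post): even walking an implicit, array-backed min-heap, with no pointers anywhere, is a serial chain, because each iteration's index depends on loads and compares from the previous one.

    #include <stddef.h>

    /* Walk an implicit min-heap from the root down to a leaf, always
       following the smaller child. The next index can't be computed until
       the children are loaded and compared, so the iterations form a
       load-compare-branch dependency chain the CPU can't run in parallel. */
    static size_t walk_to_leaf(const int *heap, size_t count)
    {
        size_t i = 0;
        while (2 * i + 1 < count) {
            size_t left  = 2 * i + 1;
            size_t right = 2 * i + 2;
            i = (right < count && heap[right] < heap[left]) ? right : left;
        }
        return i;
    }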

The second problem is easy to resolve for the stack: the overheads of the jump (the stack prep on both sides of the call, the cost of pulling the new code page from memory) are usually very small. If you don't buy into the whole no-functions-over-10-lines garbage and your code isn't 50% getters and setters, your methods are probably spending far, far more time doing actual work. Plus, now that everyone's on x64 these days, register pressure isn't a problem.

However, there is always a benefit to micro-optimizing jumps. Why else would so much effort be spent making things like switch (var) { case 1: f1(); break; case 2: f2(); break; } compile down to a jump table? Once you take the extreme position and start trying to inline all (non-recursive) calls, you can start getting into questions about the number of code pages pulled in, but the jumps that fix that problem don't need the stack. This is an optimization that should be enabled by declaring all the functions in your C program static, but I'm ashamed to say I don't actually know the conditions that force any major C compiler to generate code like that. I think it's interesting that the default for C declarations is not static, so every function takes the overhead of stack pushes by default. I was still learning to walk in the late 90s, so I don't remember the circumstances that birthed things like stdcall and fastcall and friends, but I'm guessing this had something to do with it.
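A sketch of the shape that makes this work (f3 and dispatch are hypothetical names): dense case values plus internal linkage, so the compiler can see every caller and is free to build a table or inline instead of emitting calls.

    /* Hypothetical handlers with internal (static) linkage. */
    static void f1(void) { /* ... */ }
    static void f2(void) { /* ... */ }
    static void f3(void) { /* ... */ }

    /* Dense, same-shaped cases are what compilers turn into a jump table:
       one bounds check, one indexed load, one indirect jump, no stack frame
       needed for the dispatch itself. */
    static void dispatch(int var)
    {
        switch (var) {
            case 1: f1(); break;
            case 2: f2(); break;
            case 3: f3(); break;
        }
    }

    int main(void)
    {
        for (int i = 0; i <= 3; i++)
            dispatch(i);
        return 0;
    }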

The first issue I raised is beyond me. Why is it that classification seems to work so well for code but so poorly for data? Why is mathematics so good at classifying objects while also defying attempts to classify itself? Calling functions inside of other functions is a form of divide-and-conquer, because different pieces of the work are done separately and then combined at the end. Pushing fresh stack frames is a form of abstraction, because it removes outside context from a problem temporarily so that there are fewer variables to juggle. I may be wrong, but I'm of the opinion that divide-and-conquer, abstraction, and pattern recognition are the only strategies humans really know for solving general problems. Why that seems to be the case, and why I can't think of anything different, is a mystery to me. Also, if code and data are the same, why do they seem to act so differently?

When in Rome

Personal experience tells me that it's pretty often worth it to code to the platform's spec, even if it means bending over backwards a little. Users want customization and uniformity where it makes sense. It's hard to be uniform with the rest of the platform unless you're a domain expert or the platform's spec outlines all of the little behaviors that it supports. Good luck with that. When trying to customize something, having to learn enough about each system to customize it is a pain that people would rather not have to deal with. That isn't to say that breaking spec or ignoring it is bad or anything. If you need to do something specific and you're prepared to accept the consequence that people with crazy use cases will be annoyed, go for it.

Take this blog for example. I've complained in a previous post about how it's nearly impossible to code to the HTML/CSS spec and come out with something good, but I tried to anyways. I used the appropriate tags in the appropriate places, even though I don't normally have much of a stomach for that sort of thing. However, despite the fact that it's entirely text, I didn't take any time making this blog screen-reader accessible. And I won't either, until I either get a reader who uses one or I go out and buy several different screen readers to test my blog on. This is because screen readers don't really have a spec to conform to, and anything I do to "improve" the site in that department would be based on wild conjecture and hearsay. If there's one thing I've learned about software, it's that it's always a million times more fucked than even your craziest assumptions. I also broke the HTML/CSS spec and added some javascript in order to render equations, something it's completely impossible to do a reasonable job of within those specs. This means having to accept the fact that RSS feeds can't render anything with equations in them.

So what about the stack? At the end of the day, the platform I program on is x64, and it's designed to run stack-based software. The operating system I run on is Windows, and it's a stack-based OS. This leads to some... interesting decisions, like MessageBoxA() being a function, etc. When libraries get called, they get brand new stack frames, which is hopefully a similar enough environment to the one the library was created and tested in, so that's good.
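And that really is all it takes; a sketch of what "the OS surface is just functions" looks like in practice (link against user32.lib):

    #include <windows.h>

    /* A plain stack-based call into the OS: arguments are passed per the
       x64 calling convention, user32.dll gets a fresh frame, and a result
       comes back like any other function call. */
    int main(void)
    {
        MessageBoxA(NULL, "Hello from a stack frame", "The stack", MB_OK);
        return 0;
    }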

The alternative, I guess, would be interrupt-style communication with the OS. I've never really dealt with interrupt-heavy code, but from what I've heard from other devs at talks, it seems like something that should be kept to a minimum.
