
Making up facts about history

Pub. 2023 Apr 9

The original make was written back in 1976, nearly 50 years ago. It let people declare a pipeline for building their programs: in the common case, users spell out which input source files produce which output files, plus the commands that perform that transformation. Make then compares timestamps, and if an input file's last-modified date is newer than that of the output file already on disk, it knows the output is out of date and needs to be rebuilt.
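For a two-file C program, a Makefile in this classic style might look something like the following (the filenames are made up to match the examples later in this post):

# each rule: an output file, the inputs it depends on, and the command that rebuilds it
# (recipe lines are indented with a tab)
executable: file1.o file2.o
	gcc file1.o file2.o -o executable

file1.o: file1.c
	gcc -c file1.c -o file1.o

file2.o: file2.c
	gcc -c file2.c -o file2.o

Touch file2.c and only file2.o and executable get rebuilt; touch nothing and make happily does nothing at all.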

Make came so close to being great, to simultaneously lifting up the good parts of a pipeline while also burying the bad parts. Consider an alternative version of make where your Makefile looks like

gcc -c file2.c -o file2.o
gcc file1.o file2.o -o executable
gcc -c file1.c -o file1.o

, where the commands are just listed in whatever order you want, and make would know which ones to run and in which order, automatically, without any markup declaring pre-reqs or post-reqs.

Why didn't it? Well, mostly historical circumstances.

cc (and all of its derivatives, as well as most programs written for the terminal) takes its input as files named in the argument list. Specifically, filenames. There would have been no good way for make to tell which arguments were filenames and which weren't, and, of the ones that were, which were inputs and which were outputs.
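For example, take a made-up but representative invocation:

gcc -c file1.c -I./include -DVERBOSE -o file1.o

To make, this is just a list of strings: file1.c is an input file, ./include is a directory to search for headers, VERBOSE is a macro rather than a file, and file1.o is the output only because -o happens to mean that to gcc. None of that structure is visible from the argument list itself.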

You could imagine an alternate version of unix, where instead of passing in strings, you passed in structured data, like

gcc "-c" readonly(file1.c) writeonly(file1.o)

If the syntax were nice enough for Feldman to have just written a parser, and cc could tell the difference between input and output files from the permissions, then maybe the goal of inferring the build pipeline automatically would have felt more doable.
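Spelled out for the full program in this invented notation, the out-of-order Makefile from earlier would carry enough information for make to sort it on its own:

gcc "-c" readonly(file2.c) writeonly(file2.o)
gcc readonly(file1.o) readonly(file2.o) writeonly(executable)
gcc "-c" readonly(file1.c) writeonly(file1.o)

The link line reads file1.o and file2.o, each compile line writes one of them, and running every command after the ones that write its inputs is just a topological sort.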

Or, you could imagine an alternate universe where, instead of taking in filenames, cc took its source on stdin:

gcc -c < file1.c > file1.o

, or maybe with an imaginary, less-cursed syntax:

file1.c |> gcc -c |> file1.o

This may have been enough structure to be able to infer the pipeline.
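Written out for the whole program (hand-waving, for the moment, over how the link step takes two inputs), it might look like:

file1.c |> gcc -c |> file1.o
file2.c |> gcc -c |> file2.o
file1.o file2.o |> gcc |> executable

file1.o and file2.o each appear on the output side of one line and the input side of another, which is exactly the dependency graph today's make has to be told about explicitly.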

Of course, both of these would necessarily require that the C language be different as well. In C, you're supposed to define dependencies between files _in the .c files themselves_ with #include. So you would have to do something like

# file1.c has no #include directives. gcc takes in all files as inputs
# (I guess it has multiple stdin's? idk)
# and gcc will proceed to compile into file1.o every symbol in all of the input sources
file1.c header1.h header2.h |> gcc -c |> file1.o

Here in the future-times, I understand these changes are probably not the way to go. They would make source files much less self-contained and make complicated build-time logic more cumbersome than it already is, for not a whole lot of benefit. In my eyes, the millions-of-c-files model of software is a relic of the past.

But who knows! There are probably still a lot of domains out there that could benefit from heavy pipelines. And it's been 50 years, lots of time to make improvements.

Fast-forward 50 years, and what do we have? Some things are slightly better, like Dockerfiles. Most other things are worse, like Jenkins, GH Actions, and Travis CI.
