Untitled Blog.

Object Files, Part 1: History

2025-06-11

Recently, I dug into object files and took some notes along the way. I found the topic pretty interesting, so I've decided to collect some of those notes to share. Let’s take a look.

This is part 1, where I will go over the history of object file formats across some different systems.

What are object files?

An “object file” is a file that contains machine code, typically generated by a compiler or assembler. An object file is the output of a command like cc -c -o main.o main.c. Usually, it contains machine code and data separated into linear blocks of memory called sections or segments, a symbol table, and some other metadata that’s needed during the linking process. (Some of the data, like the symbol table, might be stored inside of sections.)

But object files are not the only files that contain machine code, so what’s the difference between a simple object (.o/.obj), and an executable (.exe) or shared library file (.dll/.so/.dylib)? There aren't actually that many distinctions. Here are a few key differences:

Why are they called “object files”, anyways? It seems to originate from the term “object code”, which is used in opposition to “source code”. Source code is human-readable code, whereas object code is machine-readable. Wikipedia demonstrates that the term goes back to at least the 1950s, citing a 1959 publication, A Primer of Programming for Digital Computers.

The history of object file formats

Today, ELF is one of the most common object formats, seeing use in most UNIX and UNIX-like operating systems as well as a myriad of other systems. Microsoft Windows uses the Portable Executable format, and Apple's XNU-based operating systems use the Mach-O format. Let’s find out how we got here.
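As a rough illustration of how distinct these three formats are on disk, here is a minimal sketch that guesses a file's format from its leading bytes. The function name sniff_object_format is made up for this example. It only checks well-known signatures (the ELF magic, the Mach-O magic numbers in either byte order, and the "MZ" stub followed by the "PE\0\0" signature), so treat it as a sketch and not a robust detector. Note that plain COFF .obj files carry no magic number of this kind, so they come back as "unknown" here.

```python
import struct

# Mach-O magic numbers (32-bit and 64-bit), as seen when the first four
# bytes are read as a little-endian integer, in either file byte order.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE}


def sniff_object_format(data: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if len(data) >= 4:
        (magic,) = struct.unpack("<I", data[:4])
        if magic in MACHO_MAGICS:
            return "Mach-O"
    # PE images start with a DOS "MZ" stub; the 32-bit field at offset
    # 0x3C points at the real "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "PE"
    return "unknown"
```

To try it on a real file, something like sniff_object_format(open("main.o", "rb").read(4096)) should be enough, since all of the signatures live near the start of the file.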

UNIX object formats

On the UNIX side of things, one of the earliest object formats is the a.out format, a very simple format with fixed sections. It originated in the original UNIX and was widely extended and adapted, mostly informally. Among other problems, a.out was not well-suited to dynamic linking and was hard to extend owing to its simplistic fixed structure.
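To give a feel for how simple that fixed structure is, here is a small sketch that decodes an a.out header. The layout is an assumption on my part, modeled on the Seventh Edition PDP-11 header as I understand it: eight little-endian 16-bit words holding a magic number, the text, data, bss and symbol table sizes, the entry point, an unused word, and a flag word. The names AoutHeader, parse_aout_header and file_offsets are invented for this example, and only the 0407 magic is accepted; later variants used different field widths and extra fields.

```python
import struct
from typing import NamedTuple

# Assumed layout: eight little-endian 16-bit words (PDP-11 style header).
AOUT_HEADER = struct.Struct("<8H")
OMAGIC = 0o407  # the "normal" magic number; other magics are not handled here


class AoutHeader(NamedTuple):
    magic: int   # magic number
    text: int    # size of the text segment in bytes
    data: int    # size of the initialized data in bytes
    bss: int     # size of the uninitialized data (takes no file space)
    syms: int    # size of the symbol table in bytes
    entry: int   # entry point
    unused: int  # not used
    flag: int    # nonzero when relocation info has been stripped


def parse_aout_header(raw: bytes) -> AoutHeader:
    """Decode the fixed-size header at the start of the file."""
    header = AoutHeader(*AOUT_HEADER.unpack_from(raw))
    if header.magic != OMAGIC:
        raise ValueError(f"unexpected magic number: {header.magic:#o}")
    return header


def file_offsets(header: AoutHeader) -> dict:
    """Text follows the header directly, and data follows the text."""
    text_offset = AOUT_HEADER.size
    data_offset = text_offset + header.text
    return {"text": text_offset, "data": data_offset}
```

There is no section table to consult: the position of each part of the file is implied by the sizes in the header, which is exactly why the format is so easy to read and so awkward to extend.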

AT&T’s UNIX System Laboratories (USL) later developed COFF, a more advanced object format designed to be portable across architectures, introduced in AT&T UNIX System V. The COFF structures were laid out in the a.out.h header in the first release of System V UNIX, suggesting that it was built to directly extend the a.out format. Indeed, the COFF “Optional Header” began as the “Auxillary Header Information” structure, which is only present in executable COFF objects and contains the same fields as the original a.out format. Because COFF was not well-specified, other vendors had difficulty implementing it as specified, and so extended variations, such as ECOFF and XCOFF, were born.

COFF was superseded by ELF in AT&T UNIX SVR4. ELF was then specified in the System V Application Binary Interface, 1st Edition, published in 1990. It was further specified by the TIS (Tool Interface Standards) Committee, an industry group with representatives from companies such as Microsoft, SCO, Intel, and IBM, in a document called the “Tool Interface Standard (TIS) Portable Formats Specification”. The committee aimed to streamline development by rigorously specifying standard object formats, improving interoperability between software and reducing the number of incompatible formats in circulation. The Portable Formats Specification was considered the canonical specification of the ELF format, but it was last updated in 1995, so more recent ELF specifications were released in later versions of the System V Application Binary Interface document. The Linux Foundation’s “Referenced Specifications” page lists a draft of the System V ABI from 2001 as the chronologically last ELF format specification [1].

Microsoft object formats

On the Microsoft side of things, the story begins with the 8086 Relocatable Object Module Format, sometimes known as OMF. This format was created by Intel, and widely used by 16-bit DOS toolchains. The OMF format was designed for the original 16-bit 8086 architecture, but Intel adapted it into OMF286 and OMF386 for the Intel 80286 and Intel 80386 processors.

In 1988, a group of former DEC engineers led by Dave Cutler would join Microsoft and form the Portable Systems Group, with the goal of creating a 32-bit successor to the original OS/2. (Up to that point, OS/2 was essentially DOS with multitasking; indeed, one early version of the OS/2 lineage was actually sold as MS-DOS 4.00—not to be confused with MS-DOS 4.0.) As part of the NT OS/2 project, the Portable Systems Group developed new tools and standards for 32-bit software development. When choosing an object format for this endeavor, several factors were considered, including the speed of loading executable images, the ability to share as many memory pages between processes as possible, and the ability to extend the format in the future.

The Portable Systems Group would ultimately choose COFF as the object format for NT OS/2, citing many technical reasons to prefer it over the contemporary Cruiser Linear Executable Format. Cruiser was the codename for OS/2 2.0, IBM’s own effort to produce a 32-bit version of OS/2, which is believed to have started development at around the same time as Microsoft’s NT OS/2 project [2]. COFF was preferred in part because it was already supported by tools for Intel’s i860 microprocessor.

Tangent: The Intel i860 was a short-lived RISC-based microprocessor from Intel. In the late 80s, it was commonly assumed that RISC was the future of microprocessor design, following the then-common wisdom that CISC was inefficient, with IBM having discovered yet another instance of the Pareto Principle:

“In the early 1970s, IBM examined the kinds of operations most commonly carried out by a complex instruction set machine, and found that only about 20% of the simplest instructions represented about 80% of the processing time.”

The Computer Chronicles - Reduced Instruction Set Computer (1986)

Intel wasn't alone: AMD found some moderate success with their AMD Am29000 RISC microprocessor. However, the breakout success of RISC never really happened, and the sentiment slowly died down. In the end, these RISC processors mostly wound up in embedded systems and networking equipment, failing to meaningfully displace x86.

Though the lines between RISC and CISC are blurred nowadays, it is still possible that we may finally see a point when other optimizations run dry and simpler instruction sets win out. In particular, some believe that AArch64 may eventually prove to scale better than AMD64 thanks to its simpler fixed-size instructions, which allow for a more efficient and smaller-footprint frontend. As of this writing, the jury is still out!

NT OS/2’s choice of COFF may seem surprising, given that COFF was a UNIX standard and Portable Systems Group leader Dave Cutler allegedly had a strong distaste for UNIX. That said, while Windows NT did ultimately choose COFF as its object format, COFF was not its executable or shared library format. Instead, a new format was created: the Portable Executable format, sometimes called PE/COFF. The PE format inherited some small parts of COFF, but added many improvements specific to executables and shared libraries. Gone is the COFF symbol table, replaced with the Import Directory and Export Directory. Imports are scoped to a specific library, and symbol interposition is absent from the platform, which avoids a number of UNIX footguns at the cost of some admittedly useful but pretty ugly tricks. COFF relocations are replaced with the Relocation Directory, designed to be as efficient as possible for fast runtime loading and to allow maximum sharing of memory pages between processes. Ultimately, the COFF lineage of PE/COFF is relatively insignificant and is mostly of historical interest. (Even then, I've always felt it is a bit underrated as history; it’s quite interesting!)

Apple object formats

Apple developed the Preferred Executable Format (PEF) for use in the original Mac OS. PEF was fairly advanced for its time, supporting features like arbitrary named sections earlier than UNIX.

Apple had been working on a successor to the classic Mac OS for quite some time, without much success. Eventually, when Apple famously acquired NeXT, it gained the NeXTSTEP operating system, whose Mach-based kernel would serve as a basis for XNU. Apple eventually released Mac OS X, an XNU-based UNIX operating system, to finally succeed classic Mac OS. Preferred Executable Format remained supported for some time to allow for Carbon applications that could support both classic Mac OS and Mac OS X simultaneously. However, the new Mac OS X native format was Mach-O, the object, executable and shared object format of the Mach kernel.

Mach-O notably resolves symbols as being namespaced per-object, somewhat resembling PE’s approach, making ELF-based platforms some of the only remaining platforms where symbol names conflict between unrelated shared libraries in a process space.
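The difference between the two approaches can be shown with a toy model. This is not how any real loader is implemented; the library names, the resolve_flat and resolve_scoped helpers, and the string "definitions" are all made up for illustration. The flat lookup takes the first definition found in load order, roughly the default behaviour described above for ELF platforms, while the scoped lookup requires each import to name the library it expects the symbol to come from, as PE and Mach-O do.

```python
from typing import Dict, List, Optional, Tuple

# A loaded library is modeled as (name, {symbol name: definition}).
Library = Tuple[str, Dict[str, str]]


def resolve_flat(load_order: List[Library], symbol: str) -> Optional[str]:
    """Flat namespace: the first loaded library defining the symbol wins."""
    for _name, exports in load_order:
        if symbol in exports:
            return exports[symbol]
    return None


def resolve_scoped(
    load_order: List[Library], library: str, symbol: str
) -> Optional[str]:
    """Scoped namespace: only the named library's exports are consulted."""
    for name, exports in load_order:
        if name == library:
            return exports.get(symbol)
    return None


# Two unrelated (hypothetical) libraries that both happen to export "init".
loaded = [
    ("libfoo", {"init": "libfoo's init"}),
    ("libbar", {"init": "libbar's init", "draw": "libbar's draw"}),
]

# A caller that meant libbar's init gets libfoo's under the flat model...
flat_result = resolve_flat(loaded, "init")
# ...but gets the one it asked for when imports are scoped per library.
scoped_result = resolve_scoped(loaded, "libbar", "init")
```

The flat model is what makes symbol interposition possible in the first place: load a library earlier and its definitions take over. The scoped model gives that up in exchange for unrelated libraries never stepping on each other's names.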

If you want more information about object file formats and their history, here are some other articles from around the Internet that discuss the subject.