Sorry, Wrong Number: Debugging a Crash under Wine

Photo by Jos Speetjens / Unsplash

On December 3rd, 2021, a friend of mine needed help: their program was crashing on Wine, and they wanted to know how to fix it.

(And I'm only getting around to publishing this now, several months later. I'm a bit disorganized...)

Normally, the answer to this question is not very complicated, because a lot of stuff works just fine in Wine provided the environment is set up correctly, and some stuff simply doesn't work. Problems usually fall into one of those two categories; in this case, however, there really wasn't any obvious reason (at least to me) why it shouldn't work.

The program in question is compiled with MSYS2's MinGW-w64 package, using GCC 10.3. It ships a few other libraries as DLLs, built with the same toolchain, including libpng and zlib, both of which I am about to become very familiar with.

Some debugging had already been done, and they knew which call the application was failing in: a call to png_read_info. I ran it under Wine with the WINEDEBUG=+all option and generated a gigantic log file of mostly useless information. After a bit of correlation, I found the last API call before the crash: an msvcrt._read, returning successfully... and then we crash.

After some misdirection, I realized a crucial detail that I had been glossing over: the access violation was on execute, not on read or write. That means RIP had landed on an address that is not executable. Hmmm. Stack corruption, somehow?

Something interesting about Wine is that you can run it under Valgrind, which, with a few flags, does actually work correctly. But upon doing this, I discovered nothing particularly interesting, certainly nothing that would suggest stack corruption, so I moved on.

At this point I decided to break out rr, a special debugger that can record and replay program execution. Honestly, it's a bit overkill here, but it does make it easier to analyze crashes, and this seemed like a good excuse to pull it out. There is a bit of trickiness with using rr on top of Wine, but it works more or less just fine; it's just a bit of a pain to get the replay working. I never quite figured out how to get debug symbols to map correctly with this Wine-under-GDB setup going on, so I had to manually explore the address space to figure out what I was looking at.

After much ado, the program crashes into... nowhere. It crashes at 0x2'fe8f'2910. Nothing is mapped here. Hmm.

Using the magic of rr, I can replay to some point directly before the crash and then step into it. A few hundred stepis later, I found the culprit: e8 20 45 7e 96, at the address 0x3'6810'e3eb. AKA, CALL 0x2fe8f2910. In other words: there is an explicit CALL to nowhere.

At this point, I threw libpng in a disassembler, and found the instruction at 0x36810e3eb. The instruction?

A screenshot showing a CALL instruction, CALL near ptr crc32 (E8 78 9E 03 00)
A call to... crc32?

Bizarre. That CALL has a completely different target. It is e8 78 9e 03 00, not e8 20 45 7e 96, which is nonsense and points backwards, to an address before the start of the entire module.
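Decoding both byte sequences by hand confirms this. An E8 CALL stores a signed 32-bit little-endian displacement, measured from the address of the next instruction (the CALL's own address plus its 5-byte length). A minimal C sketch; the helper name call_rel32_target is ours, not from any tool:

```c
#include <assert.h>
#include <stdint.h>

/* Decode the absolute target of an x86-64 near CALL: opcode E8 followed by
 * a signed 32-bit little-endian displacement, measured from the address of
 * the next instruction (the CALL's own address plus its 5-byte length). */
static uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t bytes[5])
{
    uint32_t raw = (uint32_t)bytes[1]
                 | (uint32_t)bytes[2] << 8
                 | (uint32_t)bytes[3] << 16
                 | (uint32_t)bytes[4] << 24;
    /* Sign-extend without relying on implementation-defined conversions. */
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;
    return insn_addr + 5 + (uint64_t)disp; /* wraps modulo 2^64 */
}
```

Applied at 0x3'6810'e3eb, the patched bytes decode to the crash address 0x2'fe8f'2910, while the on-disk bytes decode to 0x3'6814'8268, which the register trace later in this post identifies as crc32's IAT entry rather than crc32 itself.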

So who's modifying the CALL? Is it the program? Is it libpng? Is it Wine?

A screenshot of Fred Jones from Scooby Doo imminently unmasking a perpetrator.

Tracing the CALL

One thing we do know about the .text segment is that it's read-only. Of course, you should at least verify this in your disassembler, but I did, and indeed, it's read-only. That means that in order to modify the segment, someone would need to deliberately mark it writable. On UNIX-like platforms, you would use a syscall like mprotect, whereas Windows provides VirtualProtect in kernel32. Thankfully, there's really no way that libpng would link to VirtualPro-

A screenshot of the IDA imports panel, showing VirtualProtect and VirtualQuery being imported from KERNEL32 by the libpng DLL.
Goddammit.

...what exactly is libpng doing calling this?

A call graph showing VirtualProtect being called by sub_36812E420, which is called by sub_36812E590, which is called by sub_3680F1200 (which is, effectively, the DLL's entrypoint.)

Apparently, it reaches back to sub_3680F1200, which is just the entry point of the DLL. (There's a stub over at the "true" entry point, but IDA does not count the jmp in the call graph, so you can't see it here.)

In order to try to identify what this code was, I used the tried and true strategy of looking for interesting strings, and quickly found a few. The most interesting was this one: "Unknown pseudo relocation bit size %d". Hrm, what's a pseudo relocation?

What Is A Pseudo Relocation?

I've mostly glossed over many of the lower level details in this post, but I think this one merits some more attention.

What's a normal relocation?

Before answering what a pseudo relocation is, I'd like to discuss regular relocations. When a linker links a program module, it has to pick some arbitrary "base address" to use for position-dependent code and data. What does that mean? Let's say you have a global, statically-initialized variable that is a pointer to another global variable. This is allowed. The pointer value written into the executable file at link time is the address that would be correct if the program module were loaded at its preferred base address. Much code and data is position-independent and thus does not need relocations, but any place where an absolute address must be written, such as a static pointer, needs a relocation.

However, being loaded at your preferred base address is somewhat rare these days. For one thing, almost all executable loaders have to support relocating the module to a different base address, because otherwise, it'd be impossible to simultaneously load two modules whose preferred base addresses lead to an overlap, and these cannot be coordinated ahead of time in most cases. In addition, modern AMD64 machines have plenty of address space, so for security reasons, a mitigation called ASLR is almost always used, which essentially just randomizes the base address of program modules even if they are not initially overlapping. (This is a bit of an oversimplification.)

If we move (that is, change the location of) the program module in memory, the addresses that the linker wrote based on the preferred base address no longer line up: the module is now at a different address, so everything inside it has shifted by the same amount. To make such pointers adjustable, the linker stores a relocation entry in the binary for each instance of position-dependent code or data, such as our pointer. At runtime, the executable loader or runtime linker reads each relocation entry and fixes up the referenced value according to the relocation type, typically by adding the difference between the actual base and the preferred base. As long as there is no position-dependent code that the relocations fail to account for, everything will work perfectly fine.
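A minimal sketch of that adjustment for 64-bit absolute pointers, assuming a flat byte buffer standing in for the loaded image and a plain list of offsets standing in for the relocation table. Both stand-ins and the function name are hypothetical; real relocation tables are packed differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply 64-bit absolute relocations: each listed offset in `image` holds a
 * pointer computed for `preferred_base`; shift it to `actual_base`.
 * Unsigned arithmetic wraps modulo 2^64, so a negative delta also works. */
static void apply_relocations(uint8_t *image, const size_t *offsets,
                              size_t count, uint64_t preferred_base,
                              uint64_t actual_base)
{
    uint64_t delta = actual_base - preferred_base;
    for (size_t i = 0; i < count; i++) {
        uint64_t value;
        memcpy(&value, image + offsets[i], sizeof value); /* unaligned-safe */
        value += delta;
        memcpy(image + offsets[i], &value, sizeof value);
    }
}
```

For example, with illustrative addresses, a pointer stored as 0x1'4000'1008 for a preferred base of 0x1'4000'0000 becomes 0x7ff6'0000'1008 when the module is loaded at 0x7ff6'0000'0000 instead.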

Modern Windows uses the Portable Executable (PE) format. The PE format defines roughly nine different kinds of relocation entries, some of which vary depending on CPU architecture. The main reason this is necessary is to handle relocations that modify CPU instructions, where the address may be encoded into the instruction in different ways that the relocation needs to be aware of. PE handles, for example, special cases for the MIPS, ARM (32-bit), and RISC-V instruction sets. (UEFI uses the PE binary format for its binaries, so in order for RISC-V to be able to support UEFI, the PE binary format needed to add support for RISC-V, too. Fun fact!)
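In a PE file, these entries live in the .reloc section, grouped into blocks that each cover one 4 KiB page; every entry is a 16-bit word whose top 4 bits give the relocation type and whose low 12 bits give the offset within that page. A decoding sketch with hypothetical names, covering only three type codes (the numeric values are from the PE/COFF specification; the architecture-specific ones are omitted):

```c
#include <assert.h>
#include <stdint.h>

/* A few base relocation type codes from the PE/COFF specification. */
enum {
    REL_BASED_ABSOLUTE = 0,  /* padding entry; no fixup */
    REL_BASED_HIGHLOW  = 3,  /* fix up a 32-bit absolute address */
    REL_BASED_DIR64    = 10  /* fix up a 64-bit absolute address */
};

typedef struct {
    unsigned type; /* top 4 bits of the entry */
    uint32_t rva;  /* page RVA from the block header + low 12 bits */
} reloc_entry;

/* Decode one 16-bit .reloc entry belonging to the block for `page_rva`. */
static reloc_entry decode_reloc_entry(uint32_t page_rva, uint16_t word)
{
    reloc_entry e;
    e.type = (unsigned)(word >> 12);
    e.rva = page_rva + (uint32_t)(word & 0x0fffu);
    return e;
}
```

For instance, the word 0xA123 in the block for page 0x2000 decodes to a 64-bit fixup at RVA 0x2123.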

Microsoft Windows vs. Everything Else

The thing is, though, in many other operating systems, especially UNIX-likes, relocations and symbols work differently. Whereas Windows binaries have explicit Imports and Exports, and a vector of pointers called the IAT (Import Address Table), ELF binaries have a single unified symbol table. And when it comes to relocations, ELF has many more relocation types than PE.

Why does this matter? The answer has everything to do with linkage.

When you compile some C code, and it references a symbol which is not defined in that translation unit, it is treated as an external symbol. Later, during linking, when the linker resolves that symbol, it can place the address of the symbol where needed, and thus, the symbol is resolved.

This becomes a problem when linking to other libraries and modules: the address of the symbol is not actually known until runtime, when those libraries and modules are loaded. Because of that, you need to generate different code, code that resolves the address at runtime and then uses it. At least on Windows, it would not be typical to generate code like this for every external symbol; it would be slow and wasteful.

Thankfully, there is a workaround for function calls: the linker can generate a thunk, a small routine that forwards the call through the IAT, and the thunk's address can then be used as the target of the CALL instruction.
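A small C simulation of that arrangement, under obvious assumptions: the "IAT slot" is an ordinary function pointer, the "loader" is a function called by hand, and every name here is hypothetical. A real linker-generated thunk is a single indirect jmp through the IAT, not a C function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a function exported by another DLL. */
static int imported_add_one(int x) { return x + 1; }

/* Simulated IAT slot: empty until the "loader" binds it. */
static int (*iat_add_one)(int) = NULL;

/* Simulated loader step: write the resolved address into the IAT slot. */
static void bind_imports(void) { iat_add_one = imported_add_one; }

/* Simulated thunk: call sites can target this function, whose address is
 * fixed at link time; it forwards through the IAT slot at runtime. */
static int add_one_thunk(int x) { return iat_add_one(x); }
```

Only the slot changes at runtime; the call into the thunk never does. That is why the trick covers calls but not data references or taking a function's address.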

But what if you reference a data symbol from another library or module? Or take the address of a function symbol from another library or module? That's a problem. The compiler would need to generate the address-resolving code described above, but, not knowing that the symbol lives in another module, it generates the wrong code, and the linker cannot fix that up afterwards.

With ELF, you actually can do this, using symbol-relative relocations. With PE, you are S.O.L. (Simply Out of Luck.) Or at least, you would normally be.
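For reference, two x86-64 ELF relocation types, as defined in the System V x86-64 psABI, where S is the resolved address of the symbol, A is the addend, and P is the address of the place being patched:

```latex
\begin{aligned}
\texttt{R\_X86\_64\_64}:   &\quad \text{value} = S + A     && \text{(64-bit field)} \\
\texttt{R\_X86\_64\_PC32}: &\quad \text{value} = S + A - P && \text{(32-bit field)}
\end{aligned}
```

Because S is resolved against a symbol rather than a fixed base, the dynamic linker can fill in an address from another module. The CALL fix-up examined later in this post has the PC32 shape, with S being crc32's real address and A = -4, since the displacement is measured from the end of its 4-byte field.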

What a pseudo-relocation is.

Of course, it is possible to link to data symbols on Windows. This is what all of that __declspec(dllimport) business is for: you can specify it on declarations of external symbols, and that way the compiler can generate code which is appropriate for linking to an external symbol in another library.
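Roughly what that indirection amounts to, simulated in portable C: with __declspec(dllimport), the compiler reaches a variable through its import pointer (the IAT slot, named __imp_<name> on x64) instead of referencing it directly. Here the import pointer is an ordinary pointer with the hypothetical name imp_counter, initialized by hand rather than by the loader.

```c
#include <assert.h>

/* Stand-in for a global variable that another DLL exports. */
static int exported_counter = 7;

/* Stand-in for the import pointer (the IAT slot). In a real image the
 * loader fills it with the variable's address at load time. */
static int *imp_counter = &exported_counter;

/* Roughly what `return counter + 1;` becomes when `counter` is declared
 * dllimport: one extra load through the import pointer. */
static int counter_plus_one(void) { return *imp_counter + 1; }
```

Code compiled without the attribute instead emits a direct reference to the variable, which is exactly what cannot be resolved across a PE module boundary.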

So what's the problem?

Well, a lot of code written for UNIX-likes doesn't mark its symbols with __declspec(dllimport), given that it is a Windows-only feature. MinGW wants to support compiling these programs, and to do so, it has invented the concept of pseudo-relocations. (Or perhaps Cygwin invented this concept; I'm not sure.) A pseudo-relocation is a "fake" relocation handled at runtime by the module itself, after the real relocations are done. At the entry point, the module walks through a list of pseudo-relocations and adjusts each pointer by an offset relative to an IAT entry. In effect, this gives PE symbol-relative relocations against imports, much like ELF systems have.

In other words... Pseudo-relocations are a MinGW feature that implements a special kind of "relocation" where a pointer in code or data is replaced with an address relative to an imported symbol.
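A simplified sketch of applying one 32-bit pseudo-relocation, modeled on the item layout that mingw-w64's runtime uses (runtime_pseudo_reloc_item_v2: an IAT-slot RVA, a target RVA, and flags whose low byte is the bit size). The struct and function names are our own, the "image" is a plain byte buffer, and, unlike the runtime version in this story, this sketch refuses results that don't fit in 32 bits instead of silently truncating them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as mingw-w64's runtime_pseudo_reloc_item_v2. */
typedef struct {
    uint32_t sym;    /* RVA of the IAT slot holding the imported address */
    uint32_t target; /* RVA of the field to patch */
    uint32_t flags;  /* low byte: field size in bits */
} pseudo_reloc_item;

/* Apply one 32-bit pseudo-relocation to `image`, mapped at `base`:
 * new = old - (address of IAT slot) + (value in IAT slot).
 * Returns 0 on success, -1 if unsupported or if the result does not fit in
 * a signed 32-bit field. (The real runtime also has to make the page
 * writable first; a plain buffer does not.) */
static int apply_pseudo_reloc32(uint8_t *image, uint64_t base,
                                const pseudo_reloc_item *r)
{
    if ((r->flags & 0xff) != 32)
        return -1; /* only the 32-bit case is sketched */

    uint32_t raw;
    uint64_t imported;
    memcpy(&raw, image + r->target, sizeof raw);
    memcpy(&imported, image + r->sym, sizeof imported);

    /* Sign-extend the stored 32-bit addend. */
    uint64_t addend = (raw & 0x80000000u) ? (uint64_t)raw | 0xffffffff00000000ULL
                                          : (uint64_t)raw;
    uint64_t value = addend - (base + r->sym) + imported; /* mod 2^64 */

    /* Fits in [-2^31, 2^31) iff adding 2^31 lands in [0, 2^32). */
    if (value + 0x80000000ULL > 0xffffffffULL)
        return -1;

    uint32_t out = (uint32_t)value;
    memcpy(image + r->target, &out, sizeof out);
    return 0;
}
```

In a quick check with the module at 0x10000, a stored displacement of 0xc (a CALL at RVA 0x10 aimed at the IAT slot at RVA 0x20) and an imported address of 0x50000, the field becomes 0x3ffec, a direct displacement to the import. An imported address around 12 GiB away is rejected.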

(Truth be told, it's unclear why MinGW decided that a call to crc32 needs a pseudo-relocation, but it seems like it can happen when you pass flags that prevent thunks from being generated, along with the wrong definitions to zlib that prevent it from using the proper attributes on its symbols.)

Hopefully, you have at least as good an understanding as I do about why pseudo-relocations exist, and what problem they're meant to solve... but a problem remains:

Why does this code, which works under Windows, break under Wine? If you are particularly keen, you may already have an idea of what's going on, but to be sure, let's dive into exactly what's happening.

Digging Deeper

In order to get more insight into what's going on, we can debug the pseudo-relocation implementation. I don't have debug symbols for it, but that's not a big deal, since the pseudo-relocation code compiles down to relatively succinct and understandable machine code.

Wine provides a GDB server, so you can connect a number of different debuggers. However, I hit a crucial limitation with winedbg right away: it seems to execute the loader before we have a chance to insert breakpoints, which means the libpng entry point has already run by the time we get the chance to break on it. This leaves us with a couple of different options:

  • We could fudge the was_init variable. It's a global that the pseudo-reloc code uses to determine whether it has already run. The DLL entry point also runs for new threads, so we could just break there, then reset was_init so that it runs again anyway. This probably works in this case, but it's not very versatile.
  • We can forgo winedbg and simply run Wine itself under GDB.

Wine under GDB

When I used rr earlier, I was basically already doing this. However, rr is pretty overkill for this problem, and actually introduces some complexity of its own, so it's probably easier to just forgo it for now.

I'm on NixOS, where the wine binary on the $PATH is actually a shell script, so we'll tell GDB to execute bash first.

 gdb --args bash wine Game.exe

This won't work just yet; we need to adjust the behaviors on fork and exec; then we can go.

(gdb) set follow-fork-mode child
(gdb) set follow-exec-mode new
(gdb) catch fork
Catchpoint 1 (fork)
(gdb) catch exec
Catchpoint 2 (exec)
(gdb) run

We can continue a couple of times until we're finally in our target binary, then flip follow-fork-mode back to parent and follow-exec-mode back to same. I want to set a breakpoint at the point where the was_init variable is set. Because Wine doesn't implement ASLR, our libraries end up at their preferred base addresses, so even without symbols in GDB, I can just enter the raw addresses I found by digging around in IDA:

(gdb) break *0x36812E5C8
Breakpoint 3 at 0x36812e5c8

GDB doesn't give us a whole lot of feedback. Something useful you can do is have GDB print the disassembly at the instruction pointer for you:

(gdb) display/i $pc
1: x/i $pc
=> 0x36812e5c8: movl   $0x1,0x14b0e(%rip)        # 0x3681430e0
(gdb) 

Now, as we step through, we can see what instruction we are on.

In this case, the RIP-relative target of that instruction (0x3681430e0, per GDB's comment) is was_init. The instruction may look strange, since the pseudo-reloc code does ++was_init rather than was_init = 1, but the compiler can fold the increment into a store of 1: the conditional just before it guarantees that was_init is zero at this point. Neat.
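For reference, a simplified version of that guard, following the early return and ++was_init described above; relocs_applied is a hypothetical counter standing in for the walk over the relocation list:

```c
#include <assert.h>

static int was_init = 0;
static int relocs_applied = 0; /* hypothetical stand-in for the real work */

/* Guard shape: bail out if already run. Past the early return, was_init is
 * known to be 0, so ++was_init can compile to a plain store of 1. */
static void pseudo_reloc_once(void)
{
    if (was_init)
        return;
    ++was_init;
    relocs_applied++; /* the real code walks the pseudo-relocation list */
}
```

Calling it twice applies the relocations once; clearing was_init (the fudge listed earlier as the first debugging option) makes the next call walk the list again.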

This could get pretty boring if we tried to understand and explain each instruction. I've already analyzed the function and found the relocations, so I should be able to set a conditional breakpoint that gets me into the exact spot I want to be.

IDA Pro screenshot showing a number of runtime_pseudo_reloc_item_v2 structures, highlighting the one that covers the offset of interest for us.

IDA Pro annoyingly shows the same address for all relocations because they're marked as an array. That's OK; we just need to calculate the address of the entry we care about and set a quick conditional breakpoint:

(gdb) break *0x36812E69D if $rbx == 0x36813DF4C
Breakpoint 5 at 0x36812e69d

And now we can continue. Once we're there, we can step around:

(gdb) stepi
0x000000036812e69f in ?? ()
1: x/i $pc
=> 0x36812e69f: mov    0x4(%rbx),%esi
(gdb)
0x000000036812e6a2 in ?? ()
1: x/i $pc
=> 0x36812e6a2: movzbl 0x8(%rbx),%edx
(gdb)
0x000000036812e6a6 in ?? ()
1: x/i $pc
=> 0x36812e6a6: add    %r13,%rax
(gdb)
0x000000036812e6a9 in ?? ()
1: x/i $pc
=> 0x36812e6a9: add    %r13,%rsi
(gdb)
0x000000036812e6ac in ?? ()
1: x/i $pc
=> 0x36812e6ac: mov    (%rax),%r15
(gdb)
0x000000036812e6af in ?? ()
1: x/i $pc
=> 0x36812e6af: cmp    $0x20,%edx
(gdb)
0x000000036812e6b2 in ?? ()
1: x/i $pc
=> 0x36812e6b2: je     0x36812e7a8
(gdb)

This is just a switch statement compiled into a conditional tree. As luck would have it, all of our relocations are 32-bit (...), so this first conditional hits immediately.
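The source-level shape of that dispatch is roughly a switch on the bit size kept in the low byte of the item's flags, reading the stored addend and sign-extending it. A sketch with a hypothetical helper name; where the trace that follows does the 32-bit sign extension with or/test/cmovns, this version gets it from memcpy into a fixed-width signed type (it assumes a little-endian host, as on x86-64):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the addend at a pseudo-relocation target and sign-extend it to 64
 * bits. Returns 0 on success, -1 for an unsupported size (the situation
 * the "Unknown pseudo relocation bit size %d" message reports). */
static int read_addend(const uint8_t *p, unsigned bits, int64_t *out)
{
    switch (bits) {
    case 8:  { int8_t v;  memcpy(&v, p, sizeof v); *out = v; return 0; }
    case 16: { int16_t v; memcpy(&v, p, sizeof v); *out = v; return 0; }
    case 32: { int32_t v; memcpy(&v, p, sizeof v); *out = v; return 0; }
    case 64: { int64_t v; memcpy(&v, p, sizeof v); *out = v; return 0; }
    default: return -1;
    }
}
```

A compiler is free to lower a switch like this into a chain of comparisons, which matches the cmp/je sequence above.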

(gdb)
0x000000036812e7a8 in ?? ()
1: x/i $pc
=> 0x36812e7a8: mov    (%rsi),%edx
(gdb)
0x000000036812e7aa in ?? ()
1: x/i $pc
=> 0x36812e7aa: mov    %rdx,%rcx
(gdb) nexti
0x000000036812e7ad in ?? ()
1: x/i $pc
=> 0x36812e7ad: or     %r14,%rdx
(gdb)
0x000000036812e7b0 in ?? ()
1: x/i $pc
=> 0x36812e7b0: test   %ecx,%ecx
(gdb)
0x000000036812e7b2 in ?? ()
1: x/i $pc
=> 0x36812e7b2: cmovns %rcx,%rdx
(gdb)
0x000000036812e7b6 in ?? ()
1: x/i $pc
=> 0x36812e7b6: mov    %rsi,%rcx
(gdb)
0x000000036812e7b9 in ?? ()
1: x/i $pc
=> 0x36812e7b9: sub    %rax,%rdx
(gdb)
0x000000036812e7bc in ?? ()
1: x/i $pc
=> 0x36812e7bc: add    %rdx,%r15
(gdb)
0x000000036812e7bf in ?? ()
1: x/i $pc
=> 0x36812e7bf: call   0x36812e420
(gdb)
0x000000036812e7c4 in ?? ()
1: x/i $pc
=> 0x36812e7c4: mov    %r15d,(%rsi)
(gdb)
0x000000036812e7c7 in ?? ()
1: x/i $pc
=> 0x36812e7c7: jmp    0x36812e694
(gdb)

This code right here is the culprit. It just wrote %r15d to the memory at the address pointed to by %rsi. What are those values?

(gdb) i r r15d
r15d           0x967e4520          -1770109664
(gdb) i r rsi
rsi            0x36810e3ec         14630839276

We have, without a doubt, located the culprit. It's writing the exact same sequence of incorrect bytes we saw earlier. What the heck is wrong with it? Why isn't it getting the correct address for crc32? Why is it 0x1'0000'0000 too far forward?

If you haven't figured it out yet, this should do it:

# At this point, %rax holds the address of the IAT entry,
# and %r15 holds the value stored in it (crc32's real address).
rax            0x368148268         14631076456
r15            0x1fe8f2910         8565762320

# Read the 32-bit value at the target into %edx. This is the CALL's
# original displacement, which points at the IAT entry.
mov    (%rsi),%edx

rsi            0x36810e3ec         14630839276
*rsi           0x39e78             237176
edx            0x20 -> 0x39e78     32 -> 237176

# Copy %rdx into %rcx.
mov    %rdx,%rcx
rdx            0x39e78             237176
rcx            0x0 -> 0x39e78      0 -> 237176

# Perform sign extension on %rdx copy.
or     %r14,%rdx
r14            0xffffffff00000000  -4294967296
rdx            0x39e78 -> 0xffffffff00039e78  237176 -> -4294730120

# Test %ecx for flags.
test   %ecx,%ecx
eflags         0x286               [ PF SF IF ]
ecx            0x39e78             237176

# This undoes the sign extension if the sign bit is not set.
cmovns %rcx,%rdx
eflags         0x206               [ PF IF ]
rcx            0x39e78             237176
rdx            0xffffffff00039e78 -> 0x39e78  -4294730120 -> 237176

# At this point we've undone the sign extension.

# Copy the target address into %rcx, the first argument for mark_section_writable.
mov    %rsi,%rcx
rsi            0x36810e3ec         14630839276
rcx            0x39e78 -> 0x36810e3ec  237176 -> 14630839276

# %rax is the absolute address of the IAT entry. Subtract it from %rdx.
sub    %rax,%rdx
rax            0x368148268         14631076456
rdx            0x39e78 -> 0xfffffffc97ef1c10  237176 -> -14630839280

# Add %rdx to %r15.
add    %rdx,%r15
rdx            0xfffffffc97ef1c10  -14630839280
r15            0x1fe8f2910 -> 0xfffffffe967e4520  8565762320 -> -6065076960

# Call mark_section_writable
call   0x36812e420

# Write the pseudo-relocation back.
mov    %r15d,(%rsi)
rsi            0x36810e3ec         14630839276
*rsi           0x39e78 -> 0x967e4520  237176 -> -1770109664
r15d           0x967e4520          -1770109664

Did you catch it? The distance between the CALL instruction and crc32 is greater than what can be stored in a 32-bit value. The E8 CALL instruction stores a 32-bit signed displacement, so it can only reach targets within [-2^31, 2^31) bytes of the address of the next instruction. Unfortunately, the pseudo-reloc code simply failed silently back when I was debugging this, but I believe it has since been fixed to output an error when this occurs, so it shouldn't be so puzzling to future generations.
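The arithmetic can be double-checked in a few lines of C, using values from the trace above: the CALL's next instruction at 0x3'6810'e3f0 (%rsi + 4) and crc32's real address 0x1'fe8f'2910 (%r15 before the adjustment). Both helper names are ours:

```c
#include <assert.h>
#include <stdint.h>

#define CALL_NEXT_INSN 0x36810e3f0ULL /* address after the 5-byte CALL */
#define CRC32_ADDR     0x1fe8f2910ULL /* crc32's real address, from the IAT */

/* True if `target` is reachable by a rel32 CALL whose next instruction is at
 * `next`, i.e. the displacement lies in [-2^31, 2^31). */
static int rel32_reachable(uint64_t next, uint64_t target)
{
    uint64_t disp = target - next;               /* modulo 2^64 */
    return disp + 0x80000000ULL <= 0xffffffffULL;
}

/* Where the CALL lands if the displacement is silently truncated to its
 * low 32 bits, as the unchecked pseudo-relocation code did. */
static uint64_t truncated_landing(uint64_t next, uint64_t target)
{
    uint32_t low = (uint32_t)(target - next);
    int64_t disp = (low & 0x80000000u) ? (int64_t)low - 0x100000000LL
                                       : (int64_t)low;
    return next + (uint64_t)disp;
}
```

The true displacement is about -5.6 GiB; its low 32 bits are 0x967e4520 (the bytes written into the CALL), and reinterpreting those as a signed displacement lands exactly 0x1'0000'0000 past crc32, at the crash address 0x2'fe8f'2910.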

One last thing...

There is one more weird thing though. This program works on Windows, reliably. Obviously, it isn't loading libraries at their preferred base addresses, or it would crash. So why is this happening?

Well, simple: Wine doesn't support ASLR, and at their preferred base addresses, libpng and zlib wind up more than 2 GiB apart, too far for these 32-bit pseudo-relocations.

Still, the fact that it seems to work reliably on Windows is intriguing. It might be worth exploring exactly why Windows ASLR seems to consistently choose addresses that avoid the problem. Perhaps it's because, the first time after bootup that these particular modules load, they load in quick succession?

Regardless, now knowing how the problem can be fixed, it's hard to be motivated to dig too much deeper. It might be nice if Wine could have similar ASLR behavior to Windows, so that these problems are less likely to crop up only on Wine, but these problems could also occur on Windows with ASLR disabled, so it's probably not that important.

Overall, I had fun debugging this issue. I'm also really happy with how far Wine has come, and I do not think it is a coincidence that the issue we hit was not reasonably Wine's fault. I am, however, a bit sad that I didn't get an opportunity to track down and fix a nasty Wine bug, but all the more happy that the reason for this is because it simply didn't exist.

Maybe next time. :)