Unicorn Devblog: setjmp/longjmp on Windows

Introduction

This post explains at length why Unicorn needs a wrapper for setjmp on Windows x86_64.

For the corresponding pull request, see this.

Story

The story starts with Qiling Framework. One day, when I ran the Qiling tests on native Windows, the whole Python process exited silently. After some investigation and debugging, I was sure that the crash happened in Unicorn, not Qiling, so I submitted an issue to Unicorn. Recently we wanted to make Qiling run on Windows, so I decided to solve the issue myself.

What happened?

At first glance, the stack trace suggests the crash is in uc_version:

0:000> kv
# Child-SP RetAddr : Args to Child : Call Site
00 00000042`215eb400 00007ff9`dee33033 : 00000194`6bf86fb0 00007ff9`f5ebe2d9 00000042`215f0000 00000042`00000000 : _ctypes!DllCanUnloadNow+0x7c38
01 00000042`215eb440 00007ff9`f5f4124f : 00007ff9`dee3f000 00007ff9`dee20000 000012d8`00023000 00000042`215ebc10 : _ctypes!DllCanUnloadNow+0x8bb3
02 00000042`215eb470 00007ff9`f5ebd9b2 : 00007ff9`dee3ff78 00000042`215ec420 00007ff9`dee3f8a0 00007ff9`dee30025 : ntdll!_chkstk+0x19f
03 00000042`215eb4a0 00007ff9`84c7f87a : 00000000`00000000 000001d6`49b3bee8 00000000`00000004 00000000`00000004 : ntdll!RtlUnwindEx+0x522
04 00000042`215ebbb0 00007ff9`84955670 : 00000000`00000000 0000395c`1d8654ed 000001d6`4762fdf0 00000000`00000000 : unicorn!uc_version+0x3757aa
05 00000042`215ec0f0 00007ff9`849561de : 000001d6`49b8a030 00007ff9`ae0ba97d ffffffff`00000000 000001d6`49b82360 : unicorn!uc_version+0x4b5a0
06 00000042`215ec120 000001d6`4c1069b0 : 00000000`00000000 00000000`00000000 00000042`215ec270 00007ff9`849f7bba : unicorn!uc_version+0x4c10e
07 00000042`215ec150 00000000`00000000 : 00000000`00000000 00000042`215ec270 00007ff9`849f7bba 000001d6`4c8f28a3 : 0x000001d6`4c1069b0

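But that is quite confusing, since the crash happens in a hook callback which never calls uc_version. The large offsets (for example uc_version+0x3757aa) suggest that the debugger simply picked the nearest exported symbol. The stack trace above is not very helpful, so a minimal reproduction is required, as @aquynh also suggested.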

Reproduction

The good news is that the bug also shows up in real mode, which I ran into while implementing real-mode support for Qiling Framework. The crash usually happens after a hook callback has been called several times, but the exact moment varies. So I wrote a small reproduction in C++:

#define _CRT_SECURE_NO_WARNINGS
#include <cstdio>
#include "unicorn.h"

// addr:
//   INT 21h;
//   jmp addr;
const char* cmd = "\xcd\x21\xeb\xfc";
int count = 1;

void cb(uc_engine *uc, uint32_t intno, void *user_data) {
    printf("Callback count: %d\n", count);
    count += 1;
}

int main() {
    uc_engine* uc;
    uc_err err;
    printf("Start\n");
    err = uc_open(UC_ARCH_X86, UC_MODE_16, &uc);
    if (err != UC_ERR_OK) {
        printf("Failed to open uc\n");
        return -1;
    }
    err = uc_mem_map(uc, 0x7000, 4 * 1024, UC_PROT_ALL);
    if (err != UC_ERR_OK) {
        printf("Failed to allocate memory with %d\n", err);
        return -1;
    }
    int ip = 0x100;
    int cs = 0x75a;
    int address_to_load = cs * 16 + ip;
    err = uc_mem_write(uc, address_to_load, cmd, 4);
    if (err != UC_ERR_OK) {
        printf("Failed to write memory with %d\n", err);
        return -1;
    }
    err = uc_reg_write(uc, UC_X86_REG_IP, &ip);
    if (err != UC_ERR_OK) {
        printf("Failed to write register with %d\n", err);
        return -1;
    }
    err = uc_reg_write(uc, UC_X86_REG_CS, &cs);
    if (err != UC_ERR_OK) {
        printf("Failed to write register with %d\n", err);
        return -1;
    }
    uc_hook hook;
    err = uc_hook_add(uc, &hook, UC_HOOK_INTR, (void*)cb, nullptr, 0, -1);
    if (err != UC_ERR_OK) {
        printf("Hook failed with %d\n", err);
        return -1;
    }
    printf("Before emulation.\n");
    err = uc_emu_start(uc, address_to_load, address_to_load + 4, 0, 0);
    if (err != UC_ERR_OK) {
        printf("Emulation error %d\n", err);
        return -1;
    }
    printf("After emulation.\n");
    uc_close(uc);
    return 0;
}

The exact crash point is inside the RtlUnwindEx function, which longjmp calls as part of its implementation. However, this code only reproduces the bug in a Debug build with SEH enabled. If it is built in Release mode or with SEH disabled, no crash happens. At this point, I guessed that it was most likely related to some undefined behavior, so I turned to the docs.

MSDN & ctypes hint

After some googling, MSDN gave me a hint:

In Microsoft C++ code on Windows, longjmp uses the same stack-unwinding semantics as exception-handling code. It is safe to use in the same places that C++ exceptions can be raised.

After checking the properties of the Visual Studio project, I found that Unicorn does indeed disable exceptions, so calling longjmp is not safe. In addition, the ctypes docs state:

On Windows, ctypes uses win32 structured exception handling to prevent crashes from general protection faults when functions are called with invalid argument values.

Wow! Looks like it’s the root cause, right? Unfortunately, after enabling exceptions for Unicorn, the program still crashes.

I’m really lost in thought… Why is setjmp/longjmp not safe with exceptions disabled? Why are these two library functions tied to a platform-dependent mechanism?

RtlUnwindEx

The questions above brought me to RtlUnwindEx for an answer. The MSDN documentation for the function is pretty terse:

Initiates an unwind of procedure call frames.

After some analysis, I found that RtlUnwindEx first walks the previous frames to locate the frame where the exception handler (in our case, the setjmp call) lives. So the question is: how does RtlUnwindEx parse the previous frames? As we know, on x86_64 Windows all functions share the same calling convention, so it’s easy to identify such frames… Wait, Unicorn uses a JIT, so what about the generated code?

Bingo! The JIT-ed code doesn’t follow any existing calling convention, which confuses RtlUnwindEx and results in a crash. To confirm this, I wrote a PoC in the following steps:

  • Create a normal DLL which calls setjmp.
  • Use IDA Pro to patch in some instructions that mimic the generated code and then call longjmp.
  • Write another program that loads the DLL and runs the target function.

As expected, the program crashes in RtlUnwindEx, which proves my guess.
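To see why generated code ends up on the stack between the two calls, here is a minimal portable sketch of the pattern an emulator loop typically uses: setjmp in the main loop, longjmp from a helper reached through a translated block. All names (loop_env, run_block, helper_exit_loop, run_loop) are hypothetical and this is not Unicorn's actual code. In the real thing the frame sitting between the loop and the helper belongs to JIT-generated code instead of an ordinary C function, and that is the frame an unwinding longjmp has to walk through.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf loop_env;          /* hypothetical: resume point of the main loop */
static volatile int exit_reason;  /* set by the helper before it jumps */
int blocks_run = 0;               /* how many "translated blocks" were entered */

/* Stands in for a helper that generated code calls to abandon the block. */
void helper_exit_loop(int reason) {
    assert(reason != 0);
    exit_reason = reason;
    longjmp(loop_env, 1);         /* jumps straight back into run_loop */
}

/* Stands in for one translated block; in an emulator this frame is JIT code. */
void run_block(int n) {
    blocks_run++;
    if (n == 3) {
        helper_exit_loop(42);     /* e.g. an interrupt hook asked to stop */
    }
}

/* Main loop: runs blocks until a helper jumps out, then reports why. */
int run_loop(void) {
    if (setjmp(loop_env) != 0) {
        return exit_reason;       /* reached only via longjmp */
    }
    for (int n = 0;; n++) {
        run_block(n);
    }
}
```

Calling run_loop() enters blocks 0 through 3 and then returns the reason passed by the helper. On a platform whose longjmp merely restores registers, the run_block frame is simply skipped; a longjmp that unwinds has to be able to parse that frame first.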

Musl implementation

Since the native implementation is not compatible with the JIT, I considered replacing it with another implementation. The musl implementation is an extremely simple one:

/* Copyright 2011-2012 Nicholas J. Kain, licensed under standard MIT license */
.global __setjmp
.global _setjmp
.global setjmp
.type __setjmp,@function
.type _setjmp,@function
.type setjmp,@function
__setjmp:
_setjmp:
setjmp:
    mov %rbx,(%rdi)         /* rdi is jmp_buf, move registers onto it */
    mov %rbp,8(%rdi)
    mov %r12,16(%rdi)
    mov %r13,24(%rdi)
    mov %r14,32(%rdi)
    mov %r15,40(%rdi)
    lea 8(%rsp),%rdx        /* this is our rsp WITHOUT current ret addr */
    mov %rdx,48(%rdi)
    mov (%rsp),%rdx         /* save return addr ptr for new rip */
    mov %rdx,56(%rdi)
    xor %rax,%rax           /* always return 0 */
    ret
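
As a reading aid, the buffer that the listing above fills in can be written down as a C struct. This is only a sketch: the struct and function names are hypothetical, and the field order is taken from the offsets in the assembly (0 through 56). Note that the listing follows the System V convention (buffer pointer in rdi); on Windows x64 the first argument arrives in rcx and the callee-saved set is larger (rdi, rsi and xmm6 to xmm15 as well), so a port needs more than a syntax translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the buffer written by the musl x86_64 setjmp above. */
struct musl_x64_jmp_buf {
    uint64_t rbx;   /* offset  0 */
    uint64_t rbp;   /* offset  8 */
    uint64_t r12;   /* offset 16 */
    uint64_t r13;   /* offset 24 */
    uint64_t r14;   /* offset 32 */
    uint64_t r15;   /* offset 40 */
    uint64_t rsp;   /* offset 48: stack pointer with the return address popped */
    uint64_t rip;   /* offset 56: return address of the setjmp call */
};

/* Checks that the struct matches the displacements used in the assembly. */
void check_layout(void) {
    assert(offsetof(struct musl_x64_jmp_buf, rbx) == 0);
    assert(offsetof(struct musl_x64_jmp_buf, rbp) == 8);
    assert(offsetof(struct musl_x64_jmp_buf, r12) == 16);
    assert(offsetof(struct musl_x64_jmp_buf, r13) == 24);
    assert(offsetof(struct musl_x64_jmp_buf, r14) == 32);
    assert(offsetof(struct musl_x64_jmp_buf, r15) == 40);
    assert(offsetof(struct musl_x64_jmp_buf, rsp) == 48);
    assert(offsetof(struct musl_x64_jmp_buf, rip) == 56);
}
```

Eight 8-byte slots make 64 bytes in total. A matching longjmp would load the same slots back in reverse and jump to the saved rip, with no stack walking involved.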

After translating the code to Intel syntax, Unicorn worked like a charm and the problem seemed to be really resolved. However, a comment in qemu suggests that this bug can be resolved in a more proper way:

#if defined(_WIN64)
/* On w64, setjmp is implemented by _setjmp which needs a second parameter.
 * If this parameter is NULL, longjump does no stack unwinding.
 * That is what we need for QEMU. Passing the value of register rsp (default)
 * lets longjmp try a stack unwinding which will crash with generated code. */
# undef setjmp
# define setjmp(env) _setjmp(env, NULL)
#endif

In Unicorn, this macro had been patched, since the two-argument _setjmp(env, NULL) doesn’t exist in MSVC, whereas qemu is built by MinGW cross-compilation. But wait, if qemu can work with the native longjmp implementation, why can’t Unicorn?

setjmp/longjmp implementation

Finally, after debugging qemu-2.2.0-win32 for about an hour, I found the reason. Below is the disassembly of longjmp.

Microsoft puts two longjmp implementations in one function, and the first field of jmp_buf (rcx in the figure above) decides which implementation to use. Where does this flag come from?

Look into the setjmp implementation and the answer is quite simple.

Yes, qemu is right. If the second parameter is NULL (0), longjmp uses the plain implementation instead of unwinding the stack with RtlUnwindEx. Okay, that makes sense, but how can we call _setjmp(env, NULL) in MSVC? After reading numerous MSDN docs and Windows SDK headers, my answer is:

NO WAY.

Microsoft wrote a function with two parameters but only gives you a signature with one argument. Nice work again, Microsoft. :/

So the only way is to write a wrapper in a standalone assembly file, since MSVC also does not support inline assembly on x64.
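
The behavior described in this section can be summarized with a toy model in C. It is only an illustration of the dispatch logic, assuming what the disassembly showed: the second argument of _setjmp lands in the first slot of the buffer, and longjmp picks its path based on whether that slot is zero. Every name here (toy_jmp_buf, toy_setjmp2, toy_setjmp_default, toy_setjmp_wrapper, toy_longjmp_path) is hypothetical; none of this is the real CRT code or the actual Unicorn wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the Windows x64 jmp_buf: only the first slot is modelled. */
struct toy_jmp_buf {
    uint64_t frame;   /* zero means "do not unwind" in this model */
};

enum toy_path { TOY_PLAIN_RESTORE, TOY_UNWIND };

/* Models the two-argument _setjmp(env, frame): it records the second argument. */
void toy_setjmp2(struct toy_jmp_buf *env, uint64_t frame) {
    assert(env != NULL);
    env->frame = frame;
}

/* Models the one-argument setjmp: a stack address is passed as the frame. */
void toy_setjmp_default(struct toy_jmp_buf *env) {
    int marker = 0;   /* its address stands in for the value of rsp */
    toy_setjmp2(env, (uint64_t)(uintptr_t)&marker);
}

/* Models what the assembly wrapper has to do: force the second argument to 0. */
void toy_setjmp_wrapper(struct toy_jmp_buf *env) {
    toy_setjmp2(env, 0);
}

/* Models the branch at the top of longjmp. */
enum toy_path toy_longjmp_path(const struct toy_jmp_buf *env) {
    return env->frame == 0 ? TOY_PLAIN_RESTORE : TOY_UNWIND;
}
```

In this model a buffer filled by toy_setjmp_default takes the TOY_UNWIND path, the one that ends in RtlUnwindEx and crashes on generated code, while a buffer filled by toy_setjmp_wrapper takes TOY_PLAIN_RESTORE. The real wrapper only needs to clear the second-argument register and tail-call _setjmp, which is why it has to live in assembly.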

Conclusion

I HATE WIN32.

Reference