A guide to reverse engineering the ESP8266 internal ROM

This guide assumes you're somewhat familiar with:

The esp open sdk
The GCC toolchain
Linux
Microcontrollers?

The first step in understanding the nature of the ESP8266 is looking at how all the functionality is distributed. As seen in the not so well documented memory map, the system has different memory sections which contain all kinds of useful information.

The ESP8266 has a Tensilica Xtensa IP core which is responsible for the code execution. The code is separated into three parts:

The internal ROM
The SDK provided libraries
The user code

All three parts of the code interact with each other in a manner that is outside the scope of this article. I will document them once I finish reverse engineering all of the basic registers.

All of the low level functionality is located in the internal ROM, and most of the functions available there can be seen in the ROM linker script. Here is an excerpt:

esp-open-sdk/sdk/ld/eagle.rom.addr.v6.ld
...
PROVIDE ( ets_timer_handler_isr = 0x40002da8 );
PROVIDE ( ets_timer_init = 0x40002e68 );
PROVIDE ( ets_timer_setfn = 0x40002c48 );
PROVIDE ( ets_uart_printf = 0x40002544 );
PROVIDE ( ets_update_cpu_frequency = 0x40002f04 ); 
PROVIDE ( ets_vprintf = 0x40001f00 );
PROVIDE ( ets_wdt_disable = 0x400030f0 );
PROVIDE ( ets_wdt_enable = 0x40002fa0 );
PROVIDE ( ets_wdt_get_mode = 0x40002f34 );
...

This simply means that each of these symbols has a predefined address inside the internal ROM (0x40000000 - 0x4000FFFF).
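If you want the symbol addresses in a script rather than looking them up by eye, the PROVIDE lines are easy to parse. Here is a minimal sketch in Python; the helper name parse_rom_symbols and the regular expression are my own, and it assumes every entry follows the exact `PROVIDE ( name = 0x... );` shape seen in the excerpt:

```python
import re

# Matches lines such as: PROVIDE ( ets_wdt_enable = 0x40002fa0 );
# Assumption: every entry in the ld script has this shape.
PROVIDE_RE = re.compile(r"PROVIDE\s*\(\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+)\s*\)\s*;")

def parse_rom_symbols(ld_text):
    """Return a dict mapping symbol name -> ROM address (as an int)."""
    return {name: int(addr, 16) for name, addr in PROVIDE_RE.findall(ld_text)}

# A few lines copied from the excerpt above.
excerpt = """
PROVIDE ( ets_update_cpu_frequency = 0x40002f04 );
PROVIDE ( ets_vprintf = 0x40001f00 );
PROVIDE ( ets_wdt_disable = 0x400030f0 );
"""

symbols = parse_rom_symbols(excerpt)
print(hex(symbols["ets_update_cpu_frequency"]))  # 0x40002f04
```

To run it against the real file, read eagle.rom.addr.v6.ld into a string and pass that instead of the excerpt.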

Now for the hands-on part. The first step is dumping the internal ROM with the SDK-provided esptool.py (you can skip this step if you already have the dump):

[ratzzo@lprt0334 esptest]$ esptool dump_mem 0x40000000 0x10000 rom_dump.bin

We choose the function that we want to reverse engineer. For this example, let's go with a simple one:

void ets_update_cpu_frequency(int freqmhz);

We look up the address of the function in the ROM ld script, which in this case is 0x40002f04, as seen in the excerpt above. With this information, we can start disassembling the dump.

For this we have to use the SDK-provided objdump (xtensa-lx106-elf-objdump):

[ratzzo@lprt0334 esptest]$ xtensa-lx106-elf-objdump -bbinary -mxtensa --adjust-vma=0x40000000  --start-address=0x40002f04 -D rom_dump.bin | less

Objdump will disassemble the ROM, shift the addresses by 0x40000000 (the start address of the ROM), and start the disassembly at 0x40002f04, which is the address of our function.
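The same address arithmetic is useful if you want to poke at rom_dump.bin directly: since the dump starts at the ROM base, the file offset of anything is just its address minus 0x40000000. A small sketch in Python (the names ROM_BASE, ROM_SIZE and rom_offset are mine, and the size is assumed to be the 0x10000 passed to dump_mem earlier):

```python
ROM_BASE = 0x40000000  # start of the internal ROM, same value given to --adjust-vma
ROM_SIZE = 0x10000     # assumption: the size that was passed to dump_mem

def rom_offset(address):
    """Translate a ROM address into a byte offset inside the dump file."""
    if not ROM_BASE <= address < ROM_BASE + ROM_SIZE:
        raise ValueError("address 0x%08x is outside the dumped ROM" % address)
    return address - ROM_BASE

print(hex(rom_offset(0x40002f04)))  # 0x2f04
```

So ets_update_cpu_frequency starts 0x2f04 bytes into the dump, which is also where a hex editor would show the bytes 31 f1 ff.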

The output will look something like:

40002f04:  31f1ff      l32r    a3, 0x40002ec8
40002f07:  2903        s32i.n  a2, a3, 0
40002f09:  0df0        ret.n
40002f0b:  0021ef      excw
40002f0e:  ff          .byte 0xff
40002f0f:  2802        l32i.n  a2, a2, 0
40002f11:  0df0        ret.n
40002f13:  002632      excw
40002f16:  0e          .byte 0xe
40002f17:  26620f      beqi    a2, 6, 0x40002f2a
40002f1a:  42c2f4      addi    a4, a2, -12
40002f1d:  0cd3        movi.n  a3, 13
40002f1f:  0c02        movi.n  a2, 0
40002f21:  402383      moveqz  a2, a3, a4
40002f24:  0df0        ret.n
40002f26:  0cb2        movi.n  a2, 11
40002f28:  0df0        ret.n

objdump output will be in the following format:

address:   opcode       mnemonic

The complete reference (and a good read if you like this sort of thing) is the Xtensa ISA.

Going back to the assembly we check the code until the return:

40002f04:  31f1ff      l32r    a3, 0x40002ec8 ; Load 32-bit PC-Relative
40002f07:  2903        s32i.n  a2, a3, 0      ; Narrow Store 32-bit
40002f09:  0df0        ret.n                  ; Narrow Non-Windowed Return

So basically, it loads the 32-bit word stored at 0x40002ec8 (which is the address 0x3FFFC704) into a3, then stores a2 (the argument) at the address held in a3 (0x3FFFC704) with offset 0, and returns. In other words, all the function does is store its argument at 0x3FFFC704.
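If you would rather check the literal yourself than trust objdump, the l32r target can be recomputed from the three instruction bytes and the word can then be read straight out of the dump. The sketch below is my own reading of the l32r description in the Xtensa ISA (little-endian encoding, 16-bit immediate treated as a negative word offset from the PC rounded up to a word boundary); the helper names l32r_target and read_word are hypothetical:

```python
import struct

ROM_BASE = 0x40000000  # start address of the dumped ROM

def l32r_target(pc, insn):
    """Compute the literal address referenced by a 3-byte l32r at address pc."""
    b0, b1, b2 = insn
    if b0 & 0x0F != 0x1:
        raise ValueError("not an l32r instruction")
    imm16 = b1 | (b2 << 8)
    # The immediate is extended with ones, so the offset is always negative:
    # literals live before the code that uses them.
    offset = (imm16 - 0x10000) * 4
    return ((pc + 3) & ~3) + offset

def read_word(rom, address):
    """Read a little-endian 32-bit word from the dump at a ROM address."""
    return struct.unpack_from("<I", rom, address - ROM_BASE)[0]

target = l32r_target(0x40002f04, bytes.fromhex("31f1ff"))
print(hex(target))  # 0x40002ec8
```

With the dump loaded via open("rom_dump.bin", "rb").read(), calling read_word(rom, target) should return the 0x3FFFC704 mentioned above.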

Pretty simple huh :)?

Edit: I've taken the time to disassemble the entire ROM with the available symbols corrected; check it out here. It's provided as a .txt because it's huge!