This post is about a Python parser for ELF (Executable and Linkable Format), but I will briefly go through the ELF spec first. Funny story: I once gave a couple of presentations about DPI and thought it would be funny to include a few slides about GCC and ELF. I called it “The short sort of ELF” and, as expected, the joke didn’t land. Good thing I am not a comedian :)

The ELF

ELF is the standard UNIX executable format, supported by toolchains (compilers and linkers) and loaders. The figure, taken from the spec, shows the two views of an ELF file: the linking view and the execution (loader) view.

Example image

The ELF header contains the fields that the parser uses to locate the following:

  • section headers
  • program headers
  • string table

Example image

For the implementation, I used an OrderedDict to describe the fields and their sizes, and wrote a generic parse function that walks attr_size_map to populate them.

class Elf64Hdr(BinResource):
    def __init__(self):
        pass
    def size_map(self):
        attr_size_map = collections.OrderedDict()
        attr_size_map["e_ident"       ] =  BIT64_DATA_TYPE.Elf64_Char.value * 16
        attr_size_map["e_type"        ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_machine"     ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_version"     ] =  BIT64_DATA_TYPE.Elf64_Word.value
        attr_size_map["e_entry"       ] =  BIT64_DATA_TYPE.Elf64_Addr.value
        attr_size_map["e_phoff"       ] =  BIT64_DATA_TYPE.Elf64_Off.value
        attr_size_map["e_shoff"       ] =  BIT64_DATA_TYPE.Elf64_Off.value
        attr_size_map["e_flags"       ] =  BIT64_DATA_TYPE.Elf64_Word.value
        attr_size_map["e_ehsize"      ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_phentsize"   ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_phnum"       ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_shentsize"   ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_shnum"       ] =  BIT64_DATA_TYPE.Elf64_Half.value
        attr_size_map["e_shstrndx"    ] =  BIT64_DATA_TYPE.Elf64_Half.value
        return attr_size_map
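
The generic parse function itself is not shown in this post, so here is a minimal sketch of how such a function could walk a size map. The name split_fields and its signature are hypothetical and are not the actual common.segment_bin; the sketch only assumes that each field is stored as its raw bytes, to be converted to an integer later.

```python
import collections


def split_fields(data, size_map, offset=0):
    """Slice `data` into named raw-byte fields, following `size_map` in order.

    Hypothetical stand-in for a generic parse function: `size_map` maps
    field names to sizes in bytes, and iteration order is the layout order.
    """
    fields = collections.OrderedDict()
    for name, size in size_map.items():
        chunk = data[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("truncated input while reading " + name)
        fields[name] = chunk
        offset += size
    return fields


# Toy layout covering only three of the ELF header fields.
toy_map = collections.OrderedDict(
    [("e_type", 2), ("e_machine", 2), ("e_version", 4)]
)
toy_data = bytes([2, 0, 62, 0, 1, 0, 0, 0])
toy_fields = split_fields(toy_data, toy_map)
```

Because the map is ordered, the field layout lives in one place and the same function can serve every header type.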

The spec defines enumerated values for several header fields. I used Python's Enum to mirror them.

class E_TYPE(Enum):
    ET_NONE     = 0
    ET_REL      = 1
    ET_EXEC     = 2
    ET_DYN      = 3
    ET_CORE     = 4
    ET_LOOS     = 0xfe00
    ET_HIOS     = 0xfeff
    ET_LOPROC   = 0xff00
    ET_HIPROC   = 0xffff

class E_MACHINE(Enum):  # TODO: x86 and x86-64 for now
    EM_NONE     = 0
    EM_386      = 3
    EM_X86_64   = 62

class E_VERSION(Enum):
    EV_NONE     = 0
    EV_CURRENT  = 1
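
As a rough illustration of how these enums can be used, the sketch below turns a raw two-byte field into an enum member. The helper decode_e_type is hypothetical and assumes little-endian input; it falls back to the plain integer for values with no matching member, since the OS-specific and processor-specific entries in the spec are ranges rather than single values.

```python
from enum import Enum


class E_TYPE(Enum):
    ET_NONE = 0
    ET_REL = 1
    ET_EXEC = 2
    ET_DYN = 3
    ET_CORE = 4


def decode_e_type(raw, byteorder="little"):
    """Return the E_TYPE member for a raw e_type field, or the int if unknown."""
    value = int.from_bytes(raw, byteorder)
    try:
        return E_TYPE(value)  # lookup by value
    except ValueError:
        # No member has this value (for example, something inside an
        # OS-specific or processor-specific range), so keep the number.
        return value
```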

Sections

The section header table is an array of Elf32_Shdr structures (Elf64_Shdr in 64-bit files, which is what the parser below handles). The section header is defined as follows:

Example image

As with the ELF header, I defined the section header as a map of field names to sizes, which the binary parser (common.segment_bin) uses to split the raw bytes.

class Elf64Shdr(BinResource):
    def __init__(self,data):
        data_dict =  common.segment_bin(data,self.size_map() ,0,'lsb')
        common.append_attr(self,data_dict)
    def size_map(self):
        attr_size_map = collections.OrderedDict()
        attr_size_map["sh_name"      ]  =  BIT64_DATA_TYPE.Elf64_Word.value
        attr_size_map["sh_type"      ]  =  BIT64_DATA_TYPE.Elf64_Word.value
        attr_size_map["sh_flags"     ]  =  BIT64_DATA_TYPE.Elf64_Xword.value
        attr_size_map["sh_addr"      ]  =  BIT64_DATA_TYPE.Elf64_Addr.value
        attr_size_map["sh_offset"    ]  =  BIT64_DATA_TYPE.Elf64_Off.value
        attr_size_map["sh_size"      ]  =  BIT64_DATA_TYPE.Elf64_Xword.value
        attr_size_map["sh_link"      ]  =  BIT64_DATA_TYPE.Elf64_Word.value
        attr_size_map["sh_info"      ]  =  BIT64_DATA_TYPE.Elf64_Word.value
        attr_size_map["sh_addralign" ]  =  BIT64_DATA_TYPE.Elf64_Xword.value
        attr_size_map["sh_entsize"   ]  =  BIT64_DATA_TYPE.Elf64_Xword.value
        return attr_size_map

The following fields of the ELF header describe how to find the section header table:

  • e_shoff: file offset of the section header table
  • e_shnum: number of section header entries
  • e_shentsize: size of each section header entry

            start = common.bytearray_to_int(self.ehdr.e_shoff)
            for x in range(0, common.bytearray_to_int(self.ehdr.e_shnum)):
                end = start + common.bytearray_to_int(self.ehdr.e_shentsize)
                sh = Elf64Shdr(self.file_bin[start:end])
                start = end
                self.sh_tbl.append(sh)
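
The start/end arithmetic in this loop can be factored into a small helper. The sketch below is one way to do it; table_slices is a hypothetical name, and the bounds check against the file size is an addition that the original loop does not perform.

```python
def table_slices(offset, count, entsize, file_size):
    """Yield (start, end) byte ranges for each entry of a header table.

    offset  -- file offset of the table (e_shoff or e_phoff)
    count   -- number of entries (e_shnum or e_phnum)
    entsize -- size of one entry (e_shentsize or e_phentsize)
    """
    for i in range(count):
        start = offset + i * entsize
        end = start + entsize
        if end > file_size:
            raise ValueError("header table entry %d runs past end of file" % i)
        yield start, end


# Three 56-byte entries starting at offset 64 in a 1000-byte file.
ranges = list(table_slices(64, 3, 56, 1000))
```

Each pair can then be used to slice the file contents, as in file_bin[start:end].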

Program Header

The program header table is parsed the same way as the section header table, using e_phoff, e_phnum, and e_phentsize.

        ## parse program table if applicable
        self.ph_tbl = []
        if(common.bytearray_to_int(self.ehdr.e_phnum) > 0):
            start = common.bytearray_to_int(self.ehdr.e_phoff)
            for x in range(0, common.bytearray_to_int(self.ehdr.e_phnum)):
                end = start + common.bytearray_to_int(self.ehdr.e_phentsize)
                ph = Elf64Phdr(self.file_bin[start:end])
                start = end
                self.ph_tbl.append(ph)
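
The Elf64Phdr class is not shown in this post. For reference, here is a sketch of the standard 64-bit program header layout. It swaps the size-map approach for the standard library's struct module and assumes a little-endian file; the names Phdr and parse_phdr are hypothetical.

```python
import struct
from collections import namedtuple

# Elf64_Phdr, little-endian: two 4-byte words followed by six 8-byte fields.
PHDR_FORMAT = "<IIQQQQQQ"
PHDR_SIZE = struct.calcsize(PHDR_FORMAT)  # 56 bytes

Phdr = namedtuple(
    "Phdr",
    "p_type p_flags p_offset p_vaddr p_paddr p_filesz p_memsz p_align",
)


def parse_phdr(entry):
    """Unpack one 64-bit program header entry into named integer fields."""
    if len(entry) != PHDR_SIZE:
        raise ValueError("expected %d bytes, got %d" % (PHDR_SIZE, len(entry)))
    return Phdr._make(struct.unpack(PHDR_FORMAT, entry))
```

Note that in the 64-bit layout p_flags comes right after p_type, whereas the 32-bit layout places it near the end of the structure.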

String Table

e_shstrndx is the index of the section header that describes the section name string table. So we fetch that section header and parse the section's contents using unpack_str_table.

        ## parse e_shstrndx and back annotate the sh headers (sh_tbl)
        sym_sh = self.sh_tbl[common.bytearray_to_int(self.ehdr.e_shstrndx)]
        # sh_offset is the section's position in the file; sh_addr is its
        # virtual address at run time and must not be added to a file offset
        start = common.bytearray_to_int(sym_sh.sh_offset)
        end   = start + common.bytearray_to_int(sym_sh.sh_size)
        strtab = common.unpack_str_table(self.file_bin[start:end])
        # NOTE: zip pairs names by position; this assumes the strings appear in
        # section header order. Indexing the table by sh_name is more robust.
        for sh,nm in zip(self.sh_tbl,strtab):
            sh.real_name = nm
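
In the spec, each section header's sh_name is a byte offset into this string table, and the name is the null-terminated string that starts there. A minimal sketch of that lookup follows; name_at is a hypothetical helper, and the table contents are made up for illustration.

```python
def name_at(strtab, offset):
    """Return the null-terminated string starting at `offset` in `strtab`."""
    end = strtab.index(b"\x00", offset)  # raises ValueError if unterminated
    return strtab[offset:end].decode("ascii")


# Made-up string table: a leading null byte, then null-terminated names.
example_strtab = b"\x00.symtab\x00.strtab\x00.shstrtab\x00.rela.text\x00"
```

Offsets may point into the middle of a string: in the table above, offset 27 yields ".rela.text" and offset 32 yields ".text". Linkers use this to share suffixes, which is one reason a positional pairing of names and section headers can go wrong.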

Or we can just use readelf like a normal person.