
Analyzing the ELF file structure by extracting constant values

2024-11-08 14:24:45

ELF (Executable and Linkable Format) is a topic that never goes away. As long as we still use Android phones and Linux servers, we need to understand at least some of it. So let's start from a small practical task, extracting constant values, and parse the ELF file structure step by step!

 

I. Purpose:

There are plenty of articles online that describe the ELF file structure, but only a handful walk through an actual binary byte by byte, and most of them read like boilerplate. I recently needed to recover the constant values that the symbols in a library refer to, using its symbol table. After finishing, I summarized the process and distilled it into this article on the ELF structure, for my own later review and as an introduction for others.

 

II. Requirement:

C++ offers many ways to assign and use constants. Given a dynamic link library (.so file) compiled from C++, we want to recover the symbols it may contain and the values they represent.

 

III. Basics

 

① ELF file types:

ELF files come in three main types: relocatable object files (.o), shared object files (.so), and executables (on Linux usually without an extension, playing the role of a .exe). Ordinary executables should be familiar, so this section focuses on .o and .so files. A .o file is a relocatable object file and a .so file is a shared object file. The difference is that .o is static and .so is dynamic: static means the linker merges it into the executable at link time, while dynamic means it is loaded when the executable that uses it runs. Apart from their different uses, they share the same file structure and the same data types. As the figure shows, the ELF data types fall into a few categories: signed and unsigned integers, addresses and offsets, 1, 2 or 4 bytes in size. For parsing ELF32 it is enough to remember that unsigned char is 1 byte, Half is 2 bytes, and the rest are 4 bytes.

② ELF linking view and execution view

The ELF file structure is a bit more complex, because it can be seen from two perspectives: the linking view and the execution view. They are called views because, like one object seen from different angles, they look different but describe the same file. The linking view is the linker's perspective. It focuses on sections and uses the section header table to locate each section for linking. The execution view is the perspective of the running program. It focuses on segments and uses the program header table to locate each segment and load it into memory. The figure above shows how the two views correspond. The .text section belongs to the code segment. Data sections such as .data and .bss belong to the data segment (.rodata is often grouped with the read-only code segment instead). Sections needed only for linking or debugging, such as the static symbol table .symtab, are not loaded into memory at all. The dynamic symbol table .dynsym, by contrast, is loaded, because the dynamic linker needs it at run time.

 

IV. Realization

With these basics in mind, it is clear that we cannot start from the execution view to get the symbols and their values. The program header table only describes loadable segments, and the static symbol table may not be covered by any of them. So we start from the linking view: use the section header table to locate the data sections and the symbol table section, then match them up through their index relationships. The overall process is shown below.

 

Next, let's go through it step by step.

① Parse the ELF file header:

First, here is the structure definition of the ELF file header (Elf32_Ehdr):

 

Anyone fluent in C/C++ will understand it at a glance, so I won't explain it at length. The meaning of each ELF32 data type was shown above. The key fields here are:

e_ident: 16-byte identification array

Always check the magic number first. The first 4 of the 16 bytes of e_ident must be 0x7F 0x45 0x4C 0x46, which is 0x7F followed by "ELF" in ASCII. The following bytes give, in order: the file class (1 = 32-bit / 2 = 64-bit), the byte order (1 = LSB / 2 = MSB), the version (1 = current version), the OS/ABI (0 = UNIX System V / 3 = Linux ...), the ABI version, and 7 bytes of padding.

e_type: two-byte object file type

1 for relocatable files, 2 for executable files, 3 for shared object files, 4 for core dump files
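A minimal Python sketch of this identification step follows. It assumes the file has already been read into a `bytes` object. `parse_ident` and the lookup tables are hypothetical helper names, and only the values listed above are decoded.

```python
import struct

ELF_MAGIC = b"\x7fELF"
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}
DATA_NAMES = {1: "LSB", 2: "MSB"}
TYPE_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def parse_ident(data: bytes) -> dict:
    """Decode e_ident (bytes 0-15) and e_type (bytes 16-17) of an ELF header."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file: bad magic number")
    ei_class, ei_data, ei_version, ei_osabi = data[4], data[5], data[6], data[7]
    # e_type is a 2-byte field whose byte order is given by EI_DATA
    endian = "<" if ei_data == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return {
        "class": CLASS_NAMES.get(ei_class, ei_class),
        "data": DATA_NAMES.get(ei_data, ei_data),
        "version": ei_version,
        "osabi": ei_osabi,
        "type": TYPE_NAMES.get(e_type, e_type),
    }
```

Usage, with a hypothetical file name: `parse_ident(open("libtest.so", "rb").read())`. For the sample described in this article, it should report ELF32, LSB and ET_DYN.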

 

Reading the example shown above with these rules gives a 32-bit, LSB, current-version .so (shared object) file for the UNIX (System V) ABI.

Having confirmed the file type, we can follow the process described above and extract the following from the header:

Section header table offset, entry size and entry count

The ELF header structure tells us directly where to find this information:

e_shoff (bytes 32-35), e_shentsize (bytes 46-47), e_shnum (bytes 48-49), with bytes counted from 0



In this test file the section header table is at offset 2896 (read as LSB), each entry is 40 bytes, and there are 22 entries.

Section header string table index (e_shstrndx)

This field (bytes 50-51) gives the index of the string table that holds the name of every entry in the section header table. We will need it when parsing the section header entries. Here it is 21 (0x15).
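Reading these four fields can be sketched as follows. The sketch assumes a 32-bit LSB file as in the sample (64-bit files use different offsets), and `section_table_info` is a hypothetical helper name. The offsets are the 0-based ones listed above.

```python
import struct

def section_table_info(data: bytes) -> tuple:
    """Return (e_shoff, e_shentsize, e_shnum, e_shstrndx) from an ELF32 LSB header."""
    (e_shoff,) = struct.unpack_from("<I", data, 32)  # bytes 32-35
    # e_shentsize, e_shnum, e_shstrndx are consecutive 2-byte fields at bytes 46-51
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 46)
    return e_shoff, e_shentsize, e_shnum, e_shstrndx
```

For the sample file this should return (2896, 40, 22, 21).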

Verification

From the linking view, the section header table is normally the last thing in the file. So 2896 + 40 * 22 should equal the file size of 3776, and it does, which suggests the analysis so far is correct.
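The check is a single comparison. Here is a hedged sketch using a hypothetical helper name. The file size could come from `os.path.getsize` or `len(data)`.

```python
def section_table_ends_file(file_size: int, e_shoff: int, e_shentsize: int, e_shnum: int) -> bool:
    """Sanity check: does the section header table end exactly at end-of-file?

    True for this sample (2896 + 40 * 22 == 3776). Linkers are not required to
    place the table last, so False is not by itself a sign of a parsing error.
    """
    return e_shoff + e_shentsize * e_shnum == file_size
```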

 

② Read the section header table:

The ELF header already gave us the section header table's offset, entry size and entry count, so we can now read the section header entries one by one. The structure of a section header entry (Elf32_Shdr) is shown in the following figure:

 

The fields we need to focus on are:

entry number (index), section name (sh_name), section type (sh_type), section offset (sh_offset), section size (sh_size), and additional information (sh_info).

Entry number:

Obtained by counting the entries while reading them.

Section offset (sh_offset):

Bytes 17-20 of the entry: the offset of the section's data within the file.

Section size (sh_size):

Bytes 21-24 of the entry: the size of the section's data in bytes.

Section name (sh_name):

Bytes 1-4 of the entry: an index into the string table whose section index is given in the file header (e_shstrndx). The name is the string that starts there and ends at the first \0.

Section type (sh_type):

Bytes 5-8 of the entry indicate what the section is used for. Common values:

SHT_PROGBITS (0x1): contains program-defined data such as code, read-only data, read-write data, etc.

SHT_SYMTAB (0x2): contains symbol table information for linking or debugging.

SHT_STRTAB (0x3): contains a table of strings, typically used to represent names in a symbol table or section table.

...

SHT_DYNSYM (11): contains the dynamically linked symbol table for runtime symbol resolution.

...
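Reading the whole section header table can be sketched with `struct.unpack_from`. The sketch assumes an ELF32 LSB file in which every Elf32_Shdr entry consists of ten 4-byte words (40 bytes, matching e_shentsize). `Elf32Shdr` and `read_section_headers` are illustrative names, not a library API.

```python
import struct
from typing import NamedTuple

class Elf32Shdr(NamedTuple):
    sh_name: int       # bytes 1-4: index into .shstrtab
    sh_type: int       # bytes 5-8: e.g. 3 = SHT_STRTAB, 11 = SHT_DYNSYM
    sh_flags: int
    sh_addr: int       # bytes 13-16: address in memory
    sh_offset: int     # bytes 17-20: offset of the data in the file
    sh_size: int       # bytes 21-24: size of the data in bytes
    sh_link: int       # bytes 25-28: index of an associated section
    sh_info: int
    sh_addralign: int
    sh_entsize: int    # bytes 37-40: size of one entry, for table sections

def read_section_headers(data: bytes, e_shoff: int, e_shentsize: int, e_shnum: int) -> list:
    """Return every section header entry; the list index is the entry number."""
    return [
        Elf32Shdr(*struct.unpack_from("<10I", data, e_shoff + i * e_shentsize))
        for i in range(e_shnum)
    ]
```

Because the list position is the entry number, index-based references such as e_shstrndx, sh_link and st_shndx become plain list lookups.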

③ Get the section name string table

From the file header we know that the section name string table is entry 21 (0x15), the section header table starts at offset 2896, and each entry is 40 bytes. So that entry starts at 2896 + 21 * 40 = 3736.
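Locating a section header entry by its index is always the same calculation. This article uses it here for index 21 and again for indices 7 (.dynstr) and 18 (the section holding the symbol values). Here is a trivial sketch with a hypothetical helper name:

```python
def shdr_entry_offset(e_shoff: int, e_shentsize: int, index: int) -> int:
    """File offset of section header entry number `index`."""
    return e_shoff + index * e_shentsize
```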

 

From the string table's section header entry, we get the offset of the actual string table, 2691 (0xA83), and its length, 202 (0xCA).

 

The first 4 bytes of each entry hold the section name as an index into this string table. For the string table's own entry the index is 178 (0xB2). Reading from there up to the terminating \0 gives the name .shstrtab, the string table that holds the names of the section header entries.
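Looking up a name means reading a NUL-terminated string at (string table offset + index). Here is a minimal sketch that assumes UTF-8 names. `read_cstr` and `section_name` are hypothetical helper names.

```python
def read_cstr(data: bytes, offset: int) -> str:
    """Read a string that starts at `offset` and ends at the next NUL byte."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8", errors="replace")

def section_name(data: bytes, shstrtab_offset: int, sh_name: int) -> str:
    # sh_name indexes into .shstrtab, so the name starts at shstrtab_offset + sh_name
    return read_cstr(data, shstrtab_offset + sh_name)
```

For the sample, `section_name(data, 2691, 178)` should give ".shstrtab". The same `read_cstr` also works for symbol names in .dynstr.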

(The .shstrtab string table also gives a quick overview of what the ELF file contains, for example whether certain specific sections are present.)

Bytes 5-8 of this entry are 0x03000000, which in LSB order is 3 (SHT_STRTAB), confirming that it is a string table. Other sections can also be string tables, but they serve different purposes.

④ Get the symbol table

(The section name string table shows that this file has a dynamic symbol table .dynsym but no static symbol table .symtab. So when traversing the section header entries, we can locate the dynamic symbol table entry either by the name ".dynsym" or by the section type 11 (0x0B).)
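Finding the dynamic symbol table by type, with the name as a fallback, might look like the sketch below. To keep it self-contained, I assume the input is a list of `(name, sh_type)` pairs in section header order. `find_dynsym` is a hypothetical helper name.

```python
SHT_SYMTAB = 2   # static symbol table (.symtab)
SHT_DYNSYM = 11  # dynamic symbol table (.dynsym)

def find_dynsym(sections):
    """sections: list of (name, sh_type) pairs in section header order.

    Return the entry number of the dynamic symbol table, or None if absent.
    """
    for index, (name, sh_type) in enumerate(sections):
        # The type alone identifies it; the name check is a cross-check
        if sh_type == SHT_DYNSYM or name == ".dynsym":
            return index
    return None
```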

 

The symbol table's section header entry tells us where the symbol table is stored and how large each entry is. (The hex values below are raw bytes in file order. Because the file is LSB, 0x0C020000 is read as 0x0000020C = 524.)

sh_addr (address of the section in memory): bytes 13-16, value 524 (0x0C020000)

sh_offset (offset of the section data in the file): bytes 17-20, value 524 (0x0C020000)

sh_size (section size): bytes 21-24, value 304 (0x30010000)

sh_link (index of the associated string table in the section header table): bytes 25-28, value 7

sh_entsize (size of each entry in the section): bytes 37-40, value 16 (0x10000000)

 

The symbol table therefore holds 19 symbol entries (304 / 16).



A single symbol entry (Elf32_Sym) is shown above. It has the following fields:

Symbol name:

st_name, bytes 1-4: an index into the string table pointed to by the symbol table's sh_link. The symbol name is the string that starts at that index and ends at \0.

Symbol value:

st_value, bytes 5-8: its meaning depends on the symbol. For a function it is the function's starting address; for a global or static variable it is the variable's address. (In a .so file these are virtual addresses relative to the load base.)

Symbol size:

st_size, bytes 9-12: the size of the variable in bytes, or the number of bytes of the function's code.

Symbol type and binding:

st_info, byte 13: a single flag byte. Its high 4 bits give the binding attribute (Binding) and its low 4 bits give the symbol type (Type).

Type:

STT_OBJECT (1): data object, usually a variable

STT_FUNC (2): function or other executable code

...

Binding:

STB_LOCAL (0): local symbol, visible only in the current module

STB_GLOBAL (1): global symbol, visible in all modules

...

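Section header index:

st_shndx, bytes 15-16: its meaning depends on the case. For a defined symbol it is the index of the section that contains it. A value of 0 (SHN_UNDEF) means the symbol is undefined in this file.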
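Decoding the symbol table can be sketched as follows. The sketch assumes ELF32 LSB, where each Elf32_Sym is 16 bytes: three 4-byte words, two single bytes (st_info, st_other) and a 2-byte st_shndx. `Elf32Sym` and `read_symbols` are illustrative names. The `bind` and `sym_type` properties perform the same split of st_info described above.

```python
import struct
from typing import NamedTuple

class Elf32Sym(NamedTuple):
    st_name: int   # bytes 1-4: index into the string table given by sh_link
    st_value: int  # bytes 5-8: address of the variable or function
    st_size: int   # bytes 9-12: size in bytes
    st_info: int   # byte 13: binding (high 4 bits) and type (low 4 bits)
    st_other: int  # byte 14
    st_shndx: int  # bytes 15-16: index of the containing section

    @property
    def bind(self) -> int:
        """Binding: 0 = STB_LOCAL, 1 = STB_GLOBAL, ..."""
        return self.st_info >> 4

    @property
    def sym_type(self) -> int:
        """Type: 1 = STT_OBJECT, 2 = STT_FUNC, ..."""
        return self.st_info & 0xF

def read_symbols(data: bytes, sh_offset: int, sh_size: int, sh_entsize: int = 16) -> list:
    count = sh_size // sh_entsize  # e.g. 304 // 16 = 19 in the sample
    return [
        Elf32Sym(*struct.unpack_from("<IIIBBH", data, sh_offset + i * sh_entsize))
        for i in range(count)
    ]
```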

 

⑤ Get the symbol name string table

First we need the symbol names, that is, the variable and function names.

The symbol table's section header entry has sh_link = 7, so the section header entry of the corresponding string table starts at

2896 + 40 * 7 = 3176



(Its section type is 3, which confirms that this is the string table we are looking for.)

Reading that section header entry gives:

Name: 93 (0x5D); adding the .shstrtab offset 2691 leads to the name string .dynstr

 

Offset: 1180 (0x9C04)

Size: 341 (0x0155)

The symbol name strings stored in this section are shown below:

 

⑥ Traverse the symbols

Get the symbol type

Iterate over every symbol in the symbol table (the figure repeats the one in step ④, where the symbol structure was described) and read each symbol's name and type. If the low 4 bits of st_info are 1, the symbol is a data object (OBJECT), that is, a variable. If its size is also 4, it may be a string pointer, since pointers are 4 bytes in ELF32.



Take these six symbols as an example: each has a size of 4, and st_info is 0x11, or 0001 0001 in binary. That marks a global (high 4 bits = 1) data object (low 4 bits = 1).
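The filtering rule just described can be sketched as below. It requires type OBJECT and size 4, plus GLOBAL binding, which all six symbols here have. The tuple input format and the `candidate_constants` name are assumptions made to keep the example self-contained.

```python
STT_OBJECT = 1  # data object (variable)
STB_GLOBAL = 1  # visible to all modules

def candidate_constants(symbols):
    """symbols: iterable of (name, st_info, st_size) tuples.

    Return the names of global data objects of size 4. On ELF32 such an
    object is either an int or a pointer (for example to a string literal).
    """
    names = []
    for name, st_info, st_size in symbols:
        bind, sym_type = st_info >> 4, st_info & 0xF
        if sym_type == STT_OBJECT and bind == STB_GLOBAL and st_size == 4:
            names.append(name)
    return names
```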

Locate the section that holds the symbol values

st_shndx is 18 (0x12), so that section's header entry is at 2896 + 18 * 40 = 3616.

 

Its name index is 77, so its name string is at offset 2691 + 77 in the .shstrtab data.

 

Address in memory is 14620 (0x1C39)

File offset is 2332 (0x1C09)

Section length is 24 (0x18)

 

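Get the symbol values and the corresponding constants

Combining the six symbols above with the symbol name strings listed in step ⑤ gives the following information.

(ad_value is the 4-byte value stored in the variable itself. Its file offset is st_value - 14620 + 2332: the symbol's address, minus the section's memory address, plus the section's file offset. Note that the hex values in parentheses are raw bytes and still need LSB conversion.)

(value is the constant that the variable refers to. For the string variables, ad_value is the address of the string, and in this sample the string can be read directly at that file offset. This works because the read-only data sits in the first loadable segment, where address and file offset coincide; compare sh_addr and sh_offset of .dynsym, which are both 524. For a and b, ad_value is already the int value itself.)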
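The two-step lookup can be sketched as follows. Following the text, the sketch assumes that the pointed-to string lives where virtual address equals file offset. The helper names are hypothetical.

```python
import struct

def vaddr_to_offset(vaddr: int, sh_addr: int, sh_offset: int) -> int:
    """Map a virtual address inside a section to a file offset."""
    return vaddr - sh_addr + sh_offset

def read_u32(data: bytes, offset: int) -> int:
    # '<I' does the LSB conversion, e.g. raw bytes 27 07 00 00 -> 0x0727 = 1831
    return struct.unpack_from("<I", data, offset)[0]

def read_cstr(data: bytes, offset: int) -> str:
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8", errors="replace")

def resolve_string_var(data: bytes, st_value: int, sh_addr: int, sh_offset: int):
    """Follow a 4-byte pointer variable to its string; return (ad_value, value).

    Assumes the target lies where address == file offset, as in this sample.
    """
    ad_value = read_u32(data, vaddr_to_offset(st_value, sh_addr, sh_offset))
    return ad_value, read_cstr(data, ad_value)
```

For global_var2 in the sample, the pointer is read from file offset 14628 - 14620 + 2332 = 2340. For int variables such as a and b, call `read_u32` on the converted offset and skip the string step.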

st_name-1: 1426 = 1180 + 246 (0xF6) -> "global_var2"

st_value-1: 14628 (0x2439)

ad_value-1: 1831 (0x2707)

value-1: "Measurement"



st_name-2: 1385 = 1180 + 205 (0xCD) -> "a"

st_value-2: 14636 (0x2C39)

ad_value-2: 0xFFFFFF7F

value-2: 0x7FFFFFFF (INT_MAX)



st_name-3: 1438 = 1180 + 258 (0x0201) -> "global_var3"

st_value-3: 14632 (0x2839)

ad_value-3: 1743 (0xCF06)

value-3: "abc"



st_name-4: 1401 = 1180 + 221 (0xDD) -> "b"

st_value-4: 14640 (0x3039)

ad_value-4: 0xFFFFFF7F

value-4: 0x7FFFFFFF (INT_MAX)



st_name-5: 1403 = 1180 + 223 (0xDF) -> "global_var"

st_value-5: 14620 (0x1C39)

ad_value-5: 1747 (0xD306)

value-5: "doGlobalVarTest test"



st_name-6: 1414 = 1180 + 234 (0xEA) -> "global_var1"

st_value-6: 14624 (0x2039)

ad_value-6: 1652 (0x7406)

value-6: "measurement aaa"

 

 

V. Summary

Driven by the need to locate symbolic constants, we analyzed the ELF file structure step by step. Using the test program below, we also checked which data keep their symbols after compilation into a .so file (global non-static variables) and how to obtain their values.