
Implement ELF file parsing, support -h, -S, -s

2024-12-09 16:01:00

ELF file

Compilation and linking

ELF stands for Executable and Linkable Format. It is the most common binary file format on Unix-like platforms. The following three kinds of files are all ELF files.

  • Object files (.o)
  • Shared library files (.so)
  • Binary executables produced by linking .o and .so files

The file transformations during compilation, linking, and execution are shown below.

image-20241130162807177

File structure

Under the von Neumann model, a program consists of instructions and data, so an ELF file stores the contents of the program: code (instructions), data, and other metadata.

The ELF file serves as the bridge that turns a static program into a process. Its structure is outlined in the following figure:

  1. File Header (ELF Header): Describes the organization of the entire file.
  2. Program Header Table: Describes how to map the segments of an ELF file into memory. It tells the operating system's loader which segments need to be loaded, what their permissions are, and how to map them.
  3. Section Header Table: Describes the individual sections in an ELF file, providing detailed information about each section, such as name, size, type, and location.
  4. Section / Segment: Sections describe the ELF file from a linking perspective, and segments describe the ELF file from a memory-loading perspective.

image-20241130141031157

ELF files are used for both linking and loading phases and have two views, the linking view and the execution view.

View | Stored content | Data structure | Stage | File types
Linking view | Static program, organized in sections | Section Header Table | Compiling, linking | .o, .so
Execution view | Memory layout after loading, organized in segments | Program Header Table | Loading | Executable program

Sections and segments stand in a logical-to-physical mapping relationship:

  • A segment can contain multiple sections
  • A section normally belongs to only one loadable segment
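The containment relationship above can be checked from the two header tables. The following is a minimal sketch, assuming a section counts as part of a segment when its file range lies inside the segment's file range; the helper name `section_in_segment` is hypothetical and not part of the original program.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when the section described by `sh`
 * lies inside the segment described by `ph`.
 * SHT_NOBITS sections (such as .bss) occupy no bytes in the file, so they
 * are compared by virtual address against p_memsz instead of by file offset. */
bool section_in_segment(const Elf64_Shdr *sh, const Elf64_Phdr *ph)
{
    if (sh->sh_type == SHT_NOBITS) {
        return sh->sh_addr >= ph->p_vaddr &&
               sh->sh_addr + sh->sh_size <= ph->p_vaddr + ph->p_memsz;
    }
    return sh->sh_offset >= ph->p_offset &&
           sh->sh_offset + sh->sh_size <= ph->p_offset + ph->p_filesz;
}
```

Running this check for every section against every program header is roughly how a section-to-segment listing can be produced.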

Typical correspondence:

Segment | Contained sections | Permissions
PT_LOAD (code segment) | .text, .rodata | R E
PT_LOAD (data segment) | .data, .bss | RW
PT_DYNAMIC | .dynamic, .got, .plt | RW
PT_INTERP | .interp | R
PT_NOTE | .note | R
PT_SHLIB | .shstrtab, .symtab | RW
PT_TLS | .tbss, .tdata | R, RW

During the loading phase, the loader organizes the virtual memory according to Segment, constructs a memory space for a process, and completes the conversion of a static file to a process.
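One small part of that loading step is turning a segment's permission bits into memory protection bits. The sketch below assumes a POSIX system and only shows the flag translation; a real loader also handles alignment, zero-filling, and relocation. The function name `segment_prot` is hypothetical.

```c
#include <elf.h>
#include <sys/mman.h>

/* Hypothetical helper: translate the permission bits of a program header
 * (p_flags) into the protection bits expected by mmap/mprotect. */
int segment_prot(Elf64_Word p_flags)
{
    int prot = PROT_NONE;
    if (p_flags & PF_R) prot |= PROT_READ;
    if (p_flags & PF_W) prot |= PROT_WRITE;
    if (p_flags & PF_X) prot |= PROT_EXEC;
    return prot;
}
```

For example, a code segment marked R E would map to `PROT_READ | PROT_EXEC`, and a data segment marked RW to `PROT_READ | PROT_WRITE`.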

ELF file parsing

Basic idea: according to the meta information in the ELF Header, jump to the corresponding part for parsing.

image-20241201124307267
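Because every jump relies on offsets read from the file itself, it can help to check them against the file size before following them. The following is a rough sketch of such a check, not something the original program does; both function names are hypothetical.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when a table of `count` entries of `entsize` bytes each,
 * starting at file offset `off`, fits inside a file of `file_size` bytes.
 * The division avoids overflow in count * entsize. */
static bool table_in_bounds(uint64_t off, uint64_t entsize,
                            uint64_t count, uint64_t file_size)
{
    if (off > file_size) return false;
    if (entsize != 0 && count > (file_size - off) / entsize) return false;
    return true;
}

/* Hypothetical helper: checks that the program header table and the
 * section header table named by the ELF header lie inside the file. */
bool elf_tables_in_bounds(const Elf64_Ehdr *eh, size_t file_size)
{
    if (file_size < sizeof(*eh)) return false;
    return table_in_bounds(eh->e_phoff, eh->e_phentsize, eh->e_phnum, file_size) &&
           table_in_bounds(eh->e_shoff, eh->e_shentsize, eh->e_shnum, file_size);
}
```

A parser could call `elf_tables_in_bounds` once after mapping the file and refuse to continue when it returns false.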

readelf -h fileName

This parses the ELF file header. Its data structure is shown below; parse the fields one by one:

typedef struct
{
  unsigned char	e_ident[EI_NIDENT];	/* Magic number and other info */
  Elf64_Half	e_type;			/* Object file type */
  Elf64_Half	e_machine;		/* Architecture */
  Elf64_Word	e_version;		/* Object file version */
  Elf64_Addr	e_entry;		/* Entry point virtual address */
  Elf64_Off	e_phoff;		/* Program header table file offset */
  Elf64_Off	e_shoff;		/* Section header table file offset */
  Elf64_Word	e_flags;		/* Processor-specific flags */
  Elf64_Half	e_ehsize;		/* ELF header size in bytes */
  Elf64_Half	e_phentsize;		/* Program header table entry size */
  Elf64_Half	e_phnum;		/* Program header table entry count */
  Elf64_Half	e_shentsize;		/* Section header table entry size */
  Elf64_Half	e_shnum;		/* Section header table entry count */
  Elf64_Half	e_shstrndx;		/* Section header string table index */
} Elf64_Ehdr;
Member | Meaning | Notes
e_ident | Identification bytes | Indices: [0..3] magic number; 4: file class; 5: data encoding; 6: file version; 7: OS/ABI identification; 8: ABI version; 9 onward: padding
e_type | File type | ET_NONE, ET_REL, ET_EXEC, ET_DYN, ET_CORE
e_machine | Machine architecture | EM_NONE, EM_M32, EM_SPARC, EM_386, EM_68K, EM_88K, EM_860, EM_MIPS
e_version | Object file version | EV_NONE, EV_CURRENT
e_entry | Entry point address | The Entry Point pointer in the diagram above
e_phoff | Program header table offset | Program Header Table Offset
e_shoff | Section header table offset | Section Header Table Offset
e_flags | Processor-specific flags |
e_ehsize | Byte length of the ELF file header | ELF Header Size
e_phentsize | Byte length of each entry in the program header table | Program Header Entry Size
e_phnum | Number of entries in the program header table | Program Header Entry Number
e_shentsize | Byte length of each entry in the section header table | Section Header Entry Size
e_shnum | Number of entries in the section header table | Section Header Number
e_shstrndx | Index of the section header table entry for the section name string table | Section Header Table Index Related With Section Name String Table
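Since everything else depends on the e_ident bytes being what the parser expects, a short check before casting the buffer to `Elf64_Ehdr` can be useful. The sketch below is an illustration under that assumption; `is_elf64` is a hypothetical name, and the constants come from `<elf.h>`.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when `buf` begins with an ELF64
 * identification block, that is, the magic bytes 0x7f 'E' 'L' 'F',
 * class ELFCLASS64, and a known data encoding. */
bool is_elf64(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT) return false;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0) return false;
    if (buf[EI_CLASS] != ELFCLASS64) return false;
    return buf[EI_DATA] == ELFDATA2LSB || buf[EI_DATA] == ELFDATA2MSB;
}
```

A 32-bit ELF file passes the magic check but fails the class check, which matters here because the program reads every structure as its 64-bit variant.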

readelf -S fileName

Parses the section header table of an ELF file.

Locate the section header table from the header information (elf_header is a pointer to Elf64_Ehdr):

  • Get the section header table address: (char *)elf_header + elf_header->e_shoff. The cast matters because e_shoff is a byte offset.

  • Traverse the section header table; the number of entries is elf_header->e_shnum

  • The data structure of each element in the section header table is as follows

typedef struct
{
  Elf64_Word	sh_name;		/* Section name (string tbl index) */
  Elf64_Word	sh_type;		/* Section type */
  Elf64_Xword	sh_flags;		/* Section flags */
  Elf64_Addr	sh_addr;		/* Section virtual addr at execution */
  Elf64_Off	sh_offset;		/* Section file offset */
  Elf64_Xword	sh_size;		/* Section size in bytes */
  Elf64_Word	sh_link;		/* Link to another section */
  Elf64_Word	sh_info;		/* Additional section information */
  Elf64_Xword	sh_addralign;		/* Section alignment */
  Elf64_Xword	sh_entsize;		/* Entry size if section holds table */
} Elf64_Shdr;
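The sh_name field is only an index into the section name string table, so resolving a name always goes through e_shstrndx. As an illustration, the sketch below looks up a section by name in a fully mapped image; `find_section` is a hypothetical helper, and it assumes the header offsets have already been validated against the file size.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up a section header by name in a fully mapped
 * ELF64 image. Returns NULL when no section has that name.
 * Assumes the offsets in the image were validated beforehand. */
const Elf64_Shdr *find_section(const void *image, const char *name)
{
    const Elf64_Ehdr *eh = image;
    const char *base = image;
    const Elf64_Shdr *sections = (const Elf64_Shdr *)(base + eh->e_shoff);

    if (eh->e_shstrndx >= eh->e_shnum) return NULL;

    /* Section names live in the string table section indexed by e_shstrndx. */
    const char *shstrtab = base + sections[eh->e_shstrndx].sh_offset;

    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (strcmp(shstrtab + sections[i].sh_name, name) == 0)
            return &sections[i];
    }
    return NULL;
}
```

With the file mapped as in the program's main function, `find_section(map, ".text")` would return the header of the code section, or NULL for a file that has none.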

readelf -s fileName

Parses the symbol table in an ELF file.

A symbol table is stored as a section. Traverse all sections, use elf_shdr->sh_type to decide whether a section is a symbol table, and parse it if so (elf_shdr is a pointer to Elf64_Shdr):

  • The number of elements in the section: elf_shdr->sh_size / elf_shdr->sh_entsize

  • Each element in the section is a symbol table entry with the following structure:

typedef struct
{
  Elf64_Word	st_name;		/* Symbol name (string tbl index) */
  unsigned char	st_info;		/* Symbol type and binding */
  unsigned char	st_other;		/* Symbol visibility */
  Elf64_Section	st_shndx;		/* Section index */
  Elf64_Addr	st_value;		/* Symbol value */
  Elf64_Xword	st_size;		/* Symbol size */
} Elf64_Sym;
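The st_info byte packs two values, and the entry count comes from a division that should not be attempted when sh_entsize is zero. The sketch below illustrates both points using the macros from `<elf.h>`; the three function names are hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* Number of entries in a symbol table section.
 * Returns 0 when sh_entsize is unset, to avoid dividing by zero. */
size_t symbol_count(const Elf64_Shdr *symtab)
{
    return symtab->sh_entsize ? symtab->sh_size / symtab->sh_entsize : 0;
}

/* st_info holds the binding in its high 4 bits and the type in its low 4 bits. */
unsigned char symbol_binding(const Elf64_Sym *sym)
{
    return ELF64_ST_BIND(sym->st_info);
}

unsigned char symbol_type(const Elf64_Sym *sym)
{
    return ELF64_ST_TYPE(sym->st_info);
}
```

For a global function symbol, `symbol_binding` yields `STB_GLOBAL` and `symbol_type` yields `STT_FUNC`.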

Program execution results

Compile: gcc -o elf_reader elf_reader.c

  1. ./elf_reader -h <elf-file>

image-20241201125827776

  2. ./elf_reader -S <elf-file>

image-20241201131018524

  3. ./elf_reader -s <elf-file>

image-20241201131159022

Source code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <elf.h>
#include <sys/mman.h>

// Main functions: ELF file parsing
void parse_elf_header(const Elf64_Ehdr *elf_header);        // -h: parse the ELF header
void parse_section_headers(const Elf64_Ehdr *elf_header);   // -S: parse the section header table
void parse_symbol_table(const Elf64_Ehdr *elf_header);      // -s: parse the symbol tables

// Helper functions: formatted output
const char* get_elf64_st_type_name(unsigned char info);
const char* get_elf64_st_bind_name(unsigned char info); 
const char* get_elf64_st_visibility_name(unsigned char other);
const char* get_section_type_name(Elf64_Word type);
const char* get_section_flags_name(Elf64_Xword flags);
const char* get_class_name(unsigned char class_value);
const char* get_data_name(unsigned char data_value);
const char* get_version_name(unsigned char version_value);
const char* get_os_name(unsigned char os_value);
const char* get_type_name(unsigned char type_value);
const char* get_machine_name(unsigned char machine_value);
void print_symbol_table(const char *strtab, Elf64_Sym *symbols, int count, const char *symtab_name);


int main(int argc, char * argv[])
{
    // Check the program arguments: an option followed by the ELF file name
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <-h|-S|-s> <elf-file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    const char *option = argv[1];
    const char *filename = argv[2];

    // Open file
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        perror("Failed to open ELF file");
        exit(EXIT_FAILURE);
    }

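    // Map the whole file into memory with mmap
    off_t file_size = lseek(fd, 0, SEEK_END);
    if (file_size < (off_t)sizeof(Elf64_Ehdr)) {
        // Also covers lseek failure, which returns -1
        fprintf(stderr, "File is too small to be an ELF64 file\n");
        close(fd);
        exit(EXIT_FAILURE);
    }
    char *map = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        perror("Memory mapping failed");
        close(fd);
        exit(EXIT_FAILURE);
    }
    Elf64_Ehdr *elf_header = (Elf64_Ehdr *)map;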

    // Dispatch on the command-line option
    if(strcmp(option, "-h") == 0)
    {
        parse_elf_header(elf_header);
    }
    else if(strcmp(option, "-S") == 0)
    {
        parse_section_headers(elf_header);
    }
    else if(strcmp(option, "-s") == 0)
    {
        parse_symbol_table(elf_header);
    }
    else
    {
        fprintf(stderr, "Unknown option: %s\n", option);
    }

    // Unmap the file and close it
    munmap(map, file_size);
    close(fd);

    return 0;
}

// -h: parse the ELF header
void parse_elf_header(const Elf64_Ehdr *elf_header) {
    printf("ELF Header:\n");
    printf("  Magic:   ");
    for (int i = 0; i < EI_NIDENT; ++i) {
        printf("%02X ", elf_header->e_ident[i]);
    }
    printf("\n");

    printf("  Class:                             %s\n", get_class_name(elf_header->e_ident[EI_CLASS]));
    printf("  Data:                              %s\n", get_data_name(elf_header->e_ident[EI_DATA]));
    printf("  Version:                           %s\n", get_version_name(elf_header->e_ident[EI_VERSION]));
    printf("  OS/ABI:                            %s\n", get_os_name(elf_header->e_ident[EI_OSABI]));
    printf("  Type:                              %s\n", get_type_name(elf_header->e_type));
    printf("  Machine:                           %s\n", get_machine_name(elf_header->e_machine));
    printf("  Version:                           %d\n", elf_header->e_version);
    printf("  Entry point address:               %#lx\n", elf_header->e_entry);
    printf("  Start of program headers:          %lu (bytes into file)\n", elf_header->e_phoff);
    printf("  Start of section headers:          %lu (bytes into file)\n", elf_header->e_shoff);
    printf("  Flags:                             %#x\n", elf_header->e_flags);
    printf("  Size of this header:               %d (bytes)\n", elf_header->e_ehsize);
    printf("  Size of program headers:           %d (bytes)\n", elf_header->e_phentsize);
    printf("  Number of program headers:         %d\n", elf_header->e_phnum);
    printf("  Size of section headers:           %d (bytes)\n", elf_header->e_shentsize);
    printf("  Number of section headers:         %d\n", elf_header->e_shnum);
    printf("  Section header string table index: %d\n", elf_header->e_shstrndx);
}

// -S: parse the section header table
void parse_section_headers(const Elf64_Ehdr *elf_header) {
    // Locate the section header table and the section name string table
    Elf64_Shdr *sections = (Elf64_Shdr *)((char *)elf_header + elf_header->e_shoff);
    const char *strtab = (char *)elf_header + sections[elf_header->e_shstrndx].sh_offset;

    printf("Section Headers:\n");
    printf("  [Nr] Name              Type             Address           Offset\n");
    printf("       Size              EntSize          Flags  Link  Info  Align\n");

    // Iterate over the section headers
    for (int i = 0; i < elf_header->e_shnum; i++) {
        // Print the index, name, type, address, and offset
        printf("  [%2d] %-17s %-16s %016lx  %08lx\n",
               i,
               &strtab[sections[i].sh_name],
               get_section_type_name(sections[i].sh_type),
               sections[i].sh_addr,
               sections[i].sh_offset);

        // Print the size, entry size, flags, link index, info, and alignment
        printf("       %016lx  %016lx  %-6s %4u %4u %5lu\n",
               sections[i].sh_size,
               sections[i].sh_entsize,
               get_section_flags_name(sections[i].sh_flags),
               sections[i].sh_link,
               sections[i].sh_info,
               sections[i].sh_addralign);
    }
}

// -s: parse the symbol tables
void parse_symbol_table(const Elf64_Ehdr *elf_header) {
    // Locate the section header table
    Elf64_Shdr *sections = (Elf64_Shdr *)((char *)elf_header + elf_header->e_shoff);

    // Iterate through each section
    for (int i = 0; i < elf_header->e_shnum; i++) {
        // If it is the static symbol table (.symtab)
        if (sections[i].sh_type == SHT_SYMTAB) {
            Elf64_Shdr *symtab = &sections[i];
            Elf64_Sym *symbols = (Elf64_Sym *)((char *)elf_header + symtab->sh_offset);
            int count = symtab->sh_size / symtab->sh_entsize;  // Number of symbols
            // sh_link is the index of the string table that holds the symbol names
            const char *symstrtab = (char *)elf_header + sections[symtab->sh_link].sh_offset;
            print_symbol_table(symstrtab, symbols, count, ".symtab");
        }
        // If it is the dynamic symbol table (.dynsym)
        else if (sections[i].sh_type == SHT_DYNSYM) {
            Elf64_Shdr *dynsymtab = &sections[i];
            Elf64_Sym *dynsymbols = (Elf64_Sym *)((char *)elf_header + dynsymtab->sh_offset);
            int dynsym_count = dynsymtab->sh_size / dynsymtab->sh_entsize;  // Number of dynamic symbols
            const char *dynsymstrtab = (char *)elf_header + sections[dynsymtab->sh_link].sh_offset;
            print_symbol_table(dynsymstrtab, dynsymbols, dynsym_count, ".dynsym");
        }
    }
}


// Get the text for the Class field
const char* get_class_name(unsigned char class_value) {
    switch(class_value) {
        case ELFCLASS32: return "ELF32";
        case ELFCLASS64: return "ELF64";
        default: return "Unknown";
    }
}

// Get the text for the Data field
const char* get_data_name(unsigned char data_value) {
    switch(data_value) {
        case ELFDATA2LSB: return "2's complement, little endian";
        case ELFDATA2MSB: return "2's complement, big endian";
        default: return "Unknown";
    }
}

// Get the text for the Version field
const char* get_version_name(unsigned char version_value) {
    switch(version_value) {
        case 0: return "Invalid Version";
        case 1: return "1 (current)";
        default: return "Invalid Version";
    }
}

// Get the text for the OS/ABI field
const char* get_os_name(unsigned char os_value) {
    switch(os_value) {
        case ELFOSABI_NONE:    return "UNIX - System V";
        case ELFOSABI_LINUX:    return "Linux";
        case ELFOSABI_SOLARIS: return "Solaris";
        case ELFOSABI_FREEBSD: return "FreeBSD";
        default: return "Others";
    }
}

// Get the text for the Type field
const char* get_type_name(unsigned char type_value) {
    switch(type_value) {
        case ET_NONE: return "NONE (None)";
        case ET_REL:  return "REL (Relocatable file)";
        case ET_EXEC: return "EXEC (Executable file)";
        case ET_DYN:  return "DYN (Shared object file)";
        case ET_CORE: return "CORE (Core file)";
        default: return "Unknown";
    }
}

// Get the text for the Machine field
const char* get_machine_name(unsigned char machine_value) {
    switch(machine_value) {
        case EM_386:    return "Intel 80386";
        case EM_ARM:    return "ARM";
        case EM_X86_64: return "AMD x86-64";
        case EM_AARCH64: return "ARM AARCH64";
        default: return "Unknown";
    }
}



// Get the text for a section type
const char* get_section_type_name(Elf64_Word type) {
    switch (type) {
        case SHT_NULL:       return "NULL";
        case SHT_PROGBITS:   return "PROGBITS";
        case SHT_SYMTAB:     return "SYMTAB";
        case SHT_STRTAB:     return "STRTAB";
        case SHT_RELA:       return "RELA";
        case SHT_HASH:       return "HASH";
        case SHT_DYNAMIC:    return "DYNAMIC";
        case SHT_NOTE:       return "NOTE";
        case SHT_NOBITS:     return "NOBITS";
        case SHT_REL:        return "REL";
        case SHT_SHLIB:      return "SHLIB";
        case SHT_DYNSYM:     return "DYNSYM";
        default:             return "UNKNOWN";
    }
}

// Get the text for the section flags
const char* get_section_flags_name(Elf64_Xword flags) {
    static char flag_str[64];
    flag_str[0] = '\0';

    if (flags & SHF_WRITE) strcat(flag_str, "W");
    if (flags & SHF_ALLOC) strcat(flag_str, "A");
    if (flags & SHF_EXECINSTR) strcat(flag_str, "X");
    if (flags & SHF_MERGE) strcat(flag_str, "M");
    if (flags & SHF_STRINGS) strcat(flag_str, "S");

    return flag_str[0] == '\0' ? "None" : flag_str;
}

// Get the symbol type
const char* get_elf64_st_type_name(unsigned char info) {
    switch (ELF64_ST_TYPE(info)) {
        case STT_NOTYPE: return "NOTYPE";
        case STT_OBJECT: return "OBJECT";
        case STT_FUNC: return "FUNC";
        case STT_SECTION: return "SECTION";
        case STT_FILE: return "FILE";
        default: return "UNKNOWN";
    }
}

// Get the symbol binding
const char* get_elf64_st_bind_name(unsigned char info) {
    switch (ELF64_ST_BIND(info)) {
        case STB_LOCAL: return "LOCAL";
        case STB_GLOBAL: return "GLOBAL";
        case STB_WEAK: return "WEAK";
        default: return "UNKNOWN";
    }
}

// Get the symbol visibility
const char* get_elf64_st_visibility_name(unsigned char other) {
    switch (ELF64_ST_VISIBILITY(other)) {
        case STV_DEFAULT: return "DEFAULT";
        case STV_INTERNAL: return "INTERNAL";
        case STV_HIDDEN: return "HIDDEN";
        case STV_PROTECTED: return "PROTECTED";
        default: return "UNKNOWN";
    }
}

void print_symbol_table(const char *strtab, Elf64_Sym *symbols, int count, const char *symtab_name) {
    printf("Symbol table '%s' contains %d entries:\n", symtab_name, count);
    printf("   Num:    Value          Size Type    Bind   Vis      Ndx Name\n");
    
    for (int i = 0; i < count; i++) {
        // Print each symbol, aligned to match the readelf -s format
        printf("%6d: %016lx  %-5lu %-7s %-6s %-8s %-4d %s\n", 
               i,
               symbols[i].st_value,
               symbols[i].st_size,
               // Resolving the symbol type
               get_elf64_st_type_name(symbols[i].st_info),
               // Resolving Symbol Bindings
               get_elf64_st_bind_name(symbols[i].st_info),
               // resolving symbol visibility
               get_elf64_st_visibility_name(symbols[i].st_other),
               symbols[i].st_shndx,
               &strtab[symbols[i].st_name]);
    }
}
