NJU ICS2024 PA Assignment Tips (II)

Since I am not an NJU student in the 2024 cohort, I am posting these tips openly, and only as a reference for readers who are not taking this course for credit in the current academic year. If you are taking this course for credit right now, please close this page immediately!

RTFSC problem solving

For this part you must read the RISC-V manual very, very carefully, or you will suffer later.

Here we give some brief additional information, from the perspective of the framework, on what happens before instruction fetch and after execution, to help the reader understand the overall flow of NEMU alongside the course documentation.

Before instruction fetch
1. Every riscv32 instruction is 4 bytes long. The instructions are laid out back to back in a bin file, which is read and written as raw binary, and NEMU defines a positional parameter to receive the path of the bin file.
2. In the interactive console, the si or c command invokes the cpu_exec function in the file under $NEMU_HOME/src/cpu/, which calls the execute function to run the requested number of instructions.
3. The execute function creates a Decode structure. pc is the address of the instruction to be decoded. snpc is the address immediately after pc. dnpc is the address that pc moves to once the instruction has executed. inst holds only the binary encoding of the instruction. logbuf stores the disassembly of the instruction.

   It then calls exec_once the requested number of times to execute that many instructions in sequence.
4. The exec_once function maintains the contents of the Decode structure and calls isa_exec_once to fetch, decode, and execute the instruction.
After execution (pc updated)
1. In the exec_once function, the instruction information and the disassembly result are recorded.
2. In the execute function, trace_and_difftest is called to check the behavior of our implementation in NEMU and to print the trace log (all xTRACE results should be logged here). After each call to exec_once, the function also checks the state of NEMU, and an abnormal state stops instruction execution.
3. In the cpu_exec function, the result of the execution is printed.
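
To make the roles of pc, snpc, and dnpc concrete, here is a minimal toy model of the loop described above. It is only a sketch under my own assumptions: the toy_ names, the tiny made-up "ISA" (the instruction at address 8 jumps to 16, the one at 20 halts), and the state variables are hypothetical and are not NEMU's real code.

```c
#include <stdint.h>

/* Simplified stand-in for NEMU's Decode; only the fields discussed above. */
typedef struct {
  uint32_t pc;   /* address of the instruction being executed */
  uint32_t snpc; /* static next pc: the address right after this instruction */
  uint32_t dnpc; /* dynamic next pc: where pc goes after execution */
} Decode;

enum { TOY_RUNNING, TOY_END };
static int toy_state = TOY_RUNNING;
static uint32_t toy_pc = 0;

/* Hypothetical stand-in for isa_exec_once: every instruction falls through,
 * except that the one at address 8 jumps to 16 and the one at 20 halts. */
static void toy_isa_exec_once(Decode *s) {
  s->snpc = s->pc + 4;  /* "fetching" one 4-byte instruction advances snpc */
  s->dnpc = s->snpc;    /* default: fall through to the next instruction */
  if (s->pc == 8) s->dnpc = 16;
  if (s->pc == 20) toy_state = TOY_END;
}

static void toy_exec_once(Decode *s, uint32_t pc) {
  s->pc = pc;
  s->snpc = pc;
  toy_isa_exec_once(s);
  toy_pc = s->dnpc;     /* pc is updated only after execution */
}

/* Executes at most n instructions; returns how many were executed. */
static uint64_t toy_execute(uint64_t n) {
  Decode s;
  uint64_t done = 0;
  for (; n > 0; n--) {
    toy_exec_once(&s, toy_pc);
    done++;
    /* in NEMU, trace_and_difftest would be called at this point */
    if (toy_state != TOY_RUNNING) break;  /* abnormal state stops execution */
  }
  return done;
}
```

Starting from pc 0, this toy executes the instructions at 0, 4, 8, 16, and 20 and then stops, which shows why dnpc and snpc have to be kept separate.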

How should riscv32 load 32-bit constants, given the instruction length limit

The approach is segmented loading: the high bits are loaded first, and the low bits are then merged into the register with an immediate instruction, which completes the load of the 32-bit constant. This makes good use of the limited space in a 32-bit instruction and gets around the fact that an entire 32-bit constant cannot be encoded in a single instruction. The solution for riscv32 is given below.

lui (Load Upper Immediate) instruction:
- The lui instruction loads the upper 20 bits of a 32-bit constant into the upper part of a register. For example, lui t0, 0x12345 loads 0x12345000 into register t0.
addi (Add Immediate) instruction:
- The addi instruction adds a 12-bit immediate to the upper bits loaded by lui. For example, addi t0, t0, 0x678 sets the value of register t0 to 0x12345678.
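
One detail the example above does not hit: addi sign-extends its 12-bit immediate, so when bit 11 of the constant is set, the low part acts as a negative number and the lui immediate has to be one larger to compensate. The following sketch shows the split; split_const and lui_addi are hypothetical helper names of mine, not part of any toolchain.

```c
#include <stdint.h>

/* Splits a 32-bit constant into the immediates for a lui/addi pair.
 * addi sign-extends its 12-bit immediate, so when bit 11 of the constant is
 * set, the low part is negative and the high part must be one larger. */
static void split_const(uint32_t value, uint32_t *hi20, int32_t *lo12) {
  int32_t lo = (int32_t)(value & 0xfff);
  if (lo >= 0x800) lo -= 0x1000;  /* sign-extend the low 12 bits */
  *lo12 = lo;
  *hi20 = ((value - (uint32_t)lo) >> 12) & 0xfffff;
}

/* Models what the two instructions compute: (hi20 << 12) + sext(lo12). */
static uint32_t lui_addi(uint32_t hi20, int32_t lo12) {
  return (hi20 << 12) + (uint32_t)lo12;
}
```

For 0x12345678 this gives 0x12345 and 0x678, matching the example above; for 0x12345fff it gives 0x12346 and -1.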

How riscv32 instructions are implemented

This part is the focus of this section, and it contains a number of easy pitfalls (avoiding them requires careful study of the official documentation):

Be sure to note whether the operands are signed or unsigned!
Be sure to follow the manual's description for the operand width you are implementing (64-bit or 32-bit)!
Be sure to implement the exceptional cases as described in the manual!

To pass the full instruction test suite in this section, we need to implement six types of instructions in total: R, I, S, B, U, and J. (Once the test suite passes, the instruction implementations can be considered correct; if you run into a bug later, have enough confidence in your instructions that you do not trace the problem back to here.) For each type, we need to add a TYPE_* case in decode_operand together with the way its immediate is extracted (refer to the figure below).

Figure: RISCV32 instruction formats

To extract immediates, the NEMU framework gives us some handy macro definitions:

BITS(x, hi, lo): extracts bits $[lo, hi]$ of the binary number $x$.
SEXT(x, len): sign-extends $x$, treated as a $len$-bit binary number. The implementation uses a bit field: a signed bit field of width $len$ captures the sign information of $x$, and converting it to an unsigned type then yields the sign-extended value.
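
As an illustration of how such helpers are used, here is a sketch that extracts the immediate of each instruction format. The functions bits and sext are plain-C stand-ins that I wrote in the spirit of BITS and SEXT (they are not NEMU's macros), and the imm_ function names are hypothetical; the bit positions follow the RISC-V instruction formats.

```c
#include <stdint.h>

/* Bits [hi:lo] of x, in the spirit of the BITS macro. */
static uint32_t bits(uint32_t x, int hi, int lo) {
  return (x >> lo) & (uint32_t)((1ull << (hi - lo + 1)) - 1);
}

/* Sign-extends the low len bits of x (0 < len <= 32), in the spirit of SEXT. */
static int32_t sext(uint32_t x, int len) {
  uint32_t sign = 1u << (len - 1);
  x &= (uint32_t)((1ull << len) - 1);
  return (int32_t)((x ^ sign) - sign);
}

static int32_t imm_i(uint32_t i) { return sext(bits(i, 31, 20), 12); }

static int32_t imm_s(uint32_t i) {
  return sext((bits(i, 31, 25) << 5) | bits(i, 11, 7), 12);
}

/* B and J immediates are scrambled and always have an implicit low 0 bit. */
static int32_t imm_b(uint32_t i) {
  return sext((bits(i, 31, 31) << 12) | (bits(i, 7, 7) << 11) |
              (bits(i, 30, 25) << 5) | (bits(i, 11, 8) << 1), 13);
}

static int32_t imm_u(uint32_t i) { return (int32_t)(bits(i, 31, 12) << 12); }

static int32_t imm_j(uint32_t i) {
  return sext((bits(i, 31, 31) << 20) | (bits(i, 19, 12) << 12) |
              (bits(i, 20, 20) << 11) | (bits(i, 30, 21) << 1), 21);
}
```

For example, imm_i applied to 0xfff50513 (addi a0, a0, -1) yields -1, and imm_j applied to 0xffdff0ef (jal ra, -4) yields -4.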

The next step is to find each missing instruction from the error message, then use the opcode and funct* fields to pin down the specific instruction to implement.

Programs, Runtime Environment and AM Problem Handling

How to run NEMU in batch mode

Since the Makefile in the AM folder must compile NEMU at some point and pass parameters to it, we need to find this step and then figure out how to pass in the batch-mode parameter -b.

Reading line 97 of $AM_HOME/Makefile, which is -include $(AM_HOME)/scripts/$(ARCH).mk, shows that it includes $(AM_HOME)/scripts/$(ARCH).mk. Since our ARCH is riscv32-nemu, we then check the corresponding file under $(AM_HOME)/scripts/ and find that it in turn includes a file under $AM_HOME/scripts/platform/.

Line 31 of that file, $(MAKE) -C $(NEMU_HOME) ISA=$(ISA) run ARGS="$(NEMUFLAGS)" IMG=$(IMAGE).bin, shows that the parameters passed to NEMU come from the NEMUFLAGS variable. If you look carefully, you will also notice that the Makefile of our test directory passes the command-line ARGS parameter to the Makefile variable ARGS, so we need to pass this variable on to NEMUFLAGS.

But you will soon find that this change makes NEMU stop producing its log output. Looking for where the -l xxx/ parameter is defined, we find it in the file under $AM_HOME/scripts/platform/, but the variable is declared there without the override keyword. As a result, the += operation has no effect on a variable passed on the command line, because a command-line variable takes precedence over a Makefile variable of the same name. The override keyword is needed precisely so that the Makefile can still modify a variable that was also set on the command line.

How to implement sprintf

sprintf: takes the variable argument list and passes it to vsprintf, which writes the result to a string.
printf: takes the variable argument list and passes it to vsprintf, which writes the result to a string; the string is then output one character at a time with putch.

So the key step in implementing sprintf is to implement vsprintf.
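
The two wrappers can be sketched as follows. This is only an illustration under my own assumptions: the my_ names are hypothetical (chosen to avoid clashing with the C library), the host's vsprintf and vsnprintf stand in for the klib vsprintf you write yourself, putch is mapped to putchar, and the 256-byte buffer size is arbitrary.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for AM's putch so that the sketch runs on a host machine. */
static void putch(char ch) { putchar(ch); }

int my_sprintf(char *out, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vsprintf(out, fmt, ap);  /* all the real work happens in vsprintf */
  va_end(ap);
  return n;
}

int my_printf(const char *fmt, ...) {
  char buf[256];                   /* buffer size is an assumption */
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf, sizeof(buf), fmt, ap);
  va_end(ap);
  if (n >= (int)sizeof(buf)) n = (int)sizeof(buf) - 1;  /* output was truncated */
  for (int i = 0; i < n; i++) putch(buf[i]);            /* one character at a time */
  return n;
}
```

I used the bounded vsnprintf in my_printf only so that the fixed buffer cannot overflow; with a klib vsprintf you have to guarantee the buffer size yourself.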

vsprintf is a formatted-output function that writes formatted data to a string. Its design is based on processing the different placeholders in the format string, converting each one to its string form according to the type of the corresponding variable argument, and finally appending the result to the output buffer out. The design of the function is as follows:

1. Input parameters

char *out: Pointer to the output buffer to store the formatted string.
const char *fmt: the format string, containing ordinary characters and format placeholders (e.g. %d, %s, etc.).
va_list ap: the variable argument list, containing the data to be formatted.

2. Iterate over the format string

Use while (*fmt != '\0') to iterate over the format string fmt, processing it character by character. There are two cases:

If the current character is not %, it is copied directly into the output buffer as an ordinary character, and the buffer pointer and the total output length are updated.
If the current character is %, the formatting logic is entered.

3. Parsing the format placeholder

On encountering %, the function enters parsing mode and parses the formatting modifiers in order:

Flag parsing:
- A switch statement handles the common flags such as - (left-align), + (always show the sign), 0 (pad with zeros), and # (alternate form). Each flag is mapped to one bit and stored in the flags variable.
Width parsing:
- If a width is specified in the format specifier (e.g. %10d means the output is at least 10 characters wide), it is either given directly as a number or obtained dynamically from the va_list arguments via *.
Precision parsing:
- If a precision modifier is present (e.g. %.2f means 2 decimal places), the precision value is parsed. Again, the precision can be given directly as a number or obtained from the va_list arguments via *.
Type specifier parsing:
- The last character of the format specifier is the type specifier (e.g. d, s, x), which determines the data type of the current argument. Depending on the type specifier, a different handler is called to format the argument.

4. Formatting the different types

Integer types (d, i, u, x, o): call the int_to_str function, which converts the integer value to a string in the appropriate base (decimal, octal, hexadecimal, etc.) and formats the output according to the specified width and precision.
Character type (c): the char argument is written directly to the buffer.
String type (s): the string is processed according to the specified width and precision, with left or right alignment supported.
Pointer type (p): the pointer address is converted to hexadecimal and output with a 0x prefix.
Special cases:
- %: outputs a literal % character.
- n: stores the number of characters output so far to the location given by the corresponding pointer argument.
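
The core of the integer case is the base conversion. The sketch below is a deliberately simplified version that handles only the base and a minimum digit count; it is not my int_to_str (which also takes width and flags), and the names uint_to_str and int_to_dec are hypothetical.

```c
#include <stddef.h>

/* Writes value in the given base (2..16) into out, left-padded with zeros to
 * at least min_digits digits. Returns the number of characters written, not
 * counting the terminating '\0'. out must be large enough for the result
 * (33 bytes cover every 32-bit value when min_digits is small). */
static int uint_to_str(unsigned int value, char *out, unsigned base, int min_digits) {
  static const char digits[] = "0123456789abcdef";
  char tmp[32];
  int n = 0;
  do {                          /* digits come out least significant first */
    tmp[n++] = digits[value % base];
    value /= base;
  } while (value != 0);
  int len = 0;
  for (int pad = min_digits - n; pad > 0; pad--) out[len++] = '0';
  while (n > 0) out[len++] = tmp[--n];  /* reverse into the output */
  out[len] = '\0';
  return len;
}

/* Signed decimal conversion built on top of the unsigned one. */
static int int_to_dec(int value, char *out) {
  if (value < 0) {
    *out = '-';
    /* negate in unsigned arithmetic so that INT_MIN does not overflow */
    return 1 + uint_to_str(0u - (unsigned int)value, out + 1, 10, 0);
  }
  return uint_to_str((unsigned int)value, out, 10, 0);
}
```

Negating in unsigned arithmetic is the design choice worth noting: writing -value directly would overflow for the most negative int.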

5. Appending the output

After each placeholder is processed, the formatted result is appended to the output buffer out through pointer operations, and the current buffer pointer buf and the total length total_len are updated.

6. Returning the result

When the format string has been fully processed, a string terminator \0 is appended to the output buffer, and the number of characters written is returned.

How to implement the string library

Use man 3 <function_name> and implement each function carefully according to the manual, otherwise problems may arise. There are also a few easy pitfalls here:

strncpy: if the length of src is at least $n$, no string terminator \0 is appended to dest; if src is shorter than $n$, the rest of dest is filled with \0.
strcmp: note that the characters are compared as unsigned char. Convert each character to unsigned char when reading it, otherwise comparisons involving characters with codes of 128 and above will go wrong.
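
Both pitfalls can be seen in the following sketch. It follows the behavior described in the man pages; the my_ prefix is only there to avoid clashing with the C library, and this is not necessarily how my own klib code looks.

```c
#include <stddef.h>

char *my_strncpy(char *dst, const char *src, size_t n) {
  size_t i = 0;
  /* copy at most n characters; no terminator is added if src has n or more */
  for (; i < n && src[i] != '\0'; i++) dst[i] = src[i];
  /* if src was shorter than n, pad the rest of dst with '\0' */
  for (; i < n; i++) dst[i] = '\0';
  return dst;
}

int my_strcmp(const char *s1, const char *s2) {
  /* compare as unsigned char so that bytes >= 128 sort after ASCII */
  const unsigned char *a = (const unsigned char *)s1;
  const unsigned char *b = (const unsigned char *)s2;
  while (*a != '\0' && *a == *b) { a++; b++; }
  return (int)*a - (int)*b;
}
```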

How to write a test program for the klib library

Use multiple nested for loops to enumerate the start and end positions within the two given strings, as well as the operation length $n$ (if any). This generally covers all cases, including the boundary cases mentioned in the pitfalls above.
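
As an example of this enumeration style, the sketch below checks a memset-like function over every window [l, r) of a small buffer, including empty windows, and verifies that nothing outside the window is touched. The names check_memset and broken_memset and the buffer size are my own choices for illustration.

```c
#include <stddef.h>
#include <string.h>

#define N 32

typedef void *(*memset_fn)(void *, int, size_t);

/* Returns 1 if fn behaves like memset for every [l, r) window of an
 * N-byte buffer, and never writes outside the window; 0 otherwise. */
static int check_memset(memset_fn fn) {
  unsigned char data[N];
  for (int l = 0; l <= N; l++) {
    for (int r = l; r <= N; r++) {
      for (int i = 0; i < N; i++) data[i] = (unsigned char)(i + 1);
      void *ret = fn(data + l, 0xab, (size_t)(r - l));
      if (ret != data + l) return 0;
      for (int i = 0; i < N; i++) {
        unsigned char expect = (i >= l && i < r) ? 0xab : (unsigned char)(i + 1);
        if (data[i] != expect) return 0;
      }
    }
  }
  return 1;
}

/* A deliberately buggy implementation: it writes one byte too many. */
static void *broken_memset(void *s, int c, size_t n) {
  unsigned char *p = s;
  for (size_t i = 0; i <= n; i++) p[i] = (unsigned char)c;
  return s;
}
```

Passing the host's memset succeeds, while broken_memset is rejected on the very first (empty) window, which is exactly the kind of boundary case a hand-picked test tends to miss.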

How itrace's ring buffer is implemented

My implementation creates a new file under $NEMU_HOME/src/cpu/ that maintains a ring queue. The interface exposed to the outside consists of two functions: void record_inst(uint64_t pc, const char *asm_code), which takes the pc and the disassembly string of the instruction being recorded, and void info_inst_records(), which prints the current contents of the queue.

In the trace_and_difftest function in the file under $NEMU_HOME/src/cpu/, replace the code that originally printed the instruction with a call that records the instruction into the buffer. In the assert_fail_msg function in the same file, which produces the output when an ASSERT in the NEMU framework aborts execution, call info_inst_records.
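
A ring queue with this interface can be sketched as follows. The two function signatures are the ones given above; everything else (the record layout, the capacity of 16, the output format, the arrow marking the latest instruction) is an assumption for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 16  /* number of instructions kept; the value is an assumption */
#define ASM_LEN   64

typedef struct {
  uint64_t pc;
  char asm_code[ASM_LEN];
} InstRecord;

static InstRecord ring[RING_SIZE];
static int ring_next = 0;   /* slot the next record is written to */
static int ring_count = 0;  /* number of valid records, at most RING_SIZE */

void record_inst(uint64_t pc, const char *asm_code) {
  ring[ring_next].pc = pc;
  snprintf(ring[ring_next].asm_code, ASM_LEN, "%s", asm_code);
  ring_next = (ring_next + 1) % RING_SIZE;  /* wrap around, overwriting the oldest */
  if (ring_count < RING_SIZE) ring_count++;
}

void info_inst_records(void) {
  /* the oldest record sits ring_count slots behind the write position */
  int start = (ring_next - ring_count + RING_SIZE) % RING_SIZE;
  for (int i = 0; i < ring_count; i++) {
    const InstRecord *r = &ring[(start + i) % RING_SIZE];
    /* the most recently recorded instruction is marked with an arrow */
    printf("%s 0x%08llx: %s\n", i == ring_count - 1 ? "-->" : "   ",
           (unsigned long long)r->pc, r->asm_code);
  }
}
```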

How mtrace is implemented

In the paddr_read and paddr_write functions in the file under $NEMU_HOME/src/memory/, it is sufficient to log the access after the call has succeeded.

How ftrace is implemented

First we need to understand the structure of an ELF file. Start by using riscv64-linux-gnu-readelf -a to get information about the ELF file, then use the hd command to verify that information against the raw bytes. This gives a clear picture of the ELF layout, which is introduced briefly below.

Overall composition of the ELF file structure

The ELF file consists of three main parts:

ELF Header: describes the organization of the entire file, including basic information such as the file type, target architecture, and entry point.

The ELF header is located at the very beginning of the file and contains metadata about the entire ELF file. It has a fixed structure, 64 bytes in size in the 64-bit format and 52 bytes in the 32-bit format, and mainly contains the following fields:
- Magic Number: identifies this as an ELF file; the first 4 bytes are the fixed values 0x7F, E, L, F.
- Class: the architecture type of the file, 32-bit (ELFCLASS32) or 64-bit (ELFCLASS64).
- Data: the data encoding, i.e. the byte order: little-endian (LSB) or big-endian (MSB).
- Version: the version of the file format, usually 1.
- OS ABI: the ABI (Application Binary Interface) of the target operating system.
- File type: specifies whether this is an executable file (ET_EXEC), a relocatable file (ET_REL), or a shared library (ET_DYN).
- Target architecture: e.g. x86 (EM_386) or ARM (EM_ARM).
- Entry point address: the virtual address at which program execution begins.
- Program Header table offset: an offset to the program header table.
- Section Header table offset: Offset to the section header table.
- Flags: flags associated with the processor.
- Header size: the size of the ELF header.
- The size and number of entries in the Program Header table.
- The size and number of entries in the Section Header table.
- Section header string table index: the index of the section header string table in the section header table.

Program Header Table: describes how the file is mapped to segments in memory, used when loading the executable for execution.

The program header table describes how the segments of data in the file are mapped into memory for program execution. Each program header entry describes the attributes of one memory segment. The program header table is used at execution time to determine which segments are loaded into memory.

Section Header Table: describes the attributes of the individual sections in the file, used for linking and relocation.

Each entry in the section header table (a section header) describes one section of the file; sections play their role in the linking and relocation phases. Each section has its own purpose, such as storing code, data, symbol tables, relocation information, and so on. Each section header entry typically contains the following information:
- Section Name: an index into the section name table (section header string table).
- Section Type: describes the content of the section, such as relocation entries (SHT_RELA), a symbol table (SHT_SYMTAB), or a string table (SHT_STRTAB).
- Section Flags: describes the attributes of a section, such as writable, executable, allocated to memory, etc.
- Section Address: the virtual address of the section if it is loaded into memory.
- Section Offset: the offset of the section in the file.
- Section Size: the size of the section in the file.
- Link: association with other sections, e.g., the symbol table is associated with the string table.
- Info: additional information, the meaning of this field is different for different types of sections.
- Address Alignment: The alignment requirements of the section in a file or in memory.
- Entry Size: the size of each entry, if the section contains fixed-size entries (e.g., a symbol table).

A typical ELF file contains the following sections:
- .text: code segment.
- .data: initialized data.
- .bss: uninitialized data.
- .symtab: a symbol table containing information about all symbols and their addresses.
- .strtab: string table storing symbol names (section names are stored in the separate section name table, .shstrtab).
- . or . : Relocation table with relocation information for the code segment.

First, let us make the following points clear:

Through the symbol table we get each function's address and the offset of its name in the string table; the function name is then read from the string table at that offset.
The section name table, the symbol table, and the string table are all called tables, but each of them is in fact a section.
The section headers for the section name table, the symbol table, and the string table are stored in the section header table. The section header table is a contiguous array whose exact location (file offset) and size are given by the ELF header.
The ELF header is located at the very beginning of the file, at offset zero.
All the information we need is already contained in the structural description above.

Procedure for building the mapping from function names to function start addresses

Open and read the ELF file

Open the given ELF file with fopen. If the file cannot be opened, output an error message and terminate the program.

Use fread to read the ELF header (Elf32_Ehdr) to get basic information about the file, such as the offset of the section header table and the number of sections.

Verify that the file is in ELF format

Check the magic number field (e_ident) in the ELF header to make sure that this is a valid ELF file. If the magic number does not match, output an error message and terminate the program.

Read the section header table

Based on the section header table offset, the number of sections, and the size of each section header (Elf32_Shdr) given in the ELF header, allocate enough memory for the section header table, then use fseek and fread to read it from the file.

Read the section name table

The section name table is a special section that stores the names of all sections. Locate it in the section header table through its index (shstrtab) and read its contents.

Find the symbol table and the string table

Iterate through all section header entries, looking for the section headers of .symtab (the symbol table) and .strtab (the string table), which store the symbols and the symbol names respectively. (If you wish, you can double-check that the right tables were found using the section type given in each section header entry.)

If either of the two sections is not found, an error message is output and the program is terminated.

Read the symbol table and the string table

Read the symbol table according to the offset and size in the .symtab section header.

Read the symbol name string table according to the offset and size in the .strtab section header.

Add function symbols to the in-memory symbol table

Iterate through each symbol (Elf32_Sym) in the symbol table and check whether its type is STT_FUNC (i.e., a function). If it is a function symbol, call add_symbol to add the name, address, and size of the symbol to the in-memory symbol table.
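
The last step, together with the lookup it enables, can be sketched as below. The sketch assumes a Linux host where elf.h is available and that .symtab and .strtab have already been read into memory. The FuncSym layout, the capacity, load_func_symbols, and the range-based lookup in find_symbol are my assumptions for illustration; only the names add_symbol and find_symbol come from the text.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FUNCS 1024  /* capacity is an assumption for this sketch */

typedef struct {
  const char *name;  /* points into the loaded string table */
  uint32_t addr;
  uint32_t size;
} FuncSym;

static FuncSym funcs[MAX_FUNCS];
static int nr_funcs = 0;

static void add_symbol(const char *name, uint32_t addr, uint32_t size) {
  if (nr_funcs < MAX_FUNCS) funcs[nr_funcs++] = (FuncSym){ name, addr, size };
}

/* symtab and strtab are the already-loaded contents of .symtab and .strtab.
 * The stored names point into strtab, so it must stay allocated. */
static void load_func_symbols(const Elf32_Sym *symtab, size_t nr_syms,
                              const char *strtab) {
  for (size_t i = 0; i < nr_syms; i++) {
    if (ELF32_ST_TYPE(symtab[i].st_info) == STT_FUNC) {
      /* st_name is the offset of the symbol's name in the string table */
      add_symbol(strtab + symtab[i].st_name, symtab[i].st_value,
                 symtab[i].st_size);
    }
  }
}

/* Returns the name of the function whose [addr, addr + size) range contains
 * the given address, or NULL if there is none. */
const char *find_symbol(uint64_t addr) {
  for (int i = 0; i < nr_funcs; i++) {
    if (addr >= funcs[i].addr && addr < (uint64_t)funcs[i].addr + funcs[i].size) {
      return funcs[i].name;
    }
  }
  return NULL;
}
```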

Implementing ftrace

Define the function const char *find_symbol(uint64_t addr), which looks up the function corresponding to a memory address in the saved mapping table.
Instructions involved in function jumps: jal jumps directly to the target address for a function call. Its target is given by an immediate, i.e. an offset that is already known at compile time, so it is a direct call. The target address of jalr is provided by a register, so it can only be determined at run time, and it is therefore an indirect call. Also, since we do not know at compile time which address a function will return to, the address of the instruction after the call (pc + 4) must be saved in the return address register ra before each call; the return is then performed with jalr x0, 0(ra).
So we know: jal is a direct call; jalr with an immediate of 0, ra as its source register, and x0 as its destination register is a function return; and jalr in all other cases is an indirect call.
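
The classification rule above can be written directly against the raw instruction word. This is a standalone sketch with hypothetical names (JumpKind, classify_jump); in NEMU the same decision is made inside the helpers called from the jal and jalr patterns.

```c
#include <stdint.h>

typedef enum {
  JUMP_NONE,           /* not a jal or jalr instruction */
  JUMP_DIRECT_CALL,    /* jal */
  JUMP_INDIRECT_CALL,  /* jalr, other than a return */
  JUMP_RETURN          /* jalr x0, 0(ra) */
} JumpKind;

static JumpKind classify_jump(uint32_t inst) {
  uint32_t opcode = inst & 0x7f;
  uint32_t rd = (inst >> 7) & 0x1f;
  uint32_t rs1 = (inst >> 15) & 0x1f;
  uint32_t imm = inst >> 20;  /* raw 12-bit I-type immediate field */

  if (opcode == 0x6f) return JUMP_DIRECT_CALL;  /* jal */
  if (opcode == 0x67) {                         /* jalr */
    /* rd = x0, rs1 = ra (x1), imm = 0 is the ret pseudo-instruction */
    if (rd == 0 && rs1 == 1 && imm == 0) return JUMP_RETURN;
    return JUMP_INDIRECT_CALL;
  }
  return JUMP_NONE;
}
```

For instance, 0x00008067 is ret and is classified as a return, while 0x000780e7 (jalr ra, 0(a5)) is an indirect call.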

How to implement DiffTest

Check whether the register information in ref_r is the same as that in the global variable cpu. Do not forget the pc register.
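
The comparison itself is short. The sketch below uses a hypothetical ToyCPUState with 32 general-purpose registers and a pc, and takes both states as parameters instead of reading the global cpu, so that it is self-contained; it is not NEMU's actual structure or function signature.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_GPR 32

/* Hypothetical stand-in for the CPU state: general registers plus pc. */
typedef struct {
  uint32_t gpr[NR_GPR];
  uint32_t pc;
} ToyCPUState;

/* Returns true only if every general register and the pc match. */
static bool toy_checkregs(const ToyCPUState *ref_r, const ToyCPUState *dut) {
  for (int i = 0; i < NR_GPR; i++) {
    if (ref_r->gpr[i] != dut->gpr[i]) return false;
  }
  return ref_r->pc == dut->pc;  /* the easy one to forget */
}
```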

Input-output problem solving

The relationship between AM and NEMU is that NEMU emulates a set of peripheral interfaces by calling C library functions. So we need to read the files under $NEMU_HOME/src/device to see how NEMU implements MMIO; then, in AM, we can read and write the right information for NEMU to process.

What is volatile used for?

For example, suppose that p points to a register of a hardware device. Without the volatile qualifier, the compiler may assume that the value of this register is not changed externally while the program runs, so it may optimize repeated reads through p into a single read, caching the value in a CPU register instead of re-reading it from that address each time. This is a reasonable optimization for ordinary variables, but for device registers it causes problems: the value of a device register may change at any time (e.g. the external hardware updates its state), so the value cached by the compiler is no longer up to date, and the program reads a stale device state and misbehaves.
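
A typical place where this matters is a polling loop. In the sketch below, wait_ready and the meaning of bit 0 are made up for illustration; the point is only that the pointer is declared volatile, so every iteration performs a real load.

```c
#include <stdint.h>

/* Polls a status register until bit 0 becomes 1 or max_polls is reached.
 * Returns the number of polls performed. */
static unsigned wait_ready(volatile uint32_t *status, unsigned max_polls) {
  unsigned polls = 0;
  while (polls < max_polls) {
    polls++;
    if (*status & 1u) break;  /* volatile: the load is repeated every iteration */
  }
  return polls;
}
```

Without volatile, the compiler would be allowed to load the register once before the loop and reuse the value, so the loop might never notice the device becoming ready.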

How to implement the clock

Reading the timer source code in NEMU, you can see that NEMU obtains the current time by calling a host time function and stores it in the device registers at rtc_port_base. From rtc_io_handler you can also see that NEMU fetches the latest time only when the high part is read. So when implementing the clock in AM, always read the high part first, so that the device registers are refreshed and hold the latest time.

How to implement malloc

Make sure that the addresses handed out are aligned. Since my machine is 64-bit I chose 8-byte alignment, i.e. the size of every allocation is rounded up to a multiple of 8.
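
A minimal bump allocator with this rounding might look like the following. The toy_ names and the 4096-byte static array standing in for the heap are assumptions of this sketch; nothing is ever freed.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIGN 8

static _Alignas(ALIGN) unsigned char toy_heap[4096];  /* stand-in for the real heap */
static size_t toy_brk = 0;  /* offset of the first free byte */

void *toy_malloc(size_t size) {
  if (size > sizeof(toy_heap)) return NULL;
  /* round the size up to a multiple of 8 so the next block stays aligned */
  size = (size + ALIGN - 1) & ~(size_t)(ALIGN - 1);
  if (size > sizeof(toy_heap) - toy_brk) return NULL;  /* out of memory */
  void *p = &toy_heap[toy_brk];
  toy_brk += size;
  return p;
}
```

Because the heap start is aligned and every size is a multiple of 8, every returned address is 8-byte aligned without any per-allocation address arithmetic.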

How to implement DTRACE

From reading the documentation, we know that NEMU performs MMIO operations through map_read and map_write, so it is enough to add logging in these two functions.

How to implement the keyboard

Reading the send_key function in the corresponding NEMU source file, you can see that the message for each keystroke is formed by combining the key code from keymap with a key-down mask using a bitwise OR. In AM we parse out these two pieces of information and write them to kbd.
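
The packing and unpacking can be sketched like this. The mask value 0x8000 and the KeyEvent layout are assumptions of mine for the sketch, so check the actual definitions in your source tree before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEYDOWN_MASK 0x8000u  /* assumed value; verify against your source tree */

/* Hypothetical stand-in for the two fields that AM reports for a key event. */
typedef struct {
  bool keydown;
  int keycode;
} KeyEvent;

/* What the device side does: key code OR-ed with the mask on key down. */
static uint32_t encode_key(int keycode, bool is_keydown) {
  return (uint32_t)keycode | (is_keydown ? KEYDOWN_MASK : 0u);
}

/* What the AM side does: split the raw value back into its two parts. */
static KeyEvent decode_key(uint32_t raw) {
  KeyEvent ev;
  ev.keydown = (raw & KEYDOWN_MASK) != 0;
  ev.keycode = (int)(raw & ~KEYDOWN_MASK);
  return ev;
}
```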

How to implement VGA

Reading the corresponding source in NEMU, you can see the order in which the width and height of the display are stored at vgactl_port_base. In AM we need to extract this information and record it in the config, because reading the test programs in am-test shows that they need it.

__am_gpu_fbdraw must correctly write the drawing information provided in ctl to the VGA's registers. This is a simple two-dimensional array mapping: with (x, y) as the upper-left corner of the rectangle, we write a w*h rectangle of image data. If the control information also requests synchronization, we must write that request to the VGA's registers promptly as well.
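
The two-dimensional mapping is just row-major index arithmetic. The sketch below uses a plain array as the frame buffer and a hypothetical draw_rect function; in AM the destination would be the memory-mapped frame buffer and the parameters would come from ctl.

```c
#include <stdint.h>

/* Copies a w*h block of pixels to position (x, y) of a frame buffer that is
 * screen_w pixels wide. Both buffers are stored row by row. */
static void draw_rect(uint32_t *fb, int screen_w, int x, int y,
                      const uint32_t *pixels, int w, int h) {
  for (int row = 0; row < h; row++) {
    for (int col = 0; col < w; col++) {
      fb[(y + row) * screen_w + (x + col)] = pixels[row * w + col];
    }
  }
}
```

Note that the source rows are w pixels apart while the destination rows are screen_w pixels apart; mixing the two up is the usual cause of a skewed image.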

How to implement a sound card

In NEMU, since we use the SDL library to emulate the sound card, each time the sound card is configured we need to initialize SDL audio anew with the configuration that was just written (because the configuration may have changed) and set up its callback function. The SDL callback defines how we consume the audio data provided through the registers. Since the buffer is limited, data is written into it as a circular queue, so the callback must read the data from the buffer and write it to the stream correctly.

In AM's __am_audio_play, we need to correctly maintain the circular queue buffer that we write data into.

Source code

/* Representative excerpts only; this is not the complete source. */
INSTPAT_START();
  INSTPAT("0000000 ????? ????? 000 ????? 01100 11", add    , R, R(rd) = (int32_t)((int32_t)src1 + (int32_t)src2));
  INSTPAT("0000001 ????? ????? 000 ????? 01100 11", mul    , R, R(rd) = (int32_t)((int64_t)(int32_t)src1 * (int64_t)(int32_t)src2));
  INSTPAT("0000001 ????? ????? 001 ????? 01100 11", mulh   , R, R(rd) = (int32_t)(((int64_t)(int32_t)src1 * (int64_t)(int32_t)src2) >> 32));
  INSTPAT("0000001 ????? ????? 010 ????? 01100 11", mulhsu , R, R(rd) = ((int64_t)(int32_t)src1 * (uint64_t)(uint32_t)src2) >> 32);
  INSTPAT("0000001 ????? ????? 011 ????? 01100 11", mulhu  , R, R(rd) = (uint32_t)(((uint64_t)(uint32_t)src1 * (uint64_t)(uint32_t)src2) >> 32));
  INSTPAT("0100000 ????? ????? 000 ????? 01100 11", sub    , R, R(rd) = src1 - src2);
  INSTPAT("0000000 ????? ????? 010 ????? 01100 11", slt    , R, R(rd) = (int32_t)src1 < (int32_t)src2);
  INSTPAT("0000000 ????? ????? 011 ????? 01100 11", sltu   , R, R(rd) = (uint32_t)src1 < (uint32_t)src2);
  INSTPAT("0000000 ????? ????? 100 ????? 01100 11", xor    , R, R(rd) = src1 ^ src2);
  INSTPAT("0000001 ????? ????? 100 ????? 01100 11", div    , R, R(rd) = src2 ? (src1 == (1u << 31) && src2 == -1) ? (1u << 31) : (int32_t)((int32_t)src1 / (int32_t)src2) : -1);
  INSTPAT("0000001 ????? ????? 101 ????? 01100 11", divu   , R, R(rd) = src2 ? (uint32_t)((uint32_t)src1 / (uint32_t)src2) : (uint32_t)((1ll << 32) - 1));
  INSTPAT("0000000 ????? ????? 001 ????? 01100 11", sll    , R, R(rd) = (uint32_t)src1 << BITS(src2, 4, 0));
  INSTPAT("0000000 ????? ????? 101 ????? 01100 11", srl    , R, R(rd) = (uint32_t)src1 >> BITS(src2, 4, 0));
  INSTPAT("0100000 ????? ????? 101 ????? 01100 11", sra    , R, R(rd) = (int32_t)src1 >> BITS(src2, 4, 0));
  INSTPAT("0000000 ????? ????? 110 ????? 01100 11", or     , R, R(rd) = src1 | src2);
  INSTPAT("0000001 ????? ????? 110 ????? 01100 11", rem    , R, R(rd) = src2 ? (src1 == (1u << 31) && src2 == -1) ? 0 : (int32_t)src1 % (int32_t)src2 : src1);
  INSTPAT("0000001 ????? ????? 111 ????? 01100 11", remu   , R, R(rd) = src2 ? (uint32_t)src1 % (uint32_t)src2 : src1);
  INSTPAT("0000000 ????? ????? 111 ????? 01100 11", and    , R, R(rd) = src1 & src2);

  INSTPAT("??????? ????? ????? 000 ????? 00000 11", lb     , I, R(rd) = SEXT(Mr((uint32_t)((int32_t)src1 + (int32_t)imm), 1), 8));
  INSTPAT("??????? ????? ????? 001 ????? 00000 11", lh     , I, R(rd) = SEXT(Mr((uint32_t)((int32_t)src1 + (int32_t)imm), 2), 16));
  INSTPAT("??????? ????? ????? 010 ????? 00000 11", lw     , I, R(rd) = SEXT(Mr((uint32_t)((int32_t)src1 + (int32_t)imm), 4), 32));
  INSTPAT("??????? ????? ????? 100 ????? 00000 11", lbu    , I, R(rd) = Mr((uint32_t)((int32_t)src1 + (int32_t)imm), 1));
  INSTPAT("??????? ????? ????? 101 ????? 00000 11", lhu    , I, R(rd) = Mr((uint32_t)((int32_t)src1 + (int32_t)imm), 2));
  INSTPAT("??????? ????? ????? 000 ????? 00100 11", addi   , I, R(rd) = (int32_t)((int32_t)src1 + (int32_t)imm));  // mv
  INSTPAT("??????? ????? ????? 010 ????? 00100 11", slti   , I, R(rd) = (int32_t)src1 < (int32_t)imm);
  INSTPAT("??????? ????? ????? 011 ????? 00100 11", sltiu  , I, R(rd) = (uint32_t)src1 < (uint32_t)imm);
  INSTPAT("??????? ????? ????? 100 ????? 00100 11", xori   , I, R(rd) = src1 ^ imm);
  INSTPAT("000000? ????? ????? 001 ????? 00100 11", slli   , I, R(rd) = (uint32_t)src1 << BITS(imm, 4, 0));
  INSTPAT("000000? ????? ????? 101 ????? 00100 11", srli   , I, R(rd) = (uint32_t)src1 >> BITS(imm, 4, 0));
  INSTPAT("010000? ????? ????? 101 ????? 00100 11", srai   , I, R(rd) = (int32_t)src1 >> BITS(imm, 4, 0));
  INSTPAT("??????? ????? ????? 110 ????? 00100 11", ori    , I, R(rd) = src1 | imm);
  INSTPAT("??????? ????? ????? 111 ????? 00100 11", andi   , I, R(rd) = src1 & imm);
  // INSTPAT("??????? ????? ????? 000 ????? 00011 11", fence  , I, R(rd) = src1 & imm);
  INSTPAT("??????? ????? ????? 000 ????? 11001 11", jalr   , I, R(rd) = s->snpc; s->dnpc = (uint32_t)((int32_t)src1 + (int32_t)imm) & -2ull; check_jalr(rd, imm, s)); // ret

  INSTPAT("??????? ????? ????? 000 ????? 01000 11", sb     , S, Mw((uint32_t)((int32_t)src1 + (int32_t)imm), 1, src2));
  INSTPAT("??????? ????? ????? 001 ????? 01000 11", sh     , S, Mw((uint32_t)((int32_t)src1 + (int32_t)imm), 2, src2));
  INSTPAT("??????? ????? ????? 010 ????? 01000 11", sw     , S, Mw((uint32_t)((int32_t)src1 + (int32_t)imm), 4, src2));

  INSTPAT("??????? ????? ????? 000 ????? 11000 11", beq    , B, s->dnpc = src1 == src2 ? (uint32_t)((int32_t)s->pc + (int32_t)imm) : s->dnpc);
  INSTPAT("??????? ????? ????? 001 ????? 11000 11", bne    , B, s->dnpc = src1 != src2 ? (uint32_t)((int32_t)s->pc + (int32_t)imm) : s->dnpc);
  INSTPAT("??????? ????? ????? 100 ????? 11000 11", blt    , B, s->dnpc = (int32_t)src1 < (int32_t)src2 ? (uint32_t)((int32_t)s->pc + (int32_t)imm) : s->dnpc);
  INSTPAT("??????? ????? ????? 101 ????? 11000 11", bge    , B, s->dnpc = (int32_t)src1 >= (int32_t)src2 ? (uint32_t)((int32_t)s->pc + (int32_t)imm) : s->dnpc);
  INSTPAT("??????? ????? ????? 110 ????? 11000 11", bltu   , B, s->dnpc = (uint32_t)src1 < (uint32_t)src2 ? (uint32_t)((int32_t)s->pc + (int32_t)imm) : s->dnpc);
  INSTPAT("??????? ????? ????? 111 ????? 11000 11", bgeu   , B, s->dnpc = (uint32_t)src1 >= (uint32_t)src2 ? (uint32_t)((int32_t)s->pc + (int32_t)imm) : s->dnpc);

  INSTPAT("??????? ????? ????? ??? ????? 00101 11", auipc  , U, R(rd) = s->pc + imm);
  INSTPAT("??????? ????? ????? ??? ????? 01101 11", lui    , U, R(rd) = imm);

  INSTPAT("??????? ????? ????? ??? ????? 11011 11", jal    , J, R(rd) = s->snpc; s->dnpc = s->pc + imm; check_jal(s));


  INSTPAT("0000000 00001 00000 000 00000 11100 11", ebreak , N, NEMUTRAP(s->pc, R(10))); // R(10) is $a0
  INSTPAT("??????? ????? ????? ??? ????? ????? ??", inv    , N, INV(s->pc));

void check_jal(Decode *s) {
  if (elf_loaded) {
    const char *func_name = find_symbol(s->dnpc);
    if (func_name != NULL) {
        call_depth++;
        log_write("F\t%#010x\t%*sCall to %s @ %#010x\n", s->pc, call_depth * 2, "", func_name, s->dnpc);
    }
  }
}

void check_jalr(word_t rd, word_t imm, Decode *s) {
  uint32_t i = s->;
  int rs1 = BITS(i, 19, 15);
  if (elf_loaded) {
    if (rd == 0 && rs1 == 1 && imm == 0) {
        // Function return
        const char *func_name = find_symbol(s->pc);
        if (func_name != NULL) {
            log_write("F\t%#010x\t%*sReturn from %s\n", s->pc, call_depth * 2, "", func_name);
        }
        if (call_depth > 0) call_depth--;
    }
    else {
        // Indirect function call
        const char *func_name = find_symbol(s->dnpc);
        if (func_name != NULL) {
            call_depth++;
            log_write("F\t%#010x\t%*sIndirect call to %s @ %#010x\n", s->pc, call_depth * 2, "", func_name, s->dnpc);
        }
    }
  }
}

/* Representative excerpts only; this is not the complete source. */

int vsprintf(char *out, const char *fmt, va_list ap) {
    char* buf = out;
    int total_len = 0;

    while (*fmt != '\0') {
        if (*fmt != '%') {
            *buf++ = *fmt++;
            total_len++;
        } else {
            fmt++;
            // Parse flags
            int flags = 0;
            int width = 0;
            int precision = -1;
            // int length = 0;
            int specifier = 0;

            while (*fmt == '-' || *fmt == '+' || *fmt == ' ' || *fmt == '#' || *fmt == '0') {
                switch (*fmt) {
                    case '#': flags |= 0x02; break; // Hash
                    case '+': flags |= 0x04; break; // Plus sign
                    case ' ': flags |= 0x08; break; // Space
                    case '-': flags |= 0x10; break; // Left align
                    case '0': flags |= 0x20; break; // Zero padding
                }
                fmt++;
            }

            // Parse width
            if (*fmt == '*') {
                width = va_arg(ap, int);
                fmt++;
            } else {
                while (*fmt >= '0' && *fmt <= '9') {
                    width = width * 10 + (*fmt++ - '0');
                }
            }

            // Parse precision
            if (*fmt == '.') {
                fmt++;
                precision = 0;
                if (*fmt == '*') {
                    precision = va_arg(ap, int);
                    fmt++;
                } else {
                    while (*fmt >= '0' && *fmt <= '9') {
                        precision = precision * 10 + (*fmt++ - '0');
                    }
                }
            }

            // // Parse length modifier
            // if (*fmt == 'h' || *fmt == 'l') {
            //     length = *fmt++;
            // }

            // Parse specifier
            specifier = *fmt++;

            // Handle specifier
            if (specifier == 'd' || specifier == 'i') {
                int value = va_arg(ap, int);
                char temp[65];
                int len = int_to_str(value, temp, 10, 0, width, precision > 0 ? precision : 0, flags);
                for (int i = 0; i < len; i++) {
                    *buf++ = temp[i];
                }
                total_len += len;
            } else if (specifier == 'o') {
                unsigned int value = va_arg(ap, unsigned int);
                char temp[65];
                int len = int_to_str(value, temp, 8, 1, width, precision > 0 ? precision : 0, flags);
                for (int i = 0; i < len; i++) {
                    *buf++ = temp[i];
                }
                total_len += len;
            } else if (specifier == 'u') {
                unsigned int value = va_arg(ap, unsigned int);
                char temp[65];
                int len = int_to_str(value, temp, 10, 1, width, precision > 0 ? precision : 0, flags);
                for (int i = 0; i < len; i++) {
                    *buf++ = temp[i];
                }
                total_len += len;
            } else if (specifier == 'x' || specifier == 'X') {
                unsigned int value = va_arg(ap, unsigned int);
                char temp[65];
                if (specifier == 'X') flags |= 0x01; // Uppercase
                int len = int_to_str(value, temp, 16, 1, width, precision > 0 ? precision : 0, flags);
                for (int i = 0; i < len; i++) {
                    *buf++ = temp[i];
                }
                total_len += len;
            } else if (specifier == 'c') {
                char c = (char)va_arg(ap, int);
                *buf++ = c;
                total_len++;
            } else if (specifier == 's') {
                char *str = va_arg(ap, char*);
                int len = (int)strlen(str);
                if (precision >= 0 && precision < len) len = precision; // precision only caps the copied length
                if (width > len && !(flags & 0x10)) { // Right align
                    for (int i = 0; i < width - len; i++) {
                        *buf++ = ' ';
                        total_len++;
                    }
                }
                for (int i = 0; i < len && str[i] != '\0'; i++) {
                    *buf++ = str[i];
                    total_len++;
                }
                if (width > len && (flags & 0x10)) { // Left align
                    for (int i = 0; i < width - len; i++) {
                        *buf++ = ' ';
                        total_len++;
                    }
                }
            } else if (specifier == 'p') {
                void *ptr = va_arg(ap, void*);
                unsigned long value = (unsigned long)ptr;
                char temp[65];
                flags |= 0x02; // Force '0x'
                int len = int_to_str(value, temp, 16, 1, width, precision > 0 ? precision : 0, flags);
                for (int i = 0; i < len; i++) {
                    *buf++ = temp[i];
                }
                total_len += len;
            } else if (specifier == 'f' || specifier == 'F' || specifier == 'e' || specifier == 'E' || specifier == 'g' || specifier == 'G') {
                assert(0);
                // double value = va_arg(ap, double);
                // char temp[320];
                // int len = double_to_str(value, temp, precision >= 0 ? precision : 6, specifier, flags, width);
                // for (int i = 0; i < len; i++) {
                //     *buf++ = temp[i];
                // }
                // total_len += len;
            } else if (specifier == '%') {
                *buf++ = '%';
                total_len++;
            } else if (specifier == 'n') {
                int *ptr = va_arg(ap, int*);
                *ptr = total_len;
            } else {
                // Invalid specifier
                assert(0);
            }
        }
    }

    *buf = '\0';
    return buf - out;
}

int sprintf(char *out, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int total_len = vsprintf(out, fmt, args);
    va_end(args);
    return total_len;
}

int snprintf(char *out, size_t n, const char *fmt, ...) {
  panic("Not implemented");
}

int vsnprintf(char *out, size_t n, const char *fmt, va_list ap) {
  panic("Not implemented");
}

static int int_to_str(int value, char *buffer, int base, int is_unsigned, int width, int precision, int flags) {
    char temp[65];
    int i = 0;
    unsigned int uvalue = value;
    int is_negative = 0;

    if (!is_unsigned && value < 0) {
        is_negative = 1;
        uvalue = 0u - (unsigned int)value; // negate in unsigned arithmetic so INT_MIN does not overflow
    }

    if (uvalue == 0) {
        temp[i++] = '0';
    } else {
        while (uvalue != 0) {
            int digit = uvalue % base;
            if (digit < 10) {
                temp[i++] = digit + '0';
            } else {
                if (flags & 0x01) { // Uppercase for X
                    temp[i++] = digit - 10 + 'A';
                } else {
                    temp[i++] = digit - 10 + 'a';
                }
            }
            uvalue /= base;
        }
    }

    // Handle precision
    while (i < precision) {
        temp[i++] = '0';
    }

    // Add prefix for '#' flag
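    if (flags & 0x02) {
        if (base == 8) {
            // Octal needs exactly one leading zero; skip it if one is already there
            if (temp[i-1] != '0') temp[i++] = '0';
        } else if (base == 16) {
            // temp is reversed later, so push 'x' before '0' to get "0x"
            temp[i++] = flags & 0x01 ? 'X' : 'x';
            temp[i++] = '0';
        } else {
            assert(0); // '#' is only handled for octal and hex
        }
    }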

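    int len = i;
    // Handle zero-padding and width; reserve one column only when a sign character is emitted
    int has_sign = is_negative || (flags & 0x04) || (flags & 0x08);
    int padding = width - len - (has_sign ? 1 : 0);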

    if (padding > 0 && (flags & 0x20) && !(flags & 0x10)) { // Right align
      while (padding-- > 0) {
          temp[i++] = '0';
      }
    }

    if (is_negative) {
        temp[i++] = '-';
    } else if (flags & 0x04) { // '+' flag
        temp[i++] = '+';
    } else if (flags & 0x08) { // ' ' flag
        temp[i++] = ' ';
    }

    if (padding > 0 && !(flags & 0x20) && !(flags & 0x10)) { // Right align
      while (padding-- > 0) {
          temp[i++] = ' ';
      }
    }

    reverse_str(temp, i);

    // Copy to buffer
    for (int j = 0; j < i; j++) {
        *buffer++ = temp[j];
    }

    // Right padding if left align; count it so the caller copies the padding too
    int total = i;
    if (padding > 0 && (flags & 0x10)) {
        while (padding-- > 0) {
            *buffer++ = ' ';
            total++;
        }
    }

    return total;
}

/* Representative excerpt; not the complete code */

void record_inst(uint64_t pc, const char *asm_code) {
    inst_buf[inst_buf_pos].pc = pc;
    // Safely copy the assembly code to prevent buffer overflow
    strncpy(inst_buf[inst_buf_pos].asm_code, asm_code, sizeof(inst_buf[inst_buf_pos].asm_code) - 1);
    inst_buf[inst_buf_pos].asm_code[sizeof(inst_buf[inst_buf_pos].asm_code) - 1] = '\0'; // Ensure null-termination
    inst_buf_pos = (inst_buf_pos + 1) % INST_BUF_SIZE; // Update write position to create a ring buffer
    if (inst_buf_count < INST_BUF_SIZE) {
        inst_buf_count++; // Increment the count of recorded instructions
    }
}
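
The ring buffer above is only useful if it can be printed oldest-first when execution stops. A minimal sketch of that dump follows. The global names mirror the excerpt, but the record layout (`InstRecord`, a 64-byte `asm_code`), the small `INST_BUF_SIZE`, and the helpers `oldest_index` and `dump_inst_buf` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define INST_BUF_SIZE 4 /* assumed capacity, kept small for the demo */

typedef struct {
  uint64_t pc;
  char asm_code[64];
} InstRecord; /* hypothetical record layout */

InstRecord inst_buf[INST_BUF_SIZE];
int inst_buf_pos = 0;   /* next slot to be written */
int inst_buf_count = 0; /* number of valid entries, at most INST_BUF_SIZE */

/* Slot holding the oldest valid entry. */
int oldest_index(void) {
  assert(inst_buf_count >= 0 && inst_buf_count <= INST_BUF_SIZE);
  return (inst_buf_pos - inst_buf_count + INST_BUF_SIZE) % INST_BUF_SIZE;
}

/* Print entries from oldest to newest and mark the newest one with "-->". */
void dump_inst_buf(FILE *out) {
  int start = oldest_index();
  for (int k = 0; k < inst_buf_count; k++) {
    const InstRecord *r = &inst_buf[(start + k) % INST_BUF_SIZE];
    const char *mark = (k == inst_buf_count - 1) ? "-->" : "   ";
    fprintf(out, "%s 0x%08llx: %s\n", mark, (unsigned long long)r->pc, r->asm_code);
  }
}
```

Calling `dump_inst_buf(stderr)` when NEMU enters an abort or bad-trap state would show the last few instructions leading up to the failure.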

/*  */ 

#include <elf.h> // Elf32_Ehdr, Elf32_Shdr, Elf32_Sym, ELFMAG, STT_FUNC

typedef struct {
    char *name;
    uint64_t addr;
    uint64_t size;
} FunctionSymbol;

static FunctionSymbol *symbol_table = NULL;
static int symbol_count = 0;
static int symbol_capacity = 0;
bool elf_loaded = false;

void add_symbol(const char *name, uint64_t addr, uint64_t size) {
    if (symbol_count >= symbol_capacity) {
        symbol_capacity = symbol_capacity == 0 ? 1024 : symbol_capacity * 2;
        symbol_table = realloc(symbol_table, symbol_capacity * sizeof(FunctionSymbol));
    }
    symbol_table[symbol_count].name = strdup(name);
    symbol_table[symbol_count].addr = addr;
    symbol_table[symbol_count].size = size;
    symbol_count++;
}

const char *find_symbol(uint64_t addr) {
    if (!elf_loaded) return NULL;
    for (int i = 0; i < symbol_count; i++) {
        if (addr >= symbol_table[i].addr && addr < symbol_table[i].addr + symbol_table[i].size) {
            return symbol_table[i].name;
        }
    }
    return NULL;
}

void init_elf(const char *elf_file) {
    if (elf_file == NULL) {
        elf_loaded = false;
        return;
    }

    FILE *fp = fopen(elf_file, "rb");
    Assert(fp, "Cannot open '%s'", elf_file);

    elf_loaded = true;

    // Read ELF header
    Elf32_Ehdr ehdr;
    if (fread(&ehdr, sizeof(Elf32_Ehdr), 1, fp) != 1) {
        printf("Failed to read ELF header\n");
        exit(1);
    }

    // Verify ELF magic number
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
        printf("Not an ELF file\n");
        exit(1);
    }

    // Read section headers
    Elf32_Shdr *shdrs = malloc(ehdr.e_shentsize * ehdr.e_shnum);
    fseek(fp, ehdr.e_shoff, SEEK_SET);
    if (fread(shdrs, ehdr.e_shentsize, ehdr.e_shnum, fp) != ehdr.e_shnum) {
        printf("Failed to read section headers\n");
        exit(1);
    }

    // Read section header string table
    Elf32_Shdr shstr_shdr = shdrs[ehdr.e_shstrndx];
    char *shstrtab = malloc(shstr_shdr.sh_size);
    fseek(fp, shstr_shdr.sh_offset, SEEK_SET);
    if (fread(shstrtab, shstr_shdr.sh_size, 1, fp) != 1) {
        printf("Failed to read section header string table\n");
        exit(1);
    }

    // Find .symtab and .strtab sections
    Elf32_Shdr *symtab_shdr = NULL;
    Elf32_Shdr *strtab_shdr = NULL;
    for (int i = 0; i < ehdr.e_shnum; i++) {
        char *section_name = &shstrtab[shdrs[i].sh_name];
        if (shdrs[i].sh_type == SHT_SYMTAB && strcmp(section_name, ".symtab") == 0) {
            symtab_shdr = &shdrs[i];
        } else if (shdrs[i].sh_type == SHT_STRTAB && strcmp(section_name, ".strtab") == 0) {
            strtab_shdr = &shdrs[i];
        }
    }

    if (symtab_shdr == NULL || strtab_shdr == NULL) {
        printf("Failed to find .symtab or .strtab in ELF file\n");
        exit(1);
    }

    // Read symbol table
    int sym_count = symtab_shdr->sh_size / symtab_shdr->sh_entsize;
    Elf32_Sym *symtab = malloc(symtab_shdr->sh_size);
    fseek(fp, symtab_shdr->sh_offset, SEEK_SET);
    if (fread(symtab, symtab_shdr->sh_entsize, sym_count, fp) != sym_count) {
        printf("Failed to read symbol table\n");
        exit(1);
    }

    // Read string table
    char *strtab = malloc(strtab_shdr->sh_size);
    fseek(fp, strtab_shdr->sh_offset, SEEK_SET);
    if (fread(strtab, strtab_shdr->sh_size, 1, fp) != 1) {
        printf("Failed to read string table\n");
        exit(1);
    }

    // Store function symbols
    for (int i = 0; i < sym_count; i++) {
        Elf32_Sym sym = symtab[i];
        char *name = &strtab[sym.st_name];
        if (ELF32_ST_TYPE(sym.st_info) == STT_FUNC) {
            add_symbol(name, sym.st_value, sym.st_size);
        }
    }

    // Free allocated memory
    free(shdrs);
    free(shstrtab);
    free(symtab);
    free(strtab);
    fclose(fp);
}

/* Representative excerpt; not the complete code */

void __am_gpu_config(AM_GPU_CONFIG_T *cfg) {
  uint32_t data = inl(VGACTL_ADDR);
  int width = (data >> 16) & 0xffff;
  int height = data & 0xffff;
  int vmemsz = width * height * sizeof(uint32_t);
  *cfg = (AM_GPU_CONFIG_T) {
    .present = true, .has_accel = false,
    .width = width, .height = height,
    .vmemsz = vmemsz
  };
}

void __am_gpu_fbdraw(AM_GPU_FBDRAW_T *ctl) {
  int x = ctl->x, y = ctl->y, w = ctl->w, h = ctl->h;
  int width = inl(VGACTL_ADDR) >> 16, height = inl(VGACTL_ADDR) & 0xffff;

  uint32_t *pixels = ctl->pixels;
  uint32_t *fb = (uint32_t *)(uintptr_t)FB_ADDR;

  for (int i = y; i < y + h && i < height; i++) {
    for (int j = x; j < x + w && j < width; j++) {
      fb[i * width + j] = pixels[(i - y) * w + (j - x)];
    }
  }

  if (ctl->sync) {
    outl(SYNC_ADDR, 1);
  }
}

/* Representative excerpt; not the complete code */

void __am_audio_play(AM_AUDIO_PLAY_T *ctl) {
  uint8_t *audio_data = (ctl->buf).start;
  uint32_t sbuf_size = inl(AUDIO_SBUF_SIZE_ADDR);
  uint32_t len = (ctl->buf).end - (ctl->buf).start;

  uint8_t *ab = (uint8_t *)(uintptr_t)AUDIO_SBUF_ADDR;
  for(int i = 0; i < len; i++){
    ab[sbuf_pos] = audio_data[i]; // copy one byte into the stream buffer
    sbuf_pos = (sbuf_pos + 1) % sbuf_size;
  }
  outl(AUDIO_COUNT_ADDR, inl(AUDIO_COUNT_ADDR) + len);
}

/* nemu/ Representative excerpt; not the complete code */

void init_sound();

static void audio_io_handler(uint32_t offset, int len, bool is_write) {
  if(audio_base[reg_init] == 1){
    init_sound();
    audio_base[reg_init] = 0;
  }
}

void sdl_audio_callback(void *userdata, uint8_t *stream, int len){
  SDL_memset(stream, 0, len);
  uint32_t used_cnt = audio_base[reg_count];
  len = len > used_cnt ? used_cnt : len;
  
  uint32_t sbuf_size = audio_base[reg_sbuf_size];
  if( (sbuf_pos + len) > sbuf_size ){
    SDL_MixAudio(stream, sbuf + sbuf_pos, sbuf_size - sbuf_pos , SDL_MIX_MAXVOLUME);
    SDL_MixAudio(stream + (sbuf_size - sbuf_pos), sbuf, len - (sbuf_size - sbuf_pos), SDL_MIX_MAXVOLUME);
  }
  else
    SDL_MixAudio(stream, sbuf + sbuf_pos, len , SDL_MIX_MAXVOLUME);
  sbuf_pos = (sbuf_pos + len) % sbuf_size;
  audio_base[reg_count] -= len;
}

void init_sound() {
  SDL_AudioSpec s = {};
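  s.format = AUDIO_S16SYS;            // 16-bit signed samples in host byte order
  s.userdata = NULL;                  // not used by the callback
  s.freq = audio_base[reg_freq];
  s.channels = audio_base[reg_channels];
  s.samples = audio_base[reg_samples];
  s.callback = sdl_audio_callback;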
  SDL_InitSubSystem(SDL_INIT_AUDIO);
  SDL_OpenAudio(&s, NULL);
  SDL_PauseAudio(0);
}