Preamble
This article explains the main architecture of the Qwen model.
1. Overall Introduction
The overall architecture of Qwen is similar to that of Llama2, as shown below:
- The tokenizer converts the input text into token IDs, i.e., indices into the vocabulary.
- The token IDs are passed through an embedding layer to obtain one-to-one corresponding vectors.
- The attention_mask controls which positions each token may attend to (only to the left, only to the right, both directions, and so on).
- Various downstream tasks (causal LM, sequence classification, etc.) are basically the base model followed by the corresponding Linear head and a task-specific loss function; see the sketch after this list.
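As a concrete illustration of the steps above, here is a minimal sketch using the Hugging Face `transformers` API. The model name `Qwen/Qwen2-0.5B` is just an illustrative checkpoint, and the printed shapes are indicative rather than exact; this is not the Qwen2 source itself, only the same pipeline seen from the user side.

```python
# Minimal sketch: tokenizer -> token IDs -> embeddings -> attention mask -> task head.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")   # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

# 1. Tokenizer: text -> token IDs (indices into the vocabulary).
inputs = tokenizer("Hello, Qwen!", return_tensors="pt")
print(inputs["input_ids"])          # tensor of vocabulary indices

# 2. Embedding: each ID is mapped to a dense vector.
embeddings = model.get_input_embeddings()(inputs["input_ids"])
print(embeddings.shape)             # (batch, seq_len, hidden_size)

# 3. Attention mask: 1 = attend, 0 = padding; the causal (left-only) mask
#    for autoregressive generation is applied inside the model.
print(inputs["attention_mask"])

# 4. Downstream head: the causal-LM head projects hidden states to vocabulary
#    logits; other tasks (e.g. sequence classification) swap in a different
#    Linear head and loss function.
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)                 # (batch, seq_len, vocab_size)
```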
2. Learning records
In this course, I studied the principles of the Transformer and Qwen2 in depth and worked through their code implementations hands-on. By reading the source code carefully, I came to understand the connection between the sinusoidal positional encoding (PE) in the original Transformer and the rotary position embedding (RoPE) in Qwen2, as well as what makes each of them distinctive. This experience has greatly enriched my knowledge and improved my technical understanding.
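To make the PE/RoPE connection concrete, below is a minimal, self-contained RoPE sketch in PyTorch. The interleaved even/odd channel pairing used here is a common textbook convention and differs from the rotate-half layout in the actual Qwen2 source; it is meant only to show the core idea of rotating each channel pair by a position-dependent angle, rather than adding a sinusoidal vector as the original Transformer PE does.

```python
# Minimal RoPE sketch (illustrative convention, not the Qwen2 source layout).
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    # Frequencies theta_i = base^(-2i/dim), one per channel pair.
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    # Rotation angle for position m and pair i is m * theta_i.
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]            # split channels into (even, odd) pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin          # rotate each 2-D pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(8, 64)        # (seq_len, head_dim)
print(rope(q).shape)          # torch.Size([8, 64])
```

Because the rotation angle depends only on the token's position index, the dot product between a rotated query and key depends on the difference of their positions, which is how RoPE encodes relative position while PE injects absolute position information additively.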