
Recent AI technical reports and Agent surveys worth reading!


As the title says, excellent new models have been appearing one after another recently. As technical practitioners, we need to read high-quality AI technical reports and papers to keep up with future application trends. This article recommends several high-quality AI technical reports and Agent surveys.

 

Large Model Technical Reports

DeepSeek-V3 Technical Report

Author: DeepSeek

Time: 2024.12.27

Summary: Introduces DeepSeek-V3, a Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. Through the co-design of algorithms, framework, and hardware, the model overcomes the communication bottleneck of cross-node MoE training and achieves full computation-communication overlap, significantly improving training efficiency and lowering training cost. Spending only 2.664 million H800 GPU hours, DeepSeek-V3 completed pre-training on 14.8 trillion tokens, making it currently the strongest open-source base model. The model also introduces an innovative method for distilling reasoning capability from the DeepSeek-R1 series, and performs strongly on benchmarks covering knowledge, code, mathematics, and reasoning, with results comparable to leading closed-source models. A toy sketch of top-k expert routing is given after the link below.

Link: /pdf/2412.19437
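
To make the MoE structure described above concrete, here is a minimal sketch of token-level top-k expert routing in PyTorch. It is only an illustrative toy under assumed sizes (`d_model`, `num_experts`, `top_k` are arbitrary), not DeepSeek-V3's actual implementation.

```python
# Minimal sketch of token-level top-k MoE routing (illustrative toy only;
# not DeepSeek-V3's implementation, all sizes here are arbitrary assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)           # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                        # x: [num_tokens, d_model]
        gate = F.softmax(self.router(x), dim=-1)                 # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)             # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

print(ToyMoELayer()(torch.randn(10, 64)).shape)                  # torch.Size([10, 64])
```

Because only the selected experts run for each token, a model built this way can keep its total parameter count very large while the activated parameters (and compute) per token stay much smaller.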

 

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Author: DeepSeek

Time: 2025.01.23

Summary: R1 is the deep-reasoning model that has recently taken the entire internet by storm. The paper introduces the first-generation reasoning models DeepSeek-R1-Zero and DeepSeek-R1, developed by the DeepSeek-AI team through reinforcement learning (RL). DeepSeek-R1-Zero, trained with large-scale RL and no supervised fine-tuning, shows strong reasoning ability but suffers from issues such as poor readability and language mixing; DeepSeek-R1 further improves reasoning performance by introducing cold-start data and multi-stage training, reaching a level comparable to OpenAI-o1-1217. The paper also demonstrates the successful practice of transferring reasoning capability to small models via distillation, significantly improving their reasoning performance, and open-sources several models for the research community, while discussing the respective advantages and disadvantages of distillation and RL for task performance. A rough sketch of the distillation step is given after the link below.

Link: /deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
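
As a rough, hedged illustration of the distillation idea mentioned in the summary (transferring reasoning ability from a strong teacher to a small student), the sketch below collects teacher-generated reasoning traces and uses them as ordinary supervised fine-tuning data. `teacher_generate` and `student.train_step` are hypothetical placeholders, not APIs from the DeepSeek release.

```python
# Rough sketch of reasoning distillation: collect chain-of-thought traces from a strong
# teacher model, then fine-tune a small student on them with plain supervised learning.
# `teacher_generate` and `student.train_step` are hypothetical placeholders.
from typing import Callable, Dict, List

def build_distillation_set(prompts: List[str],
                           teacher_generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Record the teacher's full reasoning trace plus final answer for each prompt."""
    return [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

def finetune_student(student, dataset: List[Dict[str, str]]):
    """Plain SFT on the collected traces (no RL on the student side in this sketch)."""
    for example in dataset:
        student.train_step(example["prompt"], example["completion"])
    return student

# Toy usage with a dummy teacher, just to show the data shape.
dummy_teacher = lambda p: f"<think>step-by-step reasoning about {p}</think> final answer"
print(build_distillation_set(["What is 2 + 2?"], dummy_teacher))
```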

 

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Author: DeepSeek

Time: 2024.01.11

Summary: Describes the design of the DeepSeekMoE model in detail, proposing fine-grained expert segmentation and shared expert isolation to address the expert redundancy and insufficient specialization of traditional MoE models. With only 13B activated parameters it reaches performance on par with LLaMA2 70B while reducing training cost by 80%. A toy sketch of the two ideas is given after the link below.

Link: /pdf/2401.06066
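
To illustrate the two ideas named above, the sketch below extends the toy routing layer shown under DeepSeek-V3 with (a) many small routed experts selected top-k per token (fine-grained segmentation) and (b) a few shared experts that every token always passes through (shared expert isolation). It is a toy interpretation under assumed sizes, not the paper's code.

```python
# Toy sketch of DeepSeekMoE's two ideas: many small routed experts chosen top-k per
# token (fine-grained segmentation) plus shared experts that are always active
# (shared expert isolation). Illustrative only; all sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(d_model: int, d_ff: int) -> nn.Module:
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

class ToyDeepSeekMoEBlock(nn.Module):
    def __init__(self, d_model=64, d_ff=64, n_shared=2, n_routed=16, top_k=4):
        super().__init__()
        self.shared = nn.ModuleList(make_expert(d_model, d_ff) for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert(d_model, d_ff) for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)
        self.top_k = top_k

    def forward(self, x):                                   # x: [num_tokens, d_model]
        out = sum(e(x) for e in self.shared)                # shared experts: every token
        gate = F.softmax(self.router(x), dim=-1)
        w, idx = gate.topk(self.top_k, dim=-1)              # fine-grained top-k routing
        for e_id, expert in enumerate(self.routed):
            for k in range(self.top_k):
                mask = idx[:, k] == e_id
                if mask.any():
                    out[mask] += w[mask, k:k + 1] * expert(x[mask])
        return out

print(ToyDeepSeekMoEBlock()(torch.randn(5, 64)).shape)       # torch.Size([5, 64])
```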

 

Kimi k1.5

Author: Moonshot

Time: 2025.01.22

Summary: Kimi, as always, regards long context as the core. Kimi k1.5 is a multimodal large language model (LLM) trained with reinforcement learning (RL). By extending the context window and improving the policy-optimization method, it achieves state-of-the-art reasoning performance on several benchmarks, on par with OpenAI's o1 model. The report also proposes the long2short approach, which uses long chain-of-thought (CoT) techniques to improve short-CoT models, yielding significant performance gains. These methods improve not only the model's reasoning ability but also its performance on multimodal tasks. A hedged sketch of one long2short-style data step is given after the link below.

Link: /MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.
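
One simple way long-CoT data can be turned into short-CoT training data, in the spirit of the long2short idea described above, is shortest-correct rejection sampling: sample several long reasoning traces, keep the shortest one that reaches the right answer, and use it to train the shorter model. This is a hedged illustration of the general idea rather than the report's exact recipe; `sample_long_cot` and `is_correct` are hypothetical placeholders.

```python
# Hedged illustration of a long2short-style data step: sample several long chain-of-thought
# traces per problem and keep the shortest correct one as training data for a short-CoT model.
# `sample_long_cot` and `is_correct` are hypothetical placeholders, not the report's code.
import random
from typing import Callable, Optional

def shortest_correct_trace(problem: str,
                           sample_long_cot: Callable[[str], str],
                           is_correct: Callable[[str, str], bool],
                           n_samples: int = 8) -> Optional[str]:
    candidates = [sample_long_cot(problem) for _ in range(n_samples)]
    correct = [c for c in candidates if is_correct(problem, c)]
    return min(correct, key=len) if correct else None     # prefer the most concise solution

# Toy usage with stand-in functions.
toy_sampler = lambda p: "think " * random.randint(3, 30) + "=> 4"
toy_checker = lambda p, trace: trace.endswith("=> 4")
print(shortest_correct_trace("2 + 2 = ?", toy_sampler, toy_checker))
```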

 

Extending Context Window of Large Language Models via Semantic Compression

Author: Department of Mathematical Sciences, Tsinghua University; Theory Lab, 2012 Labs, Huawei Technologies

Time: 2023.12.15

Summary: Proposes a novel semantic compression method to extend the context window of large language models (LLMs), enabling them to handle text 6-8 times longer than the original window without fine-tuning the pre-trained model or increasing computational cost. The method draws on the concept of source coding from information theory to reduce the semantic redundancy of long inputs before they are passed to the LLM. Experiments show that it effectively extends the context window of LLMs across tasks including question answering, summarization, few-shot learning, and information retrieval, reducing computational overhead while preserving text fluency. A generic sketch of the pre-processing idea is given after the link below.

Link: /pdf/2312.09571
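
The core idea, compressing semantically redundant parts of a long input before it ever reaches the LLM, can be sketched as a simple chunk-then-compress pre-processing step. This is a generic illustration of that idea, not the authors' algorithm; `compress_chunk` and `llm_answer` are hypothetical placeholders for a compression model and the downstream LLM.

```python
# Generic sketch of semantic compression as pre-processing: split a long document into
# chunks, compress each chunk to remove redundancy, and feed the shortened text to an LLM
# whose native context window is much smaller than the original document.
# `compress_chunk` and `llm_answer` are hypothetical placeholders, not the paper's code.
from typing import Callable, List

def chunk_text(text: str, chunk_chars: int = 2000) -> List[str]:
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def semantic_compress(text: str, compress_chunk: Callable[[str], str]) -> str:
    """Compress each chunk independently, then re-join into one shorter context."""
    return "\n".join(compress_chunk(chunk) for chunk in chunk_text(text))

def answer_over_long_doc(question: str, long_doc: str,
                         compress_chunk: Callable[[str], str],
                         llm_answer: Callable[[str], str]) -> str:
    context = semantic_compress(long_doc, compress_chunk)
    return llm_answer(f"Context:\n{context}\n\nQuestion: {question}")

# Toy usage: a crude "compressor" that keeps only the first sentence of each chunk.
crude_compressor = lambda chunk: chunk.split(".")[0] + "."
toy_llm = lambda prompt: f"(answer based on {len(prompt)} chars of prompt)"
print(answer_over_long_doc("What is this about?", "Long text. " * 500,
                           crude_compressor, toy_llm))
```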

 

Agent Surveys

Agent AI: Surveying the Horizons of Multimodal Interaction

Author: Fei-Fei Li's team, Stanford University

Time: 2024.01.25

Summary: This 80-page survey systematically reviews the development of multimodal AI agents, discusses their applications in interactive and embodied tasks, and explains how to combine large language models (LLMs) and vision-language models (VLMs) to build more sophisticated agent systems. The paper also proposes the concept of the "infinite agent" to support multimodal generation and editing across physical and virtual environments.

Link: /pdf/2401.03568

 

Google Whitepaper: Agents

Author: Google

Time: 2024.09

Summary: An Agents white paper produced by Google. It describes the core architecture of an AI agent in detail, including the model, the tools, and the orchestration layer, discusses how agents differ from traditional language models, their learning capabilities, practical applications, and future development, and aims to promote the broad adoption of AI agents across domains. A minimal sketch of the three-layer loop is given after the links below.

Link: /file/d/1oEjiRCTbd54aSdB_eEe3UShxLBWK9xkt/view

Reference implementation: /alibaba/spring-ai-alibaba/
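
As a minimal sketch of the model / tools / orchestration split described in the white paper, the loop below asks the model for the next action, dispatches it to a tool, and feeds the observation back until a final answer is produced. `llm_decide`, the tool registry, and the transcript format are assumptions for illustration; this is neither code from the white paper nor from the spring-ai-alibaba project.

```python
# Minimal sketch of the model / tools / orchestration layers of an AI agent:
# the orchestration loop asks the model for the next action, dispatches tool calls,
# and feeds the observations back. `llm_decide` and the tools are placeholders.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {                      # tool layer
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search":     lambda query: f"(stub search results for: {query})",
}

def run_agent(task: str, llm_decide: Callable[[str], dict], max_steps: int = 5) -> str:
    """Orchestration layer: model -> tool -> observation loop until a final answer."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = llm_decide(transcript)                           # model layer picks the action
        if step["action"] == "final":
            return step["content"]
        observation = TOOLS[step["action"]](step["content"])
        transcript += f"\n{step['action']}({step['content']}) -> {observation}"
    return "stopped: step budget exhausted"

# Toy "model" that calls the calculator once, then returns the observation as the answer.
def toy_decide(transcript: str) -> dict:
    if "->" not in transcript:
        return {"action": "calculator", "content": "21 * 2"}
    return {"action": "final", "content": transcript.rsplit("-> ", 1)[-1]}

print(run_agent("What is 21 * 2?", toy_decide))                 # 42
```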