Thanks to Nightingale for providing such a high-quality platform for face-to-face exchange with the industry's top technical experts. I learned a lot of interesting and in-depth content at this meeting, which gave me new directions for future exploration. I would also like to thank the Nightingale community for inviting me to write another article on AI. Since I am still an explorer on the AI track myself, please be lenient where I am unprofessional; I hope to meet more peers and build more advanced and interesting things on the AI track together.
At the conference I shared some of Zenlayer's practical results in the AI direction. Some of the fundamentals and the reasoning behind our technology choices were not covered on site, so I will add them in this article, hoping to bring you some inspiration. My goal is to provide a way of thinking and the background logic you need to understand before implementing, rather than a fixed recipe; I also hope to prompt more active thinking of your own.
Building the Body of Knowledge Needed for AIization
Semantic Kernel
Semantic Kernel is an open source framework from Microsoft designed to help developers build and deploy AI applications, especially those that need to understand and generate natural language. It provides a structured way to define and manage Skills, which can be simple function calls or complex AI model interactions.
Core Components
- Kernel: the core of the Semantic Kernel, responsible for skill management and execution.
- Skills: defines a set of operations that the application can perform, either as local functions or as remote service calls.
- Prompt Templates: Templates for generating and modifying natural language with support for variables and function calls.
- Memory: provides the ability to store and retrieve application state, either as simple key-value pairs or as complex graph databases.
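To make these components concrete, here is a tiny self-contained sketch of the kernel/skill/template ideas in Python. It deliberately does not use the real semantic_kernel package; every name here (Kernel, register_skill, fake_llm) is illustrative only.

```python
class Kernel:
    """Registers named skills and executes them with shared memory."""
    def __init__(self):
        self.skills = {}   # skill name -> callable
        self.memory = {}   # simple key-value "memory"

    def register_skill(self, name, fn):
        self.skills[name] = fn

    def run(self, name, **kwargs):
        return self.skills[name](self.memory, **kwargs)

# A "semantic" skill is just a prompt template plus a model call; the model
# call is stubbed out so the sketch stays self-contained.
SUMMARIZE_TEMPLATE = "Summarize the following text in one sentence:\n{text}"

def fake_llm(prompt):
    # Stand-in for a real model call.
    return "[summary of %d chars of prompt]" % len(prompt)

def summarize_skill(memory, text):
    prompt = SUMMARIZE_TEMPLATE.format(text=text)
    memory["last_prompt"] = prompt   # state survives across calls
    return fake_llm(prompt)

kernel = Kernel()
kernel.register_skill("summarize", summarize_skill)
result = kernel.run("summarize", text="Semantic Kernel manages skills.")
```

The real framework adds planners, connectors, and richer memory backends on top of this same skill-registry shape.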
LangChain
LangChain is an open source framework focused on building applications that utilize Large Language Models (LLMs) to perform a variety of tasks such as answering questions, generating text, and executing code. It provides a flexible way to combine and invoke different LLMs, as well as manage interactions with these models.
Core Components
- Chains: defines the logical flow of model invocations, which can be simple single-step invocations or complex multi-step processes.
- Prompts: Templates used to guide the model in generating specific types of output.
- Memory: provides the ability to store and retrieve application state that can be used for contextual understanding and history.
- Agents: Entities that can automate tasks, based on given goals and constraints.
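As a rough illustration of the chain idea (this is not the real LangChain API; make_chain and both steps are invented for this sketch), a chain is just a composition of steps where each step consumes the previous step's output:

```python
# Toy sketch of the "chain" idea: steps run in order, each taking the
# previous step's output as input.

def make_chain(*steps):
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Two hypothetical steps: build a prompt, then "call" a stubbed model.
build_prompt = lambda q: f"Answer concisely: {q}"
call_model   = lambda p: f"MODEL({p})"

qa_chain = make_chain(build_prompt, call_model)
answer = qa_chain("What is RAG?")
```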
Summary
Semantic Kernel and LangChain are both designed to simplify the development of AI applications, but they have different focuses. Semantic Kernel focuses more on the definition and management of skills, while LangChain focuses more on the combination and invocation of large language models. Which framework to choose depends on the specific application scenario and requirements.
In our scenario we lean toward building with Semantic Kernel. Not that LangChain is bad; its code-level abstractions are just very powerful and the architecture itself is relatively heavy, so the later operations, maintenance, and iteration costs are relatively high. At our current small scale, I feel we could not put it to full use.
Application Architecture for Large Models
Typical Business Architecture
Technical Architecture
Pure Prompt
It's like having a conversation with someone: you say something, they reply; you say something else, they reply again.
Agent + FC (Function Calling)
- Agent: the AI proactively initiates a request.
- Function Calling: the AI requests the execution of a function.
Scenario example: you ask where to go for the New Year holiday, and the AI first asks how many days off you have.
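A minimal sketch of the function-calling dispatch loop. The model's structured reply is hard-coded here and get_holiday_count is a hypothetical tool; in practice the reply comes back from the LLM API and the function's result is fed back to the model for the next turn:

```python
import json

def get_holiday_count(user: str) -> int:
    # Hypothetical lookup; a real agent would ask the user or query a system.
    return 3

TOOLS = {"get_holiday_count": get_holiday_count}

# The model decides it needs more information and emits a function request
# as structured JSON (hard-coded here instead of a real API response).
model_reply = json.dumps(
    {"function": "get_holiday_count", "arguments": {"user": "alice"}}
)

# Host code parses the request and dispatches to the named tool.
call = json.loads(model_reply)
result = TOOLS[call["function"]](**call["arguments"])
# The result (3 days off) would then be returned to the model, which can
# suggest a destination that fits the holiday length.
```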
RAG (Baseline) = Embeddings + Vector Database
- Embeddings: converts text into an encoding that makes similarity easier to compute. This encoding is called a vector.
- Vector database: stores vectors for easy searching.
- Vector search: finds the most similar vectors to an input vector.
- Scenario example: during an open-book exam you see a question, look up the relevant content in the book, combine it with the question to form an answer, and then forget it all afterward.
Currently we also use a rerank model to reorder the RAG results to get more accurate answers.
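The RAG baseline above can be sketched end to end with a toy bag-of-words "embedding", a brute-force vector search, and a stand-in rerank pass. Real systems use learned embedding and rerank models; only the data flow here matches the description:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "restart the nginx service after a config change",
    "check disk usage with df when alerts fire",
    "rotate logs weekly to save disk space",
]
INDEX = [(d, embed(d)) for d in DOCS]   # the "vector database"

def search(query, top_k=2):
    # Vector search: rank stored vectors by similarity to the query vector.
    qv = embed(query)
    scored = sorted(INDEX, key=lambda p: cosine(qv, p[1]), reverse=True)
    return [d for d, _ in scored[:top_k]]

def rerank(query, candidates):
    # Stand-in rerank: prefer candidates sharing more exact query terms.
    terms = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: len(terms & set(d.split())), reverse=True)

hits = rerank("disk usage alerts", search("disk usage alerts"))
```

The retrieved passages would then be stuffed into the prompt alongside the question, which is the "look it up in the book, answer, then forget" step.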
Fine-Tuning
Like studying hard before an exam: the knowledge is memorized for the long term and can be applied flexibly afterward.
At present, traditional fine-tuning for the O&M system, especially training on abstracted objects, has not achieved good results. So we are also trying DeepKE-style abstraction on our O&M data and text before fine-tuning, to see whether the model can then clearly understand the direct relationships between objects.
Prompt Engineering: Improving LLM Understanding and Response Quality
Prompt Design Principles
Why talk about prompts at all? Prompts do have structure, but making an LLM understand the basis of your reasoning is exactly what prompt engineering solves. Different LLMs also use completely different chat_template formats, which means the same prompt can produce different answers across models, and even the same model can answer the same question differently across runs.
In terms of my personal practice, I summarize the main principles as follows:
- Write clear instructions
- Provide reference text
- Split complex tasks into simpler subtasks
- Give the model time to "think"
- Use external tools
- Test changes systematically
Concrete Implementation Techniques
1. Give details
Provide important details and context; be as specific as possible rather than overly general. For example, instead of saying, "Summarize the minutes," say, "Summarize the minutes in one paragraph. Then write a Markdown list of the speakers and each of their key points. Finally, list the speakers' suggested next steps or action items, if any."
2. Let the model act as a role
You can think of the large model as an actor: it performs more professionally and clearly when you tell it what role to play. For example: "Act as a comedian who likes to tell jokes. Whenever I ask for help writing something, reply with a document in which every paragraph contains at least one joke or funny comment."
3. Use separators to clearly indicate different parts of the input
Delimiters such as triple quotes, XML tags, and section headings help delineate sections of text that should be treated differently, and help the large model understand the content better. My favorite way to frame content is with """. For example: Summarize the text delimited by triple quotes in 50 characters. """Insert text here"""
4. Specify the steps required to complete the task
Some tasks are best broken down into a series of explicit steps; writing the steps out makes it easier for the model to follow them. For example: Use the following step-by-step instructions to respond to user input. Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence, prefixed with "Summary:". Step 2 - Translate the summary from Step 1 into Spanish, prefixed with "Translation:".
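Principles 3 and 4 combine naturally: a helper that wraps the user text in triple quotes and spells out the steps. The template wording below is illustrative:

```python
def build_stepped_prompt(user_text):
    # Explicit steps plus triple-quote delimiters around untrusted input.
    return (
        "Use the following step-by-step instructions to respond.\n"
        'Step 1 - Summarize the text in triple quotes in one sentence, '
        'prefixed with "Summary:".\n'
        'Step 2 - Translate the summary into Spanish, prefixed with '
        '"Translation:".\n\n'
        f'"""{user_text}"""'
    )

prompt = build_stepped_prompt("Insert text here")
```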
5. Provide examples
This is the classic few-shot prompt: give the large model examples first and let it produce output that follows them. For example: write XX articles in the style of this sentence: """The falling rosy clouds fly together with the lone duck; the autumn water shares one color with the long sky. The fishing boats sing in the evening, their sound carrying to the shore of Pengli."""
6. Specify the length of the output
The model can be asked to generate output of a given target length, specified as a count of words, sentences, paragraphs, or bullet points. In Chinese the effect is less obvious, and any length you give is only approximate: word counts will certainly be imprecise, but paragraph counts work better. For example: In two paragraphs of about 100 characters, summarize the text delimited by triple quotes. """Insert text here"""
Prompt Framework Applications
Can a single approach carry you through everything? Obviously not. Different tasks and backgrounds call for different prompt frameworks. Since the specifics would be too lengthy, I will list the frameworks directly along with the logic behind each.
TAG Framework
- Task: Describe the specific task you are requesting to be completed.
- Action: A detailed description of the action to be taken.
- Goal: Define the ultimate goal you are pursuing.
SPAR Framework
- Scenario: sketches out the background blueprint.
- Problem: Explain the difficulty faced.
- Action: Detailed description of the strategy to be implemented.
- Result: depicts the desired outcome.
TRACE Framework
- Task: Identify and clarify specific tasks.
- Request: expresses the specifics of the desired request.
- Action: describes the action that must be implemented.
- Context: provide relevant background or context.
- Example: Use examples to illustrate your insights.
SCOPE Framework
- Scenario: Describes the current situation or scenario.
- Complications: Discuss any potential complications.
- Objective: describes the desired goal.
- Plan: describes the strategies needed to achieve the goal.
- Evaluation: state how the success criteria will be evaluated.
APE Framework
- Action: Describe the specific elements of the work accomplished.
- Purpose: Explain the intention or goal behind the action.
- Expectation: articulates the desired outcome or criteria for success.
SAGE Framework
- Situation: describes the background or current situation.
- Action: Detailed description of the action to be performed.
- Goal: state the goal clearly.
- Expectation: State what you expect to achieve.
RTF Framework
- Role: Defines the role position of LLM.
- Task: details the content of a particular task.
- Format: describes the form you expect the answer to take.
ROSES model
- Role: defines the role the model plays.
- Objective: Make your intentions clear.
- Scenario: Describes a specific situation and environment.
- Solution: Set the desired result.
- Steps: ask for the specific steps to solve the problem.
CARE Framework
- Context: defines the scene or contextual environment of the discussion.
- Action: Describe the action that is expected to be accomplished.
- Result: clarify the desired outcome.
- Example: Provide an example to illustrate your point.
The different prompt frameworks above need to be applied flexibly to specific scenarios; there is no single move that wins every fight. To use a large model well and amplify your power, understanding the underlying logic and the frameworks is essential; otherwise the LLM is just a chat tool and cannot bring a qualitative improvement to your work.
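Since every framework above is just a named set of slots, one simple way to apply them in code is a small registry that renders whichever framework fits the task. The slot contents below are made-up examples:

```python
# Each framework is a named list of slots, in the order they should appear.
FRAMEWORKS = {
    "TAG":  ["Task", "Action", "Goal"],
    "RTF":  ["Role", "Task", "Format"],
    "CARE": ["Context", "Action", "Result", "Example"],
}

def render(framework, **slots):
    # Render the chosen framework, failing loudly if a slot is missing.
    parts = []
    for field in FRAMEWORKS[framework]:
        if field not in slots:
            raise ValueError(f"missing slot: {field}")
        parts.append(f"{field}: {slots[field]}")
    return "\n".join(parts)

prompt = render(
    "RTF",
    Role="You are an experienced SRE.",
    Task="Explain why the nginx alert fired.",
    Format="Three bullet points.",
)
```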
Making Logical Reasoning Understandable for LLM: From CoT to ReAct
The key points above cover the main techniques for building applications on LLMs, but for an LLM to really act as an agent or copilot, there is one more key point: how to let the LLM follow your reasoning. An LLM only closes the skills gap; it cannot originate the problem for you. So in the LLM era, anyone with ideas and clear logic can truly soar with an LLM: if you can ask a good question, you can get a good answer.
So what are the specific reasoning architectures? Here I will mention a few of the more commonly used ones, especially those that are easiest to land in O&M scenarios.
CoT (Chain-of-Thought Prompting)
Prompting with intermediate reasoning steps enables complex reasoning capabilities. You can combine this with few-shot prompting to get better results on more complex tasks that require reasoning before answering. For concrete grounded problems such as data questions, it can significantly improve a large model's reasoning.
Unlike the traditional prompt, which maps directly from input to output (input -> output), CoT maps from input to a reasoning chain to output (input -> reasoning chain -> output).
For example, if the question is "What is the distance from New York to Los Angeles?", the model might first retrieve the coordinates of New York and Los Angeles, then calculate the distance between the two points, and finally provide the answer. In this process the model not only provides the answer but also demonstrates its reasoning, enhancing the answer's credibility.
Auto-CoT (Automatic Chain of Thought)
That is, LLMs are prompted with "Let's think step by step" to generate reasoning chains one after another. This automated process can still produce errors in the generated chains, so to mitigate their effect, diversity in the demonstrations matters. Auto-CoT samples questions with diversity and generates reasoning chains to construct the demonstrations.
Auto-CoT consists of two main phases:
- Stage 1: Question clustering: divide the given questions into several clusters
- Stage 2: Demonstration sampling: select a representative question from each cluster and generate its reasoning chain using Zero-Shot-CoT with a simple heuristic
For example, if the question is "If one apple weighs 150 grams, what is the total weight of 10 apples?", the Auto-CoT model might generate a chain of thought that reads, "Total weight of 10 apples = 10 * 150 grams = 1500 grams." In this way the user not only gets the answer but also understands how the model arrived at it.
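The two phases can be sketched as follows. The clustering here is a crude keyword grouping rather than the k-means over embeddings that Auto-CoT actually uses, and the questions are toy examples:

```python
QUESTIONS = [
    "If one apple weighs 150 grams, what do 10 apples weigh?",
    "A pear weighs 120 grams; how heavy are 5 pears?",
    "How far is New York from Los Angeles?",
    "What is the distance from Paris to Berlin?",
]

def cluster(questions):
    # Stage 1 (crude stand-in): group questions by a keyword heuristic.
    groups = {}
    for q in questions:
        lowered = q.lower()
        key = "distance" if ("distance" in lowered or "far" in lowered) else "weight"
        groups.setdefault(key, []).append(q)
    return groups

def build_demonstrations(questions):
    # Stage 2: pick one representative per cluster and attach the
    # Zero-Shot-CoT trigger so the model generates the reasoning chain.
    demos = []
    for key, qs in cluster(questions).items():
        rep = qs[0]  # simple heuristic: first question in the cluster
        demos.append(f"Q: {rep}\nA: Let's think step by step.")
    return demos

demos = build_demonstrations(QUESTIONS)
```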
Used to assist with alarm-source triage or to give troubleshooting recommendations in O&M, this produces good results. It also reduces the investment needed to train new staff and makes it easier for O&M personnel to share a unified perspective and standards.
ToT (Tree of Thoughts)
Here I should say a few words about the tree-of-thought framework in particular. "ToT" is not a universally standardized term, so its exact definition may vary across contexts and domains, but we can understand its meaning from the concept of a thinking tree.
Tree of Thoughts (ToT) is a structured method for representing and organizing the thinking process, showing the levels and branches of thinking in the form of a tree diagram. In scenarios such as decision making, problem solving, and idea generation, a thought tree helps people systematically explore possibilities and evaluate different options to make more informed decisions.
In the thinking tree:
- Root node: usually represents the starting point of a problem or decision, i.e. the core problem to be solved.
- Branches: Starting from the root node, each branch represents a possible thinking direction or solution. Branches can be further subdivided to form more detailed sub-branches that represent more specific thinking steps or sub-problems.
- Leaf node: the end of the tree, representing the end result or conclusion of the thinking process.
By constructing a thought tree, one can:
- Explore systematically: make sure all possible directions of thinking are considered to avoid missing important information or solutions.
- Evaluate and Compare: Evaluate the advantages and disadvantages of various options by comparing the results of different branches to make a more rational decision.
- Enhanced Understanding: Enhance your understanding of problems by visualizing your thought process and bring clarity to complex decision-making processes.
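A thought tree can be searched mechanically. The sketch below does a simple beam search, where expand and score are stand-ins for the LLM calls that would propose and evaluate candidate thoughts:

```python
def expand(path):
    # Hypothetical next-step candidates; a real system asks the LLM.
    return [path + [choice] for choice in ("option-a", "option-b")]

def score(path):
    # Stand-in evaluator: prefer paths with more "option-a" steps.
    return path.count("option-a")

def tree_of_thought(depth=3, beam=2):
    frontier = [[]]                       # root: the unsolved problem
    for _ in range(depth):
        # Branch: expand every path on the frontier.
        candidates = [p for path in frontier for p in expand(path)]
        # Evaluate and compare: keep only the best few branches.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]                    # best leaf path

best = tree_of_thought()
```

The root/branch/leaf structure above maps directly onto frontier, expand, and the returned path; the quality of the whole method hinges on how well score mirrors real decision quality.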
So far we have not gotten particularly good results with ToT, probably because our construction still contains unreasonable definitions or imprecise parsing. But for rational resource investment, supply-chain management, and improving decision quality and efficiency, it should have a natural advantage. If anyone has made deep attempts with ToT and has suggestions, please share them; thanks in advance.
ReAct (Reasoning and Acting)
My personal summary of this framework: it can be understood as a framework that combines reasoning and action, mainly used to improve an AI system's decision-making in complex environments and its execution efficiency. The core idea of ReAct is to help the AI system reason and decide more accurately by retrieving relevant information in real time and executing actions based on that information.
In the ReAct framework, the AI system not only relies on its pre-trained knowledge, but also actively retrieves external information (e.g., databases, web resources, etc.) when it encounters a new situation and integrates this information into its decision-making process. This process can be viewed as a cycle of AI systems between Reasoning and Acting, in which:
- Reasoning: the AI system reasons and plans based on the current state and goals to determine the next action to take or the next information to retrieve.
- Acting: based on the reasoning results, the AI system performs the corresponding actions, such as retrieving information or executing a task.
- Feedback: the AI system updates its state and knowledge based on the results of its actions and then enters the thinking phase again, forming a closed loop.
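The loop can be sketched as follows. The "reasoner" is scripted here and lookup_status is a hypothetical tool, but the Thought, Action, Observation cycle matches the description above:

```python
def lookup_status(service):
    # Hypothetical external tool (e.g. a monitoring API).
    return {"nginx": "down", "mysql": "up"}.get(service, "unknown")

# Scripted thought/action pairs; in practice each pair comes from the LLM,
# which sees the previous observation before producing the next thought.
SCRIPT = [
    ("Thought: the alert mentions nginx; check its status.",
     ("lookup_status", "nginx")),
    ("Thought: nginx is down; that explains the alert.", None),
]

def react_loop():
    trace = []
    for thought, action in SCRIPT:
        trace.append(thought)                    # reasoning step
        if action is None:                       # final step: answer
            trace.append("Answer: restart nginx")
            break
        tool, arg = action
        observation = lookup_status(arg)         # act, then observe
        trace.append(f"Observation: {arg} is {observation}")
    return trace

trace = react_loop()
```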
The strength of the ReAct framework is that it enables AI systems to adapt to changing environments and deal with previously unseen situations, rather than just relying on pre-trained data. By retrieving and integrating new information in real time, AI systems can make more accurate and flexible decisions, improving their performance in complex tasks.
To summarize: ReAct is Reason + Action, while CoT and ToT are Reason only. The essential difference is that ReAct does not just reason; it also uses external tools to achieve the goal. I hope this explanation is clear enough.
Operation and Maintenance Scenario Application
- Alarm Analysis and Troubleshooting: Utilizes CoT and Auto-CoT to assist in troubleshooting and provide decision support.
- Resource Management and Optimization: The TOT framework helps to systematically analyze resource allocation and improve O&M efficiency.
- Dynamic Decision Making and Execution: ReAct framework realizes decision making and action based on real-time information in complex operation and maintenance scenarios.
Through in-depth exploration and practice, we are gradually building an O&M system based on LLM, aiming to improve O&M efficiency and observability. In the future, we will continue to explore more innovative scenarios, promote the widespread application of AI technology in the field of operation and maintenance, and look forward to joining hands with more peers to create a new chapter of operation and maintenance intelligence.
The purpose of this article is to share practice and thinking on AI in the O&M field, exploring how to effectively improve O&M efficiency and observability through Semantic Kernel, LangChain, RAG, fine-tuning, and other techniques, combined with prompt engineering and reasoning architectures. We look forward to working with more technology explorers and practitioners to promote the innovation and development of AI in O&M.
This article originated from the second CCF-Nightingale Developer Innovation Forum. Nightingale is an open source monitoring system with nearly 10,000 GitHub stars; project address: /ccfos/nightingale