
Standing Out from the Mass of Information: A Workflow-Based Intelligent Analysis Solution That Uses Large Language Models to Build an Accurate Summarization and Scoring System for AI Tech Articles (General)

Popularity:745 ℃/2024-08-22 11:19:05


1. Introduction

The project gathers high-quality content on programming, AI, product design, business, technology, and personal growth from leading technology companies and communities. Using large language models, it summarizes, scores, and translates the selected articles, automating the whole process from initial evaluation through in-depth analysis to publication. Moving this processing onto a Workflow platform improved both the speed and the quality of content handling, giving learners and professionals from different backgrounds a more convenient, accurate, and varied reading experience.

Its main principle is to collect high-quality blog posts from various fields through RSS feeds and crawlers, and filter and evaluate them through a large language model to improve the quality and efficiency of the content. Its core features include:

  • Accurate core summaries for efficient information access: A large language model distills the essence of each article, so that busy readers can grasp the key points quickly and read more efficiently.
  • Intelligent multi-dimensional scoring for quality content selection: Article sources are strictly screened, and a large language model evaluates each article along several dimensions, including content depth, writing quality, practical value, and relevance, so that only carefully selected content is recommended.
  • Seamless multi-language translation and global knowledge sharing: Translation removes the language barrier so that developers can access and absorb expertise and insights from around the world.
  • Workflow Advantage

The original website handled article summarization, tag generation, scoring, and translation with a single all-in-one prompt. This approach caused several problems: key information was omitted from summaries, tags were inconsistent, the scoring mechanism was hard to adjust, translations were stiff, and modification, testing, and deployment were inefficient during operation and maintenance.

1.0 How to choose a Workflow platform

Reference article: RAG+AI Workflow+Agent: How to Choose an LLM Framework, a Comprehensive Comparison of MaxKB, Dify, FastGPT, RagFlow, Anything-LLM, and More

When choosing an AI application development platform, it is important to understand the features, community support, and ease of deployment of the different platforms.

1.0.1 Advantages and Disadvantages of MaxKB/Dify

  • Advantages

    • Flexible large-model access: Supports many large models through a variety of API interfaces, so developers can select and switch models as needed, which matters most for applications that require high-performance models.

    • Powerful Chat Features: The Chat feature not only supports multi-round conversations, but also enhances the user experience through intelligent recommendations and contextual understanding for scenarios that require complex interactions.

    • Rich knowledge base support: Built-in knowledge base management system that supports the import and export of multiple data formats, making it easy for users to manage and utilize knowledge resources.

    • Efficient Workflow design: The Workflow designer is simple and intuitive and supports drag-and-drop operation, so non-technical users can get started quickly and the barrier to entry is much lower.

    • Prompt IDE: The Prompt IDE tool provided allows developers to debug and optimize prompts more intuitively, improving development efficiency.

  • Disadvantages

    • Learning curve: Although the interface is user-friendly, beginners still need some time to become familiar with its workflows and functions.

    • Community Support: Compared to some mature development platforms, the community activity and resourcefulness still needs to be improved, which may affect the speed of developers in solving problems.

    • Degree of customization: While Dify offers a wealth of functionality, further development and tweaking may be required for certain highly customized requirements.

1.0.2 Advantages and Disadvantages of FastGPT/RagFlow

  • Advantages

    • Agents: The agent capabilities are powerful enough to automate complex tasks and reduce the need for human intervention, which suits scenarios where many tasks must be automated.

    • LLMOps Support: LLMOps support makes it easier for developers to train, optimize, and deploy models, which is critical for continuous iteration and optimization of AI models.

    • Backend as a service: Provides backend-as-a-service functionality that simplifies backend development, so that developers can focus on the frontend and business logic.

    • Powerful RAG engine: The RAG engine can efficiently process and retrieve large amounts of data for application scenarios that require fast response and high throughput.

  • Disadvantages

    • Functional complexity: FastGPT's functions are relatively complicated, so beginners may need more time to master them.

    • Deployment difficulty: Compared to some lightweight development platforms, the deployment process of FastGPT can be more complex and requires some technical background and experience.

    • User interface: Although FastGPT is powerful, its user interface may not be as intuitive and friendly as those of some competitors, which may affect the user experience.

1.0.3 Selection of platforms based on needs

Choosing the right platform starts with defining your needs. Dify and FastGPT each have their own strengths and suit different application scenarios.

  • MaxKB/Dify: Suitable for developers who need to build and deploy AI applications quickly, it provides a rich set of pre-built templates and integration tools that enable developers to get started quickly, and is especially suitable for beginners and teams that need to validate ideas quickly.

  • FastGPT/RagFlow: Suitable for enterprise-level users who need highly customized and complex workflows, providing a powerful RAG engine and Workflow orchestration that can handle complex business logic and data processing needs.

  • The following factors should be considered when selecting a platform:

    • Project Size: If it's a small project or startup team, the rapid deployment and ease of use of MaxKB/Dify may be more appropriate. If it's a large enterprise-level project, the power and customization of FastGPT/RagFlow is more appropriate.

    • Technology stack: Consider the team's existing technology stack and its members' technical backgrounds. The platforms differ in technical implementation, so choosing one that matches the team's stack reduces learning costs and development difficulty.

    • Functional requirements: Define the core functions required by the project, such as large model access, Chat function, knowledge base, etc. Dify and FastGPT have their own advantages in these functions, so choose according to the specific needs.

1.0.4 Community vs. Support

Community support and resourcefulness are also critical to platform selection.

  • MaxKB/Dify: has an active community that provides a wealth of documentation, tutorials and sample code. Community members often share tips and solutions, so you can get quick help for any problems you encounter.

  • FastGPT/RagFlow: The community is relatively small, but offers a professional technical support team. For enterprise-level users, FastGPT provides customized technical support and consulting services to ensure the smooth running of the project.

  • The following factors should be considered when selecting a platform:

    • Community Activity: An active community means more resources and faster problem solving, which benefits developers who need to resolve issues quickly.

    • Technical Support: For enterprise-level users, professional technical support is crucial, so a platform that offers it suits teams with demanding support requirements.

1.0.5 Deployment and Ease of Use

Ease of deployment and use directly impacts development efficiency and cost.

  • MaxKB/Dify: provides an easy-to-use interface and one-click deployment features, enabling developers to quickly deploy applications to the cloud or locally. The documentation is detailed and suitable for beginners to get started quickly.

  • FastGPT/RagFlow: Relatively complex to deploy and requires some technical background and configuration. Provides powerful customization capabilities, suitable for users with high requirements for performance and functionality.

  • The following factors should be considered when selecting a platform:

    • Deployment Difficulty: MaxKB/Dify's deployment process is simple and suitable for developers who need to deploy quickly. FastGPT/RagFlow's deployment is relatively complex, but offers more configuration options.

    • Ease of Use: MaxKB/Dify's user interface is friendly and easy to use. FastGPT/RagFlow's user interface is relatively complex, but offers more features and customization options.

1.1 RSS feeds

Site articles are sourced from all of the following RSS feeds (200):

WeChat official accounts are converted to RSS with the wewe-rss project. The supported WeChat official-account RSS feeds (200) are organized as follows:

Specific information can be found in the code source file.

  • For more technical details, refer to RSSHub: /DIYgod/RSSHub

  • wewe-rss: /cooderl/wewe-rss

  • Ali technology

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http:///2005/Atom">
    <id>/feeds/MP_WXS_3885737868.atom</id>
    <title>AliTech</title>
    <updated>2024-04-18T09:37:25.000Z</updated>
    <generator>WeWe-RSS</generator>
    <author>
        <name>AliTech</name>
    </author>
    <link rel="alternate" href="/feeds/MP_WXS_3885737868.atom"/>
    <subtitle>Ali technology official number, Ali's hardcore technology, cutting-edge innovation, open source projects are here.</subtitle>
    <logo>/mmhead/Q3auHgzwzM4bGHHEe4N3y73ILDk0Jv7DPug7bZoBE1lFlYGxbvQJHg/0</logo>
    <icon>/mmhead/Q3auHgzwzM4bGHHEe4N3y73ILDk0Jv7DPug7bZoBE1lFlYGxbvQJHg/0</icon>
    <entry>
        <title type="html"><![CDATA["JVM" on AOP: Java Agent Practices]]></title>
        <id>0ReJc-4df5Ga0FacvNbz6Q</id>
        <link href="/s/0ReJc-4df5Ga0FacvNbz6Q"/>
        <updated>2024-08-16T00:41:14.000Z</updated>
    </entry>
    <entry>
        <title type="html"><![CDATA[MySQL 8.0: problem analysis of filesort performance degradation]]></title>
        <id>nvsuJnHXVfP8m08uJSElUg</id>
        <link href="/s/nvsuJnHXVfP8m08uJSElUg"/>
        <updated>2024-08-13T07:52:16.000Z</updated>
    </entry>
    <entry>
        <title type="html"><![CDATA[Front-end online code editor technical miscellany]]></title>
        <id>VEV6RmOdZpAg7RBQzDmfUA</id>
        <link href="/s/VEV6RmOdZpAg7RBQzDmfUA"/>
        <updated>2024-08-08T14:38:21.000Z</updated>
    </entry>
</feed>
  • NIC
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http:///2005/Atom">
    <id>/feeds/MP_WXS_3271041950.atom</id>
    <title>New Zeal</title>
    <updated>2024-04-09T05:05:12.000Z</updated>
    <generator>WeWe-RSS</generator>
    <author>
        <name>NIC</name>
    </author>
    <link rel="alternate" href="/feeds/MP_WXS_3271041950.atom"/>
    <subtitle>Intelligence + China main platform, dedicated to promoting China from the Internet + towards a new era of intelligence +. Focusing on the development of artificial intelligence, robotics and other cutting-edge fields, it pays attention to the impact of human-machine fusion, artificial intelligence and robotics revolution on the evolution of human society and civilization, and navigates China's new intelligent era.</subtitle>
    <logo>/mmhead/Q3auHgzwzM5Ge0ZibsJqTzd6HdTSHcydlic4TnsmpJicUrIlicD1L9ficFw/0</logo>
    <icon>/mmhead/Q3auHgzwzM5Ge0ZibsJqTzd6HdTSHcydlic4TnsmpJicUrIlicD1L9ficFw/0</icon>
    <entry>
        <title type="html"><![CDATA[Millions Online, Return of the Great Sage! "Black Myth: Wukong" is a stone's throw from the RTX 4090D flying over the Mountain of Flowers and Fruits]]></title>
        <id>_b5XI5sTqQmZpmRCRO5p_w</id>
        <link href="/s/_b5XI5sTqQmZpmRCRO5p_w"/>
        <updated>2024-08-20T04:53:07.000Z</updated>
    </entry>
    <entry>
        <title type="html"><![CDATA[Sequoia Capital Partner Foresight: Three Elements of Big Models Are Obsolete, Power, Servers, and Steel Become Key to Winning]]></title>
        <id>7ZI88g5rCHqw3hEr1lI-Pg</id>
        <link href="/s/7ZI88g5rCHqw3hEr1lI-Pg"/>
        <updated>2024-08-20T04:53:07.000Z</updated>
    </entry>
    <entry>
        <title type="html"><![CDATA[Another battleground in the endless struggle between AI and humans: CAPTCHA]]></title>
        <id>8_k6_6rd36U7MIeWwiezyA</id>
        <link href="/s/8_k6_6rd36U7MIeWwiezyA"/>
        <updated>2024-08-20T04:53:07.000Z</updated>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI anime head startup hiring AI algorithm interns/engineers! Founding team B-site turned in their origins and received nearly 100 million yuan in financing]]></title>
        <id>ig6NVfleqAFgggz2Xhv8qw</id>
        <link href="/s/ig6NVfleqAFgggz2Xhv8qw"/>
        <updated>2024-08-20T04:53:07.000Z</updated>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI designs itself, the Code Creator is here! UBC Chinese one-author first mentioned ADAS, math skills skyrocketed 25.9%]]></title>
        <id>IjNLHLov8UyAiRGDkf_XTA</id>
        <link href="/s/IjNLHLov8UyAiRGDkf_XTA"/>
        <updated>2024-08-20T04:53:07.000Z</updated>
    </entry>
</feed>
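As an illustration of how such a feed could be consumed, here is a minimal sketch using only Python's standard library. It assumes the feed declares the standard Atom namespace; the function name `parse_atom_entries` and the shortened `SAMPLE_FEED` are illustrative and not taken from the project.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (assumed to be what the feed declares).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Shortened sample modeled on the wewe-rss output shown above.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>AliTech</title>
    <entry>
        <title type="html"><![CDATA[MySQL 8.0: problem analysis of filesort performance degradation]]></title>
        <id>nvsuJnHXVfP8m08uJSElUg</id>
        <link href="/s/nvsuJnHXVfP8m08uJSElUg"/>
        <updated>2024-08-13T07:52:16.000Z</updated>
    </entry>
</feed>"""


def parse_atom_entries(xml_text: str) -> list[dict]:
    """Return one dict per <entry> with its title, id, link and update time."""
    # Parse from bytes so the encoding declaration is honored by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip(),
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS).strip(),
            "link": link.get("href", "") if link is not None else "",
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS).strip(),
        })
    return entries


if __name__ == "__main__":
    for item in parse_atom_entries(SAMPLE_FEED):
        print(item["updated"], item["title"], item["link"])
```

The link values in these feeds are relative, so a real crawler would still need to resolve them against the feed host before fetching the full text.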

1.2 Principles of implementation

  1. Article crawling process: Based on the RSS protocol, the crawler collects article information (title, link, publish time, etc.) from all subscribed feeds, then fetches the full text through the link with a headless browser. The body text is extracted with the body selector defined on the feed, its HTML and images are processed, and the article is added to the list of articles to be processed.
  2. Initial article evaluation process: Articles receive an initial score based on language, content, and other characteristics, which eliminates low-quality articles and marketing content and reduces the load on later steps.
  3. Article analysis process: A large language model summarizes, categorizes, and scores each article, generating a one-sentence summary, an abstract, main ideas, notable quotes, domains, a tag list, and a score, so that readers can filter articles quickly, grasp the main content, and decide whether to read on. The nodes include segmentation, aggregate analysis, domain classification and tag generation, article scoring, check and reflect, and optimization and improvement.
  4. Analysis result translation process: A large language model translates the analysis results. The website currently supports Chinese and English, and translates the summary, main ideas, notable quotes, tag list, etc. according to the original language and the target language. The steps include terminology recognition and initial translation, translation checking, and free translation.

1.2.1 Process of initial article evaluation

Process Description:

  • To simplify testing and API calls, this workflow takes the website's article ID as input. Workflow's built-in HTTP request node and code node call the website's API to fetch the article's metadata (title, source, link, language, etc.) and full text.
  • Chinese and English articles use different models and prompts, which makes it easier to tune the process for the characteristics of each language.
  • The initial-evaluation LLM node uses the CO-STAR prompt framework to specify the context, objective, analysis steps, and input/output formats, and provides sample outputs. The complete prompt setup can be viewed at the above project address.
  • The web application passes in the article ID and obtains the initial evaluation result by calling the open API of Dify Workflow. The ignore and value attributes in the result determine whether the article continues to subsequent processing.
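The last step can be sketched roughly as follows. This is a hedged illustration, not the project's code: the request shape follows Dify's workflow-run endpoint as commonly documented (`POST /workflows/run` with `inputs`, `response_mode`, and `user`), while the base URL, the `article_id` input name, the assumption that `ignore` and `value` appear as top-level workflow outputs, and the `MIN_VALUE` cut-off are all hypothetical and should be checked against your own deployment.

```python
import json
import urllib.request

# Hypothetical values: replace with your own Dify deployment and settings.
DIFY_BASE_URL = "https://dify.example.com/v1"
MIN_VALUE = 3  # assumed cut-off; the article does not state a threshold


def build_run_request(article_id: str, api_key: str) -> urllib.request.Request:
    """Build the HTTP request that starts the initial-evaluation workflow."""
    payload = {
        "inputs": {"article_id": article_id},  # input variable name is an assumption
        "response_mode": "blocking",           # wait for the final result
        "user": "article-pipeline",
    }
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/workflows/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_initial_evaluation(article_id: str, api_key: str) -> dict:
    """Call the workflow and return its outputs (performs a network request)."""
    with urllib.request.urlopen(build_run_request(article_id, api_key), timeout=120) as resp:
        body = json.load(resp)
    return body["data"]["outputs"]


def should_continue(result: dict, min_value: int = MIN_VALUE) -> bool:
    """Decide from the ignore and value attributes whether to process the article further."""
    return not result.get("ignore", True) and int(result.get("value", 0)) >= min_value
```

Keeping the decision in a separate function such as `should_continue` makes the threshold easy to test and adjust without calling the workflow.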

1.2.2 Article Analysis Process

Process Description:

  • The input to the analysis process is also the website's article ID. Workflow's built-in HTTP request node and code node call the website's API to obtain the article's metadata (title, source, link, language, etc.) and full text.
  • To avoid missing key information, the analysis process first checks the length of the article. If the article is longer than 6000 characters, it is split into segments that are analyzed separately; otherwise the full text is analyzed directly.
  • The analysis output consists mainly of a one-sentence summary, an article summary, keywords, main ideas, and highlights, which help the reader grasp the core content of the article quickly.
  • The analysis process relies on Workflow's branching, iteration, and variable aggregation nodes to control the flow. Whichever branch runs, variable aggregation merges its analysis results into a single variable for the subsequent nodes.
  • Next is the domain classification and tag generation node. A large language model classifies the article to generate the list of domains and tags it belongs to. The tags cover topics, technologies, application domains, products, companies, platforms, notable people, trends, etc., which helps organize articles and improves search and recommendation.
  • In the article scoring node, a large language model evaluates the article along multiple dimensions, including depth of content, quality of writing, usefulness, and relevance. The resulting score helps readers find quality articles quickly.
  • The subsequent check and reflect node asks the large language model to act as a technical article reviewer. It checks the preceding outputs for comprehensiveness, accuracy, consistency, etc., and outputs the results of the check and reflection.
  • Finally, an optimization and improvement node acts on the results of the check and reflection. Here, the large language model reviews the check results and the analysis, reconfirms the output format and language, and outputs the optimized analysis results along with the reason for each update.
  • The web application passes in the article ID, calls Workflow's open API, and saves the returned analysis results. The article's score determines whether the article continues to subsequent processing.
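The length check and segmentation step could look like the sketch below. Only the 6000-character threshold comes from the description above; packing paragraphs greedily and hard-splitting oversized paragraphs is an assumed strategy, and `split_article` is a hypothetical name.

```python
MAX_CHARS = 6000  # threshold stated in the process description


def split_article(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Return [text] for short articles, otherwise chunks of at most max_chars.

    Paragraphs (separated by blank lines) are packed greedily into chunks;
    a paragraph longer than max_chars is cut at the limit.
    """
    if len(text) <= max_chars:
        return [text]  # short article: analyze the full text directly

    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split any paragraph that cannot fit into a single chunk.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go through the iteration node, with the per-segment results merged afterwards, as described above.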

1.2.3 Translation process for analyzing results

Process Description:

  • The input to the translation process is the website's article ID. Workflow's built-in HTTP request node and code node call the website's API to get the article's metadata (title, source, link, original language, target language, etc.), as well as the full text and the analysis results.
  • The translation process uses three stages: initial translation, check and reflect, and optimization and improvement with an emphasis on free translation. This design brings the translation closer to the expression habits of the target language and improves its accuracy and naturalness.
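The three stages can be sketched as three chained model calls. This is an illustrative outline only: `translate_three_stage` and the prompt wording are hypothetical, and `llm` stands for any function that sends a prompt to a model and returns its text reply.

```python
from typing import Callable

# Any function mapping a prompt string to the model's reply.
LLM = Callable[[str], str]


def translate_three_stage(text: str, source_lang: str, target_lang: str, llm: LLM) -> dict:
    """Run initial translation, check and reflect, then an improved free translation."""
    # Stage 1: initial translation that stays close to the source.
    draft = llm(
        f"Translate the following text from {source_lang} to {target_lang}. "
        f"Keep technical terms accurate.\n\n{text}"
    )
    # Stage 2: review the draft against the source and list concrete problems.
    review = llm(
        f"You are a translation reviewer. Compare the {target_lang} draft with the "
        f"{source_lang} source and list inaccuracies or unnatural phrasing.\n\n"
        f"Source:\n{text}\n\nDraft:\n{draft}"
    )
    # Stage 3: rewrite the draft using the review, favoring natural expression.
    final = llm(
        f"Rewrite the draft in natural {target_lang}, fixing the issues in the review "
        f"and preferring free translation over word-for-word rendering.\n\n"
        f"Source:\n{text}\n\nDraft:\n{draft}\n\nReview:\n{review}"
    )
    return {"draft": draft, "review": review, "final": final}
```

Passing the model in as a callable keeps the sketch independent of any particular provider and makes it easy to test with a stub.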

2. Initial article evaluation process


Article id fetch

2.1 Initial Article Review LLM Node

2.1.1 System Prompt

Below are the prompts for the initial evaluation of the Chinese articles; for the English articles, the prompts are simply translated into English.

(C) Context: You are an advanced content analysis assistant who screens articles for a website aimed at technology practitioners, entrepreneurs, and product managers. The site focuses on collecting and sharing high-quality content about software development, artificial intelligence, product management, marketing, design, business, technology, and personal growth.

(O) Objective: Your task is to quickly analyze a given article and decide whether it should be ignored. You need to identify low-value, irrelevant, or poor-quality content while making sure you do not miss potentially high-value articles.

(S) Style: Please analyze and judge articles in the style of an experienced content curator. You should be concise, to the point, and able to quickly identify the core value of the article.

(T) Tone: Maintain a professional, objective tone. Your analysis should be based on facts and clear criteria, not subjective feelings.

(A) Audience: The results of your analysis will be used by the site's content management team, who will need to make quick decisions about whether to include articles in the site's content library.

(R) Response: Please output your analysis in JSON format, written in Chinese, with the following fields:
- ignore: Boolean indicating whether the article should be ignored or not
- reason: string, briefly explain the main reason for the judgment (limited to 30-50 words)
- value: an integer rating from 0-5 indicating the value of the article (0 means it should be ignored, 1-5 means the value level)
- summary: a one-sentence summary of the article's main content
- language: a string indicating the language of the article (e.g. "Chinese", "English", "Japanese", etc.)

Please analyze the article according to the following criteria:

1. Language: whether the article is in Chinese or English. If not, ignore it.
2. Content type: whether it is substantive content, not simply announcements, event previews, advertisements or chit-chat.
3. Topic relevance: whether it is related to the target area (software development, artificial intelligence, product management, marketing, design, business, technology and personal growth, etc.).
4. Quality and value:
   - Content depth: whether it provides insights, unique perspectives, or valuable information
   - Technical depth: for technical articles, assess the level of expertise and technical details
   - Practicality: whether it can inspire thinking or provide practical solutions.

Scoring criteria:
- 0: Articles that should be ignored (not in English or Chinese or completely irrelevant)
- 1: Low quality or largely irrelevant, not recommended reading
- 2: Low quality or weakly relevant, but may have a small amount of reference value
- 3: Average quality, relevant and in-depth, but lacking in unique insight or innovation, worth reading
- 4: High quality, provides valuable insights or practical information, recommended reading
- 5: Very high quality, provides in-depth analysis, innovative ideas or important solutions, highly recommended reading

Note: For obviously irrelevant or low-quality articles, you can make a judgment based only on the title and the beginning part, without reading the whole article.

The input format for articles is XML and includes the following fields:
- `<title>`: title of the article
- `<link>`: the link to the article
- `<source>`: source of the article
- `<content>`: content of the article, formatted in Markdown and included in CDATA

Here are some sample outputs for your reference:

Example 1 (high value technical article):
{
  "ignore": false,
  "reason": "An in-depth look at the application of machine learning in recommender systems, with detailed algorithm descriptions and code examples",
  "value": 5,
  "summary": "A detailed introduction to collaborative filtering algorithms for building recommender systems, including theoretical explanations and implementation details",
  "language": "English"
}

Example 2 (high quality design article):
{
  "ignore": false,
  "reason": "Showcases examples of excellent UI design, analyzing design features and user experience considerations",
  "value": 4,
  
  "language": "English"
}

Example 3 (related and average quality article):
{
  "ignore": false,
  "reason": "Discusses the pros and cons of working remotely, relevant content but lacks in-depth insights",
  "value": 3,
  
  "language": "Chinese"
}

Example 4 (articles with excessive marketing tendencies):
{
  "ignore": true,
  "reason": "Over-marketed and lacking in substance and unique insights",
  "value": 0,
  "summary": "Promotes new project management tool, lacks detailed feature analysis and user examples",
  "language": "Chinese"
}

Example 5 (Boundary case: relevant but not professional enough article):
{
  "ignore": false,
  "reason": "Technically relevant but skewed toward consumer advice, informative for some readers",
  "value": 2,
  
  "language": "Chinese"
}

Example 6 (low value article):
{
  "ignore": true,
  "reason": "Simple product release notification that lacks substance",
  "value": 0,
  "summary": "A company will be releasing a new smartphone, containing only information on when and where it will be released",
  "language": "Chinese"
}

Example 7 (article in non-target language):
{
  "ignore": true,
  "reason": "The article is not in Chinese or English",
  "value": 0,
  "summary": "The article language does not meet the requirements",
  "language": "Japanese"
}

Note that the value, summary, and language fields should be provided even for articles that should be ignored. The value field should reflect the potential value of the article to the target audience, even if that value is low or 0. The summary field should briefly summarize the main content of the article, whether or not it is relevant. The language field should always indicate the language of the article.
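Because the downstream application depends on this JSON shape, it can help to validate the model's reply before using it. The following is a minimal sketch under the field definitions given in the prompt above; `parse_evaluation` is a hypothetical helper, not part of the project.

```python
import json

# Field names and types as specified in the Response section of the prompt.
REQUIRED_FIELDS = {
    "ignore": bool,
    "reason": str,
    "value": int,
    "summary": str,
    "language": str,
}


def parse_evaluation(raw: str) -> dict:
    """Parse the LLM reply and check it against the expected schema."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for "value".
        if expected is int and isinstance(value, bool):
            raise ValueError("field 'value' must be an integer, not a boolean")
        if not isinstance(value, expected):
            raise ValueError(f"field '{field}' must be of type {expected.__name__}")
    if not 0 <= data["value"] <= 5:
        raise ValueError("field 'value' must be between 0 and 5")
    return data
```

A reply that fails validation could be retried or logged instead of being saved, which keeps malformed model output out of the article database.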

2.1.2 User Prompt

Please analyze based on the following article as per the requirement and output the JSON string in the specified format.

<article>
  <title>{{##}}</title>
  <link>{{##}}</link>
  <source>{{##}}</source>
  <content>
    <![CDATA[
        {{##}}
    ]]>
  </content>
</article>
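In Dify the placeholders are filled in by the workflow itself. When assembling the same input outside the workflow, for example in a test script, a sketch like the following could build it; `build_article_xml` is a hypothetical helper. It escapes the metadata fields and splits any `]]>` sequence in the Markdown, since that sequence cannot appear inside a single CDATA section.

```python
from xml.sax.saxutils import escape


def build_article_xml(title: str, link: str, source: str, content_markdown: str) -> str:
    """Build the <article> input block expected by the user prompt."""
    # "]]>" would terminate the CDATA section early, so split it across two sections.
    safe_content = content_markdown.replace("]]>", "]]]]><![CDATA[>")
    return (
        "<article>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <link>{escape(link)}</link>\n"
        f"  <source>{escape(source)}</source>\n"
        "  <content>\n"
        f"    <![CDATA[\n{safe_content}\n    ]]>\n"
        "  </content>\n"
        "</article>"
    )


if __name__ == "__main__":
    print(build_article_xml(
        "Java Agent Practices",
        "/s/0ReJc-4df5Ga0FacvNbz6Q",
        "AliTech",
        "# Heading\n\nBody text with <tags> & symbols.",
    ))
```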

2.2 Test examples

Chinese article test results

English article test results

3. Article analysis process

Process Description:

  • The input to the analysis process is likewise the article ID on the website. Using Workflow's built-in HTTP request node and code node, we call the website's API to fetch the article's metadata (title, source, link, language, etc.) and its full text.
  • To avoid missing key information, the analysis process first checks the article's length. If the article is longer than 6000 characters, it is split into segments for analysis; otherwise the full text is analyzed directly.
  • The analysis output consists mainly of a one-sentence summary, an article summary, keywords, main points, and key quotes, which help the reader quickly grasp the core content of the article.
  • The analysis process makes full use of Workflow's branching, iteration, and variable aggregation nodes, which give us flexible control over the flow. Whichever branch runs, variable aggregation merges the analysis results into a single variable, which is convenient for the subsequent nodes.
  • Next comes the domain classification and tag generation node. A large language model classifies the article's content and generates the list of domains and tags the article belongs to. The tags cover topics, technologies, application domains, products, companies, platforms, notable people, trends, and so on, which helps organize articles later and improves search and recommendation.
  • In the article scoring node, a large language model evaluates the article along several dimensions, including depth of content, quality of writing, practical value, and relevance. The resulting score helps readers quickly filter for quality articles.
  • The subsequent check-and-reflection node asks the large language model to play the role of a technical article reviewer. It checks the preceding outputs for comprehensiveness, accuracy, and consistency, and outputs the results of the check along with its reflections.
  • Finally, an optimization and improvement node acts on the results of the check and reflection. Here the large language model analyzes the review findings, reconfirms the output format and language, and outputs the optimized analysis results together with the reason for each update.
  • The web application passes in the article ID, calls Workflow's open API, and retrieves and saves the article's analysis results. Based on the article's score, we can decide whether the article needs further follow-up processing.
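The length check and segmentation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 6000-character threshold comes from the process description, but the paragraph-aligned splitting strategy and the name `split_article` are hypothetical, and the actual Workflow code node may split differently.

```python
MAX_CHARS = 6000  # threshold stated in the process description


def split_article(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Return [text] if it is short enough, else paragraph-aligned segments."""
    if len(text) <= max_chars:
        return [text]

    segments: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split a single paragraph that is itself longer than the limit.
        while len(paragraph) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current segment.
            segments.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment would then be analyzed in an iteration node, and the per-segment results merged by a variable aggregation node before classification and scoring.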

Running time: 157.478s, total token consumption: 29114 Tokens

When batch processing a large number of articles, you can pass the metadata and content of each article directly as inputs to the start node instead of fetching them through the HTTP interface.
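The two calling modes can be sketched as follows. This assumes the Workflow platform is Dify (listed in the reference links) and that its workflow-run endpoint takes a JSON body with `inputs`, `response_mode`, and `user`; check this against the platform's API documentation. The input variable names (`article_id`, `title`, and so on) and the helper names are hypothetical, since the article does not list the start node's variables. The sketch only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical start-node variables for batch mode.
ARTICLE_KEYS = ("title", "source", "link", "language", "content")


def build_inputs(article_id: str, article: Optional[dict] = None) -> dict:
    """Start-node inputs: the ID alone, or the ID plus pre-fetched article data."""
    inputs = {"article_id": article_id}
    if article is not None:
        # Batch mode: supply everything up front so the workflow can skip its HTTP node.
        for key in ARTICLE_KEYS:
            inputs[key] = article[key]
    return inputs


def build_run_request(base_url: str, api_key: str, inputs: dict, user: str) -> urllib.request.Request:
    """Build (but do not send) a blocking workflow-run request."""
    body = {"inputs": inputs, "response_mode": "blocking", "user": user}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflows/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would execute the workflow. With a run time of around 157 seconds per article, a generous timeout or the streaming response mode is likely needed.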

{
  "oneSentenceSummary": "Alibaba's Qwen large model team has released Qwen2-Math, the world's strongest large math model, which offers a user-friendly demo, supports uploading images of problems to solve, and is planned to be followed by a single model that combines multimodal capabilities and mathematical reasoning.",
  "oneSentenceSummaryUpdateReason": "Emphasizes the model's user-friendliness and the future plans for technology integration, making the summary more refined and concrete.",

  "summary": "Alibaba's Qwen large model team has released Qwen2-Math, the world's strongest large math model, along with an online demo that lets users upload math problems as images to be solved. The model accepts questions in Chinese, and although it currently focuses on English scenarios, a bilingual version will be released in the future. Qwen2-Math comes in three parameter sizes, with the flagship model, Qwen2-Math-72B-Instruct, excelling on the MATH dataset and outperforming models such as GPT-4o. In addition, the team plans to combine multimodal capabilities and mathematical reasoning into one model to further improve performance and user experience.",
  "summaryUpdateReason": "Added details of the model's performance on the MATH dataset to make the summary more comprehensive and specific.",

  "domain": "Artificial Intelligence",
  "domainUpdateReason": "No update; the original categorization accurately reflects the core content of the article.",

  "aiSubcategory": "AI Models",
  "aiSubcategoryReason": "No update; the original categorization accurately reflects the core content of the article.",

  "tags": ["Large Math Models", "Multimodal Models", "Mathematical Reasoning", "Alibaba Qwen Large Model", "Chinese Language Support", "Image Recognition", "Tech Demo", "Education Technology"],
  "tagsUpdateReason": "Added the 'Education Technology' tag to reflect the potential application of the model in the education domain.",

  "mainPoints": [
    {
      "point": "The world's strongest large math model released",
      "explanation": "Qwen2-Math, developed by Alibaba's Qwen large model team, is the world's strongest large math model. It supports uploading images of problems to solve and is easy to use."
    },
    {
      "point": "Future plans for combining multimodal capabilities and mathematical reasoning",
      "explanation": "Junyang Lin, a senior algorithm expert at Alibaba, revealed future plans to combine multimodal capabilities and mathematical reasoning into one model to provide more comprehensive functionality."
    },
    {
      "point": "Model performance and applications",
      "explanation": "Qwen2-Math-72B-Instruct performs well on the MATH dataset with an accuracy of 84% and outperforms several well-known models such as GPT-4o and Claude 3.5."
    }
  ],
  "mainPointsUpdateReason": "Added details on the model's performance on the MATH dataset to make the main points more specific and compelling.",

  "keyQuotes": [
    "Now the most powerful large math model is available for everyone to try out!",
    "But in the near future, we'll be combining multimodal capabilities and mathematical reasoning into one model.",
    "Qwen2-Math-72B-Instruct handles a wide range of math problems in algebra, geometry, counting and probability, number theory, and more with 84% accuracy."
  ],
  "keyQuotesUpdateReason": "No update; the original key quotes accurately reflect the core of the article and the model.",

  "score": 88,
  "scoreUpdateReason": "Added recognition of the model's innovation and usefulness in combining mathematical problem solving with multimodal capabilities, raising the score.",
  "improvements": "Added the 'Education Technology' tag to reflect potential applications of the model in education; added more specific examples of model performance and applications to the summary and main points to make them more useful and appealing; raised the score in recognition of the model's innovation and usefulness in combining mathematical problem solving with multimodal capabilities."
}
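On the web application side, output like the sample above might be handled roughly as follows: separate the final field values from the explanatory "reason" fields before saving, then use the score to decide on follow-up. This is a sketch, not the project's actual code. The threshold of 80, the assumed 0 to 100 score range, and the function names are all hypothetical; the article does not state a cut-off.

```python
import json

FOLLOW_UP_THRESHOLD = 80  # hypothetical cut-off; the article does not state one


def split_analysis(raw_json: str) -> tuple[dict, dict]:
    """Separate final field values from the '...Reason' explanation fields."""
    data = json.loads(raw_json)
    reasons = {k: v for k, v in data.items() if k.endswith("Reason")}
    fields = {
        k: v
        for k, v in data.items()
        if not k.endswith("Reason") and k != "improvements"
    }
    return fields, reasons


def needs_follow_up(fields: dict, threshold: int = FOLLOW_UP_THRESHOLD) -> bool:
    """Decide from the score whether the article should be processed further."""
    score = fields["score"]
    # Assumes an integer score on a 0-100 scale, as the sample value 88 suggests.
    if not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"unexpected score: {score!r}")
    return score >= threshold
```

Keeping the reasons in a separate dictionary makes it easy to log why the optimization node changed a field without storing that commentary alongside the published summary.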

For reasons of length, the full analysis flow is covered separately; see: Article Analysis Process

4. Analysis result translation process

Process Description:

  • The input to the translation process is the article ID on the website. Using Workflow's built-in HTTP request node and code node, we call the website's API to fetch the article's metadata (title, source, link, original language, target language, etc.) as well as its full text and analysis results.
  • The translation process uses a three-stage model of "initial translation, check and reflection, optimization and improvement", with an emphasis on idiomatic rather than literal translation. This model is designed to bring the translation closer to the expression habits of the target language and to improve its accuracy and naturalness.
{
  "title": "Unlocking the Potential of Structured Data in the Enterprise with Natural Language: Amazon Q Business | Amazon Web Services",
  "oneSentenceSummary": "Amazon Q Business simplifies enterprise access to structured data by leveraging natural language processing technology to transform natural language queries into precise SQL queries.",
  "summary": "Amazon Q Business solves the problem of accuracy and timeliness of pre-trained base models for processing enterprise-specific data through natural language processing technology. It acts as a bridge between natural language and structured data, converting natural language queries into SQL queries and executing those queries through AWS Athena. This architecture simplifies data access for non-technical users, optimizes workflows for professionals, and supports a wide range of use cases.",
  "tags": ["Natural Language Processing", "Generative AI", "Large Language Models", "Amazon Q Business", "SQL Query Generation", "AWS Athena", "Data Query Workflows", "Enterprise Data Management", "AWS Services", "Business Intelligence"],
  "mainPoints": [
    {
      "explanation": "This highlights the key role of Amazon Q Business in streamlining data access processes.",
      "point": "Amazon Q Business acts as a bridge between natural language and structured data, transforming natural language queries into precise SQL queries."
    },
    {
      "explanation": "This emphasizes the practical value of the technology and its applicability to different users.",
      "point": "The architecture simplifies data access for non-technical users, optimizes workflows for professionals, and supports a wide range of use cases."
    }
  ],
  "keyQuotes": [
    "Amazon Q Business acts as an intermediary to transform natural language into precise SQL queries.",
    "The workflow consists of the following steps: the user initiates the interaction via the Streamlit app..."
  ]
}
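The three-stage translation flow described in the process description can be sketched as three chained model calls. This is a minimal illustration under stated assumptions: `llm` is a hypothetical prompt-in, text-out callable standing in for the Workflow LLM nodes, and the prompt wording is invented for the example rather than taken from the project's prompts.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def translate_three_stage(text: str, target_language: str, llm: LLM) -> dict:
    """Initial translation, then check and reflection, then an improved final version."""
    # Stage 1: initial translation.
    draft = llm(
        f"Translate the following text into {target_language}.\n\n{text}"
    )
    # Stage 2: check and reflect on the draft against the source.
    review = llm(
        "Review this translation for accuracy, consistency, and naturalness, "
        f"and list concrete problems.\n\nSource:\n{text}\n\nTranslation:\n{draft}"
    )
    # Stage 3: rewrite idiomatically, applying the review findings.
    final = llm(
        f"Improve the translation into {target_language} using the review. "
        "Prefer idiomatic phrasing over literal wording.\n\n"
        f"Source:\n{text}\n\nTranslation:\n{draft}\n\nReview:\n{review}"
    )
    return {"draft": draft, "review": review, "final": final}
```

Returning all three intermediate results, rather than only the final text, makes it possible to inspect what the reflection stage flagged when a translation looks wrong.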

For reasons of length, the full translation flow is covered separately; see: translated chapters

Link to original article:/sinat_39620217/article/details/141399014

5. Summary and outlook

5.1 Summary

The program integrates the best content from multiple fields such as programming, AI, product design, business technology and personal growth, originating from top technology companies and communities. With the help of advanced language modeling technology, the selected articles are efficiently summarized, professionally scored, and translated into multiple languages, realizing a fully automated process from initial evaluation to in-depth analysis and dissemination. Through the introduction of the Workflow platform, the project has significantly improved the speed and quality of content processing, bringing readers a more convenient, accurate and diversified reading experience, and satisfying the information needs of learners and professionals with different backgrounds and needs.

The greater value of this project is as a reference: when we face more complex workflow tasks, we can borrow its author's approach, breaking the task at hand into steps and working with the LLM to deliver the final result without sacrificing quality or quantity.
The model used above is DeepSeek's large model; so far the results seem acceptable.

5.2 Outlook

  1. Intelligent Search Optimization: Workflow parses search intent and combines article domain classification, keyword matching, tag filtering, and summaries to build a more accurate search engine, so readers can quickly locate the knowledge they need and search more efficiently.
  2. Personalized Content Recommendation Upgrade: Based on each user's reading history and interests, a recommendation algorithm will build a customized article list for every user, making reading more relevant and efficient.
  3. Revolutionizing the Interactive Q&A Experience: An intelligent Q&A platform built on deep understanding of the articles will let readers put questions to the system directly and get precise answers immediately, lowering reading barriers and aiding comprehension.
  4. Global Language Borderless Reading: Workflow-powered full-text translation will break down language barriers so readers can browse quality technical articles from around the world. Whether the original is in English, Japanese, or French, one click will provide a translated version.

Reference Links

  • RSS Hub
  • wewe-rss
  • Dify
  • BestBlogs-github
  • BestBlogs