The evolution of intelligent customer service: from legacy to a new era of vector databases

The development of domestic databases has made remarkable progress in the early 21st century. According to incomplete statistics, there are now more than 300 different databases on file in China. This phenomenon was almost unimaginable 40 years ago, marking China's great breakthrough and diverse choices in the database field. For those who are interested in the story of the old generation or the history of database development, I highly recommend watching the documentary "The Past and Present Life of Chinese Databases". Although it is in documentary format, the content is lively and interesting, and well worth watching.

Continuing with this topic, we can see that despite the continuous development and growth of domestic databases, the rise of vector databases has attracted widespread attention in recent years. The rapid development of vector databases has not only demonstrated their unique advantages in handling complex and high-dimensional data, but also solved some technical difficulties that traditional databases could not deal with efficiently. The success of vector databases is mainly due to their excellent performance in the fields of large-scale data analysis, real-time retrieval and intelligent recommendation.

This technological advancement has triggered widespread attention and heated discussions in the market, and has also had a considerable impact on traditional databases. Traditional databases excel in handling structured data and transaction management, but are often overwhelmed when dealing with unstructured data, semantic search and machine learning tasks. Therefore, the rise of vector databases has not only driven the innovation of data storage and processing technologies, but also prompted traditional database systems to continuously adapt to new needs and challenges.

We can take the intelligent customer service scenario as an example to review the evolution from traditional databases to the current vector databases and the choices made by domestic enterprises in this process.

Intelligent Customer Service

The origin of intelligent customer service can be traced back to the early days of the Internet, when domestic enterprises began to explore building customer service systems. The traditional manual customer service model can provide more attentive service, but it requires substantial staffing and capital investment. Faced with a large volume of repetitive questions, manual customer service is both inefficient and costly. The market therefore urgently needed a more efficient and economical way to handle these repetitive inquiries.

Market Size

The market for intelligent customer service is very large and growing. Its main technological goal is to automate the processing of high-frequency, simple questions in order to drastically reduce the burden of manual customer service. This automated processing can significantly improve service efficiency and reduce enterprise costs, while ensuring a quick response to basic questions. However, human customer service is still indispensable for complex and difficult issues, which often require a higher level of understanding and judgment.

China's intelligent customer service industry reached a market size of $6.68 billion in 2022 and is expected to reach $18.13 billion in 2027. Empowered by large AI models, intelligent customer service can deliver more accurate and personalized customer interaction. As large AI models continue to be developed and applied, the market size of China's intelligent customer service industry is expected to keep growing.

It is for this reason that major organizations have been actively seeking and exploring smart customer service solutions at an early stage. Through the introduction of advanced technologies and systems, enterprises hope to achieve the goal of automated processing, so as to optimize the customer service process and improve the user experience, while maintaining the ability to manually handle complex issues. In this process, the continuous evolution and technological innovation of intelligent customer service systems have become the key to improving service quality and operational efficiency of enterprises.

Intelligent customer service classification

We can start with the intelligent customer service systems that we come into contact with in our daily lives to summarize and analyze several major types of intelligent customer service, and to explore how vector databases have solved the key pain points in intelligent customer service, thus driving its rapid development.

First of all, intelligent customer service systems can be categorized into several main types:

Task management systems:
This type of intelligent customer service system focuses on completing specific tasks, such as booking air tickets or reserving hotels. These systems are similar to Apple's Siri, a task-oriented intelligent assistant. They are designed to accomplish specific tasks and help users reach their goals efficiently through predefined processes and operations.
Knowledge base Q&A systems:
Knowledge base Q&A systems mainly answer advisory questions. They rely on a predefined knowledge base to handle the various questions users ask. Unlike task management systems, they do not carry out actual tasks; they only provide information and advice. At the core of such systems is a thoroughly maintained knowledge base that ensures accurate answers to users' questions.
Knowledge graph Q&A systems:
Knowledge graph Q&A systems use graph structures to provide information. In addition to question-answer pairs and tree structures, this type of system organizes related information as a graph. Knowledge graphs allow a wide range of information to be presented and linked more comprehensively, so this type can be regarded as a knowledge base Q&A system in a broader sense. This structure enables intelligent customer service to provide accurate information and related answers at a larger scale.
Chatbots:
Although casual chat is not the primary function of customer service, chatbots still occupy an important position in intelligent customer service systems. The chat function is included for two main reasons: first, a chatbot gives users something to try out when they have not yet loaded anything into the knowledge base or want to test the system's technical capability; second, in certain scenarios, chat can make customer service conversations more natural and lively, reducing monotony. Nonetheless, many intelligent customer service systems let users turn off the chat feature in order to focus on text-based customer service.

It is worth noting that although voice recognition technology is also part of the field of intelligent customer service, we will not discuss it here for the time being due to the complexity of the technology and application scenarios involved.

Working Principle

Natural Language Understanding

Natural Language Understanding (NLU) involves several key tasks. First, when the user's question consists of multiple sentences, we need to split it into sentences so that each one can be understood and answered independently; the separate answers are then combined and returned to the user. Second, word segmentation is a very common processing step and the basis for understanding the text. After segmentation, we can carry out further processing such as part-of-speech tagging and entity recognition.

Syntactic analysis is another important step, helping us understand the structure and relationships within a sentence. Coreference resolution is then used to determine which entities the pronouns in a sentence refer to, which improves the accuracy of understanding. Word weight calculation and semantic similarity analysis are also key steps, providing important data for the subsequent algorithms. Overall, these steps make up the preprocessing phase of natural language understanding and lay the foundation for more complex language processing tasks.
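As a rough illustration of the preprocessing steps above, the following sketch splits a message into sentences, segments each sentence into words, and computes a simple TF-IDF word weight. The function names and the sample message are made up for this example, and the regex-based tokenizer is only a stand-in for a real segmenter (Chinese text in particular needs a dedicated word segmentation tool).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split a user message into sentences at ., ! and ? boundaries."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Lowercase word segmentation; a stand-in for a real segmenter."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tf_idf(sentences):
    """Return one {word: weight} dict per sentence."""
    docs = [tokenize(sentence) for sentence in sentences]
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # term frequency times a smoothed inverse document frequency
            word: (count / len(doc)) * (math.log(len(docs) / doc_freq[word]) + 1)
            for word, count in counts.items()
        })
    return weights


message = "My order has not arrived. Can I get a refund? I paid by card."
sentences = split_sentences(message)
word_weights = tf_idf(sentences)
```

Words that appear in fewer sentences, such as "refund", receive a higher weight than common words such as "I", which is the signal later matching steps rely on.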

Intent recognition

The second part of the preprocessing work is intent recognition, which parses the user's sentences to reveal the intent behind them. For example, when a user asks "What's the weather like today?", the intent recognition system recognizes that the user's main purpose is to ask about the weather. Likewise, when the user says "Book me a ticket to Changchun", the system determines that the intent is to book a ticket.

Intent recognition is usually achieved in two main ways: template matching and classifiers. The template matching approach involves creating specific dictionaries, such as a "city" dictionary containing city names (e.g., "Beijing", "Shanghai", "Tianjin") and a "date" dictionary containing time words (e.g., "today", "tomorrow", "the day after tomorrow"). The system builds templates from these dictionaries: for example, a city name from the "city" dictionary combined with a date word from the "date" dictionary and a keyword such as "weather" identifies the intent of asking about the weather. When a sentence matches one of these templates, we can determine the user's intent.

We can use Python code to simply implement basic intent recognition:

import re

# City dictionary
city_dict = ["Beijing", "Shanghai", "Tianjin", "Guangzhou", "Shenzhen"]

# Date dictionary
date_dict = ["today", "tomorrow", "the day after tomorrow"]

# Template: the slots that must all be present for the weather intent
weather_template = ["city", "date", "weather"]


def match_template(user_input):
    # Build one alternation per dictionary, escaping regex metacharacters
    city_pattern = "|".join(re.escape(city) for city in city_dict)
    date_pattern = "|".join(re.escape(date) for date in date_dict)
    # Each lookahead checks one slot independently, so the city, the date
    # and the keyword "weather" may appear in any order in the sentence
    pattern = rf"(?=.*(?:{city_pattern}))(?=.*(?:{date_pattern}))(?=.*weather)"

    # Match user input, ignoring case
    match = re.search(pattern, user_input, re.IGNORECASE)

    if match:
        return "Intent to ask about weather"
    else:
        return "Unrecognized intent"


# Test
user_input = "What's the weather like in Beijing today?"
print(match_template(user_input))  # Output: Intent to ask about weather

Although the template matching approach is simple to implement, easy to understand and maintain, and suitable for scenarios with clear and structured rules, it is relatively inflexible. It is limited in its ability to handle complex or variable expressions because templates can only recognize sentences that match predefined patterns. In addition, the method is unable to handle words or variant forms of words that do not appear in the dictionary, which may result in a less comprehensive or accurate identification of the user's intent.

The classifier approach is also very effective for intent recognition. Its core idea is to categorize the user's intent with a machine learning model. In practice, we need to collect a large corpus in a specific domain and manually annotate it with the intents each utterance corresponds to. We then use this labeled data to train a classifier, which can be either a binary or a multi-class classifier, to classify the intent of new inputs.

However, even though the classifier approach can handle complex sentence structures and diverse expressions, it has some challenges. First, it requires a large amount of manually labeled data, a process that is not only time-consuming but also costly. The quality of the labeled data directly affects the performance of the model, so the accuracy of the labeling needs to be ensured. Second, the classifier approach also faces the problem of how to effectively collect and process corpus from multiple domains. The corpus from different domains may have different features and expressions, which need to be appropriately adjusted and optimized in the data collection and preprocessing process.
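To make the classifier approach concrete, here is a minimal sketch of a multi-class intent classifier written as a small Naive Bayes model in pure Python. Naive Bayes is just one possible choice, not necessarily what a production system would use, and the six labeled sentences and the intent names `ask_weather` and `book_ticket` are invented for illustration; a real system needs a much larger annotated corpus.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled corpus; real systems need far more examples.
TRAINING_DATA = [
    ("what is the weather like today", "ask_weather"),
    ("will it rain in shanghai tomorrow", "ask_weather"),
    ("is it sunny in beijing", "ask_weather"),
    ("book me a ticket to changchun", "book_ticket"),
    ("i want to buy a train ticket", "book_ticket"),
    ("reserve a flight to shenzhen", "book_ticket"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per intent label and how often each label occurs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero out a label
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
```

Calling `classify("book a ticket to beijing", model)` picks the ticket intent even though that exact sentence never appears in the training data, which is the flexibility template matching lacks.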

Knowledge Base Q&A

Next, we will discuss the functional modules in the system, starting with the most common one: knowledge base Q&A. This function is very common in intelligent customer service systems, and its core technology is essentially similar to search engine technology, only applied in a different way. Knowledge base Q&A is usually divided into two main phases: candidate set recall and re-ranking.

In the candidate set recall phase, the system selects the most relevant candidate answers to the user's query from the knowledge base in a variety of ways. Although there are various recall methods, the recall process of knowledge bases is relatively simple compared to the complexity of search engines. This is because search engines need to deal with massive information retrieval, whereas the contents of knowledge bases are usually imported and maintained manually and are relatively small in size, thus making the recall less complex.

The subsequent re-ranking phase sorts the answers in the candidate set to find the most appropriate response. This can be implemented with a variety of techniques, including text similarity and search relevance. If enough data is available, neural semantic similarity models can also be applied to re-ranking. To improve accuracy, the system can also use multi-model fusion, combining the results of different models to obtain the final answer.
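The two phases can be sketched in a few lines. In this simplified example, recall counts shared words between the query and each stored question, and re-ranking orders the recalled candidates by cosine similarity over word counts. The knowledge base entries and function names are hypothetical; a production system would typically use an inverted index for recall and a stronger model for re-ranking.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base of (question, answer) pairs.
KNOWLEDGE_BASE = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("How do I change my delivery address?", "Edit the address under Account > Addresses."),
    ("When will my refund arrive?", "Refunds arrive within 5 business days."),
    ("How do I cancel my order?", "Open the order page and choose Cancel."),
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def recall(query, top_n=3):
    """Phase 1: cheap candidate recall by counting shared words."""
    query_words = set(tokenize(query))
    scored = []
    for question, answer_text in KNOWLEDGE_BASE:
        overlap = len(query_words & set(tokenize(question)))
        if overlap:
            scored.append((overlap, question, answer_text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(question, answer_text) for _, question, answer_text in scored[:top_n]]


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query, candidates):
    """Phase 2: order the recalled candidates by text similarity."""
    query_vec = Counter(tokenize(query))
    return sorted(
        candidates,
        key=lambda pair: cosine(query_vec, Counter(tokenize(pair[0]))),
        reverse=True,
    )


def answer(query):
    ranked = rerank(query, recall(query))
    return ranked[0][1] if ranked else None
```

Keeping recall cheap and re-ranking more precise is the same division of labor search engines use, just at a much smaller scale.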

Knowledge Graph Q&A

Next is a module closely related to knowledge bases: knowledge graph Q&A.

Knowledge Graph (KG) is a semantic network that represents entities and their relationships in the form of nodes and edges. Each node represents an entity (e.g., person, place, event), and edges represent relationships between entities (e.g., "belonging to", "located in", "affecting"). Knowledge graphs not only store structured information, but can also incorporate semantic information for smarter information retrieval and reasoning.

In the field of Artificial Intelligence, the importance of Knowledge Graph is obvious. It provides a machine-readable representation of knowledge that enables computers to better understand and process complex human language and its relationship to the real world. By building knowledge graphs, AI systems are able to achieve more effective knowledge integration, reasoning, and querying, thus playing a key role in numerous application domains.

However, the most challenging part in the implementation of a knowledge graph Q&A system is the organization of the data, followed by the selection and optimization of appropriate tools.

Assuming that we have solved the problem of data sourcing and updating and have the required tools, the next key task is to perform query transformation. Since most knowledge graph tools use a specific query language, we need to convert the natural language into the query language supported by these tools in some way.

There are usually two common approaches to this conversion process: one is to use templates for query conversion, and the other is to utilize machine translation techniques to achieve the conversion if the amount of data is large enough. In addition, knowledge bases and knowledge graphs can be integrated into a unified module, which is often referred to as a knowledge base Q&A system.
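The template-based conversion can be illustrated with a toy in-memory graph. Here the "query language" is simply a `(subject, relation)` pair looked up in a list of triples; real knowledge graph tools have their own query languages, which this sketch does not attempt to reproduce. The triples, templates, and function names are assumptions made for the example.

```python
import re

# Hypothetical knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("Changchun", "located_in", "Jilin"),
    ("Jilin", "located_in", "China"),
    ("Shenzhen", "located_in", "Guangdong"),
    ("Tencent", "headquartered_in", "Shenzhen"),
]

# Each template maps a question pattern to the relation it asks about.
# More specific templates come first so they win over general ones.
TEMPLATES = [
    (re.compile(r"where is (?P<subject>\w+) headquartered", re.IGNORECASE), "headquartered_in"),
    (re.compile(r"where is (?P<subject>\w+)", re.IGNORECASE), "located_in"),
]


def to_query(question):
    """Convert a natural-language question into a (subject, relation) query."""
    for pattern, relation in TEMPLATES:
        match = pattern.search(question)
        if match:
            return (match.group("subject"), relation)
    return None


def run_query(query):
    """Return every object linked to the subject by the relation."""
    subject, relation = query
    return [
        obj for subj, rel, obj in TRIPLES
        if subj.lower() == subject.lower() and rel == relation
    ]


def ask(question):
    query = to_query(question)
    return run_query(query) if query else []
```

Questions that match no template return an empty result, which shows why template coverage becomes the bottleneck as the range of user questions grows.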

Currently, we can summarize several technical difficulties that must be solved. The first difficulty is the data cold start problem. In most cases, we do not have enough data to train the model initially, resulting in fewer entities and relationships in the knowledge graph, thus limiting the knowledge coverage, which can make the system difficult to answer complex user questions. The data update and expansion in the initial phase is slow, which affects the richness and accuracy of the graph.

The second difficulty is the multi-round dialog problem. Multi-round conversations are a major challenge in intelligent customer service systems. Multi-round conversations involve multiple interactions between the user and the system, often including multiple questions and answers. When handling such conversations, the system must effectively maintain the context and state of the conversation in order to provide consistent and relevant answers.
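One common way to maintain conversation state is slot filling: the system remembers what the user has already provided and asks only for what is missing. The sketch below is a deliberately small illustration for a ticket-booking dialog; the class name, slot names, and keyword lists are all hypothetical.

```python
class BookingDialog:
    """Keeps slot values across turns until a booking request is complete."""

    REQUIRED_SLOTS = ("destination", "date")
    CITIES = ("beijing", "shanghai", "changchun")
    DATES = ("today", "tomorrow")

    def __init__(self):
        self.slots = {}  # conversation state carried between turns

    def handle(self, user_input):
        text = user_input.lower()
        # Fill any slot mentioned in this turn
        for city in self.CITIES:
            if city in text:
                self.slots["destination"] = city.title()
        for date in self.DATES:
            if date in text:
                self.slots["date"] = date
        # Ask for the first slot that is still missing
        for slot in self.REQUIRED_SLOTS:
            if slot not in self.slots:
                return f"Which {slot} would you like?"
        return (
            f"Booking a ticket to {self.slots['destination']} "
            f"for {self.slots['date']}."
        )


dialog = BookingDialog()
first_reply = dialog.handle("Book me a ticket to Changchun")
second_reply = dialog.handle("tomorrow")
```

The second turn, "tomorrow", only makes sense because the destination from the first turn is still stored in `dialog.slots`.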

The third difficulty is human-robot collaboration. In the existing intelligent customer service system, the human-robot collaboration approach has not yet been able to maximize the value of the robot. Currently robots are mainly used as auxiliary tools, failing to become the main decision maker or processor in the system, which limits their potential and role in intelligent customer service.

Development method

One of the most common application scenarios for knowledge graphs is intelligent customer service, but its development process is complex and time-consuming. The development process usually includes the following steps:

Defining requirements: Define the problems the intelligent customer service system needs to solve and its target functions.
Building the knowledge graph: Create and organize knowledge graphs containing the various entities and relationships that support the system's knowledge base.
Integrating the knowledge graph: Integrate the constructed knowledge graph with the system to ensure smooth information flow.
Dialog system design: Design the dialog system for intelligent customer service, including dialog flow, user interaction and response mechanisms.
Testing and optimization: Test the system and optimize its performance and accuracy to ensure it can answer user questions efficiently.
Deployment and maintenance: Put the system into practical use and carry out ongoing maintenance and updates in order to respond to new needs and challenges.

Maintaining the knowledge graph is a very time-consuming and human resource-intensive part of the process. Even when using third-party services, it is still difficult for companies to set them up in a highly personalized way, especially for problem solutions that are specific to their organization. As a result, only large organizations can usually afford such solutions, while smaller sites or businesses are often unable to develop or implement such intelligent assistants.

A wave of AI emerges

Last year, ChatGPT, released by OpenAI, completely overturned the public's perception of intelligent customer service. Traditional intelligent customer service has two main limitations: first, it can only handle standardized, enterprise-specific questions; second, it cannot communicate as naturally and smoothly as a human. When they encounter this kind of customer service, many people simply ask for a human agent.

However, with the development of AI technology, the emergence of ChatGPT makes communication with intelligent customer service more natural and flexible. Users can ask questions as they wish, whether it is a technical question, a development problem, or a variety of issues within the enterprise, ChatGPT can provide detailed answers and suggestions. This ability not only improves the user experience, but also greatly broadens the application scope and effectiveness of intelligent customer service.

Prompt Assistant

At this point, intelligent customer service found a new direction: extending its functionality by calling large model APIs directly. Early on, practitioners discovered that well-chosen prompts could significantly improve the quality of the AI's answers. As a result, a wide variety of prompts were devised to help large models perform better in different customer service scenarios.

As this trend intensified, large enterprises and startups in China invested in developing large models, as if they would be eliminated in this wave without a large model of their own. Against this backdrop, Tencent also responded quickly by developing its own Hunyuan large model in order to gain a foothold in the fiercely competitive market.

Development method

At this stage, the development method of intelligent customer service became relatively mature: enterprises could customize an intelligent customer service solution by writing prompts locally and supplying relevant reference data. This approach makes it easier and more efficient to build an enterprise-specific customer service system.

Even so, this strategy does not fully solve the problem of large models "talking nonsense with a straight face", that is, hallucination. Large models may still give inaccurate or unrealistic answers to certain complex or ambiguous questions, which to some extent limits their reliability and validity in practical applications.
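As a sketch of what "writing prompts and providing reference data" can look like, the function below assembles a prompt from a question and a list of reference snippets. The wording of the instructions, the function name `build_prompt`, and the company name `ExampleCo` are all placeholders; the resulting string would be sent to whichever large model API the enterprise uses.

```python
def build_prompt(question, reference_snippets, company="ExampleCo"):
    """Combine fixed instructions, reference data and the user's question."""
    references = "\n".join(f"- {snippet}" for snippet in reference_snippets)
    return (
        f"You are a customer service assistant for {company}.\n"
        "Answer using only the reference material below. "
        "If the answer is not there, say you do not know.\n\n"
        f"Reference material:\n{references}\n\n"
        f"Customer question: {question}\n"
    )


prompt = build_prompt(
    "How long do refunds take?",
    ["Refunds are issued within 5 business days of approval."],
)
```

Telling the model to rely only on the supplied material is a common way to narrow its answers, although it does not guarantee the model will comply.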

AI's Wave of Plug-In Functionality

On March 23, 2023, OpenAI launched the ChatGPT plug-in system, which is designed with security at its core and allows ChatGPT to connect to a variety of third-party applications via plug-ins and perform a variety of operations, including retrieving real-time information, accessing knowledge bases, and performing various operations on behalf of users.

Thanks to the integration of the knowledge base plug-in, this system significantly improves the accuracy of the large model's answers: the combination of well-designed prompts and plug-in functions makes it possible to resolve about 90% of the hallucination problems. Such progress not only makes intelligent customer service more practical, but also significantly improves its reliability in real applications.

Vector Database

The wide application and hot trend of vector databases has only really appeared this year, which is closely related to the plug-in function introduced by OpenAI. Through this plug-in system, ChatGPT can utilize the ability of big models to access and process all kinds of data, thus greatly promoting the practical application of vector databases. The plug-in system not only enhances the data processing capability of the big model, but also promotes the application of vector databases in the fields of information retrieval and knowledge management, further promoting the innovation and development of data-driven technologies.

Tencent Cloud's investment in the field of vector databases stems from insights into market demand. After research, they found that many enterprises are already using vector databases, especially in the context of large models, which makes vector databases especially important as a solution.

Big models are trained based on publicly available data, while enterprises' private data is often not directly utilized. In order for big models to serve enterprises effectively, enterprises need to process data in two main ways: pre-training and fine-tuning. However, the cost and technical threshold of these two approaches are high, so not all enterprises can afford them. At this point, vector databases, as a lower-cost and easy-to-use solution, become the preferred choice for enterprises.

The core of vector databases lies in transforming information such as text and images into vector data and retrieving them through similarity calculations. This technology improves retrieval efficiency through indexing optimization, enabling large models to process data more quickly. Tencent has had many years of experience with vector databases internally and has transformed this experience into cloud service offerings, enabling vector databases to play a role in real-world applications.

Working Principle

A vector is a quantity used in mathematics and physics to represent magnitude and direction. It consists of an ordered set of values that represent the components of the vector on each axis.

Vector retrieval is an information retrieval method based on vector space modeling. Vector databases analyze the correlation between two vectors by calculating the similarity distance between them through a similarity calculation method. If two embedded vectors are very similar, it means that the original data source is also similar.

Intuitively, you can think of all the unstructured data (e.g., text, images) in your knowledge base as vector data, since computers can only process numbers. Specifically, this unstructured data is converted into numeric vectors, for example: [0.2123, 0.23, 0.213]. This numeric representation allows computers to perform efficient calculations and processing, making complex data analysis and retrieval feasible.
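To show how a similarity calculation works on such vectors, here is a small sketch using cosine similarity, one commonly used measure (others include Euclidean distance and inner product). The three-dimensional vectors are made-up values for illustration; real embeddings usually have hundreds or thousands of dimensions, and a vector database uses indexes rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up embeddings for three knowledge base entries.
VECTORS = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "password reset": [0.0, 0.2, 0.9],
}


def nearest(query_vector, k=1):
    """Brute-force search: rank every stored vector by similarity."""
    ranked = sorted(
        VECTORS,
        key=lambda name: cosine_similarity(query_vector, VECTORS[name]),
        reverse=True,
    )
    return ranked[:k]
```

A query vector such as `[0.8, 0.2, 0.1]` points in almost the same direction as the "refund policy" vector, so that entry is returned first.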

Development method

Although function calling on large models had not yet been fully adopted at this stage, the way enterprises develop these systems changed significantly. The approach is now: first, search the vector database for private in-house knowledge and provide it to the large model; then, combine it with prompts to complete a normal round of intelligent Q&A. This approach relies on the multi-round Q&A capability of the large model itself, enabling efficient information retrieval and interactive Q&A.
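The retrieve-then-ask flow described above can be sketched as follows. The `embed` function is a toy word-count embedding standing in for a real embedding model, the document list stands in for a vector database, and `call_model` is a hypothetical callable supplied by the caller that wraps whichever large model API is in use.

```python
import math
import re
from collections import Counter

# Stand-in for private in-house knowledge stored in a vector database.
DOCUMENTS = [
    "Refunds are issued within 5 business days of approval.",
    "Standard shipping takes 3 to 7 days.",
    "Passwords can be reset from the login page.",
]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, k=1):
    """Step 1: find the stored documents most similar to the question."""
    question_vec = embed(question)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: similarity(question_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def answer(question, call_model):
    """Step 2: hand the retrieved context plus the question to the model."""
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_model(prompt)
```

Passing `call_model` in as a parameter keeps the retrieval logic independent of any particular model provider, so the same flow works when the model behind it changes.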

In fact, this development model makes it easy for any business, with certain technical skills, to interface and implement similar smart applications. Regardless of the size or domain of the organization, it is possible to create an efficient intelligent Q&A system that improves efficiency and information processing by simply leveraging existing technology.

Wave of Intelligent Agents

Arguably the hottest trend this year has been intelligent agents. The rise of this trend is largely due to the fact that agents significantly lower the technical barrier for organizations to use large models. As mentioned before, while vector databases provide a technical solution, organizations still need technical teams to develop and implement it.

The emergence of agents removes this concern. With an agent platform, users can upload a knowledge base directly through a graphical interface, eliminating the need for a complex development process. This intuitive operation not only simplifies the application of the technology, but also greatly reduces the time and cost of deploying intelligent Q&A systems. The ease of use of agents enables all types of enterprises to adopt intelligent services more quickly and efficiently, promoting the widespread use of the technology.

Knowledge Base - Vector Database

Here, we briefly explain the knowledge base functions of agent platforms, using Tencent Yuanqi as an example. This will help you better understand how agent platforms manage and use knowledge bases, and how these features work in real-world applications.

The knowledge base here actually relies entirely on the powerful support of the vector database in the background. When we upload files, the system will automatically convert these files into corresponding vectors and insert these vectors into the vector database.
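The platform's internal pipeline is not documented here, but the general idea of turning an uploaded file into vectors can be sketched under some assumptions: the text is cut into overlapping chunks, each chunk is embedded, and the resulting vectors are inserted into a store. In this example `toy_embed` is a hypothetical letter-frequency embedding and `InMemoryVectorStore` is a stand-in for a real vector database.

```python
import string
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text):
    """Hypothetical embedding: relative letter frequencies (26 dimensions)."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    total = sum(counts.values()) or 1
    return [counts[ch] / total for ch in string.ascii_lowercase]


class InMemoryVectorStore:
    """Stand-in for a vector database: a list of (id, vector, text) rows."""

    def __init__(self):
        self.rows = []

    def insert_document(self, doc_id, text, chunk_size=200, overlap=50):
        """Chunk the document, embed each chunk, and store the vectors."""
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            self.rows.append((f"{doc_id}#{index}", toy_embed(chunk), chunk))
        return len(self.rows)
```

The overlap between neighboring chunks reduces the chance that a sentence relevant to a query is split across a chunk boundary and missed at retrieval time.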

Vector databases play a key role in this process: they not only store these vectors, but also make subsequent retrieval and querying more efficient. In this way, the knowledge base can process user queries more accurately, giving the agent efficient information retrieval and Q&A capabilities.

Next, you can maintain and update the knowledge base so that the agent can readily recall relevant content and respond effectively to users' questions. By regularly updating the knowledge base, you ensure that the agent has access to the most current and relevant information, improving the accuracy and effectiveness of its responses.

In this way, the agent can not only quickly retrieve and process the information stored in the knowledge base, but also continuously adapt to changing business needs and provide more intelligent and personalized services.

Development method

Most companies can therefore complete the development of an intelligent customer service system simply by integrating with the agent's API. They can launch and maintain the agent right away without worrying about the complexities of server management or technology development. The API integration process is relatively simple, and companies only need to master the basic operations to start using it. In addition, agents can be easily published to major platforms, such as WeChat subscription accounts, which further reduces the complexity of integration.

Thus, the role of vector databases in data processing and management remains indispensable, despite the evolution of big model technology. The relationship between vector databases and big models reflects the separation of computation and storage needs, which will become a long-term trend. Although big models are improving, the role of vector databases in data retrieval, management and scheduling remains significant. It not only changes the way data is processed, making retrieval more natural and intuitive, but also brings a new paradigm for data management. Technological innovations in vector databases provide an efficient way to optimize data access and application, ensuring flexibility and efficiency in data processing.

Summary

Domestic databases have made remarkable progress at the beginning of the 21st century, exceeding the expectations of many people back then. Today, there are more than 300 different databases in China, a phenomenon that not only demonstrates China's innovation and breakthroughs in the field of databases, but also provides a wealth of choices for businesses and individuals. However, the evolution of technology does not stop there. In recent years, the rise of vector databases has created a new trend in data processing and storage.

The rise of vector databases has provided a strong complement to traditional databases. Its strength lies in its efficiency in handling complex and high-dimensional data, especially in the areas of large-scale data analysis, real-time retrieval, and intelligent recommendation. Traditional databases excel in structured data and transaction management, but their limitations in unstructured data processing, semantic search, and machine learning tasks are gradually emerging. The emergence of vector databases not only promotes the innovation of data storage and processing technology, but also forces traditional database systems to continuously adjust and adapt to new technical requirements and challenges.

This technological evolution is particularly evident in the field of intelligent customer service. Smart customer service has gradually shifted from an initially manual model to automation and intelligence. In the early days, organizations solved a large number of repetitive problems through manual customer service, but this approach was both time-consuming and expensive. As technology evolved, smart customer service systems gradually introduced advanced features such as natural language processing, knowledge base Q&A and chatbots. Intelligent customer service not only efficiently handles high-frequency questions, but also improves the user experience through machine learning and natural language processing technology. However, for complex and difficult questions, human customer service still has irreplaceable advantages.

With the rapid development of AI technology, especially the emergence of large models, the functions and performance of intelligent customer service systems have been further improved. The combination of OpenAI's ChatGPT, agent platforms and vector databases has brought new application scenarios and possibilities for intelligent customer service. The introduction of agents enables enterprises to deploy and maintain intelligent customer service systems more conveniently, and by integrating APIs and optimizing the knowledge base, enterprises can achieve efficient intelligent services.

Looking ahead, the development of home-grown databases and vector databases will continue to drive innovation in data processing and storage technologies. As large model and agent technologies continue to mature, enterprises will be able to make better use of these advanced tools to improve information processing efficiency and user service quality. The diversification of domestic databases and the technological innovation of vector databases not only mark China's continued progress in the field of data technology, but also contribute to the development of global technology. The evolution of intelligent customer service and the application of AI technology signal that we are moving towards a smarter and more efficient future.

I'm Rain, a Java server-side developer exploring the mysteries of AI technology. I love technical communication and sharing, and I am passionate about the open source community. I'm also a Tencent Cloud Creative Star, an expert blogger on Alibaba Cloud, a Huawei Cloud Expert, and an excellent author on Juejin.

💡 I won't be shy about sharing my personal explorations and experiences on the path of technology, in the hope that I can bring some inspiration and help to your learning and growth.

🌟 Welcome to the effortless drizzle! 🌟