
[WPF] Implementing an AutoCompleteBox with Pinyin Fuzzy Search in an Offline Environment

2024-07-24 20:39:45

AutoCompleteBox is a common component for speeding up input, and many third-party WPF control libraries provide one. Almost all of them, however, only do substring matching and do not support pinyin fuzzy matching: for example, typing ldh or liudehua does not match 刘德华 (Andy Lau). Pinyin fuzzy search is usually implemented by preprocessing the data set to be matched with word segmentation and a database. In some scenarios the data cannot be preprocessed, and this article shows how to make an AutoCompleteBox support pinyin fuzzy search in that case. Here is the result first.
[Image: demo of the AutoCompleteBox with pinyin fuzzy search]

Key ideas

WPF has no built-in AutoCompleteBox control, but one can be assembled from a TextBox for entering the search text and a Popup + ListBox for displaying the matching suggestions. Pinyin fuzzy matching of Chinese characters is reduced to string matching: the search string and each entry of the data set to be matched are both converted to pinyin strings, and substring matching is then performed on those strings. Three problems have to be solved.

  1. Convert Chinese characters to pinyin.
  2. How the pinyin matches. For example, ldh, lidh, ldhua, liudehua, dhua, and hua should all match 刘德华 (Andy Lau).
  3. The matched content is highlighted. When the input dhua matches 刘德华, the two characters 德华 must be highlighted.

Converting Chinese characters to pinyin

Microsoft provides the Microsoft Visual Studio International Pack to help developers with internationalization. The pack includes language packs for Chinese, Japanese, Korean, English, and other languages, and provides methods for converting text, getting the pinyin of a character, counting characters, and even getting the stroke count. Download Microsoft Visual Studio International Pack 1.0 SR1, install it, find ChnCharInfo.dll (the assembly that contains the ChineseChar class) in the installation directory, and add a reference to it in the project.
ChineseChar accepts only a single character, so the string has to be broken down into individual characters before the pinyin can be looked up. Because many Chinese characters are polyphonic and no semantic information is available, a string may produce several pinyin combinations. For example, 长江 (the Yangtze River) returns both changjiang and zhangjiang. The method for converting Chinese characters to pinyin is as follows:

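using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.International.Converters.PinYinConverter;

/// <summary>
/// Gets every pinyin combination for a string that contains Chinese characters.
/// </summary>
/// <param name="str">String containing Chinese characters to be processed</param>
/// <param name="split">Separator placed between the pinyin of adjacent characters</param>
/// <returns>All distinct pinyin combinations of the string</returns>
public static List<string> GetChinesePhoneticize(string str, string split = "")
{
    List<string> result = new List<string>();
    char[] chs = str.ToCharArray();
    // Candidate readings for each character, keyed by character index.
    Dictionary<int, List<string>> totalPhoneticizes = new Dictionary<int, List<string>>();
    for (int i = 0; i < chs.Length; i++)
    {
        var phoneticizes = new List<string>();
        if (ChineseChar.IsValidChar(chs[i]))
        {
            ChineseChar cc = new ChineseChar(chs[i]);
            // Drop empty slots, strip the tone digit, lower-case, and de-duplicate the readings.
            phoneticizes.AddRange(cc.Pinyins.Where(r => !string.IsNullOrWhiteSpace(r)).ToList<string>().ConvertAll(p => Regex.Replace(p, @"\d", "").ToLower()).Distinct());
        }
        else
        {
            // Non-Chinese characters are kept as they are.
            phoneticizes.Add(chs[i].ToString());
        }
        if (phoneticizes.Any())
            totalPhoneticizes[i] = phoneticizes;
    }

    // Combine the per-character readings into every possible pinyin string.
    foreach (var phoneticizes in totalPhoneticizes)
    {
        var items = phoneticizes.Value;
        if (result.Count <= 0)
        {
            result = items;
        }
        else
        {
            var newtotalPhoneticizes = new List<string>();
            foreach (var totalPingYin in result)
            {
                newtotalPhoneticizes.AddRange(items.Select(item => totalPingYin + split + item));
            }
            newtotalPhoneticizes = newtotalPhoneticizes.Distinct().ToList();
            result = newtotalPhoneticizes;
        }
    }
    return result;
}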
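
To see how polyphonic readings multiply, the combination step can be tried in isolation. The following is a minimal sketch, not the article's code: it does not call ChineseChar, the small Readings table is a hypothetical stand-in for the library lookup, and the names PinyinCombos and Combine are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PinyinCombos
{
    // Hypothetical stand-in for ChineseChar.Pinyins: tone-stripped readings
    // for two characters only. A real lookup would come from the library.
    private static readonly Dictionary<char, string[]> Readings = new Dictionary<char, string[]>
    {
        { '长', new[] { "chang", "zhang" } },
        { '江', new[] { "jiang" } },
    };

    // Builds every pinyin combination for str, joining syllables with split.
    public static List<string> Combine(string str, string split = "")
    {
        var result = new List<string>();
        foreach (char ch in str)
        {
            // Characters missing from the table are passed through unchanged.
            string[] options = Readings.TryGetValue(ch, out var known)
                ? known
                : new[] { ch.ToString() };

            // The first character seeds the list; every later character
            // extends each existing combination with each of its readings.
            result = result.Count == 0
                ? options.ToList()
                : result.SelectMany(prefix => options.Select(o => prefix + split + o))
                        .Distinct()
                        .ToList();
        }
        return result;
    }
}
```

With this table, Combine("长江") yields changjiang and zhangjiang, and Combine("长江", " ") yields "chang jiang" and "zhang jiang". The number of combinations is the product of the number of readings per character, which is why long strings with many polyphonic characters can become expensive.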

Pinyin matching algorithm

A string that contains Chinese characters may convert to several pinyin combinations. The match succeeds as soon as any combination produced from the search string matches any combination produced from the candidate string. To support highlighting later, the starting position and the length of the matched substring are recorded as well. The code is as follows:

public static bool fuzzyMatchChar(string character, string input, out int matchStart, out int matchCount)
{
    List<string> regexs = GetChinesePhoneticize(input);
    List<string> targetStr = GetChinesePhoneticize(character, " ");
    matchStart = -1;
    matchCount = 0;
    foreach (string regex in regexs)
    {
        foreach (string target in targetStr)
        {
            if (PhoneticizeMatch(regex, target.Split(' '), out matchStart, out matchCount))
                return true;
        }
    }
    return false;
}

Here the PhoneticizeMatch method is the core of the pinyin matching algorithm. It is a slightly modified version of the algorithm described in the blog post "[Algorithm] Pinyin Matching Algorithm"; see that post for the detailed ideas and illustrations.
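
Since PhoneticizeMatch itself is not listed here, the sketch below shows one way a matcher with the same parameter shape could work. It is an assumption for illustration, not the implementation from this article or from the referenced post: the names PinyinMatchSketch, PrefixMatch, and Consume are hypothetical. The idea is that the input must be consumed by a non-empty prefix of each consecutive syllable, trying longer prefixes first and backtracking when the rest of the input cannot be matched.

```csharp
using System;

public static class PinyinMatchSketch
{
    // input:     pinyin typed by the user, e.g. "dhua"
    // syllables: one pinyin syllable per candidate character, e.g. { "liu", "de", "hua" }
    // matchStart/matchCount: index of the first matched character and how many characters matched.
    public static bool PrefixMatch(string input, string[] syllables, out int matchStart, out int matchCount)
    {
        matchStart = -1;
        matchCount = 0;
        if (string.IsNullOrEmpty(input)) return false;

        // Try every character of the candidate as the start of the match.
        for (int start = 0; start < syllables.Length; start++)
        {
            int used = Consume(input, 0, syllables, start);
            if (used > 0)
            {
                matchStart = start;
                matchCount = used;
                return true;
            }
        }
        return false;
    }

    // Returns how many syllables, beginning at index, are needed to consume
    // input from pos to its end, or -1 if that is impossible.
    private static int Consume(string input, int pos, string[] syllables, int index)
    {
        if (pos == input.Length) return 0;          // whole input consumed
        if (index == syllables.Length) return -1;   // ran out of syllables

        string syllable = syllables[index];
        int max = Math.Min(syllable.Length, input.Length - pos);

        // Length of the common prefix of the remaining input and this syllable.
        int common = 0;
        while (common < max &&
               char.ToLowerInvariant(input[pos + common]) == char.ToLowerInvariant(syllable[common]))
        {
            common++;
        }

        // Take the longest prefix first, then back off one letter at a time.
        for (int k = common; k >= 1; k--)
        {
            int rest = Consume(input, pos + k, syllables, index + 1);
            if (rest >= 0) return rest + 1;
        }
        return -1;
    }
}
```

For the syllables liu, de, hua, this sketch accepts ldh, lidh, ldhua, liudehua, dhua, and hua. For dhua it reports a start of 1 and a count of 2, which are the values the highlighting step needs to mark 德华.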

Highlighting matching substrings

In WPF, the PositionStart, PositionCount, and Foreground properties of TextEffect set the start position, length, and color of the highlighted content in a string. The pinyin matching algorithm above returns the starting position and length of the matched substring for exactly this purpose. The earlier article "WPF using TextBlock to achieve the search results highlighting" describes the ideas and code in detail, so they are not repeated here.
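
As a rough illustration of how those three properties fit together, the helper below applies a single TextEffect to a TextBlock. It is a sketch under assumptions rather than the code from the earlier article: the names HighlightSketch and Apply are hypothetical, and it assumes the TextBlock shows plain Text, so that character positions map directly to indexes in the string.

```csharp
using System.Windows.Controls;
using System.Windows.Media;

public static class HighlightSketch
{
    // Colors count characters of the TextBlock starting at start.
    // Passing start < 0 or count <= 0 clears any previous highlight.
    public static void Apply(TextBlock textBlock, int start, int count, Brush brush)
    {
        var effects = new TextEffectCollection();
        if (start >= 0 && count > 0)
        {
            effects.Add(new TextEffect
            {
                PositionStart = start,   // index of the first highlighted character
                PositionCount = count,   // number of highlighted characters
                Foreground = brush       // highlight color
            });
        }
        // Replace the collection so that stale highlights do not accumulate.
        textBlock.TextEffects = effects;
    }
}
```

A call such as HighlightSketch.Apply(textBlock, matchStart, matchCount, Brushes.Red) after a successful fuzzyMatchChar would color the matched characters. Because a failed match leaves matchStart at -1, the same call also clears the highlight.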

Summary

This article describes how to implement pinyin fuzzy search and highlight the matched text without relying on a database or word segmentation. The method still has several shortcomings that need to be improved.

  1. The matching strategy produces false matches. Because the search string is itself converted to pinyin, entering a Chinese character whose pinyin is shi (such as 石, "stone") matches every Chinese character pronounced shi.
  2. The matching algorithm is not efficient enough. In a test with 500 simulated entries in the data set to be matched, matching took about 400-500 ms.

Code example

ChinesePhoneticizeFuzzyMatch