LeetCode Problem Set-3 - Longest Substring without Repeated Characters

Question : Given a string s, find the length of the longest substring that does not contain repeated characters.

Let's understand the question well, how do you get length 3 in example 1?

If the longest substring without repetition, starting with the first character a, is abc; we thus express (a)bcabcbb -> (abc)abcbb, and so express the enumeration of all possible cases as follows:

1.(a)bcabcbb -> (abc)abcbb；

(b)cabcbb -> a(bca)bcbb；

(c)abcbb -> ab(cab)cbb；

(a)bcbb -> abc(abc)bb；

(b)cbb -> abca(bc)bb；

(c)bb -> abcab(cb)b；

(b)b -> abcabc(b)b；

(b) -> abcabcb(b);

The three longest substrings that satisfy the condition in all possible cases are abc, bca, and cab, and all three are of length 3, so the result of Example 1 is 3.

01Solving method I. Double-pointer method

Through the above enumeration of all cases, it can be found to meet the requirements of the string from the starting position to the end of the position of the scroll, and in the process, the length of the string is also changing, that is, as long as we are ready to prepare the two pointers start and end, and to control the rhythm of the two pointers forward to complete the task.

So how do you control the pointer rhythm?

First of all, the second pointer end, we add the steps in 1.(a)bcabcbb -> (abc)abcbb above, it should be 1.(a)bcabcbb -> (ab)cabcbb -> (abc)abcbb, that is, the pointer end step by step backward, even if it encounters a repeat of the character is still steadily moving forward.

Whenever the pointer end moves back one bit, simply determine if this bit has ever appeared in a previous string, and if it has then start adjusting the pointer start.

For example, the above 1->2 that (abc)abcbb -> a(bca)bcbb process, when the pointer end to the second a, and the previous substring abc has appeared in the a, so you need to put the pointer start jump to the b that is, to jump to the substring in the duplicate character after a position.

We use the legend to describe in detail the exact process of moving the pointer from end to start.

Take a look at the implementation code below.

public static int SlidingWindow(string s)
{
    //start pointer
    var startIndex = 0;
    //end pointer
    var endIndex = 0; //endIndex = 0; //endIndex = 0
    // length of the current non-repeating substring
    var currentLength = 0; //longest unrepeated substring length
    //longest unrepeated substring length
    var maxLength = 0; //keep processing until the end pointer is not less than the length of the substring.
    // Process until the end pointer is not less than the length of the string
    while (endIndex < )
    {
        //get the pending character
        var pendingChar = s[endIndex]; //determine if the pending string is in the current string; //find out if the end pointer is in the current string
        // Determine if the pending string exists in the current substring
        for (var i = startIndex; i < endIndex; i++)
        {
            //If the pending character already exists in the substring
            if (pendingChar == s[i])
            {
                // jump the start pointer to the next position of the repeating character in the substring
                startIndex = i + 1;
                // recalculate the length of the current non-repeating substring
                currentLength = endIndex - startIndex; //calculate the current non-repeating substring length again.
                currentLength = endIndex - startIndex; break; }
            }
        }
        //Move the end pointer back one place.
        endIndex++; }
        //The length of the current non-repeating substring is increased by one.
        currentLength++; //compare and update the maximum non-repeating substring length.
        //Compare and update the maximum length of the non-repeating substring.
        if (currentLength > maxLength)
        {
            maxLength = currentLength; }
        }
    }
    return maxLength; }
}

The analysis shows that the algorithm time complexity is because it is a double loop while+for:O(N2), again because there is no reference to extra space therefore the space complexity is:O(1)。

02Solving method two, double pointer + hash method

There are still ways to optimize for double loops, the most common practice is space for time, i.e., the inner loop is swapped out by hash table substitution so that the hash table providesO(1) The query time complexity, which makes the whole algorithm time complexity ofO(N). But hash tables require additionalO(N) Space

If a hash table is used to store already existing characters, how should it be stored, what is stored in key and what is stored in value? Here is a question is the hash table only store the current substring of characters? Or store all the existing characters? If only the current substring of characters means that each time to clear the hash table, and clear the action time complexity isO(N) So we choose to save all existing characters.

If you store all the existing characters, you should be careful to judge the invalid data, for example, in abc(ba)b we can't compare the last b with the first b, because the current substring is (ba), so we should do the judgment with the second b.

The realization code is as follows:

public static int SlidingWindowDictionary(string s)
{
    //start pointer
    var startIndex = 0;
    //end pointer
    var endIndex = 0; //endIndex = 0
    // length of the current non-repeating substring
    var currentLength = 0; //longest unrepeated substring length
    //longest unrepeated substring length
    var maxLength = 0; //Dictionary table, stores existing characters.
    // Dictionary table, store existing characters.
    var dic = new Dictionary<char, int>();; //To be processed until end pointer.
    // Keep processing until the end pointer is not less than the length of the string
    while (endIndex < )
    {
        // Get the pending character
        var pendingChar = s[endIndex];; //Determine if the pending character is in the string.
        // Determine if the pending character exists in the dictionary table and its index position is in the current substring
        if ((pendingChar, out var value) && value >= startIndex)
        {
            // jump the start pointer to the next position of the repeating character in the substring
            startIndex = value + 1; // recalculate the current non-repeating substring.
            // Recalculate the length of the current non-repeating substring.
            currentLength = endIndex - startIndex; }
        }
        //Update the last index position of a character that already exists in the dictionary table
        dic[pendingChar] = endIndex; //end pointer backward.
        //Move the end pointer back one place.
        endIndex++; //current non-repeating substring length
        //The length of the current non-repeating substring is increased by one.
        currentLength++; //compare and update the maximum unrepeated substring length.
        //Compare and update the maximum non-repeating substring length.
        if (currentLength > maxLength)
        {
            maxLength = currentLength; }
        }
    }
    return maxLength; }
}

03Solving method three, double pointer + array method

So is there any room for optimization of this algorithm? We know that hash table operations are consumptive, is there a better way to store them than hash tables?

For different questions may have different ways, for this question, indeed a little special, I do not know if you have noticed the question at the bottom of the "s consists of letters of the alphabet, numbers, symbols and spaces" description, which can not help but make me think of the ASCII code table.

If s is composed of characters in the ASCII code table, then it means that each character has a corresponding decimal value, which is the natural subscript, and then all the number of ASCII code table to build an array of characters used to store the characters that already exist, and each character to store the location of its corresponding decimal value, so that the problem of storage can not be solved?

Because we constructed the array first, we also need to assign a value of -1 to each element of the array to mark that the current element is not yet in use.

The specific implementation code is as follows:

public static int SlidingWindowArray(string s)
{
    //start pointer
    var startIndex = 0;
    //end pointer
    var endIndex = 0; //endIndex = 0
    // length of the current non-repeating substring
    var currentLength = 0; //longest unrepeated substring length
    //longest unrepeated substring length
    var maxLength = 0; //define an array of possible characters, all of which are to be repeated.
    // Define an array of possible characters and fill them all to -1
    var arr = new int[128];
    (arr, -1);
    // Keep processing until the end pointer is not less than the length of the string
    while (endIndex < )
    {
        //get the pending character
        var pendingChar = s[endIndex]; //determine if the pending character index position is at the current string length
        // Determine if the index position of the pending character is within the current substring
        if (arr[pendingChar] >= startIndex)
        {
            // jump the start pointer to the next position of the repeating character in the substring
            startIndex = arr[pendingChar] + 1; // recalculate the current non-repeating substring.
            // Recalculate the length of the current non-repeating substring
            currentLength = endIndex - startIndex; // recalculate the current non-repeating substring length.
        }
        //Update the last index position of the existing character in the array
        arr[pendingChar] = endIndex; //end pointer backward.
        // Move the end pointer back one place.
        endIndex++; //Current non-repeating substring length
        //The length of the current non-repeating substring is increased by one.
        currentLength++; //compare and update the maximum unrepeated substring length.
        //Compare and update the maximum length of the non-repeating substring.
        if (currentLength > maxLength)
        {
            maxLength = currentLength; }
        }
    }
    return maxLength; }
}

Although three solution methods have been implemented, how does it really perform? Below we perform a set of benchmark tests on the three methods, testing each method 10,000 times, each time constructing a random string of length 10,000.

can be found double pointer hash table than simple double pointer performance is still much worse, and double pointer + array overall performance is much better. This shows that the double pointer hash table still has its limitations, although the theoretical value is very good, but the actual performance is not satisfactory, which also reminds us to use the right method in the right place, in order to better solve the problem.

classifier for sums of money：The test method code as well as the sample source code have been uploaded to the code repository for those who are interested./hugogoos/Planner