
Writing an sscanf with a C# interpolated string handler

2025-02-16 17:05:58

Interpolated string handlers

C# has a feature called interpolated strings. With an interpolated string you can embed the value of a variable directly in a string, for example $"abc{x}def". This changed the old way of formatting strings: you no longer need to pass a template string first and then pass the arguments one by one, which is very convenient.

Building on interpolated strings, C# (since version 10) also supports interpolated string handlers, which let you customize how an interpolated string is processed. A simple example:

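using System;
using System.Runtime.CompilerServices;

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount)
{
    // Called once for each literal segment of the interpolated string
    public void AppendLiteral(string s)
    {
        Console.WriteLine($"Literal: '{s}'");
    }

    // Called once for each interpolation hole, with the interpolated value
    public void AppendFormatted<T>(T v)
    {
        Console.WriteLine($"Value: '{v}'");
    }
}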

To use it, you only need to change the string parameter of your method to this Handler type, and interpolated strings will be processed the way you defined. The C# compiler automatically turns the interpolated string into the construction of a Handler plus a series of calls on it, and then passes the handler in:

void Foo(Handler handler) { }
var x = 42;
Foo($"abc{x}def");

For the example above, you will get the output:

Literal: 'abc'
Value: '42'
Literal: 'def'
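To make the compiler's transformation concrete, here is a rough sketch of what the call is lowered to. The hand-written version is only an approximation of the generated code (the compiler's real output uses temporaries and differs in detail), but the constructor arguments and the order of the calls are the same, so the program prints the three lines twice.

```csharp
using System;
using System.Runtime.CompilerServices;

var x = 42;

// What you write:
Foo($"abc{x}def");

// Roughly what the compiler generates for the call above.
// literalLength is 6 ("abc" + "def"), formattedCount is 1 ({x}).
var handler = new Handler(literalLength: 6, formattedCount: 1);
handler.AppendLiteral("abc");
handler.AppendFormatted(x);
handler.AppendLiteral("def");
Foo(handler);

static void Foo(Handler handler) { }

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount)
{
    public void AppendLiteral(string s) => Console.WriteLine($"Literal: '{s}'");
    public void AppendFormatted<T>(T v) => Console.WriteLine($"Value: '{v}'");
}
```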

This is a great help to structured logging frameworks. You simply pass an interpolated string in, and the logging framework can recover the structure from the way you interpolated, so you never have to write format strings by hand.
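As an illustration of the structured logging idea, the following minimal sketch shows a handler that rebuilds a message template with numbered placeholders and collects the argument values separately. LogHandler and Log are hypothetical names invented for this example; they are not the API of any real logging framework.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;

var user = "alice";
var attempts = 3;
Log($"User {user} failed to log in {attempts} times");
// Prints:
// User {0} failed to log in {1} times
// alice, 3

// Hypothetical logging entry point: prints the recovered template and arguments.
static void Log(LogHandler handler)
{
    Console.WriteLine(handler.Template);
    Console.WriteLine(string.Join(", ", handler.Arguments));
}

[InterpolatedStringHandler]
struct LogHandler
{
    private readonly StringBuilder _template;
    private readonly List<object?> _arguments;

    public LogHandler(int literalLength, int formattedCount)
    {
        _template = new StringBuilder(literalLength + formattedCount * 3);
        _arguments = new List<object?>(formattedCount);
    }

    public string Template => _template.ToString();
    public IReadOnlyList<object?> Arguments => _arguments;

    // Literal text goes into the template unchanged
    public void AppendLiteral(string s) => _template.Append(s);

    // Each value becomes a numbered placeholder and is stored separately
    public void AppendFormatted<T>(T value)
    {
        _template.Append('{').Append(_arguments.Count).Append('}');
        _arguments.Add(value);
    }
}
```

A real framework would more likely keep the expression names rather than numbers, but the principle is the same: the template and the values arrive separately, so nothing needs to be parsed back out of a formatted string.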

Interpolated string handlers with arguments

In fact, C# interpolated string handlers can also receive additional arguments:

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount, int value)
{
    public void AppendLiteral(string s)
    {
        Console.WriteLine($"Literal: '{s}'");
    }

    public void AppendFormatted<T>(T v)
    {
        Console.WriteLine($"Value: '{v}'");
    }
}

void Foo(int value, [InterpolatedStringHandlerArgument("value")] Handler handler) { }
Foo(42, $"abc{x}def");

Here 42 is passed to the value parameter of the handler's constructor, which lets the handler capture context from the caller. In logging scenarios, after all, it is common to choose a different format depending on such parameters.
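A small sketch of that idea follows: the handler receives a log level from the caller and formats values differently depending on it. Level, LevelHandler, and Log are hypothetical names made up for this example, and the quoting rule is an arbitrary choice for demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

var x = 42;
Log(Level.Info, $"x is {x}");   // prints: [Info] x is 42
Log(Level.Error, $"x is {x}");  // prints: [Error] x is '42'

// The "level" argument is forwarded to the handler's constructor.
static void Log(Level level, [InterpolatedStringHandlerArgument("level")] LevelHandler handler)
    => Console.WriteLine(handler.ToString());

enum Level { Info, Error }

[InterpolatedStringHandler]
struct LevelHandler
{
    private readonly StringBuilder _builder;
    private readonly Level _level;

    public LevelHandler(int literalLength, int formattedCount, Level level)
    {
        _level = level;
        _builder = new StringBuilder(literalLength + formattedCount * 8);
        _builder.Append('[').Append(level.ToString()).Append("] ");
    }

    public void AppendLiteral(string s) => _builder.Append(s);

    // Errors quote every value so it stands out; other levels print it plainly.
    public void AppendFormatted<T>(T v)
    {
        if (_level == Level.Error) _builder.Append('\'').Append(v?.ToString()).Append('\'');
        else _builder.Append(v?.ToString());
    }

    public override string ToString() => _builder.ToString();
}
```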

sscanf?

As we all know, C/C++ has a very commonly used function called sscanf. It takes an input text and a format template, plus a pointer to a variable for each format specifier, and parses the values into those variables:

const char* input = "test 123 test";
const char* format = "test %d test"; // named 'format' because 'template' is a reserved word in C++
int v = 0;
sscanf(input, format, &v);
printf("%d\n", v); // 123

So can we replicate this in C#? Of course! It only takes a little black magic.

Implementing sscanf in C#

First we create an interpolated string handler that takes an argument:

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(T v) where T : ISpanParsable<T>
    {
    }
}

Here we changed every string to ReadOnlySpan<char> to reduce allocations.

Following the way sscanf is used, in theory we should end up with a signature like this:

void sscanf(ReadOnlySpan<char> input, ReadOnlySpan<char> template, params object[] args);

But obviously, what we need here is something like (ref object)[], because we have to pass references in order to update the caller's variables, rather than passing the value of each variable in as an object. So what should we do?

Notice that a C# interpolated string handler already receives each interpolated variable, so we don't need placeholders like C/C++'s %d to mark where the variables go! Instead of "test %d test" we can write $"test {v} test" directly, and then pass v by reference.

A very natural idea is to simply change AppendFormatted<T>(T v) to AppendFormatted<T>(ref T v).

However, after actually doing this, you will find that this does not work:

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(ref T v) where T : ISpanParsable<T>
    {
    }
}

void sscanf(ReadOnlySpan<char> input, [InterpolatedStringHandlerArgument("input")] TemplatedStringHandler template) { }

When we try to call sscanf:

int v = 0;
sscanf("test 123 test", $"test {ref v} test"); // error CS1525: Invalid expression term 'ref'

We get an error! Writing the ref keyword inside an interpolation hole is invalid.

Note that this error comes from the parser of the C# compiler, which means that if we can get rid of this ref at the syntax level, the code will compile.

At this point inspiration strikes: doesn't C# have in for passing read-only references? When a variable is passed to an in parameter, C# automatically creates the reference and passes it, without requiring ref to be written explicitly at the call site. So let's use this feature to change the handler:

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(in T v) where T : ISpanParsable<T>
    {
    }
}

Then you will find that the following code can be successfully compiled:

int v = 0;
sscanf("test 123 test", $"test {v} test");

At this point we are one step away from success: what gets passed in is a read-only reference, but to extract a value we need to update the variable it refers to. What should we do?

Fortunately, we have Unsafe.AsRef, which converts a read-only reference into a mutable one. With that the last problem is solved, and we can start our implementation:

using System;
using System.Runtime.CompilerServices;

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private int _index = 0;                    // Position in the original input, used in error messages
    private ReadOnlySpan<char> _input = input; // The part of the input that has not been consumed yet

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
        var offset = Advance(0); // Skip consecutive whitespace characters first
        _input = _input[offset..];
        _index += offset;

        // Whitespace in the input is skipped, so ignore whitespace around the literal as well;
        // otherwise a template such as $"test {v} test" could never match
        var literal = s.Trim();

        if (_input.StartsWith(literal)) // Consume the non-variable part of the template from the input
        {
            _input = _input[literal.Length..];
        }
        else throw new FormatException($"Cannot find '{literal}' in the input string (at index: {_index}).");

        _index += literal.Length;
        literalLength -= s.Length;
    }

    public void AppendFormatted<T>(in T v) where T : ISpanParsable<T>
    {
        var offset = Advance(0); // Skip consecutive whitespace characters first
        _input = _input[offset..];
        _index += offset;

        var length = Scan(); // Length of the token up to the next whitespace character
        if (T.TryParse(_input[..length], null, out var result)) // parse!
        {
            Unsafe.AsRef(in v) = result; // Turn the read-only reference into a mutable one and write through it
            _input = _input[length..];
            _index += length;
            formattedCount--;
        }
        else
        {
            throw new FormatException($"Cannot parse '{_input[..length]}' to '{typeof(T)}' (at index: {_index}).");
        }
    }

    // Scan forward until the next whitespace character and return the token length
    private int Scan()
    {
        var length = 0;
        for (var i = 0; i < _input.Length; i++)
        {
            if (_input[i] is ' ' or '\t' or '\r' or '\n') break;
            length++;
        }
        return length;
    }

    // Return the index of the first non-whitespace character at or after start
    private int Advance(int start)
    {
        var length = start;
        while (length < _input.Length && _input[length] is ' ' or '\t' or '\r' or '\n')
        {
            length++;
        }
        return length;
    }
}
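The key trick in AppendFormatted, writing through an in parameter, can be tried in isolation. The sketch below uses a hypothetical helper named Overwrite; only Unsafe.AsRef comes from the .NET library.

```csharp
using System;
using System.Runtime.CompilerServices;

int v = 0;
Overwrite(v, 123);       // no 'ref' or 'in' keyword is needed at the call site
Console.WriteLine(v);    // 123

// Hypothetical helper: assigns a new value through a read-only reference.
static void Overwrite<T>(in T target, T value)
{
    // Unsafe.AsRef reinterprets the read-only reference as a writable one
    Unsafe.AsRef(in target) = value;
}
```

This only updates the caller's variable when the argument really is a variable of exactly type T. If the argument is any other expression, such as a constant, a property, or v + 0, the compiler passes a reference to a temporary copy and the write is silently lost. The same limitation applies to the interpolation holes passed to the handler.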

Then we provide an sscanf method that exposes our interpolated string handler:

static void sscanf(ReadOnlySpan<char> input, [InterpolatedStringHandlerArgument("input")] TemplatedStringHandler template) { }

Usage

int x = 0;
string y = "";
bool z = false;
DateTime d = default;
sscanf("test 123 hello false 2025/01/01T00:00:00 end", $"test{x}{y}{z}{d}end");
Console.WriteLine(x);
Console.WriteLine(y);
Console.WriteLine(z);
Console.WriteLine(d);

This produces the following output (the exact form of the last line depends on the current culture):

123
hello
False
January 1, 2025 0:00:00

And scanf is just shorthand for sscanf(Console.ReadLine(), template), so having sscanf is entirely sufficient here.

Conclusion

C# interpolated string handlers are very powerful. Using this feature, we have successfully implemented the sscanf string parsing function from C/C++, and one that is arguably nicer to use: not only does it need no placeholders in the format string, it also does away with the syntax for passing references.