Interpolated string handlers
C# has a feature called interpolated strings. With an interpolated string you can insert the value of a variable directly into the string, for example $"abc{x}def". Compared with the older way of formatting strings, you no longer need to pass a template string first and then pass the arguments one by one, which is very convenient.
Building on interpolated strings, C# also supports interpolated string handlers, which let you customize how an interpolated string is processed. A simple example:
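using System.Runtime.CompilerServices; // for InterpolatedStringHandlerAttribute

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount)
{
    // Called once for each literal part of the interpolated string
    public void AppendLiteral(string s)
    {
        Console.WriteLine($"Literal: '{s}'");
    }

    // Called once for each interpolated value
    public void AppendFormatted<T>(T v)
    {
        Console.WriteLine($"Value: '{v}'");
    }
}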
To use it, you only need to change the parameter type from string to Handler, and interpolated strings will be processed in the way you defined. The C# compiler automatically turns the interpolated string into the construction of a Handler plus a sequence of calls on it, and then passes the handler in:
void Foo(Handler handler) { }
var x = 42;
Foo($"abc{x}def");
The example above produces the following output:
Literal: 'abc'
Value: '42'
Literal: 'def'
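To make that conversion concrete, here is a rough hand-written equivalent of what the compiler does for Foo($"abc{x}def"). It is a sketch rather than the exact generated code. It assumes C# 12 or later (for primary constructors) and reuses the Handler and Foo shapes from above.

```csharp
using System;
using System.Runtime.CompilerServices;

var x = 42;

// Roughly what the compiler emits for: Foo($"abc{x}def");
// literalLength = 6 (the characters of "abc" and "def"), formattedCount = 1 (one hole).
var handler = new Handler(6, 1);
handler.AppendLiteral("abc");
handler.AppendFormatted(x);
handler.AppendLiteral("def");
Foo(handler);

void Foo(Handler handler) { }

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount)
{
    public void AppendLiteral(string s) => Console.WriteLine($"Literal: '{s}'");
    public void AppendFormatted<T>(T v) => Console.WriteLine($"Value: '{v}'");
}
```

Running this prints the same three lines as the interpolated call, which shows that the handler sees the literal parts and the values separately and in order.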
This is a great help to structured logging frameworks. You simply pass an interpolated string in, and the framework can capture the structure directly from the way you interpolated it, which completely avoids parsing a format string by hand.
Interpolated string handlers with arguments
In fact, interpolated string handlers in C# also support additional arguments:
[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount, int value)
{
    public void AppendLiteral(string s)
    {
        Console.WriteLine($"Literal: '{s}'");
    }

    public void AppendFormatted<T>(T v)
    {
        Console.WriteLine($"Value: '{v}'");
    }
}
void Foo(int value, [InterpolatedStringHandlerArgument("value")] Handler handler) { }
Foo(42, $"abc{x}def");
This way, 42 is passed to the value parameter of the handler's constructor, which lets us capture context from the caller. After all, in logging scenarios it is common to choose a different format depending on other arguments.
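As a small illustration of using the captured argument, the sketch below lets the caller decide whether interpolated values are wrapped in brackets. The names Format, BracketHandler, and bracketed are hypothetical and chosen only for this example; C# 12 or later is assumed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

var x = 42;
Console.WriteLine(Format(true, $"abc{x}def"));   // abc[42]def
Console.WriteLine(Format(false, $"abc{x}def"));  // abc42def

// The 'bracketed' argument is forwarded to the handler's constructor.
string Format(bool bracketed, [InterpolatedStringHandlerArgument("bracketed")] BracketHandler handler)
    => handler.ToString();

[InterpolatedStringHandler]
struct BracketHandler(int literalLength, int formattedCount, bool bracketed)
{
    // The capacity is only a rough guess; 8 characters per value is an assumption.
    private readonly StringBuilder _sb = new(literalLength + formattedCount * 8);

    public void AppendLiteral(string s) => _sb.Append(s);

    public void AppendFormatted<T>(T v)
    {
        if (bracketed) _sb.Append('[').Append(v).Append(']');
        else _sb.Append(v);
    }

    public override string ToString() => _sb.ToString();
}
```

The handler never inspects the template itself; the caller's argument alone changes how each value is rendered.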
sscanf?
As we all know, C/C++ has a very commonly used function, sscanf. It accepts a text input and a format template, followed by pointers to the variables that correspond to the format specifiers, and parses the values into those variables:
const char* input = "test 123 test";
const char* template = "test %d test";
int v = 0;
sscanf(input, template, &v);
printf("%d\n", v); // 123
So can we build the same thing in C#? Sure! It only takes a little bit of black magic.
Implementing sscanf in C#
First we write an interpolated string handler that takes an argument:
[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(T v) where T : ISpanParsable<T>
    {
    }
}
Here we changed every string to ReadOnlySpan<char> to reduce allocations.
Following the signature of sscanf, in theory we would write something like this:
void sscanf(ReadOnlySpan<char> input, ReadOnlySpan<char> template, params object[] args);
But obviously what we would really need here is something like (ref object)[], because we have to pass references in order to update the caller's variables, rather than passing the values of the variables as object. So what should we do?
You will notice that a C# interpolated string already carries each variable, so we do not need placeholders like %d in C/C++ to mark where the variables go. Instead of "test %d test" we can write $"test {v} test" directly, and then pass v by reference.
A very natural idea is to simply change AppendFormatted<T>(T v) to AppendFormatted<T>(ref T v), and that should be it.
However, after actually doing this, you will find that this does not work:
[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(ref T v) where T : ISpanParsable<T>
    {
    }
}
void sscanf(ReadOnlySpan<char> input, [InterpolatedStringHandlerArgument("input")] TemplatedStringHandler template) { }
When we try to call sscanf:
int v = 0;
sscanf("test 123 test", $"test {ref v} test"); // error CS1525: Invalid expression term 'ref'
We get an error! The ref keyword is not valid inside the value part of an interpolated string.
Note that this error comes from the parser of the C# compiler, which means that if we can get rid of the ref syntactically, the code will compile.
At this point inspiration strikes: doesn't C# have in for passing read-only references? When an argument is passed by in, C# automatically creates the reference and passes it, without requiring an explicit ref at the call site. So let's use this feature to rework the handler:
[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(in T v) where T : ISpanParsable<T>
    {
    }
}
Then you will find that the following code can be successfully compiled:
int v = 0;
sscanf("test 123 test", $"test {v} test");
Now we are only one step away from success: what gets passed in is a read-only reference, but to extract a value we need to write through that reference. What should we do?
Fortunately, we have Unsafe.AsRef, which converts a read-only reference into a mutable one. With that the last problem is solved, and we can start on the implementation.
using System.Runtime.CompilerServices; // for InterpolatedStringHandlerAttribute and Unsafe

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private int _index = 0;
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
        var offset = Advance(0); // Skip consecutive whitespace characters first
        _input = _input[offset..];
        _index += offset;
        if (_input.StartsWith(s)) // Remove the literal part of the template from the input
        {
            _input = _input[s.Length..];
        }
        else throw new FormatException($"Cannot find '{s}' in the input string (at index: {_index}).");
        _index += s.Length;
        literalLength -= s.Length;
    }

    public void AppendFormatted<T>(in T v) where T : ISpanParsable<T>
    {
        var offset = Advance(0); // Skip consecutive whitespace characters first
        _input = _input[offset..];
        _index += offset;
        var length = Scan(); // Calculate the length until the next whitespace character
        if (T.TryParse(_input[..length], null, out var result)) // Parse!
        {
            // Turn the read-only reference into a mutable one and write the parsed value through it
            Unsafe.AsRef(in v) = result;
            _input = _input[length..];
            _index += length;
            formattedCount--;
        }
        else
        {
            throw new FormatException($"Cannot parse '{_input[..length]}' to '{typeof(T)}' (at index: {_index}).");
        }
    }

    // Scan forward until the next whitespace character and return the length scanned
    private int Scan()
    {
        var length = 0;
        for (var i = 0; i < _input.Length; i++)
        {
            if (_input[i] is ' ' or '\t' or '\r' or '\n') break;
            length++;
        }
        return length;
    }

    // Skip all whitespace characters starting at 'start' and return the new offset
    private int Advance(int start)
    {
        var length = start;
        while (length < _input.Length && _input[length] is ' ' or '\t' or '\r' or '\n')
        {
            length++;
        }
        return length;
    }
}
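The trick that makes AppendFormatted work can be isolated in a few lines. The sketch below uses a hypothetical helper named Overwrite to show that an in parameter needs no keyword at the call site, and that Unsafe.AsRef lets the callee write through it anyway. One caveat worth stating: this only updates the caller's variable when the argument really is a variable of exactly type T. If the compiler has to create a temporary (for example for a constant or a converted value), the write lands in that temporary.

```csharp
using System;
using System.Runtime.CompilerServices;

int v = 0;
Overwrite(v, 123);      // no 'ref' or 'in' keyword at the call site
Console.WriteLine(v);   // 123

// Hypothetical helper: writes 'value' through a read-only reference.
static void Overwrite<T>(in T target, T value)
{
    // Unsafe.AsRef reinterprets the read-only reference as a writable one.
    Unsafe.AsRef(in target) = value;
}
```

Because this bypasses the read-only guarantee that in normally gives the caller, it is best kept inside an API such as sscanf whose whole purpose is to write to its arguments.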
Then we provide an sscanf method that exposes our interpolated string handler:
static void sscanf(ReadOnlySpan<char> input, [InterpolatedStringHandlerArgument("input")] TemplatedStringHandler template) { }
Usage:
int x = 0;
string y = "";
bool z = false;
DateTime d = default;
sscanf("test 123 hello false 2025/01/01T00:00:00 end", $"test{x}{y}{z}{d}end");
Console.WriteLine(x);
Console.WriteLine(y);
Console.WriteLine(z);
Console.WriteLine(d);
This produces the output:
123
hello
False
January 1, 2025 0:00:00
As for scanf, it is just shorthand for sscanf(Console.ReadLine(), template), so having sscanf is entirely sufficient.
Conclusion
Interpolated string handlers in C# are very powerful. Using this feature, we have implemented a string parsing function that is even nicer to use than sscanf in C/C++. Not only does it need no format placeholders, it also spares the caller the reference-passing syntax.