Strings are a type that no programming language can avoid.
In Rust, however, you will see a far richer variety of string types than in most other languages.
Why does Rust need so many ways to represent a string?
Beginners often cannot see why it was designed this way, or why using strings should involve so much apparent complexity.
In fact, Rust's string design prioritizes safety, efficiency, and flexibility,
so in terms of ease of use it is harder to understand and master than strings in other languages (e.g. Python, Go).
This article attempts to explain the different string types in Rust.
Hopefully this will lead to a better understanding of how Rust handles strings for safety and maximum performance at the same time.
1. Strings in machines
Strings and numbers stored in the machine are all binary, that is, sequences of 0s and 1s.
Two key pieces of information are required for a program to convert binary data into a human-readable string:
- character encoding
- String length
Common encodings include ASCII, UTF-8, and so on. An encoding defines which character each binary sequence corresponds to.
For example, ASCII stores each character in one 8-bit byte (the standard itself defines only 128 characters, using 7 of those bits), so a single-byte encoding can represent at most 256 different characters.
UTF-8, by contrast, uses 8 to 32 bits (1 to 4 bytes) to represent a character, which means it can encode more than a million characters.
That includes the characters of every language in the world and a variety of emoji.
With the character encoding, we can convert between binary data and characters.
With the string length, we know when to stop when converting binary data in memory into a string.
Strings in Rust use UTF-8 encoding. The following sections introduce the various string types and their use cases one by one.
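As a small illustration of why both the encoding and the length matter, the sketch below compares the byte length of a UTF-8 string with its number of characters. The helper name byte_and_char_counts is made up for this example.

```rust
/// Returns (number of bytes, number of characters) for a UTF-8 string.
/// Hypothetical helper, used only for this illustration.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII letters take one byte each in UTF-8.
    println!("{:?}", byte_and_char_counts("data"));
    // Each of these two Chinese characters takes three bytes.
    println!("{:?}", byte_and_char_counts("中文"));
}
```

The two counts only agree for text made of single-byte characters, which is why Rust string lengths are always measured in bytes.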
2. String and &str
String and &str are the two most used string types in Rust, and also the two that are most easily confused.
A String is a heap-allocated, growable UTF-8 string.
It owns the underlying data, which is automatically cleaned up and released when the String goes out of scope.
let my_string = String::from("databook");
println!(
    "pointer: {:p}, length: {}, capacity: {}",
    my_string.as_ptr(), // address of the heap buffer, not of the String value itself
    my_string.len(),
    my_string.capacity()
);
A String has 3 main parts:
- Pointer: points to the start of the string data in heap memory
- Length: the length of the valid string data, in bytes
- Capacity: the total space allocated for the string my_string
Note the difference between Length and Capacity.
Length is the length of the valid data in my_string, i.e. the actual length of the string in bytes;
Capacity is the memory space that has been allocated for my_string. Capacity >= Length always holds.
You usually do not need to deal with Capacity directly, but it matters when writing efficient, resource-sensitive Rust code.
In particular, when you know that you are about to append a large amount of content to a String, you can manually reserve enough Capacity in advance to avoid multiple memory reallocations.
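Reserving capacity up front might look like the following minimal sketch. The helper repeat_with_reserve is hypothetical; String::with_capacity and push_str are standard library methods.

```rust
/// Builds a string of `n` copies of `piece`, reserving the space up front.
/// Hypothetical helper, used only for this illustration.
fn repeat_with_reserve(piece: &str, n: usize) -> String {
    // One allocation that is big enough for the final result.
    let mut out = String::with_capacity(piece.len() * n);
    for _ in 0..n {
        out.push_str(piece); // never exceeds the reserved capacity
    }
    out
}

fn main() {
    let s = repeat_with_reserve("databook", 1000);
    println!("length: {}, capacity: {}", s.len(), s.capacity());
}
```

For a String that already exists, the reserve method plays the same role as with_capacity.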
&str is a string slice, which represents a contiguous sequence of UTF-8 encoded characters.
It is a borrowed type: it does not own string data, and contains only a pointer to the beginning of the slice and the length of the slice.
let my_str: &str = "databook";
// as_ptr() gives the address of the string data rather than of the variable my_str
println!("pointer: {:p}, length: {}", my_str.as_ptr(), my_str.len());
Note that &str has no capacity method, because it is just a borrow and its content cannot grow.
Finally, some recommendations for choosing between String and &str:
- To dynamically create or modify string data at runtime, use String
- To read or analyze string data without changing it, use &str
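The two recommendations above can be sketched as a pair of functions. The names count_words and greet are made up for this example: one only reads its input and so borrows a &str, the other builds new data and so returns a String.

```rust
/// Reads only, so it borrows: works with string literals and with `&String`.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Builds new string data at runtime, so it returns an owned `String`.
fn greet(name: &str) -> String {
    let mut out = String::from("hello, ");
    out.push_str(name);
    out
}

fn main() {
    let owned = greet("databook");
    // `&owned` is a `&String`, which is converted to `&str` automatically.
    println!("{} ({} words)", owned, count_words(&owned));
}
```

Taking &str as the parameter type is usually the more flexible choice, since callers can pass either kind of string without copying.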
3. Vec<u8> and &[u8]
These are two ways of representing a string as bytes: Vec<u8> is a vector of bytes, and &[u8] is a byte slice.
They simply hold the individual characters of a string in byte form.
The as_bytes method converts a &str to &[u8];
the into_bytes method converts a String to Vec<u8>.
let my_str: &str = "databook";
let my_string = String::from("databook");
let s: &[u8] = my_str.as_bytes();
let ss: Vec<u8> = my_string.into_bytes();
println!("s: {:?}", s);
println!("ss: {:?}", ss);
/* running result
s: [100, 97, 116, 97, 98, 111, 111, 107]
ss: [100, 97, 116, 97, 98, 111, 111, 107]
*/
In UTF-8 encoding, each English letter corresponds to 1 byte, and a Chinese character usually corresponds to 3 bytes.
let my_str: &str = "中文";
let my_string = String::from("中文");
let s: &[u8] = my_str.as_bytes();
let ss: Vec<u8> = my_string.into_bytes();
println!("s: {:?}", s);
println!("ss: {:?}", ss);
/* running result
s: [228, 184, 173, 230, 150, 135]
ss: [228, 184, 173, 230, 150, 135]
*/
Vec<u8> and &[u8] store strings as bytes, without caring about the specific encoding of the string.
This is useful when transferring binary files or packets over a network, where what matters is how many bytes are transferred at a time.
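Going the other way, from bytes back to a string, is where the encoding matters again. The sketch below uses the standard std::str::from_utf8 function; the wrapper name decode is hypothetical.

```rust
/// Tries to interpret raw bytes as UTF-8 text.
/// Hypothetical helper, used only for this illustration.
fn decode(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let bytes = "中文".as_bytes();
    // All 6 bytes together form valid UTF-8.
    println!("{:?}", decode(bytes));
    // Cutting in the middle of a character leaves invalid UTF-8.
    println!("{:?}", decode(&bytes[..4]));
}
```

This is why a byte buffer received in chunks should generally be reassembled before it is converted to a string.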
4. str series
The str type itself cannot be used directly as a value, because its size cannot be determined at compile time, and Rust requires local variables and function arguments to have a known size.
However, it can be used behind pointer types that serve special purposes.
4.1. Box<str>
If ownership of a string slice is required (&str is borrowed and has no ownership), you can use the Box smart pointer.
It is useful when you want to freeze a string to prevent further modification, or to save memory by dropping extra capacity.
For example, in the code below we convert a String into a Box<str>.
This ensures that the string cannot be grown elsewhere, and the data is still freed when the box goes out of scope, because the Box<str> owns the string.
let my_string = String::from("databook");
let my_box_str = my_string.into_boxed_str();
println!("{}", my_box_str);
// The line below would be a compile error, because ownership has been
// transferred to the box. Owning its data is the difference between Box<str> and &str.
// println!("{}", my_string);
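The memory-saving aspect can be sketched as follows. The helper freeze is hypothetical; into_boxed_str is the standard method and discards unused capacity before converting.

```rust
use std::mem::size_of;

/// Converts an over-allocated `String` into a `Box<str>`.
/// Hypothetical helper, used only for this illustration.
fn freeze(text: &str) -> Box<str> {
    let mut s = String::with_capacity(1024); // deliberately over-allocated
    s.push_str(text);
    s.into_boxed_str() // drops the unused capacity
}

fn main() {
    let boxed = freeze("databook");
    println!("{} ({} bytes of text)", boxed, boxed.len());
    // A Box<str> is a pointer plus a length; a String also carries a capacity.
    println!(
        "Box<str>: {} bytes, String: {} bytes",
        size_of::<Box<str>>(),
        size_of::<String>()
    );
}
```

The exact sizes printed depend on the target platform, so no expected output is shown here.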
4.2. Rc<str>
When you want to share ownership of an immutable string in more than one place without cloning the actual string data,
you can try the Rc<str> smart pointer.
For example, if we have a very large text that we want to use in several places and we don't want multiple copies taking up memory, we can use Rc<str>.
use std::rc::Rc;

let my_str: &str = "very long text ....";
let rc_str1: Rc<str> = Rc::from(my_str);
let rc_str2 = Rc::clone(&rc_str1);
let rc_str3 = Rc::clone(&rc_str1);
println!("rc_str1: {}", rc_str1);
println!("rc_str2: {}", rc_str2);
println!("rc_str3: {}", rc_str3);
/* running result
rc_str1: very long text ....
rc_str2: very long text ....
rc_str3: very long text ....
*/
This allows multiple variables to have ownership of the string data without actually cloning it.
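One way to check that the data really is shared rather than copied is to look at the reference count and compare the pointers. This is a minimal sketch; the helper share is made up for this example, while Rc::strong_count and Rc::ptr_eq are standard.

```rust
use std::rc::Rc;

/// Creates `n` extra handles to the same text and returns them.
/// Hypothetical helper, used only for this illustration.
fn share(text: &Rc<str>, n: usize) -> Vec<Rc<str>> {
    (0..n).map(|_| Rc::clone(text)).collect()
}

fn main() {
    let original: Rc<str> = Rc::from("very long text ....");
    let handles = share(&original, 2);
    // Every handle points at the same allocation; only the counter changes.
    println!("owners: {}", Rc::strong_count(&original));
    println!("same data: {}", Rc::ptr_eq(&original, &handles[0]));
}
```

The string data is freed only when the last handle is dropped.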
4.3. Arc<str>
The main difference between Arc<str> and Rc<str> is that Arc<str> is thread-safe.
In a multi-threaded environment, use Arc<str>.
use std::sync::Arc;
use std::thread;

let my_str: &str = "very long text ....";
let arc_str: Arc<str> = Arc::from(my_str);
let mut threads = vec![];
let mut cnt = 0;
while cnt < 5 {
    let s = Arc::clone(&arc_str);
    let t = thread::spawn(move || {
        println!("thread-{}: {}", cnt, s);
    });
    threads.push(t); // keep the handle so the thread can be joined later
    cnt += 1;
}
for t in threads {
    t.join().unwrap(); // wait for every thread to finish
}
/* running result
thread-0: very long text ....
thread-3: very long text ....
thread-2: very long text ....
thread-1: very long text ....
thread-4: very long text ....
*/
In the above code, the string data is shared among 5 threads.
As the running result above shows, the thread order is not fixed, and it may differ from one execution to the next.
4.4. Cow<str>
Cow is short for Copy-on-Write.
When you need to implement a feature that decides whether or not to modify a string based on its contents, Cow is appropriate.
For example, when filtering for sensitive words, we replace sensitive words with xx.
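use std::borrow::Cow;

fn filter_words(input: &str) -> Cow<'_, str> {
    if input.contains("sb") {
        // A sensitive word was found: build a new, owned string.
        let output = input.replace("sb", "xx");
        return Cow::Owned(output);
    }
    // Nothing to replace: hand back the borrowed input unchanged.
    Cow::Borrowed(input)
}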
When the input string contains the sensitive word sb, new memory is allocated and a new string is generated;
otherwise, the original string is used directly, which improves memory efficiency.
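To see which branch was taken, a caller can match on the returned Cow. The sketch below repeats a version of filter_words so that it runs on its own; the main function is only an illustration.

```rust
use std::borrow::Cow;

/// Replaces the sensitive word "sb" with "xx", allocating only when needed.
fn filter_words(input: &str) -> Cow<'_, str> {
    if input.contains("sb") {
        Cow::Owned(input.replace("sb", "xx"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    for text in ["hello sb", "hello world"] {
        match filter_words(text) {
            Cow::Owned(s) => println!("allocated: {}", s),
            Cow::Borrowed(s) => println!("borrowed: {}", s),
        }
    }
}
```

Because Cow<str> dereferences to str, most callers can simply use the result as a string without caring which variant it is.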
5. CStr and CString
CStr and CString are the two types used to handle strings when interacting with the C language.
CStr is used in Rust to safely access strings allocated by C;
CString is used in Rust to create and manage strings that can be safely passed to C functions.
C-style strings are implemented differently from Rust strings.
For example, every string in C is a byte array terminated by a null character \0, which is very different from Rust, where the length is stored alongside the pointer.
So Rust provides two separate types (CStr and CString) that can safely interact with C strings, enabling seamless integration with existing C libraries and APIs.
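The null terminator can be observed without calling any C code. This is a minimal sketch using standard std::ffi methods; the helper to_c_string is hypothetical.

```rust
use std::ffi::{CStr, CString};

/// Builds a C-compatible string; fails if `text` contains an interior \0.
/// Hypothetical helper, used only for this illustration.
fn to_c_string(text: &str) -> Option<CString> {
    CString::new(text).ok()
}

fn main() {
    let owned = to_c_string("databook").expect("no interior nul byte");
    // The trailing 0 is the terminator a C function would look for.
    println!("{:?}", owned.as_bytes_with_nul());

    // Borrow existing nul-terminated bytes, as if they had come from C.
    let borrowed = CStr::from_bytes_with_nul(b"hello\0").expect("valid C string");
    println!("{:?}", borrowed.to_str());
}
```

When actually calling a C function, the as_ptr method of CString yields the raw pointer to pass across the boundary.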
6. OsStr and OsString
OsStr and OsString are string types that are compatible with the operating system.
They are mainly used in scenarios where you need to interact with operating system APIs, which generally use platform-specific string encodings (e.g. UTF-16 on Windows, and byte sequences that are usually UTF-8 on most Unix-like systems).
OsStr and OsString have the same relationship as str and String. OsStr is generally not used directly in code.
The more commonly used forms are &OsStr and OsString.
These two types are generally used when reading/writing operating system environment variables or interacting with system APIs to help us ensure that strings are passed in the correct format.
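Reading an environment variable is a typical case. The sketch below uses the standard std::env::var_os function; the helper describe is made up for this example.

```rust
use std::ffi::{OsStr, OsString};

/// Converts an OS string to text, replacing anything that is not valid Unicode.
/// Hypothetical helper, used only for this illustration.
fn describe(value: &OsStr) -> String {
    value.to_string_lossy().into_owned()
}

fn main() {
    // var_os returns an OsString, so it works even if the value is not valid Unicode.
    let path: Option<OsString> = std::env::var_os("PATH");
    match path {
        Some(p) => println!("PATH = {}", describe(&p)),
        None => println!("PATH is not set"),
    }
}
```

Converting to a regular string is kept as an explicit step, so the program decides what to do when the operating system hands back data that is not valid Unicode.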
7. Path and PathBuf
Judging by their names, these two types seem to have little to do with strings, but they are in fact designed specifically for handling file path strings.
Different file systems have different path formats and allow different characters in paths; for example, on Windows file paths are not even case-sensitive.
By using Path and PathBuf, we don't have to be distracted by which file system we're using when we're coding.
The main difference between Path and PathBuf is mutability and ownership.
If you need to read and query path information frequently without modifying it, Path is a good choice;
if the path content needs to be dynamically constructed or modified, PathBuf would be more appropriate.
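The split mirrors the one between &str and String, as the following sketch suggests. The function names build_path and extension_of are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Builds a path from its parts; this needs an owned, growable PathBuf.
fn build_path(dir: &str, file: &str) -> PathBuf {
    let mut path = PathBuf::from(dir);
    path.push(file); // inserts the platform's separator
    path
}

/// Only reads the path, so a borrowed &Path is enough.
fn extension_of(path: &Path) -> Option<&str> {
    path.extension().and_then(|ext| ext.to_str())
}

fn main() {
    let path = build_path("data", "book.txt");
    println!("{} -> {:?}", path.display(), extension_of(&path));
}
```

The printed path uses whichever separator the current platform expects, which is the point of not assembling paths by hand with string concatenation.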
8. Summary
In conclusion, Rust has so many string types because they are categorized according to different uses.
This is also to maximize the performance of the program in different application scenarios; after all, safety and high performance have always been Rust's biggest selling points.