
Rust String Types Explained

2024-09-25 16:47:38

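Strings are a type that no programming language can avoid.

However, in Rust you'll see a far richer variety of string types than in most other languages, including String, &str, Vec<u8>, &[u8], Box<str>, Rc<str>, Arc<str>, Cow<str>, CStr, CString, OsStr, OsString, Path, and PathBuf.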

Why does Rust need so many kinds of string representation?

When you first learn Rust, you may not understand why it was designed this way. Why bring so much seemingly unnecessary complexity to using strings?

In fact, Rust's string design prioritizes safety, efficiency, and flexibility.

So in terms of ease of use, Rust strings are not as easy to understand and master as those in other languages (e.g. Python, Go).

This article attempts to explain all of the different string types in Rust.

Hopefully this will lead to a better understanding of how Rust handles strings for safety and maximum performance at the same time.

1. Strings in machines

Both strings and numbers are stored in the machine as binary, that is, as sequences of 0s and 1s.

Two key pieces of information are required for a program to convert binary data into a human-readable string:

  1. Character encoding
  2. String length

Common encodings include ASCII and UTF-8. An encoding is the mapping between binary sequences and characters.

For example, ASCII is a 7-bit code that is usually stored as one 8-bit byte per character, so it defines only 128 characters (a single byte could distinguish at most 256).

UTF-8, by contrast, uses 8 to 32 bits (1 to 4 bytes) to represent a character, which means it can encode more than a million characters.

This includes the characters of the world's written languages as well as a variety of emoji.

With the character encoding, we can convert between binary data and characters.

With the string length, we know when to stop when converting binary data in memory to a string.

Rust strings use UTF-8 encoding. The following sections introduce the various string types and their use cases one by one.
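As a rough sketch of the two pieces of information described in this section, the following example decodes a raw byte buffer using an explicit length and the UTF-8 encoding. The helper name decode_prefix and the sample bytes are assumptions made for illustration; only std::str::from_utf8 comes from the standard library.

```rust
use std::str;

/// Decode the first `len` bytes of `buf` as UTF-8 text.
/// `decode_prefix` is a hypothetical helper, not a standard-library function.
fn decode_prefix(buf: &[u8], len: usize) -> Option<&str> {
    // The length tells us where the string stops.
    let bytes = buf.get(..len)?;
    // The encoding tells us how to turn those bytes into characters.
    str::from_utf8(bytes).ok()
}

fn main() {
    // Raw memory: the bytes of "hi" followed by unrelated data.
    let memory: [u8; 4] = [0x68, 0x69, 0xFF, 0x00];

    // With the right length, the bytes decode cleanly.
    println!("{:?}", decode_prefix(&memory, 2));

    // With the wrong length, 0xFF is included, which is never valid UTF-8.
    println!("{:?}", decode_prefix(&memory, 3));
}
```

Without the correct length, the decoder either stops too early or runs into bytes that do not belong to the string.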

2. String and &str

String and &str are the two most used string types in Rust, and also the two that are most easily confused.

A String is a heap-allocated, growable UTF-8 string.

It owns the underlying data, which is automatically cleaned up and freed when the String goes out of scope.

let my_string = String::from("databook");
println!(
    "pointer: {:p}, length: {}, capacity: {}",
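    my_string.as_ptr(), // address of the heap buffer (&my_string would print the address of the String value itself)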
    my_string.len(),
    my_string.capacity()
);

A String has 3 main parts:

  1. Pointer: points to the start of the string data in heap memory
  2. Length: the number of bytes of valid string data
  3. Capacity: the total space allocated for my_string

Note the difference between Length and Capacity: Length is the number of bytes of valid data in my_string, i.e. the actual length of the string;

Capacity is the memory space the system has allocated for my_string, so Capacity >= Length always holds.

You usually do not need to deal with Capacity directly, but it matters when writing efficient, resource-sensitive Rust code.

In particular, when you know you are about to append a large amount of content to a String, you can manually reserve enough Capacity in advance to avoid multiple memory reallocations.
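The following is a minimal sketch of reserving capacity up front. The helper repeat_reserved is a hypothetical name used only for this illustration; String::with_capacity, push_str, and capacity are standard-library methods.

```rust
/// Build a string made of `n` copies of `piece`, reserving space up front.
/// `repeat_reserved` is a hypothetical helper for illustration.
fn repeat_reserved(piece: &str, n: usize) -> String {
    // Allocate once: room for at least piece.len() * n bytes.
    let mut out = String::with_capacity(piece.len() * n);
    let initial_capacity = out.capacity();

    for _ in 0..n {
        out.push_str(piece);
    }

    // Everything fit in the reserved space, so the buffer never had to grow.
    debug_assert_eq!(out.capacity(), initial_capacity);
    out
}

fn main() {
    let s = repeat_reserved("databook ", 1000);
    println!("length: {}, capacity: {}", s.len(), s.capacity());
}
```

For a String that already exists, the reserve method plays the same role as with_capacity does at creation time.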

&str is a string slice: a view into a contiguous sequence of UTF-8 bytes.

It is a borrowed type: it does not own the string data, and contains only a pointer to the start of the slice and the length of the slice.

let my_str: &str = "databook";
println!("pointer: {:p}, length: {}", my_str.as_ptr(), my_str.len());

Note that &str has no capacity method, since it is just a borrow and its contents cannot grow.

Finally, some recommendations for choosing between String and &str:

  1. To dynamically create or modify string data at runtime, use String
  2. To read or analyze string data without changing it, use &str
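The two recommendations can be sketched as follows. The function names count_words and shout are made up for this example: the first only reads its input and therefore takes &str, while the second builds new data at runtime and therefore returns a String.

```rust
/// Only reads the text, so it borrows it as &str.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Creates new string data at runtime, so it returns an owned String.
fn shout(text: &str) -> String {
    let mut out = text.to_uppercase();
    out.push('!');
    out
}

fn main() {
    let owned = String::from("hello rust strings");

    // A &String is automatically dereferenced to &str at the call site.
    println!("{}", count_words(&owned));
    // String literals are already &str.
    println!("{}", count_words("a literal"));

    println!("{}", shout("databook"));
}
```

Taking &str as the parameter type lets callers pass either a String or a literal without copying anything.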

3. Vec<u8> and &[u8]

These are two ways of representing a string as bytes: Vec<u8> is a vector of bytes, and &[u8] is a byte slice.

They simply expose the characters of a string in byte form.

The as_bytes method converts a &str to a &[u8].

The into_bytes method converts a String to a Vec<u8>.

let my_str: &str = "databook";
let my_string = String::from("databook");
let s: &[u8] = my_str.as_bytes();
let ss: Vec<u8> = my_string.into_bytes();

println!("s: {:?}", s);
println!("ss: {:?}", ss);

/* running result
s: [100, 97, 116, 97, 98, 111, 111, 107]
ss: [100, 97, 116, 97, 98, 111, 111, 107]
*/

In UTF-8 encoding, each English letter takes 1 byte, while a common Chinese character takes 3 bytes.

let my_str: &str = "中文";
let my_string = String::from("中文");
let s: &[u8] = my_str.as_bytes();
let ss: Vec<u8> = my_string.into_bytes();

println!("s: {:?}", s);
println!("ss: {:?}", ss);

/* running result
s: [228, 184, 173, 230, 150, 135]
ss: [228, 184, 173, 230, 150, 135]
*/

Vec<u8> and &[u8] store strings as bytes, without caring about the specific encoding of the string.

This is useful when transferring binary files or packets over a network, where what matters is how many bytes are sent at a time.
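As a hedged illustration of working at the byte level, the sketch below splits a string into fixed-size byte chunks (as a stand-in for packets) and reassembles it. The helpers send_in_chunks and reassemble are hypothetical names; no real networking is involved.

```rust
use std::string::FromUtf8Error;

/// Split the text into byte chunks of at most `chunk_size` bytes.
/// Hypothetical helper; `chunk_size` must be greater than 0.
fn send_in_chunks(text: &str, chunk_size: usize) -> Vec<Vec<u8>> {
    text.as_bytes()
        .chunks(chunk_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Join the chunks and check that the result is valid UTF-8.
fn reassemble(chunks: &[Vec<u8>]) -> Result<String, FromUtf8Error> {
    String::from_utf8(chunks.concat())
}

fn main() {
    // "中文" is 6 bytes, so a chunk size of 4 cuts the second character in half.
    let chunks = send_in_chunks("中文", 4);
    println!("chunks: {:?}", chunks);

    // A single chunk may not be valid UTF-8 on its own...
    println!("first chunk alone: {:?}", std::str::from_utf8(&chunks[0]).is_ok());

    // ...but the reassembled bytes are.
    println!("reassembled: {:?}", reassemble(&chunks));
}
```

This is why the byte types make no promise about encoding: a chunk boundary can fall in the middle of a multi-byte character, and validity is only checked when converting back with String::from_utf8.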

4. str series

The str type itself cannot be used directly as a value, because its size cannot be determined at compile time, and Rust requires local variables and function arguments to have a known size.

However, it can be used behind pointer types that serve special purposes.

4.1. Box<str>

If ownership of a string slice is required (&str is borrowed and has no ownership), you can use the Box smart pointer.

It is useful when you want to freeze a string to prevent further modification, or to save memory by dropping extra capacity.

For example, in the code below, we convert a String to a Box<str>.

This ensures the string can no longer grow, and it is still freed automatically when dropped, because the Box<str> owns the string data.

let my_string = String::from("databook");
let my_box_str = my_string.into_boxed_str();
println!("{}", my_box_str);

// The next line would be a compile error, because ownership has been moved into my_box_str.
// This is the difference between Box<str> and &str.
// println!("{}", my_string);
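To make the memory-saving point more concrete, here is a small sketch that compares a String with spare capacity to the Box<str> produced from it. The helper name freeze is an assumption for this example; the sizes printed by size_of depend on the platform, so no specific numbers are claimed.

```rust
use std::mem::size_of;

/// Convert an owned String into an owned, fixed-size Box<str>.
/// `freeze` is a hypothetical helper name.
fn freeze(s: String) -> Box<str> {
    // into_boxed_str drops any excess capacity.
    s.into_boxed_str()
}

fn main() {
    // Deliberately over-allocate to show the spare capacity.
    let mut s = String::with_capacity(64);
    s.push_str("databook");
    println!("String:   len {}, capacity {}", s.len(), s.capacity());

    let boxed = freeze(s);
    // Box<str> has only a pointer and a length; there is no capacity field.
    println!("Box<str>: len {}", boxed.len());

    println!(
        "handle sizes in bytes: String {}, Box<str> {}",
        size_of::<String>(),
        size_of::<Box<str>>()
    );
}
```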

4.2. Rc<str>

When you want to share ownership of an immutable string in more than one place without cloning the actual string data, you can try the Rc<str> smart pointer.

For example, if we have a very large text that we want to use in more than one place, and we don't want multiple copies taking up memory, we can use Rc<str>.

use std::rc::Rc;

let my_str: &str = "very long text ....";
let rc_str1: Rc<str> = Rc::from(my_str);

let rc_str2 = Rc::clone(&rc_str1);
let rc_str3 = Rc::clone(&rc_str1);

println!("rc_str1: {}", rc_str1);
println!("rc_str2: {}", rc_str2);
println!("rc_str3: {}", rc_str3);

/* running result
rc_str1: very long text ....
rc_str2: very long text ....
rc_str3: very long text ....
*/

This allows multiple variables to have ownership of the string data without actually cloning it.
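One way to check that the handles really share a single allocation is to look at the reference count and compare pointers. The sketch below assumes a hypothetical helper share that hands out n handles to one copy of the text; Rc::strong_count and Rc::ptr_eq are standard-library functions.

```rust
use std::rc::Rc;

/// Copy `text` into one shared allocation and return `n` handles to it.
/// `share` is a hypothetical helper for illustration.
fn share(text: &str, n: usize) -> Vec<Rc<str>> {
    // The string data is copied exactly once, here.
    let first: Rc<str> = Rc::from(text);
    // Each Rc::clone only increments the reference count.
    (0..n).map(|_| Rc::clone(&first)).collect()
}

fn main() {
    let handles = share("very long text ....", 3);

    // All three handles count as owners of the same data.
    println!("owners: {}", Rc::strong_count(&handles[0]));
    // They point at the same allocation.
    println!("same allocation: {}", Rc::ptr_eq(&handles[0], &handles[2]));
}
```

The data is freed only when the last handle is dropped.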

4.3. Arc<str>

The main difference between Arc<str> and Rc<str> is that Arc<str> is thread-safe.

In a multi-threaded environment, use Arc<str>.

use std::sync::Arc;
use std::thread;

let my_str: &str = "very long text ....";
let arc_str: Arc<str> = Arc::from(my_str);

let mut threads = vec![];

for cnt in 0..5 {
    // Each thread gets its own handle to the same string data.
    let s = Arc::clone(&arc_str);
    let t = thread::spawn(move || {
        println!("thread-{}: {}", cnt, s);
    });

    threads.push(t);
}

// Wait for every thread to finish.
for t in threads {
    t.join().unwrap();
}

/* running result
thread-0: very long text ....
thread-3: very long text ....
thread-2: very long text ....
thread-1: very long text ....
thread-4: very long text ....
*/

In the above code, the string data is shared among 5 threads.

In the running result above, the thread order is not fixed, and it will differ from run to run.

4.4. Cow<str>

Cow is short for Copy-on-Write.

When you need to implement a feature that decides whether to modify a string based on its contents, Cow is a good fit.

For example, when filtering sensitive words, we replace each sensitive word with xx.

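use std::borrow::Cow;

fn filter_words(input: &str) -> Cow<'_, str> {
    if input.contains("sb") {
        // A sensitive word was found: allocate a new, filtered string.
        let output = input.replace("sb", "xx");
        return Cow::Owned(output);
    }

    // Nothing to change: hand back the original without copying.
    Cow::Borrowed(input)
}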

When the input string contains the sensitive word sb, memory is allocated and a new string is generated;

otherwise, the original string is borrowed directly, which improves memory efficiency.
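A small usage sketch can show which branch is taken for a given input. The function is restated here so the example is self-contained; the sample inputs are made up.

```rust
use std::borrow::Cow;

/// Replace the sensitive word "sb" with "xx", allocating only when needed.
fn filter_words(input: &str) -> Cow<'_, str> {
    if input.contains("sb") {
        Cow::Owned(input.replace("sb", "xx"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    for text in ["hello sb", "hello world"] {
        match filter_words(text) {
            // No allocation happened: this is the original slice.
            Cow::Borrowed(s) => println!("borrowed: {}", s),
            // A new String was allocated for the filtered text.
            Cow::Owned(s) => println!("owned: {}", s),
        }
    }
}
```

Because Cow<str> dereferences to str, callers that only read the result can treat both cases the same way and ignore which variant they received.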

5. CStr and CString

CStr and CString are the two types used to handle strings when interacting with C.

CStr is used in Rust to safely access strings allocated by C;

CString, by contrast, is used in Rust to create and manage strings that can be safely passed to C functions.

C-style strings are implemented differently from Rust strings.

For example, every C string is a byte array terminated by a null character \0, which is very different from Rust, where the length is stored alongside the pointer.

So Rust wraps them in two separate types (CStr and CString) that can safely interact with C strings, enabling seamless integration with existing C libraries and APIs.
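The null terminator can be seen without calling any C code. In the sketch below, to_c_bytes and from_c_bytes are hypothetical helpers that stand in for the two directions of a C call; CString::new, as_bytes_with_nul, and CStr::from_bytes_with_nul are standard-library APIs.

```rust
use std::ffi::{CStr, CString};

/// Rust -> C direction: produce the bytes a C function would see.
/// Returns None if the text contains an interior \0, which C cannot represent.
fn to_c_bytes(text: &str) -> Option<Vec<u8>> {
    let c_string = CString::new(text).ok()?;
    // Includes the trailing \0 that CString appended.
    Some(c_string.as_bytes_with_nul().to_vec())
}

/// C -> Rust direction: view a null-terminated buffer as a &str.
/// Returns None if the terminator is missing or the bytes are not UTF-8.
fn from_c_bytes(bytes: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}

fn main() {
    println!("{:?}", to_c_bytes("databook"));
    println!("{:?}", to_c_bytes("data\0book"));
    println!("{:?}", from_c_bytes(b"databook\0"));
}
```

In real FFI code the pointer from CString::as_ptr would be passed to the C function instead of copying the bytes; that part is omitted here to keep the example free of unsafe code.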

6. OsStr and OsString

OsStr and OsString are string types that are compatible with the operating system.

They are mainly used when interacting with operating system APIs, which generally use platform-specific string encodings (e.g. UTF-16 on Windows, and usually UTF-8 on most Unix-like systems).

The relationship between OsStr and OsString mirrors that between str and String; OsStr is generally not used directly in code.

The more commonly used forms are &OsStr and OsString.

These two types are generally used when reading or writing operating system environment variables or interacting with system APIs, to help ensure that strings are passed in the correct format.
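As a minimal sketch of the environment-variable case, the example below reads PATH with std::env::var_os, which returns an OsString because the value is not guaranteed to be valid UTF-8. The helper describe is a hypothetical name introduced for this example.

```rust
use std::env;
use std::ffi::{OsStr, OsString};

/// Report whether an OS string can be viewed as UTF-8 text.
/// `describe` is a hypothetical helper for illustration.
fn describe(value: &OsStr) -> String {
    match value.to_str() {
        Some(s) => format!("valid UTF-8: {}", s),
        // to_string_lossy replaces anything that is not valid Unicode.
        None => format!("not UTF-8, lossy: {}", value.to_string_lossy()),
    }
}

fn main() {
    // var_os returns Option<OsString>; it does not fail on non-UTF-8 values.
    match env::var_os("PATH") {
        Some(path) => println!("{}", describe(&path)),
        None => println!("PATH is not set"),
    }

    // An OsString can also be built from ordinary Rust text.
    let owned = OsString::from("databook");
    println!("{}", describe(&owned));
}
```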

7. Path and PathBuf

Judging by their names, these two types seem to have little to do with strings, but they are in fact designed specifically for handling file path strings.

Different filesystems have different path formats and allow different characters in paths; on Windows, for example, file paths are not even case-sensitive.

With Path and PathBuf, we don't have to be distracted by which filesystem we're using when we code.

The main difference between Path and PathBuf is mutability and ownership.

If you need to read and query path information frequently without modifying it, Path is a good choice;

if the path needs to be built or modified dynamically, PathBuf is more appropriate.
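The split between the two types can be sketched as follows: a borrowed &Path is used for the read-only input, and an owned PathBuf is built and modified. The function log_path and the file names are hypothetical and chosen only for illustration; no files are touched.

```rust
use std::path::{Path, PathBuf};

/// Build "<dir>/<name>" and force its extension to "log".
/// `log_path` is a hypothetical helper for illustration.
fn log_path(dir: &Path, name: &str) -> PathBuf {
    // Path is read-only, so make an owned, mutable copy first.
    let mut path = dir.to_path_buf();
    // push uses the separator of the current platform.
    path.push(name);
    path.set_extension("log");
    path
}

fn main() {
    let path = log_path(Path::new("data"), "app.txt");

    // Querying works through the borrowed Path API.
    println!("full path: {}", path.display());
    println!("file name: {:?}", path.file_name());
    println!("extension: {:?}", path.extension());
    println!("parent:    {:?}", path.parent());
}
```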

8. Summary

In conclusion, Rust has so many string types because each is designed for a different use.

This also maximizes program performance across different application scenarios; after all, safety and high performance have always been Rust's biggest selling points.