
The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 53 Python Strings and Serialization - String and Character Encoding

Popularity:850 ℃/2024-09-28 00:50:50

Abstracts:

In Python, strings are representations of text that use Unicode encoding by default, which allows you to work with a variety of character sets. Character encodings are rules for converting characters to bytes, and common encodings include UTF-8, UTF-16, and ASCII.

Link to original article:

FreakStudio's Blog

Past Recommendations:

You're learning embedded and you don't know how to be object oriented?

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 00 Introduction to Object-Oriented Design Methods

The network's most suitable for the introduction of object-oriented programming tutorials: 01 Basic Concepts of Object-Oriented Programming

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 02 Python Implementations of Classes and Objects - Creating Classes with Python

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 03 Python Implementations of Classes and Objects - Adding Attributes to Custom Classes

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 04 Python Implementation of Classes and Objects - Adding Methods to Custom Classes

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 05 Python Implementation of Classes and Objects - PyCharm Code Tags

The best object-oriented programming tutorials on the net for getting started: 06 Python implementation of classes and objects - data encapsulation of custom classes

The best object-oriented programming tutorial on the net for getting started: 07 Python implementation of classes and objects - type annotations

The best object-oriented programming tutorials on the net for getting started: 08 Python implementations of classes and objects - @property decorator

The best object-oriented programming tutorials on the net for getting started: 09 Python implementation of classes and objects - the relationship between classes

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 10 Python Implementations of Classes and Objects - Class Inheritance and the Liskov Substitution Principle

The best object-oriented programming tutorials on the net for getting started: 11 Python implementation of classes and objects - subclasses call parent class methods

The network's most suitable for the introduction of object-oriented programming tutorials: 12 classes and objects of the Python implementation - Python using the logging module to output the program running logs

The network's most suitable for the introduction of object-oriented programming tutorials: 13 classes and objects of the Python implementation - visual reading code artifacts Sourcetrail's installation use

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 14 Python Implementations of Classes and Objects - Static Methods and Class Methods for Classes

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 15 Python Implementations of Classes and Objects - __slots__ Magic Methods

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 16 Python Implementations of Classes and Objects - Polymorphism, Method Overriding, and the Principle of Open-Close

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 17 Python Implementations of Classes and Objects - Duck Types and "file-like objects"

The network's most suitable for the introduction of object-oriented programming tutorials: 18 classes and objects Python implementation - multiple inheritance and PyQtGraph serial data plotting graphs

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 19 Python Implementations of Classes and Objects - Using PyCharm to Automatically Generate File Annotations and Function Annotations

The best object-oriented programming tutorials on the web for getting started: 20 Python implementation of classes and objects - Combinatorial relationship implementation and CSV file saving

The best introductory object-oriented programming tutorials on the net: 21 Python implementation of classes and objects - Organization of multiple files: modules and packages

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 22 Python Implementations of Classes and Objects - Exceptions and Syntax Errors

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 23 Python Implementation of Classes and Objects - Throwing Exceptions

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 24 Python Implementations of Classes and Objects - Exception Catching and Handling

The best object-oriented programming tutorials on the web for getting started: 25 Python implementation of classes and objects - Python to determine the type of input data

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 26 Python Implementations of Classes and Objects - Context Managers and with Statements

The best introductory object-oriented programming tutorials on the web: 27 Python implementation of classes and objects - Exception hierarchy and custom exception class implementation in Python

The best object-oriented programming tutorials on the net for getting started: 28 Python implementations of classes and objects - Python programming principles, philosophies and norms in a big summary

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 29 Python Implementations of Classes and Objects - Assertions and Defensive Programming and Use of the help Function

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 30 Python's Built-In Data Types - the root class of object

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 31 Python's Built-In Data Types - Object Object and Type Type

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 32 Python's Built-in Data Types - Class Class and Instance Instance

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 33 Python's Built-In Data Types - The Relationship Between the Object Object and the Type Type

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 34 Python's Built-In Data Types - Python's Common Compound Data Types: Tuples and Named Tuples

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 35 Python's Built-In Data Types - Document Strings and the __doc__ Attribute

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 36 Python's Built-In Data Types - Dictionaries

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 37 Python's Common Composite Data Types - Lists and List Comprehensions

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 38 Python's Common Composite Data Types - Using Lists to Implement Stacks, Queues, and Double-Ended Queues

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 39 Python Common Composite Data Types - Collections

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 40 Python's Common Compound Data Types - Enumeration and Use of the enum Module

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 41 Python's Common Composite Data Types - Queues (FIFO, LIFO, Priority Queue, Double-Ended Queue, and Ring Queue)

The best introductory object-oriented programming tutorials on the web: 42 Python commonly used composite data types-collections container data type

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 43 Python's Common Composite Data Types - Extended Built-In Data Types

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 44 Python Built-In Functions and Magic Methods - Magic Methods for Rewriting Built-In Types

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 45 Python Implementations of Common Data Structures - Linked Lists, Trees, Hash Tables, Graphs, and Heaps

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 46 Python Function Methods and Interfaces - Functions and Event-Driven Frameworks

The network's most suitable for the introduction of object-oriented programming tutorials: 47 Python function methods and interfaces - callback function Callback

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 48 Python Function Methods and Interfaces - Positional Arguments, Default Arguments, Variable Arguments, and Keyword Arguments

Best Object-Oriented Programming Tutorials on the Net for Getting Started: 49 Python Functions Methods and Interfaces - Difference between Functions and Methods and lambda Anonymous Functions

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 50 Python Function Methods and Interfaces - Interfaces and Abstract Base Classes

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 51 Python Function Methods and Interfaces - Implementing Interfaces with Zope

Best Object-Oriented Programming Tutorials for Beginners on the Web: 52 Python Functions Methods and Interfaces-Protocol Protocols and Interfaces

More highlights to watch:

Accelerating Your Python: A Quick Guide to Python Parallel Computing

Understanding CM3 MCU Debugging Principles in One Article

Half a Month of Hard Work: A Grand Summary of the Embedded Technology Stack

The "Secrets of the Martial Arts" of the Computer Competition

A MicroPython open source project collection: awesome-micropython, including all aspects of Micropython tool library

Avnet ZUBoard 1CG Development Board - A New Choice for Deep Learning

SenseCraft Deploys Models to Grove Vision AI V2 Image Processing Module

Documentation and code acquisition:

The following link can be accessed to download the document:

/leezisheng/Doc

image

This document mainly introduces how to use Python for object-oriented programming, and it requires readers to have a basic understanding of Python syntax and microcontroller development. Compared with other blogs or books on Python object-oriented programming, this document is more detailed and focuses on embedded host-computer applications. It uses common serial-port data sending and receiving, data processing, and dynamic graph drawing between the host computer and the lower computer (the device) as application examples, and it uses the Sourcetrail tool to visualize the code so that readers can follow it more easily.

The link to get the relevant sample code is below: /leezisheng/Python-OOP-Demo

Main text

Strings and character encoding

A string is a basic type in Python that represents an immutable sequence of characters (that is, you cannot modify the character at a given index in place; you have to convert the string to a list, for example, and build a new string from it). In a way, you can think of a string as a special kind of tuple.
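The immutability described above can be checked with a short sketch. The variable names are illustrative only; the point is that item assignment fails and that a modified copy has to be built as a new string.

```python
s = "Python"

# Item assignment is not supported on str, so this raises TypeError.
try:
    s[0] = "J"
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# To "change" a character, convert to a list and build a new string.
chars = list(s)
chars[0] = "J"
modified = "".join(chars)

print(error_message)
print(modified)   # the new string
print(s)          # the original string is untouched
```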

Strings in Python are represented in Unicode, which is a character encoding standard. So what is a character encoding standard? In computer science, data is processed and stored in binary, so text has to be converted into numbers before a computer can work with it. Early computer designs used an 8-bit binary number, a byte, as the basic unit. The largest integer a single byte can represent is therefore 255, which is the binary number 11111111 converted to decimal. Larger integers are represented by using more bytes: two bytes can represent values up to 65535, and four bytes can represent values up to 4294967295.

Because computer technology originated in the United States, early character encodings were based primarily on the ASCII standard, which covers only 128 characters (codes 0 to 127), including upper- and lower-case letters, digits, and some commonly used symbols. For non-English characters, such as Chinese, a single-byte encoding is clearly not sufficient. For this reason, China developed the GB2312 encoding standard, which uses two bytes to represent each Chinese character while remaining compatible with ASCII. Around the world, different languages and cultures produced their own encoding standards, such as Shift_JIS in Japan and EUC-KR in South Korea, and mixing these standards in multilingual text leads to garbled output. Unicode (also called the unified code or universal code) addresses this by assigning a single, unique code to every character in every language, which meets the needs of converting and processing text across languages and platforms. From this point of view, we can regard a string as an immutable sequence of Unicode characters.
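The garbling described above can be reproduced with a small sketch. It assumes the standard `gb2312` and `shift_jis` codecs that ship with CPython; the sample text and variable names are illustrative.

```python
chinese = "\u4e2d\u6587"                  # the two characters of "Chinese" (zhongwen)

gb_bytes = chinese.encode("gb2312")       # two bytes per Chinese character
ascii_in_gb = "abc".encode("gb2312")      # ASCII characters keep their 1-byte values

# Decoding the same bytes with a different national standard gives garbled text.
garbled = gb_bytes.decode("shift_jis", errors="replace")

# ASCII alone cannot represent the text at all.
try:
    chinese.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(gb_bytes), ascii_in_gb, ascii_ok)
print(repr(garbled))
print(gb_bytes.decode("gb2312") == chinese)   # the right codec restores the text
```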

The Unicode standard explains in detail how characters are expressed as code points. The range of code points is limited to integers from 0 to 0x10FFFF, which theoretically covers about 1.1 million possible values, although far fewer have actually been assigned. In the Unicode standard and in this document, a code point is written in the form U+265E, which refers to the character with the value 0x265E, or 9822 in decimal.
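As a quick illustration of code points, the built-in `chr()` and `ord()` functions convert between an integer code point and a one-character string. This is only a sketch using the U+265E example from the text.

```python
import sys

knight = chr(0x265E)             # code point U+265E -> character
code_point = ord(knight)         # character -> integer code point
notation = f"U+{code_point:04X}" # conventional U+XXXX notation

print(knight, code_point, notation)
print(hex(sys.maxunicode))       # largest code point Python accepts
```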

In addition, the Unicode Standard has compiled a number of tables that provide an exhaustive list of characters and their corresponding code points.

image

The above can be summarized as follows: a Unicode string is a sequence of code points, which are numbers from 0 to 0x10FFFF (1,114,111 in decimal). This sequence is represented in memory as a set of code units, and the code units are in turn mapped to 8-bit bytes. The rules for translating a Unicode string into a sequence of bytes are called a character encoding, or simply an encoding.
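A minimal sketch of that summary: a `str` is a sequence of code points, `encode()` applies an encoding to produce `bytes`, and `decode()` reverses it. The sample word is an arbitrary choice for illustration.

```python
text = "Pyth\u00f6n"                         # six code points, one of them non-ASCII

code_points = [ord(ch) for ch in text]      # the integers behind the characters
data = text.encode("utf-8")                 # str -> bytes
restored = data.decode("utf-8")             # bytes -> str

print(code_points)
print(data, len(data))
print(restored == text)
```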

The first encoding that comes to mind might be to use 32-bit integers as the code unit, and then use the CPU's representation of 32-bit integers. In this representation, the string "Python" might look like this:

image

This representation is very straightforward, but there are some problems with it:

  • (1) It is not portable: byte order varies from processor to processor;
  • (2) It is very wasteful of space: in most text, the majority of code points are less than 127 or 255, so 0x00 bytes take up a lot of space. Compared to the 6 bytes needed for the ASCII representation, the string above takes up 24 bytes;
  • (3) It is incompatible with existing C functions (e.g. strlen()), so a new set of wide-string functions is needed.
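The first two problems can be observed with Python's built-in UTF-32 codecs, which store one 32-bit unit per code point. This is a sketch rather than the exact layout of any particular CPU; `bytes.hex()` with a separator requires Python 3.8 or later.

```python
text = "Python"

utf32_le = text.encode("utf-32-le")   # 4 bytes per code point, little-endian
utf32_be = text.encode("utf-32-be")   # same code points, big-endian byte order
ascii_bytes = text.encode("ascii")

print(utf32_le.hex(" "))
print(utf32_be.hex(" "))
print(len(utf32_le), len(ascii_bytes))   # size of each representation
zero_bytes = utf32_le.count(0)           # how many bytes are just 0x00 padding
print(zero_bytes)
```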

As a result, this encoding is rarely used, and people turn to more efficient and convenient encodings such as UTF-8. UTF-8 is one of the most common encodings, and Python often uses it by default. UTF stands for "Unicode Transformation Format", and the "8" means that 8-bit values are used in the encoding.

UTF-8 encodes each Unicode character into 1 to 4 bytes depending on the size of its code point: common English letters are encoded in 1 byte, Chinese characters usually take 3 bytes, and only rarer characters above U+FFFF take 4 bytes (the original design allowed sequences of up to 6 bytes, but the current standard limits UTF-8 to 4). If the text you want to transfer consists mostly of English characters, UTF-8 saves space. UTF-8 has the added benefit that ASCII can be viewed as a subset of it, so a great deal of legacy software that only supports ASCII can continue to work with UTF-8.
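The variable length of UTF-8 can be seen by encoding a few sample characters. The characters below are arbitrary examples chosen for this sketch: an English letter, an accented Latin letter, a Chinese character, and an emoji above U+FFFF.

```python
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

lengths = [len(ch.encode("utf-8")) for ch in samples]

# Pure ASCII text produces identical bytes under ASCII and UTF-8.
same_bytes = "Python".encode("ascii") == "Python".encode("utf-8")
print(lengths, same_bytes)
```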

In fact, Unicode can be encoded using any of the following character encoding schemes:

  • (1) UTF-8: UTF-8 is a variable-length encoding form of Unicode that transparently preserves ASCII character code values. This form is used as file code in the Solaris Unicode language environment.
  • (2) UTF-16: UTF-16 is a 16-bit encoding form of Unicode. In UTF-16, code points up to 65,535 (U+FFFF) are encoded as a single 16-bit value. Code points from 65,536 to 1,114,111 are encoded as pairs of 16-bit values (surrogates).
  • (3) UTF-32: UTF-32 is a fixed-length, 21-bit encoding form of Unicode, typically stored in 32-bit containers or data types. This form is used as process code (wide character code) in the Solaris Unicode language environment.
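The three schemes listed above can be compared side by side. The sketch below uses one character inside the 16-bit range and one above it (both arbitrary examples), and picks the big-endian codec variants so that no byte order mark is added.

```python
bmp_char = "\u4e2d"          # U+4E2D, fits in a single 16-bit value
astral_char = "\U0001F600"   # U+1F600, above U+FFFF

sizes = {}
for ch in (bmp_char, astral_char):
    sizes[ch] = tuple(
        len(ch.encode(codec)) for codec in ("utf-8", "utf-16-be", "utf-32-be")
    )
    print(f"U+{ord(ch):04X}: utf-8/utf-16/utf-32 sizes = {sizes[ch]}")

# The two 16-bit values of the surrogate pair that UTF-16 uses for U+1F600.
pair = astral_char.encode("utf-16-be")
high, low = pair[:2].hex(), pair[2:].hex()
print(high, low)
```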

Common operations on strings

As of Python 3.0, the str type contains Unicode characters, which means that any string created with "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

Note that Python has no separate single-character type; a single character is simply a string of length 1.
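Both points can be verified in a few lines. This sketch reuses the "unicode rocks!" literal from the text; the variable names are illustrative.

```python
a = "unicode rocks!"
b = 'unicode rocks!'
c = """unicode rocks!"""

# All three literal forms create the same kind of object with the same value.
print(a == b == c, type(a))

# Indexing yields another str of length 1, not a distinct character type.
first = a[0]
print(type(first), len(first))
```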

The str class has a large number of methods that make string manipulation easy. In the Python interpreter, you can use the dir and help functions to find out which methods exist and how to use them.

help(str)

image
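For a more compact overview than the full help output, the sketch below filters the names returned by `dir(str)` and prints the docstring of a single method. The choice of `zfill` is arbitrary; in an interactive session, `help(str.zfill)` shows the same text.

```python
# Public str methods, i.e. names that do not start with an underscore.
public_methods = [name for name in dir(str) if not name.startswith("_")]

print(len(public_methods))
print(public_methods[:8])

# The docstring is the text that help() displays for this method.
print(str.zfill.__doc__)
```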

The basic string methods and operations are not covered in detail here; they are only summarized in the table below. The focus of this chapter is to help you understand character encoding, serializing objects, and applying regular expressions to parse strings and match arbitrary patterns. The following table lists common methods of the str class and what they do.

Method | Description
lower / upper | Return a copy of the string S converted to lowercase or uppercase. As with the other methods in this table, the input string is not changed; a completely new instance of str is returned.
title / capitalize | title() returns a copy of S in which every word starts with an uppercase letter and all other letters are lowercase; capitalize() returns a copy in which only the first character of the string is uppercase and all other letters are lowercase.
swapcase | Swaps the case of every letter in S (uppercase to lowercase, lowercase to uppercase).
istitle | Returns True if every word in S starts with an uppercase letter and its remaining letters are lowercase. Note that this does not follow the English grammatical definition of a title: Leigh Hunt's poem "The Glove and the Lions" is a valid title even though not every word is capitalized, and Robert Service's "The Cremation of Sam McGee" is a valid title even though its last word contains a capital letter in the middle, yet istitle() returns False for both.
isdecimal | Checks whether S contains only decimal characters. Returns True if there is at least one character and all characters are decimal characters, otherwise returns False. In Python 3 every str is a Unicode string, so no 'u' prefix is needed.
isdigit | Checks whether S consists only of digits. Returns True if there is at least one character and all characters are digits, otherwise returns False.
isnumeric | Checks whether S consists only of numeric characters. Returns True if there is at least one character and all characters are numeric, otherwise returns False. When using isdigit, isdecimal, and isnumeric, be aware that many Unicode characters count as digits or numbers, not just the 10 Arabic numerals we are used to. These methods also only test the class of each character and do not verify that the string is a valid number: the decimal point is not a digit, so all three return False for '45.2' and for "127.0.0.1". Conversely, U+0660 is the Arabic-Indic digit zero, so '45\u06602' passes isdecimal() as a string of four decimal digits.
isalnum | Checks whether S consists of letters and digits. Returns True if there is at least one character and all characters are letters or digits, otherwise returns False.
isalpha | Checks whether S consists only of letters. Returns True if there is at least one character and all characters are letters, otherwise returns False.
center(width[, fillchar]) | Centers S and pads the left and right sides with fillchar so that the total length is width. fillchar defaults to a space. If width is less than or equal to the length of S, no padding is possible and S itself is returned (no new string object is created).
ljust / rjust | ljust() pads the right side of S with fillchar so that the total length is width; rjust() pads the left side. If fillchar is not specified, spaces are used. If width is less than or equal to the length of S, no padding is possible and S itself is returned (no new string object is created).
zfill | Pads the left side of S with 0 so that its length is width. If S starts with a plus or minus sign, the zeros are inserted after the sign, and the sign counts toward the length. If width is less than or equal to the length of S, no padding is possible and S itself is returned (no new string object is created).
count(sub[, start[, end]]) | Returns the number of non-overlapping occurrences of the substring sub in S. You can specify where to start counting (start) and where to stop (end); indexes start at 0 and the end boundary is excluded.
endswith / startswith | endswith() checks whether S ends with suffix and returns True or False. suffix can also be a tuple of candidates, and start and end can be used to limit the search range. Similarly, startswith() checks whether S begins with prefix.
find / rfind / index / rindex | find() searches S for the substring sub and returns the index of the first match, or -1 if sub is not found. start and end can be used to limit the search range. index() is the same as find(), except that it raises ValueError when the substring cannot be found. rfind() returns the position of the rightmost match; if there is only one match or none, it is equivalent to find(). rindex() is the counterpart of rfind() that raises ValueError when nothing is found.
maketrans / translate | str.maketrans() builds a character-by-character mapping table, and translate(table) then maps each character of S through it. This can be used to implement simple obfuscation of strings. Note that in maketrans(x[, y[, z]]), when x and y are both given, they must be strings of equal length.
partition(sep) / rpartition(sep) | Search S for the substring sep and split S around it, returning a tuple with 3 elements: the part to the left of sep, sep itself, and the part to the right of sep. partition(sep) splits at the first sep from the left, and rpartition(sep) splits at the first sep from the right. If sep is not found, two of the three elements are empty strings: the last two for partition(), the first two for rpartition().
split(sep=None, maxsplit=-1) / rsplit(sep=None, maxsplit=-1) / splitlines([keepends]) | All three split a string and return a list. split() splits S at each occurrence of sep. maxsplit sets the maximum number of splits; if it is not specified or is -1, split() searches from left to right and splits at every sep until the end of the string. If sep is not specified or is None, the splitting algorithm changes: runs of consecutive whitespace are treated as a single separator. rsplit() is the same as split(), except that it searches from right to left. splitlines() is used specifically to split at line breaks; it recognizes several kinds, commonly \n, \r, and \r\n. If keepends is True, the line breaks are kept in the result.
join(iterable) | Concatenates the strings in an iterable, using S as the separator. Every item in the iterable must be a string, otherwise an error is raised. For example, it can take a list of strings and return one string in which all the items are joined by S.
strip / lstrip / rstrip | Remove chars from both ends, from the left end, and from the right end of S, respectively. If chars is not specified or is None, whitespace (spaces, tabs, newlines) is removed. Note that chars is treated as a set of characters: any character in it is removed, in any order.
replace(old, new, count) | Replaces occurrences of old in S with new; if count is specified, at most count occurrences are replaced.
expandtabs(N) | Replaces each \t in S with a number of spaces. The default is N=8. Note that expandtabs(8) does not simply replace \t with 8 spaces: it pads to the next tab stop. For example, 'xyz\tab'.expandtabs() replaces \t with 5 spaces because "xyz" already takes up 3 columns. It does not replace line breaks (\n or \r).
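The three number-related checks and istitle() are easy to misread, so the sketch below simply runs them on a few sample strings and prints what each method reports. The samples are illustrative choices: a superscript two (U+00B2), the fraction one half (U+00BD), and a string containing the Arabic-Indic digit zero (U+0660).

```python
samples = ["123", "45.2", "127.0.0.1", "\u00b2", "\u00bd", "45\u06602"]

# Map each sample to (isdecimal, isdigit, isnumeric).
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
for s, flags in results.items():
    print(f"{s!r:14} {flags}")

# istitle() applies a mechanical rule, not English title conventions.
titles = [
    "The Glove and the Lions",
    "The Cremation of Sam McGee",
    "The Glove And The Lions",
]
title_flags = [t.istitle() for t in titles]
print(title_flags)
```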

For more information on the specific use of these methods, you can see the following links:

/zh-cn/latest/#
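As a quick check of several entries in the table above, the following sketch exercises the searching, splitting, joining, and padding methods. All sample strings are made up for illustration.

```python
s = "key=value=more"

# partition splits at the first separator, rpartition at the last one.
parts = s.partition("=")
rparts = s.rpartition("=")
missing = (s.partition("#"), s.rpartition("#"))
print(parts, rparts, missing)

# split with no separator collapses runs of whitespace.
words = "a  b \t c".split()
fields = "a,b,,c".split(",")
last_split = "a,b,c".rsplit(",", 1)
lines = "one\ntwo\r\nthree".splitlines()
joined = "-".join(["2024", "09", "28"])
print(words, fields, last_split, lines, joined)

# find returns -1 when nothing matches; index raises ValueError instead.
found, not_found = s.find("value"), s.find("zzz")

table = str.maketrans("abc", "xyz")   # both arguments have equal length
translated = "aabbcc".translate(table)

padded = "-42".zfill(6)               # zeros go after the sign
expanded = "xyz\tab".expandtabs()     # pads to the next tab stop at column 8
stripped = "xxhixx".strip("x")
print(found, not_found, translated, padded, repr(expanded), stripped)
```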

For reference, the following images review string escape characters and operators:

image

image