
The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 56 Python Strings and Serialization - Regular Expressions and re Module Applications

Popularity:862 ℃/2024-10-07 00:57:42

Abstract:

Python's re module provides powerful regular expression manipulation for searching, matching, replacing, and so on in strings. A regular expression is a pattern for matching strings. Regular expressions make it easy to find pieces of strings with specific patterns, such as matching e-mail addresses, cell phone numbers, dates in a particular format, and so on.

Link to original article:

FreakStudio's Blog

Past Recommendations:

You're learning embedded and you don't know how to be object oriented?

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 00 Introduction to Object-Oriented Design Methods

The network's most suitable for the introduction of object-oriented programming tutorials: 01 Basic Concepts of Object-Oriented Programming

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 02 Python Implementations of Classes and Objects - Creating Classes with Python

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 03 Python Implementations of Classes and Objects - Adding Attributes to Custom Classes

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 04 Python Implementation of Classes and Objects - Adding Methods to Custom Classes

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 05 Python Implementation of Classes and Objects - PyCharm Code Tags

The best object-oriented programming tutorials on the net for getting started: 06 Python implementation of classes and objects - data encapsulation of custom classes

The best object-oriented programming tutorial on the net for getting started: 07 Python implementation of classes and objects - type annotations

The best object-oriented programming tutorials on the net for getting started: 08 Python implementations of classes and objects - @property decorator

The best object-oriented programming tutorials on the net for getting started: 09 Python implementation of classes and objects - the relationship between classes

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 10 Python Implementations of Classes and Objects - Class Inheritance and the Liskov Substitution Principle

The best object-oriented programming tutorials on the net for getting started: 11 Python implementation of classes and objects - subclasses call parent class methods

The network's most suitable for the introduction of object-oriented programming tutorials: 12 classes and objects of the Python implementation - Python using the logging module to output the program running logs

The network's most suitable for the introduction of object-oriented programming tutorials: 13 classes and objects of the Python implementation - visual reading code artifacts Sourcetrail's installation use

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 14 Python Implementations of Classes and Objects - Static Methods and Class Methods for Classes

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 15 Python Implementations of Classes and Objects - __slots__ Magic Methods

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 16 Python Implementations of Classes and Objects - Polymorphism, Method Overriding, and the Principle of Open-Close

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 17 Python Implementations of Classes and Objects - Duck Types and "file-like objects"

The network's most suitable for the introduction of object-oriented programming tutorials: 18 classes and objects Python implementation - multiple inheritance and PyQtGraph serial data plotting graphs

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 19 Python Implementations of Classes and Objects - Using PyCharm to Automatically Generate File Annotations and Function Annotations

The best object-oriented programming tutorials on the web for getting started: 20 Python implementation of classes and objects - Combinatorial relationship implementation and CSV file saving

The best introductory object-oriented programming tutorials on the net: 21 Python implementation of classes and objects - Organization of multiple files: modules and packages

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 22 Python Implementations of Classes and Objects - Exceptions and Syntax Errors

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 23 Python Implementation of Classes and Objects - Throwing Exceptions

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 24 Python Implementations of Classes and Objects - Exception Catching and Handling

The best object-oriented programming tutorials on the web for getting started: 25 Python implementation of classes and objects - Python to determine the type of input data

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 26 Python Implementations of Classes and Objects - Context Managers and with Statements

The best introductory object-oriented programming tutorials on the web: 27 Python implementation of classes and objects - Exception hierarchy and custom exception class implementation in Python

The best object-oriented programming tutorials on the net for getting started: 28 Python implementations of classes and objects - Python programming principles, philosophies and norms in a big summary

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 29 Python Implementations of Classes and Objects - Assertions and Defensive Programming and Use of the help Function

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 30 Python's Built-In Data Types - the root class of object

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 31 Python's Built-In Data Types - Object Object and Type Type

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 32 Python's Built-in Data Types - Class Class and Instance Instance

The Best Object-Oriented Programming Tutorials for Getting Started on the Web: 33 Python's Built-In Data Types - The Relationship Between the Object Object and the Type Type

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 34 Python's Built-In Data Types - Python's Common Compound Data Types: Tuples and Named Tuples

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 35 Python's Built-In Data Types - Document Strings and the __doc__ Attribute

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 36 Python's Built-In Data Types - Dictionaries

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 37 Python's Common Composite Data Types - Lists and List Derivatives

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 38 Python's Common Composite Data Types - Using Lists to Implement Stacks, Queues, and Double-Ended Queues

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 39 Python Common Composite Data Types - Collections

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 40 Python's Common Compound Data Types - Enumeration and Use of the enum Module

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 41 Python's Common Composite Data Types - Queues (FIFO, LIFO, Priority Queue, Double-Ended Queue, and Ring Queue)

The best introductory object-oriented programming tutorials on the web: 42 Python commonly used composite data types-collections container data type

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 43 Python's Common Composite Data Types - Extended Built-In Data Types

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 44 Python Built-In Functions and Magic Methods - Magic Methods for Rewriting Built-In Types

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 45 Python Implementations of Common Data Structures - Chain Tables, Trees, Hash Tables, Graphs, and Heaps

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 46 Python Function Methods and Interfaces - Functions and Event-Driven Frameworks

The network's most suitable for the introduction of object-oriented programming tutorials: 47 Python function methods and interfaces - callback function Callback

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 48 Python Function Methods and Interfaces - Positional Arguments, Default Arguments, Variable Arguments, and Keyword Arguments

Best Object-Oriented Programming Tutorials on the Net for Getting Started: 49 Python Functions Methods and Interfaces - Difference between Functions and Methods and lamda Anonymous Functions

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 50 Python Function Methods and Interfaces - Interfaces and Abstract Base Classes

The Best Object-Oriented Programming Tutorials on the Web for Getting Started: 51 Python Function Methods and Interfaces - Implementing Interfaces with Zope

Best Object-Oriented Programming Tutorials for Beginners on the Web: 52 Python Functions Methods and Interfaces-Protocol Protocols and Interfaces

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 53 Python Strings and Serialization - Strings and Character Encoding

The Best Object-Oriented Programming Tutorials on the Net for Getting Started: 54 Python Strings and Serialization - String Formatting and the format method

The Best Object-Oriented Programming Tutorial on the Net for Getting Started: 55 Python Strings and Serialization - Byte Sequence Types and Variable Byte Strings

More highlights to watch:

Accelerating Your Python: A Quick Guide to Python Parallel Computing

Understanding CM3 MCU Debugging Principles in One Article

Liver half a month, embedded technology stack summary out of the big

The "Secrets of the Martial Arts" of the Computer Competition

A MicroPython open source project collection: awesome-micropython, including all aspects of Micropython tool library

Avnet ZUBoard 1CG Development Board - A New Choice for Deep Learning

SenseCraft Deploys Models to Grove Vision AI V2 Image Processing Module

Documentation and code acquisition:

The following link can be accessed to download the document:

/leezisheng/Doc

image

This document introduces object-oriented programming in Python and assumes that readers have a basic understanding of Python syntax and microcontroller development. Compared with other blogs or books on Python object-oriented programming, it is more detailed and focuses on host-computer applications for embedded systems. Its application examples cover typical host-computer and lower-computer tasks such as sending and receiving serial port data, data processing, and dynamic plotting, and it uses the Sourcetrail code visualization tool to make the code easier to read and understand.

The link to get the relevant sample code is below:/leezisheng/Python-OOP-Demo

Main Text

Introduction to Regular Expressions

We often need to check whether a given string is valid: whether a string of digits is a phone number, whether a string is a valid URL or email address, whether a password entered by the user meets the complexity requirements, and so on.

If we wrote a separate checking function for each format, two problems would arise. First, each function could become quite complex. A phone number, for example, may be a landline written as 010-12345678 or 0510-12345678, or a cell phone number such as 13800000000, so the logical complexity of the code grows linearly with the number of formats. Second, such functions are hard to reuse: a function that matches format A cannot match format B. Is there a general-purpose function that can meet a specific matching need as long as we pass in the right parameters? The answer is yes.

In practice, most programming languages handle string parsing through regular expressions. A regular expression is itself a piece of text that describes a pattern; it is used to find the parts of another text that match that pattern, and it relies on special symbols to match strings that are not known in advance. It solves a common problem: given a string, determine whether it matches a given pattern, and collect the substrings that carry the relevant information.

Regular expressions work with two concepts: characters and positions. A string contains a number of characters, each with a corresponding binary code in memory. Positions are defined by the sequential relationship between characters; the start and end positions of the string are shown in the figure as ps and pe. A string of N characters has N+1 positions. Positions take up no memory and are used only to locate matches.

image
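The N+1 positions can be observed directly in Python. In this minimal sketch (the variable names are illustrative and not from the original article), an empty pattern matches at every position of a three-character string, including the end position:

```python
import re

text = "abc"
# An empty pattern matches at every position, so finditer yields one
# zero-width match per position in the string.
positions = [m.start() for m in re.finditer(r"", text)]

print(positions)                        # [0, 1, 2, 3]
print(len(positions) == len(text) + 1)  # True
```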

A regular expression uses special characters (usually starting with \, the escape character) to represent a class of characters (for example, the digits 0-9) or a position (for example, the beginning of a string). These are called metacharacters. An expression made up of metacharacters and other control characters is called a pattern.

The matching process uses a position pointer that always starts at position ps. After each successful match the pointer moves to the position just after the matched characters, and the pattern is tried at each position in turn until position pe has been tried, at which point matching ends.

In a regular expression, '.' matches any character except the newline \n. To match '.' itself, use the form '\.'. Because Python strings also use \ as an escape character, regular expression strings are usually written with the r prefix, which marks them as raw strings and prevents the two levels of escaping from conflicting.
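As a small illustration of the last two points, the sketch below compares a raw-string pattern with its manually escaped equivalent and shows how '.' differs from '\.'. The variable names are hypothetical and chosen only for this example.

```python
import re

# A raw string and a manually escaped string describe the same pattern.
raw_pattern = r"\d\.\d"
escaped_pattern = "\\d\\.\\d"
same_pattern = raw_pattern == escaped_pattern

# '.' matches any character except a newline.
dot_matches_letter = re.fullmatch(r"a.c", "abc") is not None
dot_rejects_newline = re.fullmatch(r"a.c", "a\nc") is None

# '\.' matches only a literal dot.
escaped_dot_matches_dot = re.fullmatch(r"a\.c", "a.c") is not None
escaped_dot_rejects_letter = re.fullmatch(r"a\.c", "abc") is None

print(same_pattern, dot_matches_letter, dot_rejects_newline,
      escaped_dot_matches_dot, escaped_dot_rejects_letter)
```

All five values should print as True.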

In a regular expression, a character given directly is matched exactly. '\d' matches one digit and '\w' matches one letter, digit, or underscore, so:

'00\d' can match '007' but not '00A';
'\d\d\d' can match '010';
'\w\w\d' can match 'py3';

'.' can match any character, so:

'py.' can match 'pyc', 'pyo', 'py!' and so on.
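These claims are easy to check with re.fullmatch, which succeeds only if the whole text matches the pattern. The following is a quick sketch; the checks list is an illustrative helper, not part of the original article.

```python
import re

# Each tuple is (pattern, text); fullmatch requires the whole text to match.
checks = [
    (r"00\d", "007"),
    (r"00\d", "00A"),
    (r"\w\w\d", "py3"),
    (r"py.", "pyc"),
    (r"py.", "py!"),
]
results = [re.fullmatch(pattern, text) is not None for pattern, text in checks]

print(results)  # [True, False, True, True, True]
```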

To match a variable number of characters, use '*' for any number of characters (including zero), '+' for at least one character, '?' for zero or one character, '{n}' for exactly n characters, and '{n,m}' for n to m characters:

\d{3}\s+\d{3,8}
1. \d{3} matches 3 digits, e.g. '010';
2. \s matches one whitespace character (a space, a Tab, and so on), so \s+ means at least one whitespace character, e.g. one space or several spaces in a row;
3. \d{3,8} matches 3 to 8 digits, e.g. '1234567'.
Taken together, the regular expression above matches a phone number whose area code is separated from the rest of the number by any amount of whitespace.
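A minimal sketch of this pattern in use is shown below. The sample phone strings and variable names are made up for illustration.

```python
import re

phone_pattern = re.compile(r"\d{3}\s+\d{3,8}")

# One space or several spaces between the two parts are both accepted.
one_space = phone_pattern.fullmatch("010 12345678") is not None
many_spaces = phone_pattern.fullmatch("010    1234567") is not None

# No whitespace at all, or fewer than 3 digits after it, is rejected.
no_space = phone_pattern.fullmatch("01012345678") is not None
too_short = phone_pattern.fullmatch("010 12") is not None

print(one_space, many_spaces, no_space, too_short)  # True True False False
```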

What if we want to match a number like '010-12345'? Inside square brackets '-' is a special character and must be escaped with '\'; outside brackets the escape is optional, so the regular expression can be written as \d{3}\-\d{3,8} or simply \d{3}-\d{3,8}.

To do a more precise match, you can use '[]' to indicate a range, for example:

[0-9a-zA-Z\_] matches one digit, letter, or underscore;
[0-9a-zA-Z\_]+ matches a string of at least one digit, letter, or underscore, such as 'a100', '0_Z', 'Py3000', and so on;
[a-zA-Z\_][0-9a-zA-Z\_]* matches a string that starts with a letter or underscore followed by any number of digits, letters, or underscores, i.e., a valid Python variable name;
[a-zA-Z\_][0-9a-zA-Z\_]{0,19} more precisely limits the length of the variable name to 1-20 characters (1 character at the front plus up to 19 characters after it).
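The length-limited variable-name pattern can be tried out as follows. This is only a sketch: is_short_identifier is a hypothetical helper name, and the quantifier is written as {0,19} with no space inside the braces.

```python
import re

# Hypothetical helper for illustration; note {0,19} contains no space.
identifier = re.compile(r"[a-zA-Z_][0-9a-zA-Z_]{0,19}")


def is_short_identifier(text):
    """Return True if text is an ASCII identifier of 1 to 20 characters."""
    return identifier.fullmatch(text) is not None


print(is_short_identifier("Py3000"))   # True
print(is_short_identifier("_x"))       # True
print(is_short_identifier("3abc"))     # False: starts with a digit
print(is_short_identifier("a" * 20))   # True: exactly 20 characters
print(is_short_identifier("a" * 21))   # False: too long
```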

'A|B' can match either A or B.

(P|p)ython can match 'Python' or 'python'.

Other special symbols are also available:

^ marks the beginning of a line; ^\d means the line must begin with a digit.
$ marks the end of a line; \d$ means the line must end with a digit.
? matches zero or one repetition of the expression that precedes it; ab? matches 'a' or 'ab'.
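The alternation, anchor, and optional-match symbols can be checked with a few short calls. The sample strings below are invented for this sketch.

```python
import re

# 'A|B' alternation inside a group: (P|p)ython
either_case = [bool(re.fullmatch(r"(P|p)ython", s))
               for s in ("Python", "python", "jython")]

# ^ and $ anchor the match to the start and end of the string.
starts_with_digit = bool(re.search(r"^\d", "3 apples"))
ends_with_digit = bool(re.search(r"\d$", "room 101"))
no_leading_digit = bool(re.search(r"^\d", "apples 3"))

# ? makes the preceding item optional: ab? matches 'a' or 'ab' only.
optional_b = [bool(re.fullmatch(r"ab?", s)) for s in ("a", "ab", "abb")]

print(either_case)        # [True, True, False]
print(starts_with_digit, ends_with_digit, no_leading_digit)  # True True False
print(optional_b)         # [True, True, False]
```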

More related expressions can be viewed at the following link:

/uploads/apidocs/jquery/

/zh-cn/3/howto/

/zh-cn/latest/

/regexp/

image

Regular Expressions with the re Module

The regular expression module in the Python standard library is called re. We import it and create a search string and a pattern to search for.

In the following example, we use re.compile() to create a compiled regular expression object and then match it with Pattern.match(string[, pos[, endpos]]): if zero or more characters at the beginning of the string match the regular expression, the corresponding Match object is returned; if the string does not match the pattern, None is returned.

image

image

image

Since the string being searched and the pattern match, the condition is true and the print statements run. The sample code is as follows:

import re

search_string = "hello world"
# Compile the pattern into a regular expression object
pattern = re.compile("hello world")
# re.match also accepts a compiled pattern; it matches from the start of the string
match = re.match(pattern, search_string)
if match:
    print("regex matches")
    print(match)
    print(match.__doc__)

The results of the run are as follows:

image

Since the match function matches the pattern from the beginning of the string, it will not match if the pattern is changed to "ello world".

import re

search_string = "hello world"
# This pattern does not match at the beginning of the string
pattern = re.compile("ello world")
match = re.match(pattern, search_string)
if match:
    print("regex matches")
    print(match)
    print(match.__doc__)
else:
    print("no match")

The results of the run are as follows:

image

Note that match stops as soon as the pattern has been satisfied and does not require the pattern to consume the whole string, so the pattern "hello wo" also matches successfully. If we want a position to accept any one of a few specific characters, we can put those characters in square brackets. Thus, when we see the pattern string [abc], we know that these 5 characters (including the two square brackets) match only a single character in the searched string, and that character must be one of a, b, or c. The sample code is as follows:

import re

search_string = "hello world"
# [lpo] matches exactly one character: l, p, or o
pattern = re.compile("hell[lpo] world")
match = re.match(pattern, search_string)

if match:
    print("regex matches")
    print(match)

The result of the run is as follows. Strictly speaking, these bracketed groups are called character sets, although they are more often referred to as character classes.

image

Often we want to allow many more characters, but typing them one by one is both tedious and error-prone. Fortunately, the designers of regular expressions took this into account and provided a shorthand: the hyphen can represent a range. This is useful if you want to match "all lowercase letters" or "all digits", for example:

import re

search_string = "hello world"
# [a-z] matches any single lowercase letter
pattern = re.compile("hell[a-z] world")
match = re.match(pattern, search_string)
if match:
    print("regex matches")
    print(match)
    print(match.__doc__)
else:
    print("no match")

The results of the run are as follows:

image

As mentioned earlier, we can also use the backslash escape character to match special symbols such as '.' and '('. The sample code is as follows:

import re

search_string = "0.05"
# "\\." in an ordinary string literal gives the regex \. (a literal dot)
pattern = re.compile("0\\.[0-9][0-9]")
match = re.match(pattern, search_string)
if match:
    print("regex matches")
    print(match)
    print(match.__doc__)

The results of the run are as follows:

image

Note that the pattern string passed to re.compile() is first processed by Python's own string escaping. To hand the regular expression engine one backslash, an ordinary string literal must contain two backslashes ('\\'). To match a literal backslash, which the regular expression itself writes as '\\', an ordinary string literal needs four backslashes ('\\\\'). Alternatively, prefix the pattern with r to make it a raw string. The r prefix tells Python to keep the content between the quotes exactly as written, which avoids the trouble caused by multiple levels of escaping.

The re module also provides the following methods, which we will try next:
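The backslash rule can be confirmed with a short sketch. The sample path below is made up for illustration; it contains a single literal backslash.

```python
import re

# The actual text is C:\temp, containing one backslash.
path = "C:\\temp"

# Ordinary string literal: four backslashes become the regex \\,
# which matches one literal backslash.
plain = re.search("\\\\", path)

# Raw string literal: the same regex written with two backslashes.
raw = re.search(r"\\", path)

print("\\\\" == r"\\")             # True: both literals are the same pattern
print(plain.start(), raw.start())  # 2 2
```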

image

We can use the search method to look for a regular expression anywhere in a string. It returns a Match object for the first location where the pattern matches, or None if there is no match, as follows:

image

The sample code is as follows:

import re

pattern = re.compile("o")
# search scans the string and returns the first match
locate = pattern.search("dog")
print(locate)

image

In the first regular expression example, we wanted the whole string to match the whole pattern. For that purpose we should really have used the fullmatch function rather than match:

image

image

image

import re

pattern = re.compile("o[gh]")
# Pattern.fullmatch(string[, pos[, endpos]])
# The second argument, pos, is the index in the string at which matching starts
# The third argument, endpos, limits how far into the string matching goes
# Only the characters from pos to endpos - 1 are matched, in this case 'og'
match = pattern.fullmatch("doggie", 1, 3)

print(match)
# Print the matched substring
print(match.group())
# Print the starting position of the match
print(match.start())

image

As this example shows, the group method extracts the matched substring. group(0), which is the same as group(), is the substring matched by the entire regular expression, while group(1), group(2), ... are the substrings captured by the 1st, 2nd, ... parenthesized groups in the pattern.

We can also use regular expressions to split strings, which is more flexible than splitting on a fixed character.
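To see numbered groups in action, here is a minimal sketch that splits a phone number into an area code and a local number. The sample string and variable names are chosen only for illustration.

```python
import re

# Two parenthesized groups: the area code and the local number.
match = re.match(r"^(\d{3})-(\d{3,8})$", "010-12345")

whole = match.group(0)      # text matched by the entire pattern
area_code = match.group(1)  # text captured by the 1st group
number = match.group(2)     # text captured by the 2nd group
all_groups = match.groups() # all captured groups as a tuple

print(whole)       # 010-12345
print(area_code)   # 010
print(number)      # 12345
print(all_groups)  # ('010', '12345')
```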

image

The sample program is as follows:

import re

text = 'a b   c'
# Ordinary split method of str
str_split = text.split(' ')
# Output: ['a', 'b', '', '', 'c']
# It cannot treat consecutive spaces as a single separator
print(str_split)

# Use the split function from the re module
str_re_split = re.split(r'\s+', text)
print(str_re_split)

The results of the run are as follows:

image

Here, \s matches any whitespace character and + matches the preceding item one or more times, so this regular expression matches a run of one or more whitespace characters, including spaces, tabs, and line breaks.

So far we have mostly matched strings of known length. In most real-world scenarios, however, we do not know exactly how many characters we need to match. Regular expressions handle this as well: by adding one of a few quantifier symbols to the pattern, we can match a variable number of characters. This flexibility gives regular expressions a significant advantage in complex string matching problems.

The asterisk (*) means that the previous pattern can occur zero or more times. Combining the asterisk with symbols that match several kinds of character gives more interesting results: for example, '.*' matches any string that contains no newline, while '[a-z]*' matches any number of lowercase letters, including the empty string. The plus sign (+) behaves like the asterisk except that it requires the previous pattern to occur one or more times, whereas with the asterisk the pattern is optional. The question mark (?) requires the previous pattern to occur zero times or once, no more.
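The same idea extends to several kinds of separator at once. In the sketch below (the sample string is invented), a character set lets re.split treat any run of whitespace, commas, or semicolons as a single separator:

```python
import re

messy = "a,b;; c  d"
# Any run of whitespace, commas, or semicolons counts as one separator.
parts = re.split(r"[\s,;]+", messy)

print(parts)  # ['a', 'b', 'c', 'd']
```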

Common examples are listed below:

'0.4' matches pattern '\d+\.\d+' 
'1.002' matches pattern '\d+\.\d+' 
'1.' does not match pattern '\d+\.\d+' 
'1%' matches pattern '\d?\d%' 
'99%' matches pattern '\d?\d%' 
'999%' does not match pattern '\d?\d%'
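The examples above can be verified with re.fullmatch, which requires the whole string to match the pattern. This is a quick sketch; the cases list is an illustrative helper.

```python
import re

# (text, pattern) pairs taken from the list above.
cases = [
    ("0.4", r"\d+\.\d+"),
    ("1.002", r"\d+\.\d+"),
    ("1.", r"\d+\.\d+"),
    ("1%", r"\d?\d%"),
    ("99%", r"\d?\d%"),
    ("999%", r"\d?\d%"),
]
outcomes = [re.fullmatch(pattern, text) is not None for text, pattern in cases]

for (text, pattern), ok in zip(cases, outcomes):
    verdict = "matches" if ok else "does not match"
    print(repr(text), verdict, "pattern", repr(pattern))
```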

Next, we use an example to apply the points covered so far in more depth. In general, both the username and the domain name of an email address must contain at least two characters and may use only letters, digits, dots, underscores, percent signs, plus signs, or minus signs. The username and the domain name are separated by the @ symbol, and the domain name ends with a dot followed by a top-level domain.

In the following example, let's write a regular expression to match a valid email address:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Among them:

  • (1) ^ indicates the beginning of a string;
  • (2) [a-zA-Z0-9._%+-]+ matches one or more letters, numbers, dots, underscores, percent signs, plus signs, or minus signs, which are characters that typically appear in the username of an e-mail address;
  • (3) @ matches the @ symbol;
  • (4) [a-zA-Z0-9.-]+ matches one or more letters, numbers, dots, or minus signs, the characters typically found in the domain name of an e-mail address;
  • (5) \. matches a literal dot;
  • (6) [a-zA-Z]{2,} matches two or more letters, the characters usually found in the top-level domains of e-mail addresses (e.g., .com, .org, etc.);
  • (7) $ denotes the end of the string.

This regular expression can match email addresses like "example@".

In the following code, we define a complex regular expression that matches valid email addresses. Then, we define a list emails that contains a number of email addresses. Use the search() method to match the email addresses one by one and output the result. The sample code is as follows:

import re

# Define the regular expression
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# Define the target strings
emails = [
    "user@",
    "user-1@",
    "@",
    "user@",
    "invalid_email"
]


# Use the search() method to check each email address
for email in emails:
    match = re.search(pattern, email)
    if match:
        print("Valid email address:", match.group())
    else:
        print("Invalid e-mail address:", email)

The results of the run are as follows:

image

We can also use the findall function to get all non-overlapping matches of a pattern, not just the first one as search does. In essence it finds the first match, then restarts the search from the end of that match to find the next one.

image

Instead of returning Match objects, findall returns a list of matched strings or tuples. The type of result depends on the number of parenthesized groups in the regular expression. If the pattern has no groups, findall returns a list of strings, each of which is a substring of the source string that matches the pattern. If the pattern has exactly one group, findall returns a list of strings, each of which is the content of that group. If the pattern has several groups, findall returns a list of tuples, each of which contains the contents of the groups in order.

import re
# \b denotes a word boundary, [a-z] denotes any lowercase letter
# The pattern matches words that start with f followed by zero or more lowercase letters
print(re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest'))

The results of the run are as follows:

image
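The three return shapes of findall described above can be compared side by side. The sample text and variable names in this sketch are made up for illustration.

```python
import re

text = "x=1, y=22"

# No groups: each item is the whole matched substring.
no_groups = re.findall(r"\w=\d+", text)

# One group: each item is the content of that group.
one_group = re.findall(r"\w=(\d+)", text)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_groups)   # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```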

Finally, note that regular expression matching is greedy by default, that is, it matches as many characters as possible. For example, suppose we want to capture the zeros at the end of a number:

import re
# Because \d+ is greedy, it consumes all the trailing 0s, so 0* can only match the empty string
print(re.match(r'^(\d+)(0*)$', '102300').groups())
# Adding ? makes \d+ non-greedy, which leaves the trailing 0s for 0*
print(re.match(r'^(\d+?)(0*)$', '102300').groups())

The results of the run are as follows:

image

image
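Greedy matching also matters when a pattern uses '.' with a quantifier. The sketch below uses an invented HTML-like string (not from the original article) to contrast the greedy '.+' with the non-greedy '.+?':

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last '>' in the string, giving one long match.
greedy = re.findall(r"<.+>", text)

# Non-greedy: .+? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```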