Integer TypeIt is one of the most common data types in programming, but its implementation details are little known.
Unlike other languages,Python
Integers are of arbitrary precision, meaning they can be infinitely large and only limited by memory.
This characteristic makesPython
Very powerful when dealing with large integers, but also increases the complexity of the implementation.
Today, we will discussPython
The internal implementation of integers reveals the mystery behind it.
1. Internal representation of integers
In most programming languages, integers are usually of fixed precision, e.g.32 bitsor64 bits。
However,Python
The integers are of arbitrary precision, which means they can be infinitely large without overflow problems.
This characteristic makesPython
Very practical in cryptography, computer algebra and other fields.
# Integers in Python can be very large without overflowing
big_number = 12345678901234567890123456789012345678901234567890
print(big_number * big_number) # output a larger integer
How is this arbitrary precision characteristic implemented?
The answer lies inPython
integer implementation method.
Python
The integer is passedCPython
ofPyLongObject
Implemented by the structure,
This structure defines how integers are stored, including symbols and numbers.
PyLongObject
Definition reference:Include/cpython/
document.
typedef struct _PyLongValue {
uintptr_t lv_tag; /* Number of digits, sign and flags */
digit ob_digit[1];
} _PyLongValue;
struct _longobject {
PyObject_HEAD
_PyLongValue long_value;
};
Here_longobject
that isPyLongObject
,_PyLongValue
The symbols and numbers of numbers are stored.
Python
Use one"Big Base"notation, notation rather than common decimal representations,
exist64 bitsOn the platform, the base is\(2^{30}\), and in32 bitsOn the platform, the base is\(2^{15}\) 。
by64-bit platform(The cardinality is $2^{30} $) as an example, a big data1234567890123456789
Store as[1038713109, 76039121, 1]
。
def to_digits(n, base=2**30):
digits = [n % base]
n = n // base
while n != 0:
(n % base)
n = n // base
Return digits
x = 1234567890123456789
print(f"The underlying numerical representation of integer {x}: {to_digits(x)}")
# The underlying numerical representation of integer 1234567890123456789: [1038713109, 76039121, 1]
If you want to calculate it32 bitsThe representation on the platform is just passed into_digits
ofbase
Change the parameters to2**15
Just do it.
So, any large integer,Python
All internally use a list to store, each value in the list is less than $2^{30}\(or\) 2^{15} $。
2. Memory optimization of integers
Python
Integers take up a lot of memory, even small integers require 28 bytes (on 64-bit platforms).
In order to optimize memory usage,CPython
Some clever strategies were adopted, especially when dealing with small integers.
In Python 3.12.4 version on my native machine, integers less than or equal to $2^{64}$ are cached.
i = 2**64
j = 2**64
print(f"addr i: {id(i)}, addr j: {id(j)}")
print(Is f"i and j the same: {i is j}")
# addr i: 2595289736288, addr j: 2595289736288
# Is i and j the same? True
i = 2**65
j = 2**65
print(f"addr i: {id(i)}, addr j: {id(j)}")
print(Is f"i and j the same: {i is j}")
# addr i: 2595289736432, addr j: 2595289736480
Is # i and j the same? False
As can be seen from the example above, when the integer $\le 2^{64} $,i
andj
The memory address is the same; otherwise, it is different.
However, althoughCPython
Implementing integers is already very efficient, but memory usage is still a problem to consider when dealing with large numbers of integers.
Here are some optimization suggestions:
- use
array
Module ornumpy
: If you need to store a large number of integers of the same type, usearray
Module ornumpy
Data will be stored in a more compact way. - Avoid unnecessary integer creation: Try to reuse existing integer objects, especially in loops.
- Using generator: If you only need to process integers one by one, you can use generator instead of creating the entire list.
3. Performance optimization of integers
CPython
The integer implementation not only takes into account memory usage, but also improves computing performance through various optimization methods.
-
Bit operation optimization: For large integers,
CPython
Using multi-precision arithmetic, multi-precision integers are stored in memory as arrays, each element representing the numerical value of a positioned number.
Related source codes can be found:Include/cpython/
andObjects/
- Cache mechanism optimization: For some frequently occurring operations or intermediate results, they will be cached. When these results are needed again, get them directly from the cache, rather than recalculate them.
Related source codes can be found:Objects/
andObjects/
- Parallel computing support: For large integer addition, the calculation task will be decomposed into multiple subtasks and executed on multiple cores in parallel.
Related source codes can be found:Python/thread_pthread.h
、Python/thread_pthread.c
andObjects/
-
Code generation optimization: Adding integers
Python
When the code is converted to machine code, a more efficient sequence of instructions is generated.
Related source codes can be found:Python/
andPython/
4. Summary
Python
The integer implementation is an efficient and flexible arbitrary precision integer system.
passCPython
We can see the source code ofPython
How to handle large integers internally, and how to improve performance and save memory through optimization strategies.
However, althoughPython
The integer implementation is already very powerful, but when dealing with large amounts of data, we can still further optimize memory usage and performance with some tricks.