Location>code7788 >text

"Python underlying principle"--Why can Python integers be infinitely large

Popularity:81 ℃/2025-02-24 10:44:41

Integer TypeIt is one of the most common data types in programming, but its implementation details are little known.

Unlike other languages,PythonIntegers are of arbitrary precision, meaning they can be infinitely large and only limited by memory.

This characteristic makesPythonVery powerful when dealing with large integers, but also increases the complexity of the implementation.

Today, we will discussPythonThe internal implementation of integers reveals the mystery behind it.

1. Internal representation of integers

In most programming languages, integers are usually of fixed precision, e.g.32 bitsor64 bits

However,PythonThe integers are of arbitrary precision, which means they can be infinitely large without overflow problems.

This characteristic makesPythonVery practical in cryptography, computer algebra and other fields.

# Integers in Python can be very large without overflowing
 big_number = 12345678901234567890123456789012345678901234567890
 print(big_number * big_number) # output a larger integer

How is this arbitrary precision characteristic implemented?

The answer lies inPythoninteger implementation method.

PythonThe integer is passedCPythonofPyLongObjectImplemented by the structure,

This structure defines how integers are stored, including symbols and numbers.

PyLongObjectDefinition reference:Include/cpython/document.

typedef struct _PyLongValue {
    uintptr_t lv_tag; /* Number of digits, sign and flags */
    digit ob_digit[1];
} _PyLongValue;

struct _longobject {
    PyObject_HEAD
    _PyLongValue long_value;
};

Here_longobjectthat isPyLongObject_PyLongValueThe symbols and numbers of numbers are stored.

PythonUse one"Big Base"notation, notation rather than common decimal representations,

exist64 bitsOn the platform, the base is\(2^{30}\), and in32 bitsOn the platform, the base is\(2^{15}\)

by64-bit platform(The cardinality is $2^{30} $) as an example, a big data1234567890123456789Store as[1038713109, 76039121, 1]

def to_digits(n, base=2**30):
     digits = [n % base]

     n = n // base
     while n != 0:
         (n % base)
         n = n // base

     Return digits

 x = 1234567890123456789
 print(f"The underlying numerical representation of integer {x}: {to_digits(x)}")
 # The underlying numerical representation of integer 1234567890123456789: [1038713109, 76039121, 1]

If you want to calculate it32 bitsThe representation on the platform is just passed into_digitsofbaseChange the parameters to2**15Just do it.

So, any large integer,PythonAll internally use a list to store, each value in the list is less than $2^{30}\(or\) 2^{15} $。

2. Memory optimization of integers

PythonIntegers take up a lot of memory, even small integers require 28 bytes (on 64-bit platforms).

In order to optimize memory usage,CPythonSome clever strategies were adopted, especially when dealing with small integers.

In Python 3.12.4 version on my native machine, integers less than or equal to $2^{64}$ are cached.

i = 2**64
 j = 2**64
 print(f"addr i: {id(i)}, addr j: {id(j)}")
 print(Is f"i and j the same: {i is j}")
 # addr i: 2595289736288, addr j: 2595289736288
 # Is i and j the same? True

 i = 2**65
 j = 2**65
 print(f"addr i: {id(i)}, addr j: {id(j)}")
 print(Is f"i and j the same: {i is j}")
 # addr i: 2595289736432, addr j: 2595289736480
 Is # i and j the same? False

As can be seen from the example above, when the integer $\le 2^{64} $,iandjThe memory address is the same; otherwise, it is different.

However, althoughCPythonImplementing integers is already very efficient, but memory usage is still a problem to consider when dealing with large numbers of integers.

Here are some optimization suggestions:

  1. usearrayModule ornumpy: If you need to store a large number of integers of the same type, usearrayModule ornumpyData will be stored in a more compact way.
  2. Avoid unnecessary integer creation: Try to reuse existing integer objects, especially in loops.
  3. Using generator: If you only need to process integers one by one, you can use generator instead of creating the entire list.

3. Performance optimization of integers

CPythonThe integer implementation not only takes into account memory usage, but also improves computing performance through various optimization methods.

  1. Bit operation optimization: For large integers,CPythonUsing multi-precision arithmetic, multi-precision integers are stored in memory as arrays, each element representing the numerical value of a positioned number.

Related source codes can be found:Include/cpython/andObjects/

  1. Cache mechanism optimization: For some frequently occurring operations or intermediate results, they will be cached. When these results are needed again, get them directly from the cache, rather than recalculate them.

Related source codes can be found:Objects/andObjects/

  1. Parallel computing support: For large integer addition, the calculation task will be decomposed into multiple subtasks and executed on multiple cores in parallel.

Related source codes can be found:Python/thread_pthread.hPython/thread_pthread.candObjects/

  1. Code generation optimization: Adding integersPythonWhen the code is converted to machine code, a more efficient sequence of instructions is generated.

Related source codes can be found:Python/andPython/

4. Summary

PythonThe integer implementation is an efficient and flexible arbitrary precision integer system.

passCPythonWe can see the source code ofPythonHow to handle large integers internally, and how to improve performance and save memory through optimization strategies.

However, althoughPythonThe integer implementation is already very powerful, but when dealing with large amounts of data, we can still further optimize memory usage and performance with some tricks.