Location>code7788 >text

Why hash(-1) == hash(-2) in Python?

Popularity:535 ℃/2025-01-13 21:38:55

English:/posts/2021-07-16-why-is-hash-in-python

Author: Omair Majid

Translator: Cat Under the Pea Flower&Claude-3.5-Sonnet

Time: Original published on 2021.07.16, translated on 2025.01.11

Included in: Python Why Series/chinesehuazhou/python-whydo

when i wasWait for code to compileI saw this question on Reddit's r/Python:

Is hash(-1) == hash(-2) an Easter egg?

Wait, is this true?

$ python
Python 3.9.6 (default, Jun 29 2021, 00:00:00)
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash(-1)
-2
>>> hash(-2)
-2
>>> hash(-1) == hash(-2)
True

Yes, indeed. What a surprise!

Let’s look at some other common hashes:

>>> hash(1)
1
>>> hash(0)
0
>>> hash(3)
3
>>> hash(-4)
-4

It looks like all small integers have hashes equal to themselves, except-1...

Now I'm completely obsessed with this question. I tried to find out for myself. In what follows, I’ll walk you through how to find this answer for yourself.

How to get started? What can give us an authoritative answer?

Let's take a look at the source code! Actual implementation code in Python!

Get source code

I'm assuming that you, like me, have absolutely no idea where Python's source code is.

So, how do we (assuming we've never seen Python's source code) get the source code to answer the original question?

Maybe we can use Google? Searching for "python implementation" brings up some interesting results.

I searched forfirst resultMentioned the "CPython Reference Implementation".

on GithubPython organizationThe second repository is "cpython". This seems reliable. How do we make sure nothing goes wrong?

We have access to . Let's go to the source code download page. finally i found itPython 3.9.6 compressed package. After decompression,Also points to CPython on Github.

Okay, that's where we start. Let's get these codes for subsequent searches:

git clone /python/cpython --depth 1

--depth 1parameters makegitOnly limited history is obtained. This makes the cloning operation much faster. If we need the complete history later, we can get it again.

Let's delve deeper

When working on code, we need to find a starting point. Preferably something easy to search for, like a simple string that won't have too many misleading matches.

Maybe we can usehashDocumentation for the function? we can usehelp(hash)To view the document content:

>>> hash
<built-in function hash>
>>> help(hash)
Help on built-in function hash in module builtins:

hash(obj, /)
    Return the hash value for the given object.

    Two objects that compare equal must also have the same hash value, but the
    reverse is not necessarily true.

Now we can use this to findhash()Implementation of:

$ grep -r 'Return the hash value'
Python/clinic/:"Return the hash value for the given object.\n"
Python/:Return the hash value for the given object.
Doc/library/:   Return the hash value of the object (if it has one).  Hash values are
Lib/:        """Return the hash value of this hashing object.

hmacProbably related to the encryption's HMAC implementation, so we can ignore it.It is a documentation file, so it can be ignored.

Python/Looks like fun. If we look at this file, we will find this code:

/*
...
Return the hash value for the given object.

Two objects that compare equal must also have the same hash value, but the
reverse is not necessarily true.
[clinic start generated code]*/

static PyObject *
builtin_hash(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=237668e9d7688db7 input=58c48be822bf9c54]*/
{
    Py_hash_t x;

    x = PyObject_Hash(obj);
    if (x == -1)
        return NULL;
    return PyLong_FromSsize_t(x);
}

searchPyLongtake me tohere. It seemsPyLongObjectis Python's native representation of integers (this will come in handy later). BrowsingPyLongObjectAfter reading the documentation and re-reading this code, it looks like this:

  1. We callPyObject_HashTo get the hash value of an object
  2. If the calculated hash value is -1, it is an error
    • It looks like we used -1 to indicate error, so no hash function would compute -1 for the real object
  3. we willPy_Ssize_tConvert toPyLongObject(The documentation calls it: "This is a subtype of PyObject that represents a Python integer object.")

Aha! This explains whyhash(0)yes0hash(1)yes1hash(-2)yes-2,buthash(-1)no-1. This is because-1Used internally to indicate errors.

but whyhash(-1)yes-2Woolen cloth? What sets it to this value?

Let's see if we can find out why.

We can first searchPyObject_Hash. Let's search it.

$ ag PyObject_Hash
...
Objects/
552:    result = PyObject_Hash(t);

Objects/
777:PyObject_HashNotImplemented(PyObject *v)
785:PyObject_Hash(PyObject *v)
802:    return PyObject_HashNotImplemented(v);

Objects/
307:    y = PyObject_Hash(a->im_func);
538:    y = PyObject_Hash(PyInstanceMethod_GET_FUNCTION(self));
...

Although there is a lot of interference, the only implementation seems to be inObjects/middle:

Py_hash_t
 PyObject_Hash(PyObject *v)
 {
     PyTypeObject *tp = Py_TYPE(v);
     if (tp->tp_hash != NULL)
         return (*tp->tp_hash)(v);
     /* To maintain common practice: types that inherit only from object in C code should work without explicit calls to PyType_Ready,
      * We implicitly call PyType_Ready here and then check the tp_hash slot again
      */
     if (tp->tp_dict == NULL) {
         if (PyType_Ready(tp) < 0)
             return -1;
         if (tp->tp_hash != NULL)
             return (*tp->tp_hash)(v);
     }
     /* Otherwise, the object can't be hashed */
     return PyObject_HashNotImplemented(v);
 }

This code is quite confusing. Fortunately, the comments are clear. After reading it multiple times, it seems like this code - given some lazy loading of types (?) - first finds the type of the object (usingPy_TYPE). and then look for that typetp_hashfunction and call it on v:(*tp->tp_hash)(v)

where can we find-1oftp_hashWoolen cloth? Let's search againtp_hash

$ ag tp_hash -l
...
Modules/_multiprocessing/
Objects/
Objects/
Objects/
Modules/_pickle.c
Objects/
Objects/
Objects/
Objects/
Objects/
Objects/
Objects/
Objects/
Objects/
Objects/
...

This is a long list. Recall from the documentation aboutPyLongObjectDescription ("This... represents a Python integer object"), let me check it firstObjects/

PyTypeObject PyLong_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "int",                                      /* tp_name */
    offsetof(PyLongObject, ob_digit),           /* tp_basicsize */
    sizeof(digit),                              /* tp_itemsize */
    0,                                          /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    long_to_decimal_string,                     /* tp_repr */
    &long_as_number,                            /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    (hashfunc)long_hash,                        /* tp_hash */
    ...

soPyLongObjecttype objecttp_hashyeslong_hash. Let's take a look at this function.

static Py_hash_t
long_hash(PyLongObject *v)
{
    Py_uhash_t x;
    Py_ssize_t i;
    int sign;

    ...

    if (x == (Py_uhash_t)-1)
        x = (Py_uhash_t)-2;
    return (Py_hash_t)x;
}

Note that I removed most of the implementation details. But the end of this function is exactly what we expected:-1is reserved for use as an error signal, so the code explicitly converts this return value to-2

This explains whyhash(-1)ultimately withhash(-2)same. This is not an easter egg, just to avoid using-1ashash()The return value of the method, hence the workaround.

Is this the correct/complete answer?

As mentioned, I have never looked at the Python code base. I think I have the answer. But is this right? I could be completely wrong.

Fortunately,/u/ExoticMandibles provided the answer in a Reddit post

The reference implementation of Python is "CPython", which is most likely the Python you are using. CPython is written in C language which, unlike Python, does not have exception handling. So, in C language, when you design a function and want to indicate "an error occurred", you must indicate the error by returning a value.

The hash() function in CPython may return an error, so it defines a return value of -1 to mean "an error occurred". But this can be confusing if the hash is calculated correctly and the object's actual hash value happens to be -1. So the convention is: if the hash calculation is successful and the value is -1, then -2 is returned.

In CPython, there is special code in the hash function for integers ("long objects") to handle this case:

/python/cpython/blob/main/Objects/#L2967

This is exactly what I inferred from reading the code.

in conclusion

I start with a seemingly difficult question to answer. But with a few minutes of code exploration—Python’s clean code base makes looking at its code much easier than other code bases I’ve seen—the answer was easily discovered and understood! If you've ever worked with computers, this should come as no surprise. There's no magic here, just layers of abstraction and code.

If this article has taught me anything, it’s this:Check out the source code!(The documentation may be outdated and the comments may be inaccurate, but the source code is eternal.)