
Understanding ConcurrentHashMap

2024-09-20 17:13:20

Why HashMap is not thread-safe

put is not safe

The race lives inside HashMap's putVal(), which every put() call delegates to. When multiple threads call put concurrently, two interleavings go wrong (the source is shown below; the line numbers referenced here are marked in it):

  1. Suppose threads A and B both perform a put and the hash function computes the same slot index:

    1. Thread A finishes the null-bucket check on line 6 and is then suspended because its time slice runs out; thread B gets the CPU, finds the slot empty, and inserts its element normally.
    2. When thread A runs again, it has already passed the collision check, so it does not re-check the slot and inserts directly.
    3. The entry inserted by thread B is silently overwritten by thread A: thread-unsafe.
  2. Line 38 of the code contains ++size. Suppose threads A and B both perform a put while the current size of the HashMap is 10:

    1. Thread A reaches line 38, reads size = 10 from main memory, and is about to add 1 when it loses the CPU.
    2. Thread B gets the CPU, reads size = 10 from main memory, adds 1, finishes its put, and writes size = 11 back to main memory.
    3. Thread A gets the CPU again and continues with its stale value of 10; when its put completes, it also writes size = 11 back.
    4. Two puts happened, but size grew by only 1 — data is overwritten again, and again this is thread-unsafe.
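The lost ++size update described above is exactly what ConcurrentHashMap's addCount machinery avoids. A minimal, hypothetical demo (the class name is mine): several threads put disjoint keys into a ConcurrentHashMap, and neither entries nor the count are lost:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each thread inserts perThread distinct keys; a correct concurrent map must
// end up with exactly threads * perThread entries.
public class SizeRaceDemo {
    public static int concurrentPutCount(int threads, int perThread) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    map.put(id + ":" + i, i); // keys are disjoint across threads
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try { th.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return map.size();
    }
}
```

Running the same experiment against a plain HashMap may return a smaller count (or worse, corrupt the table), for exactly the overwrite reasons listed above; because the failure is timing-dependent it will not reproduce on every run.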
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)   // line 6: the null-bucket check
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)                      // line 38: the unsynchronized ++size
        resize();
    afterNodeInsertion(evict);
    return null;
}

Resizing is not safe

In Java 7, resizing used head insertion, which could produce an infinite loop and data loss. Java 8 switched from head insertion to tail insertion, which fixed the infinite loop and the data loss, but data overwriting is still possible.

This is the problematic JDK 7 code:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

The transfer process is as follows:

  1. Iterate over the buckets of the array.
  2. For each node in a bucket's linked list: save the successor in next, then insert the current node at the head of the target bucket in the new table (head insertion).
  3. Repeat step 2 until every node of the list has been transferred.
  4. Repeat step 1 until every bucket has been transferred.

Note that the two lines e.next = newTable[i] and newTable[i] = e cause the order of the list to flip.
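The reversal is easy to see in isolation. Below is a minimal sketch (class and field names are mine, not the JDK's) of one bucket being rebucketed with head insertion, exactly as the two lines above do it:

```java
// Minimal sketch: head-insertion transfer of one singly linked chain into a
// single-bucket "new table". Moving each node to the head reverses the order.
public class HeadInsertDemo {
    public static final class Entry {
        final int key;
        Entry next;
        public Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    // Transfer one chain into a single new bucket using head insertion.
    public static Entry transferHeadInsert(Entry e) {
        Entry newBucket = null;
        while (e != null) {
            Entry next = e.next;   // save the successor before relinking
            e.next = newBucket;    // e.next = newTable[i]
            newBucket = e;         // newTable[i] = e
            e = next;
        }
        return newBucket;
    }

    // Collect the keys of a chain into a string, in list order.
    public static String keys(Entry e) {
        StringBuilder sb = new StringBuilder();
        for (; e != null; e = e.next) sb.append(e.key);
        return sb.toString();
    }
}
```

Feeding the chain 1 → 2 → 3 through transferHeadInsert yields 3 → 2 → 1: the flipped order that, combined with a second thread's concurrent transfer, is what produced the JDK 7 infinite loop.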

The resize operation allocates a new array with the new capacity, recomputes the index of every key-value pair of the old array, writes the pairs into the new array, and finally points the table reference at the new array. When several threads detect at the same time that the element count exceeds the threshold, they all call resize: each builds its own new array, rehashes into it, and assigns its result to the underlying table field. Only the array produced by the last thread survives, so the entries written by the other threads are lost. Worse, if some threads have already completed the assignment while others are just starting, the late threads will use the already-replaced table as their source array, which causes further problems.


Introduction to ConcurrentHashMap

ConcurrentHashMap is a hash table (built on the same design as HashMap) that supports highly concurrent updates and queries.

Hashtable achieves thread safety by synchronizing every method, and Collections.synchronizedMap() can also wrap a map to make it thread-safe, but both funnel all access through a single lock. ConcurrentHashMap, by contrast, guarantees that get is safe without taking any lock at all.
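The practical differences are easy to demonstrate. The sketch below (class name is mine) contrasts the two approaches and shows one behavioral quirk worth knowing: ConcurrentHashMap rejects null keys and values, because null is reserved to mean "absent":

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Two ways to get a thread-safe map. synchronizedMap wraps every method in one
// lock; ConcurrentHashMap locks per bucket and forbids null keys/values.
public class ThreadSafeMaps {
    // ConcurrentHashMap throws NullPointerException on a null key.
    public static boolean chmRejectsNull() {
        Map<String, String> chm = new ConcurrentHashMap<>();
        try {
            chm.put(null, "v");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // A plain HashMap made thread-safe by a single coarse lock.
    public static Map<String, String> wrapped() {
        return Collections.synchronizedMap(new HashMap<>());
    }
}
```

The wrapped map is correct but serializes all callers; ConcurrentHashMap lets readers proceed without blocking at all.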

Underlying source code

put method

Recall the flow of HashMap's put method:

  1. Compute the slot for the key.
  2. Operate on the slot according to its type (linked list or red-black tree).
  3. Convert the structure or resize the table according to the number of members in the slot.

How to perform concurrent operations efficiently: from the data structure above it is clear that locking the whole container as a single resource would turn every operation into a serial one. By the nature of a hash table, conflicting operations only ever occur within the same slot; operations on different slots do not affect each other. Based on this observation, the locking granularity can be reduced to a single slot: the hot spot is dispersed, the probability of conflict drops sharply, and concurrency improves accordingly.
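The slot-granularity idea can be sketched in a few lines without any of ConcurrentHashMap's machinery. The hypothetical class below (names are mine) keeps one lock object per stripe and picks a stripe with the same (length - 1) & hash trick a hash table uses, so threads hitting different stripes never contend:

```java
// Minimal lock-striping sketch: shrink lock granularity from "whole structure"
// to "one stripe", the same idea ConcurrentHashMap applies per bucket.
public class StripedLockDemo {
    private final Object[] locks;
    private final int[] counts;

    public StripedLockDemo(int stripes) { // stripes should be a power of two
        locks = new Object[stripes];
        counts = new int[stripes];
        for (int i = 0; i < stripes; i++) locks[i] = new Object();
    }

    // Same slot selection as a hash table: (length - 1) & hash.
    public void increment(int hash) {
        int i = (locks.length - 1) & hash;
        synchronized (locks[i]) { // only writers of this stripe contend
            counts[i]++;
        }
    }

    // Sum all stripes, taking each stripe's lock in turn.
    public int total() {
        int sum = 0;
        for (int i = 0; i < locks.length; i++)
            synchronized (locks[i]) { sum += counts[i]; }
        return sum;
    }
}
```

With 16 stripes, up to 16 threads can increment simultaneously; a single global lock would serialize them all.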

Underlying source code:

final V putVal(K key, V value, boolean onlyIfAbsent) {
    // ConcurrentHashMap does not allow null keys or values
    if (key == null || value == null) throw new NullPointerException();
    // Compute the spread hash of the key (it determines the array index of the entry)
    int hash = spread(key.hashCode());
    // binCount counts the length of the bin; 0 means "not yet counted"
    int binCount = 0;
    // Loop over the current table; retried until the put succeeds
    for (Node<K,V>[] tab = table;;) {
        // f  - the node at the target index
        // n  - the array length
        // i  - the index where the data is to be stored
        // fh - the hash of the node at the target index
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            // The array is not initialized yet; initTable handles concurrent
            // initialization safely
            tab = initTable();
        // tabAt(array, index) reads the value at the given index; f is that value
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // The slot is empty: try to install the new node with CAS.
            // If the CAS fails, the outer loop simply retries.
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // MOVED means the node at this slot is being migrated by a resize
        else if ((fh = f.hash) == MOVED)
            // A resize is in progress: help migrate data to the new, doubled array
            tab = helpTransfer(tab, f);
        else {
            // First check: is the array initialized?
            // Second check: is the slot empty?
            // Third check: is the table being resized?
            // This else is the fourth case: a hash collision — append to the
            // linked list or insert into the red-black tree.
            V oldVal = null;
            // Lock the head node f of the slot, i.e. lock just this bucket
            synchronized (f) {
                // Re-check that the head node has not changed since we read it
                if (tabAt(tab, i) == f) {
                    // fh >= 0 means this bin is a linked list
                    if (fh >= 0) {
                        binCount = 1;
                        // Walk the list so the new node lands at the tail
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            // Is this node's key equal to the key being inserted?
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                // Same key found: this is an update
                                oldVal = e.val;
                                if (!onlyIfAbsent) // onlyIfAbsent = true leaves it alone
                                    e.val = value;
                                break;
                            }
                            // Otherwise it is a normal append
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                // Reached the tail: link the new node after it
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    // Is the bucket at this position a tree?
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        // Delegate to putTreeVal on the red-black tree
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            // A non-null return means the key already existed
                            oldVal = p.val;
                            if (!onlyIfAbsent) // decide whether to overwrite
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD) // list length >= 8
                    treeifyBin(tab, i);            // try to convert to a red-black tree
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

The spread method for calculating the hash value

static final int spread(int h) {
    // (h ^ (h >>> 16)): the same spreading as HashMap — the high 16 bits
    // also take part in slot selection (index = (array length - 1) & hash)
    // & HASH_BITS (0x7fffffff) clears the sign bit, so the result is always
    // non-negative, because negative hash values have special meanings
    return (h ^ (h >>> 16)) & HASH_BITS;
}

// Special meanings of negative hash values
static final int MOVED    = -1; // the node at this index is being migrated by a resize
static final int TREEBIN  = -2; // this index holds a red-black tree, not a linked list
static final int RESERVED = -3; // this index is temporarily reserved (used by compute)
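A standalone copy of the spread logic (constant value taken from the JDK source; the class name is mine) makes both properties easy to verify: the result is never negative, so user hashes can never collide with MOVED, TREEBIN, or RESERVED, and the high 16 bits influence the chosen slot:

```java
// Re-implementation of ConcurrentHashMap's spread() for demonstration.
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff; // usable bits of a spread hash

    public static int spread(int h) {
        // Mix the high 16 bits into the low 16, then clear the sign bit
        return (h ^ (h >>> 16)) & HASH_BITS;
    }
}
```

For example, spread(0x10000) is 0x10001: without spreading, a 16-slot table would put every multiple of 0x10000 into slot 0, but the mixed hash lands them in different slots.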

Initialization: the initTable method

/* sizeCtl = -1: the ConcurrentHashMap is currently being initialized.
sizeCtl = -N: the ConcurrentHashMap is currently being resized.
sizeCtl = 0: the default value; the ConcurrentHashMap has not been initialized yet.
sizeCtl > 0: after initialization, sizeCtl holds the resize threshold; before initialization, it holds the requested initial capacity of the array. */

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    // A null table, or a table of length 0, means the array is not initialized
    while ((tab = table) == null || tab.length == 0) {
        // A negative sizeCtl means another thread is initializing or resizing
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost initialization race; just spin
        // CAS sizeCtl from sc to -1 to claim the right to initialize
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            try {
                // Double-checked locking: re-check after winning the CAS
                if ((tab = table) == null || tab.length == 0) {
                    // sc > 0 means an initial capacity was requested;
                    // otherwise use DEFAULT_CAPACITY, which is 16
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    @SuppressWarnings("unchecked")
                    // Create the array
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    // Assign the freshly created array to table
                    table = tab = nt;
                    // sc = 0.75 * n, the resize threshold (12 for n = 16)
                    sc = n - (n >>> 2);
                }
            } finally {
                // Publish the threshold into sizeCtl; initialization is done
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}

Linked list to red-black tree: treeifyBin

As noted in the put analysis above, treeifyBin does not necessarily perform a red-black tree conversion; it may just resize the array.

private final void treeifyBin(Node<K,V>[] tab, int index) {
    Node<K,V> b; int n;
    if (tab != null) {
        // MIN_TREEIFY_CAPACITY is 64: if the array is shorter than 64
        // (i.e. 32, 16, or less), resize the array instead of treeifying
        if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
            // tryPresize is analyzed in detail below
            tryPresize(n << 1);
        // b is the head node of the bin
        else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
            // Lock the head node
            synchronized (b) {
                if (tabAt(tab, index) == b) {
                    // Traverse the linked list and build a TreeNode chain
                    TreeNode<K,V> hd = null, tl = null;
                    for (Node<K,V> e = b; e != null; e = e.next) {
                        TreeNode<K,V> p =
                            new TreeNode<K,V>(e.hash, e.key, e.val,
                                              null, null);
                        if ((p.prev = tl) == null)
                            hd = p;
                        else
                            tl.next = p;
                        tl = p;
                    }
                    // Install the red-black tree (wrapped in a TreeBin) at the slot
                    setTabAt(tab, index, new TreeBin<K,V>(hd));
                }
            }
        }
    }
}
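The branch treeifyBin takes is worth isolating, since it surprises people: a long bin only becomes a tree when the table is already large. The tiny sketch below (class name is mine; constants copied from the JDK source) encodes just that decision:

```java
// The decision treeifyBin makes when a bin reaches TREEIFY_THRESHOLD nodes.
public class TreeifyDecision {
    static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size to treeify
    static final int TREEIFY_THRESHOLD = 8;     // bin length that triggers the check

    // Given the current table length, what happens to an overlong bin?
    public static String onLongBin(int tableLength) {
        return (tableLength < MIN_TREEIFY_CAPACITY) ? "resize" : "treeify";
    }
}
```

So with the default capacity of 16, the first two "treeify" triggers actually just double the table to 32 and then 64.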

Resizing: tryPresize

If anything in the Java 8 ConcurrentHashMap source is genuinely difficult, it is the resize and migration operations.

To fully understand this method, you also need to read the transfer method that follows.

Resizing here also doubles the array: after the resize, the capacity is twice the original.

// Note first that the size passed in is already double the original capacity
private final void tryPresize(int size) {
    // c: 1.5 * size + 1, rounded up to the nearest power of two
    int c = (size >= (MAXIMUM_CAPACITY >>> 1)) ? MAXIMUM_CAPACITY :
        tableSizeFor(size + (size >>> 1) + 1);
    int sc;
    while ((sc = sizeCtl) >= 0) {
        Node<K,V>[] tab = table; int n;

        // This branch is essentially the same code as the array initialization
        // shown earlier; you can skim past it
        if (tab == null || (n = tab.length) == 0) {
            n = (sc > c) ? sc : c;
            if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                try {
                    if (table == tab) {
                        @SuppressWarnings("unchecked")
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        table = nt;
                        sc = n - (n >>> 2); // 0.75 * n
                    }
                } finally {
                    sizeCtl = sc;
                }
            }
        }
        else if (c <= sc || n >= MAXIMUM_CAPACITY)
            break;
        else if (tab == table) {
            int rs = resizeStamp(n);

            if (sc < 0) {
                Node<K,V>[] nt;
                if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                    sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                    transferIndex <= 0)
                    break;
                // 2. A resize is already running: CAS sizeCtl up by 1,
                //    then join the migration via transfer
                if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
                    transfer(tab, nt);
            }
            // 1. Set sizeCtl to (rs << RESIZE_STAMP_SHIFT) + 2.
            //    The value is hard to read at a glance; what matters is that
            //    it is a large negative number. Then call transfer with a
            //    null nextTab to start the migration.
            else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                         (rs << RESIZE_STAMP_SHIFT) + 2))
                transfer(tab, null);
        }
    }
}
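The computation of c deserves a closer look. The bit trick below is the same tableSizeFor HashMap uses, reproduced here (class name is mine) so the snippet is self-contained; the MAXIMUM_CAPACITY clamp from the real method is omitted for brevity:

```java
// How tryPresize picks its target capacity: 1.5 * size + 1, rounded up to a
// power of two.
public class PresizeTarget {
    // HashMap's classic "round up to next power of two" bit trick.
    public static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : n + 1;
    }

    // The expression from tryPresize: tableSizeFor(size + size/2 + 1).
    public static int targetCapacity(int size) {
        return tableSizeFor(size + (size >>> 1) + 1);
    }
}
```

For instance, a requested size of 32 gives 32 + 16 + 1 = 49, which rounds up to a capacity of 64.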

The heart of this method is the manipulation of sizeCtl: it is first set to a large negative number, then transfer(tab, null) is performed; on the next iteration sizeCtl is incremented by 1 and transfer(tab, nt) runs, and this increment-and-transfer step may repeat.

So the possible sequence of operations is one transfer(tab, null) plus several transfer(tab, nt) calls. To see how the loop terminates, you need to read the transfer source.
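Why the "large negative number"? Reproducing the stamp arithmetic (constants from the JDK source; the class name is mine) makes it concrete: resizeStamp sets the high bit of a 16-bit stamp, so shifting it left by 16 flips the sign of the int, and the low bits then count participating threads:

```java
// The sizeCtl stamp arithmetic used during a resize.
public class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // A stamp unique to the table length n, with its top (16th) bit set.
    public static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    // The value sizeCtl is set to when the first thread starts the resize.
    public static int initialSizeCtl(int n) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2;
    }
}
```

Because the stamp occupies the high 16 bits, other threads can recover it with sizeCtl >>> 16 and verify that a resize of this particular table size is still in progress, while the low bits track how many threads are helping (the "+ 2" encodes one active thread).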

Data migration: transfer

The following method, which is a bit long, migrates the elements of the original tab array into the new nextTab array.

Although the multiple transfer calls in the tryPresize method above do not involve multithreading, transfer is also called elsewhere. Typically, as in the put method discussed earlier: look back at putVal and you will find the call to helpTransfer, and helpTransfer in turn calls transfer.

This method supports multithreaded execution. Its callers guarantee that the first thread to initiate the migration passes nextTab as null; when the method is called afterwards by threads joining the migration, nextTab is never null.

Before reading the source, it helps to understand the concurrency mechanism. The original array has length n, so there are n migration tasks. Letting each thread take one tiny task at a time, and check after finishing whether any unfinished task remains to help with, would be the simplest scheme. Instead, Doug Lea uses a stride, essentially a step size: each thread claims a batch of slots at a time, for example 16 tasks per batch. This requires a global scheduler to decide which thread handles which range of slots, and that is exactly the role of the transferIndex field.

The first thread to initiate the migration points transferIndex at the end of the original array. The stride slots counting backwards from there belong to the first thread; transferIndex is then moved back by stride, and the next batch of stride slots belongs to the second thread, and so on. Of course, "the second thread" need not literally be a different thread; it can be the same one. In effect, one large migration job is divided into task packages.
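The range-claiming protocol can be sketched on its own with an AtomicInteger playing the role of transferIndex (all names here are mine, not JDK internals). Each claim CASes transferIndex down by one stride and receives the half-open range it now owns, counting from the back:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of how transferIndex hands out migration ranges, back to front.
public class StrideClaimDemo {
    private final AtomicInteger transferIndex;
    private final int stride;

    public StrideClaimDemo(int n, int stride) {
        this.transferIndex = new AtomicInteger(n); // start at the array's end
        this.stride = stride;
    }

    // Claim the next range; returns {bound, i} (migrate slots i down to bound),
    // or null when every slot has been claimed.
    public int[] claim() {
        for (;;) {
            int nextIndex = transferIndex.get();
            if (nextIndex <= 0) return null; // all ranges handed out
            int nextBound = (nextIndex > stride) ? nextIndex - stride : 0;
            if (transferIndex.compareAndSet(nextIndex, nextBound))
                return new int[]{nextBound, nextIndex - 1};
        }
    }
}
```

With n = 32 and stride = 16, the first claim covers slots 16..31, the second covers 0..15, and the third finds nothing left — exactly the pattern the while (advance) loop in transfer implements.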

private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
    int n = , stride;

    // stride Under single-core it is directly equal to n,Multi-core mode is (n>>>3)/NCPU,The minimum value is 16
    // stride It can be understood as”pacemaker“,there are n positions are required to be relocated,
    //   general term "will" be used for this n Tasks are divided into multiple task packages,每specific task包there are stride specific task
    if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
        stride = MIN_TRANSFER_STRIDE; // subdivide range

    // in the event that nextTab because of null,Initialize once first
    //    aforementioned,The periphery will ensure that the first thread that initiates the migration calls this method when the,parameters nextTab because of null
    //       When this method is later called by a thread involved in the migration,nextTab 不会because of null
    if (nextTab == null) {
        try {
            // Double the capacity
            Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
            nextTab = nt;
        } catch (Throwable ex) {      // try to cope with OOME
            sizeCtl = Integer.MAX_VALUE;
            return;
        }
        // nextTable be ConcurrentHashMap attribute
        nextTable = nextTab;
        // transferIndex 也be ConcurrentHashMap particular property,Used to control the location of migrations
        transferIndex = n;
    }

    int nextn = ;

    // ForwardingNode 翻译过来就be正在被迁移的 Node
    // This constructor generates aNode,key、value respond in singing next 都because of null,关键be hash because of MOVED
    // We'll see later.,Position in original array i Upon completion of the migration of the nodes at,
    //    It will then place the position i 处设置because of这个 ForwardingNode,Used to tell other threads that the location has already been processed
    //    the reason why它其实相当于be一个标志。
    ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);


    // advance 指的be做完了一个位置的迁移工作,You're ready for the next position.
    boolean advance = true;
    boolean finishing = false; // to ensure sweep before committing nextTab

    /*
     * Here's one. for circulate,The hardest part to understand is up front.,And to read them,Should've read the back first.,Then look at it backwards.
     * 
     */

    // i be位置索引,bound be边界,take note ofbe从后往前
    for (int i = 0, bound = 0;;) {
        Node<K,V> f; int fh;

        // Here's one. while 真的be不好理解
        // advance because of true Indicates that the next location is ready for migration
        // Simple understanding of the ending: i It's pointing. transferIndex,bound It's pointing. transferIndex-stride
        while (advance) {
            int nextIndex, nextBound;
            if (--i >= bound || finishing)
                advance = false;

            // commander-in-chief (military) transferIndex value assigned to nextIndex
            // here are transferIndex Once it is less than or equal to 0,clarification原数组的所there are位置都there are相应的线程去处理了
            else if ((nextIndex = transferIndex) <= 0) {
                i = -1;
                advance = false;
            }
            else if (
                     (this, TRANSFERINDEX, nextIndex,
                      nextBound = (nextIndex > stride ?
                                   nextIndex - stride : 0))) {
                // Look at the code in brackets,nextBound be这次迁移任务的边界,take note of,be从后往前
                bound = nextBound;
                i = nextIndex - 1;
                advance = false;
            }
        }
        if (i < 0 || i >= n || i + n >= nextn) {
            int sc;
            if (finishing) {
                // 所there are的迁移操作已经完成
                nextTable = null;
                // commander-in-chief (military)新的 nextTab assign a value to table causality,Completion of migration
                table = nextTab;
                // recalculate sizeCtl: n be原数组长度,the reason why sizeCtl 得出的值commander-in-chief (military)be新数组长度的 0.75 times (multiplier)
                sizeCtl = (n << 1) - (n >>> 1);
                return;
            }

            // As we said before.,sizeCtl 在迁移前会设置because of (rs << RESIZE_STAMP_SHIFT) + 2
            // after that,每there are一个线程参与迁移就会commander-in-chief (military) sizeCtl plus 1,
            // here are使用 CAS operational pair sizeCtl carry out a reduction 1,The delegate has done his part.
            if ((this, SIZECTL, sc = sizeCtl, sc - 1)) {
                // End of mandate,method exit
                if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                    return;

                // 到here are,clarification (sc - 2) == resizeStamp(n) << RESIZE_STAMP_SHIFT,
                // 也就be说,所there are的迁移任务都做完了,It also goes to the above if(finishing){} Branching out
                finishing = advance = true;
                i = n; // recheck before commit
            }
        }
        // in the event that位置 i 处be空的,没there are任何nodal,Then put in the just-initialized ForwardingNode ”empty node“
        else if ((f = tabAt(tab, i)) == null)
            advance = casTabAt(tab, i, null, fwd);
        // 该位置处be一个 ForwardingNode,means that the location has been relocated
        else if ((fh = ) == MOVED)
            advance = true; // already processed
        else {
            // 对数组该位置处的结点plus锁,Begin processing the migration of the array at this location
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    Node<K,V> ln, hn;
                    // header node of the hash more than 0,clarificationbe链表的 Node nodal
                    if (fh >= 0) {
                        // 下面这一块respond in singing Java7 hit the nail on the head ConcurrentHashMap 迁移be差不多的,
                        // 需要commander-in-chief (military)链表一分because of二,
                        //   找到原链表hit the nail on the head lastRun,after that lastRun 及其之后的nodalbe一起进行迁移的
                        //   lastRun 之前的nodal需要进行克隆,after that分到两个链表中
                        int runBit = fh & n;
                        Node<K,V> lastRun = f;
                        for (Node<K,V> p = ; p != null; p = ) {
                            int b =  & n;
                            if (b != runBit) {
                                runBit = b;
                                lastRun = p;
                            }
                        }
                        if (runBit == 0) {
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun;
                            ln = null;
                        }
                        for (Node<K,V> p = f; p != lastRun; p = ) {
                            int ph = ; K pk = ; V pv = ;
                            if ((ph & n) == 0)
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // 其hit the nail on the head一个链表放在新数组的位置 i
                        setTabAt(nextTab, i, ln);
                        // Another linked table is placed in the place of the new array i+n
                        setTabAt(nextTab, i + n, hn);
                        // commander-in-chief (military)原数组该位置处设置because of fwd,means that the position has been processed,
                        //    Once other threads see the location of the hash 值because of MOVED,There will be no migration.
                        setTabAt(tab, i, fwd);
                        // advance 设置because of true,Indicates that the location has been relocated
                        advance = true;
                    }
                    else if (f instanceof TreeBin) {
                        // Migration of red and black trees
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> lo = null, loTail = null;
                        TreeNode<K,V> hi = null, hiTail = null;
                        int lc = 0, hc = 0;
                        for (Node<K,V> e = ; e != null; e = ) {
                            int h = ;
                            TreeNode<K,V> p = new TreeNode<K,V>
                                (h, , , null, null);
                            if ((h & n) == 0) {
                                if (( = loTail) == null)
                                    lo = p;
                                else
                                     = p;
                                loTail = p;
                                ++lc;
                            }
                            else {
                                if (( = hiTail) == null)
                                    hi = p;
                                else
                                     = p;
                                hiTail = p;
                                ++hc;
                            }
                        }
                        // in the event that一分because of二后,nodal数小于等于6,那么commander-in-chief (military)红黑树转换回链表
                        ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                            (hc != 0) ? new TreeBin<K,V>(lo) : t;
                        hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                            (lc != 0) ? new TreeBin<K,V>(hi) : t;

                        // Place ln at position i in the new array
                        setTabAt(nextTab, i, ln);
                        // Place hn at position i + n in the new array
                        setTabAt(nextTab, i + n, hn);
                        // Set this slot of the old array to fwd, marking it as processed;
                        //    other threads that see hash value MOVED here will not migrate it again
                        setTabAt(tab, i, fwd);
                        // Set advance to true, meaning this slot has been migrated
                        advance = true;
                    }
                }
            }
        }
    }
}

To sum up, the transfer method does not perform the whole migration by itself. Each call only migrates the batch of stride buckets it claims via transferIndex; the rest is coordinated by the surrounding code.

At this point, it may be clearer to go back and take a closer look at the tryPresize method.
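To make the cooperative scheme concrete, here is a minimal, hypothetical sketch (not the JDK source; the class and method names are invented for illustration) of how several threads can claim non-overlapping stride-sized ranges from a shared transferIndex and migrate them independently:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StrideMigration {
    static final int STRIDE = 4;           // buckets migrated per claim
    final Object[] oldTab;
    final Object[] newTab;
    final AtomicInteger transferIndex;

    StrideMigration(Object[] oldTab) {
        this.oldTab = oldTab;
        this.newTab = new Object[oldTab.length * 2];  // doubled capacity
        this.transferIndex = new AtomicInteger(oldTab.length);
    }

    // Each call migrates at most STRIDE buckets; callers loop until it
    // returns false, at which point every bucket has been claimed.
    boolean migrateOneStride() {
        // Atomically claim the range [end - STRIDE, end); because getAndAdd
        // is atomic, no two threads ever claim overlapping bucket ranges.
        int end = transferIndex.getAndAdd(-STRIDE);
        if (end <= 0)
            return false;                  // nothing left to claim
        int start = Math.max(0, end - STRIDE);
        for (int i = start; i < end; i++)
            newTab[i] = oldTab[i];         // stand-in for the real bucket split
        return true;
    }
}
```

The real transfer() additionally splits each bucket into a low and a high list and plants a ForwardingNode, but the claim-a-range-then-work pattern is the same.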

get Process Analysis

The get method has always been the simplest, and it is no different here:

  • Calculate the hash value

  • Find the position of the array according to the hash value: (n - 1) & h

  • According to the nature of the node at this location, the corresponding search is performed

    • If the position is null, then just return null.

    • If the node at that position is exactly what we need, return the value of that node.

    • If the hash value of the node at this position is less than 0, it means either that a resize is in progress or that the bin is a red-black tree; in both cases the lookup is delegated to the node's find method.

    • If none of the above 3 conditions is met, the bin is a linked list; simply traverse it.

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        // Check whether the head node is the node we need
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        // If the head node's hash is less than 0, either a resize is in
        // progress or this bin is a red-black tree
        else if (eh < 0)
            // Delegates to ForwardingNode.find(int h, Object k) or TreeBin.find(int h, Object k)
            return (p = e.find(h, key)) != null ? p.val : null;

        // Traverse the linked list
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

In short, most of this method is straightforward; only the resizing case, handled by ForwardingNode.find(int h, Object k), is slightly more complex. But once the data-migration process is understood, that is not difficult either, so for reasons of length it will not be expanded on here.

Calculate Size

ConcurrentHashMap's size() operation does not add any locks, so how does it calculate the size of the map in a thread-safe way in a multi-threaded environment?

Looking at the source code, you can see that size() is calculated using the sumCount() method.

public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

final long sumCount() {
    CounterCell[] as = counterCells; CounterCell a;
    // When there is no concurrent contention, only baseCount is used to
    // track the size of the map; when a CAS on baseCount fails, the thread
    // records its change in a CounterCell instead
    long sum = baseCount;
    // Iterate over the CounterCell array and add each cell's value to sum
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    // The result may be stale, so the size of a ConcurrentHashMap is a
    // reference value, not an exact real-time value
    return sum;
}
  1. The size of ConcurrentHashMap is maintained by the variables baseCount and counterCells:

    • In the absence of concurrency, a volatile-modified baseCount variable is sufficient;

    • When there is concurrency and a CAS on baseCount fails, the CounterCell class is used: a CounterCell object is created, its volatile-modified value property is set to 1, and it is placed at a random position in the counterCells array;

  2. Finally, the sumCount() method derives the total size of the map by adding up baseCount and the value of each CounterCell in the counterCells array.

  3. However, the value returned is an estimate; if there are concurrent insertion or deletion operations, it may differ from the actual number.

  4. In addition, the size() method is capped at Integer.MAX_VALUE, while the map may contain more than Integer.MAX_VALUE entries, so since Java 8 the use of mappingCount(), which returns a long, is recommended.
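A quick snippet illustrating the two counting methods (variable names invented for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
map.put("a", 1);
map.put("b", 2);

int approx = map.size();          // int, capped at Integer.MAX_VALUE
long count = map.mappingCount();  // long, recommended since Java 8
```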

Compute method

Note that business-level thread safety is not the same as collection-level thread safety: using a thread-safe collection such as ConcurrentHashMap does not by itself make the business logic thread-safe. ConcurrentHashMap only guarantees that each individual operation, such as put, is safe; if other operations precede the put, the combined sequence is not necessarily thread-safe.
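A hypothetical inventory example (key name and logic invented for illustration): each individual call below is thread-safe, but the check-then-act sequence as a whole is not, whereas compute() runs the whole decision atomically for that key.

```java
import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, Integer> stock = new ConcurrentHashMap<>();
stock.put("item", 1);

// NOT thread-safe as a whole: two threads could both pass the check
// before either one writes, "selling" more items than exist.
if (stock.get("item") > 0)                       // check
    stock.put("item", stock.get("item") - 1);    // act

// Thread-safe: the check and the decrement happen in one atomic step.
stock.compute("item", (k, v) -> (v != null && v > 0) ? v - 1 : v);
```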

The following are some typical usage scenarios for the compute() method:

  1. Atomic updates of key-value pairs: when you need to make sure that an update to a key-value pair is atomic, i.e. no other thread can observe an intermediate state.
  2. Computing the value for a key: if the mapping needs to be updated by computing a new value from the key, compute() guarantees the atomicity of the compute-and-update step.
  3. Cache updates: in a cache implementation, when a cache entry needs to be updated dynamically based on certain conditions, compute() ensures the update operation is atomic.
  4. Parallel processing: in parallel computation, when multiple threads need to update entries of the same ConcurrentHashMap, compute() ensures that the processing of each key does not interfere with the others.

Example of use:

ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
map.put("key1", 1);

// Use the compute method to update the value of key "key1"
map.compute("key1", (key, value) -> value + 1);
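A runnable sketch of why this matters under contention (class and method names invented for illustration): eight threads each increment the same key 10,000 times. Because compute() is atomic per key, no increments are lost; a get-then-put loop could lose updates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class ComputeDemo {
    static int run() throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("counter", 0);
        int threads = 8, perThread = 10_000;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    // compute runs atomically per key, so no increments are
                    // lost; map.put("counter", map.get("counter") + 1) would
                    map.compute("counter", (k, v) -> v + 1);
                done.countDown();
            }).start();
        }
        done.await();
        return map.get("counter");   // always threads * perThread
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```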

comparison summary

  • Hashtable: uses the synchronized keyword to lock operations such as put.

  • ConcurrentHashMap JDK1.7: Implemented using a segmented locking mechanism.

  • ConcurrentHashMap JDK1.8: uses an array + linked list + red-black tree data structure together with CAS atomic operations; synchronized locks individual bucket heads, combined with extensive use of CAS.

Extension: Segmented Locking Mechanism in JDK7

In JDK7, the underlying data structure of ConcurrentHashMap is an array plus linked lists, partitioned into Segments; the relevant source is as follows:

// Initial total capacity, 16 by default
static final int DEFAULT_INITIAL_CAPACITY = 16;
// Load factor, 0.75 by default
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Concurrency level, 16 by default
static final int DEFAULT_CONCURRENCY_LEVEL = 16;
 
static final class Segment<K,V> extends ReentrantLock implements Serializable {
	transient volatile HashEntry<K,V>[] table;
}

The concurrency level controls the number of Segments. Once a ConcurrentHashMap has been created, the number of Segments can no longer change; what resizing changes is the size of each individual Segment.

Segment extends the reentrant lock ReentrantLock, so each Segment is itself a lock that guards one slice of the map, as shown below:

The large map is split into a number of small Segments, each of which uses an independent lock to ensure thread safety. Multiple threads accessing different Segments can proceed concurrently, which improves concurrency performance. This is an advantage over synchronizing on the whole map directly with synchronized.
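The idea can be sketched as follows (a minimal illustration of lock striping, not the JDK7 source; the StripedMap class and its layout are invented):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// The map is split into a fixed number of segments, each guarded by its
// own ReentrantLock, so threads hitting different segments never contend.
class StripedMap<K, V> {
    private static final int SEGMENTS = 16;  // like DEFAULT_CONCURRENCY_LEVEL
    private final Map<K, V>[] tables;
    private final ReentrantLock[] locks;

    @SuppressWarnings("unchecked")
    StripedMap() {
        tables = new Map[SEGMENTS];
        locks = new ReentrantLock[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            tables[i] = new HashMap<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Pick a segment from the key's hash (mask off the sign bit first)
    private int segmentFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % SEGMENTS;
    }

    V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();                     // lock only this one segment
        try { return tables[s].put(key, value); }
        finally { locks[s].unlock(); }
    }

    V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try { return tables[s].get(key); }
        finally { locks[s].unlock(); }
    }
}
```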

So why did JDK8 abandon segmented locking?

  • The number of Segments cannot be changed, so as more data is put in, each Segment grows larger and larger, and so does the granularity of its lock. Once a Segment is very large, the performance of its lock degrades.

  • Dividing the map into many Segments wastes memory space (discontinuous, fragmented).

  • When operating on the map, the probability of contending for the same Segment lock is very small, yet Segment locks can instead cause long waits for operations such as updates.

As a result, segmented locks were deprecated in Java 8, and a new approach to thread security was adopted.

Extension: Why JDK8 doesn't use ReentrantLock instead of synchronized

  • Reduced memory overhead: with ReentrantLock, every node would need to inherit from AQS to get synchronization support, increasing memory overhead, whereas in 1.8 synchronization is only applied to the head node of a bucket.
  • Internal optimization: synchronized is directly supported by the JVM, which can make corresponding optimization measures at runtime: lock coarsening, lock elimination, lock spin and so on.

Why key and value are not allowed to be null

In HashMap, null can be used as either key or value. Whereas in ConcurrentHashMap, both key and value are not allowed to be null.

The author of ConcurrentHashMap, Doug Lea, has explained this. The main point is:

The main reason a ConcurrentMap (e.g. ConcurrentHashMap, ConcurrentSkipListMap) does not allow null values is that the ambiguity that can be tolerated in a non-concurrent map (e.g. HashMap) cannot be tolerated in a concurrent one.

If all maps supported null, then map.get(key) could return null, but there would then be an ambiguity: when you get a null, you cannot tell whether a null value was actually stored for that key or whether the key was simply absent and null was returned for that reason.

In HashMap, which is designed for single-threaded use, when map.get(key) returns null we can check with map.containsKey(key): if it returns true, a null value was stored; otherwise the key was not found.

ConcurrentHashMap, however, is built for concurrent scenarios. When map.get(key) returns null, there is no way to disambiguate reliably with containsKey(key) (ConcurrentHashMap has this method, but here it is unreliable), because between the two calls the map may be modified by another thread, so the combined result is not trustworthy.

So, to keep the semantics of ConcurrentHashMap accurate and unambiguous, it does not support null.
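The single-threaded disambiguation that HashMap allows can be shown in a short snippet (variable names invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

Map<String, String> m = new HashMap<>();
m.put("a", null);   // HashMap permits null values

// In a single thread, get + containsKey disambiguates the two cases:
boolean storedNull = m.get("a") == null && m.containsKey("a");   // null was stored
boolean absent     = m.get("b") == null && !m.containsKey("b");  // key is absent
```

In a concurrent map this two-step check would be racy, which is exactly why ConcurrentHashMap forbids null and removes the ambiguity altogether.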

About the Author.

From front-line programmer Seven's exploration and practice, continuously learning and iterating~

This article is included in my personal blog:https://

Public number: seven97, welcome to follow~