Written up front
From a Leetcode question
First, take a look at a classic LeetCode question about caching. The problem is described as follows:

Design and implement a data structure that satisfies the constraints of an LRU (Least Recently Used) cache.

Implement the LRUCache class:

- LRUCache(int capacity): initialize the LRU cache with a positive integer capacity.
- int get(int key): if the key exists in the cache, return its value; otherwise return -1.
- void put(int key, int value): if the key already exists, update its value; if it does not exist, insert the key-value pair into the cache. If the insertion causes the number of keys to exceed capacity, evict the least recently used key.

The functions get and put must each run in O(1) average time complexity.
LRU stands for Least Recently Used. The idea is that recently used data is considered "useful", while data that has not been used for a long time is probably useless, so when the cache is full we evict the data that has gone unused the longest.
Analysis
For the LRU cache's put and get methods to run in O(1) time, the underlying data structure must satisfy the following requirements:

- The elements in the LRU cache must be ordered by time, so that recently used data can be distinguished from long-unused data; when the capacity is full, the longest-unused element is removed to make room.
- We must be able to quickly check whether a particular key already exists and retrieve its corresponding val.
- Every time a key is accessed it becomes the most recently used element, which means the LRU cache must support fast insertion and deletion of elements at arbitrary positions.
So, what data structure satisfies all of these conditions? A hash table offers fast lookup, but its entries have no fixed order; a linked list is ordered and supports fast insertion and deletion, but lookup is slow. Combining the two gives a new data structure: the hash-linked list, LinkedHashMap.
The core data structure of the LRU caching algorithm is the hash-linked list, a combination of a doubly linked list and a hash table. The structure looks like this:
With this structure in hand, let's verify the three conditions above one by one:
- If we always add elements at the tail of the linked list, then elements closer to the tail are the most recently used, and elements closer to the head are the longest unused.
- For a particular key, we can use the hash table to quickly locate the corresponding node in the linked list and obtain its val.
- A linked list obviously supports fast insertion and deletion at any position; we only need to adjust the pointers. A traditional linked list does not allow fast access to an element at a given position by index, but here the hash table lets us map a key directly to any linked-list node and then perform the insertion or deletion.
put method flowchart:
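Before diving into LinkedHashMap internals, here is a minimal sketch of how the LeetCode problem could be solved on top of Java's built-in LinkedHashMap in access-order mode. The class name LRUCache comes from the problem statement; using removeEldestEntry for eviction is one possible approach among several, not the only way to do it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache sketch built on LinkedHashMap in access-order mode.
class LRUCache {
    private final int capacity;
    private final LinkedHashMap<Integer, Integer> cache;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: every get/put moves the accessed entry to the tail,
        // so the head of the internal doubly linked list is always the LRU entry.
        this.cache = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                // Evict the eldest (least recently used) entry once over capacity.
                return size() > LRUCache.this.capacity;
            }
        };
    }

    public int get(int key) {
        Integer value = cache.get(key); // also moves the entry to the tail
        return value == null ? -1 : value;
    }

    public void put(int key, int value) {
        cache.put(key, value);
    }
}
```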
Introduction to LinkedHashMap
LinkedHashSet and LinkedHashMap are essentially the same thing: they share the same implementation in Java, and the former is simply a wrapper around the latter, i.e. there is a LinkedHashMap inside every LinkedHashSet (the adapter pattern).
LinkedHashMap implements the Map interface, which means it allows entries with a null key as well as entries with null values. As the name suggests, this container is a hybrid of a linked list and a HashMap, i.e. it has characteristics of both. A LinkedHashMap can be thought of as a HashMap enhanced with a linked list.
In fact, LinkedHashMap is a direct subclass of HashMap. The only difference between the two is that LinkedHashMap additionally maintains a doubly linked list connecting all of its entries on top of HashMap, which brings the following benefits:
- Elements are guaranteed to be iterated in the same order in which they were inserted. Compared to HashMap, there is an extra header reference pointing to the head of the doubly linked list (a dummy element), and the iteration order of this doubly linked list is the insertion order of the entries.
- Iteration does not need to scan the entire table as in a HashMap; it only needs to traverse the doubly linked list reachable from header. In other words, LinkedHashMap's iteration time depends only on the number of entries, not on the size of the table.
There are two parameters that can affect LinkedHashMap's performance: initial capacity and load factor. The initial capacity specifies the initial table size, and the load factor specifies the threshold for automatic resizing: when the number of entries exceeds capacity * load_factor, the container automatically grows and rehashes. For scenarios where many elements will be inserted, setting a large initial capacity can reduce the number of rehashes. This is the same as in HashMap.
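A short usage sketch of these properties; the keys and values are made up purely for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapDemo {
    public static void main(String[] args) {
        // Initial capacity 16, load factor 0.75f (the same defaults HashMap uses).
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f);
        map.put("banana", 2);
        map.put("apple", 1);
        map.put("cherry", 3);
        // Iteration follows insertion order: banana, apple, cherry.
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```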
Method analysis
get()
get(Object key)
method returns the corresponding value for the specified key. Its lookup process is almost exactly the same as that of HashMap's get() method.
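For reference, this is roughly what the method looks like in JDK 7; treat it as an approximate reproduction rather than an authoritative listing (recordAccess only reorders the entry when the map was created in access-order mode):

```java
public V get(Object key) {
    Entry<K,V> e = (Entry<K,V>) getEntry(key); // same lookup logic as HashMap
    if (e == null)
        return null;
    e.recordAccess(this); // moves e to the tail of the doubly linked list when accessOrder is true
    return e.value;
}
```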
put()
put(K key, V value)
method adds the specified key-value pair to the map. It first looks up the map to see whether the key is already present; if so, it updates the value and returns directly (the lookup process is similar to that of get()). If the key is not found, a new entry is inserted via the addEntry(int hash, K key, V value, int bucketIndex) method.
Note that the insertion has two meanings here:
- From the table's point of view, the new entry needs to be inserted into the corresponding bucket; when there is a hash collision, the new entry is inserted at the head of the collision linked list (head insertion).
- From the header's point of view, the new entry needs to be inserted at the tail of the doubly linked list.
The code for addEntry() is as follows.
// LinkedHashMap.addEntry()
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length); // automatically resize and rehash
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = hash & (table.length - 1); // hash % table.length
    }
    // 1. Insert the new entry at the head of the collision linked list
    HashMap.Entry<K,V> old = table[bucketIndex];
    Entry<K,V> e = new Entry<>(hash, key, value, old);
    table[bucketIndex] = e;
    // 2. Insert the new entry at the tail of the doubly linked list
    e.addBefore(header);
    size++;
}
The above code uses the addBefore() method to insert the new entry e in front of the header reference of the doubly linked list, so that e becomes the last element of the doubly linked list. The code for addBefore() is as follows.
// LinkedHashMap.Entry.addBefore(), insert this in front of existingEntry
private void addBefore(Entry<K,V> existingEntry) {
    after = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}
remove()
The remove(Object key) method deletes the entry corresponding to the given key; the concrete logic is implemented in removeEntryForKey(Object key). The removeEntryForKey() method first finds the entry corresponding to the key and then removes it (adjusting the relevant linked-list references). The lookup process is similar to the get() method.
Note that the deletion also has two meanings here:
- From the table's point of view, the entry needs to be removed from the corresponding bucket; if the collision linked list is not empty, the references within it need to be adjusted accordingly.
- From the header's point of view, the entry needs to be removed from the doubly linked list, and the references of the preceding and following elements need to be adjusted accordingly.
The corresponding code for removeEntryForKey() is as follows.
// LinkedHashMap.removeEntryForKey(), remove the entry corresponding to key
final Entry<K,V> removeEntryForKey(Object key) {
    ......
    int hash = (key == null) ? 0 : hash(key);
    int i = indexFor(hash, table.length); // hash & (table.length - 1)
    Entry<K,V> prev = table[i]; // get the collision linked list
    Entry<K,V> e = prev;
    while (e != null) { // traverse the collision linked list
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) { // found the entry to remove
            modCount++; size--;
            // 1. Remove e from the collision linked list of the corresponding bucket
            if (prev == e) table[i] = next;
            else prev.next = next;
            // 2. Remove e from the doubly linked list
            e.before.after = e.after;
            e.after.before = e.before;
            return e;
        }
        prev = e; e = next;
    }
    return e;
}
LinkedHashSet
LinkedHashSet is a simple wrapper around LinkedHashMap; calls to LinkedHashSet methods are delegated to the appropriate LinkedHashMap methods, so the implementation of LinkedHashSet is very simple:
public class LinkedHashSet<E>
    extends HashSet<E>
    implements Set<E>, Cloneable, java.io.Serializable {
    ......
    // There is a LinkedHashMap inside LinkedHashSet
    public LinkedHashSet(int initialCapacity, float loadFactor) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }
    ......
    public boolean add(E e) { // simple method delegation
        return map.put(e, PRESENT) == null;
    }
    ......
}
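A brief usage example of the resulting behavior, with made-up element values:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LinkedHashSetDemo {
    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("banana");
        set.add("apple");
        set.add("cherry");
        set.add("banana"); // duplicate, ignored
        // Unlike HashSet, iteration preserves insertion order: banana, apple, cherry.
        System.out.println(set); // [banana, apple, cherry]
    }
}
```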
About the Author.
From the exploration and practice of front-line programmer Seven, continuously learning and iterating~
This article is included in my personal blog: https://
WeChat official account: seven97, welcome to follow~