JUC-并发容器-0-HashMap

述

并发编程中,并发容器也是非常重要的一个工具,我们平时用的 ArrayList, HashMap 等等这些集合类都不是线程安全的,在并发环境下就可能会出现问题,JDK给我们提供了一些线程安全的容器,供我们使用

过时的并发容器

在早期的版本中, Hashtable 和 Vector 就对应的是线程安全的 HashMap 和 ArrayList

这两个线程安全类,本质上就是给方法加上 synchronized 锁,把方法都变成同步方法,这种方式的性能是很差的

后来又可以使用 Collections.synchronizedList(new ArrayList<>());, Collections.synchronizedMap(new HashMap<K, V>()); 这种方法把一个普通集合转成并发容器,这种方式,这种方式的原理其实也是用 synchronized 锁,区别是这种方式是锁的一部分代码块,上面的是锁的整个方法, 性能上有些许提升,但也还是很差

所以后来就出现了 ConcurrentHashMap 和 CopyOnWriteArrayList 这样的性能较为优秀的并发容器,这两个也是本文的重点

为什么HashMap不是线程安全的

同时put碰撞导致数据丢失,HashMap 使用的是链表的数据结构, 如果多个线程同时put操作,两个key的hash值相同的话,他们两个会落到同一个地方,最终结果肯定是只有一个线程put成功了,另一个的数据就丢失了
扩容导致数据丢失,多个线程往里面put数据,当容量不够用的时候,多个线程同时去扩容,最终也是只有一个线程扩容成功,那么另一个线程的数据就会丢失
可能会死循环造成CPU100%, 具体原因可以参考这里

HashMap数据结构

在了解 ConcurrentHashMap 之前,首先需要了解一下 HashMap

HashMap 底层是通过数组+链表组成的,在 jdk1.7和jdk1.8中的实现有些区别

JDK1.7中的 HashMap

jdk1.7 中 HashMap 的数据结构是这样的

这里就是一个数组+链表的结构,进入1.7中 HashMap 的源码看一下

这里是 HashMap 的几个重点的属性,这里重点看一下负载因子, HashMap 中默认的容量是16,负载因子是0.75,Map在使用的时候不断往里放数据,当容量达到了 16 * 0.75 = 12,就需要将当前的16的容量进行扩容,而扩容这个过程涉及到了 rehash,复制数据等操作,非常耗费性能,所以阿里巴巴开发规范中,推荐尽可能在Map初始化的时候就给定容量

然后这里具体存放数据的就是 Entry 看一下他的源码

这里的 next 就是用于实现链表的数据结构

知道了基本的数据结构后,再来看一下里面的几个重要方法

首先是 put 方法,代码如下

public V put(K key, V value) {
    // 如果 key 为空,则 put 一个空值进去
    if (key == null)
        return putForNullKey(value);
    // 根据 key 计算出 hashcode
    int hash = hash(key.hashCode());
    // 根据计算出的 hashcode 定位出所在桶
    int i = indexFor(hash, table.length);
    // 如果桶是一个链表则需要遍历判断里面的 hashcode、key 是否和传入 key 相等,如果相等则进行覆盖,并返回原来的值
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    
    // 如果桶是空的,说明当前位置没有数据存入,新增一个 Entry 对象写入当前位置
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

这里再看一下 addEntry 这个方法

void addEntry(int hash, K key, V value, int bucketIndex) {
    // 判断是否需要扩容
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // 如果需要扩容,就 *2 
        resize(2 * table.length);
        // 将当前的key重新计算hash
        hash = (null != key) ? hash(key) : 0;
        // 定位
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

// 将当前位置的桶传入到新建的桶中，如果当前桶有值就会在位置形成链表
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

再来看看 get() 方法

public V get(Object key) {
    // 为空就取空key的返回
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // 根据key计算hash值
    int hash = (key == null) ? 0 : hash(key);
    // 判断是否为链表.不是链表就根据 key、key 的 hashcode 是否相等来返回值,链表则需要遍历直到 key 及 hashcode 相等时候就返回值
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

到这里 1.7 的 HashMap 就介绍完了,这个数据结构有一个需要优化的地方,就是当 hash 冲突比较严重的时候,链表就会越来越长,这样查询的效率也会越来越低的

所以再来看看 1.8 中的 HashMap 是如何处理的

JDK1.8中的 HashMap

首先看一下数据结构,如图

相对于1.7的改变就是加了红黑树,当链表太长,达到设置的阈值的时候,就会将整个链表转成红黑树

下面来看看1.8中的几个重点方法,首先是 put() 方法,代码如下

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 判断当前桶是否为空,
    if ((tab = table) == null || (n = tab.length) == 0)
        // 空的话 resize() 初始化
        n = (tab = resize()).length; 
    // 根据当前 key 的 hashcode 定位到具体的桶中并判断是否为空
    if ((p = tab[i = (n - 1) & hash]) == null)
        // 为空表明没有 Hash 冲突就直接在当前位置创建一个新桶即可
        tab[i] = newNode(hash, key, value, null);
    else {
        // 以下就是hash冲突的情况  
        Node<K,V> e; K k;
        // 比较当前桶中的 key、key 的 hashcode 与写入的 key 是否相等，相等就赋值给 e,在后面统一返回
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 如果当前桶是红黑树
        else if (p instanceof TreeNode)
            // 按红黑树的方式写入数据
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // 如果是个链表,就需要将当前的 key、value 封装成一个新节点写入到当前桶的后面(形成链表)
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // 这里判断链表的大小是否大于转红黑树的阈值
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // 达到转红黑树的阈值的时候,就把整个链表都转成红黑树
                        treeifyBin(tab, hash);
                    break;
                }
                // 如果在遍历过程中找到 key 相同时直接退出遍历
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 如果 e != null 就相当于存在相同的 key,那就需要将值覆盖
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // 最后判断是否需要进行扩容
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

各种赋值比较可能有点绕,多看几次, 下面再看看 get() 方法

public V get(Object key) {
    Node<K,V> e;
    // 根据hash值定位桶,如果桶为空直接返回
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // 首先判断桶的第一个位置(有可能是链表、红黑树)的key,是否为查询的key,如果是的话直接返回
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // 不是的话
        if ((e = first.next) != null) {
            // 判断是不是红黑树,如果是,就按红黑树的方法查询
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 不是红黑树就是链表,按链表的方式循环查找  
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

1.8 中转为红黑树的数据结构,提高了查询效率,但是在并发环境下的问题还是存在的,因此就诞生了 ConcurrentHashMap 专用于并发环境下的 HashMap

总结

本文主要了解 HashMap 的数据结构,以及一个进化的过程,为了解 ConcurrentHashMap 打基础