Preface
Let's start with a piece of code. When using a Map, you might write something like this:
    Map<String, Value> map;
    // ...
    Value result = map.get(key);
    if (null == result) {
        result = this.calculateValue(key);
        map.put(key, result);
    }
    return result;
Since Java 8, java.util.Map has a method called computeIfAbsent that simplifies the code above:
    Map<String, Value> map;
    // ...
    return map.computeIfAbsent(key, this::calculateValue);
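Besides being more concise, this form has another benefit when the map is a java.util.concurrent.ConcurrentHashMap: under concurrent calls, it guarantees that calculateValue is not invoked more than once for the same key, because the whole operation is atomic.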
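The atomicity claim can be checked with a small experiment. The sketch below is only an illustration: the class name ComputeIfAbsentAtomicityDemo and the method countMappingCalls are made up for this example and are not part of ShardingSphere or the JDK. It starts several threads that all call computeIfAbsent on the same absent key and counts how many times the mapping function actually runs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not taken from the original article.
class ComputeIfAbsentAtomicityDemo {

    /**
     * Lets `threads` workers call computeIfAbsent on the same key at roughly
     * the same time and returns how often the mapping function was invoked.
     */
    static int countMappingCalls(int threads) {
        Map<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    // Hold every worker until all of them have been submitted.
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("key", k -> {
                    calls.incrementAndGet();
                    return "value-for-" + k;
                });
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("mapping function calls: " + countMappingCalls(16));
    }
}
```

With a ConcurrentHashMap the count should be 1 regardless of the number of threads. Replacing the map with a plain HashMap would remove that guarantee, and would also be unsafe.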
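However, while stress testing Apache ShardingSphere-Proxy some time ago, I ran into a problem: when BenchmarkSQL connected to ShardingSphere Proxy with a large number of Terminals, the execution latency of one very simple insert SQL increased considerably. With the help of async-profiler, I found that computeIfAbsent on the Java 8 ConcurrentHashMap has a performance pitfall.
Readers unfamiliar with Apache ShardingSphere can refer to https://github.com/apache/shardingsphere
Investigation
The symptom during the stress test was that the higher the BenchmarkSQL concurrency (Terminals), the longer a simple, repeatedly executed insert SQL in the New Order transaction took. Yet the CPU of the machine running ShardingSphere-Proxy was not saturated, so I suspected a bottleneck in the Proxy code itself and used async-profiler to sample the Proxy JVM under load.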
    ./profiler.sh -e lock --lock 1ms -d 180 -o jfr -f output.jfr $PID
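For more about async-profiler, see https://github.com/jvm-profiling-tools/async-profiler. I am also considering writing some articles about it later.
Opening the resulting JFR file in IntelliJ IDEA showed more than three million Java Monitor Blocked events!
Following the stack traces led to this ShardingSphere code that uses computeIfAbsent. The following is an excerpt: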
    // ...
    private static final Map<String, SQLExecutionUnitBuilder> TYPE_TO_BUILDER_MAP = new ConcurrentHashMap<>(8, 1);
    // ...
    public DriverExecutionPrepareEngine(final String type, final int maxConnectionsSizePerQuery, final ExecutorDriverManager<C, ?, ?> executorDriverManager,
                                        final StorageResourceOption option, final Collection<ShardingSphereRule> rules) {
        super(maxConnectionsSizePerQuery, rules);
        this.executorDriverManager = executorDriverManager;
        this.option = option;
        sqlExecutionUnitBuilder = TYPE_TO_BUILDER_MAP.computeIfAbsent(type,
                key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
    }
    // ...
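This code runs every time the Proxy interacts with the database, so every CRUD operation executed through the Proxy passes through it. Moreover, type currently has only 2 possible values, JDBC.STATEMENT and JDBC.PREPARED_STATEMENT, so under high concurrency a large number of threads call computeIfAbsent with the same key.
My assumption was that when the key already exists, computeIfAbsent does not modify anything and can simply return the value as get would. But is that what actually happens?
Let's look at the implementation of computeIfAbsent (the JDK is Oracle 8u311). The code is an excerpt with some comments added: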
    public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
        if (key == null || mappingFunction == null)
            throw new NullPointerException();
        int h = spread(key.hashCode());
        V val = null;
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                // Initialize the map
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & h)) == null) {
                // The key is absent and the bin for this hash is still empty
                Node<K,V> r = new ReservationNode<K,V>();
                synchronized (r) {
                    // Initialize the bin for this hash, insert the key-value pair, etc.
                }
            }
            else if ((fh = f.hash) == MOVED)
                // The map is busy resizing
                tab = helpTransfer(tab, f);
            else {
                // The bin for this hash already holds a linked list or a red-black tree
                boolean added = false;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            // Search the linked list for the key
                        }
                        else if (f instanceof TreeBin) {
                            // Search the red-black tree for the key
                        }
                    }
                }
                // Some code omitted
            }
        }
        // Some code omitted
        return val;
    }
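From my reading of the source, even when the key exists, computeIfAbsent always enters the synchronized block while looking it up.
Doesn't that hurt performance compared with the lock-free get of ConcurrentHashMap? A quick Google search on the topic turned up the following: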
https://bugs.openjdk.java.net/browse/JDK-8161372
This issue was reported long ago and was fixed in JDK 9. As of this writing, JDK 17 has already been officially released.
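For readers who want a rough feel for this contention without a profiler, the following sketch counts how often worker threads were blocked on a monitor while repeatedly calling computeIfAbsent on a key that already exists. It is an assumption-laden illustration rather than a benchmark: the class SameKeyContentionProbe and the method countBlockedEvents are hypothetical names, and the count comes from the standard ThreadMXBean API, not from async-profiler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical probe class, not taken from the original article.
class SameKeyContentionProbe {

    /**
     * Starts `threads` workers that each call computeIfAbsent `callsPerThread`
     * times on an existing key, and returns the total number of times the
     * workers were blocked waiting for a monitor.
     */
    static long countBlockedEvents(int threads, int callsPerThread) {
        Map<String, Object> map = new ConcurrentHashMap<>();
        map.put("key", new Object());
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        AtomicLong blocked = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < callsPerThread; n++) {
                    // The key is always present, so the mapping function never runs.
                    map.computeIfAbsent("key", k -> new Object());
                }
                // Read this thread's own counter while the thread is still alive.
                long self = Thread.currentThread().getId();
                blocked.addAndGet(threadBean.getThreadInfo(self).getBlockedCount());
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return blocked.get();
    }

    public static void main(String[] args) {
        System.out.println("blocked events: " + countBlockedEvents(16, 1_000_000));
    }
}
```

The absolute number depends on the JDK build, the hardware, and scheduling, so no particular output is claimed here. The expectation, based on the source excerpt above, is a large count on JDK 8 and a count close to zero on JDK 9 or later.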
Solution
Since JDK 8 is still widely used, it is worth working around the problem described above. This led to the following fix: https://github.com/apache/shardingsphere/pull/13275/files
    SQLExecutionUnitBuilder result;
    if (null == (result = TYPE_TO_BUILDER_MAP.get(type))) {
        result = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
    }
    return result;
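Each time a value is read from the Map, get is used to check first, and computeIfAbsent is called only when the value is absent. Because computeIfAbsent on ConcurrentHashMap is atomic, there is no need to add synchronized or implement double-checked locking yourself.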
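If the same workaround is needed in several places, it can be wrapped in a small helper. The following is a minimal sketch under that assumption. The class ConcurrentMaps and the method getOrCompute are hypothetical names chosen for this example; they do not exist in the JDK or in ShardingSphere.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical utility class, not taken from the original article.
final class ConcurrentMaps {

    private ConcurrentMaps() {
    }

    /**
     * Returns the value for `key`, trying a lock-free get first and falling
     * back to computeIfAbsent only when the key is not present yet.
     */
    static <K, V> V getOrCompute(ConcurrentMap<K, V> map, K key, Function<? super K, ? extends V> mappingFunction) {
        V existing = map.get(key);
        return existing != null ? existing : map.computeIfAbsent(key, mappingFunction);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> lengths = new ConcurrentHashMap<>();
        // The first call computes and stores the value, the second call only reads it.
        System.out.println(getOrCompute(lengths, "hello", String::length));
        System.out.println(getOrCompute(lengths, "hello", String::length));
    }
}
```

The fallback to computeIfAbsent keeps the at-most-once guarantee for the mapping function, so two threads that both miss on get still end up with the same value.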
Problem solved.
Appendix: JMH Benchmark
Test environment
Test code
    package icu.wwj.jmh.dangling;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Threads;
    import org.openjdk.jmh.annotations.Warmup;

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    @Fork(3)
    @Warmup(iterations = 3, time = 5)
    @Measurement(iterations = 3, time = 5)
    @Threads(16)
    @State(Scope.Benchmark)
    public class ConcurrentHashMapBenchmark {

        private static final String KEY = "key";

        private static final Object VALUE = new Object();

        private final Map<String, Object> concurrentMap = new ConcurrentHashMap<>(1, 1);

        @Setup(Level.Iteration)
        public void setup() {
            concurrentMap.clear();
        }

        @Benchmark
        public Object benchGetBeforeComputeIfAbsent() {
            Object result = concurrentMap.get(KEY);
            if (null == result) {
                result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
            }
            return result;
        }

        @Benchmark
        public Object benchComputeIfAbsent() {
            return concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
        }
    }
JDK 8 results
    # JMH version: 1.33
    # VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
    # VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
    # VM options: -Dvisualvm.id=172855224679674 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=38763:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
    # Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
    # Warmup: 3 iterations, 5 s each
    # Measurement: 3 iterations, 5 s each
    # Timeout: 10 min per iteration
    # Threads: 16 threads, will synchronize iterations
    # Benchmark mode: Throughput, ops/time
    # Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent

    # Run progress: 0.00% complete, ETA 00:03:00
    # Fork: 1 of 3
    # Warmup Iteration   1: 11173878.242 ops/s
    # Warmup Iteration   2: 8471364.065 ops/s
    # Warmup Iteration   3: 8766401.960 ops/s
    Iteration   1: 8776260.796 ops/s
    Iteration   2: 8632907.974 ops/s
    Iteration   3: 8557264.788 ops/s

    # Run progress: 16.67% complete, ETA 00:02:33
    # Fork: 2 of 3
    # Warmup Iteration   1: 7757506.431 ops/s
    # Warmup Iteration   2: 8176991.807 ops/s
    # Warmup Iteration   3: 8795107.589 ops/s
    Iteration   1: 8668883.337 ops/s
    Iteration   2: 8866318.073 ops/s
    Iteration   3: 8848517.540 ops/s

    # Run progress: 33.33% complete, ETA 00:02:02
    # Fork: 3 of 3
    # Warmup Iteration   1: 8154698.571 ops/s
    # Warmup Iteration   2: 8317945.491 ops/s
    # Warmup Iteration   3: 8884286.732 ops/s
    Iteration   1: 8912555.062 ops/s
    Iteration   2: 8894750.001 ops/s
    Iteration   3: 8780504.227 ops/s

    Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent":
      8770884.644 ±(99.9%) 210678.797 ops/s [Average]
      (min, avg, max) = (8557264.788, 8770884.644, 8912555.062), stdev = 125371.573
      CI (99.9%): [8560205.847, 8981563.442] (assumes normal distribution)

    # JMH version: 1.33
    # VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
    # VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
    # VM options: -Dvisualvm.id=172855224679674 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=38763:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
    # Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
    # Warmup: 3 iterations, 5 s each
    # Measurement: 3 iterations, 5 s each
    # Timeout: 10 min per iteration
    # Threads: 16 threads, will synchronize iterations
    # Benchmark mode: Throughput, ops/time
    # Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

    # Run progress: 50.00% complete, ETA 00:01:31
    # Fork: 1 of 3
    # Warmup Iteration   1: 1881091972.510 ops/s
    # Warmup Iteration   2: 1843432746.197 ops/s
    # Warmup Iteration   3: 2353506882.860 ops/s
    Iteration   1: 2389458285.091 ops/s
    Iteration   2: 2391001171.657 ops/s
    Iteration   3: 2387181602.010 ops/s

    # Run progress: 66.67% complete, ETA 00:01:01
    # Fork: 2 of 3
    # Warmup Iteration   1: 1872514017.315 ops/s
    # Warmup Iteration   2: 1855584197.510 ops/s
    # Warmup Iteration   3: 2342392977.207 ops/s
    Iteration   1: 2378551289.692 ops/s
    Iteration   2: 2374081014.168 ops/s
    Iteration   3: 2389909613.865 ops/s

    # Run progress: 83.33% complete, ETA 00:00:30
    # Fork: 3 of 3
    # Warmup Iteration   1: 1880210774.729 ops/s
    # Warmup Iteration   2: 1804266170.900 ops/s
    # Warmup Iteration   3: 2337740394.373 ops/s
    Iteration   1: 2363741084.192 ops/s
    Iteration   2: 2372565304.724 ops/s
    Iteration   3: 2388015878.515 ops/s

    Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
      2381611693.768 ±(99.9%) 16356182.057 ops/s [Average]
      (min, avg, max) = (2363741084.192, 2381611693.768, 2391001171.657), stdev = 9733301.586
      CI (99.9%): [2365255511.711, 2397967875.825] (assumes normal distribution)

    # Run complete. Total time: 00:03:03

    REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
    why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
    experiments, perform baseline and negative tests that provide experimental control, make sure
    the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
    Do not assume the numbers tell you what you want them to tell.

    Benchmark                                                  Mode  Cnt           Score          Error  Units
    ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9     8770884.644 ±   210678.797  ops/s
    ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2381611693.768 ± 16356182.057  ops/s
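As the results show, the two approaches differ by more than two orders of magnitude, roughly 270 times: calling computeIfAbsent directly reaches millions of operations per second (about 8.8 million), while checking with get first reaches billions of operations per second (about 2.4 billion). And this is with only 16 threads.
In terms of resources, CPU utilization stayed at around 20% throughout the benchComputeIfAbsent run, whereas it stayed at 100% throughout the benchGetBeforeComputeIfAbsent run.
JDK 17 results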
    # JMH version: 1.33
    # VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
    # VM invoker: /usr/local/java/jdk-17.0.1/bin/java
    # VM options: -Dvisualvm.id=173221627574053 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=33189:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
    # Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
    # Warmup: 3 iterations, 5 s each
    # Measurement: 3 iterations, 5 s each
    # Timeout: 10 min per iteration
    # Threads: 16 threads, will synchronize iterations
    # Benchmark mode: Throughput, ops/time
    # Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent

    # Run progress: 0.00% complete, ETA 00:03:00
    # Fork: 1 of 3
    # Warmup Iteration   1: 1544327446.565 ops/s
    # Warmup Iteration   2: 1475077923.449 ops/s
    # Warmup Iteration   3: 1565544222.606 ops/s
    Iteration   1: 1564346089.698 ops/s
    Iteration   2: 1560062375.891 ops/s
    Iteration   3: 1552569020.412 ops/s

    # Run progress: 16.67% complete, ETA 00:02:33
    # Fork: 2 of 3
    # Warmup Iteration   1: 1617143507.004 ops/s
    # Warmup Iteration   2: 1433136907.916 ops/s
    # Warmup Iteration   3: 1527623176.866 ops/s
    Iteration   1: 1522331660.180 ops/s
    Iteration   2: 1524798683.186 ops/s
    Iteration   3: 1522686827.744 ops/s

    # Run progress: 33.33% complete, ETA 00:02:02
    # Fork: 3 of 3
    # Warmup Iteration   1: 1671732222.173 ops/s
    # Warmup Iteration   2: 1462966231.429 ops/s
    # Warmup Iteration   3: 1553792663.545 ops/s
    Iteration   1: 1549840468.944 ops/s
    Iteration   2: 1549245571.349 ops/s
    Iteration   3: 1554801575.735 ops/s

    Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent":
      1544520252.571 ±(99.9%) 27953594.118 ops/s [Average]
      (min, avg, max) = (1522331660.180, 1544520252.571, 1564346089.698), stdev = 16634735.479
      CI (99.9%): [1516566658.453, 1572473846.689] (assumes normal distribution)

    # JMH version: 1.33
    # VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
    # VM invoker: /usr/local/java/jdk-17.0.1/bin/java
    # VM options: -Dvisualvm.id=173221627574053 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=33189:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
    # Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
    # Warmup: 3 iterations, 5 s each
    # Measurement: 3 iterations, 5 s each
    # Timeout: 10 min per iteration
    # Threads: 16 threads, will synchronize iterations
    # Benchmark mode: Throughput, ops/time
    # Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

    # Run progress: 50.00% complete, ETA 00:01:31
    # Fork: 1 of 3
    # Warmup Iteration   1: 1813078468.960 ops/s
    # Warmup Iteration   2: 1944438216.902 ops/s
    # Warmup Iteration   3: 2232703681.960 ops/s
    Iteration   1: 2233727123.664 ops/s
    Iteration   2: 2233657163.983 ops/s
    Iteration   3: 2229008772.953 ops/s

    # Run progress: 66.67% complete, ETA 00:01:01
    # Fork: 2 of 3
    # Warmup Iteration   1: 1767187585.805 ops/s
    # Warmup Iteration   2: 1900420998.518 ops/s
    # Warmup Iteration   3: 2175122268.840 ops/s
    Iteration   1: 2180409680.029 ops/s
    Iteration   2: 2181398523.091 ops/s
    Iteration   3: 2176454597.329 ops/s

    # Run progress: 83.33% complete, ETA 00:00:30
    # Fork: 3 of 3
    # Warmup Iteration   1: 1822355551.990 ops/s
    # Warmup Iteration   2: 1832618832.110 ops/s
    # Warmup Iteration   3: 2225265888.631 ops/s
    Iteration   1: 2240765668.888 ops/s
    Iteration   2: 2225847700.599 ops/s
    Iteration   3: 2232257415.965 ops/s

    Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
      2214836294.056 ±(99.9%) 45190341.578 ops/s [Average]
      (min, avg, max) = (2176454597.329, 2214836294.056, 2240765668.888), stdev = 26892047.412
      CI (99.9%): [2169645952.478, 2260026635.633] (assumes normal distribution)

    # Run complete. Total time: 00:03:03

    REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
    why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
    experiments, perform baseline and negative tests that provide experimental control, make sure
    the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
    Do not assume the numbers tell you what you want them to tell.

    Benchmark                                                  Mode  Cnt           Score          Error  Units
    ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9  1544520252.571 ± 27953594.118  ops/s
    ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2214836294.056 ± 45190341.578  ops/s
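On JDK 17, computeIfAbsent is still somewhat slower than calling get first (about 1.5 billion versus 2.2 billion operations per second), but the two are at least in the same order of magnitude. In addition, the CPU was fully loaded during both benchmarks.
Summary
If you use ConcurrentHashMap on Java 8, pay close attention to whether computeIfAbsent may be called concurrently with the same key. If it may, try get first: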
    Object result = concurrentMap.get(KEY);
    if (null == result) {
        result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
    }
    return result;
Or simply upgrade to Java 11 or Java 17.