RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

Fri Mar 2 00:35:00 UTC 2018

Hi Sherman,

On 24/02/2018 11:26 AM, Xueming Shen wrote:
> Hi,
> 
> Please help review the proposed change to remove the potential performance
> bottleneck in CoderResult caused by the "over" synchronization the cache
> mechanism.
> 
> issue: https://bugs.openjdk.java.net/browse/JDK-8187653
> webrev: http://cr.openjdk.java.net/~sherman/8187653/webrev
> 
> Notes:
> While the original test case (new String/String.getBytes() with UTF8 as 
> showed in the
> stacktrace) described in the bug report might no longer be reproducible 
> in jdk9, as
> we have been optimizing lots of String related  char[]/byte[] coding 
> path away from
> the paths that use CoderResult. But it is still a concern/issue for the 
> "general"
> CharsetDe/Encoder.de/encode() operation, in which the malformed or 
> unmappable
> CoderResult object is returned.
> 
> As showed in the "[synchronized]" section in bm.scores [1], which is 
> from the simple
> jmh benchmark test CoderResultBM.java [2], the sores are getting 
> significant worse
> when the number of concurrent de/encoding threads gets bigger.
> 
> It appears the easy fix is to replace the sync mechanism from "method 
> synchronized
> + HashMap" to  "ConcurrentHashMap" solves the problem, as showed in the 
> same bm
> result [1] in [ConcurrentHashMap] section, because most of the accesses 
> to the caching
> HashMap is read operation. The ConcurrentHahsMap's "almost non-block for 
> retrieval
> operation" really helps here.

When you replace synchronized code with concurrent data structures you 
introduce race conditions that are precluded in the synchronized code. 
These need to be examined carefully to ensure they are safe. For 
example, whenever you replace a HashMap with a ConcurrentHashMap you 
need to see if put() needs to be replaced by putIfAbsent().

David
-----

> There is another fact that might help us optimize further here. For most 
> of our charsets
> in JDK repository (with couple exceptions), the length of malformed and 
> unmappable
> CoderResult that these charsets might trigger actually is never longer 
> than 4. So we might
> not need to access the ConcurrentHashMap cache at all in normal use 
> scenario. I'm putting
> a CoderResult[4] on top the hashmap cache in proposed webrev. It does 
> not improve the
> performance significantly, but when the thread number gets bigger, a 
> 10%+ improvement
> is observed. So I would assume it might be something worth doing, given 
> its cost is really
> low.
> 
> Thanks,
> Sherman
> 
> 
> [1] http://cr.openjdk.java.net/~sherman/8187653/bm.scores
> [2] http://cr.openjdk.java.net/~sherman/8187653/CoderResultBM.java
> [3] http://cr.openjdk.java.net/~sherman/8187653/bm.scores.ms932
>