[10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck
Peter Levart
peter.levart at gmail.com
Mon Dec 4 15:52:34 UTC 2017
Hi Ivan,
On 11/23/2017 01:16 AM, Ivan Gerasimov wrote:
> Hi Peter!
>
> Thank you very much for looking into this!
>
>
> On 11/22/17 1:45 AM, Peter Levart wrote:
>> Hi Ivan,
>>
>> Here's my attempt to increase multithreaded scalability of Cache:
>>
>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.01/
>>
>>
>> Haven't tested this yet, but I thought that since you already have
>> relevant performance tests, you might want to try this, so I decided
>> to release it as is.
> I plugged your implementation into the benchmark I had [*] and got
> these numbers for the throughput:
>
> Original MemoryCache: 4,116,537 puts/second
> Your variant: 2,942,561 puts/second
>
> So, in the tested scenario, performance degraded by 28%.
Hm. What kind of benchmark did you have? Can you share the source? Did it
do multi-threaded concurrent puts or single-threaded ones? I'm surprised by
the results, since I did some benchmarking myself and got entirely
different results. Here's what I did:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/bench/
You can download the whole directory and just set up the JMH project over
it. The main class is CacheBench. It uses a typical caching scenario:
@Threads(4)
@Benchmark
public Object original() {
    Integer key = ThreadLocalRandom.current().nextInt(keyspan);
    Object val = origCache.get(key);
    if (val == null) {
        origCache.put(key, val = Boolean.TRUE);
    }
    return val;
}
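Roughly, the benchmark state behind that method looks like the sketch below
(simplified; the parameter wiring and the keyspan derivation here are only
illustrative - the exact code is in the bench directory linked above):

    @Param({"0", "5", "50", "95", "100"})
    int hitratio;                 // target cache hit ratio in percent

    @Param({"1000"})
    int size;                     // cache capacity

    @Param({"86400"})
    int timeout;                  // entry lifetime in seconds

    int keyspan;                  // number of distinct keys the RNG produces
    Cache<Integer, Object> origCache;

    @Setup
    public void setup() {
        // Illustrative derivation: with `size` cached entries, a key space of
        // size * 100 / hitratio distinct keys gives roughly the requested hit
        // ratio once the cache is warm.
        keyspan = (hitratio == 0) ? Integer.MAX_VALUE : size * 100 / hitratio;
        origCache = Cache.newSoftMemoryCache(size, timeout);
    }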
The cache hit-ratio is varied by adjusting the cache size and/or the range
of distinct key values produced by the RNG. With this setup I get the
following results:
Benchmark             (hitratio)  (size)  (timeout)  Mode  Cnt      Score     Error  Units
CacheBench.original            0    1000      86400  avgt   10  18023,156 ± 434,710  ns/op
CacheBench.original            5    1000      86400  avgt   10  17645,688 ± 767,696  ns/op
CacheBench.original           50    1000      86400  avgt   10  10412,352 ± 216,208  ns/op
CacheBench.original           95    1000      86400  avgt   10   2275,257 ± 449,302  ns/op
CacheBench.original          100    1000      86400  avgt   10    542,321 ±  15,133  ns/op
CacheBench.patched1            0    1000      86400  avgt   10   1685,499 ±  24,257  ns/op
CacheBench.patched1            5    1000      86400  avgt   10   1633,203 ±  15,045  ns/op
CacheBench.patched1           50    1000      86400  avgt   10   1061,596 ±  12,312  ns/op
CacheBench.patched1           95    1000      86400  avgt   10    125,456 ±   3,426  ns/op
CacheBench.patched1          100    1000      86400  avgt   10     47,166 ±   3,140  ns/op
patched1 is mostly what I had in my proposed code, but with a fix in the
put() method, which could otherwise install an entry into the cacheMap after
some other thread had already removed it from the DL-list. The race is
prevented by first installing a unique reservation value into the cacheMap,
then adding the entry to the DL-list, and afterwards trying to replace the
reservation value with the entry, cleaning up if that replace is not
successful.
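In isolation the scheme looks roughly like the sketch below (simplified and
with made-up names; unlike the real patch, it removes list entries with the
O(n) ConcurrentLinkedDeque.remove() instead of an O(1) node handle):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedDeque;

    // Simplified illustration of the reservation scheme described above.
    final class ReservationCache<K, V> {
        private static final class Reservation {}   // unique placeholder values

        private final ConcurrentHashMap<K, Object> cacheMap = new ConcurrentHashMap<>();
        private final ConcurrentLinkedDeque<V> list = new ConcurrentLinkedDeque<>();

        void put(K key, V value) {
            Object reservation = new Reservation();
            cacheMap.put(key, reservation);          // 1) reserve the key
            list.addLast(value);                     // 2) link into the list
            // 3) publish: succeeds only if our reservation is still mapped,
            //    i.e. no other thread removed or replaced it in the meantime
            if (!cacheMap.replace(key, reservation, value)) {
                list.remove(value);                  // clean up the list link
            }
        }

        @SuppressWarnings("unchecked")
        V get(K key) {
            Object v = cacheMap.get(key);
            // A reservation is not a published value yet, so report a miss
            return (v == null || v instanceof Reservation) ? null : (V) v;
        }
    }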
If this is not enough, I also tried an approach with a lock-free DL-list.
Basically a modified ConcurrentLinkedDeque where intermediary nodes may be
removed in constant time (important for fast removal of elements that become
softly reachable). This variant is even more scalable than the lock-based
DL-list:
CacheBench.patched2            0    1000      86400  avgt   10    774,356 ±   4,113  ns/op
CacheBench.patched2            5    1000      86400  avgt   10    754,222 ±   2,956  ns/op
CacheBench.patched2           50    1000      86400  avgt   10    434,969 ±   2,359  ns/op
CacheBench.patched2           95    1000      86400  avgt   10     98,352 ±   0,750  ns/op
CacheBench.patched2          100    1000      86400  avgt   10     47,692 ±   0,995  ns/op
Here's what such a MemoryCache looks like in the JDK:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.02/
I took the ConcurrentLinkedDeque by Doug Lea and Martin Buchholz and
removed all public methods, keeping just the delicate internal mechanics.
Node is an interface and can be implemented by different classes; BasicNode
and SoftReferenceNode are provided (and are subclassed in MemoryCache). The
public methods are tailored to the Cache use case and deal with Node<E>(s)
rather than E(lement)s.
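The shape of that abstraction is roughly the following (a simplified sketch,
not the actual webrev.02 code; the prev/next wiring is managed by the deque
internals and omitted here):

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.SoftReference;

    // Rough sketch of the Node abstraction described above.
    interface Node<E> {
        E element();                        // null once cleared or unlinked
    }

    class BasicNode<E> implements Node<E> {
        private final E element;            // strongly referenced element
        BasicNode(E element) { this.element = element; }
        @Override public E element() { return element; }
    }

    class SoftReferenceNode<E> extends SoftReference<E> implements Node<E> {
        SoftReferenceNode(E element, ReferenceQueue<? super E> queue) {
            super(element, queue);          // enqueued when the GC clears it
        }
        @Override public E element() { return get(); }   // null after clearing
    }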
>
> Still, I think it's important to try to improve performance of the
> MemoryCache and, if possible, remove the points of contention.
So what do you think of the benchmark results?
>
>>
>> Rough sketch of changes:
>> - replaced LinkedHashMap with ConcurrentHashMap
>> - CacheEntry(s) are additionally linked into a double-linked list.
>> DL-list management is the only synchronization point (similar to the
>> Cleaner API) and the rule is: first a new entry is linked into the
>> DL-list, then it is put into the map - published. The same with
>> removing: first an entry is unlinked from the DL-list, then, if
>> successful, removed from the map and invalidated.
>> - changed the way expiry is handled so that the whole list never
>> needs to be scanned.
>>
>> The code speaks for itself.
>>
> Your implementation is somewhat similar to what I had tried before
> coming up with the proposal for an option to turn the cache off.
> I used ConcurrentHashMap + ConcurrentLinkedQueue to maintain the FIFO
> order, but the performance was still a few percent less than that of
> the original implementation.
Removing an intermediary element from ConcurrentLinkedQueue takes O(n)
time. This is needed in emptyQueue() for all softly-reachable-and-cleared
entries and in remove() when the application explicitly removes an entry
from the cache (for example on SSL session termination). We need a
double-linked list (like ConcurrentLinkedDeque) with constant-time removal
of intermediary node(s).
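To illustrate the difference, here is a toy sketch (synchronized for
brevity, whereas webrev.02 does the equivalent lock-free): given the node
handle returned at insertion time, unlinking is constant time, while
ConcurrentLinkedQueue.remove(o) has to scan for the element first.

    // Toy doubly-linked list with O(1) unlinking of interior nodes.
    final class DlList<E> {
        final class Node {
            E item;
            Node prev, next;
            Node(E item) { this.item = item; }
        }

        private final Node head = new Node(null);   // circular sentinel
        { head.prev = head; head.next = head; }

        synchronized Node addLast(E item) {
            Node n = new Node(item);
            n.prev = head.prev;
            n.next = head;
            head.prev.next = n;
            head.prev = n;
            return n;                               // caller keeps the handle
        }

        synchronized void unlink(Node n) {          // constant time
            n.prev.next = n.next;
            n.next.prev = n.prev;
            n.prev = n.next = n;                    // detach
            n.item = null;
        }
    }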
>
> That's really why I decided to split the issue:
> - first, provide an option to turn the cache off (that should be
> easy to backport and can provide immediate relief to the customers
> that experience the scalability bottleneck);
> - second, continue work on the cache implementation improvements.
Maybe with a scalable Cache such a switch will not be needed. I can
imagine that such a switch provides relief for the kinds of loads that may
happen in very special circumstances (like DOS attacks, for example, that
don't re-use SSL sessions), but in normal operation such a switch would
degrade performance, wouldn't it?
The problem I see is that if you provide such a switch now, users may set
it and later, when they upgrade to a newer JDK with a scalable cache, not
realize that they don't need it any more and consequently keep paying for
it unnecessarily in the future...
Regards, Peter
>
>> Let me know if you find it useful and/or whether it solves the
>> scalability bottleneck.
>>
> Yes, I think it's very useful!
>
> However, as I wrote above, I think that the issue needs to be split into
> two parts: an option to turn the caching off (which can be easily
> backported) and improving the cache implementation (which can even
> relax the requirements, such as the FIFO order or an absolutely hard
> upper bound on the cache size).
>
> With kind regards,
> Ivan