RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

djelinski github.com+30433125+djelinski at openjdk.java.net
Tue Feb 16 19:41:39 UTC 2021


On Wed, 10 Feb 2021 19:19:33 GMT, djelinski <github.com+30433125+djelinski at openjdk.org> wrote:

>>> Thanks for your review! Some comments below.
>>> 
>>> > A full handshake or OCSP status grabbing could counteract all the performance gain with the cache update.
>>> 
>>> Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only because 3 wasn't used since 1 was last used. This means that either K=3 is used less frequently than K=1, or that all cached items are in active use. In the former case we don't lose much by dropping K=3 (granted, there's nothing to offset that). In the latter we are dealing with full cache at all times, which means that most `put()`s would scan the queue, and we will gain a lot by finishing faster.
>> 
>> I may think it differently.  It may be hard to know the future frequency of an cached item based on the past behaviors.  For the above case, I'm not sure that K=3 is used less frequently than K=1.  Maybe, next few seconds, K=1 could be more frequently.
>> 
>> I would like a solution to following the timeout specification: keep the newer items if possible.
>> 
>>> 
>>> > get() [..] without [..] change the order of the queue
>>> 
>>> If we do that, frequently used entries will be evicted at the same age as never used ones. This means we will have to recompute (full handshake/fresh OCSP) both the frequently used and the infrequently used entries. It's better to recompute only the infrequently used ones, and reuse the frequently used as long as possible - we will do less work that way.
>>> That's probably the reason why a `LinkedHashMap` with `accessOrder=true` was chosen as the backing store implementation originally.
>>> 
>> 
>> See above.  It may be true for some case to determine the frequency, but Cache is a general class and we may want to be more careful about if we are really be able to determine the frequency within the Cache implementation.
>> 
>>> > get() performance is more important [..] so we should try to keep the cache small if possible
>>> 
>>> I don't see the link; could you explain?
>>> 
>> 
>> link? Did you mean the link to get() method?  It is a method in the Cache class.
>> 
>>> > In the update, the SoftReference.clear() get removed. I'm not sure of the impact of the enqueued objects any longer. In theory, it could improve the memory use, which could counteract the performance gain in some situation.
>>> 
>>> That's the best part: no objects ever get enqueued! We only called `clear()` right before losing the last reference to `SoftCacheEntry` (which is the `SoftReference`). When GC collects the `SoftReference`, it does not enqueue anything. GC only enqueues the `SoftReference` when it collects the referenced object (session / OCSP response) without collecting the `SoftReference` (cache entry) itself.
>>> This is [documented behavior](https://docs.oracle.com/javase/7/docs/api/java/lang/ref/package-summary.html): _If a registered reference becomes unreachable itself, then it will never be enqueued._
>>> 
>> 
>> I need more time for this section.
>> 
>>> > Could you edit the bug
>>> 
>>> I'd need an account on the bug tracker first.
>> 
>> Okay.  No worries, I will help you if we could get an agreement about the update.
>
>> I may think it differently. It may be hard to know the future frequency of an cached item based on the past behaviors. For the above case, I'm not sure that K=3 is used less frequently than K=1. Maybe, next few seconds, K=1 could be more frequently.
> 
> I agree that such prediction might not be 100% accurate. But, quick google search reveals that there are [many](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf) [articles](https://link.springer.com/article/10.1007/PL00009255) that claim that LRU caches offer better hit rates than FIFO, especially for in-memory caches.
>> I would like a solution to following the timeout specification: keep the newer items if possible.
> 
> That's a trivial change; all we need to do is change `true` to `false` [here](https://github.com/openjdk/jdk/blob/abe0e238bd25adb1ddd2b655613899bfa063cd85/src/java.base/share/classes/sun/security/util/Cache.java#L268). But, as stated above, LRU is better than FIFO, so I wouldn't want to do that.
> 
> I could keep LRU and add another linked list that would store items in the order of their expiration dates; then we could quickly scan that list for expired items. Note: the order of expiration dates is not necessarily the order of insertion, because 1) `System.currentTimeMillis()` is not monotonic - it can move back when something changes the system time, 2) the expiration date is calculated at insertion time, so if someone changes the timeout on a non-empty cache, new items may have shorter expiration time than old ones. So, I'd either need to address that first (change `currentTimeMillis` to `nanoTime` and store creation time instead of expiration time), or use insertion sort for adding items (which would get very slow if either of the above mentioned situations happened).
> Let me know your thoughts.

Well, if removing all expired items before evicting live ones is a non-negotiable, implementing all operations in constant time is much easier with FIFO, where we only need to keep one item order.
The new commits contain the following changes:
- use `nanoTime` instead of `currentTimeMillis` to make sure that time never goes back
- store insertion time instead of expiration time, so that older items always expire before newer ones, even when timeout is changed
- change internal hash map to store (and evict) items in insertion (FIFO) order
- always stop scanning entries after finding the first non-expired item, because subsequent items are now guaranteed to have later expiration dates, and collected soft references are handled by reference queue.

tier1 and jdk_security tests passed; benchmark results show only minimal changes. I verified that none of the classes using `Cache` mentions LRU, looks like this was an implementation detail.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2255



More information about the security-dev mailing list