Potential optimizations on management of stack traces

Thu Dec 17 09:40:21 UTC 2020

Hi Mukilesh,

Short term plan is to look into using the stack watermark barrier mechanism introduced with JEP 376: ZGC: Concurrent Thread-Stack Processing [1]. 

We believe the cost of walking (and comparing) the whole stack can be avoided in many cases. Instead of walking 100 frames, it might be sufficient to walk 5 if we can prove the stack has not been popped beyond a 5 frame watermark. To make this work efficiently, the JFR stack data structure and hashing algorithm need to be changed. We believe it is best to implement/investigate this before doing other optimizations.

There are also ways we can reduce the cost of the method lookup that we have not yet looked into.

Cheers
Erik

[1] https://openjdk.java.net/jeps/376 

> On 16 Dec 2020, at 18:47, Mukilesh Sethu <mukilesh.sethu at gmail.com> wrote:
> 
> Hey,
> 
> We saw similar overhead with allocation events as mentioned in this thread (
> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-July/001605.html)
> and we did see some improvements with the fix on hashing algorithm but
> still the overhead was quite high.
> 
> I was curious about your thoughts on further optimizing it ? Are there any
> plans for it ? I think new allocation event gives full control to the
> consumers so that it can be fine tuned as per their requirements but max
> cap would still be the same.
> 
> I had few optimizations in my mind but please correct me if I am wrong here
> because I am very much new to this codebase,
> 
> 1. Strip locking - As per my understanding, stack traces collected by JFR
> are stored in a fixed size global table where each entry is a linked list.
> Index to global table is determined by hash code of stack trace. One of the
> main reasons for the overhead is the global lock acquired when the table is
> updated. This includes comparing stack traces to avoid duplicates.
> 
> Potential optimization: Can we maintain a lock for each entry in the table
> so that they can be updated independently.
> 
> Caveat: Clearing table could be quite tricky.
> 
> 2. Can we take advantage of the fact new stack traces are added to the
> beginning of the linked list and no individual stack trace from an entry
> would be deleted independently ? We could have a workflow like,
> 
> - read last `n` stacktraces from an entry, where `n` is the current size of
> the linked list( `next` of these stacktraces are never changed).
> - compare it with existing stack trace.
> - if present, early return
> - take lock
> - compare with first m-n stacktraces, where `m` is the new size of the
> linked list.
> - if present, early return
> - else update.
> - unlock
> 
> Caveat: Clearing table could be quite tricky here as well. But I believe it
> can be handled as a special case considering it doesn't happen quite often.
> 
> Please let me know your thoughts.
> 
> Thanks,
> Mukilesh