Potential optimizations on management of stack traces

Markus Gronlund markus.gronlund at oracle.com
Thu Jan 14 17:49:48 UTC 2021


Hi Mukilesh,

You might be interested in taking a look at and maybe trying out https://bugs.openjdk.java.net/browse/JDK-8258826 .

The PR code should be working, at least last time I checked (before X-mas).

Thanks
Markus

-----Original Message-----
From: Mukilesh Sethu <mukilesh.sethu at gmail.com> 
Sent: den 14 januari 2021 17:46
To: Erik Gahlin <erik.gahlin at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net
Subject: Re: Potential optimizations on management of stack traces

Hey,

Thank you for the response. It makes sense.

In the meantime do you recommend any mitigation steps? I was thinking of trying the following,

1. Reduce the sample set by disabling and enabling the events in a cyclic fashion. (I don't expect any overhead, but I am concerned about the value of the data.)
2. Set `maxchunksize` to a lower value to rotate the disk frequently, thereby clearing the stack repository frequently. (Expecting overhead due to frequent rotations.)

Both would definitely have an impact, but I would like to try them out and check whether it is something we can live with.
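For reference, a sketch of how mitigation (2) could be configured on the command line, assuming the standard `-XX:FlightRecorderOptions` syntax (the 1 MB value and the recording file name are just illustrations):

```shell
# Lower the chunk size so that chunk rotation (and with it the clearing of
# the stack trace repository) happens more often. The default maxchunksize
# is 12 MB; 1 MB is the minimum accepted value.
java -XX:StartFlightRecorder=filename=recording.jfr \
     -XX:FlightRecorderOptions=maxchunksize=1m \
     MyApp
```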

Thanks,
Mukilesh

On Thu, Dec 17, 2020 at 9:40 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:

> Hi Mukilesh,
>
> Short term plan is to look into using the stack watermark barrier 
> mechanism introduced with JEP 376: ZGC: Concurrent Thread-Stack 
> Processing [1].
>
> We believe the cost of walking (and comparing) the whole stack can be
> avoided in many cases. Instead of walking 100 frames, it might be
> sufficient to walk 5 if we can prove the stack has not been popped
> beyond a 5-frame watermark. To make this work efficiently, the JFR
> stack data structure and hashing algorithm need to be changed. We
> believe it is best to implement/investigate this before doing other
> optimizations.
>
> There are also ways we can reduce the cost of the method lookup that 
> we have not yet looked into.
>
> Cheers
> Erik
>
> [1] https://openjdk.java.net/jeps/376
>
> > On 16 Dec 2020, at 18:47, Mukilesh Sethu <mukilesh.sethu at gmail.com>
> wrote:
> >
> > Hey,
> >
> > We saw similar overhead with allocation events as mentioned in this
> thread (
> >
> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-July/001605.html
> )
> > and we did see some improvements with the fix on hashing algorithm 
> > but still the overhead was quite high.
> >
> > I was curious about your thoughts on further optimizing it. Are there
> > any plans for it? I think the new allocation event gives full control
> > to the consumers so that it can be fine-tuned per their requirements,
> > but the max cap would still be the same.
> >
> > I had a few optimizations in mind, but please correct me if I am wrong
> > here because I am very new to this codebase:
> >
> > 1. Lock striping - As per my understanding, stack traces collected by
> > JFR are stored in a fixed-size global table where each entry is a
> > linked list. The index into the global table is determined by the hash
> > code of the stack trace. One of the main reasons for the overhead is
> > the global lock acquired when the table is updated. This includes
> > comparing stack traces to avoid duplicates.
> >
> > Potential optimization: Can we maintain a lock for each entry in the
> > table so that entries can be updated independently?
> >
> > Caveat: Clearing the table could be quite tricky.
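A minimal sketch of idea (1), one lock per bucket instead of one global lock. All names here (StripedStackTable, putIfAbsent, the use of long[] as a stand-in for a stack trace) are illustrative assumptions, not actual JFR code:

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the fixed-size global table, with lock striping:
// threads hashing into different buckets never contend with each other.
class StripedStackTable {
    private static final class Node {
        final long[] frames;   // stand-in for a JFR stack trace (frame ids)
        final Node next;
        Node(long[] frames, Node next) { this.frames = frames; this.next = next; }
    }

    private final Node[] buckets;
    private final ReentrantLock[] locks;   // one lock per bucket

    StripedStackTable(int size) {
        buckets = new Node[size];
        locks = new ReentrantLock[size];
        for (int i = 0; i < size; i++) locks[i] = new ReentrantLock();
    }

    private int index(long[] frames) {
        return Math.floorMod(Arrays.hashCode(frames), buckets.length);
    }

    /** Returns true if the trace was newly inserted, false if a duplicate existed. */
    boolean putIfAbsent(long[] frames) {
        int i = index(frames);
        locks[i].lock();       // contention is now per bucket, not global
        try {
            for (Node n = buckets[i]; n != null; n = n.next) {
                if (Arrays.equals(n.frames, frames)) return false; // duplicate
            }
            buckets[i] = new Node(frames, buckets[i]); // prepend new entry
            return true;
        } finally {
            locks[i].unlock();
        }
    }
}
```

As the caveat above notes, clearing such a table safely would need to acquire all bucket locks (in a fixed order) or quiesce writers first.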
> >
> > 2. Can we take advantage of the fact that new stack traces are added
> > to the beginning of the linked list and no individual stack trace in
> > an entry is ever deleted independently? We could have a workflow like:
> >
> > - read the last `n` stack traces from an entry, where `n` is the
> > current size of the linked list (the `next` pointers of these stack
> > traces never change).
> > - compare them with the incoming stack trace.
> > - if present, early return
> > - take lock
> > - compare with the first m-n stack traces, where `m` is the new size
> > of the linked list.
> > - if present, early return
> > - else update.
> > - unlock
> >
> > Caveat: Clearing the table could be quite tricky here as well. But I
> > believe it can be handled as a special case, considering it doesn't
> > happen very often.
> >
> > Please let me know your thoughts.
> >
> > Thanks,
> > Mukilesh
>
>
