Potential optimizations on management of stack traces
Mukilesh Sethu
mukilesh.sethu at gmail.com
Thu Jan 14 16:45:42 UTC 2021
Hey,
Thank you for the response. It makes sense.
In the meantime, do you recommend any mitigation steps? I was thinking of
trying the following:
1. Reduce the sample set by disabling and enabling the events in a cyclic
fashion. (I don't expect any overhead, but I am concerned about the value
of the data.)
2. Set `maxchunksize` to a lower value so that chunks rotate more often,
clearing the stack trace repository more frequently. (Expecting some
overhead due to the frequent rotations.)
Both would definitely have an impact, but I would like to try them out and
check whether the result is something we can live with.
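For (1), a minimal sketch of the cycling idea using the jdk.jfr API (I am
using jdk.ObjectAllocationSample as an example event; the on/off periods
are placeholders):

    import jdk.jfr.Recording;

    public class CyclingSampler {
        public static void main(String[] args) throws InterruptedException {
            try (Recording recording = new Recording()) {
                recording.start();
                while (!Thread.currentThread().isInterrupted()) {
                    // Sample for 10s, then stay quiet for 50s to shrink
                    // the set of stack traces collected.
                    recording.enable("jdk.ObjectAllocationSample");
                    Thread.sleep(10_000);
                    recording.disable("jdk.ObjectAllocationSample");
                    Thread.sleep(50_000);
                }
            }
        }
    }

For (2), I assume the chunk size would be lowered on the command line with
something like -XX:FlightRecorderOptions=maxchunksize=1M.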
Thanks,
Mukilesh
On Thu, Dec 17, 2020 at 9:40 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:
> Hi Mukilesh,
>
> The short-term plan is to look into using the stack watermark barrier
> mechanism introduced with JEP 376: ZGC: Concurrent Thread-Stack Processing
> [1].
>
> We believe the cost of walking (and comparing) the whole stack can be
> avoided in many cases. Instead of walking 100 frames, it might be
> sufficient to walk 5 if we can prove the stack has not been popped beyond
> a 5-frame watermark. To make this work efficiently, the JFR stack data
> structure and hashing algorithm need to be changed. We believe it is best
> to implement/investigate this before doing other optimizations.
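>
> A rough sketch of the idea (in Java for illustration; hypothetical names,
> not the actual HotSpot code): walk and hash only the top K frames, and
> reuse the cached hash of the frames below the watermark when they are
> provably unchanged.
>
>     // Illustrative only: not the HotSpot implementation.
>     final class WatermarkHash {
>         static final int K = 5;
>
>         // frames[0] is the top of the stack.
>         static long traceHash(long[] frames, boolean unchangedBelowWatermark,
>                               long cachedLowerHash) {
>             int top = Math.min(K, frames.length);
>             long h = 17;
>             for (int i = 0; i < top; i++) {
>                 h = h * 31 + frames[i];            // cheap: K frames, not 100
>             }
>             if (unchangedBelowWatermark && frames.length > K) {
>                 return h * 31 + cachedLowerHash;   // reuse prior work
>             }
>             long lower = 17;                       // slow path: full walk
>             for (int i = top; i < frames.length; i++) {
>                 lower = lower * 31 + frames[i];
>             }
>             return h * 31 + lower;
>         }
>     }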
>
> There are also ways we can reduce the cost of the method lookup that we
> have not yet looked into.
>
> Cheers
> Erik
>
> [1] https://openjdk.java.net/jeps/376
>
> > On 16 Dec 2020, at 18:47, Mukilesh Sethu <mukilesh.sethu at gmail.com> wrote:
> >
> > Hey,
> >
> > We saw similar overhead with allocation events, as mentioned in this
> > thread
> > (https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-July/001605.html),
> > and we did see some improvements with the fix to the hashing algorithm,
> > but the overhead was still quite high.
> >
> > I was curious about your thoughts on optimizing it further. Are there
> > any plans for it? I think the new allocation event gives consumers full
> > control so that it can be fine-tuned to their requirements, but the
> > maximum overhead cap would still be the same.
> >
> > I had a few optimizations in mind, but please correct me if I am wrong
> > here, because I am very new to this codebase:
> >
> > 1. Lock striping - As per my understanding, stack traces collected by
> > JFR are stored in a fixed-size global table where each entry is a linked
> > list. The index into the global table is determined by the hash code of
> > the stack trace. One of the main reasons for the overhead is the global
> > lock acquired when the table is updated, which includes comparing stack
> > traces against existing ones to avoid duplicates.
> >
> > Potential optimization: Can we maintain a lock for each entry in the
> > table so that entries can be updated independently? (A sketch follows
> > below.)
> >
> > Caveat: Clearing the table could be quite tricky.
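> >
> > A minimal sketch of the striping idea (in Java for illustration;
> > hypothetical names, not the actual JFR data structure):
> >
> >     import java.util.Arrays;
> >     import java.util.concurrent.locks.ReentrantLock;
> >
> >     // One lock per bucket: inserts into different buckets never contend.
> >     final class StripedTraceTable {
> >         private static final int SIZE = 4096;    // fixed table size
> >
> >         private static final class Node {
> >             final long[] frames;                 // method ids of one trace
> >             final Node next;
> >             Node(long[] frames, Node next) { this.frames = frames; this.next = next; }
> >         }
> >
> >         private final Node[] buckets = new Node[SIZE];
> >         private final ReentrantLock[] locks = new ReentrantLock[SIZE];
> >
> >         StripedTraceTable() {
> >             for (int i = 0; i < SIZE; i++) locks[i] = new ReentrantLock();
> >         }
> >
> >         void addIfAbsent(long[] frames, int hash) {
> >             int index = (hash & 0x7fffffff) % SIZE;
> >             locks[index].lock();                 // bucket lock, not global
> >             try {
> >                 for (Node n = buckets[index]; n != null; n = n.next) {
> >                     if (Arrays.equals(n.frames, frames)) return; // duplicate
> >                 }
> >                 buckets[index] = new Node(frames, buckets[index]); // prepend
> >             } finally {
> >                 locks[index].unlock();
> >             }
> >         }
> >     }
> >
> > Clearing would presumably have to take all the bucket locks, which is
> > why I listed it as the tricky part.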
> >
> > 2. Can we take advantage of the fact that new stack traces are added to
> > the beginning of the linked list and that no individual stack trace in
> > an entry is ever deleted on its own? We could have a workflow like the
> > following (a sketch appears after the caveat):
> >
> > - Read the last `n` stack traces from an entry, where `n` is the current
> > size of the linked list (the `next` pointers of these stack traces never
> > change).
> > - Compare them with the incoming stack trace.
> > - If present, return early.
> > - Take the lock.
> > - Compare with the first `m - n` stack traces, where `m` is the new size
> > of the linked list.
> > - If present, return early.
> > - Else update.
> > - Unlock.
> >
> > Caveat: Clearing the table could be quite tricky here as well, but I
> > believe it can be handled as a special case, considering it doesn't
> > happen very often.
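> >
> > A sketch of the optimistic check-then-lock flow above (in Java for
> > illustration; hypothetical names, not the actual HotSpot code):
> >
> >     import java.util.Arrays;
> >     import java.util.concurrent.atomic.AtomicReference;
> >     import java.util.concurrent.locks.ReentrantLock;
> >
> >     // This works because nodes are only ever prepended and `next` never
> >     // changes, so a lock-free reader sees a consistent list snapshot.
> >     final class OptimisticBucket {
> >         private static final class Node {
> >             final long[] frames;
> >             final Node next;                     // immutable once built
> >             Node(long[] frames, Node next) { this.frames = frames; this.next = next; }
> >         }
> >
> >         private final AtomicReference<Node> head = new AtomicReference<>();
> >         private final ReentrantLock lock = new ReentrantLock();
> >
> >         void addIfAbsent(long[] frames) {
> >             Node snapshot = head.get();
> >             // Lock-free pass over the n traces present at the snapshot.
> >             if (contains(snapshot, null, frames)) return;    // early return
> >             lock.lock();
> >             try {
> >                 // Under the lock, only the m - n traces prepended since
> >                 // the snapshot need to be re-checked.
> >                 if (!contains(head.get(), snapshot, frames)) {
> >                     head.set(new Node(frames, head.get()));  // update
> >                 }
> >             } finally {
> >                 lock.unlock();
> >             }
> >         }
> >
> >         private static boolean contains(Node from, Node until, long[] frames) {
> >             for (Node n = from; n != until; n = n.next) {
> >                 if (Arrays.equals(n.frames, frames)) return true;
> >             }
> >             return false;
> >         }
> >     }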
> >
> > Please let me know your thoughts.
> >
> > Thanks,
> > Mukilesh