Potential optimizations on management of stack traces
Markus Gronlund
markus.gronlund at oracle.com
Thu Jan 14 17:49:48 UTC 2021
Hi Mukilesh,
You might be interested in taking a look at, and maybe trying out, https://bugs.openjdk.java.net/browse/JDK-8258826.
The PR code should be working, at least last time I checked (before X-mas).
Thanks
Markus
-----Original Message-----
From: Mukilesh Sethu <mukilesh.sethu at gmail.com>
Sent: 14 January 2021 17:46
To: Erik Gahlin <erik.gahlin at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net
Subject: Re: Potential optimizations on management of stack traces
Hey,
Thank you for the response. It makes sense.
In the meantime, do you recommend any mitigation steps? I was thinking of trying the following:
1. Reduce the sample set by disabling and enabling the events in a cyclic fashion. (I don't expect any overhead, but I am concerned about the value of the resulting data.)
2. Set `maxchunksize` to a lower value to rotate the chunk file on disk frequently, thereby clearing the stack trace repository frequently. (I expect overhead due to the frequent rotations.)
Both would definitely have an impact, but I would like to try them out and check whether they are something we can live with.
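
For reference, option 1 could be scripted against the jdk.jfr.Recording API along these lines. This is only a minimal sketch: the event name and the duty cycle are illustrative, and option 2's `maxchunksize` would be set separately at JVM startup, e.g. -XX:FlightRecorderOptions=maxchunksize=1m (value also illustrative):

    import jdk.jfr.Recording;

    public class CyclicSampling {
        public static void main(String[] args) throws InterruptedException {
            try (Recording recording = new Recording()) {
                recording.start();
                while (!Thread.currentThread().isInterrupted()) {
                    // Sample for 30 seconds, then pause for 90 seconds
                    // (an arbitrary duty cycle, purely for illustration).
                    recording.enable("jdk.ObjectAllocationSample");
                    Thread.sleep(30_000);
                    recording.disable("jdk.ObjectAllocationSample");
                    Thread.sleep(90_000);
                }
            }
        }
    }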
Thanks,
Mukilesh
On Thu, Dec 17, 2020 at 9:40 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:
> Hi Mukilesh,
>
> The short-term plan is to look into using the stack watermark barrier
> mechanism introduced with JEP 376: ZGC: Concurrent Thread-Stack
> Processing [1].
>
> We believe the cost of walking (and comparing) the whole stack can be
> avoided in many cases. Instead of walking 100 frames, it might be
> sufficient to walk 5 if we can prove the stack has not been popped
> beyond a 5-frame watermark. To make this work efficiently, the JFR
> stack data structure and hashing algorithm need to be changed. We
> believe it is best to implement/investigate this before doing other
> optimizations.
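>
> To illustrate the idea (a purely hypothetical sketch, in Java for
> brevity; the names, the StackFrameCursor type, and the 5-frame depth
> are invented here and are not the actual JFR/HotSpot design): cache,
> per thread, the frame at the watermark together with the hash of
> everything at or below it; if that frame has not been popped since the
> last sample, only the few frames above it need to be walked and hashed.
>
>     interface StackFrameCursor {
>         boolean stillContains(long frameId);   // has this frame been popped?
>         long frameIdNearTop(int depth);        // frame ~depth frames below the top
>         long hashFramesBelow(long frameId);    // hash of the watermark frame and below
>         java.util.List<Long> framesAbove(long frameId); // frames above it, top first
>     }
>
>     final class WatermarkedHasher {
>         private long watermarkFrameId;
>         private long belowWatermarkHash;
>
>         long hashOf(StackFrameCursor cursor) {
>             if (!cursor.stillContains(watermarkFrameId)) {
>                 // The stack was popped beyond the watermark: do a full
>                 // walk once and re-establish a watermark near the top.
>                 watermarkFrameId = cursor.frameIdNearTop(5);
>                 belowWatermarkHash = cursor.hashFramesBelow(watermarkFrameId);
>             }
>             // Combine the cached hash with only the frames above the
>             // watermark, instead of re-walking the whole stack.
>             long h = belowWatermarkHash;
>             for (long frameId : cursor.framesAbove(watermarkFrameId)) {
>                 h = 31 * h + frameId;
>             }
>             return h;
>         }
>     }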
>
> There are also ways we can reduce the cost of the method lookup that
> we have not yet looked into.
>
> Cheers
> Erik
>
> [1] https://openjdk.java.net/jeps/376
>
> > On 16 Dec 2020, at 18:47, Mukilesh Sethu <mukilesh.sethu at gmail.com> wrote:
> >
> > Hey,
> >
> > We saw similar overhead with allocation events, as mentioned in this thread
> > (https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-July/001605.html),
> > and we did see some improvements with the fix to the hashing algorithm,
> > but the overhead was still quite high.
> >
> > I was curious about your thoughts on optimizing it further. Are there
> > any plans for it? I think the new allocation event gives consumers full
> > control so that it can be fine-tuned to their requirements, but the
> > maximum cap would still be the same.
> >
> > I had a few optimizations in mind, but please correct me if I am wrong
> > here, because I am very new to this codebase:
> >
> > 1. Lock striping - As per my understanding, stack traces collected by
> > JFR are stored in a fixed-size global table where each entry is a
> > linked list. The index into the global table is determined by the hash
> > code of the stack trace. One of the main reasons for the overhead is
> > the global lock acquired when the table is updated, which includes
> > comparing stack traces to avoid duplicates.
> >
> > Potential optimization: Can we maintain a lock for each entry in the
> > table so that entries can be updated independently?
> >
> > Caveat: Clearing the table could be quite tricky.
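> >
> > A minimal sketch of the striping idea, in Java for brevity (the real
> > repository is HotSpot C++, and all names here are invented for
> > illustration):
> >
> >     import java.util.Arrays;
> >     import java.util.concurrent.locks.ReentrantLock;
> >
> >     final class StripedTraceTable {
> >         static final int SIZE = 4096;        // illustrative table size
> >
> >         static final class Node {
> >             final long[] trace;
> >             final Node next;
> >             Node(long[] trace, Node next) { this.trace = trace; this.next = next; }
> >         }
> >
> >         private final Node[] buckets = new Node[SIZE];
> >         private final ReentrantLock[] locks = new ReentrantLock[SIZE];
> >
> >         StripedTraceTable() {
> >             for (int i = 0; i < SIZE; i++) locks[i] = new ReentrantLock();
> >         }
> >
> >         void addIfAbsent(long[] trace, int hash) {
> >             int i = (hash & 0x7fffffff) % SIZE;
> >             locks[i].lock();                 // per-bucket lock, not global
> >             try {
> >                 for (Node n = buckets[i]; n != null; n = n.next) {
> >                     if (Arrays.equals(n.trace, trace)) return; // duplicate
> >                 }
> >                 buckets[i] = new Node(trace, buckets[i]);      // prepend
> >             } finally {
> >                 locks[i].unlock();
> >             }
> >         }
> >     }
> >
> > Clearing would then have to acquire (or otherwise quiesce) all the
> > stripes, which is exactly the tricky part noted above.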
> >
> > 2. Can we take advantage of the fact that new stack traces are added
> > to the beginning of the linked list and that no individual stack trace
> > in an entry is ever deleted independently? We could have a workflow
> > like this:
> >
> > - read the `n` stack traces currently in an entry, where `n` is the
> > current size of the linked list (the `next` pointers of these stack
> > traces never change).
> > - compare them with the new stack trace.
> > - if present, return early.
> > - take the lock.
> > - compare with the first `m - n` stack traces, where `m` is the new
> > size of the linked list (i.e. only the traces prepended since the
> > lock-free read).
> > - if present, return early.
> > - else insert.
> > - unlock.
> >
> > Caveat: Clearing the table could be quite tricky here as well, but I
> > believe it can be handled as a special case, considering it doesn't
> > happen very often.
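> >
> > Again as an illustrative Java sketch (all names invented), relying on
> > the bucket being prepend-only with immutable `next` links:
> >
> >     import java.util.Arrays;
> >     import java.util.concurrent.atomic.AtomicReference;
> >
> >     final class Bucket {
> >         static final class Node {
> >             final long[] trace;
> >             final Node next;
> >             Node(long[] trace, Node next) { this.trace = trace; this.next = next; }
> >         }
> >
> >         private final AtomicReference<Node> head = new AtomicReference<>();
> >
> >         void addIfAbsent(long[] trace) {
> >             // Lock-free scan: nodes are prepend-only and their `next`
> >             // links never change, so everything reachable from this
> >             // snapshot of head is stable.
> >             Node seen = head.get();
> >             if (contains(seen, null, trace)) return;   // early return
> >             synchronized (this) {
> >                 // Re-check only the nodes prepended since the snapshot
> >                 // (the first m - n nodes of the new list).
> >                 if (!contains(head.get(), seen, trace)) {
> >                     head.set(new Node(trace, head.get()));
> >                 }
> >             }
> >         }
> >
> >         private static boolean contains(Node from, Node to, long[] trace) {
> >             for (Node n = from; n != to; n = n.next) {
> >                 if (Arrays.equals(n.trace, trace)) return true;
> >             }
> >             return false;
> >         }
> >     }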
> >
> > Please let me know your thoughts.
> >
> > Thanks,
> > Mukilesh
>
>