JEP 331: Low-Overhead Heap Profiling

Tue Apr 10 22:24:12 UTC 2018

Hi all,

Just wanted to come back to one point that was not answered. Aleksey, did
my change to the JEP text with the implementation details answer your
questions/concerns?

Are there any other questions/concerns from other people on the mailing list?

Thanks!
Jc

On Thu, Apr 5, 2018 at 3:10 PM serguei.spitsyn at oracle.com <
serguei.spitsyn at oracle.com> wrote:

> Hi Jc,
>
> Ok, thanks!
> Serguei
>
>
> On 4/5/18 15:04, JC Beyler wrote:
>
> Added your comments to the text, Serguei, and changed the name of the
> section to JVMTI agent.
>
> Thanks!
> Jc
>
> On Thu, Apr 5, 2018 at 2:07 PM serguei.spitsyn at oracle.com <
> serguei.spitsyn at oracle.com> wrote:
>
>> Hi Jc and Aleksey,
>>
>> Please, find a couple of comments below.
>>
>>
>> On 4/5/18 09:05, JC Beyler wrote:
>> > Hi Aleksey,
>> >
>> > I inlined my answers :)
>> >
>> > On Thu, Apr 5, 2018 at 4:09 AM Aleksey Shipilev <shade at redhat.com>
>> wrote:
>> >
>> >> On 04/05/2018 12:23 AM, JC Beyler wrote:
>> >>> On Wed, Apr 4, 2018 at 3:41 AM Aleksey Shipilev <shade at redhat.com
>> >> <mailto:shade at redhat.com>> wrote:
>> >>> *) It would be nice to mention the implementation details in the JEP
>> >>>     itself, i.e. where are the points it injects into GCs to sample? I
>> >>>     assume it has to inject into CollectedHeap::allocate_tlab, and it has
>> >>>     to cap the max TLAB sizes to get into allocation slowpath often enough?
>> >>> My understanding was that a JEP was the idea and specification and that
>> >>> more technical information like that was out of scope for the JEP
>> >>> (implementations can change, etc.)
>> >> Well, yes. But if you have the implementation ideas, it is better to
>> >> demonstrate them along with the idea. Discussing implementation approaches
>> >> serves several purposes: a) it empirically proves the idea is
>> >> implementable; b) it highlights tricky design decisions the implementation
>> >> has to force, which aids the understanding of the scope; c) it prevents
>> >> handwaving against existing approaches :)
>> >>
>> > Fair enough, I've added an implementation design/state with a link to the
>> > current webrev at the end of the JEP issue
>> > <https://bugs.openjdk.java.net/browse/JDK-8171119>. Let me know if that is
>> > what you had in mind.
>> >
>> >
>> >>> It actually does not cap the max TLAB sizes; it changes the end pointer
>> >>> to force paths into "thinking" the TLAB is full. Then, in the slow path,
>> >>> it samples and fixes the pointers up for the next sample.
>> >> Ooof. So this has implications for JEP scope, and thus should be mentioned.
>> >>
>> > Perhaps, again I saw the JEP as a different level of abstraction and this
>> > is more in the details of implementation. However, I have added a full
>> > explanation of this into the JEP at the end, let me know if it makes sense.
>> > For convenience, let me copy-paste what I wrote there:
>> >
>> > "
>> > 2) The TLAB structure is augmented with a new allocation_end pointer and a
>> > current_end pointer. If the sampling is disabled, the two pointers are
>> > always equal and the code performs as before. If the sampling is enabled,
>> > the current_end is modified to be where the next sample point is requested.
>> > Then, any fast path will "think" the TLAB is full at that point and go down
>> > the slow path, which is explained in (3).
>> > "
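
To make the two-pointer scheme in (2) concrete, here is a minimal C sketch of
a bump-pointer buffer. It is not the HotSpot code: only allocation_end and
current_end come from the JEP text. The names tlab_t, tlab_init, tlab_alloc,
sample_interval and samples_taken are made up for illustration, and the fixed
byte interval between samples is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified TLAB. Only allocation_end and current_end are
   names taken from the JEP text; everything else is illustrative. */
typedef struct {
    char  *top;             /* next free byte in the buffer */
    char  *current_end;     /* limit checked by the fast path */
    char  *allocation_end;  /* real end of the buffer */
    size_t sample_interval; /* bytes between samples; 0 = sampling disabled */
    size_t samples_taken;   /* stands in for the sampling callback */
} tlab_t;

/* Place current_end at the next sample point, or at the real end when
   sampling is disabled or the next sample point lies past the buffer. */
static void tlab_set_next_sample(tlab_t *t) {
    size_t remaining = (size_t)(t->allocation_end - t->top);
    if (t->sample_interval == 0 || t->sample_interval >= remaining) {
        t->current_end = t->allocation_end;
    } else {
        t->current_end = t->top + t->sample_interval;
    }
}

void tlab_init(tlab_t *t, char *buf, size_t size, size_t sample_interval) {
    assert(buf != NULL);
    t->top = buf;
    t->allocation_end = buf + size;
    t->sample_interval = sample_interval;
    t->samples_taken = 0;
    tlab_set_next_sample(t);
}

/* Fast path: a bump allocation that only ever looks at current_end. */
static void *tlab_alloc_fast(tlab_t *t, size_t n) {
    if ((size_t)(t->current_end - t->top) < n) {
        return NULL; /* the buffer "looks" full */
    }
    void *p = t->top;
    t->top += n;
    return p;
}

/* Slow path: reached when the fast path fails. If the buffer is not really
   full, this allocation is the sampled one and current_end is re-armed. */
static void *tlab_alloc_slow(tlab_t *t, size_t n) {
    if ((size_t)(t->allocation_end - t->top) < n) {
        return NULL; /* really full; a real VM would get a new TLAB here */
    }
    void *p = t->top;
    t->top += n;
    t->samples_taken++; /* a real agent callback would be invoked here */
    tlab_set_next_sample(t);
    return p;
}

void *tlab_alloc(tlab_t *t, size_t n) {
    void *p = tlab_alloc_fast(t, n);
    return p != NULL ? p : tlab_alloc_slow(t, n);
}
```

In this sketch, an allocation that finds current_end short of allocation_end
takes the slow path once and is counted as a sample. With an interval of 0 the
two pointers stay equal, so the fast path behaves as it would without sampling.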
>> >
>> >
>> >>
>> >>>      *) Since JC apparently has the prototype, it would be easier to put
>> >>>      it somewhere, and link it into the JEP itself. Webrevs are
>> >>>      interesting, but they get outdated pretty quickly, so maybe putting
>> >>>      the whole thing in JDK Sandbox [1] is the way to go.
>> >>>
>> >>> I've been keeping the webrevs up to date so there should be no real
>> >>> problem. From what I read, you have to be a Committer for the JDK
>> >>> Sandbox, no? So I'm not sure that would make sense there?
>> >>
>> >> Right, you have to be a Committer. Link the webrev to the JEP then?
>> >>
>> > I added the link and a few paragraphs on the implementation.
>> >
>> >
>> >>
>> >>>      *) Motivation says: "The downsides [of JFR] are that a) it is tied
>> >>>      to a particular allocation implementation (TLABs), and misses
>> >>>      allocations that don’t meet that pattern; b) it doesn’t allow the
>> >>>      user to customize the sampling rate; c) it only logs allocations, so
>> >>>      you cannot distinguish between live and dead objects."
>> >>>
>> >>>       ...but then the JEP apparently describes sampling the allocations
>> >>>      via TLABs? So, is the real difference (b), allowing the sampling
>> >>>      rate to be customized, or am I missing something?
>> >>>
>> >>> There are various differences between the JFR TLAB events and this
>> >>> system. First, the JFR system provides a buffered event system, meaning
>> >>> you can miss samples if the event buffer is full and throws out a
>> >>> sampling event before a reader gets to it. Second, you don't get a
>> >>> callback at the allocation spot, so you have no means to take an action
>> >>> at that sampling point, which means you have no way of knowing when an
>> >>> object is effectively dead using the JFR events. Hopefully that makes
>> >>> sense?
>> >>
>> >> This paragraph should be in JEP text then?
>> >>
>> > Done, I revamped and edited the section into this now:
>> >
>> >
>> > "There are multiple alternatives to the system presented in this JEP. The
>> > introduction presented two already: The Java Flight Recorder
>> > <http://openjdk.java.net/jeps/328> system provides an interesting
>> > alternative but is not perfect, because it does not allow the sampling
>> > size to be set and does not provide a callback.
>> >
>> > The JFR system does use the TLAB creation as a means to track memory
>> > allocation but, instead of a callback, JFR events use a buffer system that
>> > can lead to missing some sampled allocations. Finally, the JFR event system
>> > does not provide a means to track objects that have been garbage collected,
>> > which means it is currently not possible to have a system provide
>> > information about live and garbage collected objects using the JFR event
>> > system."
>>
>> I'd like to highlight an important difference from JFR.
>> The JEP adds a new feature to JVMTI, which is an important
>> API/framework for various development and monitoring tools.
>> Now, a JVMTI agent can use a low-overhead heap profiling API along with
>> the rest of the JVMTI functionality.
>> This gives the tools great flexibility.
>> For instance, it is up to the agent to decide whether a stack trace needs
>> to be collected at each event point.
>>
>>
>> > Let me know if that works for you.
>> >
>> >
>> >
>> >>>      *) Goals say: "Can give information about both live and dead Java
>> >>>      objects."
>> >>>       ...but there is no discussion of what information it gives about
>> >>>      dead Java objects, or how. I am struggling to imagine how allocation
>> >>>      sampling would give this. Is the goal too broad, or is some API not
>> >>>      described in the JEP?
>> >>>
>> >>> Originally the JEP provided a means for the user to get that information
>> >>> directly. Now, because the sampling callback provides an oop, the user
>> >>> in the agent can add a weak reference and use that to determine liveness.
>> >
>> >> Ooof! I guess that technically works. Please mention it.
>> >>
>> > I did already add it here:
>> >
>> >
>> > "E) What the Java agent can do
>>
>> I'd suggest replacing "Java agent" with "JVMTI agent" for accuracy.
>> The term "Java agent" is used for JPLIS agents that are based on the
>> java.lang.instrument API:
>>
>> https://docs.oracle.com/javase/8/docs/technotes/guides/instrumentation/index.html
>>
>>
>> Thanks,
>> Serguei
>>
>> > The user of the callback can then pick up a stack trace at the moment of
>> > the callback using the JVMTI GetStackTrace method, for example. The oop
>> > obtained by the callback can also be wrapped into a JNI weak reference to
>> > help determine when the object has been garbage collected. The idea behind
>> > that is to provide data on which sampled objects are still considered live
>> > and which have been garbage collected, which can be a good means to
>> > understand the job's behavior.
>> >
>> > The sampling rate determines the sampling precision but can also be a
>> > means to mitigate the overhead due to the profiling. Using a sampling rate
>> > of 512k and the sampling solution, the overhead should be low enough that a
>> > user could reasonably leave the system on by default."
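
As a rough feel for what the rate means in practice, the sketch below
estimates how many samples a given amount of allocation produces. It assumes
that "512k" means one sample per 512 KiB allocated and that the interval is
fixed. expected_samples is a hypothetical helper, not part of JVMTI.

```c
#include <stdint.h>

/* Estimated number of sampled allocations when one allocation is sampled
   per interval_bytes bytes allocated. An interval of 0 is treated here as
   "sampling disabled"; that convention is an assumption of this sketch. */
uint64_t expected_samples(uint64_t allocated_bytes, uint64_t interval_bytes) {
    if (interval_bytes == 0) {
        return 0;
    }
    return allocated_bytes / interval_bytes;
}
```

Under those assumptions, 1 GiB of allocation at a 512 KiB interval gives about
2048 callbacks, while a 64 KiB interval gives about 16384: finer precision,
but eight times as many callbacks to pay for.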
>> >
>> >
>> > ------
>> >
>> >
>> > I also have the proof of concept in tests here
>> > <http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorGCTest.java.html>
>> > and the native implementation is here
>> > <http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.html>.
>> >
>> > Let me know if my additions to the JEP are what you were looking for and
>> > whether there is anything else you think I should add information about!
>> >
>> > Thanks for reviewing it!
>> > Jc
>>
>>
>