Low-Overhead Heap Profiling

Fri Jun 23 16:58:48 UTC 2017

Hi Serguei and all,

Thanks Serguei for your review. I believe I have not answered many
questions but raised more but I hope this helps understand what I am trying
to do.

I have inlined my answers to make it easier to answer everything (and to be
honest, ask questions).

@Thomas: Thank you for your thorough answer as well, I'll be working on it
and will answer when I get the fixes and comments in a handled state, most
likely beginning of next week :)

On Fri, Jun 23, 2017 at 3:01 AM, serguei.spitsyn at oracle.com <
serguei.spitsyn at oracle.com> wrote:

> Hi JC,
>
> Some initial JVMTI-specific questions/comments.
>
> I was not able to apply your patch as the are many merge conflicts.
> Could you, please, re-base it on the latest JDK 10 sources?
>

This is weird, I did a pull right now and it worked. I hesitated I was
using hg correctly so I just did:
hg clone http://hg.openjdk.java.net/jdk10/hs jdk10hs
bash ./get_source.sh
wget http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/hotspot.patch
patch -p1 < hotspot.patch

(Maybe the patch is not the right way of doing things, there might be a hg
equivalent?).

And it seemed to work:
bash ./configure --with-boot-jdk=/path/to/jdk1.8.0-latest
--with-debug-level=slowdebug
make all images JOBS=48

Any idea why it is working for me and not you? I am assuming I am doing
something wrong but I don't know what!

>
> I think, it would make sense to introduce new optional capability,
> something like:
>    can_sample_heap_allocations
>

I will work on adding that for the next webrev, Robbin had mentioned I
should and I dropped the ball on that one.

>
> Do you consider this API to be used by a single agent or it is a
> multi-agent feature?
> What if one agent calls the StartHeapSampling() but another calls one of
> the others.
> The API specs needs to clarify this.
>
> I'm not sure you had a chance to carefully think on what JVMTI phases are
> allowed for new functions.
>

- I am pretty sure I have NOT really thought about this as carefully as you
have because this is the first time I am doing this and these questions
already make me go "hmmm" :)

I have a tendency to work through issues and try to keep the design simple
first. Your question seems to imply I can make this work for a single agent
world and have a means to explicitly disable it for multi-agent for now.

I looked quickly in the prims folder for anything mentioning single agent
vs multi agent but did not find anything. Does this get handled s this more
a documentation thing and tough luck for the user.

This page:
https://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#environments

seems to say the latter. That the documentation should say that this is not
a multiple agent support (yet?) and there is nothing we can do. Is that
right?

>
>
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/src/
> share/vm/runtime/heapMonitoring.cpp.html
>
> I do not see the variable HeapMonitoring::_enabled is checked in the
> functions:
>     HeapMonitoring::get_live_traces(),
>     HeapMonitoring::get_garbage_traces(),
>     HeapMonitoring::get_frequent_garbage_traces(),
>     HeapMonitoring::release_traces()
>
>
So it is not checked there because the API allows you to:
  - Start sampling (which flips the enabled variable)
  - Stop sampling (which flips back the enabled variable)
  - Get the sampled traces at that point

I think the API documentation is not clear for this right now, so I'll
change it for the next webrev. Let me also know if you think this is a bad
idea. I thought it would be useful because it would allow you to start/stop
the sampling and then later on go back and see what was sampled at a later
time.

> A question about a multi-agent feature from above applies here as well.
>

I'm open to either supporting multi-agent if you think it is important or
doing whatever can be done to, as a first step, make it a single agent
support. Let me know how to do that.

>
>
> Thanks,
> Serguei
>

Thank you!
Jc

>
>
>
>
> On 6/21/17 13:45, JC Beyler wrote:
>
> Hi all,
>
> First off: Thanks again to Robbin and Thomas for their reviews :)
>
> Next, I've uploaded a new webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
>
> Here is an update:
>
> - @Robbin, I forgot to say that yes I need to look at implementing this
> for the other architectures and testing it before it is all ready to go. Is
> it common to have it working on all possible combinations or is there a
> subset that I should be doing first and we can do the others later?
> - I've tested slowdebug, built and ran the JTreg tests I wrote with
> slowdebug and fixed a few more issues
> - I've refactored a bit of the code following Thomas' comments
>    - I think I've handled all the comments from Thomas (I put comments
> inline below for the specifics)
>
> The biggest addition to this webrev is that there is more testing, I've
> added two tests for looking specifically at the two garbage sampling data
> Recent vs Frequent:
>    http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
> test/serviceability/jvmti/HeapMonitor/MyPackage/
> HeapMonitorRecentTest.java.patch
>    and
>    http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
> test/serviceability/jvmti/HeapMonitor/MyPackage/
> HeapMonitorFrequentTest.java.patch
>
> I've also refactored the JNI library a bit: http://cr.openjdk.java.net/~
> rasbold/8171119/webrev.06/test/serviceability/jvmti/
> HeapMonitor/libHeapMonitor.c.patch.
>    - Side question: I was looking at trying to make this a multi-file
> library so you would not have all the code in one spot. It seems this is
> not really done?
>       - Is there a solution to this? What I really wanted was:
>         - The core library that will help testing be easier
>         - The JNI code for each Java class separate and calling into the
> core library
>
> - Following Thomas' comments on statistics, I want to add some quality
> assurance tests and find that the easiest way would be to have a few
> counters of what is happening in the sampler and expose that to the user.
>    - I'll be adding that in the next version if no one sees any objections
> to that.
>    - This will allow me to add a sanity test in JTreg about number of
> samples and average of sampling rate
>
> @Thomas: I had a few questions that I inlined below but I will summarize
> the "bigger ones" here:
>    - You mentioned constants are not using the right conventions, I looked
> around and didn't see any convention except normal naming then for static
> constants. Is that right?
>    - You mentioned putting the weak_oops_do in a separate method and
> logging, I inlined my answer below and would appreciate your feedback there
> too.
>
> Thanks again for your reviews and patience!
> Jc
>
> PS: I've also inlined my answers to Thomas below:
>
> On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl <thomas.schatzl at oracle.com
> > wrote:
>
>> Hi all,
>>
>> On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
>> > Dear all,
>> >
>> > I've continued working on this and have done the following webrev:
>> > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
>>
>> [...]
>> > Things I still need to do:
>> >    - Have to fix that TLAB case for the FastTLABRefill
>> >    - Have to start looking at the data to see that it is consistent
>> > and does gather the right samples, right frequency, etc.
>> >    - Have to check the GC elements and what that produces
>> >    - Run a slowdebug run and ensure I fixed all those issues you saw
>> > Robbin
>> >
>> > Thanks for looking at the webrev and have a great week!
>>
>>   scratching a bit on the surface of this change, so apologies for
>> rather shallow comments:
>>
>> - macroAssembler_x86.cpp:5604: while this is compiler code, and I am
>> not sure this is final, please avoid littering the code with TODO
>> remarks :) They tend to be candidates for later wtf moments only.
>>
>> Just file a CR for that.
>>
>>
> Newcomer question: what is a CR and not sure I have the rights to do that
> yet ? :)
>
>
>> - calling HeapMonitoring::do_weak_oops() (which should probably be
>> called weak_oops_do() like other similar methods) only if string
>> deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong.
>
>
>> The call should be at least around 6 lines up outside the if.
>>
>> Preferentially in a method like process_weak_jni_handles(), including
>> additional logging. (No new (G1) gc phase without minimal logging :)).
>>
>
> Done but really not sure because:
>
> I put for logging:
>   log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap
> monitoring");
>
> Since weak_jni_handles didn't have logging for me to be inspired from, I
> did that but unconvinced this is what should be done.
>
>
>>
>> Seeing the changes in referenceProcess.cpp, you need to add the call to
>> HeapMonitoring::do_weak_oops() exactly similar to
>> process_weak_jni_handles() in case there is no reference processing
>> done.
>>
>> - psParallelCompact.cpp:2172 similar to other places where the change
>> adds the call to HeapMonitoring::do_weak_oops(), remove the empty line.
>>
>
> Done from what I can tell in the whole webrev. (The only empty lines I
> still see are when I maintain an empty line, exception being
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/src/
> share/vm/gc/shared/collectedHeap.cpp.patch where I think it was "nicer")
>
>
>>
>> - the change doubles the size of
>> CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
>> threshold. Maybe it could be refactored a bit.
>>
>
> Done I think, it looks better to me :).
>
>
>>
>> - referenceProcessor.cpp:40: please make sure that the includes are
>> sorted alphabetically (I did not check the other files yet).
>>
>
> Done and checked all other files normally.
>
>
>>
>> - referenceProcessor.cpp:261: the change should add logging about the
>> number of references encountered, maybe after the corresponding "JNI
>> weak reference count" log message.
>>
>
> Just to double check, are you saying that you'd like to have the heap
> sampler to keep in store how many sampled objects were encountered in the
> HeapMonitoring::weak_oops_do?
>    - Would a return of the method with the number of handled references
> and logging that work?
>    - Additionally, would you prefer it in a separate block with its
> GCTraceTime?
>
>
>> - threadLocalAllocBuffer.cpp:331: one more "TODO"
>>
>
> Removed it and added it to my personal todos to look at.
>
>
>>
>> - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
>> documentation should be updated about the sampling additions. I would
>> have no clue what the difference between "actual_end" and "end" would
>> be from the given information.
>>
>
> If you are talking about the comments in this file, I made them more clear
> I hope in the new webrev. If it was somewhere else, let me know where to
> change.
>
>
>>
>> - threadLocalAllocBuffer.hpp:130: extra whitespace ;)
>>
>
> Fixed :)
>
>
>>
>> - some copyrights :)
>>
>>
> I think I fixed all the issues, if you see specific issues, let me know.
>
>
>> - in heapMonitoring.hpp: there are some random comments about some code
>> that has been grabbed from "util/math/fastmath.[h|cc]". I can't tell
>> whether this is code that can be used but I assume that Noam Shazeer is
>> okay with that (i.e. that's all Google code).
>>
>
> Jeremy and I double checked and we can release that as I thought. I
> removed the comment from that piece of code entirely.
>
>
>>
>> - heapMonitoring.hpp/cpp static constant naming does not correspond to
>> Hotspot's. Additionally, in Hotspot static methods are cased like other
>> methods.
>>
>
> I think I fixed the methods to be cased the same way as all other methods.
> For static constants, I was not sure. I fixed a few other variables but I
> could not seem to really see a consistent trend for constants. I made them
> as variables but I'm not sure now.
>
>
>
>>
>> - in heapMonitoring.cpp there are a few cryptic comments at the top
>> that seem to refer to internal stuff that should probably be removed.
>>
>
> Sorry about that! My personal todos not cleared out.
>
>
>>
>> I did not think through the impact of the TLAB changes on collector
>> behavior yet (if there are). Also I did not check for problems with
>> concurrent mark and SATB/G1 (if there are).
>>
>
> I would love to know your thoughts on this, I think this is fine. I see
> issues with multiple threads right now hitting the stack storage instance.
> Previous webrevs had a mutex lock here but we took it out for simplificity
> (and only for now).
>
>
>>
>> Thanks,
>>   Thomas
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170623/0279b1c7/attachment-0001.html>