Low-Overhead Heap Profiling

Fri May 5 06:03:59 UTC 2017

So first off: thank you so much for your diligent look at the patch. A lot
of the comments are due to me either using the wrong baseline (I was using
JDK9) it seems or either that I first worked on keeping things simple, so
maybe there are things that are still in flight and why things might seem
unclear. I will try to rectify most of those in the next webrev.

I will say again that I really do appreciate the time you took to give all
those comments. I was expecting a high level review at first and not an
in-depth one :)

I'll work on the assumption that my code base is wrong and figure out how
to get the right one tomorrow. I've inlined my comments which mostly are
questions to help me fix the code.

On Thu, May 4, 2017 at 2:13 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:

> Hi,
>
> To me the compiler changes looks what is expected.
> It would be good if someone from compiler could take a look at that.
> Added compiler to mail thread.
>
> Also adding Serguei, It would be good with his view also.
>
> My initial take on it, read through most of the code and took it for a
> ride.
>
> ##############################
> - Regarding the compiler changes: I think we need the 'TLAB end' trickery
> (mentioned by Tony P)
> instead of a separate check for sampling in fast path for the final
> version.
>
>
Agreed, I am still working on assessing it and the final version most
likely will be TLAB-based if I can prove to myself and to you the overhead
is acceptable. My initial data and the patch is simple enough that it seems
like it should work. I might send out a webrev with it just to show it and
people can play/comment on it.

- For simplicity, I've removed your actual patches but most are either
because I believe my JDK that I was basing off of was not the right one
compared to the one you are using. If you have a second and can provide me
a link or process on how you get the right mercurial repositories set up,
I'll set myself the same way you are doing it to align myself.

- I'll also double check slowdebug to ensure no problems.

- Classes should extend correct base class for which type of memory is used
> for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
> - The style in heapMonitoring.cpp is a bit different from normal vm-style,
> e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and
> new.
> - In jvmtiHeapTransition.hpp you use C cast instead.
>
>
Noted, I thought I had cleaned up most of them but seemingly not. I'll go
and clean it up and align myself with what I see the surrounding code is
doing between code from runtime and code for the jvmti.

> ##############################
> - This patch I had apply to get traces without setting an ‘unrelated’
> capability
> - Should this not be a new capability?
>
> diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
> --- a/src/share/vm/prims/forte.cpp      Fri Apr 28 15:15:16 2017 +0200
> +++ b/src/share/vm/prims/forte.cpp      Thu May 04 10:24:25 2017 +0200
> @@ -530,6 +530,6 @@
>
> -  if (!JvmtiExport::should_post_class_load()) {
> +/*  if (!JvmtiExport::should_post_class_load()) {
>      trace->num_frames = ticks_no_class_load; // -1
>      return;
> -  }
> +  }*/
>
>
So the way I've been testing all of this for now was by creating a Java
agent that sets everything up. I was waiting to have this conversation for
when we were a bit more into the conversations. Since you have brought it
up:

  caps.can_get_line_numbers = 1;
  caps.can_get_source_file_name = 1;

And I've added some callbacks too to help set things up:

  ernum = global_jvmti->SetEventCallbacks(&callbacks,
sizeof(jvmtiEventCallbacks));
  ernum = global_jvmti->SetEventNotificationMode(JVMTI_ENABLE,
JVMTI_EVENT_VM_INIT, NULL);
  ernum = global_jvmti->SetEventNotificationMode(JVMTI_ENABLE,
JVMTI_EVENT_CLASS_LOAD, NULL);

The class loading callback me to force the creation of the jmethod ids
for the class, which most likely solves your problem here:

Quick question: Does it make sense to put my agent code somewhere in
the webrev for all to see how I've been calling into this?

I was waiting a bit for things to settle a bit first but I could do
that easily. Is there a spot where that would make sense?

##############################
> - forte.cpp: (I know this is not part of your changes but)
> find_jmethod_id_or_null give me NULL for my test.
> It looks like we actually want the regular jmethod_id() ?
>
> Since we are the thread we are talking about (and in same ucontext) and
> thread is in vm and have a last java frame,
> I think most of the checks done in AsyncGetCallTrace is irrelevant, so you
> should be-able to call forte_fill_call_trace_given_top directly.
> But since we might need jmethod_id() if possible to avoid getting method
> id NULL,
> we need some fixes in forte code, or just do the vframStream loop inside
> heapMonitoring.cpp and not use forte.cpp.
>
> Something like:
>
>   if (jthread->has_last_Java_frame()) { // just to be safe
>     vframeStream vfst(jthread);
>     while (!vfst.at_end()) {
>       Method* m = vfst.method();
>       m->jmethod_id();
>       m->line_number_from_bci(vfst.bci());
>       vfst.next();
>     }
>

I have no idea about this and will have to try :). I am open to any
solution that will work and this seems to go around forte.cpp. We have a
few changes in forte.cpp that I have not put here that help get more frames
than the current implementation allows. Let me test this and see if it
works :)

>
> - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
> Line number should be m->line_number_from_bci(bci);
> Do the heapMonitoring suppose to trace with bci or line number?
> I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or
> use another data structure which says bci.
>

Agreed, I just did not want to create another structure straight away
because I thought it was going to be confusing. + it seems that it is not
necessary automatically. For the bci instead of line number, it is possible
that, because this is code that has been ported over and over, we never
cleaned it up or knew that a method was now available. Let me test it out
and see how it works.

>
> ##############################
> - // TODO(jcbeyler): remove this extra code handling the extra trace for
> Please fix all these TODO's :)
>
>
Will do, they were my trackers of TODOs, I prefetched a bit when I sent out
the webrev. I will try to clean out most of them for the next webrev :)

> ##############################
> - heapMonitoring.hpp:
> // TODO(jcbeyler): is this algorithm acceptable in open source?
>
> Why is this comment here? What is the implication?
> Have you tested any simpler algorithm?
>

Yes sorry, this is me being paranoid and trying to figure out if the
licenses are compatible. This implementation exists in the open source
world but I wanted to double check via this TODO that the original code and
the JDK are compatible in licenses before bringing this in.

>From what I've seen here, this is a known fast log implementation/algorithm
and the license seems fine so I believe it is fine but still need to do the
due diligence. I will double check and ensure it is fine :)

>
> ##############################
> - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for
> building the agent)
>

Sounds good, I will look at how to do that :)

>
> ##############################
> - monitoring_period vs HeapMonitorRate, pick rate or period.
>

Agreed.

>
> ##############################
> - globals.hpp
> Why is MaxHeapTraces not settable/overridable from jvmti interface? That
> would be handy.
>

Just to be clear and complete: MaxHeapTraces will actually disappear in the
future webrev:
  1) This was here only to show how the whole system will look with
especially the general jvmti interface, compiler changes, and hooks
  2) In practice, we keep all live sampled objects and we free up the list
via the GC events of sampled objects
  3) In our implementation and the final one that would be shown here:
      - MaxHeapTraces disappears but we would have the analoguous
MaxGarbageHeapTraces
      - That would could use a flag to be settable/overridable

>
> ##############################
> - jvmtiStackTraceData + ASGCT_CallFrame memory
> Are the agent suppose to loop through and free all ASGCT_CallFrame?
> Wouldn't it be better with some kinda protocol, like:
> (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
> (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>

I will change it like you suggest. Our implementation did do the free in
the agent but in the general case, your solution is more logical.

>
> Also using another data structure that have num_traces inside it
> simplifies things.
> So I'm not convinced using the async structure is the best way forward.
>

I'm not yet sure about leaving the async entirely, I will double check on
that before commiting to it. I will let you know.

>
>
> I have more questions, but I think it's better if you respond and update
> the code first.
>

I will work on the code changes and adapt to all your suggestions here.
Hopefully, I've helped answer partially most of them and if you have more,
please feel free to ask them now :) Or, as you said, wait until I send my
next version out if you think that we should first iterate on this before
opening other questions and conversations!

Thank you again for taking the time to do such a thorough review and I hope
I have answered the initial questions well!
Jc

>
> Thanks!
>
> /Robbin
>
>
> On 04/21/2017 11:34 PM, JC Beyler wrote:
>
>> Hi all,
>>
>> I've added size information to the allocation sampling system. This
>> allows the callback to remember the size of each sampled allocation.
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/
>>
>> The new webrev.01 also adds the actual heap monitoring sampling system in
>> files:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/sh
>> are/vm/runtime/heapMonitoring.cpp.patch
>> and
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/sh
>> are/vm/runtime/heapMonitoring.hpp.patch
>>
>> My next step is to add the GC part to the webrev, which will allow users
>> to determine what objects are live and what are garbage.
>>
>> Thanks for your attention and let me know if there are any questions!
>>
>> Have a wonderful Friday!
>> Jc
>>
>> On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <jcbeyler at google.com <mailto:
>> jcbeyler at google.com>> wrote:
>>
>>     Hi all,
>>
>>     I worked on getting a few numbers for overhead and accuracy for my
>> feature. I'm unsure if here is the right place to provide the full data, so
>> I am just summarizing
>>     here for now.
>>
>>     - Overhead of the feature
>>
>>     Using the Dacapo benchmark (http://dacapobench.org/). My initial
>> results are that sampling provides 2.4% with a 512k sampling, 512k being
>> our default setting.
>>
>>     - Note: this was without the tradesoap, tradebeans and tomcat
>> benchmarks since they did not work with my JDK9 (issue between Dacapo and
>> JDK9 it seems)
>>     - I want to rerun next week to ensure number stability
>>
>>     - Accuracy of the feature
>>
>>     I wrote a small microbenchmark that allocates from two different
>> stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90%
>> from stacktrace S2. The
>>     microbenchmark was run 20 times, I averaged the results and looked
>> for accuracy. It seems that statistically it is sound since if I
>> allocated10% S1 and 90% S2, with a
>>     sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
>>     Let me know if there are any questions on the numbers and if you'd
>> like to see some more data.
>>
>>     Note: this was done using our internal JDK8 implementation since the
>> webrev provided by http://cr.openjdk.java.net/~ra
>> sbold/heapz/webrev.00/index.html
>>     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>> does not yet contain the whole implementation and therefore would have been
>> misleading.
>>
>>     Thanks,
>>     Jc
>>
>>
>>     On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <jcbeyler at google.com
>> <mailto:jcbeyler at google.com>> wrote:
>>
>>         Hi all,
>>
>>         To move the discussion forward, with Chuck Rasbold's help to make
>> a webrev, we pushed this:
>>         http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>         415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>>
>>         This is not a final change that does the whole proposition from
>> the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>>         <https://bugs.openjdk.java.net/browse/JDK-8177374>; what it does
>> show is parts of the implementation that is proposed and hopefully can
>> start the conversation going
>>         as I work through the details.
>>
>>         For example, the changes to C2 are done here for the allocations:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/shar
>> e/vm/opto/macro.cpp.patch
>>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch>
>>
>>         Hopefully this all makes sense and thank you for all your future
>> comments!
>>         Jc
>>
>>
>>         On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <jcbeyler at google.com
>> <mailto:jcbeyler at google.com>> wrote:
>>
>>             Hello all,
>>
>>             This is a follow-up from Jeremy's initial email from last
>> year:
>>             http://mail.openjdk.java.net/pipermail/serviceability-dev/20
>> 15-June/017543.html <http://mail.openjdk.java.net/
>> pipermail/serviceability-dev/2015-June/017543.html>
>>
>>             I've gone ahead and started working on preparing this and
>> Jeremy and I went down the route of actually writing it up in JEP form:
>>             https://bugs.openjdk.java.net/browse/JDK-8171119
>>
>>             I think original conversation that happened last year in that
>> thread still holds true:
>>
>>               - We have a patch at Google that we think others might be
>> interested in
>>                  - It provides a means to understand where the allocation
>> hotspots are at a very low overhead
>>                  - Since it is at a low overhead, we can leave it on by
>> default
>>
>>             So I come to the mailing list with Jeremy's initial question:
>>             "I thought I would ask if there is any interest / if I should
>> write a JEP / if I should just forget it."
>>
>>             A year ago, it seemed some thought it was a good idea, is
>> this still true?
>>
>>             Thanks,
>>             Jc
>>
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170505/9527bbf5/attachment.html>