Low-Overhead Heap Profiling

Wed May 17 03:47:06 UTC 2017

Dear Robbin,

Thank you for the answers, much appreciated :)
Jc

On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:

> Just a few answers,
>
> On 05/15/2017 06:48 PM, JC Beyler wrote:
>
>> Dear all,
>>
>> I've updated the webrev to:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>>
>
> I'll look at this later, thanks!
>
>
>> Robbin,
>> I believe I have addressed most of your items with webrev 02:
>>    - I added a JTreg test to show how it works:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_fi
>> les/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_f
>> iles/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>>   - I've modified the code to use its own data structures both internally
>> and externally, this will make it easier to move out of AsyncGetCallTrace
>> as we move forward, that is still on my TODOs
>>   - I cleaned up the JVMTI API by passing a structure that handles the
>> num_traces and put in a ReleaseTraces as well
>>   - I cleaned up other issues as well.
>>
>> However, I have three questions, which are probably because I'm new in
>> this community:
>>   1) My previous webrevs were based off of JDK9 by mistake. When I took
>> JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10 <
>> http://hg.openjdk.java.net/jdk10/jdk10> jdk10
>>       - I don't see code compatible with what you were showing (ie your
>> patches don't make sense for that code base; ex: klass is still accessed
>> via klass() for example in collectedHeap.inline.hpp)
>>       - Would you know what is the right hg clone command so we are
>> working on the same code base?
>>
>
> We use jdk10-hs, e.g.
> hg tclone http://hg.openjdk.java.net/jdk10/hs 10-hs
>
> There is sporadic big merges going from jdk9->jdk10->jdk10-hs and
> jdk10-hs->jdk10, so 10 is moving...
>
>
>>   2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I
>> cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should
>> I use. It might be that I don't understand when one uses one or the other
>> but I see both used around the code base?
>>     - Is it that new is to be used for anything internal and
>> NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?
>>
>
> We overload new operator when you extend correct base class, e.g.
> CHeapObj<mtInternal> so use 'new'
> But for arrays you will need the macro NEW_C_HEAP_ARRAY.
>
>
>>   3) Casts: same kind question: which should I use. The code was using a
>> bit of everything, I'll refactor it entirely but I was not clear if I
>> should go to C casts or C++ casts as I see both in the codebase. What is
>> the convention I should use?
>>
>
> Just be consist, use what suites you, C++ casts might be preferable, if we
> are moving towards C++11.
> And use 'right' cast, e.g. going from Thread* to JavaThread* you should
> use C cast or static_cast, not reinterpret_cast I would say.
>
>
>> Final notes on this webrev:
>>    - I am still missing:
>>      - Putting a TLAB implementation so that we can compare both webrevs
>>      - Have not tried to circumvent AsyncGetCallTrace
>>      - Putting in the handling of GC'd objects
>>      - Fix a stack walker issue I have seen, I think I know the problem
>> and will test that theory out for the next webrev
>>
>> I will work on integrating those items for the next webrev!
>>
>
> Thanks!
>
>
>> Thanks for your help,
>> Jc
>>
>> Ps:  I tested this on a new repo:
>>
>> hg clone http://hg.openjdk.java.net/jdk10/jdk10 <
>> http://hg.openjdk.java.net/jdk10/jdk10> jdk10
>> ... building it
>> cd test
>> jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server
>> -release/support/test/hotspot/jtreg/native/lib/ -jdk
>> <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk
>> ../hotspot/test/serviceability/jvmti/HeapMonitor/
>>
>>
> I'll test it out!
>
> /Robbin
>
>
>>
>> On Thu, May 4, 2017 at 11:21 PM, serguei.spitsyn at oracle.com <mailto:
>> serguei.spitsyn at oracle.com> <serguei.spitsyn at oracle.com <mailto:
>> serguei.spitsyn at oracle.com>> wrote:
>>
>>     Robbin,
>>
>>     Thank you for forwarding!
>>     I will review it.
>>
>>     Thanks,
>>     Serguei
>>
>>
>>
>>     On 5/4/17 02:13, Robbin Ehn wrote:
>>
>>         Hi,
>>
>>         To me the compiler changes looks what is expected.
>>         It would be good if someone from compiler could take a look at
>> that.
>>         Added compiler to mail thread.
>>
>>         Also adding Serguei, It would be good with his view also.
>>
>>         My initial take on it, read through most of the code and took it
>> for a ride.
>>
>>         ##############################
>>         - Regarding the compiler changes: I think we need the 'TLAB end'
>> trickery (mentioned by Tony P)
>>         instead of a separate check for sampling in fast path for the
>> final version.
>>
>>         ##############################
>>         - This patch I had to apply to get it compile on JDK 10:
>>
>>         diff -r ac3ded340b35 src/share/vm/gc/shared/collect
>> edHeap.inline.hpp
>>         --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr
>> 28 14:31:38 2017 +0200
>>         +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May
>> 04 10:22:56 2017 +0200
>>         @@ -87,3 +87,3 @@
>>               // support for object alloc event (no-op most of the time)
>>         -    if (klass() != NULL && klass()->name() != NULL) {
>>         +    if (klass != NULL && klass->name() != NULL) {
>>                 Thread *base_thread = Thread::current();
>>         diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>>         --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28
>> 14:31:38 2017 +0200
>>         +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04
>> 10:22:56 2017 +0200
>>         @@ -316,3 +316,3 @@
>>             JavaThread *thread = reinterpret_cast<JavaThread
>> *>(Thread::current());
>>         -  assert(o->size() << LogHeapWordSize == byte_size,
>>         +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>>                    "Object size is incorrect.");
>>
>>         ##############################
>>         - This patch I had to apply to get it not asserting during
>> slowdebug:
>>
>>         --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28
>> 15:15:16 2017 +0200
>>         +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04
>> 10:24:25 2017 +0200
>>         @@ -32,3 +32,3 @@
>>           // TODO(jcbeyler): should we make this into a JVMTI structure?
>>         -struct StackTraceData {
>>         +struct StackTraceData : CHeapObj<mtInternal> {
>>             ASGCT_CallTrace *trace;
>>         @@ -143,3 +143,2 @@
>>           StackTraceStorage::StackTraceStorage() :
>>         -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>>               _allocated_traces_size(MaxHeapTraces),
>>         @@ -147,2 +146,3 @@
>>               _allocated_count(0) {
>>         +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*,
>> MaxHeapTraces, mtInternal);
>>             memset(_allocated_traces, 0, sizeof(*_allocated_traces) *
>> MaxHeapTraces);
>>         @@ -152,3 +152,3 @@
>>           StackTraceStorage::~StackTraceStorage() {
>>         -  delete[] _allocated_traces;
>>         +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>>           }
>>
>>         - Classes should extend correct base class for which type of
>> memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
>>         - The style in heapMonitoring.cpp is a bit different from normal
>> vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
>> os::malloc and new.
>>         - In jvmtiHeapTransition.hpp you use C cast instead.
>>
>>         ##############################
>>         - This patch I had apply to get traces without setting an
>> ‘unrelated’ capability
>>         - Should this not be a new capability?
>>
>>         diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>>         --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017
>> +0200
>>         +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017
>> +0200
>>         @@ -530,6 +530,6 @@
>>
>>         -  if (!JvmtiExport::should_post_class_load()) {
>>         +/*  if (!JvmtiExport::should_post_class_load()) {
>>               trace->num_frames = ticks_no_class_load; // -1
>>               return;
>>         -  }
>>         +  }*/
>>
>>         ##############################
>>         - forte.cpp: (I know this is not part of your changes but)
>>         find_jmethod_id_or_null give me NULL for my test.
>>         It looks like we actually want the regular jmethod_id() ?
>>
>>         Since we are the thread we are talking about (and in same
>> ucontext) and thread is in vm and have a last java frame,
>>         I think most of the checks done in AsyncGetCallTrace is
>> irrelevant, so you should be-able to call forte_fill_call_trace_given_top
>> directly.
>>         But since we might need jmethod_id() if possible to avoid getting
>> method id NULL,
>>         we need some fixes in forte code, or just do the vframStream loop
>> inside heapMonitoring.cpp and not use forte.cpp.
>>
>>         Something like:
>>
>>            if (jthread->has_last_Java_frame()) { // just to be safe
>>              vframeStream vfst(jthread);
>>              while (!vfst.at_end()) {
>>                Method* m = vfst.method();
>>                m->jmethod_id();
>>                m->line_number_from_bci(vfst.bci());
>>                vfst.next();
>>              }
>>
>>         - This is a bit confusing in forte.cpp,
>> trace->frames[count].lineno = bci.
>>         Line number should be m->line_number_from_bci(bci);
>>         Do the heapMonitoring suppose to trace with bci or line number?
>>         I would say bci, meaning we should either rename
>> ASGCT_CallFrame→lineno or use another data structure which says bci.
>>
>>         ##############################
>>         - // TODO(jcbeyler): remove this extra code handling the extra
>> trace for
>>         Please fix all these TODO's :)
>>
>>         ##############################
>>         - heapMonitoring.hpp:
>>         // TODO(jcbeyler): is this algorithm acceptable in open source?
>>
>>         Why is this comment here? What is the implication?
>>         Have you tested any simpler algorithm?
>>
>>         ##############################
>>         - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk
>> for building the agent)
>>
>>         ##############################
>>         - monitoring_period vs HeapMonitorRate, pick rate or period.
>>
>>         ##############################
>>         - globals.hpp
>>         Why is MaxHeapTraces not settable/overridable from jvmti
>> interface? That would be handy.
>>
>>         ##############################
>>         - jvmtiStackTraceData + ASGCT_CallFrame memory
>>         Are the agent suppose to loop through and free all
>> ASGCT_CallFrame?
>>         Wouldn't it be better with some kinda protocol, like:
>>         (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>>         (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>>
>>         Also using another data structure that have num_traces inside it
>> simplifies things.
>>         So I'm not convinced using the async structure is the best way
>> forward.
>>
>>
>>         I have more questions, but I think it's better if you respond and
>> update the code first.
>>
>>         Thanks!
>>
>>         /Robbin
>>
>>
>>         On 04/21/2017 11:34 PM, JC Beyler wrote:
>>
>>             Hi all,
>>
>>             I've added size information to the allocation sampling
>> system. This allows the callback to remember the size of each sampled
>> allocation.
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>>
>>             The new webrev.01 also adds the actual heap monitoring
>> sampling system in files:
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/sh
>> are/vm/runtime/heapMonitoring.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/s
>> hare/vm/runtime/heapMonitoring.cpp.patch>
>>             and
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/sh
>> are/vm/runtime/heapMonitoring.hpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/s
>> hare/vm/runtime/heapMonitoring.hpp.patch>
>>
>>             My next step is to add the GC part to the webrev, which will
>> allow users to determine what objects are live and what are garbage.
>>
>>             Thanks for your attention and let me know if there are any
>> questions!
>>
>>             Have a wonderful Friday!
>>             Jc
>>
>>             On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <
>> jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:
>> jcbeyler at google.com <mailto:jcbeyler at google.com>>> wrote:
>>
>>                  Hi all,
>>
>>                  I worked on getting a few numbers for overhead and
>> accuracy for my feature. I'm unsure if here is the right place to provide
>> the full data, so I am just
>>             summarizing
>>                  here for now.
>>
>>                  - Overhead of the feature
>>
>>                  Using the Dacapo benchmark (http://dacapobench.org/).
>> My initial results are that sampling provides 2.4% with a 512k sampling,
>> 512k being our default setting.
>>
>>                  - Note: this was without the tradesoap, tradebeans and
>> tomcat benchmarks since they did not work with my JDK9 (issue between
>> Dacapo and JDK9 it seems)
>>                  - I want to rerun next week to ensure number stability
>>
>>                  - Accuracy of the feature
>>
>>                  I wrote a small microbenchmark that allocates from two
>> different stacktraces at a given ratio. For example, 10% of stacktrace S1
>> and 90% from stacktrace
>>             S2. The
>>                  microbenchmark was run 20 times, I averaged the results
>> and looked for accuracy. It seems that statistically it is sound since if I
>> allocated10% S1 and 90%
>>             S2, with a
>>                  sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
>>                  Let me know if there are any questions on the numbers
>> and if you'd like to see some more data.
>>
>>                  Note: this was done using our internal JDK8
>> implementation since the webrev provided by
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.h
>> tml>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.h
>> tml <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>> does not yet contain the whole
>>             implementation and therefore would have been misleading.
>>
>>                  Thanks,
>>                  Jc
>>
>>
>>                  On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <
>> jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:
>> jcbeyler at google.com <mailto:jcbeyler at google.com>>> wrote:
>>
>>                      Hi all,
>>
>>                      To move the discussion forward, with Chuck Rasbold's
>> help to make a webrev, we pushed this:
>>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.ht
>> ml <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.h
>> tml <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>>                      415 lines changed: 399 ins; 13 del; 3 mod; 51122
>> unchg
>>
>>                      This is not a final change that does the whole
>> proposition from the JBS entry: https://bugs.openjdk.java.net/
>> browse/JDK-8177374
>>             <https://bugs.openjdk.java.net/browse/JDK-8177374>
>>             <https://bugs.openjdk.java.net/browse/JDK-8177374 <
>> https://bugs.openjdk.java.net/browse/JDK-8177374>>; what it does show is
>> parts of the implementation that is
>>             proposed and hopefully can start the conversation going
>>                      as I work through the details.
>>
>>                      For example, the changes to C2 are done here for the
>> allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/shar
>> e/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch>>
>>
>>                      Hopefully this all makes sense and thank you for all
>> your future comments!
>>                      Jc
>>
>>
>>                      On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <
>> jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:
>> jcbeyler at google.com <mailto:jcbeyler at google.com>>>
>>             wrote:
>>
>>                          Hello all,
>>
>>                          This is a follow-up from Jeremy's initial email
>> from last year:
>>             http://mail.openjdk.java.net/pipermail/serviceability-dev/20
>> 15-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2
>> 015-June/017543.html>
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2
>> 015-June/017543.html <http://mail.openjdk.java.net/
>> pipermail/serviceability-dev/2015-June/017543.html>>
>>
>>                          I've gone ahead and started working on preparing
>> this and Jeremy and I went down the route of actually writing it up in JEP
>> form:
>>             https://bugs.openjdk.java.net/browse/JDK-8171119 <
>> https://bugs.openjdk.java.net/browse/JDK-8171119>
>>
>>                          I think original conversation that happened last
>> year in that thread still holds true:
>>
>>                            - We have a patch at Google that we think
>> others might be interested in
>>                               - It provides a means to understand where
>> the allocation hotspots are at a very low overhead
>>                               - Since it is at a low overhead, we can
>> leave it on by default
>>
>>                          So I come to the mailing list with Jeremy's
>> initial question:
>>                          "I thought I would ask if there is any interest
>> / if I should write a JEP / if I should just forget it."
>>
>>                          A year ago, it seemed some thought it was a good
>> idea, is this still true?
>>
>>                          Thanks,
>>                          Jc
>>
>>
>>
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20170516/49af63b8/attachment-0001.html>