Low-Overhead Heap Profiling

Tue May 16 12:20:33 UTC 2017

Just a few answers,

On 05/15/2017 06:48 PM, JC Beyler wrote:
> Dear all,
> 
> I've updated the webrev to:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/

I'll look at this later, thanks!

> 
> Robbin,
> I believe I have addressed most of your items with webrev 02:
>    - I added a JTreg test to show how it works:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>   - I've modified the code to use its own data structures both internally and externally; this will make it easier to move away from AsyncGetCallTrace as we move forward, which is still on my TODO list
>   - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
>   - I cleaned up other issues as well.
> 
> However, I have three questions, which are probably because I'm new in this community:
>   1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via: hg clone http://hg.openjdk.java.net/jdk10/jdk10 jdk10
>       - I don't see code compatible with what you were showing (i.e. your patches don't make sense for that code base; e.g. klass is still accessed via klass() in collectedHeap.inline.hpp)
>       - Would you know what is the right hg clone command so we are working on the same code base?

We use jdk10-hs, e.g.
hg tclone http://hg.openjdk.java.net/jdk10/hs 10-hs

There are sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...

> 
>   2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc, but which of new vs NEW_C_HEAP_ARRAY should I use? It might be that I don't understand when one uses one or the other, but I see both used around the code base.
>     - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?

We overload operator new when you extend the correct base class, e.g. CHeapObj<mtInternal>, so use 'new'.
But for arrays you will need the macro NEW_C_HEAP_ARRAY.
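
A minimal sketch of that split, using hypothetical stand-ins (SketchCHeapObj, SKETCH_NEW_ARRAY, SKETCH_FREE_ARRAY, sketchInternal) rather than the real HotSpot definitions: single objects pick up the allocator from their base class, while raw arrays go through a macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical memory tag; HotSpot's real tags (mtInternal, ...) are not used here.
enum SketchMemTag { sketchInternal };

// Counts live allocations made through the sketch allocator.
static long g_sketch_live = 0;

static void* sketch_alloc(std::size_t bytes) {
  void* p = std::malloc(bytes);
  if (p == nullptr) throw std::bad_alloc();
  ++g_sketch_live;
  return p;
}

static void sketch_free(void* p) {
  if (p == nullptr) return;
  --g_sketch_live;
  std::free(p);
}

// Stand-in for a CHeapObj-like base: subclasses inherit operator new/delete,
// so a plain 'new SketchTraceData()' is routed through sketch_alloc.
template <SketchMemTag Tag>
struct SketchCHeapObj {
  static void* operator new(std::size_t bytes) { return sketch_alloc(bytes); }
  static void operator delete(void* p) { sketch_free(p); }
};

// Stand-ins for the array macros: arrays of pointers have no base class to
// inherit an allocator from, so they are allocated and freed explicitly.
#define SKETCH_NEW_ARRAY(type, count, tag) \
  static_cast<type*>(sketch_alloc((count) * sizeof(type)))
#define SKETCH_FREE_ARRAY(type, ptr) sketch_free(ptr)

struct SketchTraceData : SketchCHeapObj<sketchInternal> {
  int frame_count = 0;
};

// Allocates one object and one array, frees both, and returns how many
// tracked allocations were released.
long sketch_live_after_roundtrip() {
  SketchTraceData* one = new SketchTraceData();
  SketchTraceData** many = SKETCH_NEW_ARRAY(SketchTraceData*, 8, sketchInternal);
  long live_at_peak = g_sketch_live;
  delete one;
  SKETCH_FREE_ARRAY(SketchTraceData*, many);
  return live_at_peak - g_sketch_live;
}
```

The point is only that the object and the array end up in the same tracked allocator by two different routes; the real macros and base classes do more than this.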

> 
>   3) Casts: same kind of question: which should I use? The code was using a bit of everything. I'll refactor it entirely, but I was not clear if I should go to C casts or C++ casts, as I see both in the codebase. What is the convention I should use?

Just be consistent and use what suits you; C++ casts might be preferable if we are moving towards C++11.
And use the 'right' cast, e.g. going from Thread* to JavaThread* you should use a C cast or static_cast, not reinterpret_cast, I would say.
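
A small sketch of why the choice matters, with hypothetical mock classes (SketchThread, SketchJavaThread, SketchMixin), not the real HotSpot hierarchy. The extra SketchMixin base is an assumption added only to make the difference visible: static_cast applies the base-to-derived pointer adjustment, reinterpret_cast just reuses the address.

```cpp
#include <cassert>

// Hypothetical mock types; the real Thread/JavaThread classes look different.
struct SketchMixin { long pad = 0; };
struct SketchThread { int id = 1; };

// With two non-empty bases, the SketchThread subobject typically does not
// sit at the start of the derived object.
struct SketchJavaThread : SketchMixin, SketchThread {
  int java_state = 42;
};

// The 'right' downcast: the compiler adjusts the pointer as needed.
SketchJavaThread* sketch_downcast(SketchThread* t) {
  return static_cast<SketchJavaThread*>(t);
}

// Reports whether reinterpret_cast would have produced the real object
// address. It only compares addresses and never dereferences the result.
bool sketch_reinterpret_matches(SketchThread* t, SketchJavaThread* real) {
  return static_cast<void*>(reinterpret_cast<SketchJavaThread*>(t)) ==
         static_cast<void*>(real);
}
```

On common ABIs sketch_reinterpret_matches returns false for this layout, which is the kind of silent breakage the advice above avoids; with single inheritance the two casts happen to agree, but static_cast still states the intent.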

> 
> Final notes on this webrev:
>    - I am still missing:
>      - Putting a TLAB implementation so that we can compare both webrevs
>      - Have not tried to circumvent AsyncGetCallTrace
>      - Putting in the handling of GC'd objects
>      - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev
> 
> I will work on integrating those items for the next webrev!

Thanks!

> 
> Thanks for your help,
> Jc
> 
> Ps:  I tested this on a new repo:
> 
> hg clone http://hg.openjdk.java.net/jdk10/jdk10 jdk10
> ... building it
> cd test
> jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/
> 

I'll test it out!

/Robbin

> 
> 
> On Thu, May 4, 2017 at 11:21 PM, serguei.spitsyn at oracle.com wrote:
> 
>     Robbin,
> 
>     Thank you for forwarding!
>     I will review it.
> 
>     Thanks,
>     Serguei
> 
> 
> 
>     On 5/4/17 02:13, Robbin Ehn wrote:
> 
>         Hi,
> 
>         To me the compiler changes look like what is expected.
>         It would be good if someone from compiler could take a look at that.
>         Added compiler to mail thread.
> 
>         Also adding Serguei; it would be good to have his view as well.
> 
>         My initial take on it, having read through most of the code and taken it for a ride:
> 
>         ##############################
>         - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
>         instead of a separate check for sampling in fast path for the final version.
> 
>         ##############################
>         - This patch I had to apply to get it to compile on JDK 10:
> 
>         diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
>         --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
>         +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
>         @@ -87,3 +87,3 @@
>               // support for object alloc event (no-op most of the time)
>         -    if (klass() != NULL && klass()->name() != NULL) {
>         +    if (klass != NULL && klass->name() != NULL) {
>                 Thread *base_thread = Thread::current();
>         diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>         --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
>         +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
>         @@ -316,3 +316,3 @@
>             JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
>         -  assert(o->size() << LogHeapWordSize == byte_size,
>         +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>                    "Object size is incorrect.");
> 
>         ##############################
>         - This patch I had to apply to stop it from asserting in slowdebug:
> 
>         --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
>         +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
>         @@ -32,3 +32,3 @@
>           // TODO(jcbeyler): should we make this into a JVMTI structure?
>         -struct StackTraceData {
>         +struct StackTraceData : CHeapObj<mtInternal> {
>             ASGCT_CallTrace *trace;
>         @@ -143,3 +143,2 @@
>           StackTraceStorage::StackTraceStorage() :
>         -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>               _allocated_traces_size(MaxHeapTraces),
>         @@ -147,2 +146,3 @@
>               _allocated_count(0) {
>         +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
>             memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
>         @@ -152,3 +152,3 @@
>           StackTraceStorage::~StackTraceStorage() {
>         -  delete[] _allocated_traces;
>         +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>           }
> 
>         - Classes should extend the correct base class for the type of memory they use, e.g.: CHeapObj<mt????> or StackObj or AllStatic
>         - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
>         - In jvmtiHeapTransition.hpp you use C cast instead.
> 
>         ##############################
>         - This patch I had to apply to get traces without setting an ‘unrelated’ capability
>         - Should this not be a new capability?
> 
>         diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>         --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
>         +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
>         @@ -530,6 +530,6 @@
> 
>         -  if (!JvmtiExport::should_post_class_load()) {
>         +/*  if (!JvmtiExport::should_post_class_load()) {
>               trace->num_frames = ticks_no_class_load; // -1
>               return;
>         -  }
>         +  }*/
> 
>         ##############################
>         - forte.cpp: (I know this is not part of your changes but)
>         find_jmethod_id_or_null gives me NULL for my test.
>         It looks like we actually want the regular jmethod_id()?
> 
>         Since we are the thread we are talking about (and in the same ucontext), and the thread is in vm and has a last java frame,
>         I think most of the checks done in AsyncGetCallTrace are irrelevant, so you should be able to call forte_fill_call_trace_given_top directly.
>         But since we might need jmethod_id() if possible to avoid getting a NULL method id,
>         we need some fixes in the forte code, or just do the vframeStream loop inside heapMonitoring.cpp and not use forte.cpp.
> 
>         Something like:
> 
>            if (jthread->has_last_Java_frame()) { // just to be safe
>              vframeStream vfst(jthread);
>              while (!vfst.at_end()) {
>                Method* m = vfst.method();
>                m->jmethod_id();
>                m->line_number_from_bci(vfst.bci());
>                vfst.next();
>              }
>            }
> 
>         - This is a bit confusing in forte.cpp: trace->frames[count].lineno = bci.
>         The line number should be m->line_number_from_bci(bci);
>         Is heapMonitoring supposed to trace with bci or line number?
>         I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.
> 
>         ##############################
>         - // TODO(jcbeyler): remove this extra code handling the extra trace for
>         Please fix all these TODO's :)
> 
>         ##############################
>         - heapMonitoring.hpp:
>         // TODO(jcbeyler): is this algorithm acceptable in open source?
> 
>         Why is this comment here? What is the implication?
>         Have you tested any simpler algorithm?
> 
>         ##############################
>         - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)
> 
>         ##############################
>         - monitoring_period vs HeapMonitorRate, pick rate or period.
> 
>         ##############################
>         - globals.hpp
>         Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.
> 
>         ##############################
>         - jvmtiStackTraceData + ASGCT_CallFrame memory
>         Is the agent supposed to loop through and free all ASGCT_CallFrame?
>         Wouldn't it be better with some kind of protocol, like:
>         (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>         (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
> 
>         Also using another data structure that has num_traces inside it simplifies things.
>         So I'm not convinced using the async structure is the best way forward.
> 
> 
>         I have more questions, but I think it's better if you respond and update the code first.
> 
>         Thanks!
> 
>         /Robbin
> 
> 
>         On 04/21/2017 11:34 PM, JC Beyler wrote:
> 
>             Hi all,
> 
>             I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/
> 
>             The new webrev.01 also adds the actual heap monitoring sampling system in files:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>             and
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
> 
>             My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.
> 
>             Thanks for your attention and let me know if there are any questions!
> 
>             Have a wonderful Friday!
>             Jc
> 
>             On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <jcbeyler at google.com> wrote:
> 
>                  Hi all,
> 
>                  I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just
>                  summarizing here for now.
> 
>                  - Overhead of the feature
> 
>                  Using the Dacapo benchmark (http://dacapobench.org/), my initial results are that sampling adds a 2.4% overhead with 512k sampling, 512k being our default setting.
> 
>                  - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
>                  - I want to rerun next week to ensure number stability
> 
>                  - Accuracy of the feature
> 
>                  I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% from stacktrace S1 and 90% from stacktrace S2. The
>                  microbenchmark was run 20 times; I averaged the results and looked for accuracy. It seems that statistically it is sound, since when I allocated 10% S1 and 90% S2 with a
>                  sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
> 
>                  Let me know if there are any questions on the numbers and if you'd like to see some more data.
> 
>                  Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>                  does not yet contain the whole implementation and therefore would have been misleading.
> 
>                  Thanks,
>                  Jc
> 
> 
>                  On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <jcbeyler at google.com> wrote:
> 
>                      Hi all,
> 
>                      To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
>                      http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>                      415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
> 
>                      This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>                      What it does show is parts of the implementation that is proposed, and hopefully it can get the conversation going
>                      as I work through the details.
> 
>                      For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
> 
>                      Hopefully this all makes sense and thank you for all your future comments!
>                      Jc
> 
> 
>                      On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <jcbeyler at google.com> wrote:
> 
>                          Hello all,
> 
>                          This is a follow-up from Jeremy's initial email from last year:
>                          http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
> 
>                          I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
>                          https://bugs.openjdk.java.net/browse/JDK-8171119
> 
>                          I think the original conversation that happened last year in that thread still holds true:
> 
>                            - We have a patch at Google that we think others might be interested in
>                               - It provides a means to understand where the allocation hotspots are at a very low overhead
>                               - Since it is at a low overhead, we can leave it on by default
> 
>                          So I come to the mailing list with Jeremy's initial question:
>                          "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."
> 
>                          A year ago, it seemed some thought it was a good idea, is this still true?
> 
>                          Thanks,
>                          Jc
> 
> 
> 
> 
> 
> 
>