Low-Overhead Heap Profiling
JC Beyler
jcbeyler at google.com
Wed May 17 03:47:06 UTC 2017
Dear Robbin,
Thank you for the answers, much appreciated :)
Jc
On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> Just a few answers,
>
> On 05/15/2017 06:48 PM, JC Beyler wrote:
>
>> Dear all,
>>
>> I've updated the webrev to:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>>
>
> I'll look at this later, thanks!
>
>
>> Robbin,
>> I believe I have addressed most of your items with webrev 02:
>> - I added a JTreg test to show how it works:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_fi
>> les/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_f
>> iles/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>> - I've modified the code to use its own data structures both internally
>> and externally, this will make it easier to move out of AsyncGetCallTrace
>> as we move forward, that is still on my TODOs
>> - I cleaned up the JVMTI API by passing a structure that handles the
>> num_traces and put in a ReleaseTraces as well
>> - I cleaned up other issues as well.
>>
>> However, I have three questions, which are probably because I'm new in
>> this community:
>> 1) My previous webrevs were based off of JDK9 by mistake. When I took
>> JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10 <
>> http://hg.openjdk.java.net/jdk10/jdk10> jdk10
>> - I don't see code compatible with what you were showing (ie your
>> patches don't make sense for that code base; ex: klass is still accessed
>> via klass() for example in collectedHeap.inline.hpp)
>> - Would you know what is the right hg clone command so we are
>> working on the same code base?
>>
>
> We use jdk10-hs, e.g.
> hg tclone http://hg.openjdk.java.net/jdk10/hs 10-hs
>
> There is sporadic big merges going from jdk9->jdk10->jdk10-hs and
> jdk10-hs->jdk10, so 10 is moving...
>
>
>> 2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I
>> cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should
>> I use. It might be that I don't understand when one uses one or the other
>> but I see both used around the code base?
>> - Is it that new is to be used for anything internal and
>> NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?
>>
>
> We overload new operator when you extend correct base class, e.g.
> CHeapObj<mtInternal> so use 'new'
> But for arrays you will need the macro NEW_C_HEAP_ARRAY.
>
>
>> 3) Casts: same kind question: which should I use. The code was using a
>> bit of everything, I'll refactor it entirely but I was not clear if I
>> should go to C casts or C++ casts as I see both in the codebase. What is
>> the convention I should use?
>>
>
> Just be consist, use what suites you, C++ casts might be preferable, if we
> are moving towards C++11.
> And use 'right' cast, e.g. going from Thread* to JavaThread* you should
> use C cast or static_cast, not reinterpret_cast I would say.
>
>
>> Final notes on this webrev:
>> - I am still missing:
>> - Putting a TLAB implementation so that we can compare both webrevs
>> - Have not tried to circumvent AsyncGetCallTrace
>> - Putting in the handling of GC'd objects
>> - Fix a stack walker issue I have seen, I think I know the problem
>> and will test that theory out for the next webrev
>>
>> I will work on integrating those items for the next webrev!
>>
>
> Thanks!
>
>
>> Thanks for your help,
>> Jc
>>
>> Ps: I tested this on a new repo:
>>
>> hg clone http://hg.openjdk.java.net/jdk10/jdk10 <
>> http://hg.openjdk.java.net/jdk10/jdk10> jdk10
>> ... building it
>> cd test
>> jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server
>> -release/support/test/hotspot/jtreg/native/lib/ -jdk
>> <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk
>> ../hotspot/test/serviceability/jvmti/HeapMonitor/
>>
>>
> I'll test it out!
>
> /Robbin
>
>
>>
>> On Thu, May 4, 2017 at 11:21 PM, serguei.spitsyn at oracle.com <mailto:
>> serguei.spitsyn at oracle.com> <serguei.spitsyn at oracle.com <mailto:
>> serguei.spitsyn at oracle.com>> wrote:
>>
>> Robbin,
>>
>> Thank you for forwarding!
>> I will review it.
>>
>> Thanks,
>> Serguei
>>
>>
>>
>> On 5/4/17 02:13, Robbin Ehn wrote:
>>
>> Hi,
>>
>> To me the compiler changes looks what is expected.
>> It would be good if someone from compiler could take a look at
>> that.
>> Added compiler to mail thread.
>>
>> Also adding Serguei, It would be good with his view also.
>>
>> My initial take on it, read through most of the code and took it
>> for a ride.
>>
>> ##############################
>> - Regarding the compiler changes: I think we need the 'TLAB end'
>> trickery (mentioned by Tony P)
>> instead of a separate check for sampling in fast path for the
>> final version.
>>
>> ##############################
>> - This patch I had to apply to get it compile on JDK 10:
>>
>> diff -r ac3ded340b35 src/share/vm/gc/shared/collect
>> edHeap.inline.hpp
>> --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp Fri Apr
>> 28 14:31:38 2017 +0200
>> +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp Thu May
>> 04 10:22:56 2017 +0200
>> @@ -87,3 +87,3 @@
>> // support for object alloc event (no-op most of the time)
>> - if (klass() != NULL && klass()->name() != NULL) {
>> + if (klass != NULL && klass->name() != NULL) {
>> Thread *base_thread = Thread::current();
>> diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>> --- a/src/share/vm/runtime/heapMonitoring.cpp Fri Apr 28
>> 14:31:38 2017 +0200
>> +++ b/src/share/vm/runtime/heapMonitoring.cpp Thu May 04
>> 10:22:56 2017 +0200
>> @@ -316,3 +316,3 @@
>> JavaThread *thread = reinterpret_cast<JavaThread
>> *>(Thread::current());
>> - assert(o->size() << LogHeapWordSize == byte_size,
>> + assert(o->size() << LogHeapWordSize == (long)byte_size,
>> "Object size is incorrect.");
>>
>> ##############################
>> - This patch I had to apply to get it not asserting during
>> slowdebug:
>>
>> --- a/src/share/vm/runtime/heapMonitoring.cpp Fri Apr 28
>> 15:15:16 2017 +0200
>> +++ b/src/share/vm/runtime/heapMonitoring.cpp Thu May 04
>> 10:24:25 2017 +0200
>> @@ -32,3 +32,3 @@
>> // TODO(jcbeyler): should we make this into a JVMTI structure?
>> -struct StackTraceData {
>> +struct StackTraceData : CHeapObj<mtInternal> {
>> ASGCT_CallTrace *trace;
>> @@ -143,3 +143,2 @@
>> StackTraceStorage::StackTraceStorage() :
>> - _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>> _allocated_traces_size(MaxHeapTraces),
>> @@ -147,2 +146,3 @@
>> _allocated_count(0) {
>> + _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*,
>> MaxHeapTraces, mtInternal);
>> memset(_allocated_traces, 0, sizeof(*_allocated_traces) *
>> MaxHeapTraces);
>> @@ -152,3 +152,3 @@
>> StackTraceStorage::~StackTraceStorage() {
>> - delete[] _allocated_traces;
>> + FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>> }
>>
>> - Classes should extend correct base class for which type of
>> memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
>> - The style in heapMonitoring.cpp is a bit different from normal
>> vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
>> os::malloc and new.
>> - In jvmtiHeapTransition.hpp you use C cast instead.
>>
>> ##############################
>> - This patch I had apply to get traces without setting an
>> ‘unrelated’ capability
>> - Should this not be a new capability?
>>
>> diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>> --- a/src/share/vm/prims/forte.cpp Fri Apr 28 15:15:16 2017
>> +0200
>> +++ b/src/share/vm/prims/forte.cpp Thu May 04 10:24:25 2017
>> +0200
>> @@ -530,6 +530,6 @@
>>
>> - if (!JvmtiExport::should_post_class_load()) {
>> +/* if (!JvmtiExport::should_post_class_load()) {
>> trace->num_frames = ticks_no_class_load; // -1
>> return;
>> - }
>> + }*/
>>
>> ##############################
>> - forte.cpp: (I know this is not part of your changes but)
>> find_jmethod_id_or_null give me NULL for my test.
>> It looks like we actually want the regular jmethod_id() ?
>>
>> Since we are the thread we are talking about (and in same
>> ucontext) and thread is in vm and have a last java frame,
>> I think most of the checks done in AsyncGetCallTrace is
>> irrelevant, so you should be-able to call forte_fill_call_trace_given_top
>> directly.
>> But since we might need jmethod_id() if possible to avoid getting
>> method id NULL,
>> we need some fixes in forte code, or just do the vframStream loop
>> inside heapMonitoring.cpp and not use forte.cpp.
>>
>> Something like:
>>
>> if (jthread->has_last_Java_frame()) { // just to be safe
>> vframeStream vfst(jthread);
>> while (!vfst.at_end()) {
>> Method* m = vfst.method();
>> m->jmethod_id();
>> m->line_number_from_bci(vfst.bci());
>> vfst.next();
>> }
>>
>> - This is a bit confusing in forte.cpp,
>> trace->frames[count].lineno = bci.
>> Line number should be m->line_number_from_bci(bci);
>> Do the heapMonitoring suppose to trace with bci or line number?
>> I would say bci, meaning we should either rename
>> ASGCT_CallFrame→lineno or use another data structure which says bci.
>>
>> ##############################
>> - // TODO(jcbeyler): remove this extra code handling the extra
>> trace for
>> Please fix all these TODO's :)
>>
>> ##############################
>> - heapMonitoring.hpp:
>> // TODO(jcbeyler): is this algorithm acceptable in open source?
>>
>> Why is this comment here? What is the implication?
>> Have you tested any simpler algorithm?
>>
>> ##############################
>> - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk
>> for building the agent)
>>
>> ##############################
>> - monitoring_period vs HeapMonitorRate, pick rate or period.
>>
>> ##############################
>> - globals.hpp
>> Why is MaxHeapTraces not settable/overridable from jvmti
>> interface? That would be handy.
>>
>> ##############################
>> - jvmtiStackTraceData + ASGCT_CallFrame memory
>> Are the agent suppose to loop through and free all
>> ASGCT_CallFrame?
>> Wouldn't it be better with some kinda protocol, like:
>> (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>> (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>>
>> Also using another data structure that have num_traces inside it
>> simplifies things.
>> So I'm not convinced using the async structure is the best way
>> forward.
>>
>>
>> I have more questions, but I think it's better if you respond and
>> update the code first.
>>
>> Thanks!
>>
>> /Robbin
>>
>>
>> On 04/21/2017 11:34 PM, JC Beyler wrote:
>>
>> Hi all,
>>
>> I've added size information to the allocation sampling
>> system. This allows the callback to remember the size of each sampled
>> allocation.
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>>
>> The new webrev.01 also adds the actual heap monitoring
>> sampling system in files:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/sh
>> are/vm/runtime/heapMonitoring.cpp.patch
>> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/s
>> hare/vm/runtime/heapMonitoring.cpp.patch>
>> and
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/sh
>> are/vm/runtime/heapMonitoring.hpp.patch
>> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/s
>> hare/vm/runtime/heapMonitoring.hpp.patch>
>>
>> My next step is to add the GC part to the webrev, which will
>> allow users to determine what objects are live and what are garbage.
>>
>> Thanks for your attention and let me know if there are any
>> questions!
>>
>> Have a wonderful Friday!
>> Jc
>>
>> On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <
>> jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:
>> jcbeyler at google.com <mailto:jcbeyler at google.com>>> wrote:
>>
>> Hi all,
>>
>> I worked on getting a few numbers for overhead and
>> accuracy for my feature. I'm unsure if here is the right place to provide
>> the full data, so I am just
>> summarizing
>> here for now.
>>
>> - Overhead of the feature
>>
>> Using the Dacapo benchmark (http://dacapobench.org/).
>> My initial results are that sampling provides 2.4% with a 512k sampling,
>> 512k being our default setting.
>>
>> - Note: this was without the tradesoap, tradebeans and
>> tomcat benchmarks since they did not work with my JDK9 (issue between
>> Dacapo and JDK9 it seems)
>> - I want to rerun next week to ensure number stability
>>
>> - Accuracy of the feature
>>
>> I wrote a small microbenchmark that allocates from two
>> different stacktraces at a given ratio. For example, 10% of stacktrace S1
>> and 90% from stacktrace
>> S2. The
>> microbenchmark was run 20 times, I averaged the results
>> and looked for accuracy. It seems that statistically it is sound since if I
>> allocated10% S1 and 90%
>> S2, with a
>> sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
>> Let me know if there are any questions on the numbers
>> and if you'd like to see some more data.
>>
>> Note: this was done using our internal JDK8
>> implementation since the webrev provided by
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.h
>> tml>
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.h
>> tml <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>> does not yet contain the whole
>> implementation and therefore would have been misleading.
>>
>> Thanks,
>> Jc
>>
>>
>> On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <
>> jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:
>> jcbeyler at google.com <mailto:jcbeyler at google.com>>> wrote:
>>
>> Hi all,
>>
>> To move the discussion forward, with Chuck Rasbold's
>> help to make a webrev, we pushed this:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.ht
>> ml <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.h
>> tml <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>> 415 lines changed: 399 ins; 13 del; 3 mod; 51122
>> unchg
>>
>> This is not a final change that does the whole
>> proposition from the JBS entry: https://bugs.openjdk.java.net/
>> browse/JDK-8177374
>> <https://bugs.openjdk.java.net/browse/JDK-8177374>
>> <https://bugs.openjdk.java.net/browse/JDK-8177374 <
>> https://bugs.openjdk.java.net/browse/JDK-8177374>>; what it does show is
>> parts of the implementation that is
>> proposed and hopefully can start the conversation going
>> as I work through the details.
>>
>> For example, the changes to C2 are done here for the
>> allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/shar
>> e/vm/opto/macro.cpp.patch
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch>
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/sha
>> re/vm/opto/macro.cpp.patch>>
>>
>> Hopefully this all makes sense and thank you for all
>> your future comments!
>> Jc
>>
>>
>> On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <
>> jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:
>> jcbeyler at google.com <mailto:jcbeyler at google.com>>>
>> wrote:
>>
>> Hello all,
>>
>> This is a follow-up from Jeremy's initial email
>> from last year:
>> http://mail.openjdk.java.net/pipermail/serviceability-dev/20
>> 15-June/017543.html
>> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2
>> 015-June/017543.html>
>> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2
>> 015-June/017543.html <http://mail.openjdk.java.net/
>> pipermail/serviceability-dev/2015-June/017543.html>>
>>
>> I've gone ahead and started working on preparing
>> this and Jeremy and I went down the route of actually writing it up in JEP
>> form:
>> https://bugs.openjdk.java.net/browse/JDK-8171119 <
>> https://bugs.openjdk.java.net/browse/JDK-8171119>
>>
>> I think original conversation that happened last
>> year in that thread still holds true:
>>
>> - We have a patch at Google that we think
>> others might be interested in
>> - It provides a means to understand where
>> the allocation hotspots are at a very low overhead
>> - Since it is at a low overhead, we can
>> leave it on by default
>>
>> So I come to the mailing list with Jeremy's
>> initial question:
>> "I thought I would ask if there is any interest
>> / if I should write a JEP / if I should just forget it."
>>
>> A year ago, it seemed some thought it was a good
>> idea, is this still true?
>>
>> Thanks,
>> Jc
>>
>>
>>
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20170516/49af63b8/attachment-0001.html>
More information about the serviceability-dev
mailing list