Low-Overhead Heap Profiling
Robbin Ehn
robbin.ehn at oracle.com
Tue May 16 12:20:33 UTC 2017
Just a few answers,
On 05/15/2017 06:48 PM, JC Beyler wrote:
> Dear all,
>
> I've updated the webrev to:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
I'll look at this later, thanks!
>
> Robbin,
> I believe I have addressed most of your items with webrev 02:
> - I added a JTreg test to show how it works:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
> - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move forward, that
> is still on my TODOs
> - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
> - I cleaned up other issues as well.
>
> However, I have three questions, which are probably because I'm new in this community:
> 1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>
> jdk10
> - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for example in
> collectedHeap.inline.hpp)
> - Would you know what is the right hg clone command so we are working on the same code base?
We use jdk10-hs, e.g.
hg tclone http://hg.openjdk.java.net/jdk10/hs 10-hs
There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...
>
> 2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be that I
> don't understand when one uses one or the other but I see both used around the code base?
> - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?
We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
But for arrays you will need the macro NEW_C_HEAP_ARRAY.
>
> 3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts or C++
> casts as I see both in the codebase. What is the convention I should use?
Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.
>
> Final notes on this webrev:
> - I am still missing:
> - Putting a TLAB implementation so that we can compare both webrevs
> - Have not tried to circumvent AsyncGetCallTrace
> - Putting in the handling of GC'd objects
> - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev
>
> I will work on integrating those items for the next webrev!
Thanks!
>
> Thanks for your help,
> Jc
>
> Ps: I tested this on a new repo:
>
> hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> jdk10
> ... building it
> cd test
> jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
> <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/
>
I'll test it out!
/Robbin
>
>
> On Thu, May 4, 2017 at 11:21 PM, serguei.spitsyn at oracle.com <mailto:serguei.spitsyn at oracle.com> <serguei.spitsyn at oracle.com <mailto:serguei.spitsyn at oracle.com>> wrote:
>
> Robbin,
>
> Thank you for forwarding!
> I will review it.
>
> Thanks,
> Serguei
>
>
>
> On 5/4/17 02:13, Robbin Ehn wrote:
>
> Hi,
>
> To me the compiler changes looks what is expected.
> It would be good if someone from compiler could take a look at that.
> Added compiler to mail thread.
>
> Also adding Serguei, It would be good with his view also.
>
> My initial take on it, read through most of the code and took it for a ride.
>
> ##############################
> - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
> instead of a separate check for sampling in fast path for the final version.
>
> ##############################
> - This patch I had to apply to get it compile on JDK 10:
>
> diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
> --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp Fri Apr 28 14:31:38 2017 +0200
> +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp Thu May 04 10:22:56 2017 +0200
> @@ -87,3 +87,3 @@
> // support for object alloc event (no-op most of the time)
> - if (klass() != NULL && klass()->name() != NULL) {
> + if (klass != NULL && klass->name() != NULL) {
> Thread *base_thread = Thread::current();
> diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
> --- a/src/share/vm/runtime/heapMonitoring.cpp Fri Apr 28 14:31:38 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp Thu May 04 10:22:56 2017 +0200
> @@ -316,3 +316,3 @@
> JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
> - assert(o->size() << LogHeapWordSize == byte_size,
> + assert(o->size() << LogHeapWordSize == (long)byte_size,
> "Object size is incorrect.");
>
> ##############################
> - This patch I had to apply to get it not asserting during slowdebug:
>
> --- a/src/share/vm/runtime/heapMonitoring.cpp Fri Apr 28 15:15:16 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp Thu May 04 10:24:25 2017 +0200
> @@ -32,3 +32,3 @@
> // TODO(jcbeyler): should we make this into a JVMTI structure?
> -struct StackTraceData {
> +struct StackTraceData : CHeapObj<mtInternal> {
> ASGCT_CallTrace *trace;
> @@ -143,3 +143,2 @@
> StackTraceStorage::StackTraceStorage() :
> - _allocated_traces(new StackTraceData*[MaxHeapTraces]),
> _allocated_traces_size(MaxHeapTraces),
> @@ -147,2 +146,3 @@
> _allocated_count(0) {
> + _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
> memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
> @@ -152,3 +152,3 @@
> StackTraceStorage::~StackTraceStorage() {
> - delete[] _allocated_traces;
> + FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
> }
>
> - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
> - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
> - In jvmtiHeapTransition.hpp you use C cast instead.
>
> ##############################
> - This patch I had apply to get traces without setting an ‘unrelated’ capability
> - Should this not be a new capability?
>
> diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
> --- a/src/share/vm/prims/forte.cpp Fri Apr 28 15:15:16 2017 +0200
> +++ b/src/share/vm/prims/forte.cpp Thu May 04 10:24:25 2017 +0200
> @@ -530,6 +530,6 @@
>
> - if (!JvmtiExport::should_post_class_load()) {
> +/* if (!JvmtiExport::should_post_class_load()) {
> trace->num_frames = ticks_no_class_load; // -1
> return;
> - }
> + }*/
>
> ##############################
> - forte.cpp: (I know this is not part of your changes but)
> find_jmethod_id_or_null give me NULL for my test.
> It looks like we actually want the regular jmethod_id() ?
>
> Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
> I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
> But since we might need jmethod_id() if possible to avoid getting method id NULL,
> we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.
>
> Something like:
>
> if (jthread->has_last_Java_frame()) { // just to be safe
> vframeStream vfst(jthread);
> while (!vfst.at_end()) {
> Method* m = vfst.method();
> m->jmethod_id();
> m->line_number_from_bci(vfst.bci());
> vfst.next();
> }
>
> - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
> Line number should be m->line_number_from_bci(bci);
> Do the heapMonitoring suppose to trace with bci or line number?
> I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.
>
> ##############################
> - // TODO(jcbeyler): remove this extra code handling the extra trace for
> Please fix all these TODO's :)
>
> ##############################
> - heapMonitoring.hpp:
> // TODO(jcbeyler): is this algorithm acceptable in open source?
>
> Why is this comment here? What is the implication?
> Have you tested any simpler algorithm?
>
> ##############################
> - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)
>
> ##############################
> - monitoring_period vs HeapMonitorRate, pick rate or period.
>
> ##############################
> - globals.hpp
> Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.
>
> ##############################
> - jvmtiStackTraceData + ASGCT_CallFrame memory
> Are the agent suppose to loop through and free all ASGCT_CallFrame?
> Wouldn't it be better with some kinda protocol, like:
> (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
> (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
> Also using another data structure that have num_traces inside it simplifies things.
> So I'm not convinced using the async structure is the best way forward.
>
>
> I have more questions, but I think it's better if you respond and update the code first.
>
> Thanks!
>
> /Robbin
>
>
> On 04/21/2017 11:34 PM, JC Beyler wrote:
>
> Hi all,
>
> I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>
> The new webrev.01 also adds the actual heap monitoring sampling system in files:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
> and
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>
> My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.
>
> Thanks for your attention and let me know if there are any questions!
>
> Have a wonderful Friday!
> Jc
>
> On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:jcbeyler at google.com <mailto:jcbeyler at google.com>>> wrote:
>
> Hi all,
>
> I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just
> summarizing
> here for now.
>
> - Overhead of the feature
>
> Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.
>
> - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
> - I want to rerun next week to ensure number stability
>
> - Accuracy of the feature
>
> I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace
> S2. The
> microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90%
> S2, with a
> sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>
> Let me know if there are any questions on the numbers and if you'd like to see some more data.
>
> Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>> does not yet contain the whole
> implementation and therefore would have been misleading.
>
> Thanks,
> Jc
>
>
> On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:jcbeyler at google.com <mailto:jcbeyler at google.com>>> wrote:
>
> Hi all,
>
> To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
> 415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>
> This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
> <https://bugs.openjdk.java.net/browse/JDK-8177374>
> <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>; what it does show is parts of the implementation that is
> proposed and hopefully can start the conversation going
> as I work through the details.
>
> For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>
> Hopefully this all makes sense and thank you for all your future comments!
> Jc
>
>
> On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <jcbeyler at google.com <mailto:jcbeyler at google.com> <mailto:jcbeyler at google.com <mailto:jcbeyler at google.com>>>
> wrote:
>
> Hello all,
>
> This is a follow-up from Jeremy's initial email from last year:
> http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>
> I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
> https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>
>
> I think original conversation that happened last year in that thread still holds true:
>
> - We have a patch at Google that we think others might be interested in
> - It provides a means to understand where the allocation hotspots are at a very low overhead
> - Since it is at a low overhead, we can leave it on by default
>
> So I come to the mailing list with Jeremy's initial question:
> "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."
>
> A year ago, it seemed some thought it was a good idea, is this still true?
>
> Thanks,
> Jc
>
>
>
>
>
>
>
More information about the serviceability-dev
mailing list