Low-Overhead Heap Profiling

Fri May 5 06:21:55 UTC 2017

Robbin,

Thank you for forwarding!
I will review it.

Thanks,
Serguei

On 5/4/17 02:13, Robbin Ehn wrote:
> Hi,
>
> To me the compiler changes look as expected.
> It would be good if someone from the compiler team could take a look at that.
> Added compiler to mail thread.
>
> Also adding Serguei; it would be good to have his view as well.
>
> My initial take on it, after reading through most of the code and taking
> it for a ride:
>
> ##############################
> - Regarding the compiler changes: I think we need the 'TLAB end' 
> trickery (mentioned by Tony P)
> instead of a separate check for sampling in fast path for the final 
> version.
>
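A minimal sketch of the 'TLAB end' idea, assuming it means the fast path keeps its single bounds check and sampling is triggered by lowering the end it compares against. The Tlab struct, its fields and the fixed byte interval are hypothetical stand-ins for illustration, not HotSpot code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a thread-local allocation buffer (not HotSpot code).
struct Tlab {
  std::size_t top = 0;          // next free byte offset
  std::size_t hard_end;         // real end of the buffer
  std::size_t end = 0;          // end seen by the fast path (may be lowered)
  std::size_t sample_interval;  // assumed fixed number of bytes between samples
  std::size_t samples = 0;      // number of sampled allocations

  Tlab(std::size_t capacity, std::size_t interval)
      : hard_end(capacity), sample_interval(interval) {
    assert(interval > 0);
    set_sample_point();
  }

  // Lower the fast-path end to the next sample point, capped at the real end.
  void set_sample_point() { end = std::min(hard_end, top + sample_interval); }

  // Fast path: one bounds check, no separate sampling check.
  bool allocate(std::size_t size) {
    if (top + size <= end) {
      top += size;
      return true;
    }
    return allocate_slow(size);
  }

  // Slow path: reached when the sample point or the real end is crossed.
  bool allocate_slow(std::size_t size) {
    if (top + size > hard_end) {
      return false;  // buffer is really full; a real VM would refill it here
    }
    ++samples;       // the allocation that crosses the sample point is sampled
    top += size;
    set_sample_point();
    return true;
  }
};
```

With a 1024-byte buffer and a 100-byte interval, ten 16-byte allocations take the slow path once, at the allocation that crosses byte 100. All others go through the unchanged fast path.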
> ##############################
> - This patch I had to apply to get it to compile on JDK 10:
>
> diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
> --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 
> 14:31:38 2017 +0200
> +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 
> 10:22:56 2017 +0200
> @@ -87,3 +87,3 @@
>      // support for object alloc event (no-op most of the time)
> -    if (klass() != NULL && klass()->name() != NULL) {
> +    if (klass != NULL && klass->name() != NULL) {
>        Thread *base_thread = Thread::current();
> diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
> --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 
> 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 
> 2017 +0200
> @@ -316,3 +316,3 @@
>    JavaThread *thread = reinterpret_cast<JavaThread 
> *>(Thread::current());
> -  assert(o->size() << LogHeapWordSize == byte_size,
> +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>           "Object size is incorrect.");
>
> ##############################
> - This patch I had to apply to stop it from asserting in slowdebug:
>
> --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 
> 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 
> 2017 +0200
> @@ -32,3 +32,3 @@
>  // TODO(jcbeyler): should we make this into a JVMTI structure?
> -struct StackTraceData {
> +struct StackTraceData : CHeapObj<mtInternal> {
>    ASGCT_CallTrace *trace;
> @@ -143,3 +143,2 @@
>  StackTraceStorage::StackTraceStorage() :
> -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>      _allocated_traces_size(MaxHeapTraces),
> @@ -147,2 +146,3 @@
>      _allocated_count(0) {
> +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, 
> MaxHeapTraces, mtInternal);
>    memset(_allocated_traces, 0, sizeof(*_allocated_traces) * 
> MaxHeapTraces);
> @@ -152,3 +152,3 @@
>  StackTraceStorage::~StackTraceStorage() {
> -  delete[] _allocated_traces;
> +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>  }
>
> - Classes should extend the correct base class for the type of memory 
> they use, e.g.: CHeapObj<mt????>, StackObj or AllStatic
> - The style in heapMonitoring.cpp is a bit different from normal 
> vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, 
> os::malloc and new.
> - In jvmtiHeapTransition.hpp you use C cast instead.
>
> ##############################
> - This patch I had to apply to get traces without setting an ‘unrelated’ 
> capability
> - Should this not be a new capability?
>
> diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
> --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
> +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
> @@ -530,6 +530,6 @@
>
> -  if (!JvmtiExport::should_post_class_load()) {
> +/*  if (!JvmtiExport::should_post_class_load()) {
>      trace->num_frames = ticks_no_class_load; // -1
>      return;
> -  }
> +  }*/
>
> ##############################
> - forte.cpp: (I know this is not part of your changes but)
> find_jmethod_id_or_null gives me NULL for my test.
> It looks like we actually want the regular jmethod_id() ?
>
> Since we are the thread we are talking about (and in the same ucontext), 
> and the thread is in the VM and has a last Java frame,
> I think most of the checks done in AsyncGetCallTrace are irrelevant, so 
> you should be able to call forte_fill_call_trace_given_top directly.
> But since we might need jmethod_id() if possible to avoid getting 
> a NULL method id,
> we need some fixes in the forte code, or just do the vframeStream loop 
> inside heapMonitoring.cpp and not use forte.cpp.
>
> Something like:
>
>   if (jthread->has_last_Java_frame()) { // just to be safe
>     vframeStream vfst(jthread);
>     while (!vfst.at_end()) {
>       Method* m = vfst.method();
>       m->jmethod_id();
>       m->line_number_from_bci(vfst.bci());
>       vfst.next();
>     }
>   }
>
> - This is a bit confusing in forte.cpp: trace->frames[count].lineno = 
> bci.
> Line number should be m->line_number_from_bci(bci);
> Is heapMonitoring supposed to trace with bci or line number?
> I would say bci, meaning we should either rename 
> ASGCT_CallFrame→lineno or use another data structure which says bci.
>
> ##############################
> - // TODO(jcbeyler): remove this extra code handling the extra trace for
> Please fix all these TODO's :)
>
> ##############################
> - heapMonitoring.hpp:
> // TODO(jcbeyler): is this algorithm acceptable in open source?
>
> Why is this comment here? What is the implication?
> Have you tested any simpler algorithm?
>
> ##############################
> - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for 
> building the agent)
>
> ##############################
> - monitoring_period vs HeapMonitorRate, pick rate or period.
>
> ##############################
> - globals.hpp
> Why is MaxHeapTraces not settable/overridable from jvmti interface? 
> That would be handy.
>
> ##############################
> - jvmtiStackTraceData + ASGCT_CallFrame memory
> Is the agent supposed to loop through and free all ASGCT_CallFrame?
> Wouldn't it be better with some kind of protocol, like:
> (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
> (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
> Also using another data structure that has num_traces inside it 
> simplifies things.
> So I'm not convinced using the async structure is the best way forward.
>
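A minimal sketch of such a get/release pairing, assuming the provider owns all allocations and the result structure carries its own counts. The names Frame, Trace, Traces, get_live_traces and release_traces are hypothetical and are not JVMTI functions or structures. The traces are zero-filled placeholders rather than real sampled data.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical types for illustration only; these are not the JVMTI structures.
struct Frame { int bci; void* method_id; };
struct Trace { Frame* frames; int num_frames; };
struct Traces { Trace* traces; int num_traces; };

// "Release" side: one call frees everything the "get" call handed out,
// so the agent never has to loop over individual frames itself.
inline void release_traces(Traces* t) {
  for (int i = 0; i < t->num_traces; i++) {
    std::free(t->traces[i].frames);  // free(NULL) is a no-op
  }
  std::free(t->traces);
  t->traces = nullptr;
  t->num_traces = 0;
}

// "Get" side: allocates the result and records the counts inside it.
// A real implementation would copy the sampled stack traces here.
inline bool get_live_traces(Traces* out, int num_traces, int frames_per_trace) {
  assert(num_traces > 0 && frames_per_trace > 0);
  out->traces = static_cast<Trace*>(std::calloc(num_traces, sizeof(Trace)));
  out->num_traces = (out->traces != nullptr) ? num_traces : 0;
  for (int i = 0; i < out->num_traces; i++) {
    Frame* frames =
        static_cast<Frame*>(std::calloc(frames_per_trace, sizeof(Frame)));
    if (frames == nullptr) {
      release_traces(out);  // undo the partial allocation
      return false;
    }
    out->traces[i].frames = frames;
    out->traces[i].num_frames = frames_per_trace;
  }
  return out->traces != nullptr;
}
```

The design point is that whoever allocates also frees, and the caller only has to remember to pair the two calls.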
>
> I have more questions, but I think it's better if you respond and 
> update the code first.
>
> Thanks!
>
> /Robbin
>
>
> On 04/21/2017 11:34 PM, JC Beyler wrote:
>> Hi all,
>>
>> I've added size information to the allocation sampling system. This 
>> allows the callback to remember the size of each sampled allocation.
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/
>>
>> The new webrev.01 also adds the actual heap monitoring sampling 
>> system in files:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch 
>>
>> and
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch 
>>
>>
>> My next step is to add the GC part to the webrev, which will allow 
>> users to determine what objects are live and what are garbage.
>>
>> Thanks for your attention and let me know if there are any questions!
>>
>> Have a wonderful Friday!
>> Jc
>>
>> On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <jcbeyler at google.com> wrote:
>>
>>     Hi all,
>>
>>     I worked on getting a few numbers for overhead and accuracy for 
>> my feature. I'm unsure if here is the right place to provide the full 
>> data, so I am just summarizing
>>     here for now.
>>
>>     - Overhead of the feature
>>
>>     Using the Dacapo benchmark (http://dacapobench.org/), my initial 
>> results are that sampling adds 2.4% overhead with 512k sampling, 512k 
>> being our default setting.
>>
>>     - Note: this was without the tradesoap, tradebeans and tomcat 
>> benchmarks since they did not work with my JDK9 (issue between Dacapo 
>> and JDK9 it seems)
>>     - I want to rerun next week to ensure number stability
>>
>>     - Accuracy of the feature
>>
>>     I wrote a small microbenchmark that allocates from two different 
>> stacktraces at a given ratio. For example, 10% of stacktrace S1 and 
>> 90% from stacktrace S2. The
>>     microbenchmark was run 20 times, I averaged the results and 
>> looked for accuracy. It seems that statistically it is sound since if 
>> I allocated 10% S1 and 90% S2, with a
>>     sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
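A minimal sketch of this kind of accuracy check, as a simulation rather than the actual microbenchmark. It assumes fixed-size objects, a fixed byte interval between samples, and a seeded pseudo-random choice between the two sites. The function name and parameters are hypothetical, and the internal implementation may choose its sample points differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Number of samples attributed to each of the two allocation sites.
struct SampleCounts {
  std::size_t s1 = 0;
  std::size_t s2 = 0;
};

// Simulates allocations that come from site S1 about 10% of the time and
// from S2 otherwise, and samples the allocation that crosses each interval.
inline SampleCounts simulate_sampling(std::size_t allocations,
                                      std::size_t object_bytes,
                                      std::size_t interval_bytes,
                                      std::uint32_t seed) {
  assert(object_bytes > 0 && interval_bytes > 0);
  std::mt19937 rng(seed);
  SampleCounts counts;
  std::size_t bytes_until_sample = interval_bytes;
  for (std::size_t i = 0; i < allocations; i++) {
    const bool from_s1 = (rng() % 10) == 0;  // roughly 10% of allocations
    if (object_bytes >= bytes_until_sample) {
      // This allocation crosses the sample point: attribute it to its site.
      if (from_s1) {
        counts.s1++;
      } else {
        counts.s2++;
      }
      bytes_until_sample = interval_bytes;
    } else {
      bytes_until_sample -= object_bytes;
    }
  }
  return counts;
}
```

Comparing counts.s1 / (counts.s1 + counts.s2) against the 10% allocation ratio gives the same kind of check as the averaged runs described above.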
>>     Let me know if there are any questions on the numbers and if 
>> you'd like to see some more data.
>>
>>     Note: this was done using our internal JDK8 implementation since 
>> the webrev provided by 
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>> does not yet contain the whole implementation and therefore would have 
>> been misleading.
>>
>>     Thanks,
>>     Jc
>>
>>
>>     On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <jcbeyler at google.com> wrote:
>>
>>         Hi all,
>>
>>         To move the discussion forward, with Chuck Rasbold's help to 
>> make a webrev, we pushed this:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
>>         415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>>
>>         This is not a final change that does the whole proposition 
>> from the JBS entry:
>> https://bugs.openjdk.java.net/browse/JDK-8177374
>> What it does show is parts of the implementation that are proposed, 
>> and hopefully it can get the conversation going
>>         as I work through the details.
>>
>>         For example, the changes to C2 are done here for the 
>> allocations: 
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>
>>         Hopefully this all makes sense and thank you for all your 
>> future comments!
>>         Jc
>>
>>
>>         On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <jcbeyler at google.com> wrote:
>>
>>             Hello all,
>>
>>             This is a follow-up from Jeremy's initial email from last 
>> year:
>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html 
>>
>>
>>             I've gone ahead and started working on preparing this and 
>> Jeremy and I went down the route of actually writing it up in JEP form:
>>             https://bugs.openjdk.java.net/browse/JDK-8171119
>>
>>             I think the original conversation that happened last year in 
>> that thread still holds true:
>>
>>               - We have a patch at Google that we think others might 
>> be interested in
>>                  - It provides a means to understand where the 
>> allocation hotspots are at a very low overhead
>>                  - Since it is at a low overhead, we can leave it on 
>> by default
>>
>>             So I come to the mailing list with Jeremy's initial 
>> question:
>>             "I thought I would ask if there is any interest / if I 
>> should write a JEP / if I should just forget it."
>>
>>             A year ago, it seemed some thought it was a good idea, is 
>> this still true?
>>
>>             Thanks,
>>             Jc
>>
>>
>>
>>
>>