Low-Overhead Heap Profiling
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Fri May 5 06:21:55 UTC 2017
Robbin,
Thank you for forwarding!
I will review it.
Thanks,
Serguei
On 5/4/17 02:13, Robbin Ehn wrote:
> Hi,
>
> To me the compiler changes looks what is expected.
> It would be good if someone from compiler could take a look at that.
> Added compiler to mail thread.
>
> Also adding Serguei, It would be good with his view also.
>
> My initial take on it, read through most of the code and took it for a
> ride.
>
> ##############################
> - Regarding the compiler changes: I think we need the 'TLAB end'
> trickery (mentioned by Tony P)
> instead of a separate check for sampling in fast path for the final
> version.
>
> ##############################
> - This patch I had to apply to get it compile on JDK 10:
>
> diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
> --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp Fri Apr 28
> 14:31:38 2017 +0200
> +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp Thu May 04
> 10:22:56 2017 +0200
> @@ -87,3 +87,3 @@
> // support for object alloc event (no-op most of the time)
> - if (klass() != NULL && klass()->name() != NULL) {
> + if (klass != NULL && klass->name() != NULL) {
> Thread *base_thread = Thread::current();
> diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
> --- a/src/share/vm/runtime/heapMonitoring.cpp Fri Apr 28 14:31:38
> 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp Thu May 04 10:22:56
> 2017 +0200
> @@ -316,3 +316,3 @@
> JavaThread *thread = reinterpret_cast<JavaThread
> *>(Thread::current());
> - assert(o->size() << LogHeapWordSize == byte_size,
> + assert(o->size() << LogHeapWordSize == (long)byte_size,
> "Object size is incorrect.");
>
> ##############################
> - This patch I had to apply to get it not asserting during slowdebug:
>
> --- a/src/share/vm/runtime/heapMonitoring.cpp Fri Apr 28 15:15:16
> 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp Thu May 04 10:24:25
> 2017 +0200
> @@ -32,3 +32,3 @@
> // TODO(jcbeyler): should we make this into a JVMTI structure?
> -struct StackTraceData {
> +struct StackTraceData : CHeapObj<mtInternal> {
> ASGCT_CallTrace *trace;
> @@ -143,3 +143,2 @@
> StackTraceStorage::StackTraceStorage() :
> - _allocated_traces(new StackTraceData*[MaxHeapTraces]),
> _allocated_traces_size(MaxHeapTraces),
> @@ -147,2 +146,3 @@
> _allocated_count(0) {
> + _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*,
> MaxHeapTraces, mtInternal);
> memset(_allocated_traces, 0, sizeof(*_allocated_traces) *
> MaxHeapTraces);
> @@ -152,3 +152,3 @@
> StackTraceStorage::~StackTraceStorage() {
> - delete[] _allocated_traces;
> + FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
> }
>
> - Classes should extend correct base class for which type of memory is
> used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
> - The style in heapMonitoring.cpp is a bit different from normal
> vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
> os::malloc and new.
> - In jvmtiHeapTransition.hpp you use C cast instead.
>
> ##############################
> - This patch I had apply to get traces without setting an ‘unrelated’
> capability
> - Should this not be a new capability?
>
> diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
> --- a/src/share/vm/prims/forte.cpp Fri Apr 28 15:15:16 2017 +0200
> +++ b/src/share/vm/prims/forte.cpp Thu May 04 10:24:25 2017 +0200
> @@ -530,6 +530,6 @@
>
> - if (!JvmtiExport::should_post_class_load()) {
> +/* if (!JvmtiExport::should_post_class_load()) {
> trace->num_frames = ticks_no_class_load; // -1
> return;
> - }
> + }*/
>
> ##############################
> - forte.cpp: (I know this is not part of your changes but)
> find_jmethod_id_or_null give me NULL for my test.
> It looks like we actually want the regular jmethod_id() ?
>
> Since we are the thread we are talking about (and in same ucontext)
> and thread is in vm and have a last java frame,
> I think most of the checks done in AsyncGetCallTrace is irrelevant, so
> you should be-able to call forte_fill_call_trace_given_top directly.
> But since we might need jmethod_id() if possible to avoid getting
> method id NULL,
> we need some fixes in forte code, or just do the vframStream loop
> inside heapMonitoring.cpp and not use forte.cpp.
>
> Something like:
>
> if (jthread->has_last_Java_frame()) { // just to be safe
> vframeStream vfst(jthread);
> while (!vfst.at_end()) {
> Method* m = vfst.method();
> m->jmethod_id();
> m->line_number_from_bci(vfst.bci());
> vfst.next();
> }
>
> - This is a bit confusing in forte.cpp, trace->frames[count].lineno =
> bci.
> Line number should be m->line_number_from_bci(bci);
> Do the heapMonitoring suppose to trace with bci or line number?
> I would say bci, meaning we should either rename
> ASGCT_CallFrame→lineno or use another data structure which says bci.
>
> ##############################
> - // TODO(jcbeyler): remove this extra code handling the extra trace for
> Please fix all these TODO's :)
>
> ##############################
> - heapMonitoring.hpp:
> // TODO(jcbeyler): is this algorithm acceptable in open source?
>
> Why is this comment here? What is the implication?
> Have you tested any simpler algorithm?
>
> ##############################
> - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for
> building the agent)
>
> ##############################
> - monitoring_period vs HeapMonitorRate, pick rate or period.
>
> ##############################
> - globals.hpp
> Why is MaxHeapTraces not settable/overridable from jvmti interface?
> That would be handy.
>
> ##############################
> - jvmtiStackTraceData + ASGCT_CallFrame memory
> Are the agent suppose to loop through and free all ASGCT_CallFrame?
> Wouldn't it be better with some kinda protocol, like:
> (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
> (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
> Also using another data structure that have num_traces inside it
> simplifies things.
> So I'm not convinced using the async structure is the best way forward.
>
>
> I have more questions, but I think it's better if you respond and
> update the code first.
>
> Thanks!
>
> /Robbin
>
>
> On 04/21/2017 11:34 PM, JC Beyler wrote:
>> Hi all,
>>
>> I've added size information to the allocation sampling system. This
>> allows the callback to remember the size of each sampled allocation.
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/
>>
>> The new webrev.01 also adds the actual heap monitoring sampling
>> system in files:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>>
>> and
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>>
>>
>> My next step is to add the GC part to the webrev, which will allow
>> users to determine what objects are live and what are garbage.
>>
>> Thanks for your attention and let me know if there are any questions!
>>
>> Have a wonderful Friday!
>> Jc
>>
>> On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <jcbeyler at google.com
>> <mailto:jcbeyler at google.com>> wrote:
>>
>> Hi all,
>>
>> I worked on getting a few numbers for overhead and accuracy for
>> my feature. I'm unsure if here is the right place to provide the full
>> data, so I am just summarizing
>> here for now.
>>
>> - Overhead of the feature
>>
>> Using the Dacapo benchmark (http://dacapobench.org/). My initial
>> results are that sampling provides 2.4% with a 512k sampling, 512k
>> being our default setting.
>>
>> - Note: this was without the tradesoap, tradebeans and tomcat
>> benchmarks since they did not work with my JDK9 (issue between Dacapo
>> and JDK9 it seems)
>> - I want to rerun next week to ensure number stability
>>
>> - Accuracy of the feature
>>
>> I wrote a small microbenchmark that allocates from two different
>> stacktraces at a given ratio. For example, 10% of stacktrace S1 and
>> 90% from stacktrace S2. The
>> microbenchmark was run 20 times, I averaged the results and
>> looked for accuracy. It seems that statistically it is sound since if
>> I allocated10% S1 and 90% S2, with a
>> sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
>> Let me know if there are any questions on the numbers and if
>> you'd like to see some more data.
>>
>> Note: this was done using our internal JDK8 implementation since
>> the webrev provided by
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html> does
>> not yet contain the whole implementation and therefore would have
>> been misleading.
>>
>> Thanks,
>> Jc
>>
>>
>> On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <jcbeyler at google.com
>> <mailto:jcbeyler at google.com>> wrote:
>>
>> Hi all,
>>
>> To move the discussion forward, with Chuck Rasbold's help to
>> make a webrev, we pushed this:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>> 415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>>
>> This is not a final change that does the whole proposition
>> from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>> <https://bugs.openjdk.java.net/browse/JDK-8177374>; what it does show
>> is parts of the implementation that is proposed and hopefully can
>> start the conversation going
>> as I work through the details.
>>
>> For example, the changes to C2 are done here for the
>> allocations:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>>
>> Hopefully this all makes sense and thank you for all your
>> future comments!
>> Jc
>>
>>
>> On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler
>> <jcbeyler at google.com <mailto:jcbeyler at google.com>> wrote:
>>
>> Hello all,
>>
>> This is a follow-up from Jeremy's initial email from last
>> year:
>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>>
>>
>> I've gone ahead and started working on preparing this and
>> Jeremy and I went down the route of actually writing it up in JEP form:
>> https://bugs.openjdk.java.net/browse/JDK-8171119
>>
>> I think original conversation that happened last year in
>> that thread still holds true:
>>
>> - We have a patch at Google that we think others might
>> be interested in
>> - It provides a means to understand where the
>> allocation hotspots are at a very low overhead
>> - Since it is at a low overhead, we can leave it on
>> by default
>>
>> So I come to the mailing list with Jeremy's initial
>> question:
>> "I thought I would ask if there is any interest / if I
>> should write a JEP / if I should just forget it."
>>
>> A year ago, it seemed some thought it was a good idea, is
>> this still true?
>>
>> Thanks,
>> Jc
>>
>>
>>
>>
>>
More information about the serviceability-dev
mailing list