JDK-8171119: Low-Overhead Heap Profiling
JC Beyler
jcbeyler at google.com
Wed Feb 7 06:43:28 UTC 2018
Hi David,
The new webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.06/
The incremental one is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a_06/
Notes:
- I've also added two new tests that run HeapMonitorStatObjectCorrectnessTest
but with ParallelGC, SerialGC, and CMS.
- I still don't know what would be better for the naming; that is addressed
in my answer to Erik.
- I cleaned up the code a bit more here and there.
Thank you as well for the reviews! I've inlined my answers below:
On Mon, Feb 5, 2018 at 8:40 PM, David Holmes <david.holmes at oracle.com>
wrote:
> Hi Jc,
>
> I've just been browsing this and have a few comments/queries
>
> src/hotspot/share/prims/jvmtiHeapTransition.hpp
>
> In HeapThreadTransition what are the possible entry states? The primary
> state is presumably _in_native, but what else is expected? You can't
> transition from arbitrary states to _in_vm
>
> Also in that file the include guard has the wrong name:
>
> #ifndef SHARE_VM_PRIMS_JVMTIHEAPSAMPLING_HPP
>
> etc.
>
>
Done, I fixed both the jvmtiHeapTransition and the heapMonitoring files;
they were the only ones where I saw that.
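For reference, the guard now matches the file name, presumably along these
lines (a sketch; the exact macro name is whatever the webrev uses):

  #ifndef SHARE_VM_PRIMS_JVMTIHEAPTRANSITION_HPP
  #define SHARE_VM_PRIMS_JVMTIHEAPTRANSITION_HPP
  // ... declarations ...
  #endif // SHARE_VM_PRIMS_JVMTIHEAPTRANSITION_HPP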
> ---
>
> src/hotspot/share/runtime/heapMonitoring.cpp
>
> Can you change StackTraceData from a struct to a class please.
>
> StackTraceData::free_data could still be an instance method even if not
> done in the destructor. I also think the caller should do the delete rather
> than hiding it in here, as this obscures the lifecycle of the instances.
>
>
Done, I cleaned it up into classes and refactored the reference-count
decrement into a single static method, which calls delete if need be. I
think it still looks similar to what you did not like, but perhaps the
renaming and the fact that it is now more integrated makes it better? The
other solution I have is to inline that code across the board so the caller
always has to:
a) decrement the reference count
b) check the reference count
c) if it is 0, get the trace data, free it, then free the instance
But I find that messier, so I kept a system similar to the one before (see
the sketch below)... any better ideas?
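To make the shape concrete, here is a rough sketch of what I mean (names and
fields are approximate, not the exact webrev code):

  class StackTraceData : public CHeapObj<mtInternal> {
   public:
    // The caller only ever calls this; the refcount check and the delete
    // stay hidden in one place.
    static void unreference_and_free(StackTraceData* data) {
      if (data == NULL) {
        return;
      }
      data->_references--;
      if (data->_references == 0) {
        data->free_data();  // releases the jvmtiStackTrace payload
        delete data;
      }
    }
   private:
    int _references;
    void free_data();
  };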
Also, due to this, it exposes the need to use get_oop_addr at the
weak_oops_do stage for the closure. Is there a way around this? This is
what I have:

  oop value = trace.load_oop();
  if (is_alive->do_object_b(value)) {
    // Update the oop to point to the new object if it is still alive.
    f->do_oop(trace.get_oop_addr());
  }

But I imagine I could/should perhaps do:

  oop value = trace.load_oop();
  if (is_alive->do_object_b(value)) {
    // Update the oop to point to the new object if it is still alive.
    f->do_oop(&value);
    trace.store_oop(value);
  }

Which do you think is better in this case?
> 215 void initialize(int max_storage) {
> 216 MutexLocker mu(HeapMonitorStorage_lock);
> 217 allocate_storage(max_storage);
> 218 memset(&_stats, 0, sizeof(_stats));
> 219 }
>
> You're using a lock so obviously expect multiple threads to try this, but
> you aren't checking if initialization has already been done. That check is
> inside allocate_storage, but you're doing the memset unconditionally.
>
Yes, sorry, I added the _stats later and did not see this. Thanks for
catching it. (In truth, this is also protected by the HeapMonitoring class
above, which only calls the initialization once when sampling is enabled,
so no one can call this multiple times in a row; but still, it is better to
be consistent.)
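For what it's worth, the intended shape after the fix is roughly this (a
sketch, not the exact webrev code; _initialized stands in for whatever flag
allocate_storage checks):

  void initialize(int max_storage) {
    MutexLocker mu(HeapMonitorStorage_lock);
    if (_initialized) {
      return;
    }
    allocate_storage(max_storage);
    memset(&_stats, 0, sizeof(_stats));
    _initialized = true;
  }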
>
> You're inconsistent with accessing/modifying _initialized under the
> HeapMonitorStorage_lock. The destructor calls free_storage with no lock
> (obviously) held. It's unclear how this lock-free destructor fits into the
> concurrent usage expected of this class.
StackTraceStorage is a singleton created once and then never destroyed
during the lifetime of the program. Only if you start/stop the profiling
would you call initialize/stop, and those do hold the mutex. The destructor
did not hold it due to an oversight; it probably should, for when the
program ends, to ensure nothing is happening concurrently at that time.
I've reviewed the usage of that class and the mutex and fixed it up to be
consistent (I believe).
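Concretely, the destructor now looks roughly like this (a sketch of the
intent, not the exact webrev code):

  StackTraceStorage::~StackTraceStorage() {
    MutexLocker mu(HeapMonitorStorage_lock);
    free_storage();
  }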
> I see from your reply to Erik that you've now added some orderAccess usage
> to be "paranoid". But we don't want paranoid, we want correct. If there is
> expected to be a lock-free, but potentially concurrent path, then you will
> need to use OrderAccess appropriately. If everything is actually being done
> under a lock, then you don't need to use OrderAccess. But you need to be
> clear on exactly how concurrency comes into play with your code.
>
That is a fair point, and I was not trying to convey that kind of meaning
when I said it that way; I apologize for that. I've gone through the code
and ensured, to the best of my ability, that:
- StackTraceStorage has no lock-free concurrent path to _initialized, so I
removed the OrderAccess usage for it
- HeapMonitoring::_enabled did have one via the enabled() method, so I now
load/store it via OrderAccess
Basically, it amounts to this: HeapMonitoring has a mutex that controls
enabling/disabling and uses OrderAccess to read the enabled flag. All other
public HeapMonitoring methods go directly to StackTraceStorage and hit the
second mutex there if needed, which protects the internal data (see the
sketch below).
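A minimal sketch of what the OrderAccess part amounts to, assuming a static
volatile int _enabled field (names and types are illustrative, not the exact
webrev code):

  bool HeapMonitoring::enabled() {
    return OrderAccess::load_acquire(&_enabled) != 0;
  }

  void HeapMonitoring::set_enabled(bool value) {
    // Only reached while holding the enable/disable mutex.
    OrderAccess::release_store(&_enabled, value ? 1 : 0);
  }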
> 397 _allocated_traces = new (ResourceObj::C_HEAP, mtInternal)
> 398 GrowableArray<StackTraceData>(128, true);
>
> These are allocations that will abort the VM if they fail - correct? If so
> that seems rather user-unfriendly for a profiling tool!
>
I looked around to see what could be done here but failed to find an
example of code that handles this more gracefully (in the sense that it
would not just fail). Do you have a pointer to something I could take
inspiration from so it does not abort? In our production use, this is on by
default, turned on at start, and never turned off, so this case does not
arise, but I can add code to handle it if you have an example.
>
> 422 void StackTraceStorage::weak_oops_do(BoolObjectClosure *is_alive,
> 423 OopClosure *f) {
> 424 MutexLocker mu(HeapMonitorStorage_lock);
>
> IIUC oops-do methods get called at safepoints, but you are then taking a
> Mutex unconditionally. So any other code that acquires the same mutex must
> be guaranteed not to hit a safepoint check - so NoSafepointVerifier should
> be used to check that.
>
Agreed, Robbin pointed me to that and I fixed that in a previous version.
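For reference, the pattern you describe would look roughly like this on the
non-safepoint paths (illustrative only, not the exact webrev code):

  {
    // Take the lock without a safepoint check and verify that no safepoint
    // can happen while it is held, since weak_oops_do takes the same lock
    // at a safepoint.
    MutexLockerEx mu(HeapMonitorStorage_lock, Mutex::_no_safepoint_check_flag);
    NoSafepointVerifier nsv;
    // ... read or update the internal trace storage ...
  }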
>
> 575 void StackTraceStorage::store_garbage_trace(const StackTraceData
> &trace) {
> 576 StackTraceData *new_trace = new StackTraceData();
> 577 *new_trace = trace;
>
> This looks quite odd to me. Shouldn't this just be using a
> copy-constructor?
>
>
Agreed, this got fixed earlier (I don't remember exactly when). It was a
C-ism that had never been cleaned up. As soon as I split the type into a
StackTraceDataWithOop and one without, this disappeared.
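Roughly, the split looks like this (a sketch with illustrative names and
trivial bodies; the real code goes through the Access API for the oop, as
discussed further down in the thread):

  class StackTraceData : public CHeapObj<mtInternal> {
   public:
    explicit StackTraceData(jvmtiStackTrace* trace) : _trace(trace) {}
    jvmtiStackTrace* trace() const { return _trace; }
   private:
    jvmtiStackTrace* _trace;
  };

  // Live traces carry the sampled object; garbage traces do not, so no
  // dead oop is ever kept around.
  class StackTraceDataWithOop : public StackTraceData {
   public:
    StackTraceDataWithOop(jvmtiStackTrace* trace, oop obj)
        : StackTraceData(trace), _obj(obj) {}
    oop load_oop() const { return _obj; }
    void store_oop(oop value) { _obj = value; }
   private:
    oop _obj;
  };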
> Thanks,
> David
>
Thanks again!
Jc
> -----
>
>
> On 27/01/2018 2:01 PM, JC Beyler wrote:
>
>> (Change of subject to contain the bug number :)).
>>
>> New webrev is here:
>> Incremental:
>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03_04/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04/
>>
>> Inlined are my answers again :).
>>
>> On Fri, Jan 26, 2018 at 5:51 AM, Robbin Ehn <robbin.ehn at oracle.com>
>> wrote:
>>
>>> Hi JC, (dropping compiler on mail thread)
>>>
>>> On 01/26/2018 06:45 AM, JC Beyler wrote:
>>>
>>>>
>>>> Thanks Robbin for the reviews :)
>>>>
>>>> The new full webrev is here:
>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/
>>>> The incremental webrev is here:
>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/
>>>>
>>>
>>>
>>> Thanks!
>>>
>>> I got this compile issue:
>>> /home/rehn/source/jdk/small/open/src/hotspot/share/gc/shared
>>> /threadLocalAllocBuffer.hpp:
>>> In static member function 'static void
>>> G1ResManTLABCache::put(ThreadLocalAllocBuffer&, AllocationContext_t)':
>>> /home/rehn/source/jdk/small/open/src/hotspot/share/gc/shared
>>> /threadLocalAllocBuffer.hpp:97:13:
>>> error: 'HeapWord* ThreadLocalAllocBuffer::hard_end()' is private
>>> HeapWord* hard_end();
>>> ^
>>> /home/rehn/source/jdk/small/closed/src/hotspot/share/gc/g1/g
>>> 1ResManTLABCache.cpp:52:50:
>>> error: within this context
>>> size_t remaining = pointer_delta(tlab.hard_end(), tlab.top());
>>>
>>
>> This is strange but I'm assuming it is because we are not working on
>> the same repo?
>>
>> I used:
>> hg clone http://hg.openjdk.java.net/jdk/hs jdkhs-heap
>>
>> I'll try a new clone on Monday and see. My new version moved hard_end
>> back to public so it should work now.
>>
>>
>>
>>
>>>
>>>> I inlined my answers:
>>>>
>>>> On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn <robbin.ehn at oracle.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi JC, great to see another revision!
>>>>>
>>>>> ####
>>>>> heapMonitoring.cpp
>>>>>
>>>>> StackTraceData should not contain the oop for 'safety' reasons.
>>>>> When StackTraceData is moved from _allocated_traces:
>>>>> L452 store_garbage_trace(trace);
>>>>> it contains a dead oop.
>>>>> _allocated_traces could instead be a tuple of oop and StackTraceData,
>>>>> thus dead oops are not kept.
>>>>>
>>>>
>>>>
>>>> Done. I used inheritance to make the copier work regardless, but the
>>>> idea is the same.
>>>>
>>>
>>>
>>> Looks good.
>>>
>>>
>>>>
>>>>> You should use the new Access API for loading the oop, something like
>>>>> this:
>>>>> RootAccess<ON_PHANTOM_OOP_REF | AS_NO_KEEPALIVE>::load(...)
>>>>> I don't think you need to use the Access API for clearing the oop, but
>>>>> it would look nicer. And you probably shouldn't be using:
>>>>> Universe::heap()->is_in_reserved(value)
>>>>>
>>>>
>>>>
>>>> I am unfamiliar with this, but I think I did it the way you wanted
>>>> (all tests pass, so that's a start). I'm not sure how to clear the
>>>> oop exactly; is there code somewhere that does that which I could use
>>>> as a model?
>>>>
>>>> I removed the is_in_reserved; it came from our internal version, and I
>>>> don't know why it was there, but my tests pass without it so I removed
>>>> it :)
>>>>
>>>
>>>
>>> I talked a bit with GC folks about this today.
>>> So actually the oop should be in the oopStorage and you should have a
>>> weak handle to that oop (at least in the near future).
>>> But this should work for now...
>>> In the future, when we have the oop in oopStorage, you will not get
>>> called, so you will not know when the oops are dead; either GC must
>>> trigger a concurrent VM operation (i.e. _not_ a safepoint operation) so
>>> we can clear dead oops, or do periodic scanning.
>>>
>>> It would be good to have some input here from Thomas, who knows what you
>>> are doing and knows GC.
>>>
>>
>> Fair enough, hopefully Thomas will chime in. Are you saying that this
>> first version could go in and we can work on a refinement? Or are you
>> saying I should work on this now at the same time and fix it before
>> this V1 goes in? (Just so I know :))
>>
>>
>>>
>>>>
>>>>
>>>>> The lock:
>>>>> L424 MutexLocker mu(HeapMonitorStorage_lock);
>>>>> is not needed as far as I can see.
>>>>> weak_oops_do is called at a safepoint; no TLAB allocation can happen,
>>>>> and the JVMTI thread can't access these data structures. Is there
>>>>> something more to this lock that I'm missing?
>>>>>
>>>>
>>>>
>>>> Since a thread can call the JVMTI getLiveTraces (or any of the other
>>>> ones), it can get to the point of trying to copy the
>>>> _allocated_traces. I imagine it is possible that this happens during a
>>>> GC, or that it gets started and a GC happens afterwards. Therefore, it
>>>> seems to me that you want this protected, no?
>>>>
>>>
>> Looks like it, yes. I removed it, ran my tests, and they pass, so the new
>> webrev no longer has that mutex at all.
>>
>>
>>>
>>> A thread calling getLiveTraces will be stopped in the
>>> HeapThreadTransition.
>>> A safepoint doesn't allow any threads to be in_vm or in_java during the
>>> safepoint. Only threads in native can execute, but they will be stopped
>>> on any transition. If a thread is in_vm, the safepoint waits for a
>>> transition (locking a mutex is also a transition to blocked). So if
>>> weak_oops is called, you have a guarantee that no threads are executing
>>> inside the VM or executing Java code (not counting GC threads, of
>>> course). This also means that the lock can never be contended when
>>> weak_oops is called, so it's not harmful.
>>>
>>>
>>>>
>>>>
>>>>> ####
>>>>> You have 6 files without any changes in them (any more):
>>>>> g1CollectedHeap.cpp
>>>>> psMarkSweep.cpp
>>>>> psParallelCompact.cpp
>>>>> genCollectedHeap.cpp
>>>>> referenceProcessor.cpp
>>>>> thread.hpp
>>>>>
>>>>>
>>>> Done.
>>>>
>>>> ####
>>>>> I have not looked closely, but is it possible to hide heap sampling in
>>>>> AllocTracer ? (with some minor changes to the AllocTracer API)
>>>>>
>>>>>
>>>> I am imagining that you are saying to move the code that does the
>>>> sampling (change the TLAB end, do the call to HeapMonitoring, etc.)
>>>> into the AllocTracer code itself? I think that is right, and I'll look
>>>> into whether that is possible and prepare a webrev to show what would
>>>> be needed to make it happen.
>>>>
>>>
>>>
>>> Sounds good.
>>>
>>
>> I'll look at this on Monday then!
>>
>> Thanks for the reply and have a great weekend!
>> Jc
>>
>>
>>>
>>>> ####
>>>>> Minor nit: when declaring pointers there is a bit of a mix of putting
>>>>> the '*' next to the type name versus the variable name. (Most HotSpot
>>>>> code puts it next to the type name.)
>>>>> E.g.
>>>>> heapMonitoring.cpp:711 jvmtiStackTrace *trace = ....
>>>>> heapMonitoring.cpp:733 Method* m = vfst.method();
>>>>> (not just this file)
>>>>>
>>>>>
>>>> Done!
>>>>
>>>> ####
>>>>> HeapMonitorThreadOnOffTest.java:77
>>>>> I would make g_tmp volatile, otherwise the assignment in the loop may
>>>>> theoretically be skipped.
>>>>>
>>>>>
>>>> Also done!
>>>>
>>>
>>>
>>> Looks good, thanks!
>>>
>>> /Robbin
>>>
>>>
>>>> Thanks again!
>>>> Jc
>>>>
>>>>
>>>