RFR: ZGC: Add support for JFR leak profiler

Wed Dec 4 16:16:40 UTC 2019

Hi Stefan,

Thanks for the review.

On 2019-12-04 14:55, Stefan Karlsson wrote:
> Hi Erik,
>
> This looks good to me!

Thanks!

> I think it would be good for the code to limit the number of 
> UnifedOopRef::addr calls, since those are unsafe ways into the the 
> addr and _could_ be abused to go around the UnifiedOopRef 
> encapsulation. Maybe something for the future.

Yeah. I wrote that function because I noticed that the address was used 
with literally every single
address looking type I could imagine, in different places (uintptr_t, 
address, char*, oop*, void*,
you name it). So I just wrote the addr<T> function to be able to perform 
this change as mechanically
as possible.

> Just a small nit:
>
> https://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/src/hotspot/share/jfr/leakprofiler/checkpoint/eventEmitter.cpp.frames.html 
>
>
> Now that you've limited the use of object_addr, maybe you could also 
> limit its scope by moving
>
> 113   const oop* object_addr = sample->object_addr();
>
> to around:
>
> 123 edge = edge_store->put(UnifiedOopRef::encode_in_native(object_addr));

Absolutely.

Here is a new webrev with your proposed change, and with some changes 
from Markus (big thanks to Markus for looking at this in great detail):
http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01/

Incremental:
http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00_01/

His changes mostly massage some types and order some includes. For 
example Markus prefers passing around UnifiedOopRef
as non-const (now that it is a POD, it's like passing around an integer).

The most important change though, was fixing a bug in my patch by making 
sure that ObjectSampler::weak_oops_do
actually clears the oops that are dead. I thought it did, like the other 
weak_oops_do functions. Turns out
that it has not been needed until now. The oops were not cleared, yet 
the ObjectSample is reused as some
kind of cached object (presumably to call malloc less). The consequence 
was that when the ObjectSample gets subsequently reused with a dead oop 
in it, set_object() now using NativeAccess<ON_PHANTOM_OOP_REF>::oop_store
with my patch crashes things, because the G1 barrier backend SATB 
enqueues what was there in memory before
(which is dead garbage memory).

Now one might argue that it's really silly of G1 barriers to SATB 
enqueue on non-strong stores (because the very
snapshot that SATB tracks is the strong graph, which is the graph being 
marked concurrently, nothing else). But I
would like to change that heuristic weirdness in a separate change. In 
this patch I am happy with the object simply
being cleared in weak_oops_do, like it is in all other weak_oops_do 
functions. That fixes the issue.

Before there was a separate _dead boolean to track that the ObjectSample 
is dead. Now that the oop is cleared
by the GC when it dies, the boolean is no longer needed, because the 
is_dead() function can simply look at the oop.

Thanks,
/Erik

> Thanks,
> StefanK
>
> On 2019-12-02 11:09, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> The JFR leak profiler is currently not supported on ZGC, because it 
>> has not been accessorized appropriately.
>> This patch adds the required Access API sprinkling necessary, to 
>> enable this for ZGC. I did the following:
>>
>> 1) Renamed UnifiedOop to UnifiedOopRef, because it describes the 
>> reference to an oop, not the oop itself.
>> 2) Changed UnifiedOopRef to be a POD, instead of an AllStatic that 
>> passes around its encoded data as
>>    const oop* (even though it might be a tagged pointer to a narrow 
>> oop), improving type safty.
>> 3) Added another tag bit to UnifiedOopRef, allowing it to keep track 
>> of whether the described location is
>>    IN_NATIVE or IN_HEAP. The dereference function performs the 
>> appropriate NativeAccess/HeapAccess with the
>>    AS_NO_KEEPALIVE decotator, which is fine because the leak profiler 
>> does not create any edges to oops
>>    found during traversal.
>> 4) Removed code blob oop iteration, because it can't be supported 
>> appropriately by the current UnifiedOopRef
>>    mechanism (because oops are misaligned, and UnifiedOopRef requires 
>> tag bits - I believe this has never
>>    worked properly)
>> 6) Accessorized loads on strong roots to 
>> NativeAccess<AS_NO_KEEPALIVE>::oop_load. JFR leak profiler never
>>    creates new references to oops that are visited, so it does not 
>> need to keep things alive
>> 7) Explicitly told the oop iteration closures to not visit referents. 
>> The default referent strategy used
>>    before is DO_DISCOVERY, which doesn't seem right.
>> 8) Fixed a pre-existing issue where the array index reported in the 
>> leak chain was calculated incorrectly
>>    as the byte offset from the array base. It should be scaled by the 
>> element length of the array.
>>
>> Ran some JFR leak profiler tests, and some ad-hoc leak profiling in 
>> various applications. Also checked that
>> there were no observable perf regressions in the iterations, and 
>> there were not.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8235174
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/
>>
>> Thanks,
>> /Erik
>