RFR: ZGC: Add support for JFR leak profiler

Markus Gronlund markus.gronlund at oracle.com
Tue Dec 10 11:47:03 UTC 2019


Hi again Erik,

Looks good.

Thanks
Markus

-----Original Message-----
From: Erik Österlund 
Sent: den 9 december 2019 10:17
To: Stefan Karlsson <stefan.karlsson at oracle.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
Cc: Markus Gronlund <markus.gronlund at oracle.com>; Erik Gahlin <erik.gahlin at oracle.com>
Subject: Re: RFR: ZGC: Add support for JFR leak profiler

Hi,

I renamed some functions from oops_do to weak_oops_do to follow our naming conventions, and changed the filtering in the (now) weak_oops_do function to check if the raw oop is not null, instead of is_dead() which now has a new meaning since last webrev.

Incremental:
http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01_02/

Full:
http://cr.openjdk.java.net/~eosterlund/8235174/webrev.02/

This version has passed tier1-5.

Thanks,
/Erik

On 2019-12-04 17:16, Erik Österlund wrote:
> Hi Stefan,
>
> Thanks for the review.
>
> On 2019-12-04 14:55, Stefan Karlsson wrote:
>> Hi Erik,
>>
>> This looks good to me!
>
> Thanks!
>
>> I think it would be good for the code to limit the number of 
>> UnifedOopRef::addr calls, since those are unsafe ways into the the 
>> addr and _could_ be abused to go around the UnifiedOopRef 
>> encapsulation. Maybe something for the future.
>
> Yeah. I wrote that function because I noticed that the address was 
> used with literally every single address looking type I could imagine, 
> in different places (uintptr_t, address, char*, oop*, void*, you name 
> it). So I just wrote the addr<T> function to be able to perform this 
> change as mechanically as possible.
>
>> Just a small nit:
>>
>> https://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/src/hotspot
>> /share/jfr/leakprofiler/checkpoint/eventEmitter.cpp.frames.html
>>
>>
>> Now that you've limited the use of object_addr, maybe you could also 
>> limit its scope by moving
>>
>> 113   const oop* object_addr = sample->object_addr();
>>
>> to around:
>>
>> 123 edge =
>> edge_store->put(UnifiedOopRef::encode_in_native(object_addr));
>
> Absolutely.
>
> Here is a new webrev with your proposed change, and with some changes 
> from Markus (big thanks to Markus for looking at this in great detail):
> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01/
>
> Incremental:
> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00_01/
>
> His changes mostly massage some types and order some includes. For 
> example Markus prefers passing around UnifiedOopRef as non-const (now 
> that it is a POD, it's like passing around an integer).
>
> The most important change though, was fixing a bug in my patch by 
> making sure that ObjectSampler::weak_oops_do actually clears the oops 
> that are dead. I thought it did, like the other weak_oops_do 
> functions. Turns out that it has not been needed until now. The oops 
> were not cleared, yet the ObjectSample is reused as some kind of 
> cached object (presumably to call malloc less). The consequence was 
> that when the ObjectSample gets subsequently reused with a dead oop in 
> it, set_object() now using NativeAccess<ON_PHANTOM_OOP_REF>::oop_store
> with my patch crashes things, because the G1 barrier backend SATB 
> enqueues what was there in memory before (which is dead garbage 
> memory).
>
> Now one might argue that it's really silly of G1 barriers to SATB 
> enqueue on non-strong stores (because the very snapshot that SATB 
> tracks is the strong graph, which is the graph being marked 
> concurrently, nothing else). But I would like to change that heuristic 
> weirdness in a separate change. In this patch I am happy with the 
> object simply being cleared in weak_oops_do, like it is in all other 
> weak_oops_do functions. That fixes the issue.
>
> Before there was a separate _dead boolean to track that the 
> ObjectSample is dead. Now that the oop is cleared by the GC when it 
> dies, the boolean is no longer needed, because the
> is_dead() function can simply look at the oop.
>
> Thanks,
> /Erik
>
>> Thanks,
>> StefanK
>>
>> On 2019-12-02 11:09, erik.osterlund at oracle.com wrote:
>>> Hi,
>>>
>>> The JFR leak profiler is currently not supported on ZGC, because it 
>>> has not been accessorized appropriately.
>>> This patch adds the required Access API sprinkling necessary, to 
>>> enable this for ZGC. I did the following:
>>>
>>> 1) Renamed UnifiedOop to UnifiedOopRef, because it describes the 
>>> reference to an oop, not the oop itself.
>>> 2) Changed UnifiedOopRef to be a POD, instead of an AllStatic that 
>>> passes around its encoded data as
>>>    const oop* (even though it might be a tagged pointer to a narrow 
>>> oop), improving type safty.
>>> 3) Added another tag bit to UnifiedOopRef, allowing it to keep track 
>>> of whether the described location is
>>>    IN_NATIVE or IN_HEAP. The dereference function performs the 
>>> appropriate NativeAccess/HeapAccess with the
>>>    AS_NO_KEEPALIVE decotator, which is fine because the leak 
>>> profiler does not create any edges to oops
>>>    found during traversal.
>>> 4) Removed code blob oop iteration, because it can't be supported 
>>> appropriately by the current UnifiedOopRef
>>>    mechanism (because oops are misaligned, and UnifiedOopRef 
>>> requires tag bits - I believe this has never
>>>    worked properly)
>>> 6) Accessorized loads on strong roots to 
>>> NativeAccess<AS_NO_KEEPALIVE>::oop_load. JFR leak profiler never
>>>    creates new references to oops that are visited, so it does not 
>>> need to keep things alive
>>> 7) Explicitly told the oop iteration closures to not visit 
>>> referents. The default referent strategy used
>>>    before is DO_DISCOVERY, which doesn't seem right.
>>> 8) Fixed a pre-existing issue where the array index reported in the 
>>> leak chain was calculated incorrectly
>>>    as the byte offset from the array base. It should be scaled by 
>>> the element length of the array.
>>>
>>> Ran some JFR leak profiler tests, and some ad-hoc leak profiling in 
>>> various applications. Also checked that there were no observable 
>>> perf regressions in the iterations, and there were not.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8235174
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/
>>>
>>> Thanks,
>>> /Erik
>>
>



More information about the hotspot-dev mailing list