RFR: ZGC: Add support for JFR leak profiler

Tue Dec 10 13:05:59 UTC 2019

Hi Stefan,

On 12/10/19 12:46 PM, Stefan Karlsson wrote:
> Looks good.
>
> I think it would be nice to retain/add a comment that we don't follow 
> code cache roots:
> https://cr.openjdk.java.net/~eosterlund/8235174/webrev.02/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp.udiff.html 
>

I will add a comment before pushing.

> We could probably include the misaligned code cache oops by creating 
> temporary, aligned "surrogate" oop*s in, for example, a growable 
> array. Maybe create a bug/rfe for that?

Sure, I'll file an RFE.

Thanks for the review!

/Erik

> Thanks,
> StefanK
>
> On 2019-12-09 10:16, Erik Österlund wrote:
>> Hi,
>>
>> I renamed some functions from oops_do to weak_oops_do to follow our 
>> naming conventions, and changed the filtering
>> in the (now) weak_oops_do function to check if the raw oop is not 
>> null, instead of is_dead() which now has a new
>> meaning since last webrev.
>>
>> Incremental:
>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01_02/
>>
>> Full:
>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.02/
>>
>> This version has passed tier1-5.
>>
>> Thanks,
>> /Erik
>>
>> On 2019-12-04 17:16, Erik Österlund wrote:
>>> Hi Stefan,
>>>
>>> Thanks for the review.
>>>
>>> On 2019-12-04 14:55, Stefan Karlsson wrote:
>>>> Hi Erik,
>>>>
>>>> This looks good to me!
>>>
>>> Thanks!
>>>
>>>> I think it would be good for the code to limit the number of 
>>>> UnifedOopRef::addr calls, since those are unsafe ways into the the 
>>>> addr and _could_ be abused to go around the UnifiedOopRef 
>>>> encapsulation. Maybe something for the future.
>>>
>>> Yeah. I wrote that function because I noticed that the address was 
>>> used with literally every single
>>> address looking type I could imagine, in different places 
>>> (uintptr_t, address, char*, oop*, void*,
>>> you name it). So I just wrote the addr<T> function to be able to 
>>> perform this change as mechanically
>>> as possible.
>>>
>>>> Just a small nit:
>>>>
>>>> https://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/src/hotspot/share/jfr/leakprofiler/checkpoint/eventEmitter.cpp.frames.html 
>>>>
>>>>
>>>> Now that you've limited the use of object_addr, maybe you could 
>>>> also limit its scope by moving
>>>>
>>>> 113   const oop* object_addr = sample->object_addr();
>>>>
>>>> to around:
>>>>
>>>> 123 edge = 
>>>> edge_store->put(UnifiedOopRef::encode_in_native(object_addr));
>>>
>>> Absolutely.
>>>
>>> Here is a new webrev with your proposed change, and with some 
>>> changes from Markus (big thanks to Markus for looking at this in 
>>> great detail):
>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01/
>>>
>>> Incremental:
>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00_01/
>>>
>>> His changes mostly massage some types and order some includes. For 
>>> example Markus prefers passing around UnifiedOopRef
>>> as non-const (now that it is a POD, it's like passing around an 
>>> integer).
>>>
>>> The most important change though, was fixing a bug in my patch by 
>>> making sure that ObjectSampler::weak_oops_do
>>> actually clears the oops that are dead. I thought it did, like the 
>>> other weak_oops_do functions. Turns out
>>> that it has not been needed until now. The oops were not cleared, 
>>> yet the ObjectSample is reused as some
>>> kind of cached object (presumably to call malloc less). The 
>>> consequence was that when the ObjectSample gets subsequently reused 
>>> with a dead oop in it, set_object() now using 
>>> NativeAccess<ON_PHANTOM_OOP_REF>::oop_store
>>> with my patch crashes things, because the G1 barrier backend SATB 
>>> enqueues what was there in memory before
>>> (which is dead garbage memory).
>>>
>>> Now one might argue that it's really silly of G1 barriers to SATB 
>>> enqueue on non-strong stores (because the very
>>> snapshot that SATB tracks is the strong graph, which is the graph 
>>> being marked concurrently, nothing else). But I
>>> would like to change that heuristic weirdness in a separate change. 
>>> In this patch I am happy with the object simply
>>> being cleared in weak_oops_do, like it is in all other weak_oops_do 
>>> functions. That fixes the issue.
>>>
>>> Before there was a separate _dead boolean to track that the 
>>> ObjectSample is dead. Now that the oop is cleared
>>> by the GC when it dies, the boolean is no longer needed, because the 
>>> is_dead() function can simply look at the oop.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>> On 2019-12-02 11:09, erik.osterlund at oracle.com wrote:
>>>>> Hi,
>>>>>
>>>>> The JFR leak profiler is currently not supported on ZGC, because 
>>>>> it has not been accessorized appropriately.
>>>>> This patch adds the required Access API sprinkling necessary, to 
>>>>> enable this for ZGC. I did the following:
>>>>>
>>>>> 1) Renamed UnifiedOop to UnifiedOopRef, because it describes the 
>>>>> reference to an oop, not the oop itself.
>>>>> 2) Changed UnifiedOopRef to be a POD, instead of an AllStatic that 
>>>>> passes around its encoded data as
>>>>>    const oop* (even though it might be a tagged pointer to a 
>>>>> narrow oop), improving type safty.
>>>>> 3) Added another tag bit to UnifiedOopRef, allowing it to keep 
>>>>> track of whether the described location is
>>>>>    IN_NATIVE or IN_HEAP. The dereference function performs the 
>>>>> appropriate NativeAccess/HeapAccess with the
>>>>>    AS_NO_KEEPALIVE decotator, which is fine because the leak 
>>>>> profiler does not create any edges to oops
>>>>>    found during traversal.
>>>>> 4) Removed code blob oop iteration, because it can't be supported 
>>>>> appropriately by the current UnifiedOopRef
>>>>>    mechanism (because oops are misaligned, and UnifiedOopRef 
>>>>> requires tag bits - I believe this has never
>>>>>    worked properly)
>>>>> 6) Accessorized loads on strong roots to 
>>>>> NativeAccess<AS_NO_KEEPALIVE>::oop_load. JFR leak profiler never
>>>>>    creates new references to oops that are visited, so it does not 
>>>>> need to keep things alive
>>>>> 7) Explicitly told the oop iteration closures to not visit 
>>>>> referents. The default referent strategy used
>>>>>    before is DO_DISCOVERY, which doesn't seem right.
>>>>> 8) Fixed a pre-existing issue where the array index reported in 
>>>>> the leak chain was calculated incorrectly
>>>>>    as the byte offset from the array base. It should be scaled by 
>>>>> the element length of the array.
>>>>>
>>>>> Ran some JFR leak profiler tests, and some ad-hoc leak profiling 
>>>>> in various applications. Also checked that
>>>>> there were no observable perf regressions in the iterations, and 
>>>>> there were not.
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8235174
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>
>>>
>>