RFR: ZGC: Add support for JFR leak profiler
erik.osterlund at oracle.com
erik.osterlund at oracle.com
Tue Dec 10 13:05:59 UTC 2019
Hi Stefan,
On 12/10/19 12:46 PM, Stefan Karlsson wrote:
> Looks good.
>
> I think it would be nice to retain/add a comment that we don't follow
> code cache roots:
> https://cr.openjdk.java.net/~eosterlund/8235174/webrev.02/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp.udiff.html
>
I will add a comment before pushing.
> We could probably include the misaligned code cache oops by creating
> temporary, aligned "surrogate" oop*s in, for example, a growable
> array. Maybe create a bug/rfe for that?
Sure, I'll file an RFE.
Thanks for the review!
/Erik
> Thanks,
> StefanK
>
> On 2019-12-09 10:16, Erik Österlund wrote:
>> Hi,
>>
>> I renamed some functions from oops_do to weak_oops_do to follow our
>> naming conventions, and changed the filtering
>> in the (now) weak_oops_do function to check if the raw oop is not
>> null, instead of is_dead() which now has a new
>> meaning since last webrev.
>>
>> Incremental:
>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01_02/
>>
>> Full:
>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.02/
>>
>> This version has passed tier1-5.
>>
>> Thanks,
>> /Erik
>>
>> On 2019-12-04 17:16, Erik Österlund wrote:
>>> Hi Stefan,
>>>
>>> Thanks for the review.
>>>
>>> On 2019-12-04 14:55, Stefan Karlsson wrote:
>>>> Hi Erik,
>>>>
>>>> This looks good to me!
>>>
>>> Thanks!
>>>
>>>> I think it would be good for the code to limit the number of
>>>> UnifedOopRef::addr calls, since those are unsafe ways into the the
>>>> addr and _could_ be abused to go around the UnifiedOopRef
>>>> encapsulation. Maybe something for the future.
>>>
>>> Yeah. I wrote that function because I noticed that the address was
>>> used with literally every single
>>> address looking type I could imagine, in different places
>>> (uintptr_t, address, char*, oop*, void*,
>>> you name it). So I just wrote the addr<T> function to be able to
>>> perform this change as mechanically
>>> as possible.
>>>
>>>> Just a small nit:
>>>>
>>>> https://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/src/hotspot/share/jfr/leakprofiler/checkpoint/eventEmitter.cpp.frames.html
>>>>
>>>>
>>>> Now that you've limited the use of object_addr, maybe you could
>>>> also limit its scope by moving
>>>>
>>>> 113 const oop* object_addr = sample->object_addr();
>>>>
>>>> to around:
>>>>
>>>> 123 edge =
>>>> edge_store->put(UnifiedOopRef::encode_in_native(object_addr));
>>>
>>> Absolutely.
>>>
>>> Here is a new webrev with your proposed change, and with some
>>> changes from Markus (big thanks to Markus for looking at this in
>>> great detail):
>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.01/
>>>
>>> Incremental:
>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00_01/
>>>
>>> His changes mostly massage some types and order some includes. For
>>> example Markus prefers passing around UnifiedOopRef
>>> as non-const (now that it is a POD, it's like passing around an
>>> integer).
>>>
>>> The most important change though, was fixing a bug in my patch by
>>> making sure that ObjectSampler::weak_oops_do
>>> actually clears the oops that are dead. I thought it did, like the
>>> other weak_oops_do functions. Turns out
>>> that it has not been needed until now. The oops were not cleared,
>>> yet the ObjectSample is reused as some
>>> kind of cached object (presumably to call malloc less). The
>>> consequence was that when the ObjectSample gets subsequently reused
>>> with a dead oop in it, set_object() now using
>>> NativeAccess<ON_PHANTOM_OOP_REF>::oop_store
>>> with my patch crashes things, because the G1 barrier backend SATB
>>> enqueues what was there in memory before
>>> (which is dead garbage memory).
>>>
>>> Now one might argue that it's really silly of G1 barriers to SATB
>>> enqueue on non-strong stores (because the very
>>> snapshot that SATB tracks is the strong graph, which is the graph
>>> being marked concurrently, nothing else). But I
>>> would like to change that heuristic weirdness in a separate change.
>>> In this patch I am happy with the object simply
>>> being cleared in weak_oops_do, like it is in all other weak_oops_do
>>> functions. That fixes the issue.
>>>
>>> Before there was a separate _dead boolean to track that the
>>> ObjectSample is dead. Now that the oop is cleared
>>> by the GC when it dies, the boolean is no longer needed, because the
>>> is_dead() function can simply look at the oop.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>> On 2019-12-02 11:09, erik.osterlund at oracle.com wrote:
>>>>> Hi,
>>>>>
>>>>> The JFR leak profiler is currently not supported on ZGC, because
>>>>> it has not been accessorized appropriately.
>>>>> This patch adds the required Access API sprinkling necessary, to
>>>>> enable this for ZGC. I did the following:
>>>>>
>>>>> 1) Renamed UnifiedOop to UnifiedOopRef, because it describes the
>>>>> reference to an oop, not the oop itself.
>>>>> 2) Changed UnifiedOopRef to be a POD, instead of an AllStatic that
>>>>> passes around its encoded data as
>>>>> const oop* (even though it might be a tagged pointer to a
>>>>> narrow oop), improving type safty.
>>>>> 3) Added another tag bit to UnifiedOopRef, allowing it to keep
>>>>> track of whether the described location is
>>>>> IN_NATIVE or IN_HEAP. The dereference function performs the
>>>>> appropriate NativeAccess/HeapAccess with the
>>>>> AS_NO_KEEPALIVE decotator, which is fine because the leak
>>>>> profiler does not create any edges to oops
>>>>> found during traversal.
>>>>> 4) Removed code blob oop iteration, because it can't be supported
>>>>> appropriately by the current UnifiedOopRef
>>>>> mechanism (because oops are misaligned, and UnifiedOopRef
>>>>> requires tag bits - I believe this has never
>>>>> worked properly)
>>>>> 6) Accessorized loads on strong roots to
>>>>> NativeAccess<AS_NO_KEEPALIVE>::oop_load. JFR leak profiler never
>>>>> creates new references to oops that are visited, so it does not
>>>>> need to keep things alive
>>>>> 7) Explicitly told the oop iteration closures to not visit
>>>>> referents. The default referent strategy used
>>>>> before is DO_DISCOVERY, which doesn't seem right.
>>>>> 8) Fixed a pre-existing issue where the array index reported in
>>>>> the leak chain was calculated incorrectly
>>>>> as the byte offset from the array base. It should be scaled by
>>>>> the element length of the array.
>>>>>
>>>>> Ran some JFR leak profiler tests, and some ad-hoc leak profiling
>>>>> in various applications. Also checked that
>>>>> there were no observable perf regressions in the iterations, and
>>>>> there were not.
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8235174
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8235174/webrev.00/
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>
>>>
>>
More information about the hotspot-dev
mailing list