RFR: 8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps

Erik Gahlin erik.gahlin at oracle.com
Wed Sep 4 15:26:55 UTC 2019


Looks good.

Erik
>
> On 31/08/2019 12:25 am, Erik Österlund wrote:
>> Hi,
>>
>> Ping. This patch is a prerequisite for pushing the now reviewed
>> 8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
>>
>> And that is needed for the ZGC mac port.
>>
>> Thanks,
>> /Erik
>>
>> On 2019-08-07 10:18, Erik Österlund wrote:
>>> Hi,
>>>
>>> The JFR leak profiler has marking bit maps that assume a contiguous
>>> Java heap. ZGC's heap is discontiguous, and therefore the leak
>>> profiler does not work with ZGC. If one tried to use the JFR leak
>>> profiler with ZGC, it would allocate a bit map for the multi-terabyte
>>> "reserved region", even though perhaps only 64 MB is used, spread out
>>> across this address space. That is one of the reasons the leak
>>> profiler is turned off for ZGC.
>>>
>>> In order to enable leakprofiler support on ZGC, the tracing must 
>>> also use the Access API instead of raw oop loads. But that is 
>>> outside the scope of this RFE; here we deal only with the 
>>> discontiguous address space problem.
>>>
>>> My solution is to implement a segmented bit map that makes no
>>> assumptions about the layout of the Java heap. Given an address, it
>>> looks up a bitmap fragment in a hash table, keyed by the high-order
>>> bits of the pointer. If there is no such fragment, one is created and
>>> inserted into the table. The low-order bits (shifted by
>>> LogMinObjAlignmentInBytes) are then used to find the bit in the
>>> fragment's bit map for marking an object during the traversal.
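>>>
>>> To make that concrete, here is a minimal standalone sketch of the
>>> idea (illustrative only, not the actual webrev code; the 64 MB
>>> granule size and the 8-byte minimum object alignment are just
>>> assumptions of the sketch):
>>>
>>>   #include <cstdint>
>>>   #include <cstddef>
>>>   #include <unordered_map>
>>>   #include <vector>
>>>
>>>   // Sketch assumptions: 64 MB fragments, 8-byte minimum object
>>>   // alignment (i.e. LogMinObjAlignmentInBytes == 3).
>>>   static const size_t    kLogMinObjAlignment = 3;
>>>   static const size_t    kFragmentBytes      = 64 * 1024 * 1024;
>>>   static const uintptr_t kGranuleMask = ~(uintptr_t)(kFragmentBytes - 1);
>>>
>>>   struct BitMapFragment {
>>>     // One bit per possible object start within a 64 MB granule.
>>>     std::vector<bool> bits;
>>>     BitMapFragment() : bits(kFragmentBytes >> kLogMinObjAlignment, false) {}
>>>   };
>>>
>>>   class SegmentedBitMap {
>>>     // Fragments are created lazily, keyed by the granule base address,
>>>     // so only the parts of the address space that actually contain
>>>     // traced objects consume bit map memory.
>>>     std::unordered_map<uintptr_t, BitMapFragment> _fragments;
>>>
>>>    public:
>>>     bool is_marked(uintptr_t addr) const {
>>>       uintptr_t granule = addr & kGranuleMask;
>>>       auto it = _fragments.find(granule);
>>>       if (it == _fragments.end()) {
>>>         return false;
>>>       }
>>>       size_t bit = (addr - granule) >> kLogMinObjAlignment;
>>>       return it->second.bits[bit];
>>>     }
>>>
>>>     void mark(uintptr_t addr) {
>>>       uintptr_t granule = addr & kGranuleMask;
>>>       BitMapFragment& f = _fragments[granule]; // inserted on first touch
>>>       size_t bit = (addr - granule) >> kLogMinObjAlignment;
>>>       f.bits[bit] = true;
>>>     }
>>>   };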
>>>
>>> In order not to cause speed regressions, some optimizations have
>>> been made:
>>>
>>> 1) The table uses & instead of % to look up buckets, which requires
>>> the table size to always be a power of two.
>>> 2) The hot paths get inlined.
>>> 3) There is a cache for the last fragment, as two subsequent bit
>>> accesses for objects found during tracing of the heap are unlikely
>>> to cross a fragment granule (64 MB of heap memory) boundary. This is
>>> something G1 exploits for its cross-region check, and the same
>>> general idea is applied here. The code also first asks whether a bit
>>> is marked and then marks it, as two calls. The cache plus inlining
>>> allows the compiler to look up the fragment only once for the two
>>> operations (see the sketch after the list).
>>> 4) Keep the table sparse.
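>>>
>>> As a sketch of 1) and 3) (again illustrative, not the webrev code;
>>> the table size and granule constants are assumptions):
>>>
>>>   #include <cstdint>
>>>   #include <cstddef>
>>>
>>>   static const size_t kLogFragmentBytes = 26;  // 64 MB granules (assumed)
>>>
>>>   struct Fragment {
>>>     uintptr_t granule;  // base address of the granule this fragment covers
>>>     Fragment* next;     // hash chain
>>>     // One mark bit per 8-byte-aligned address, 64 bits per word.
>>>     uint64_t  bits[(size_t(1) << kLogFragmentBytes) >> (3 + 6)];
>>>   };
>>>
>>>   class FragmentTable {
>>>     static const size_t kTableSize = 1024;  // kept a power of two on purpose
>>>     Fragment* _buckets[kTableSize] = {};
>>>     uintptr_t _last_granule = ~uintptr_t(0); // one-entry cache of the last
>>>     Fragment* _last_fragment = nullptr;      // fragment that was looked up
>>>
>>>     static size_t bucket_index(uintptr_t granule) {
>>>       // A power-of-two table size lets bucket selection be a mask (&)
>>>       // rather than a modulo (%).
>>>       return (granule >> kLogFragmentBytes) & (kTableSize - 1);
>>>     }
>>>
>>>     Fragment* lookup_or_add_slow(uintptr_t granule) {
>>>       Fragment** bucket = &_buckets[bucket_index(granule)];
>>>       for (Fragment* f = *bucket; f != nullptr; f = f->next) {
>>>         if (f->granule == granule) {
>>>           return f;
>>>         }
>>>       }
>>>       Fragment* f = new Fragment();  // zero-initialized mark bits
>>>       f->granule = granule;
>>>       f->next = *bucket;
>>>       *bucket = f;
>>>       return f;
>>>     }
>>>
>>>    public:
>>>     // Hot path, small enough to inline. Two subsequent bit accesses
>>>     // rarely cross a granule boundary, so the one-entry cache hits
>>>     // most of the time, and inlining lets the compiler share one
>>>     // lookup between an is_marked() query and the mark() that
>>>     // follows it.
>>>     inline Fragment* lookup_or_add(uintptr_t granule) {
>>>       if (granule == _last_granule) {
>>>         return _last_fragment;
>>>       }
>>>       Fragment* f = lookup_or_add_slow(granule);
>>>       _last_granule = granule;
>>>       _last_fragment = f;
>>>       return f;
>>>     }
>>>   };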
>>>
>>> As a result, no regressions outside of the noise can be observed
>>> with this new, more GC-agnostic approach.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8229189
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8229189/webrev.00/
>>>
>>> Thanks,
>>> /Erik


