RFR: 8332455: Improve G1/ParallelGC tasks to not override array lengths
Aleksey Shipilev
shade at openjdk.org
Mon Jul 8 16:07:37 UTC 2024
On Fri, 5 Jul 2024 09:09:57 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> In order to not cause excessive traffic on task queues when scanning large object arrays, G1 and ParallelGC slice those arrays into smaller pieces. The current implementation overrides the from-space array's length field to track the array slices.
>>
>> I think it would be cleaner if we improved the tasks so that array slices can be fully encoded in the task itself and do not require overriding the array length.
>>
>> This PR borrows the principal encoding and slicing algorithm from Shenandoah (originally written by @shipilev). It also unifies the slicing implementations of the young GC, the concurrent marking GC, and ParallelGC.
>>
>> For a description of the encoding and slicing algorithm, see top of arraySlicer.hpp.
>>
>> On x86 (32-bit) we don't have enough bits in the single-word task to encode the slicing, so I'm extending the task to 64 bits (pointer and two int32 fields).
>>
>> I put in some effort to make sure the shared array slicing uses the user-configurable flags ParGCArrayScanChunk and ObjArrayMarkingStride just as before, but TBH, I don't see the point of having those flags as product flags to begin with. I would probably deprecate and remove ParGCArrayScanChunk, and use the develop flag ObjArrayMarkingStride everywhere. YMMV.
>>
>> Testing:
>> - [x] hotspot_gc
>> - [x] tier1
>> - [x] tier2
>
> src/hotspot/share/gc/shared/arraySlicer.hpp line 74:
>
>> 72: // 10 bits for slice: max 1024 blocks per array
>> 73: // 5 bits for power: max 2^32 array
>> 74: // 49 bits for oop: max 512 TB of addressable space
>
> This encoding is incompatible with the Linux large virtual address space configuration:
> https://www.kernel.org/doc/html/v5.8/arm64/memory.html
> which has a 52-bit address space. I also don't know of any reason why a future address space
> configuration couldn't support the full non-tagged range (so 56 bits). I think that makes this
> scheme not viable.
I think hardware support is an orthogonal issue. It would only have been an issue if we blindly cast the pointer to `oop` and relied on the hardware to treat just the lowest bits as the actual address.
But since we mask out the `oop` explicitly (see `oop_extract_mask`), we do not actually care what the hardware supports. The limit we get is on how much of the Java heap we can represent, given that we encode `oop`-s. Shenandoah checks this on startup, for example: https://github.com/openjdk/jdk/blob/d8c1c6ab0543c986280dcfa1c6c79e010a7b35fb/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L202-L212
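For illustration, here is a minimal, self-contained sketch of that kind of mask-based encoding. The 10/5/49 bit split mirrors the arraySlicer.hpp comment quoted above, but the constant names, helper functions, and the startup-style heap check below are illustrative stand-ins, not the actual HotSpot code; it also assumes a 64-bit build.

    // Minimal sketch: pack (oop, power, slice) into one 64-bit task word and
    // extract the oop by masking, so the hardware address width never matters.
    #include <cstdint>
    #include <cstddef>
    #include <cassert>
    #include <cstdio>

    static const int      oop_bits   = 49;  // low bits hold the oop
    static const int      pow_bits   = 5;   // next bits hold the slice "power"
    static const int      slice_bits = 10;  // top bits hold the slice index
    static const uint64_t oop_extract_mask   = (uint64_t(1) << oop_bits)   - 1;
    static const uint64_t pow_extract_mask   = (uint64_t(1) << pow_bits)   - 1;
    static const uint64_t slice_extract_mask = (uint64_t(1) << slice_bits) - 1;

    static uint64_t encode(uintptr_t obj, unsigned slice, unsigned pow) {
      assert((uint64_t(obj) & ~oop_extract_mask) == 0 && "oop must fit in 49 bits");
      assert(pow   <= pow_extract_mask);
      assert(slice <= slice_extract_mask);
      return (uint64_t(slice) << (oop_bits + pow_bits)) |
             (uint64_t(pow)   <<  oop_bits)             |
              uint64_t(obj);
    }

    static uintptr_t decode_oop(uint64_t task)   { return uintptr_t(task & oop_extract_mask); }
    static unsigned  decode_pow(uint64_t task)   { return unsigned((task >> oop_bits) & pow_extract_mask); }
    static unsigned  decode_slice(uint64_t task) { return unsigned((task >> (oop_bits + pow_bits)) & slice_extract_mask); }

    // decode_oop never trusts the raw 64-bit value; it always masks down to the
    // 49 oop bits. What the scheme does limit is the heap range those bits can
    // name, which is what a Shenandoah-style startup check verifies: the highest
    // possible heap address must still be encodable.
    static bool heap_fits(uintptr_t heap_base, size_t max_heap_bytes) {
      uint64_t highest = uint64_t(heap_base) + max_heap_bytes - 1;
      return (highest & ~oop_extract_mask) == 0;
    }

    int main() {
      uintptr_t obj = uintptr_t(0x00007f1234567890ULL & oop_extract_mask); // pretend heap address
      uint64_t task = encode(obj, /*slice*/ 3, /*pow*/ 12);
      assert(decode_oop(task)   == obj);
      assert(decode_pow(task)   == 12);
      assert(decode_slice(task) == 3);
      printf("round-trip ok\n");
      return heap_fits(obj, size_t(32) * 1024 * 1024 * 1024) ? 0 : 1;
    }

The point of the sketch is only that the oop field is reconstructed purely by masking, so tagged or wide hardware address bits never leak into the decoded value; what the startup check constrains is the encodable heap range, not the machine's virtual address width.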
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/19282#discussion_r1668915067