generational zgc issues

Stefan Karlsson stefan.karlsson at oracle.com
Fri Dec 1 12:34:08 UTC 2023



On 2023-12-01 13:16, Alen Vrečko wrote:
> Hi Stefan,
>
> I looked into why JOL v0.10 had issues with generational ZGC. It failed
> to attach the Serviceability Agent to the process. In that case it uses
> a fallback method to calculate various JVM parameters. Among them, it
> got object alignment totally wrong.
>
> It uses this method to guess object alignment: it allocates objects,
> gets their memory addresses and computes the GCD. Pretty clever, I think.
>
> https://github.com/openjdk/jol/blob/d9e890652b4cec9d155b28a8849dcdaa2706e058/jol-core/src/main/java/org/openjdk/jol/vm/HotspotUnsafe.java#L405 
>
> The JOL method guesses 65536 for object alignment when using generational.
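
For readers unfamiliar with the approach, here is a minimal sketch of
GCD-based alignment guessing. It is not JOL's code: the class name, the
method names and the sample values are all made up for illustration, and
it works on plain numbers rather than real heap addresses. java.lang.Math
has no gcd method, so the sketch uses a hand-written Euclid loop.

```java
// Sketch only: guesses alignment from a list of raw "addresses".
// The sample values are invented; they are not real heap addresses.
final class AlignmentGuess {

    // Greatest common divisor of two non-negative longs (Euclid).
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // GCD of the differences between neighbouring addresses.
    static long guessAlignment(long[] addresses) {
        long guess = 0; // gcd(0, x) == x, so 0 is a neutral start value
        for (int i = 1; i < addresses.length; i++) {
            guess = gcd(guess, Math.abs(addresses[i] - addresses[i - 1]));
        }
        return guess;
    }

    public static void main(String[] args) {
        // Differences are 16, 24 and 8 bytes.
        long[] sample = {0x1000L, 0x1010L, 0x1028L, 0x1030L};
        System.out.println(guessAlignment(sample)); // prints 8
    }
}
```

The guess is only as good as the input: if the values fed in are not
plain addresses, the GCD reflects whatever encoding they carry.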

The problem seems to be JOL's implementation of addressOf:

objectAddress = U.getLong(array, arrayObjectBase);

It reads the object pointers as longs, which skips the load barriers, 
and therefore doesn't shave off the ZGC colors:

https://github.com/openjdk/jdk/blob/8f1d40b48bf145144ae90b1d147d418d3905661b/src/hotspot/share/gc/z/zAddress.hpp#L44

// A zpointer is a combination of the address bits (heap base bit + offset)
// and two low-order metadata bytes, with the following layout:

> I tried my own version of this approach. I get 16 when generational is 
> enabled.
>
> I am curious to know why the results are like this. Does the young gen
> have a different object alignment than the old gen?

I don't know why you got 16. The alignment is supposed to be 8 for both 
old gen and young gen.
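
To illustrate the addressOf problem described above, here is a rough
sketch of what happens to a GCD-based guess when the values still carry
metadata bits. It assumes a simplified encoding in which the address is
shifted left by 16 bits and the two low-order bytes hold metadata; the
real zpointer layout and load barrier are more involved and are defined
in zAddress.hpp. The class, the helper names and the sample values are
hypothetical.

```java
// Sketch only: a simplified "colored pointer" model, not ZGC's real one.
final class ColoredPointerSketch {

    // Assumption for illustration: two low-order metadata bytes = 16 bits.
    static final int METADATA_BITS = 16;

    // Packs an address and 16 bits of metadata into one long.
    static long color(long address, long metadata) {
        return (address << METADATA_BITS) | (metadata & 0xFFFFL);
    }

    // Drops the metadata bits again, recovering the address.
    static long uncolor(long colored) {
        return colored >>> METADATA_BITS;
    }

    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // GCD of the differences between neighbouring values.
    static long gcdOfDifferences(long[] values) {
        long guess = 0;
        for (int i = 1; i < values.length; i++) {
            guess = gcd(guess, Math.abs(values[i] - values[i - 1]));
        }
        return guess;
    }

    public static void main(String[] args) {
        long[] addresses = {0x1000L, 0x1010L, 0x1028L, 0x1030L}; // 8-byte aligned
        long[] colored = new long[addresses.length];
        long[] stripped = new long[addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            colored[i] = color(addresses[i], 0x1230L); // same made-up color
            stripped[i] = uncolor(colored[i]);
        }
        System.out.println(gcdOfDifferences(colored));  // prints 524288 (8 << 16)
        System.out.println(gcdOfDifferences(stripped)); // prints 8
    }
}
```

Under this assumption, every difference between raw values is scaled by
65536, so any guess derived from them comes out far too large until the
metadata bits are removed.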

Cheers,
StefanK

>
> Thanks
> Alen
>
> On Wed, 29 Nov 2023 at 19:37, Alen Vrečko
> <alen.vrecko at gmail.com> wrote:
>
>     Hi Stefan,
>
>     all good. Finally got around to it. My bad in both cases.
>
>     o) adding System.gc() solved the problem. Indeed, not a good idea
>     to have expectations when working with java.lang.ref.Cleaner.
>     Preferably not use it at all.
>
>     o) for the corrupted byte[], got a chance to look into it. Not
>     just speculate on log output. The issue was in Java Object Layout
>     library (used v0.10). It returned something like 500K for the size
>     of an object if Generational is enabled (should be in the range of
>     < 100B). This caused a failure while processing byte[], which is why
>     I assumed that the byte[] was corrupted. I updated the jol library to
>     0.17 and it works fine now. Interesting that it looks like JOL
>     v0.10 works fine on CentOS 7 with generational but not Alma 9.2
>     with generational - same 21 jdk.
>
>     Time to fix some bad first impressions.
>
>     Thanks
>     Alen
>
>     On Mon, 13 Nov 2023 at 22:21, Alen Vrečko
>     <alen.vrecko at gmail.com> wrote:
>
>         Thanks for the fast reply Stefan.
>
>         For the reference issue. Looks like I misunderstood. Most
>         probably issue with timing in the toy program with major
>         collections. For both G1 and ZGC (non generational) both
>         counters for new Foo() and Cleaner(foo)#clean match after a
>         short while. But not for generational ZGC. I'll add
>         System.gc() call in there and see what happens. Most probably
>         a non-issue then and a misunderstanding on my part.
>
>         For the corrupted byte[]. Will see how much time I have on my
>         hands to look into it. Like mentioned vanilla ZGC works fine,
>         with generational ZGC seeing funny stuff with byte[].
>
>         Alen
>
>         On Mon, 13 Nov 2023 at 20:28, Stefan Karlsson
>         <stefan.karlsson at oracle.com> wrote:
>
>             Hi Alen,
>
>             On 2023-11-13 19:05, Alen Vrečko wrote:
>>             Hello everyone,
>>
>>             o) young gen reference processor
>>
>>             A bit puzzled by reading in a thread on the list:
>>
>>             > mentioning that we decided to not ship a young
>>             generation reference processor for 21
>>             Unless you made changes to ByteBuffer#allocateDirect it
>>             uses reference processor to free native memory. If I am
>>             not mistaken, just using a standard library API such as
>>             Files.readAllBytes will in some cases do
>>             BB#allocateDirect in the internals.
>>             Or maybe I am misunderstanding something? I made a toy
>>             program and indeed I could easily get a situation where
>>             20% of reference handlers are not called like ever.
>>             This will cause issues for code that is using reference
>>             handlers.
>
>             The reference processing will happen when the GC performs
>             a major collection, which collects both the young and old
>             generation. If you add a System.gc() you should see that
>             the reference processor is kicking in for your program.
>             Could you share your toy program?
>
>>             o) seeing weird byte[] corruption in production
>>             On CentOS 7 Generational works fine. No issues observed.
>>             But on Alma Linux 9.2 either reading byte[] from file or
>>             sending byte[] over the network corrupts the byte[].
>>             Didn't investigate at all. Just observed corruption in
>>             some cases for some byte[] arrays - not all - just some.
>>             On the same Alma Linux 9.2 without generational zgc no
>>             byte[] corruption is observed and everything works fine
>>             as before.
>
>             It's hard to say if this is a ZGC bug, compiler bug, OS
>             bug, etc. Here are some suggestions for how to help
>             pin-point the problem:
>             1) Could you provide the output from 'java -version'?
>             2) Is it possible to reproduce this with a small reproducer?
>             3) What CPU is this running on?
>             4) Does it happen with -XX:UseAVX=0?
>             5) Do you know the sizes of the corrupted byte[]s? Do you
>             know the offset to where it is corrupted?
>
>             StefanK
>
>>             To me Generational ZGC looks more like an experimental
>>             feature for now. I am a bit surprised it doesn't require
>>             the extra flag to unlock experimental features.
>>             Thanks
>>             Alen
>