generational zgc issues

Stefan Karlsson stefan.karlsson at oracle.com
Fri Dec 1 12:36:49 UTC 2023


Hi Alen,

I'm glad that you figured out what was happening. FWIW, I ran a whole 
bunch of tests on Alma 9.2 and couldn't reproduce any issues.

Cheers,
StefanK

On 2023-11-29 19:37, Alen Vrečko wrote:
> Hi Stefan,
>
> all good. Finally got around to it. My bad in both cases.
>
> o) adding System.gc() solved the problem. Indeed, it is not a good
> idea to have expectations about when cleaning actions run when working
> with java.lang.ref.Cleaner. Preferably don't use it at all.
>
> o) for the corrupted byte[], I got a chance to look into it rather
> than just speculate on log output. The issue was in the Java Object
> Layout (JOL) library (I used v0.10). It returned something like 500K
> for the size of an object when Generational is enabled (it should be
> under 100B). This caused a failure while processing the byte[], which
> is why I assumed that the byte[] was corrupted. I updated JOL to 0.17
> and it works fine now. Interestingly, JOL v0.10 appears to work fine
> on CentOS 7 with generational but not on Alma 9.2 with generational,
> on the same JDK 21.
>
> Time to fix some bad first impressions.
>
> Thanks
> Alen
>
> On Mon, 13 Nov 2023 at 22:21, Alen Vrečko
> <alen.vrecko at gmail.com> wrote:
>
>     Thanks for the fast reply Stefan.
>
>     For the reference issue: looks like I misunderstood. Most probably
>     it is a timing issue with major collections in the toy program.
>     For both G1 and ZGC (non-generational), the counters for new Foo()
>     and Cleaner(foo)#clean match after a short while, but not for
>     generational ZGC. I'll add a System.gc() call in there and see what
>     happens. Most probably a non-issue then and a misunderstanding on
>     my part.
>
>     For the corrupted byte[]: I will see how much time I have on my
>     hands to look into it. As mentioned, vanilla ZGC works fine; with
>     generational ZGC I am seeing funny stuff with byte[].
>
>     Alen
>
>     On Mon, 13 Nov 2023 at 20:28, Stefan Karlsson
>     <stefan.karlsson at oracle.com> wrote:
>
>         Hi Alen,
>
>         On 2023-11-13 19:05, Alen Vrečko wrote:
>>         Hello everyone,
>>
>>         o) young gen reference processor
>>
>>         A bit puzzled by reading in a thread on the list:
>>
>>         > mentioning that we decided to not ship a young generation
>>         > reference processor for 21
>>
>>         Unless you made changes to ByteBuffer#allocateDirect, it uses
>>         the reference processor to free native memory. If I am not
>>         mistaken, just using a standard library API such as
>>         Files.readAllBytes will in some cases call
>>         ByteBuffer#allocateDirect internally.
>>         Or maybe I am misunderstanding something? I made a toy
>>         program and indeed I could easily get a situation where 20%
>>         of reference handlers are seemingly never called.
>>         This will cause issues for code that is using reference handlers.
>
>         The reference processing will happen when the GC performs a
>         major collection, which collects both the young and old
>         generation. If you add a System.gc() you should see that the
>         reference processor is kicking in for your program. Could you
>         share your toy program?
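
A minimal sketch of the kind of toy program discussed here, for illustration only. It is not Alen's program, which was not posted to the list; the class name CleanerCount, the Foo class and both counters are hypothetical. It counts constructed objects and completed Cleaner actions, then requests collections with System.gc() until the two counters match or a timeout expires.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy program; all names here are illustrative.
final class CleanerCount {
    private static final Cleaner CLEANER = Cleaner.create();
    static final AtomicLong created = new AtomicLong();
    static final AtomicLong cleaned = new AtomicLong();

    static final class Foo {
        Foo() {
            created.incrementAndGet();
            // The cleaning action must not capture 'this'; otherwise the
            // object would stay reachable and never be cleaned.
            CLEANER.register(this, cleaned::incrementAndGet);
        }
    }

    /** Allocates n short-lived Foo instances that become garbage immediately. */
    static void allocate(int n) {
        for (int i = 0; i < n; i++) {
            new Foo();
        }
    }

    /**
     * Repeatedly requests a collection until every created Foo has been
     * cleaned or the timeout elapses. Returns true if the counters match.
     */
    static boolean awaitAllCleaned(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (cleaned.get() < created.get() && System.nanoTime() < deadline) {
            System.gc(); // only a request to the JVM, not a guarantee
            try {
                Thread.sleep(50); // cleaning actions run asynchronously on the Cleaner thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return cleaned.get() == created.get();
    }

    public static void main(String[] args) {
        allocate(100_000);
        System.out.println("before System.gc(): created=" + created + " cleaned=" + cleaned);
        boolean match = awaitAllCleaned(10_000);
        System.out.println("after System.gc():  created=" + created + " cleaned=" + cleaned
                + " match=" + match);
    }
}
```

Running it under different collectors on JDK 21, for example with `java -XX:+UseZGC -XX:+ZGenerational CleanerCount.java` versus `java -XX:+UseZGC CleanerCount.java`, should make it possible to compare how the two counters behave before and after the explicit System.gc() calls.
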
>
>>         o) seeing weird byte[] corruption in production
>>         On CentOS 7, Generational works fine; no issues observed. But
>>         on Alma Linux 9.2, either reading a byte[] from a file or
>>         sending a byte[] over the network corrupts the byte[]. I have
>>         not investigated at all; I just observed corruption in some
>>         cases, for some byte[] arrays but not all. On the same
>>         Alma Linux 9.2 without generational ZGC, no byte[] corruption
>>         is observed and everything works fine as before.
>
>         It's hard to say if this is a ZGC bug, compiler bug, OS bug,
>         etc. Here are some suggestions for how to help pin-point the
>         problem:
>         1) Could you provide the output from 'java -version'?
>         2) Is it possible to reproduce this with a small reproducer?
>         3) What CPU is this running on?
>         4) Does it happen with -XX:UseAVX=0?
>         5) Do you know the sizes of the corrupted byte[]s? Do you know
>         the offset to where it is corrupted?
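
For question 5, a small helper along these lines could report the array sizes and the first differing offset, assuming a known-good copy of the data is available to compare against. The class ByteDiff and its method name are hypothetical; Arrays.mismatch is the standard JDK method (Java 9 and later).

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative and not from the thread.
final class ByteDiff {

    /** Summarizes where 'actual' first differs from the known-good 'expected'. */
    static String describe(byte[] expected, byte[] actual) {
        // Returns -1 when both arrays have the same length and contents;
        // if one array is a proper prefix of the other, returns the shorter length.
        int offset = Arrays.mismatch(expected, actual);
        if (offset < 0) {
            return "identical, length=" + expected.length;
        }
        StringBuilder sb = new StringBuilder()
                .append("expected.length=").append(expected.length)
                .append(" actual.length=").append(actual.length)
                .append(" firstMismatchOffset=").append(offset);
        // Only print the differing bytes when the offset is valid in both arrays.
        if (offset < expected.length && offset < actual.length) {
            sb.append(String.format(" expected=0x%02x actual=0x%02x",
                    expected[offset] & 0xff, actual[offset] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] good = {1, 2, 3, 4};
        byte[] bad = {1, 2, 9, 4};
        // Runtime details that are usually worth including in a report.
        System.out.println(System.getProperty("java.vm.name") + " "
                + System.getProperty("java.vm.version") + " on "
                + System.getProperty("os.name") + " " + System.getProperty("os.arch"));
        System.out.println(describe(good, bad));
    }
}
```

Logging this summary at the point where the failure is detected gives both the size and the offset in one line, which is easier to compare across runs than a raw dump of the array.
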
>
>         StefanK
>
>>         To me Generational ZGC looks more like an experimental
>>         feature for now. I am a bit surprised it doesn't require the
>>         extra flag to unlock experimental features.
>>         Thanks
>>         Alen
>


More information about the zgc-dev mailing list