[PATCH] JDK-23: Fix for relocation set selecting in ZGC
yifan zhang
yifanzhang765 at gmail.com
Fri Oct 10 02:44:04 UTC 2025
Hi,
I apologize for the wording errors in my previous email. The phrase
“fail to promote objects” that I used there is unrelated to the bug I
meant to report. (That was a mistake on my part when sending the email.)
The real issue I want to report is in ZGC’s relocation set selection.
Below is a more detailed description of the bug.
* Problem:
The main problem occurs in the function
"ZRelocationSetSelectorGroup::select_inner". This function sorts the pages
in "_live_pages", iterates through them, and finally adds all pages with
indexes smaller than "selected_from" to the relocation set.
Looking at the loop that traverses the pages, we can see that
"selected_from" is modified only when the condition "diff_reclaimable >
_fragmentation_limit" is met. In the same branch, "npages_selected" and
"selected_live_bytes" are increased, but only by the single page
processed in the current iteration.
For example, suppose that after sorting the first three pages are page1,
page2, and page3, and that the branch "diff_reclaimable >
_fragmentation_limit" is first entered when the loop reaches page3. At
that point page1, page2, and page3 have all been added to the relocation
set, but the selected-page count and live-byte statistics account only
for page3.
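To make the example concrete, here is a minimal standalone sketch of the
counting pattern. It is not the HotSpot source: "select_sketch",
"SelectionResult", and the single counters are hypothetical names, the way
"to" and "diff_reclaimable" are computed is an assumption for illustration,
and "counted_fixed" shows only one possible way to account for the skipped
pages, not necessarily what the attached patch does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified model of the selection loop (hypothetical, for illustration).
struct SelectionResult {
  int selected_from;  // pages [0, selected_from) end up in the relocation set
  int counted_buggy;  // bumped once per taken branch (the reported behaviour)
  int counted_fixed;  // bumped by every page added since the last selection
};

// live_bytes_sorted: live bytes per page, already sorted as the selector expects.
SelectionResult select_sketch(const std::vector<std::size_t>& live_bytes_sorted,
                              std::size_t page_size,
                              std::size_t object_size_limit,
                              double fragmentation_limit) {
  SelectionResult r{0, 0, 0};
  int selected_to = 0;
  std::size_t from_live_bytes = 0;
  const int npages = static_cast<int>(live_bytes_sorted.size());

  for (int from = 1; from <= npages; from++) {
    // Add the page to the candidate relocation set.
    from_live_bytes += live_bytes_sorted[from - 1];

    // Assumed estimate of the destination pages the candidate set needs.
    const int to = static_cast<int>(std::ceil(
        static_cast<double>(from_live_bytes) /
        static_cast<double>(page_size - object_size_limit)));

    const int diff_from = from - r.selected_from;  // always >= 1
    const int diff_to = to - selected_to;
    const double diff_reclaimable = 100.0 - 100.0 * diff_to / diff_from;

    if (diff_reclaimable > fragmentation_limit) {
      r.counted_buggy += 1;          // only the current page is counted
      r.counted_fixed += diff_from;  // all pages since the last selection
      r.selected_from = from;
      selected_to = to;
    }
  }
  return r;
}
```

Calling select_sketch({100, 100, 100}, 1000, 100, 60.0) models the
three-page case: the branch is taken only at the third page, so
selected_from becomes 3 while the per-branch counter stays at 1.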
The statistical error is most directly visible in the logs. When I
enable the virtual machine option -Xlog:gc*,gc+reloc=trace:.. , the number
of Selected pages in the logs does not match the number of distinct
source pages of the relocated objects, which I tracked separately inside
the virtual machine.
The patch I submitted is intended to fix this bug. This time I have also
attached the related log. (In the attached log, the "Selected" value is
shown as 2, but in fact the relocated objects in small pages come from
three different pages, which means the "Selected" count for the
relocation set should be 3.)
Thanks,
Yifan Zhang
On Thu, Oct 9, 2025 at 5:56 PM Axel Boldt-Christmas <
axel.boldt-christmas at oracle.com> wrote:
> Hi,
>
> When you say that this causes ZGC to fail promotion, do you mean that we
> do not select pages, causing “memory leaks”, because of fragmentation
> which is larger than the ZYoungCompactionLimit? (And did you also mean
> old to old relocation selection?)
>
> The selection code was taken from non-generational ZGC. So it is also a
> bit inaccurate because it does not understand that promotions can only
> age an object by one and we track age on a heap region basis.
>
> I have a couple of branches where I worked on prototypes for improving
> this:
> https://github.com/openjdk/jdk/compare/master...xmas92:jdk:relocation_set_selection_cleanups
> (Which also aims to fix the mismatch in the logs which you have observed.)
>
> I was aware that our logic is fuzzy and that we have a mismatch in our
> logging. But I did not think we “failed to promote objects”. Maybe you can
> elaborate what you mean by this.
>
> Thanks in advance,
> Axel Boldt-Christmas
>
>
> On 9 Oct 2025, at 11:16, yifan zhang <yifanzhang765 at gmail.com> wrote:
>
> Hi ZGC Developers,
>
> I've encountered an issue where ZGC might fail to promote objects under a specific condition, leading to a memory leak. This patch aims to fix it.
>
> * Problem:
> The number of selected pages in the ZGC log is incorrect.
>
> * Fix:
> The root cause lies in the execution of function ZRelocationSetSelectorGroup::select_inner, where the count of selected pages only includes the pages that satisfy the condition diff_reclaimable > _fragmentation_limit, while ignoring the previously accumulated pages.
> This results in a mismatch between the number of pages actually reclaimed during relocation and the count reflected in the log.
> I have attached my modifications in the attachment.
>
> * Testing:
> I've tested this fix with jtreg.
>
> I would like to ask ZGC developers to review whether this can be created as an issue. Once it is created, how should I commit my modifications for it? Furthermore, how can I become an OpenJDK author?
> Please help to review this patch. Any feedback is appreciated.
>
> Thanks,
> Yifan Zhang
>
> <zgc_fix.patch>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zgc_fix.patch
Type: application/octet-stream
Size: 4021 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/zgc-dev/attachments/20251010/ae5a2bdb/zgc_fix.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: related_log.log
Type: application/octet-stream
Size: 526 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/zgc-dev/attachments/20251010/ae5a2bdb/related_log.log>