From yifanzhang765 at gmail.com Thu Oct 9 09:16:49 2025
From: yifanzhang765 at gmail.com (yifan zhang)
Date: Thu, 9 Oct 2025 17:16:49 +0800
Subject: [PATCH] JDK-23: Fix for relocation set selecting in ZGC
Message-ID:

Hi ZGC Developers,

I've encountered an issue where ZGC might fail to promote objects under a specific condition, leading to a memory leak. This patch aims to fix it.

* Problem:
The number of selected pages in the ZGC log is incorrect.

* Fix:
The root cause lies in the function ZRelocationSetSelectorGroup::select_inner, where the count of selected pages only includes the pages that satisfy the condition diff_reclaimable > _fragmentation_limit, while ignoring the previously accumulated pages.
This results in a mismatch between the number of pages actually reclaimed during relocation and the count reflected in the log.
My modifications are in the attached patch.

* Testing:
I've tested this fix with jtreg.

I would like to ask ZGC developers to review whether this can be created as an issue. Once it is created, how should I commit my modifications for it? Furthermore, how can I become an OpenJDK author?
Please help review this patch. Any feedback is appreciated.

Thanks,
Yifan Zhang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zgc_fix.patch
Type: application/octet-stream
Size: 4021 bytes
Desc: not available
URL:

From axel.boldt-christmas at oracle.com Thu Oct 9 09:56:22 2025
From: axel.boldt-christmas at oracle.com (Axel Boldt-Christmas)
Date: Thu, 9 Oct 2025 09:56:22 +0000
Subject: [PATCH] JDK-23: Fix for relocation set selecting in ZGC
In-Reply-To:
References:
Message-ID:

Hi,

When you say that this causes ZGC to fail promotion, do you mean that we do not select pages, causing "memory leaks", because of fragmentation which is larger than the ZYoungCompactionLimit? (And did you also mean old-to-old relocation selection?)

The selection code was taken from non-generational ZGC. So it is also a bit inaccurate, because it does not understand that promotions can only age an object by one and we track age on a heap region basis.

I have a couple of branches where I worked on prototypes for improving this:
https://github.com/openjdk/jdk/compare/master...xmas92:jdk:relocation_set_selection_cleanups
(Which also aims to fix the mismatch in the logs which you have observed.)

I was aware that our logic is fuzzy and that we have a mismatch in our logging. But I did not think we "failed to promote objects". Maybe you can elaborate on what you mean by this.

Thanks in advance,
Axel Boldt-Christmas

On 9 Oct 2025, at 11:16, yifan zhang wrote:

> Hi ZGC Developers,
>
> I've encountered an issue where ZGC might fail to promote objects under a specific condition, leading to a memory leak. This patch aims to fix it.
>
> * Problem:
> The number of selected pages in the ZGC log is incorrect.
>
> * Fix:
> The root cause lies in the function ZRelocationSetSelectorGroup::select_inner, where the count of selected pages only includes the pages that satisfy the condition diff_reclaimable > _fragmentation_limit, while ignoring the previously accumulated pages.
> This results in a mismatch between the number of pages actually reclaimed during relocation and the count reflected in the log.
> My modifications are in the attached patch.
>
> * Testing:
> I've tested this fix with jtreg.
>
> I would like to ask ZGC developers to review whether this can be created as an issue. Once it is created, how should I commit my modifications for it? Furthermore, how can I become an OpenJDK author?
> Please help review this patch. Any feedback is appreciated.
>
> Thanks,
> Yifan Zhang

From yifanzhang765 at gmail.com Fri Oct 10 02:44:04 2025
From: yifanzhang765 at gmail.com (yifan zhang)
Date: Fri, 10 Oct 2025 10:44:04 +0800
Subject: [PATCH] JDK-23: Fix for relocation set selecting in ZGC
In-Reply-To:
References:
Message-ID:

Hi,

I apologize for the errors in my wording in the previous email. What I wrote as "fail to promote objects" is not related to the bug I meant to report. (I made a mistake when writing the email.) The real issue I want to report is the one in ZGC's relocation set selection. Here is a more detailed description of the bug.

* Problem:
The main problem occurs in the function "ZRelocationSetSelectorGroup::select_inner". This function sorts the pages in "_live_pages", iterates through them, and finally adds all pages with indexes smaller than "selected_from" to the relocation set. In the loop that traverses the pages, "selected_from" is modified only when the condition "diff_reclaimable > _fragmentation_limit" is met. In the same branch, "npages_selected" and "selected_live_bytes" are also increased, but only for the page processed in the current iteration.

For example, suppose that after sorting the first three pages are page1, page2, and page3, and that the branch "diff_reclaimable > _fragmentation_limit" is entered when iterating over page3. At this point page1, page2, and page3 have all been added to the relocation set, but the page count in the corresponding statistics only records page3.

The statistical error is most directly reflected in the logs. When I enable the virtual machine option -Xlog:gc*,gc+reloc=trace:.. , the number of "Selected" pages in the logs does not match the number of source pages that the relocated objects came from, which I tracked separately inside the virtual machine.

The patch I submitted is intended to fix this bug. The additional attachment this time is the related log.
(In the attached log, the "Selected" value is shown as 2, but the relocated objects in small pages actually come from three different pages, which means the relocation set should contain 3 pages.)

Thanks,
Yifan Zhang

On Thu, Oct 9, 2025 at 5:56 PM Axel Boldt-Christmas <axel.boldt-christmas at oracle.com> wrote:

> Hi,
>
> When you say that this causes ZGC to fail promotion, do you mean that we do not select pages, causing "memory leaks", because of fragmentation which is larger than the ZYoungCompactionLimit? (And did you also mean old-to-old relocation selection?)
>
> The selection code was taken from non-generational ZGC. So it is also a bit inaccurate, because it does not understand that promotions can only age an object by one and we track age on a heap region basis.
>
> I have a couple of branches where I worked on prototypes for improving this:
> https://github.com/openjdk/jdk/compare/master...xmas92:jdk:relocation_set_selection_cleanups
> (Which also aims to fix the mismatch in the logs which you have observed.)
>
> I was aware that our logic is fuzzy and that we have a mismatch in our logging. But I did not think we "failed to promote objects". Maybe you can elaborate on what you mean by this.
>
> Thanks in advance,
> Axel Boldt-Christmas
>
>
> On 9 Oct 2025, at 11:16, yifan zhang wrote:
>
> Hi ZGC Developers,
>
> I've encountered an issue where ZGC might fail to promote objects under a specific condition, leading to a memory leak. This patch aims to fix it.
>
> * Problem:
> The number of selected pages in the ZGC log is incorrect.
>
> * Fix:
> The root cause lies in the function ZRelocationSetSelectorGroup::select_inner, where the count of selected pages only includes the pages that satisfy the condition diff_reclaimable > _fragmentation_limit, while ignoring the previously accumulated pages.
> This results in a mismatch between the number of pages actually reclaimed during relocation and the count reflected in the log.
> I have attached my modifications in the attachment.
>
> * Testing:
> I've tested this fix by jtreg.
>
> I would like to ask ZGC developers to review whether this can be created as an issue. Once it is created, how should I commit my modifications for it? Furthermore, how can I become an OpenJDK author?
> Please help to review this patch. Any feedback is appreciated.
>
> Thanks,
> Yifan Zhang

-------------- next part --------------
A non-text attachment was scrubbed...
Name: zgc_fix.patch
Type: application/octet-stream
Size: 4021 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: related_log.log
Type: application/octet-stream
Size: 526 bytes
Desc: not available
URL:
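
The counting behaviour described in this thread for ZRelocationSetSelectorGroup::select_inner can be illustrated with a small standalone model. The C++ sketch below is not the HotSpot source: the function select_sketch, the SelectionResult struct, and the simplified formulas for the target page count and for diff_reclaimable are assumptions made purely for illustration. It only mirrors the shape of the loop as described above: a candidate prefix of the sorted pages grows by one page per iteration, and the prefix is committed when the reclaimable percentage exceeds the fragmentation limit. A counter bumped once per commit is returned next to selected_from so the two can be compared.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not a HotSpot type.
struct SelectionResult {
  int selected_from;       // pages [0, selected_from) end up in the relocation set
  int counted_per_commit;  // counter incremented once per committing iteration
};

// Simplified model of the selection loop (assumed formulas, see lead-in).
// live_bytes holds the live bytes of each candidate page.
SelectionResult select_sketch(std::vector<std::size_t> live_bytes,
                              std::size_t page_size,
                              double fragmentation_limit) {
  // Pages with the least live data are considered first.
  std::sort(live_bytes.begin(), live_bytes.end());

  int selected_from = 0;
  int selected_to = 0;
  int counted = 0;
  std::size_t from_live_bytes = 0;
  const int npages = static_cast<int>(live_bytes.size());

  for (int from = 1; from <= npages; from++) {
    // Grow the candidate set by one page.
    from_live_bytes += live_bytes[from - 1];

    // Assumed estimate of how many target pages the candidate set needs.
    const int to =
        static_cast<int>((from_live_bytes + page_size - 1) / page_size);

    // Relative gain compared to the last committed selection.
    const int diff_from = from - selected_from;
    const int diff_to = to - selected_to;
    const double diff_reclaimable = 100.0 - 100.0 * diff_to / diff_from;

    if (diff_reclaimable > fragmentation_limit) {
      // Committing covers every page since the previous commit (diff_from
      // pages), but this counter only records the current one.
      selected_from = from;
      selected_to = to;
      counted += 1;
    }
  }
  return {selected_from, counted};
}
```

Under these assumptions, select_sketch({10, 10, 10}, 100, 25.0) commits at the second and at the third page, so the per-commit counter ends at 2 while selected_from is 3. In this model, reporting selected_from (or adding diff_from on each commit) keeps the reported count equal to the number of pages actually placed in the relocation set.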