<div dir="ltr">Hi,<div><span style="white-space:unset"><br></span></div>
<div><span style="white-space:unset">I apologize for the wording errors in my previous email. The phrase “fail to promote objects” that I used there is not related to the bug I meant to report. (It was a mistake on my part when writing the email.)</span></div>
<div><span style="white-space:unset"><br></span></div>
<div><span style="white-space:unset">The actual issue I want to report is in ZGC’s relocation set selection. Below is a more detailed description of the bug.</span></div>
<div><span style="white-space:unset"><br></span></div>
<div><span style="color:rgb(0,0,0);font-family:monospace;font-size:14px">* Problem:</span></div>
<div><span style="white-space:unset">The problem is in the function "ZRelocationSetSelectorGroup::select_inner". This function sorts the pages in "_live_pages", iterates through them, and finally adds all pages with an index smaller than "selected_from" to the relocation set.</span></div>
<div><span style="white-space:unset">In the loop that traverses the pages, "selected_from" is updated only when the condition "diff_reclaimable > _fragmentation_limit" is met. In the same branch, "npages_selected" and "selected_live_bytes" are also increased, but only by the page processed in the current iteration.</span></div>
<div><span style="white-space:unset">For example, suppose that after sorting the first three pages are page1, page2, and page3, and that the branch "diff_reclaimable > _fragmentation_limit" is first entered when the iteration reaches page3. At that point page1, page2, and page3 are all added to the relocation set, but the corresponding statistics record only page3.</span></div>
<div><div class="gmail-Message_messageTextContainer__w64Sc"><div class="gmail-Message_selectableText__SQ8WH"><div class="gmail-Markdown_markdownContainer__Tz3HQ"><div class="gmail-Prose_prose__7AjXb gmail-Prose_presets_prose__H9VRM gmail-Prose_presets_theme-hi-contrast__LQyM9"><p>The statistical error is most directly visible in the logs. When I enable the VM option -Xlog:gc*,gc+reloc=trace:.. , the number of Selected pages in the log does not match the number of source pages of the relocated objects, which I tracked separately inside the VM.</p><p>The patch I submitted is intended to fix this bug. This time I have also attached the related log. (<span style="white-space:unset">In the attached log, the "Selected" value is shown as 2, but the relocated objects in small pages actually come from three different pages, so the reported value should be 3.</span>)</p><p>Thanks,<br>Yifan Zhang</p></div></div></div></div><div class="gmail-Message_messageMetadataContainer__nBPq7"><span class="gmail-Message_messageMetadataText__FxY5_"></span></div></div><div class="gmail-Message_messageMetadataContainer__nBPq7"><span class="gmail-Message_messageMetadataText__FxY5_"></span></div><div><span style="color:rgb(0,0,0);font-family:monospace;font-size:14px"><br></span></div><div><br></div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Oct 9, 2025 at 5:56 PM Axel Boldt-Christmas <<a href="mailto:axel.boldt-christmas@oracle.com">axel.boldt-christmas@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
Hi,
<div><br>
</div>
<div>When you say that this causes ZGC to fail promotion, do you mean that we do not select pages, causing “memory leaks” because of fragmentation which is larger than the ZYoungCompactionLimit? (And did you also mean old-to-old relocation selection?)</div>
<div><br>
</div>
<div>The selection code was taken from non-generational ZGC. So it is also a bit inaccurate, because it does not understand that promotion can only age an object by one and that we track age on a per-heap-region basis. </div>
<div><br>
</div>
<div>I have a couple of branches where I worked on prototypes for improving this: <a href="https://github.com/openjdk/jdk/compare/master...xmas92:jdk:relocation_set_selection_cleanups" target="_blank">https://github.com/openjdk/jdk/compare/master...xmas92:jdk:relocation_set_selection_cleanups</a></div>
<div>(Which also aims to fix the mismatch in the logs which you have observed.)</div>
<div><br>
</div>
<div>I was aware that our logic is fuzzy and that we have a mismatch in our logging. But I did not think we “failed to promote objects”. Maybe you can elaborate on what you mean by this.</div>
<div><br>
</div>
<div>Thanks in advance,</div>
<div>Axel Boldt-Christmas</div>
<div><br id="m_8021705863311350947lineBreakAtBeginningOfMessage">
<div><br>
<blockquote type="cite">
<div>On 9 Oct 2025, at 11:16, yifan zhang <<a href="mailto:yifanzhang765@gmail.com" target="_blank">yifanzhang765@gmail.com</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div style="font-size:14px;font-family:-apple-system,BlinkMacSystemFont,system-ui,'PingFang SC','Microsoft YaHei UI','Microsoft YaHei','Source Han Sans CN','Noto Sans CJK SC',sans-serif">
<span style="font-family:monospace;white-space:pre-wrap">Hi ZGC Developers,</span></div>
<div style="font-size:14px;color:rgb(19,24,29);line-height:1.43;word-break:normal;min-height:100px;padding:20px 0px;box-sizing:border-box;overflow:auto hidden;font-family:-apple-system,BlinkMacSystemFont,system-ui,'PingFang SC','Microsoft YaHei UI','Microsoft YaHei','Source Han Sans CN','Noto Sans CJK SC',sans-serif">
<div style="min-height:526px">
<pre style="margin:0px;line-height:1.43"><div style="margin:0px;line-height:1.43"><span>I've encountered an issue where ZGC might fail to promote objects under a specific condition, leading to a memory leak. This patch aims to fix it.
* Problem:
The number of selected pages in the ZGC log is incorrect.
* Fix:
The root cause lies in the execution of function ZRelocationSetSelectorGroup::select_inner, where the count of selected pages only includes the pages that satisfy the condition <code>diff_reclaimable > _fragmentation_limit</code>, while ignoring the previously accumulated pages. </span></div><div style="margin:0px;line-height:1.43"><span> This results in a mismatch between the number of pages actually reclaimed during relocation and the count reflected in the log.</span></div><div style="margin:0px;line-height:1.43"><span> I have attached my modifications in the attachment.
* Testing:
I've tested this fix with jtreg.</span></div><p style="margin:0px;line-height:1.43"><span>
</span></p><div style="margin:0px;line-height:1.43"><span>I would like to ask ZGC developers to review whether this can be created as an issue. Once it is created, how should I commit my modifications for it? Furthermore, how can I become an OpenJDK author?
Please help to review this patch. Any feedback is appreciated.
Thanks,
</span></div><div style="line-height:1.43">Yifan Zhang</div></pre>
</div>
</div>
</div>
<span id="m_8021705863311350947cid:f_mgj7e89f0">&lt;zgc_fix.patch&gt;</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote></div>
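<div><br></div>
<div>For reference, the counting pattern described at the top of this email can be reproduced with a small stand-alone model. This is only a simplified sketch and not the HotSpot source: the names select_inner_model and SelectionResult, the page size of 128, the object size limit of 28, and the fragmentation limit of 60 percent are assumptions made up for illustration.</div>

```cpp
// Simplified model of the accounting pattern discussed in this thread.
// NOT the HotSpot code: all names and numeric values are hypothetical.

struct SelectionResult {
  int selected_from;   // pages [0, selected_from) form the relocation set
  int npages_counted;  // what the "+= 1 per selecting iteration" accounting reports
};

SelectionResult select_inner_model(const long* live_bytes, int npages,
                                   long page_size, long object_size_limit,
                                   double fragmentation_limit) {
  SelectionResult r = {0, 0};
  int selected_to = 0;
  long from_live_bytes = 0;
  const long usable = page_size - object_size_limit;

  // live_bytes is assumed to be sorted already (least live data first).
  for (int from = 1; from <= npages; ++from) {
    from_live_bytes += live_bytes[from - 1];

    // Integer ceiling of from_live_bytes / usable: destination pages needed.
    const int to = (int)((from_live_bytes + usable - 1) / usable);

    // Reclaimable percentage relative to the last accepted candidate set.
    const int diff_from = from - r.selected_from;
    const int diff_to = to - selected_to;
    const double diff_reclaimable = 100.0 - 100.0 * diff_to / diff_from;

    if (diff_reclaimable > fragmentation_limit) {
      r.selected_from = from;  // accepts every page up to and including this one
      selected_to = to;
      r.npages_counted += 1;   // but counts only the page of this iteration
    }
  }
  return r;
}
```

<div>With three pages of 30 live bytes each, the first two iterations yield 0 and 50 percent reclaimable and are not accepted; the third yields about 66.7 percent and is accepted. The model then returns selected_from == 3 but npages_counted == 1, which is the kind of mismatch between the relocation set and the logged count described above.</div>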