<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Thomas,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Thanks for the reply. Inline.</div> <br><p class="airmail_on">On January 13, 2016 at 5:08:04 AM, Thomas Schatzl (<a href="mailto:thomas.schatzl@oracle.com">thomas.schatzl@oracle.com</a>) wrote:</p> <div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div></div><div>Hi,<span class="Apple-converted-space"> </span><br><br>On Tue, 2016-01-12 at 13:15 -0500, Tony Printezis wrote:<span class="Apple-converted-space"> </span><br>> Thomas,<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Inline.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On January 12, 2016 at 7:00:45 AM, Thomas Schatzl (<span class="Apple-converted-space"> </span><br>> thomas.schatzl@oracle.com) wrote:<span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>[...]<span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > > The fix is to use a different default mark value when biased<span 
class="Apple-converted-space"> </span><br>> > > locking is enabled (0x5) or disabled (0x1, as it is now). During<span class="Apple-converted-space"> </span><br>> > > promotion failures, marks are not preserved if they are equal to<span class="Apple-converted-space"> </span><br><br>> > > the default value and the mark of forwarded objects is set to the<span class="Apple-converted-space"> </span><br>> > > default value post promotion failure and before the preserved<span class="Apple-converted-space"> </span><br>> > > marks are re-instated.<span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > You mean the value of the mark as it is set during promotion<span class="Apple-converted-space"> </span><br>> > failure for the new objects?<span class="Apple-converted-space"> </span><br>> Not sure what you mean by “for new objects”.<span class="Apple-converted-space"> </span><br>> Current state: When we encounter promotion failures, we check whether<span class="Apple-converted-space"> </span><br>> the mark is the default (0x1). If it is, we don’t preserve it. If it<span class="Apple-converted-space"> </span><br>> is not, we preserve it. 
After promotion failure, we iterate over the<span class="Apple-converted-space"> </span><br>> young gen and set the mark of all objects (ParNew) or all forwarded<span class="Apple-converted-space"> </span><br>> objects (ParallelGC) to the default (0x1), then apply all preserved<span class="Apple-converted-space"> </span><br>> marks.<span class="Apple-converted-space"> </span><br>> What I’m proposing is that in the process I just described, the<span class="Apple-converted-space"> </span><br>> default mark will be 0x5, if biased locking is enabled (as most<span class="Apple-converted-space"> </span><br>> objects will be expected to have a 0x5 mark) and 0x1, if biased<span class="Apple-converted-space"> </span><br>> locking is disabled (as it is the case right now).<span class="Apple-converted-space"> </span><br><br>As you mentioned, the default value for new objects is typically not<span class="Apple-converted-space"> </span><br>0x1 when biased locking is enabled, but klass()->prototype_header().<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>(OK, I now understand what you meant by “new objects”.) Indeed. But that’s not only the case for new objects. I’d guess that most objects will retain their initial mark? 
Maybe?</p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>Then (as we agree) the promotion failure code only needs to remember<span class="Apple-converted-space"> </span><br>the non-default mark values for later restoring.<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>Indeed.</p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>One other "problem" seems to be that some evacuation failure recovery<span class="Apple-converted-space"> </span><br>code unconditionally sets the header of the objects that failed<span class="Apple-converted-space"> </span><br>promotion but are not in the preserved headers list to 0x1....<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>It’d be hard to do otherwise? You’d have to do a look-up on a table to see whether the object’s mark should be set to the default or a stored value. I think, assuming that most objects have a default mark word, setting the mark word of all (forwarded?) 
objects in the young gen to the default, then applying the (hopefully small number of) preserved marks afterwards is not unreasonable.</p><p>FWIW, it’d be nice if we could completely avoid self-forwarding (a lot of those problems would just go away…).</p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>> > When running without biased locking, the amount of preserved marks<span class="Apple-converted-space"> </span><br>> > is even lower.<span class="Apple-converted-space"> </span><br>> Of course, because the most populous mark will be 0x1 when biased<span class="Apple-converted-space"> </span><br>> locking is disabled, not 0x5. The logic of whether to preserve a mark<span class="Apple-converted-space"> </span><br>> or not was taken before biased locking was introduced, when most<span class="Apple-converted-space"> </span><br>> objects would have a 0x1 mark. 
Biased locking changed this behavior<span class="Apple-converted-space"> </span><br>> and most objects have a 0x5 mark, which invalidated the original<span class="Apple-converted-space"> </span><br>> assumptions.<span class="Apple-converted-space"> </span><br><br>Yes.<span class="Apple-converted-space"> </span><br><br>> > That may be an option in some cases in addition to these suggested<span class="Apple-converted-space"> </span><br>> > changes.<span class="Apple-converted-space"> </span><br>> Not sure what you mean.<span class="Apple-converted-space"> </span><br><br>In some cases, a "fix" to long promotion failure times might be to<span class="Apple-converted-space"> </span><br>disable biased locking - because biased locking may not even be<span class="Apple-converted-space"> </span><br>advantageous in some cases due to its own overhead.<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>Well, if biased locking doesn’t pay off for an application (and we do have evidence that biased locking might not pay off for our services), then I assume a lot of classes will end up being unbiased and their prototype header set to 0x1, which might avoid the problem of a large number of marks being preserved.</p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>> > > - Even though the per-worker preserved mark stacks eliminate the<span class="Apple-converted-space"> </span><br>> > > big scalability bottleneck, reducing (potentially dramatically)<span class="Apple-converted-space"> </span><br>> > > the number of marks that are preserved helps in a couple of ways:<span
class="Apple-converted-space"> </span><br><br>> > > a)<span class="Apple-converted-space"> </span><br>> > > avoids allocating a lot of memory for the preserved mark stacks<span class="Apple-converted-space"> </span><br>> > > (which can get very, very large in some cases) and b) avoids<span class="Apple-converted-space"> </span><br>> > > having to scan / reclaim the preserved mark stacks post promotion<span class="Apple-converted-space"> </span><br>> > > failure, which reduces the overall GC time further. Even the<span class="Apple-converted-space"> </span><br>> > > parallel time in ParNew improves by a bit because there are a<span class="Apple-converted-space"> </span><br>> > > lot fewer stack pushes<span class="Apple-converted-space"> </span><br>> > > and malloc calls.<span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > ... during promotion failure.<span class="Apple-converted-space"> </span><br>> Yes, I’m sorry I was not clear. ParNew times improve a bit when they<span class="Apple-converted-space"> </span><br>> encounter promotion failures.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > > 3) In the case where lots of marks need to be preserved, we found<span class="Apple-converted-space"> </span><br>> > > that using 64K stack segments, instead of 4K segments, speeds up<span class="Apple-converted-space"> </span><br><br>> > > the preserved mark stack reclamation by a non-trivial amount<span class="Apple-converted-space"> </span><br>> > > (it's 3x/4x faster).<span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > In my tests some time ago, increasing stack segment size only<span class="Apple-converted-space"> </span><br>> > helped a little, not 3x/4x times though as reported after<span class="Apple-converted-space"> </span><br>> > implementing the per-thread 
preserved stacks.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> To be clear: it’s only the reclamation of the preserved mark stacks<span class="Apple-converted-space"> </span><br>> I’ve seen improve by 3x/4x. Given all the extra work we have to do<span class="Apple-converted-space"> </span><br>> (remove forwarding references, apply preserved marks, etc.) this is a<span class="Apple-converted-space"> </span><br>> very small part of the GC when a promotion failure happens. But,<span class="Apple-converted-space"> </span><br>> still...<span class="Apple-converted-space"> </span><br><br>Okay, my fault, I was reading this as 3x/4x improvement of the entire<span class="Apple-converted-space"> </span><br>promotion failure recovery. Makes sense now.<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>No problem!</p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>> > A larger segment size may be a better trade-off for current, larger<span class="Apple-converted-space"> </span><br><br>> > applications though.<span class="Apple-converted-space"> </span><br>> Is there any way to auto-tune the segment size? So, the larger the<span class="Apple-converted-space"> </span><br>> stack grows, the larger the segment size?<span class="Apple-converted-space"> </span><br><br>Could be done, however is not implemented yet. And of course the basic<span class="Apple-converted-space"> </span><br>promotion failure handling code is very different between the<span class="Apple-converted-space"> </span><br>collectors. 
Volunteers welcome :]<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>I factored out some of the logic to a PreservedMarks class which can be re-used by all GCs to somewhat cut down on the code replication...</p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>If this is done, I would also somewhat think of trying to allocate<span class="Apple-converted-space"> </span><br>these per-thread blocks from even larger memory areas that can be<span class="Apple-converted-space"> </span><br>disposed even more quickly.<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>Sure! </p><p><br></p><div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>> > > We have fixes for all three issues above for ParNew. We're also<span class="Apple-converted-space"> </span><br>> > > going<span class="Apple-converted-space"> </span><br>> > > to implement them for ParallelGC. 
For JDK 9, 1) is already<span class="Apple-converted-space"> </span><br>> > > implemented, but 2) or 3) might also be worth doing.<span class="Apple-converted-space"> </span><br>> > ><span class="Apple-converted-space"> </span><br>> > > Is there interest in these changes?<span class="Apple-converted-space"> </span><br>> OK, as I said to Jon, I’ll have the ParNew changes ported to JDK 9<span class="Apple-converted-space"> </span><br>> soon. Should I create a new CR per GC (ParNew and ParallelGC) for the<span class="Apple-converted-space"> </span><br>> per-worker preserved mark stacks and we’ll take it from there?<span class="Apple-converted-space"> </span><br><br>Please do.<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p><br></p><p>JDK-8146989 and JDK-8146991. I’ll post a webrev for the first one later today.</p><p>Tony</p><p><br></p><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span><div><div><br>Thanks,<span class="Apple-converted-space"> </span><br>Thomas<span class="Apple-converted-space"> </span><br><br><br></div></div></span></blockquote><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline"></div> <div id="bloop_sign_1452700719383450880" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px"><div>-----</div><div><br></div><div>Tony Printezis | JVM/GC Engineer / VM Team | 
Twitter</div><div><br></div><div>@TonyPrintezis</div><div><a href="mailto:tprintezis@twitter.com">tprintezis@twitter.com</a></div><div><br></div></div></div></body></html>