<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="Times New Roman, Times, serif">Tony,<br>
<br>
We'd be interested in the fix for 1). I'll have to go look at more
code before having a definite opinion on 2), but the way you describe
it makes it sound like something worth doing. Similarly with 3).<br>
<br>
Jon<br>
<br>
</font><br>
<div class="moz-cite-prefix">On 01/11/2016 09:59 AM, Tony Printezis
wrote:<br>
</div>
<blockquote cite="mid:etPan.5693ed80.4a73d629.58e@tw-mbp-tprintezis"
type="cite">
<style>body{font-family:Helvetica,Arial;font-size:13px}</style>
<div id="bloop_customfont"
style="font-family:Helvetica,Arial;font-size:13px; color:
rgba(0,0,0,1.0); margin: 0px; line-height: auto;">
<div id="bloop_customfont" style="margin: 0px;">Hi all,</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">We have been
recently investigating some very lengthy (several minutes)
promotion failures in ParNew, which also appear in ParallelGC.
We have identified a few issues and have some fixes to address
them. Here's a quick summary:</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">1) There's a
scalability bottleneck when adding marks to the preserved mark
stack as there is only one stack, shared by all workers, and
pushes to it are protected by a mutex. This essentially
serializes all workers if there is a non-trivial amount of
marks to be preserved. The fix is similar to what's been
implemented in G1 in JDK 9, which is to introduce per-worker
preserved mark stacks.</div>
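<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div style="margin: 0px;">As a rough illustration of the structural
difference described in 1), the sketch below contrasts a single
lock-protected stack with one stack per worker. It is not the
HotSpot code: the class names, the <code>push</code> signature and
the <code>run_workers</code> helper are all hypothetical, and
Python threads only show the shape of the two designs, not their
relative performance.</div>
<pre>
```python
import threading


class SharedPreservedMarks:
    """Hypothetical model: one stack for all workers, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stack = []

    def push(self, worker_id, obj, mark):
        # Every worker has to take the same lock, so pushes are serialized.
        with self._lock:
            self._stack.append((obj, mark))

    def entries(self):
        return list(self._stack)


class PerWorkerPreservedMarks:
    """Hypothetical model: one stack per worker, no shared lock."""

    def __init__(self, num_workers):
        self._stacks = [[] for _ in range(num_workers)]

    def push(self, worker_id, obj, mark):
        # A worker only ever appends to its own stack, so no lock is taken.
        self._stacks[worker_id].append((obj, mark))

    def entries(self):
        # After the parallel phase, the stacks are walked one by one.
        return [entry for stack in self._stacks for entry in stack]


def run_workers(marks, num_workers, pushes_per_worker):
    """Have each worker thread push some (object, mark) pairs."""

    def work(worker_id):
        for i in range(pushes_per_worker):
            # (worker_id, i) stands in for an object address; the mark
            # value is an arbitrary placeholder.
            marks.push(worker_id, (worker_id, i), 0x1234)

    threads = [threading.Thread(target=work, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marks
```
</pre>
<div style="margin: 0px;">Both variants end up holding the same set
of entries; the per-worker one simply never contends on a lock
while the workers are running.</div>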
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">2) (More
interestingly) I was perplexed by the huge number of marks
that I see getting preserved during promotion failure. I did a
small study with a test I can reproduce the issue with. The
majority of the preserved marks were 0x5 (i.e. "anonymously
biased"). According to the current logic, a mark is always
preserved if it's biased, presumably because it's assumed that
the object is biased towards a specific thread and we want to
preserve that mark as it contains the thread pointer. An
anonymously biased mark, however, carries no thread pointer. The fix
is to use a different default mark value when biased locking
is enabled (0x5) or disabled (0x1, as it is now). During
promotion failures, marks are not preserved if they are equal
to the default value and the mark of forwarded objects is set
to the default value post promotion failure and before the
preserved marks are re-instated.</div>
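<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div style="margin: 0px;">A minimal sketch of the rule proposed in
2), assuming a mark can be treated as a plain integer and using
only the two values quoted above (0x1 and 0x5). The function names
and the dictionary-based bookkeeping are hypothetical and much
simpler than the real mark word handling, which has to account for
more than these two values.</div>
<pre>
```python
# Mark values quoted in the text above; everything else is illustrative.
UNLOCKED_MARK = 0x1            # default when biased locking is disabled
ANONYMOUSLY_BIASED_MARK = 0x5  # default when biased locking is enabled


def default_mark(biased_locking_enabled):
    """Return the default mark value for the current configuration."""
    if biased_locking_enabled:
        return ANONYMOUSLY_BIASED_MARK
    return UNLOCKED_MARK


def must_preserve(mark, biased_locking_enabled):
    """A mark is saved only if it differs from the default value."""
    return mark != default_mark(biased_locking_enabled)


def restore_after_failure(forwarded, preserved, biased_locking_enabled):
    """Rebuild marks after a promotion failure.

    `forwarded` maps each forwarded object to its overwritten mark.
    `preserved` maps an object to the mark saved before forwarding.
    Every forwarded object first gets the default mark, then the
    preserved marks are re-instated on top of that.
    """
    default = default_mark(biased_locking_enabled)
    restored = {obj: default for obj in forwarded}
    restored.update(preserved)
    return restored
```
</pre>
<div style="margin: 0px;">With biased locking enabled,
<code>must_preserve(0x5, True)</code> is false, so an anonymously
biased object costs no stack push; it gets 0x5 back in
<code>restore_after_failure</code> simply because that is the
default.</div>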
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">A few extra
observations on this:</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">- I don't know
if the majority of objects we'll come across during promotion
failures will be anonymously biased (it is the case for
synthetic benchmarks). So, the above might pay off in certain
cases but not all. But I think it's still worth doing.</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">- Even though
the per-worker preserved mark stacks eliminate the big
scalability bottleneck, reducing (potentially dramatically)
the number of marks that are preserved helps in a couple of
ways: a) avoids allocating a lot of memory for the preserved
mark stacks (which can get very, very large in some cases) and
b) avoids having to scan / reclaim the preserved mark stacks
post promotion failure, which reduces the overall GC time
further. Even the parallel time in ParNew improves by a bit
because there are a lot fewer stack pushes and malloc calls.</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">3) In the case
where lots of marks need to be preserved, we found that using
64K stack segments, instead of 4K segments, speeds up the
preserved mark stack reclamation by a non-trivial amount (it's
3x/4x faster).</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">We have fixes
for all three issues above for ParNew. We're also going to
implement them for ParallelGC. For JDK 9, 1) is already
implemented, but 2) or 3) might also be worth doing.</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">Is there
interest in these changes?</div>
<div id="bloop_customfont" style="margin: 0px;"><br>
</div>
<div id="bloop_customfont" style="margin: 0px;">Tony</div>
<div><br>
</div>
</div>
<br>
<div id="bloop_sign_1452534850323611136" class="bloop_sign">
<div style="font-family:helvetica,arial;font-size:13px">
<div>-----</div>
<div><br>
</div>
<div>Tony Printezis | JVM/GC Engineer / VM Team | Twitter</div>
<div><br>
</div>
<div>@TonyPrintezis</div>
<div><a moz-do-not-send="true"
href="mailto:tprintezis@twitter.com">tprintezis@twitter.com</a></div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>