<div dir="ltr">Erik,<div><br></div><div>I tend to agree with you that this seems like a good solution to the problem at hand, irrespective of when/if G1 fully supplants CMS. Given that a similar mechanism is already used for safepointing, I don't think this introduces a completely new construct that nobody has yet seen in HotSpot. However, this is obviously not my decision to make :).</div><div><br></div><div>Given that you have the DaCapo benchmarks set up, have you tried benchmarking Andrew's storeload proposal? It would be interesting to see if anything is revealed there.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 12, 2015 at 1:23 PM, Erik Österlund <span dir="ltr"><<a href="mailto:erik.osterlund@lnu.se" target="_blank">erik.osterlund@lnu.se</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
Hi Mikael and Andrew,
<div><br>
</div>
<div>Unless I missed something, I don’t think we introduce that much code complexity.</div>
<div>Of course I agree that G1 will make fixes in CMS a bit wasted in the long run.</div>
<div>However, until then it would be good if CMS still works. And a few lines of shared code (a handful for the actual GC) seem, to me, both less painful from an engineering point of view and better performing than going through all the mutator code paths
that need changing (interpreter, c1, c2, for potentially many architectures).</div>
<div><br>
</div>
<div>Out of curiosity I patched the thing and my fix can be found here: <a href="http://cr.openjdk.java.net/~eosterlund/8079315/webrev.v1/" target="_blank">http://cr.openjdk.java.net/~eosterlund/8079315/webrev.v1/</a></div>
<div><br>
</div>
<div>Fortunately it looks like CMS is already batching cards pretty well for me so the change turned out to be very small. I logged to see how often this global fence is triggered and it’s very rare so I feel quite convinced it won’t impact performance
negatively even on “that guy’s” machine and with a terrible OS implementation.</div>
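<div>For readers who have not opened the webrev, a rough Java model of why batching keeps the global fence rare might look like the following. This is only a sketch under assumptions: the class, the card values, the clean-then-fence ordering and the <code>globalFence()</code> placeholder are all hypothetical and are not taken from the patch itself.</div>
<pre>
```java
// Hypothetical single-threaded model of batched card cleaning; not HotSpot code.
final class BatchedPrecleanSketch {
    // Illustrative card values only (assumption, not the real encoding).
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    final byte[] cards;
    int fenceCount = 0; // how many global fences have been issued so far

    BatchedPrecleanSketch(int cardCount) {
        cards = new byte[cardCount]; // all cards start CLEAN
    }

    // Placeholder for the expensive system-wide serialization
    // (for example an mprotect-style trick); here it only counts calls.
    void globalFence() {
        fenceCount++;
    }

    // Cleans every dirty card in [from, to) and issues ONE fence for the
    // whole batch, and only if at least one card was actually cleaned.
    int precleanBatch(int from, int to) {
        int cleaned = 0;
        for (int i = from; i != to; i++) {
            if (cards[i] == DIRTY) {
                cards[i] = CLEAN;
                cleaned++;
            }
        }
        if (cleaned != 0) {
            globalFence();
        }
        // Scanning the objects covered by the cleaned cards would follow here.
        return cleaned;
    }
}
```
</pre>
<div>In this model the number of fences grows with the number of non-empty batches rather than with the number of cards, which is the property the logging above was meant to check.</div>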
<div><br>
</div>
<div>I benchmarked it using DaCapo benchmarks locally on my computer (macbook x86_64 BSD) and there were no traces of any performance artefacts/regression.</div>
<div><br>
</div>
<div>If anyone happens to have a larger machine than my macbook, it would be interesting to take it for a spin. ;)</div>
<div><br>
</div>
<div>Disclaimer: I haven’t poked around a lot in CMS in the past, so I hope I didn’t miss any important card value transitions!</div>
<div><br>
</div>
<div>Thanks,</div>
<div>/Erik</div><div><div class="h5">
<div>
<div><br>
<div>
<blockquote type="cite">
<div>On 12 May 2015, at 14:17, Mikael Gerdin <<a href="mailto:mikael.gerdin@oracle.com" target="_blank">mikael.gerdin@oracle.com</a>> wrote:</div>
<br>
<div><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">On
2015-05-12 15:05, Aleksey Shipilev wrote:</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
On 11.05.2015 16:41, Andrew Haley wrote:<br>
<blockquote type="cite">On 05/11/2015 12:33 PM, Erik Österlund wrote:<br>
<blockquote type="cite">Hi Andrew,<br>
<br>
<blockquote type="cite">On 11 May 2015, at 11:58, Andrew Haley <<a href="mailto:aph@redhat.com" target="_blank">aph@redhat.com</a>> wrote:<br>
<br>
On 05/11/2015 11:40 AM, Erik Österlund wrote:<br>
<br>
<blockquote type="cite">I have heard statements like this that such a mechanism would not work<br>
on RMO, but never got an explanation why it would work only on<br>
TSO. Could you please elaborate? I studied some kernel sources for<br>
a bunch of architectures and kernels, and it seems as far as I can<br>
see all good for RMO too.<br>
</blockquote>
<br>
Dave Dice himself told me that the algorithm is not in general safe<br>
for non-TSO. Perhaps, though, it is safe in this particular case. Of<br>
course, I may be misunderstanding him. I'm not sure of his reasoning<br>
but perhaps we should include him in this discussion.<br>
</blockquote>
<br>
I see. It would be interesting to hear his reasoning, because it is<br>
not clear to me.<br>
<br>
<blockquote type="cite">From my point of view, I can't see a strong argument for doing this on<br>
AArch64. StoreLoad barriers are not fantastically expensive there so<br>
it may not be worth going to such extremes. The cost of a StoreLoad<br>
barrier doesn't seem to be so much more than the StoreStore that we<br>
have to have anyway.<br>
</blockquote>
<br>
Yeah about performance I’m not sure when it’s worth removing these<br>
fences and on what hardware.<br>
</blockquote>
<br>
Your algorithm (as I understand it) trades a moderately expensive (but<br>
purely local) operation for a very expensive global operation, albeit<br>
with much lower frequency. It's not clear to me how much we value<br>
continuous operation versus faster operation with occasional global<br>
stalls. I suppose it must be application-dependent.<br>
</blockquote>
<br>
Okay, Dice's asymmetric trick is nice. In fact, that is arguably what<br>
Parallel is using already: it serializes the mutator stores by stopping<br>
the mutator at safepoint. Using mprotect and TLB tricks as the<br>
serialization actions is cute and dandy.<br>
<br>
However, I have doubts that employing the system-wide synchronization<br>
mechanism for concurrent collector is a good thing, when we can't<br>
predict and control the long-term performance of it. For example, we are<br>
basically at the mercy of the underlying OS's performance with mprotect<br>
calls. There are industrial GCs that rely on OS performance (*cough*<br>
*cough*), and you can see what those require to guarantee performance.<br>
</blockquote>
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Just
to be clear, this type of synchronization is in fact already implemented in the JVM to synchronize thread states for the safepoint protocol, so it's not exactly new and unexplained territory.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">However
it's not clear to me that the code complexity involved with using that type of synchronization for conditional card marking in CMS is worth it.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br>
Also, given the problem is specific to CMS that arguably goes away in<br>
favor of G1, I would think introducing special-case-for-CMS barriers in<br>
mutator code is a sane interim solution.<br>
</blockquote>
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">I
agree.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br>
Especially if we can backport the G1-like barrier "filtering" in CMS<br>
case? If I read this thread right, Erik and Thomas concluded there is no<br>
clear benefit of introducing the mprotect-like mechanics with G1, which<br>
probably means the overheads are bearable with appropriate mutator-side<br>
changes.<br>
</blockquote>
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">I
don't think it would be easy to implement barrier "filtering" in CMS.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Keep
in mind that even before the storeload was added to G1's barriers they were fairly heavy-weight. CMS' barriers are not; if we start to add conditionals and storeload barriers to them, the runtime overhead may increase more than it did when we added the
storeload to G1.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
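<div>To make the mutator-side cost under discussion concrete, a conditional card mark with a StoreLoad fence could be modelled in Java roughly as below. This is a sketch under assumptions, not the barrier HotSpot emits: the class name, the 512-byte card size and the card values are illustrative, and <code>VarHandle.fullFence()</code> (Java 9+) merely stands in for the StoreLoad instruction.</div>
<pre>
```java
import java.lang.invoke.VarHandle;

// Hypothetical model of a conditional card-marking write barrier; not HotSpot code.
final class CondCardMarkSketch {
    // Illustrative values only (assumptions, not the real encoding).
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;
    static final int CARD_SHIFT = 9; // assumes 512-byte cards

    final byte[] cards;
    int cardWrites = 0; // counts real stores into the card table

    CondCardMarkSketch(int heapBytes) {
        cards = new byte[(heapBytes >>> CARD_SHIFT) + 1]; // all cards start CLEAN
    }

    // Runs after the mutator has stored a reference at heapOffset.
    void postWriteBarrier(int heapOffset) {
        int index = heapOffset >>> CARD_SHIFT;
        // StoreLoad: order the preceding reference store before the card load,
        // so a concurrent cleaner cannot miss the new reference.
        VarHandle.fullFence();
        // The conditional part: skip the store if the card is already dirty.
        if (cards[index] != DIRTY) {
            cards[index] = DIRTY;
            cardWrites++;
        }
    }
}
```
</pre>
<div>An unconditional barrier would be just the card store; the extra fence, load and branch above are the overhead in question.</div>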
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">/Mikael</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br>
Thanks,<br>
-Aleksey</blockquote>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div></div></div>
</blockquote></div><br></div>