Very slow promotion failures in ParNew / ParallelGC
Jon Masamitsu
jon.masamitsu at oracle.com
Mon Jan 11 21:10:49 UTC 2016
Tony,
We'd be interested in the fix for 1). I'll have to go look at more code
before having a definite opinion on 2) but the way you describe it
makes it sound like something worth doing. Similarly with 3).
Jon
On 01/11/2016 09:59 AM, Tony Printezis wrote:
> Hi all,
>
> We have recently been investigating some very lengthy (several
> minutes) promotion failures in ParNew, which also appear in
> ParallelGC. We have identified a few issues and have some fixes to
> address them. Here's a quick summary:
>
> 1) There's a scalability bottleneck when adding marks to the preserved
> mark stack as there is only one stack, shared by all workers, and
> pushes to it are protected by a mutex. This essentially serializes all
> workers if there is a non-trivial number of marks to be preserved. The
> fix is similar to what's been implemented in G1 in JDK 9, which is to
> introduce per-worker preserved mark stacks.
>
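For illustration, the difference described in 1) can be sketched in C++ as follows. This is a minimal sketch, not HotSpot code: the type aliases and the class names `SharedPreservedMarks` and `PerWorkerPreservedMarks` are hypothetical stand-ins, and a plain `std::vector` takes the place of the real segmented stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for HotSpot types; not the real declarations.
using ObjRef = std::uintptr_t;          // address of an object
using Mark   = std::uintptr_t;          // object header ("mark") word
using Entry  = std::pair<ObjRef, Mark>; // (object, mark to restore later)

// One stack shared by all GC workers: every push takes the same lock,
// so workers serialize when many marks have to be preserved.
class SharedPreservedMarks {
public:
    void push(ObjRef obj, Mark mark) {
        std::lock_guard<std::mutex> guard(lock_);
        entries_.emplace_back(obj, mark);
    }
    std::size_t size() {
        std::lock_guard<std::mutex> guard(lock_);
        return entries_.size();
    }
private:
    std::mutex lock_;
    std::vector<Entry> entries_;
};

// One stack per worker, indexed by worker id: a worker only ever
// touches its own stack, so pushes need no lock.
class PerWorkerPreservedMarks {
public:
    explicit PerWorkerPreservedMarks(std::size_t num_workers)
        : stacks_(num_workers) {}

    void push(std::size_t worker_id, ObjRef obj, Mark mark) {
        assert(worker_id < stacks_.size());
        stacks_[worker_id].emplace_back(obj, mark);
    }
    std::size_t size_of(std::size_t worker_id) const {
        assert(worker_id < stacks_.size());
        return stacks_[worker_id].size();
    }
    std::size_t total_size() const {
        std::size_t n = 0;
        for (const auto& s : stacks_) n += s.size();
        return n;
    }
private:
    std::vector<std::vector<Entry>> stacks_;
};
```

The per-worker variant trades one contended lock for a little bookkeeping: restoring the marks after the GC then means walking every worker's stack rather than a single one.
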
> 2) (More interestingly) I was perplexed by the huge number of marks
> that I see getting preserved during promotion failure. I did a small
> study with a test I can reproduce the issue with. The majority of the
> preserved marks were 0x5 (i.e. "anonymously biased"). According to the
> current logic, a mark is preserved if it's biased, presumably because
> it's assumed that the object is biased towards a specific thread and
> we want to preserve that mark as it contains the thread pointer. The
> fix is to use a different default mark value when biased locking is
> enabled (0x5) or disabled (0x1, as it is now). During promotion
> failures, marks are not preserved if they are equal to the default
> value and the mark of forwarded objects is set to the default value
> post promotion failure and before the preserved marks are re-instated.
>
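The rule proposed in 2) can be sketched in C++ as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the mark is treated as an opaque 64-bit word, and only the two values named in the mail (0x1 and 0x5) are modelled. The real HotSpot check considers more header state than this.

```cpp
#include <cassert>
#include <cstdint>

using Mark = std::uint64_t;  // hypothetical stand-in for the header word

constexpr Mark kUnlockedMark          = 0x1;  // default without biased locking
constexpr Mark kAnonymouslyBiasedMark = 0x5;  // biased pattern, no owner thread

// The default mark depends on whether biased locking is enabled.
constexpr Mark default_mark(bool biased_locking_enabled) {
    return biased_locking_enabled ? kAnonymouslyBiasedMark : kUnlockedMark;
}

// A mark is preserved only if it differs from the default, i.e. only if
// resetting the header to the default would lose information.
constexpr bool must_preserve(Mark mark, bool biased_locking_enabled) {
    return mark != default_mark(biased_locking_enabled);
}

// Simulates the post-failure sequence for one forwarded object: the header
// is first set to the default, then a preserved mark (if any) is re-instated.
inline Mark restored_mark(Mark original, bool biased_locking_enabled) {
    Mark header = default_mark(biased_locking_enabled);
    if (must_preserve(original, biased_locking_enabled)) {
        header = original;  // re-instated from the preserved mark stack
    }
    assert(header == original);  // the round trip must be lossless
    return header;
}
```

With biased locking enabled, `must_preserve(0x5, true)` is false, so anonymously biased objects no longer generate preserved-mark entries, while a mark biased towards a specific thread still differs from the default and is kept.
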
> A few extra observations on this:
>
> - I don't know if the majority of objects we'll come across during
> promotion failures will be anonymously biased (it is the case for
> synthetic benchmarks). So, the above might pay off in certain cases
> but not all. But I think it's still worth doing.
>
> - Even though the per-worker preserved mark stacks eliminate the big
> scalability bottleneck, reducing (potentially dramatically) the number
> of marks that are preserved helps in a couple of ways: a) avoids
> allocating a lot of memory for the preserved mark stacks (which can
> get very, very large in some cases) and b) avoids having to scan /
> reclaim the preserved mark stacks post promotion failure, which
> reduces the overall GC time further. Even the parallel time in ParNew
> improves by a bit because there are a lot fewer stack pushes and
> malloc calls.
>
> 3) In the case where lots of marks need to be preserved, we found that
> using 64K stack segments, instead of 4K segments, speeds up the
> preserved mark stack reclamation by a non-trivial amount (it's 3x/4x
> faster).
>
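To give a rough feel for why segment size matters in 3), here is a small C++ sketch that counts how many segments a stack needs for a given number of entries. It assumes "4K" and "64K" refer to entries per segment, and the figure of 100 million preserved marks is hypothetical; neither is stated in the mail.

```cpp
#include <cassert>
#include <cstddef>

// Number of fixed-size segments needed to hold `entries` entries
// (ceiling division). `segment_entries` is the capacity of one segment.
inline std::size_t segments_needed(std::size_t entries,
                                   std::size_t segment_entries) {
    assert(segment_entries > 0);
    return (entries + segment_entries - 1) / segment_entries;
}

// Hypothetical workload: 100 million preserved marks.
inline std::size_t example_small_segments() {
    return segments_needed(100000000, 4 * 1024);   // 24415 segments
}
inline std::size_t example_large_segments() {
    return segments_needed(100000000, 64 * 1024);  // 1526 segments
}
```

Under these assumptions the larger segments cut the number of segments to allocate and free by roughly a factor of 16. The segment count is only one contributor to reclamation time, so this does not by itself predict the 3x/4x speedup reported above.
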
> We have fixes for all three issues above for ParNew. We're also going
> to implement them for ParallelGC. For JDK 9, 1) is already
> implemented in G1, but 2) or 3) might also be worth doing.
>
> Is there interest in these changes?
>
> Tony
>
>
> -----
>
> Tony Printezis | JVM/GC Engineer / VM Team | Twitter
>
> @TonyPrintezis
> tprintezis at twitter.com
>
More information about the hotspot-gc-dev
mailing list