RFR: 8200557: OopStorage parallel iteration scales poorly
Kim Barrett
kim.barrett at oracle.com
Wed Apr 25 15:02:42 UTC 2018
Anyone looking at this yet?
> On Apr 19, 2018, at 4:18 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
>
> Please review this change to OopStorage parallel iteration to improve
> the scaling with additional threads.
>
> Two sources of poor scaling were found: (1) contention when claiming
> blocks, and (2) each worker thread ended up touching the majority of
> the blocks, even those not processed by that thread.
>
> To address this, we changed the representation of the sequence of all
> blocks. Rather than being a doubly-linked intrusive list linked
> through the blocks, it is now an array of pointers to blocks. We use
> a combination of refcounts and an RCU-inspired mechanism to safely
> manage the array storage when it needs to grow, avoiding the need to
> lock access to the array while performing concurrent iteration.
>
> The use of an array for the sequence of all blocks permits parallel
> iteration to claim ranges of indices using Atomic::add, which can be
> more efficient on some platforms than using cmpxchg loops. It also
> allows a worker thread to only touch exactly those blocks it is going
> to process, rather than walking a list of blocks. The only
> complicating factor is that we have to account for possible overshoot
> in a claim attempt.
>
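For readers following along, the range-claiming scheme described above can be sketched roughly as follows. This is a minimal illustration that uses std::atomic in place of HotSpot's Atomic::add. The names RangeClaimer, claim, limit, and chunk are hypothetical and are not taken from the webrev.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: hands out disjoint index ranges [begin, end)
// over an array of `limit` entries, `chunk` indices at a time.
struct RangeClaimer {
  std::atomic<size_t> next{0};
  const size_t limit;
  const size_t chunk;

  RangeClaimer(size_t limit_, size_t chunk_) : limit(limit_), chunk(chunk_) {
    assert(chunk > 0);
  }

  // Returns true and sets [begin, end) if a range was claimed.
  // Returns false once all indices have been handed out.
  bool claim(size_t& begin, size_t& end) {
    // Cheap pre-check: once exhausted, stop adding, so that `next`
    // cannot grow without bound (overshoot stays limited to roughly
    // one chunk per racing thread).
    if (next.load() >= limit) return false;

    // A single atomic add claims the range; no compare-exchange loop.
    size_t start = next.fetch_add(chunk);

    // Overshoot: other threads may have claimed the remaining indices
    // between the pre-check and the add.
    if (start >= limit) return false;

    begin = start;
    // The final range may be shorter than `chunk`.
    end = std::min(start + chunk, limit);
    return true;
  }
};
```

Each worker would loop on claim() and process only the array slots in the range it receives, so it never touches blocks claimed by other workers.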
> Blocks know their position in the array, to facilitate empty block
> deletion (an empty block might be anywhere in the active array, and we
> don't want to have to search for it). This also helps with
> allocation_status, eliminating the verification search that was needed
> with the list representation. allocation_status is now constant-time,
> which directly benefits -Xcheck:jni.
>
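The benefit of a block knowing its own array position can be illustrated with a small sketch. It assumes removal is done by moving the last entry into the vacated slot, which is one plausible approach and not necessarily what the webrev does. Block, ActiveArray, and active_index are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block that records its own slot in the active array.
struct Block {
  size_t active_index = 0;
};

// Hypothetical array of block pointers (single-threaded sketch only).
class ActiveArray {
  std::vector<Block*> blocks_;

public:
  void push(Block* b) {
    b->active_index = blocks_.size();
    blocks_.push_back(b);
  }

  // Constant-time membership check: no search over the array.
  bool contains(const Block* b) const {
    size_t i = b->active_index;
    return i < blocks_.size() && blocks_[i] == b;
  }

  // Constant-time removal: fill the hole with the last entry.
  void remove(Block* b) {
    assert(contains(b));
    Block* last = blocks_.back();
    blocks_[b->active_index] = last;
    last->active_index = b->active_index;
    blocks_.pop_back();
  }

  size_t size() const { return blocks_.size(); }
};
```

With this layout, a check in the style of allocation_status only has to compare one array slot against the candidate block, which is where the constant-time behavior comes from.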
> A new gtest-based performance demonstration is included. It's not
> really a test, in that it doesn't do any verification. Rather, it
> performs parallel iteration and reports total time, per-thread times,
> and per-thread percentage of blocks processed. This is done for a
> variety of thread counts, to show the parallel speedup and load
> balancing. Running on my dual 6-core Xeon, I'm seeing more or less
> linear speedup for up to 10 threads processing 1M OopStorage entries.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8200557
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8200557/open.00/
>
> Testing:
> jdk-tier{1-3}, hs-tier{1-5}, on all Oracle-supported platforms