RFR: 8200557: OopStorage parallel iteration scales poorly

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Wed Apr 25 21:30:42 UTC 2018


http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.udiff.html

+OopStorage::BasicParState::IterationData::IterationData() :
+ _segment_start(0),
+ _segment_end(0),
+ _processed(0)
+{}


Can this constructor be inlined in the declaration?  This seems out of 
place here (and I had to hunt for it).

Can some other simple accessors be inlined in the declarations as well?  
That would decrease jumping around trying to understand things.

I think the OopStorage::BasicParState functions should be in a new .cpp 
file since there's now *src/hotspot/share/gc/shared/oopStorageParState.hpp
*Can you do this as a follow up/cleanup RFE?

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.udiff.html

    class Block;                  // Forward decl; defined in .inline.hpp file.
+ class BlockArray; // Forward decl; defined in .inline.hpp file.
    class BlockList;              // Forward decl for BlockEntry friend decl.

Can you give a short description of these, ie:

    class Block;                  // Block of oop* stored in OopStorage
+ class BlockArray; // Storage for all active oop Blocks
    class BlockList;              // List of oop Blocks from which are available for allocation

Or maybe a short description over the class declarations?  The trouble 
with these generic names is you forget why they exist.

I'd call BlockList a AllocationBlockList and BlockArray an ActiveBlockArray

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.inline.hpp.frames.html

38 class OopStorage::BlockArray {


This would be a good place to comment the purpose of this and a line 
about refcounting.

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.frames.html

Why does BlockList need to be included in the .hpp file?

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.frames.html

All these places have block_count() which are indistinguishable. It 
looks like _block_count for the BlockArray should always be acquired 
outside of BlockArray class or rather give that as the only public 
option to prevent mistakes.  It seems like there wouldn't be much 
performance cost to calling load_acquire once at the safepoint iteration.

Comment about _why_ refcounts and what do they do here would be nice.

151 void OopStorage::BlockArray::increment_refcount() const {


567 void OopStorage::relinquish_block_array(BlockArray* array) const {
568 if (array->decrement_refcount()) {
569 assert(array != _active_array, "invariant");
570 BlockArray::destroy(array);
571 }
572 }


Here array is a pointer to the active array so iterators could destroy 
the array if they are the last ones to use it.  Is this right?

I've read through most of the rest of the code and it looks good apart 
from how complicated it is, which I don't know how to suggest to improve 
it other than some reminder comments and maybe more descriptive names.

thanks,
Coleen


On 4/19/18 4:18 AM, Kim Barrett wrote:
> Please review this change to OopStorage parallel iteration to improve
> the scaling with additional threads.
>
> Two sources of poor scaling were found: (1) contention when claiming
> blocks, and (2) each worker thread ended up touching the majority of
> the blocks, even those not processed by that thread.
>
> To address this, we changed the representation of the sequence of all
> blocks.  Rather than being a doubly-linked intrusive list linked
> through the blocks, it is now an array of pointers to blocks.  We use
> a combination of refcounts and an RCU-inspired mechanism to safely
> manage the array storage when it needs to grow, avoiding the need to
> lock access to the array while performing concurrent iteration.
>
> The use of an array for the sequence of all blocks permits parallel
> iteration to claim ranges of indices using Atomic::add, which can be
> more efficient on some platforms than using cmpxchg loops.  It also
> allows a worker thread to only touch exactly those blocks it is going
> to process, rather than walking a list of blocks.  The only
> complicating factor is that we have to account for possible overshoot
> in a claim attempt.
>
> Blocks know their position in the array, to facilitate empty block
> deletion (an empty block might be anywhere in the active array, and we
> don't want to have to search for it).  This also helps with
> allocation_status, eliminating the verification search that was needed
> with the list representation.  allocation_status is now constant-time,
> which directly benefits -Xcheck:jni.
>
> A new gtest-based performance demonstration is included. It's not
> really a test, in that it doesn't do any verification.  Rather, it
> performs parallel iteration and reports total time, per-thread times,
> and per-thread percentage of blocks processed.  This is done for a
> variety of thread counts, to show the parallel speedup and load
> balancing.  Running on my dual 6 core Xeon, I'm seeing more or less
> linear speedup for up to 10 threads processing 1M OopStorage entries.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8200557
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8200557/open.00/
>
> Testing:
> jdk-tier{1-3}, hs-tier{1-5}, on all Oracle supported platforms
>



More information about the hotspot-dev mailing list