8227226: Segmented array clearing for ZGC

Wed Aug 7 09:59:42 UTC 2019

Hi Ryan,

On 8/7/19 3:05 AM, Sciampacone, Ryan wrote:
> Although least intrusive, it goes back to some of the earlier complaints about using false in the constructor for do_zero.  It also makes a fair number of 

My earlier comment about this was not about passing false to the 
constructor, but the duplication of the _do_zero member, which I thought 
looked a bit odd. In this patch, this was avoided by separation these 
paths already in ZCollectedHeap::array_allocate().

> assumptions (and goes against the hierarchies intent) on initialization logic to hide in finish().  That said, I agree that is fairly clean - and definitely addresses the missed cases of the earlier webrev.
> 

We've had the same discussions here and concluded that we might want to 
restructure parts of MemAllocator to better accommodate this use case, 
but that overriding finish() seems ok for now. A patch to restructure 
MemAllocator could come later if we think it's needed.

> 2 things,
> 
> 1. Isn't the substitute_oop_array_klass() check too narrow?  It will only detect types Object[], and not any other type of reference array (such as String[]) ?  I believe there's a bug here (correct me if I'm wrong).

On the JVM level, Object[], String[] and int[][] all have the same 
Klass, so we should catch them all with this single check.

> 2. I'd want to see an assert() on the sizeof(long) == sizeof(void *) dependency.  I realize what code base this is in but it would be properly defensive.

Sounds good.

> 
> What does the reporting look like in this case?  Is the long[] type reported accepted?  I'm wondering if this depletes some of the simplicity.

By overriding finish(), the sampling/reporting remains correct and 
unaffected, as it will never see the intermediate long[].

Updated webrev:

http://cr.openjdk.java.net/~pliden/8227226/webrev.4

cheers,
Per

> 
> On 8/2/19, 6:13 AM, "hotspot-gc-dev on behalf of Per Liden" <hotspot-gc-dev-bounces at openjdk.java.net on behalf of per.liden at oracle.com> wrote:
> 
>      Did some micro-benchmarking (on a Xeon E5-2630) with various segment
>      sizes between 4K and 512K, and 64K seems to offer a good trade-off. For
>      a 1G array, the allocation time increases by ~1%, but in exchange the
>      worst case TTSP drops from ~280ms to ~0.6ms.
>      
>      Updated webrev using 64K:
>      
>      http://cr.openjdk.java.net/~pliden/8227226/webrev.3
>      
>      cheers,
>      Per
>      
>      On 8/2/19 11:11 AM, Per Liden wrote:
>      > Hi Erik,
>      >
>      > On 8/1/19 5:56 PM, Erik Osterlund wrote:
>      >> Hi Per,
>      >>
>      >> I like that this approach is unintrusive, does its thing at the right
>      >> abstraction layer, and also handles medium sized arrays.
>      >
>      > It even handles small arrays (i.e. arrays in small zpages) ;)
>      >
>      >> Looks good.
>      >
>      > Thanks! I'll test various segment sizes and see how that affects
>      > performance and TTSP.
>      >
>      > cheers,
>      > Per
>      >
>      >>
>      >> Thanks,
>      >> /Erik
>      >>
>      >>> On 1 Aug 2019, at 16:14, Per Liden <per.liden at oracle.com> wrote:
>      >>>
>      >>> Here's an updated webrev that should be complete, i.e. fixes the
>      >>> issues related to allocation sampling/reporting that I mentioned.
>      >>> This patch makes MemAllocator::finish() virtual, so that we can do
>      >>> our thing and install the correct klass pointer before the Allocation
>      >>> destructor executes. This seems to be the least intrusive why of
>      >>> doing this.
>      >>>
>      >>> http://cr.openjdk.java.net/~pliden/8227226/webrev.2
>      >>>
>      >>> This passed function testing, but proper benchmarking remains to be
>      >>> done.
>      >>>
>      >>> cheers,
>      >>> Per
>      >>>
>      >>>> On 7/31/19 7:19 PM, Per Liden wrote:
>      >>>> Hi,
>      >>>> I found some time to benchmark the "GC clears pages"-approach, and
>      >>>> it's fairly clear that it's not paying off. So ditching that idea.
>      >>>> However, I'm still looking for something that would not just do
>      >>>> segmented clearing of arrays in large zpages. Letting oop arrays
>      >>>> temporarily be typed arrays while it's being cleared could be an
>      >>>> option. I did a prototype for that, which looks like this:
>      >>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.1
>      >>>> There's at least one issue here, the code doing allocation sampling
>      >>>> will see that we allocated long arrays instead of oop arrays, so the
>      >>>> reporting there will be skewed. That can be addressed if we go down
>      >>>> this path. The code is otherwise fairly simple and contained. Feel
>      >>>> free to spot any issues.
>      >>>> cheers,
>      >>>> Per
>      >>>>> On 7/26/19 2:27 PM, Per Liden wrote:
>      >>>>> Hi Ryan & Erik,
>      >>>>>
>      >>>>> I had a look at this and started exploring a slightly different
>      >>>>> approach. Instead doing segmented clearing in the allocation path,
>      >>>>> we can have the concurrent GC thread clear pages when they are
>      >>>>> reclaimed and not do any clearing in the allocation path at all.
>      >>>>>
>      >>>>> That would look like this:
>      >>>>>
>      >>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-base
>      >>>>>
>      >>>>> (I've had to temporarily comment out three lines of assert/debug
>      >>>>> code to make this work)
>      >>>>>
>      >>>>> The relocation set selection phase will now be tasked with some
>      >>>>> potentially expensive clearing work, so we'll want to make that
>      >>>>> part parallel also.
>      >>>>>
>      >>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-parallel
>      >>>>>
>      >>>>> Moving this work from Java threads onto the concurrent GC threads
>      >>>>> means we will potentially prolong the RelocationSetSelection and
>      >>>>> Relocation phases. That might be a trade-off worth doing. In
>      >>>>> return, we get:
>      >>>>>
>      >>>>> * Faster array allocations, as there's now less work done in the
>      >>>>> allocation path.
>      >>>>> * This benefits all arrays, not just those allocated in large pages.
>      >>>>> * No need to consider/tune a "chunk size".
>      >>>>> * I also tend think we'll end up with slightly less complex code,
>      >>>>> that is a bit easier to reason about. Can be debated of course.
>      >>>>>
>      >>>>> This approach might also "survive" longer, because the YC scheme
>      >>>>> we've been loosely thinking about currently requires newly
>      >>>>> allocated pages to be cleared anyway. It's of course too early to
>      >>>>> tell if that requirement will stand in the end, but it's possible
>      >>>>> anyway.
>      >>>>>
>      >>>>> I'll need to do some more testing and benchmarking to make sure
>      >>>>> there's no regression or bugs here. The commented out debug code
>      >>>>> also needs to be addressed of course.
>      >>>>>
>      >>>>> Comments? Other ideas?
>      >>>>>
>      >>>>> cheers,
>      >>>>> Per
>      >>>>>
>      >>>>>> On 7/24/19 4:37 PM, Sciampacone, Ryan wrote:
>      >>>>>>
>      >>>>>> Somehow I lost the RFR off the front and started a new thread.
>      >>>>>> Now that we're both off vacation I'd like to revisit this.  Can
>      >>>>>> you take a look?
>      >>>>>>
>      >>>>>> On 7/8/19, 10:40 AM, "hotspot-gc-dev on behalf of Sciampacone,
>      >>>>>> Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on behalf of
>      >>>>>> sci at amazon.com> wrote:
>      >>>>>>
>      >>>>>>       http://cr.openjdk.java.net/~phh/8227226/webrev.01/
>      >>>>>>       This shifts away from abusing the constructor do_zero magic
>      >>>>>> in exchange for virtualizing mem_clear() and specializing for the
>      >>>>>> Z version.  It does create a change in mem_clear in that it
>      >>>>>> returns an updated version of mem.  It does create change outside
>      >>>>>> of the Z code however it does feel cleaner.
>      >>>>>>       I didn't make a change to PinAllocating - looking at it, the
>      >>>>>> safety of keeping it constructor / destructor based still seemed
>      >>>>>> appropriate to me.  If the objection is to using the sequence
>      >>>>>> numbers to pin (and instead using handles to update) - this to me
>      >>>>>> seems less error prone.  I had originally discussed handles with
>      >>>>>> Stefan but the proposal came down to this which looks much cleaner.
>      >>>>>>       On 7/8/19, 6:36 AM, "hotspot-gc-dev on behalf of
>      >>>>>> Sciampacone, Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on
>      >>>>>> behalf of sci at amazon.com> wrote:
>      >>>>>>           1) Yes this was a conscious decision.  There was
>      >>>>>> discussion on determining the optimal point for breakup but given
>      >>>>>> the existing sizes this seemed sufficient.  This doesn't preclude
>      >>>>>> the ability to go down that path if its deemed absolutely
>      >>>>>> necessary.  The path for more complex decisions is now available.
>      >>>>>>           2) Agree
>      >>>>>>           3) I'm not clear here.  Do you mean effectively going
>      >>>>>> direct to ZHeap and bypassing the single function PinAllocating?
>      >>>>>> Agree. Otherwise I'll ask you to be a bit clearer.
>      >>>>>>           4) Agree
>      >>>>>>           5) I initially had the exact same reaction but I played
>      >>>>>> around with a few other versions (including breaking up
>      >>>>>> initialization points between header and body to get the desired
>      >>>>>> results) and this ended up looking correct.  I'll try mixing in
>      >>>>>> the mem clearer function again (fresh start) to see if it looks
>      >>>>>> any better.
>      >>>>>>           On 7/8/19, 5:49 AM, "Per Liden" <per.liden at oracle.com>
>      >>>>>> wrote:
>      >>>>>>               Hi Ryan,
>      >>>>>>               A few general comments:
>      >>>>>>               1) It looks like this still only work for large pages?
>      >>>>>>               2) The log_info stuff should be removed.
>      >>>>>>               3) I'm not a huge fan of single-use utilities like
>      >>>>>> PinAllocating, at
>      >>>>>>               least not when, IMO, the alternative is more
>      >>>>>> straight forward and less code.
>      >>>>>>               4) Please make locals const when possible.
>      >>>>>>               5) Duplicating _do_zero looks odd. Injecting a "mem
>      >>>>>> clearer", similar to
>      >>>>>>               what Stefans original patch did, seems worth exploring.
>      >>>>>>               cheers,
>      >>>>>>               /Per
>      >>>>>>               (Btw, I'm on vacation so I might not be
>      >>>>>> super-responsive to emails)
>      >>>>>>               On 2019-07-08 12:42, Erik Österlund wrote:
>      >>>>>>               > Hi Ryan,
>      >>>>>>               >
>      >>>>>>               > This looks good in general. Just some stylistic
>      >>>>>> things...
>      >>>>>>               >
>      >>>>>>               > 1) In the ZGC project we like the letter 'Z' so
>      >>>>>> much that we put it in
>      >>>>>>               > front of everything we possibly can, including all
>      >>>>>> class names.
>      >>>>>>               > 2) We also explicitly state things are private
>      >>>>>> even though it's
>      >>>>>>               > bleedingly obvious.
>      >>>>>>               >
>      >>>>>>               > So:
>      >>>>>>               >
>      >>>>>>               > 39 class PinAllocating {
>      >>>>>>               > 40 HeapWord* _mem;
>      >>>>>>               > 41 public: -> 39 class ZPinAllocating { 40
>      >>>>>> private: 41 HeapWord* _mem;
>      >>>>>>               >    42
>      >>>>>>               >   41 public: I can be your sponsor and push this
>      >>>>>> change for you. I don't
>      >>>>>>               > think there is a need for another webrev for my
>      >>>>>> small stylistic remarks,
>      >>>>>>               > so I can just fix that before pushing this for
>      >>>>>> you. On that note, I'll
>      >>>>>>               > add me and StefanK to the contributed-by section
>      >>>>>> as we all worked out
>      >>>>>>               > the right solution to this problem
>      >>>>>> collaboratively. I have run through
>      >>>>>>               > mach5 tier1-5, and found no issues with this
>      >>>>>> patch. Thanks, /Erik
>      >>>>>>               >
>      >>>>>>               > On 2019-07-05 17:18, Sciampacone, Ryan wrote:
>      >>>>>>               >> http://cr.openjdk.java.net/~phh/8227226/webrev.00/
>      >>>>>>               >> https://bugs.openjdk.java.net/browse/JDK-8227226
>      >>>>>>               >>
>      >>>>>>               >> This patch introduces safe point checks into
>      >>>>>> array clearing during
>      >>>>>>               >> allocation for ZGC.  The patch isolates the
>      >>>>>> changes to ZGC as (in
>      >>>>>>               >> particular with the more modern collectors) the
>      >>>>>> approach to
>      >>>>>>               >> incrementalizing or respecting safe point checks
>      >>>>>> is going to be
>      >>>>>>               >> different.
>      >>>>>>               >>
>      >>>>>>               >> The approach is to keep the region holding the
>      >>>>>> array in the allocating
>      >>>>>>               >> state (pin logic) while updating the color to the
>      >>>>>> array after checks.
>      >>>>>>               >>
>      >>>>>>               >> Can I get a review?  Thanks.
>      >>>>>>               >>
>      >>>>>>               >> Ryan
>      >>>>>>               >
>      >>>>>>
>      >>
>      
>