8227226: Segmented array clearing for ZGC

Per Liden per.liden at oracle.com
Tue Aug 13 08:49:17 UTC 2019


Just a follow up. Stefan correctly noted that a heap iteration that 
happens before the final klass pointer has been installed will see the 
temporary long[], which can be problematic if that heap iteration is 
doing JVMTI-tagging on all long[] instances. In that case, that tag will 
eventually end up on the oop array when we later install the final klass 
pointer.

I filed JDK-8229451 to deal with this, and sent out a patch for review.

cheers,
Per

On 8/8/19 10:42 AM, Per Liden wrote:
> On 8/7/19 3:55 PM, Sciampacone, Ryan wrote:
>>      > By overriding finish(), the sampling/reporting remains correct and
>>      > unaffected, as it will never see the intermediate long[].
>> I learned something today.  Thank you.
>>
>> For MemAllocator, I think we all agree the flow is locked in a bit too 
>> rigidly but this helps with some of the VM/GC assumptions so we end up 
>> battling it.  That said I'm with you - if there's a rewrite to be had, 
>> it's not in this patch.
>>
>> Otherwise, fwiw lgtm.
> 
> Thanks Ryan!
> 
> (Since you don't have an OpenJDK id I can't add you as Reviewed-by, but 
> Stefan, Erik and you will all be added as Contributed-by)
> 
> cheers,
> Per
> 
>>
>>
>> On 8/7/19, 3:24 AM, "Per Liden" <per.liden at oracle.com> wrote:
>>
>>      Hi again,
>>      On 8/7/19 11:59 AM, Per Liden wrote:
>>      > Hi Ryan,
>>      >
>>      > On 8/7/19 3:05 AM, Sciampacone, Ryan wrote:
>>      >> Although least intrusive, it goes back to some of the earlier
>>      >> complaints about using false in the constructor for do_zero.  
>> It also
>>      >> makes a fair number of
>>      >
>>      > My earlier comment about this was not about passing false to the
>>      > constructor, but the duplication of the _do_zero member, which 
>> I thought
>>      > looked a bit odd. In this patch, this was avoided by separation 
>> these
>>      > paths already in ZCollectedHeap::array_allocate().
>>      >
>>      >> assumptions (and goes against the hierarchies intent) on
>>      >> initialization logic to hide in finish().  That said, I agree 
>> that is
>>      >> fairly clean - and definitely addresses the missed cases of the
>>      >> earlier webrev.
>>      >>
>>      >
>>      > We've had the same discussions here and concluded that we might 
>> want to
>>      > restructure parts of MemAllocator to better accommodate this 
>> use case,
>>      > but that overriding finish() seems ok for now. A patch to 
>> restructure
>>      > MemAllocator could come later if we think it's needed.
>>      >
>>      >> 2 things,
>>      >>
>>      >> 1. Isn't the substitute_oop_array_klass() check too narrow?  
>> It will
>>      >> only detect types Object[], and not any other type of 
>> reference array
>>      >> (such as String[]) ?  I believe there's a bug here (correct me 
>> if I'm
>>      >> wrong).
>>      >
>>      > On the JVM level, Object[], String[] and int[][] all have the same
>>      > Klass, so we should catch them all with this single check.
>>      Sorry, I'm of course wrong here. Changed the check to call
>>      klass->is_objArray_klass() instead. Thanks!
>>      Updated webrev.4 in-place.
>>      cheers,
>>      Per
>>      >
>>      >> 2. I'd want to see an assert() on the sizeof(long) == 
>> sizeof(void *)
>>      >> dependency.  I realize what code base this is in but it would be
>>      >> properly defensive.
>>      >
>>      > Sounds good.
>>      >
>>      >>
>>      >> What does the reporting look like in this case?  Is the long[] 
>> type
>>      >> reported accepted?  I'm wondering if this depletes some of the
>>      >> simplicity.
>>      >
>>      > By overriding finish(), the sampling/reporting remains correct and
>>      > unaffected, as it will never see the intermediate long[].
>>      >
>>      > Updated webrev:
>>      >
>>      > http://cr.openjdk.java.net/~pliden/8227226/webrev.4
>>      >
>>      > cheers,
>>      > Per
>>      >
>>      >>
>>      >> On 8/2/19, 6:13 AM, "hotspot-gc-dev on behalf of Per Liden"
>>      >> <hotspot-gc-dev-bounces at openjdk.java.net on behalf of
>>      >> per.liden at oracle.com> wrote:
>>      >>
>>      >>      Did some micro-benchmarking (on a Xeon E5-2630) with 
>> various segment
>>      >>      sizes between 4K and 512K, and 64K seems to offer a good
>>      >> trade-off. For
>>      >>      a 1G array, the allocation time increases by ~1%, but in 
>> exchange
>>      >> the
>>      >>      worst case TTSP drops from ~280ms to ~0.6ms.
>>      >>      Updated webrev using 64K:
>>      >>      http://cr.openjdk.java.net/~pliden/8227226/webrev.3
>>      >>      cheers,
>>      >>      Per
>>      >>      On 8/2/19 11:11 AM, Per Liden wrote:
>>      >>      > Hi Erik,
>>      >>      >
>>      >>      > On 8/1/19 5:56 PM, Erik Osterlund wrote:
>>      >>      >> Hi Per,
>>      >>      >>
>>      >>      >> I like that this approach is unintrusive, does its 
>> thing at
>>      >> the right
>>      >>      >> abstraction layer, and also handles medium sized arrays.
>>      >>      >
>>      >>      > It even handles small arrays (i.e. arrays in small 
>> zpages) ;)
>>      >>      >
>>      >>      >> Looks good.
>>      >>      >
>>      >>      > Thanks! I'll test various segment sizes and see how 
>> that affects
>>      >>      > performance and TTSP.
>>      >>      >
>>      >>      > cheers,
>>      >>      > Per
>>      >>      >
>>      >>      >>
>>      >>      >> Thanks,
>>      >>      >> /Erik
>>      >>      >>
>>      >>      >>> On 1 Aug 2019, at 16:14, Per Liden 
>> <per.liden at oracle.com> wrote:
>>      >>      >>>
>>      >>      >>> Here's an updated webrev that should be complete, 
>> i.e. fixes the
>>      >>      >>> issues related to allocation sampling/reporting that I
>>      >> mentioned.
>>      >>      >>> This patch makes MemAllocator::finish() virtual, so 
>> that we
>>      >> can do
>>      >>      >>> our thing and install the correct klass pointer 
>> before the
>>      >> Allocation
>>      >>      >>> destructor executes. This seems to be the least 
>> intrusive why of
>>      >>      >>> doing this.
>>      >>      >>>
>>      >>      >>> http://cr.openjdk.java.net/~pliden/8227226/webrev.2
>>      >>      >>>
>>      >>      >>> This passed function testing, but proper benchmarking 
>> remains
>>      >> to be
>>      >>      >>> done.
>>      >>      >>>
>>      >>      >>> cheers,
>>      >>      >>> Per
>>      >>      >>>
>>      >>      >>>> On 7/31/19 7:19 PM, Per Liden wrote:
>>      >>      >>>> Hi,
>>      >>      >>>> I found some time to benchmark the "GC clears
>>      >> pages"-approach, and
>>      >>      >>>> it's fairly clear that it's not paying off. So 
>> ditching that
>>      >> idea.
>>      >>      >>>> However, I'm still looking for something that would 
>> not just do
>>      >>      >>>> segmented clearing of arrays in large zpages. 
>> Letting oop
>>      >> arrays
>>      >>      >>>> temporarily be typed arrays while it's being cleared 
>> could
>>      >> be an
>>      >>      >>>> option. I did a prototype for that, which looks like 
>> this:
>>      >>      >>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.1
>>      >>      >>>> There's at least one issue here, the code doing 
>> allocation
>>      >> sampling
>>      >>      >>>> will see that we allocated long arrays instead of oop
>>      >> arrays, so the
>>      >>      >>>> reporting there will be skewed. That can be 
>> addressed if we
>>      >> go down
>>      >>      >>>> this path. The code is otherwise fairly simple and
>>      >> contained. Feel
>>      >>      >>>> free to spot any issues.
>>      >>      >>>> cheers,
>>      >>      >>>> Per
>>      >>      >>>>> On 7/26/19 2:27 PM, Per Liden wrote:
>>      >>      >>>>> Hi Ryan & Erik,
>>      >>      >>>>>
>>      >>      >>>>> I had a look at this and started exploring a slightly
>>      >> different
>>      >>      >>>>> approach. Instead doing segmented clearing in the
>>      >> allocation path,
>>      >>      >>>>> we can have the concurrent GC thread clear pages 
>> when they are
>>      >>      >>>>> reclaimed and not do any clearing in the allocation 
>> path at
>>      >> all.
>>      >>      >>>>>
>>      >>      >>>>> That would look like this:
>>      >>      >>>>>
>>      >>      >>>>> 
>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-base
>>      >>      >>>>>
>>      >>      >>>>> (I've had to temporarily comment out three lines of
>>      >> assert/debug
>>      >>      >>>>> code to make this work)
>>      >>      >>>>>
>>      >>      >>>>> The relocation set selection phase will now be 
>> tasked with
>>      >> some
>>      >>      >>>>> potentially expensive clearing work, so we'll want 
>> to make
>>      >> that
>>      >>      >>>>> part parallel also.
>>      >>      >>>>>
>>      >>      >>>>> 
>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-parallel
>>      >>      >>>>>
>>      >>      >>>>> Moving this work from Java threads onto the 
>> concurrent GC
>>      >> threads
>>      >>      >>>>> means we will potentially prolong the
>>      >> RelocationSetSelection and
>>      >>      >>>>> Relocation phases. That might be a trade-off worth 
>> doing. In
>>      >>      >>>>> return, we get:
>>      >>      >>>>>
>>      >>      >>>>> * Faster array allocations, as there's now less 
>> work done
>>      >> in the
>>      >>      >>>>> allocation path.
>>      >>      >>>>> * This benefits all arrays, not just those 
>> allocated in
>>      >> large pages.
>>      >>      >>>>> * No need to consider/tune a "chunk size".
>>      >>      >>>>> * I also tend think we'll end up with slightly less 
>> complex
>>      >> code,
>>      >>      >>>>> that is a bit easier to reason about. Can be 
>> debated of
>>      >> course.
>>      >>      >>>>>
>>      >>      >>>>> This approach might also "survive" longer, because 
>> the YC
>>      >> scheme
>>      >>      >>>>> we've been loosely thinking about currently 
>> requires newly
>>      >>      >>>>> allocated pages to be cleared anyway. It's of 
>> course too
>>      >> early to
>>      >>      >>>>> tell if that requirement will stand in the end, but 
>> it's
>>      >> possible
>>      >>      >>>>> anyway.
>>      >>      >>>>>
>>      >>      >>>>> I'll need to do some more testing and benchmarking 
>> to make
>>      >> sure
>>      >>      >>>>> there's no regression or bugs here. The commented 
>> out debug
>>      >> code
>>      >>      >>>>> also needs to be addressed of course.
>>      >>      >>>>>
>>      >>      >>>>> Comments? Other ideas?
>>      >>      >>>>>
>>      >>      >>>>> cheers,
>>      >>      >>>>> Per
>>      >>      >>>>>
>>      >>      >>>>>> On 7/24/19 4:37 PM, Sciampacone, Ryan wrote:
>>      >>      >>>>>>
>>      >>      >>>>>> Somehow I lost the RFR off the front and started a 
>> new
>>      >> thread.
>>      >>      >>>>>> Now that we're both off vacation I'd like to revisit
>>      >> this.  Can
>>      >>      >>>>>> you take a look?
>>      >>      >>>>>>
>>      >>      >>>>>> On 7/8/19, 10:40 AM, "hotspot-gc-dev on behalf of
>>      >> Sciampacone,
>>      >>      >>>>>> Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on 
>> behalf of
>>      >>      >>>>>> sci at amazon.com> wrote:
>>      >>      >>>>>>
>>      >>      >>>>>>       
>> http://cr.openjdk.java.net/~phh/8227226/webrev.01/
>>      >>      >>>>>>       This shifts away from abusing the constructor
>>      >> do_zero magic
>>      >>      >>>>>> in exchange for virtualizing mem_clear() and 
>> specializing
>>      >> for the
>>      >>      >>>>>> Z version.  It does create a change in mem_clear 
>> in that it
>>      >>      >>>>>> returns an updated version of mem.  It does create 
>> change
>>      >> outside
>>      >>      >>>>>> of the Z code however it does feel cleaner.
>>      >>      >>>>>>       I didn't make a change to PinAllocating - 
>> looking at
>>      >> it, the
>>      >>      >>>>>> safety of keeping it constructor / destructor 
>> based still
>>      >> seemed
>>      >>      >>>>>> appropriate to me.  If the objection is to using 
>> the sequence
>>      >>      >>>>>> numbers to pin (and instead using handles to 
>> update) -
>>      >> this to me
>>      >>      >>>>>> seems less error prone.  I had originally discussed
>>      >> handles with
>>      >>      >>>>>> Stefan but the proposal came down to this which 
>> looks much
>>      >> cleaner.
>>      >>      >>>>>>       On 7/8/19, 6:36 AM, "hotspot-gc-dev on 
>> behalf of
>>      >>      >>>>>> Sciampacone, Ryan"
>>      >> <hotspot-gc-dev-bounces at openjdk.java.net on
>>      >>      >>>>>> behalf of sci at amazon.com> wrote:
>>      >>      >>>>>>           1) Yes this was a conscious decision.  
>> There was
>>      >>      >>>>>> discussion on determining the optimal point for 
>> breakup
>>      >> but given
>>      >>      >>>>>> the existing sizes this seemed sufficient.  This 
>> doesn't
>>      >> preclude
>>      >>      >>>>>> the ability to go down that path if its deemed 
>> absolutely
>>      >>      >>>>>> necessary.  The path for more complex decisions is 
>> now
>>      >> available.
>>      >>      >>>>>>           2) Agree
>>      >>      >>>>>>           3) I'm not clear here.  Do you mean 
>> effectively
>>      >> going
>>      >>      >>>>>> direct to ZHeap and bypassing the single function
>>      >> PinAllocating?
>>      >>      >>>>>> Agree. Otherwise I'll ask you to be a bit clearer.
>>      >>      >>>>>>           4) Agree
>>      >>      >>>>>>           5) I initially had the exact same 
>> reaction but I
>>      >> played
>>      >>      >>>>>> around with a few other versions (including 
>> breaking up
>>      >>      >>>>>> initialization points between header and body to 
>> get the
>>      >> desired
>>      >>      >>>>>> results) and this ended up looking correct.  I'll try
>>      >> mixing in
>>      >>      >>>>>> the mem clearer function again (fresh start) to 
>> see if it
>>      >> looks
>>      >>      >>>>>> any better.
>>      >>      >>>>>>           On 7/8/19, 5:49 AM, "Per Liden"
>>      >> <per.liden at oracle.com>
>>      >>      >>>>>> wrote:
>>      >>      >>>>>>               Hi Ryan,
>>      >>      >>>>>>               A few general comments:
>>      >>      >>>>>>               1) It looks like this still only 
>> work for
>>      >> large pages?
>>      >>      >>>>>>               2) The log_info stuff should be 
>> removed.
>>      >>      >>>>>>               3) I'm not a huge fan of single-use
>>      >> utilities like
>>      >>      >>>>>> PinAllocating, at
>>      >>      >>>>>>               least not when, IMO, the alternative 
>> is more
>>      >>      >>>>>> straight forward and less code.
>>      >>      >>>>>>               4) Please make locals const when 
>> possible.
>>      >>      >>>>>>               5) Duplicating _do_zero looks odd. 
>> Injecting
>>      >> a "mem
>>      >>      >>>>>> clearer", similar to
>>      >>      >>>>>>               what Stefans original patch did, 
>> seems worth
>>      >> exploring.
>>      >>      >>>>>>               cheers,
>>      >>      >>>>>>               /Per
>>      >>      >>>>>>               (Btw, I'm on vacation so I might not be
>>      >>      >>>>>> super-responsive to emails)
>>      >>      >>>>>>               On 2019-07-08 12:42, Erik Österlund 
>> wrote:
>>      >>      >>>>>>               > Hi Ryan,
>>      >>      >>>>>>               >
>>      >>      >>>>>>               > This looks good in general. Just some
>>      >> stylistic
>>      >>      >>>>>> things...
>>      >>      >>>>>>               >
>>      >>      >>>>>>               > 1) In the ZGC project we like the 
>> letter
>>      >> 'Z' so
>>      >>      >>>>>> much that we put it in
>>      >>      >>>>>>               > front of everything we possibly can,
>>      >> including all
>>      >>      >>>>>> class names.
>>      >>      >>>>>>               > 2) We also explicitly state things 
>> are
>>      >> private
>>      >>      >>>>>> even though it's
>>      >>      >>>>>>               > bleedingly obvious.
>>      >>      >>>>>>               >
>>      >>      >>>>>>               > So:
>>      >>      >>>>>>               >
>>      >>      >>>>>>               > 39 class PinAllocating {
>>      >>      >>>>>>               > 40 HeapWord* _mem;
>>      >>      >>>>>>               > 41 public: -> 39 class 
>> ZPinAllocating { 40
>>      >>      >>>>>> private: 41 HeapWord* _mem;
>>      >>      >>>>>>               >    42
>>      >>      >>>>>>               >   41 public: I can be your sponsor 
>> and
>>      >> push this
>>      >>      >>>>>> change for you. I don't
>>      >>      >>>>>>               > think there is a need for another 
>> webrev
>>      >> for my
>>      >>      >>>>>> small stylistic remarks,
>>      >>      >>>>>>               > so I can just fix that before 
>> pushing this
>>      >> for
>>      >>      >>>>>> you. On that note, I'll
>>      >>      >>>>>>               > add me and StefanK to the 
>> contributed-by
>>      >> section
>>      >>      >>>>>> as we all worked out
>>      >>      >>>>>>               > the right solution to this problem
>>      >>      >>>>>> collaboratively. I have run through
>>      >>      >>>>>>               > mach5 tier1-5, and found no issues 
>> with this
>>      >>      >>>>>> patch. Thanks, /Erik
>>      >>      >>>>>>               >
>>      >>      >>>>>>               > On 2019-07-05 17:18, Sciampacone, 
>> Ryan wrote:
>>      >>      >>>>>>               >>
>>      >> http://cr.openjdk.java.net/~phh/8227226/webrev.00/
>>      >>      >>>>>>               >>
>>      >> https://bugs.openjdk.java.net/browse/JDK-8227226
>>      >>      >>>>>>               >>
>>      >>      >>>>>>               >> This patch introduces safe point 
>> checks into
>>      >>      >>>>>> array clearing during
>>      >>      >>>>>>               >> allocation for ZGC.  The patch 
>> isolates the
>>      >>      >>>>>> changes to ZGC as (in
>>      >>      >>>>>>               >> particular with the more modern
>>      >> collectors) the
>>      >>      >>>>>> approach to
>>      >>      >>>>>>               >> incrementalizing or respecting 
>> safe point
>>      >> checks
>>      >>      >>>>>> is going to be
>>      >>      >>>>>>               >> different.
>>      >>      >>>>>>               >>
>>      >>      >>>>>>               >> The approach is to keep the region
>>      >> holding the
>>      >>      >>>>>> array in the allocating
>>      >>      >>>>>>               >> state (pin logic) while updating the
>>      >> color to the
>>      >>      >>>>>> array after checks.
>>      >>      >>>>>>               >>
>>      >>      >>>>>>               >> Can I get a review?  Thanks.
>>      >>      >>>>>>               >>
>>      >>      >>>>>>               >> Ryan
>>      >>      >>>>>>               >
>>      >>      >>>>>>
>>      >>      >>
>>      >>
>>


More information about the hotspot-gc-dev mailing list