8227226: Segmented array clearing for ZGC
Per Liden
per.liden at oracle.com
Tue Aug 13 08:49:17 UTC 2019
Just a follow up. Stefan correctly noted that a heap iteration that
happens before the final klass pointer has been installed will see the
temporary long[], which can be problematic if that heap iteration is
doing JVMTI-tagging on all long[] instances. In that case, that tag will
eventually end up on the oop array when we later install the final klass
pointer.
I filed JDK-8229451 to deal with this, and sent out a patch for review.
cheers,
Per
On 8/8/19 10:42 AM, Per Liden wrote:
> On 8/7/19 3:55 PM, Sciampacone, Ryan wrote:
>> > By overriding finish(), the sampling/reporting remains correct and
>> > unaffected, as it will never see the intermediate long[].
>> I learned something today. Thank you.
>>
>> For MemAllocator, I think we all agree the flow is locked in a bit too
>> rigidly but this helps with some of the VM/GC assumptions so we end up
>> battling it. That said I'm with you - if there's a rewrite to be had,
>> it's not in this patch.
>>
>> Otherwise, fwiw lgtm.
>
> Thanks Ryan!
>
> (Since you don't have an OpenJDK id I can't add you as Reviewed-by, but
> Stefan, Erik and you will all be added as Contributed-by)
>
> cheers,
> Per
>
>>
>>
>> On 8/7/19, 3:24 AM, "Per Liden" <per.liden at oracle.com> wrote:
>>
>> Hi again,
>> On 8/7/19 11:59 AM, Per Liden wrote:
>> > Hi Ryan,
>> >
>> > On 8/7/19 3:05 AM, Sciampacone, Ryan wrote:
>> >> Although least intrusive, it goes back to some of the earlier
>> >> complaints about using false in the constructor for do_zero.
>> It also
>> >> makes a fair number of
>> >
>> > My earlier comment about this was not about passing false to the
>> > constructor, but the duplication of the _do_zero member, which
>> I thought
>> > looked a bit odd. In this patch, this was avoided by separation
>> these
>> > paths already in ZCollectedHeap::array_allocate().
>> >
>> >> assumptions (and goes against the hierarchies intent) on
>> >> initialization logic to hide in finish(). That said, I agree
>> that is
>> >> fairly clean - and definitely addresses the missed cases of the
>> >> earlier webrev.
>> >>
>> >
>> > We've had the same discussions here and concluded that we might
>> want to
>> > restructure parts of MemAllocator to better accommodate this
>> use case,
>> > but that overriding finish() seems ok for now. A patch to
>> restructure
>> > MemAllocator could come later if we think it's needed.
>> >
>> >> 2 things,
>> >>
>> >> 1. Isn't the substitute_oop_array_klass() check too narrow?
>> It will
>> >> only detect types Object[], and not any other type of
>> reference array
>> >> (such as String[]) ? I believe there's a bug here (correct me
>> if I'm
>> >> wrong).
>> >
>> > On the JVM level, Object[], String[] and int[][] all have the same
>> > Klass, so we should catch them all with this single check.
>> Sorry, I'm of course wrong here. Changed the check to call
>> klass->is_objArray_klass() instead. Thanks!
>> Updated webrev.4 in-place.
>> cheers,
>> Per
>> >
>> >> 2. I'd want to see an assert() on the sizeof(long) ==
>> sizeof(void *)
>> >> dependency. I realize what code base this is in but it would be
>> >> properly defensive.
>> >
>> > Sounds good.
>> >
>> >>
>> >> What does the reporting look like in this case? Is the long[]
>> type
>> >> reported accepted? I'm wondering if this depletes some of the
>> >> simplicity.
>> >
>> > By overriding finish(), the sampling/reporting remains correct and
>> > unaffected, as it will never see the intermediate long[].
>> >
>> > Updated webrev:
>> >
>> > http://cr.openjdk.java.net/~pliden/8227226/webrev.4
>> >
>> > cheers,
>> > Per
>> >
>> >>
>> >> On 8/2/19, 6:13 AM, "hotspot-gc-dev on behalf of Per Liden"
>> >> <hotspot-gc-dev-bounces at openjdk.java.net on behalf of
>> >> per.liden at oracle.com> wrote:
>> >>
>> >> Did some micro-benchmarking (on a Xeon E5-2630) with
>> various segment
>> >> sizes between 4K and 512K, and 64K seems to offer a good
>> >> trade-off. For
>> >> a 1G array, the allocation time increases by ~1%, but in
>> exchange
>> >> the
>> >> worst case TTSP drops from ~280ms to ~0.6ms.
>> >> Updated webrev using 64K:
>> >> http://cr.openjdk.java.net/~pliden/8227226/webrev.3
>> >> cheers,
>> >> Per
>> >> On 8/2/19 11:11 AM, Per Liden wrote:
>> >> > Hi Erik,
>> >> >
>> >> > On 8/1/19 5:56 PM, Erik Osterlund wrote:
>> >> >> Hi Per,
>> >> >>
>> >> >> I like that this approach is unintrusive, does its
>> thing at
>> >> the right
>> >> >> abstraction layer, and also handles medium sized arrays.
>> >> >
>> >> > It even handles small arrays (i.e. arrays in small
>> zpages) ;)
>> >> >
>> >> >> Looks good.
>> >> >
>> >> > Thanks! I'll test various segment sizes and see how
>> that affects
>> >> > performance and TTSP.
>> >> >
>> >> > cheers,
>> >> > Per
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> /Erik
>> >> >>
>> >> >>> On 1 Aug 2019, at 16:14, Per Liden
>> <per.liden at oracle.com> wrote:
>> >> >>>
>> >> >>> Here's an updated webrev that should be complete,
>> i.e. fixes the
>> >> >>> issues related to allocation sampling/reporting that I
>> >> mentioned.
>> >> >>> This patch makes MemAllocator::finish() virtual, so
>> that we
>> >> can do
>> >> >>> our thing and install the correct klass pointer
>> before the
>> >> Allocation
>> >> >>> destructor executes. This seems to be the least
>> intrusive why of
>> >> >>> doing this.
>> >> >>>
>> >> >>> http://cr.openjdk.java.net/~pliden/8227226/webrev.2
>> >> >>>
>> >> >>> This passed function testing, but proper benchmarking
>> remains
>> >> to be
>> >> >>> done.
>> >> >>>
>> >> >>> cheers,
>> >> >>> Per
>> >> >>>
>> >> >>>> On 7/31/19 7:19 PM, Per Liden wrote:
>> >> >>>> Hi,
>> >> >>>> I found some time to benchmark the "GC clears
>> >> pages"-approach, and
>> >> >>>> it's fairly clear that it's not paying off. So
>> ditching that
>> >> idea.
>> >> >>>> However, I'm still looking for something that would
>> not just do
>> >> >>>> segmented clearing of arrays in large zpages.
>> Letting oop
>> >> arrays
>> >> >>>> temporarily be typed arrays while it's being cleared
>> could
>> >> be an
>> >> >>>> option. I did a prototype for that, which looks like
>> this:
>> >> >>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.1
>> >> >>>> There's at least one issue here, the code doing
>> allocation
>> >> sampling
>> >> >>>> will see that we allocated long arrays instead of oop
>> >> arrays, so the
>> >> >>>> reporting there will be skewed. That can be
>> addressed if we
>> >> go down
>> >> >>>> this path. The code is otherwise fairly simple and
>> >> contained. Feel
>> >> >>>> free to spot any issues.
>> >> >>>> cheers,
>> >> >>>> Per
>> >> >>>>> On 7/26/19 2:27 PM, Per Liden wrote:
>> >> >>>>> Hi Ryan & Erik,
>> >> >>>>>
>> >> >>>>> I had a look at this and started exploring a slightly
>> >> different
>> >> >>>>> approach. Instead doing segmented clearing in the
>> >> allocation path,
>> >> >>>>> we can have the concurrent GC thread clear pages
>> when they are
>> >> >>>>> reclaimed and not do any clearing in the allocation
>> path at
>> >> all.
>> >> >>>>>
>> >> >>>>> That would look like this:
>> >> >>>>>
>> >> >>>>>
>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-base
>> >> >>>>>
>> >> >>>>> (I've had to temporarily comment out three lines of
>> >> assert/debug
>> >> >>>>> code to make this work)
>> >> >>>>>
>> >> >>>>> The relocation set selection phase will now be
>> tasked with
>> >> some
>> >> >>>>> potentially expensive clearing work, so we'll want
>> to make
>> >> that
>> >> >>>>> part parallel also.
>> >> >>>>>
>> >> >>>>>
>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-parallel
>> >> >>>>>
>> >> >>>>> Moving this work from Java threads onto the
>> concurrent GC
>> >> threads
>> >> >>>>> means we will potentially prolong the
>> >> RelocationSetSelection and
>> >> >>>>> Relocation phases. That might be a trade-off worth
>> doing. In
>> >> >>>>> return, we get:
>> >> >>>>>
>> >> >>>>> * Faster array allocations, as there's now less
>> work done
>> >> in the
>> >> >>>>> allocation path.
>> >> >>>>> * This benefits all arrays, not just those
>> allocated in
>> >> large pages.
>> >> >>>>> * No need to consider/tune a "chunk size".
>> >> >>>>> * I also tend think we'll end up with slightly less
>> complex
>> >> code,
>> >> >>>>> that is a bit easier to reason about. Can be
>> debated of
>> >> course.
>> >> >>>>>
>> >> >>>>> This approach might also "survive" longer, because
>> the YC
>> >> scheme
>> >> >>>>> we've been loosely thinking about currently
>> requires newly
>> >> >>>>> allocated pages to be cleared anyway. It's of
>> course too
>> >> early to
>> >> >>>>> tell if that requirement will stand in the end, but
>> it's
>> >> possible
>> >> >>>>> anyway.
>> >> >>>>>
>> >> >>>>> I'll need to do some more testing and benchmarking
>> to make
>> >> sure
>> >> >>>>> there's no regression or bugs here. The commented
>> out debug
>> >> code
>> >> >>>>> also needs to be addressed of course.
>> >> >>>>>
>> >> >>>>> Comments? Other ideas?
>> >> >>>>>
>> >> >>>>> cheers,
>> >> >>>>> Per
>> >> >>>>>
>> >> >>>>>> On 7/24/19 4:37 PM, Sciampacone, Ryan wrote:
>> >> >>>>>>
>> >> >>>>>> Somehow I lost the RFR off the front and started a
>> new
>> >> thread.
>> >> >>>>>> Now that we're both off vacation I'd like to revisit
>> >> this. Can
>> >> >>>>>> you take a look?
>> >> >>>>>>
>> >> >>>>>> On 7/8/19, 10:40 AM, "hotspot-gc-dev on behalf of
>> >> Sciampacone,
>> >> >>>>>> Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on
>> behalf of
>> >> >>>>>> sci at amazon.com> wrote:
>> >> >>>>>>
>> >> >>>>>>
>> http://cr.openjdk.java.net/~phh/8227226/webrev.01/
>> >> >>>>>> This shifts away from abusing the constructor
>> >> do_zero magic
>> >> >>>>>> in exchange for virtualizing mem_clear() and
>> specializing
>> >> for the
>> >> >>>>>> Z version. It does create a change in mem_clear
>> in that it
>> >> >>>>>> returns an updated version of mem. It does create
>> change
>> >> outside
>> >> >>>>>> of the Z code however it does feel cleaner.
>> >> >>>>>> I didn't make a change to PinAllocating -
>> looking at
>> >> it, the
>> >> >>>>>> safety of keeping it constructor / destructor
>> based still
>> >> seemed
>> >> >>>>>> appropriate to me. If the objection is to using
>> the sequence
>> >> >>>>>> numbers to pin (and instead using handles to
>> update) -
>> >> this to me
>> >> >>>>>> seems less error prone. I had originally discussed
>> >> handles with
>> >> >>>>>> Stefan but the proposal came down to this which
>> looks much
>> >> cleaner.
>> >> >>>>>> On 7/8/19, 6:36 AM, "hotspot-gc-dev on
>> behalf of
>> >> >>>>>> Sciampacone, Ryan"
>> >> <hotspot-gc-dev-bounces at openjdk.java.net on
>> >> >>>>>> behalf of sci at amazon.com> wrote:
>> >> >>>>>> 1) Yes this was a conscious decision.
>> There was
>> >> >>>>>> discussion on determining the optimal point for
>> breakup
>> >> but given
>> >> >>>>>> the existing sizes this seemed sufficient. This
>> doesn't
>> >> preclude
>> >> >>>>>> the ability to go down that path if its deemed
>> absolutely
>> >> >>>>>> necessary. The path for more complex decisions is
>> now
>> >> available.
>> >> >>>>>> 2) Agree
>> >> >>>>>> 3) I'm not clear here. Do you mean
>> effectively
>> >> going
>> >> >>>>>> direct to ZHeap and bypassing the single function
>> >> PinAllocating?
>> >> >>>>>> Agree. Otherwise I'll ask you to be a bit clearer.
>> >> >>>>>> 4) Agree
>> >> >>>>>> 5) I initially had the exact same
>> reaction but I
>> >> played
>> >> >>>>>> around with a few other versions (including
>> breaking up
>> >> >>>>>> initialization points between header and body to
>> get the
>> >> desired
>> >> >>>>>> results) and this ended up looking correct. I'll try
>> >> mixing in
>> >> >>>>>> the mem clearer function again (fresh start) to
>> see if it
>> >> looks
>> >> >>>>>> any better.
>> >> >>>>>> On 7/8/19, 5:49 AM, "Per Liden"
>> >> <per.liden at oracle.com>
>> >> >>>>>> wrote:
>> >> >>>>>> Hi Ryan,
>> >> >>>>>> A few general comments:
>> >> >>>>>> 1) It looks like this still only
>> work for
>> >> large pages?
>> >> >>>>>> 2) The log_info stuff should be
>> removed.
>> >> >>>>>> 3) I'm not a huge fan of single-use
>> >> utilities like
>> >> >>>>>> PinAllocating, at
>> >> >>>>>> least not when, IMO, the alternative
>> is more
>> >> >>>>>> straight forward and less code.
>> >> >>>>>> 4) Please make locals const when
>> possible.
>> >> >>>>>> 5) Duplicating _do_zero looks odd.
>> Injecting
>> >> a "mem
>> >> >>>>>> clearer", similar to
>> >> >>>>>> what Stefans original patch did,
>> seems worth
>> >> exploring.
>> >> >>>>>> cheers,
>> >> >>>>>> /Per
>> >> >>>>>> (Btw, I'm on vacation so I might not be
>> >> >>>>>> super-responsive to emails)
>> >> >>>>>> On 2019-07-08 12:42, Erik Österlund
>> wrote:
>> >> >>>>>> > Hi Ryan,
>> >> >>>>>> >
>> >> >>>>>> > This looks good in general. Just some
>> >> stylistic
>> >> >>>>>> things...
>> >> >>>>>> >
>> >> >>>>>> > 1) In the ZGC project we like the
>> letter
>> >> 'Z' so
>> >> >>>>>> much that we put it in
>> >> >>>>>> > front of everything we possibly can,
>> >> including all
>> >> >>>>>> class names.
>> >> >>>>>> > 2) We also explicitly state things
>> are
>> >> private
>> >> >>>>>> even though it's
>> >> >>>>>> > bleedingly obvious.
>> >> >>>>>> >
>> >> >>>>>> > So:
>> >> >>>>>> >
>> >> >>>>>> > 39 class PinAllocating {
>> >> >>>>>> > 40 HeapWord* _mem;
>> >> >>>>>> > 41 public: -> 39 class
>> ZPinAllocating { 40
>> >> >>>>>> private: 41 HeapWord* _mem;
>> >> >>>>>> > 42
>> >> >>>>>> > 41 public: I can be your sponsor
>> and
>> >> push this
>> >> >>>>>> change for you. I don't
>> >> >>>>>> > think there is a need for another
>> webrev
>> >> for my
>> >> >>>>>> small stylistic remarks,
>> >> >>>>>> > so I can just fix that before
>> pushing this
>> >> for
>> >> >>>>>> you. On that note, I'll
>> >> >>>>>> > add me and StefanK to the
>> contributed-by
>> >> section
>> >> >>>>>> as we all worked out
>> >> >>>>>> > the right solution to this problem
>> >> >>>>>> collaboratively. I have run through
>> >> >>>>>> > mach5 tier1-5, and found no issues
>> with this
>> >> >>>>>> patch. Thanks, /Erik
>> >> >>>>>> >
>> >> >>>>>> > On 2019-07-05 17:18, Sciampacone,
>> Ryan wrote:
>> >> >>>>>> >>
>> >> http://cr.openjdk.java.net/~phh/8227226/webrev.00/
>> >> >>>>>> >>
>> >> https://bugs.openjdk.java.net/browse/JDK-8227226
>> >> >>>>>> >>
>> >> >>>>>> >> This patch introduces safe point
>> checks into
>> >> >>>>>> array clearing during
>> >> >>>>>> >> allocation for ZGC. The patch
>> isolates the
>> >> >>>>>> changes to ZGC as (in
>> >> >>>>>> >> particular with the more modern
>> >> collectors) the
>> >> >>>>>> approach to
>> >> >>>>>> >> incrementalizing or respecting
>> safe point
>> >> checks
>> >> >>>>>> is going to be
>> >> >>>>>> >> different.
>> >> >>>>>> >>
>> >> >>>>>> >> The approach is to keep the region
>> >> holding the
>> >> >>>>>> array in the allocating
>> >> >>>>>> >> state (pin logic) while updating the
>> >> color to the
>> >> >>>>>> array after checks.
>> >> >>>>>> >>
>> >> >>>>>> >> Can I get a review? Thanks.
>> >> >>>>>> >>
>> >> >>>>>> >> Ryan
>> >> >>>>>> >
>> >> >>>>>>
>> >> >>
>> >>
>>
More information about the hotspot-gc-dev
mailing list