8227226: Segmented array clearing for ZGC
Per Liden
per.liden at oracle.com
Wed Aug 7 10:23:13 UTC 2019
Hi again,
On 8/7/19 11:59 AM, Per Liden wrote:
> Hi Ryan,
>
> On 8/7/19 3:05 AM, Sciampacone, Ryan wrote:
>> Although least intrusive, it goes back to some of the earlier
>> complaints about using false in the constructor for do_zero. It also
>> makes a fair number of
>
> My earlier comment about this was not about passing false to the
> constructor, but the duplication of the _do_zero member, which I thought
> looked a bit odd. In this patch, this was avoided by separation these
> paths already in ZCollectedHeap::array_allocate().
>
>> assumptions (and goes against the hierarchies intent) on
>> initialization logic to hide in finish(). That said, I agree that is
>> fairly clean - and definitely addresses the missed cases of the
>> earlier webrev.
>>
>
> We've had the same discussions here and concluded that we might want to
> restructure parts of MemAllocator to better accommodate this use case,
> but that overriding finish() seems ok for now. A patch to restructure
> MemAllocator could come later if we think it's needed.
>
>> 2 things,
>>
>> 1. Isn't the substitute_oop_array_klass() check too narrow? It will
>> only detect types Object[], and not any other type of reference array
>> (such as String[]) ? I believe there's a bug here (correct me if I'm
>> wrong).
>
> On the JVM level, Object[], String[] and int[][] all have the same
> Klass, so we should catch them all with this single check.
Sorry, I'm of course wrong here. Changed the check to call
klass->is_objArray_klass() instead. Thanks!
Updated webrev.4 in-place.
cheers,
Per
>
>> 2. I'd want to see an assert() on the sizeof(long) == sizeof(void *)
>> dependency. I realize what code base this is in but it would be
>> properly defensive.
>
> Sounds good.
>
>>
>> What does the reporting look like in this case? Is the long[] type
>> reported accepted? I'm wondering if this depletes some of the
>> simplicity.
>
> By overriding finish(), the sampling/reporting remains correct and
> unaffected, as it will never see the intermediate long[].
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~pliden/8227226/webrev.4
>
> cheers,
> Per
>
>>
>> On 8/2/19, 6:13 AM, "hotspot-gc-dev on behalf of Per Liden"
>> <hotspot-gc-dev-bounces at openjdk.java.net on behalf of
>> per.liden at oracle.com> wrote:
>>
>> Did some micro-benchmarking (on a Xeon E5-2630) with various segment
>> sizes between 4K and 512K, and 64K seems to offer a good
>> trade-off. For
>> a 1G array, the allocation time increases by ~1%, but in exchange
>> the
>> worst case TTSP drops from ~280ms to ~0.6ms.
>> Updated webrev using 64K:
>> http://cr.openjdk.java.net/~pliden/8227226/webrev.3
>> cheers,
>> Per
>> On 8/2/19 11:11 AM, Per Liden wrote:
>> > Hi Erik,
>> >
>> > On 8/1/19 5:56 PM, Erik Osterlund wrote:
>> >> Hi Per,
>> >>
>> >> I like that this approach is unintrusive, does its thing at
>> the right
>> >> abstraction layer, and also handles medium sized arrays.
>> >
>> > It even handles small arrays (i.e. arrays in small zpages) ;)
>> >
>> >> Looks good.
>> >
>> > Thanks! I'll test various segment sizes and see how that affects
>> > performance and TTSP.
>> >
>> > cheers,
>> > Per
>> >
>> >>
>> >> Thanks,
>> >> /Erik
>> >>
>> >>> On 1 Aug 2019, at 16:14, Per Liden <per.liden at oracle.com> wrote:
>> >>>
>> >>> Here's an updated webrev that should be complete, i.e. fixes the
>> >>> issues related to allocation sampling/reporting that I
>> mentioned.
>> >>> This patch makes MemAllocator::finish() virtual, so that we
>> can do
>> >>> our thing and install the correct klass pointer before the
>> Allocation
>> >>> destructor executes. This seems to be the least intrusive why of
>> >>> doing this.
>> >>>
>> >>> http://cr.openjdk.java.net/~pliden/8227226/webrev.2
>> >>>
>> >>> This passed function testing, but proper benchmarking remains
>> to be
>> >>> done.
>> >>>
>> >>> cheers,
>> >>> Per
>> >>>
>> >>>> On 7/31/19 7:19 PM, Per Liden wrote:
>> >>>> Hi,
>> >>>> I found some time to benchmark the "GC clears
>> pages"-approach, and
>> >>>> it's fairly clear that it's not paying off. So ditching that
>> idea.
>> >>>> However, I'm still looking for something that would not just do
>> >>>> segmented clearing of arrays in large zpages. Letting oop
>> arrays
>> >>>> temporarily be typed arrays while it's being cleared could
>> be an
>> >>>> option. I did a prototype for that, which looks like this:
>> >>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.1
>> >>>> There's at least one issue here, the code doing allocation
>> sampling
>> >>>> will see that we allocated long arrays instead of oop
>> arrays, so the
>> >>>> reporting there will be skewed. That can be addressed if we
>> go down
>> >>>> this path. The code is otherwise fairly simple and
>> contained. Feel
>> >>>> free to spot any issues.
>> >>>> cheers,
>> >>>> Per
>> >>>>> On 7/26/19 2:27 PM, Per Liden wrote:
>> >>>>> Hi Ryan & Erik,
>> >>>>>
>> >>>>> I had a look at this and started exploring a slightly
>> different
>> >>>>> approach. Instead doing segmented clearing in the
>> allocation path,
>> >>>>> we can have the concurrent GC thread clear pages when they are
>> >>>>> reclaimed and not do any clearing in the allocation path at
>> all.
>> >>>>>
>> >>>>> That would look like this:
>> >>>>>
>> >>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-base
>> >>>>>
>> >>>>> (I've had to temporarily comment out three lines of
>> assert/debug
>> >>>>> code to make this work)
>> >>>>>
>> >>>>> The relocation set selection phase will now be tasked with
>> some
>> >>>>> potentially expensive clearing work, so we'll want to make
>> that
>> >>>>> part parallel also.
>> >>>>>
>> >>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-parallel
>> >>>>>
>> >>>>> Moving this work from Java threads onto the concurrent GC
>> threads
>> >>>>> means we will potentially prolong the
>> RelocationSetSelection and
>> >>>>> Relocation phases. That might be a trade-off worth doing. In
>> >>>>> return, we get:
>> >>>>>
>> >>>>> * Faster array allocations, as there's now less work done
>> in the
>> >>>>> allocation path.
>> >>>>> * This benefits all arrays, not just those allocated in
>> large pages.
>> >>>>> * No need to consider/tune a "chunk size".
>> >>>>> * I also tend think we'll end up with slightly less complex
>> code,
>> >>>>> that is a bit easier to reason about. Can be debated of
>> course.
>> >>>>>
>> >>>>> This approach might also "survive" longer, because the YC
>> scheme
>> >>>>> we've been loosely thinking about currently requires newly
>> >>>>> allocated pages to be cleared anyway. It's of course too
>> early to
>> >>>>> tell if that requirement will stand in the end, but it's
>> possible
>> >>>>> anyway.
>> >>>>>
>> >>>>> I'll need to do some more testing and benchmarking to make
>> sure
>> >>>>> there's no regression or bugs here. The commented out debug
>> code
>> >>>>> also needs to be addressed of course.
>> >>>>>
>> >>>>> Comments? Other ideas?
>> >>>>>
>> >>>>> cheers,
>> >>>>> Per
>> >>>>>
>> >>>>>> On 7/24/19 4:37 PM, Sciampacone, Ryan wrote:
>> >>>>>>
>> >>>>>> Somehow I lost the RFR off the front and started a new
>> thread.
>> >>>>>> Now that we're both off vacation I'd like to revisit
>> this. Can
>> >>>>>> you take a look?
>> >>>>>>
>> >>>>>> On 7/8/19, 10:40 AM, "hotspot-gc-dev on behalf of
>> Sciampacone,
>> >>>>>> Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on behalf of
>> >>>>>> sci at amazon.com> wrote:
>> >>>>>>
>> >>>>>> http://cr.openjdk.java.net/~phh/8227226/webrev.01/
>> >>>>>> This shifts away from abusing the constructor
>> do_zero magic
>> >>>>>> in exchange for virtualizing mem_clear() and specializing
>> for the
>> >>>>>> Z version. It does create a change in mem_clear in that it
>> >>>>>> returns an updated version of mem. It does create change
>> outside
>> >>>>>> of the Z code however it does feel cleaner.
>> >>>>>> I didn't make a change to PinAllocating - looking at
>> it, the
>> >>>>>> safety of keeping it constructor / destructor based still
>> seemed
>> >>>>>> appropriate to me. If the objection is to using the sequence
>> >>>>>> numbers to pin (and instead using handles to update) -
>> this to me
>> >>>>>> seems less error prone. I had originally discussed
>> handles with
>> >>>>>> Stefan but the proposal came down to this which looks much
>> cleaner.
>> >>>>>> On 7/8/19, 6:36 AM, "hotspot-gc-dev on behalf of
>> >>>>>> Sciampacone, Ryan"
>> <hotspot-gc-dev-bounces at openjdk.java.net on
>> >>>>>> behalf of sci at amazon.com> wrote:
>> >>>>>> 1) Yes this was a conscious decision. There was
>> >>>>>> discussion on determining the optimal point for breakup
>> but given
>> >>>>>> the existing sizes this seemed sufficient. This doesn't
>> preclude
>> >>>>>> the ability to go down that path if its deemed absolutely
>> >>>>>> necessary. The path for more complex decisions is now
>> available.
>> >>>>>> 2) Agree
>> >>>>>> 3) I'm not clear here. Do you mean effectively
>> going
>> >>>>>> direct to ZHeap and bypassing the single function
>> PinAllocating?
>> >>>>>> Agree. Otherwise I'll ask you to be a bit clearer.
>> >>>>>> 4) Agree
>> >>>>>> 5) I initially had the exact same reaction but I
>> played
>> >>>>>> around with a few other versions (including breaking up
>> >>>>>> initialization points between header and body to get the
>> desired
>> >>>>>> results) and this ended up looking correct. I'll try
>> mixing in
>> >>>>>> the mem clearer function again (fresh start) to see if it
>> looks
>> >>>>>> any better.
>> >>>>>> On 7/8/19, 5:49 AM, "Per Liden"
>> <per.liden at oracle.com>
>> >>>>>> wrote:
>> >>>>>> Hi Ryan,
>> >>>>>> A few general comments:
>> >>>>>> 1) It looks like this still only work for
>> large pages?
>> >>>>>> 2) The log_info stuff should be removed.
>> >>>>>> 3) I'm not a huge fan of single-use
>> utilities like
>> >>>>>> PinAllocating, at
>> >>>>>> least not when, IMO, the alternative is more
>> >>>>>> straight forward and less code.
>> >>>>>> 4) Please make locals const when possible.
>> >>>>>> 5) Duplicating _do_zero looks odd. Injecting
>> a "mem
>> >>>>>> clearer", similar to
>> >>>>>> what Stefans original patch did, seems worth
>> exploring.
>> >>>>>> cheers,
>> >>>>>> /Per
>> >>>>>> (Btw, I'm on vacation so I might not be
>> >>>>>> super-responsive to emails)
>> >>>>>> On 2019-07-08 12:42, Erik Österlund wrote:
>> >>>>>> > Hi Ryan,
>> >>>>>> >
>> >>>>>> > This looks good in general. Just some
>> stylistic
>> >>>>>> things...
>> >>>>>> >
>> >>>>>> > 1) In the ZGC project we like the letter
>> 'Z' so
>> >>>>>> much that we put it in
>> >>>>>> > front of everything we possibly can,
>> including all
>> >>>>>> class names.
>> >>>>>> > 2) We also explicitly state things are
>> private
>> >>>>>> even though it's
>> >>>>>> > bleedingly obvious.
>> >>>>>> >
>> >>>>>> > So:
>> >>>>>> >
>> >>>>>> > 39 class PinAllocating {
>> >>>>>> > 40 HeapWord* _mem;
>> >>>>>> > 41 public: -> 39 class ZPinAllocating { 40
>> >>>>>> private: 41 HeapWord* _mem;
>> >>>>>> > 42
>> >>>>>> > 41 public: I can be your sponsor and
>> push this
>> >>>>>> change for you. I don't
>> >>>>>> > think there is a need for another webrev
>> for my
>> >>>>>> small stylistic remarks,
>> >>>>>> > so I can just fix that before pushing this
>> for
>> >>>>>> you. On that note, I'll
>> >>>>>> > add me and StefanK to the contributed-by
>> section
>> >>>>>> as we all worked out
>> >>>>>> > the right solution to this problem
>> >>>>>> collaboratively. I have run through
>> >>>>>> > mach5 tier1-5, and found no issues with this
>> >>>>>> patch. Thanks, /Erik
>> >>>>>> >
>> >>>>>> > On 2019-07-05 17:18, Sciampacone, Ryan wrote:
>> >>>>>> >>
>> http://cr.openjdk.java.net/~phh/8227226/webrev.00/
>> >>>>>> >>
>> https://bugs.openjdk.java.net/browse/JDK-8227226
>> >>>>>> >>
>> >>>>>> >> This patch introduces safe point checks into
>> >>>>>> array clearing during
>> >>>>>> >> allocation for ZGC. The patch isolates the
>> >>>>>> changes to ZGC as (in
>> >>>>>> >> particular with the more modern
>> collectors) the
>> >>>>>> approach to
>> >>>>>> >> incrementalizing or respecting safe point
>> checks
>> >>>>>> is going to be
>> >>>>>> >> different.
>> >>>>>> >>
>> >>>>>> >> The approach is to keep the region
>> holding the
>> >>>>>> array in the allocating
>> >>>>>> >> state (pin logic) while updating the
>> color to the
>> >>>>>> array after checks.
>> >>>>>> >>
>> >>>>>> >> Can I get a review? Thanks.
>> >>>>>> >>
>> >>>>>> >> Ryan
>> >>>>>> >
>> >>>>>>
>> >>
>>
More information about the hotspot-gc-dev
mailing list