RFR: 8227226: Segmented array clearing for ZGC

Fri Aug 2 13:11:04 UTC 2019

Did some micro-benchmarking (on a Xeon E5-2630) with various segment 
sizes between 4K and 512K, and 64K seems to offer a good trade-off. For 
a 1G array, the allocation time increases by ~1%, but in exchange the 
worst case TTSP drops from ~280ms to ~0.6ms.

Updated webrev using 64K:

http://cr.openjdk.java.net/~pliden/8227226/webrev.3

cheers,
Per

On 8/2/19 11:11 AM, Per Liden wrote:
> Hi Erik,
> 
> On 8/1/19 5:56 PM, Erik Osterlund wrote:
>> Hi Per,
>>
>> I like that this approach is unintrusive, does its thing at the right 
>> abstraction layer, and also handles medium sized arrays.
> 
> It even handles small arrays (i.e. arrays in small zpages) ;)
> 
>> Looks good.
> 
> Thanks! I'll test various segment sizes and see how that affects 
> performance and TTSP.
> 
> cheers,
> Per
> 
>>
>> Thanks,
>> /Erik
>>
>>> On 1 Aug 2019, at 16:14, Per Liden <per.liden at oracle.com> wrote:
>>>
>>> Here's an updated webrev that should be complete, i.e. fixes the 
>>> issues related to allocation sampling/reporting that I mentioned. 
>>> This patch makes MemAllocator::finish() virtual, so that we can do 
>>> our thing and install the correct klass pointer before the Allocation 
>>> destructor executes. This seems to be the least intrusive why of 
>>> doing this.
>>>
>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.2
>>>
>>> This passed function testing, but proper benchmarking remains to be 
>>> done.
>>>
>>> cheers,
>>> Per
>>>
>>>> On 7/31/19 7:19 PM, Per Liden wrote:
>>>> Hi,
>>>> I found some time to benchmark the "GC clears pages"-approach, and 
>>>> it's fairly clear that it's not paying off. So ditching that idea.
>>>> However, I'm still looking for something that would not just do 
>>>> segmented clearing of arrays in large zpages. Letting oop arrays 
>>>> temporarily be typed arrays while it's being cleared could be an 
>>>> option. I did a prototype for that, which looks like this:
>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.1
>>>> There's at least one issue here, the code doing allocation sampling 
>>>> will see that we allocated long arrays instead of oop arrays, so the 
>>>> reporting there will be skewed. That can be addressed if we go down 
>>>> this path. The code is otherwise fairly simple and contained. Feel 
>>>> free to spot any issues.
>>>> cheers,
>>>> Per
>>>>> On 7/26/19 2:27 PM, Per Liden wrote:
>>>>> Hi Ryan & Erik,
>>>>>
>>>>> I had a look at this and started exploring a slightly different 
>>>>> approach. Instead doing segmented clearing in the allocation path, 
>>>>> we can have the concurrent GC thread clear pages when they are 
>>>>> reclaimed and not do any clearing in the allocation path at all.
>>>>>
>>>>> That would look like this:
>>>>>
>>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-base
>>>>>
>>>>> (I've had to temporarily comment out three lines of assert/debug 
>>>>> code to make this work)
>>>>>
>>>>> The relocation set selection phase will now be tasked with some 
>>>>> potentially expensive clearing work, so we'll want to make that 
>>>>> part parallel also.
>>>>>
>>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.0-parallel
>>>>>
>>>>> Moving this work from Java threads onto the concurrent GC threads 
>>>>> means we will potentially prolong the RelocationSetSelection and 
>>>>> Relocation phases. That might be a trade-off worth doing. In 
>>>>> return, we get:
>>>>>
>>>>> * Faster array allocations, as there's now less work done in the 
>>>>> allocation path.
>>>>> * This benefits all arrays, not just those allocated in large pages.
>>>>> * No need to consider/tune a "chunk size".
>>>>> * I also tend think we'll end up with slightly less complex code, 
>>>>> that is a bit easier to reason about. Can be debated of course.
>>>>>
>>>>> This approach might also "survive" longer, because the YC scheme 
>>>>> we've been loosely thinking about currently requires newly 
>>>>> allocated pages to be cleared anyway. It's of course too early to 
>>>>> tell if that requirement will stand in the end, but it's possible 
>>>>> anyway.
>>>>>
>>>>> I'll need to do some more testing and benchmarking to make sure 
>>>>> there's no regression or bugs here. The commented out debug code 
>>>>> also needs to be addressed of course.
>>>>>
>>>>> Comments? Other ideas?
>>>>>
>>>>> cheers,
>>>>> Per
>>>>>
>>>>>> On 7/24/19 4:37 PM, Sciampacone, Ryan wrote:
>>>>>>
>>>>>> Somehow I lost the RFR off the front and started a new thread.
>>>>>> Now that we're both off vacation I'd like to revisit this.  Can 
>>>>>> you take a look?
>>>>>>
>>>>>> On 7/8/19, 10:40 AM, "hotspot-gc-dev on behalf of Sciampacone, 
>>>>>> Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on behalf of 
>>>>>> sci at amazon.com> wrote:
>>>>>>
>>>>>>       http://cr.openjdk.java.net/~phh/8227226/webrev.01/
>>>>>>       This shifts away from abusing the constructor do_zero magic 
>>>>>> in exchange for virtualizing mem_clear() and specializing for the 
>>>>>> Z version.  It does create a change in mem_clear in that it 
>>>>>> returns an updated version of mem.  It does create change outside 
>>>>>> of the Z code however it does feel cleaner.
>>>>>>       I didn't make a change to PinAllocating - looking at it, the 
>>>>>> safety of keeping it constructor / destructor based still seemed 
>>>>>> appropriate to me.  If the objection is to using the sequence 
>>>>>> numbers to pin (and instead using handles to update) - this to me 
>>>>>> seems less error prone.  I had originally discussed handles with 
>>>>>> Stefan but the proposal came down to this which looks much cleaner.
>>>>>>       On 7/8/19, 6:36 AM, "hotspot-gc-dev on behalf of 
>>>>>> Sciampacone, Ryan" <hotspot-gc-dev-bounces at openjdk.java.net on 
>>>>>> behalf of sci at amazon.com> wrote:
>>>>>>           1) Yes this was a conscious decision.  There was 
>>>>>> discussion on determining the optimal point for breakup but given 
>>>>>> the existing sizes this seemed sufficient.  This doesn't preclude 
>>>>>> the ability to go down that path if its deemed absolutely 
>>>>>> necessary.  The path for more complex decisions is now available.
>>>>>>           2) Agree
>>>>>>           3) I'm not clear here.  Do you mean effectively going 
>>>>>> direct to ZHeap and bypassing the single function PinAllocating?  
>>>>>> Agree. Otherwise I'll ask you to be a bit clearer.
>>>>>>           4) Agree
>>>>>>           5) I initially had the exact same reaction but I played 
>>>>>> around with a few other versions (including breaking up 
>>>>>> initialization points between header and body to get the desired 
>>>>>> results) and this ended up looking correct.  I'll try mixing in 
>>>>>> the mem clearer function again (fresh start) to see if it looks 
>>>>>> any better.
>>>>>>           On 7/8/19, 5:49 AM, "Per Liden" <per.liden at oracle.com> 
>>>>>> wrote:
>>>>>>               Hi Ryan,
>>>>>>               A few general comments:
>>>>>>               1) It looks like this still only work for large pages?
>>>>>>               2) The log_info stuff should be removed.
>>>>>>               3) I'm not a huge fan of single-use utilities like 
>>>>>> PinAllocating, at
>>>>>>               least not when, IMO, the alternative is more 
>>>>>> straight forward and less code.
>>>>>>               4) Please make locals const when possible.
>>>>>>               5) Duplicating _do_zero looks odd. Injecting a "mem 
>>>>>> clearer", similar to
>>>>>>               what Stefans original patch did, seems worth exploring.
>>>>>>               cheers,
>>>>>>               /Per
>>>>>>               (Btw, I'm on vacation so I might not be 
>>>>>> super-responsive to emails)
>>>>>>               On 2019-07-08 12:42, Erik Österlund wrote:
>>>>>>               > Hi Ryan,
>>>>>>               >
>>>>>>               > This looks good in general. Just some stylistic 
>>>>>> things...
>>>>>>               >
>>>>>>               > 1) In the ZGC project we like the letter 'Z' so 
>>>>>> much that we put it in
>>>>>>               > front of everything we possibly can, including all 
>>>>>> class names.
>>>>>>               > 2) We also explicitly state things are private 
>>>>>> even though it's
>>>>>>               > bleedingly obvious.
>>>>>>               >
>>>>>>               > So:
>>>>>>               >
>>>>>>               > 39 class PinAllocating {
>>>>>>               > 40 HeapWord* _mem;
>>>>>>               > 41 public: -> 39 class ZPinAllocating { 40 
>>>>>> private: 41 HeapWord* _mem;
>>>>>>               >    42
>>>>>>               >   41 public: I can be your sponsor and push this 
>>>>>> change for you. I don't
>>>>>>               > think there is a need for another webrev for my 
>>>>>> small stylistic remarks,
>>>>>>               > so I can just fix that before pushing this for 
>>>>>> you. On that note, I'll
>>>>>>               > add me and StefanK to the contributed-by section 
>>>>>> as we all worked out
>>>>>>               > the right solution to this problem 
>>>>>> collaboratively. I have run through
>>>>>>               > mach5 tier1-5, and found no issues with this 
>>>>>> patch. Thanks, /Erik
>>>>>>               >
>>>>>>               > On 2019-07-05 17:18, Sciampacone, Ryan wrote:
>>>>>>               >> http://cr.openjdk.java.net/~phh/8227226/webrev.00/
>>>>>>               >> https://bugs.openjdk.java.net/browse/JDK-8227226
>>>>>>               >>
>>>>>>               >> This patch introduces safe point checks into 
>>>>>> array clearing during
>>>>>>               >> allocation for ZGC.  The patch isolates the 
>>>>>> changes to ZGC as (in
>>>>>>               >> particular with the more modern collectors) the 
>>>>>> approach to
>>>>>>               >> incrementalizing or respecting safe point checks 
>>>>>> is going to be
>>>>>>               >> different.
>>>>>>               >>
>>>>>>               >> The approach is to keep the region holding the 
>>>>>> array in the allocating
>>>>>>               >> state (pin logic) while updating the color to the 
>>>>>> array after checks.
>>>>>>               >>
>>>>>>               >> Can I get a review?  Thanks.
>>>>>>               >>
>>>>>>               >> Ryan
>>>>>>               >
>>>>>>
>>