RFR: 8227226: Segmented array clearing for ZGC

Per Liden per.liden at oracle.com
Thu Aug 1 10:19:22 UTC 2019


Hi Thomas,

On 8/1/19 11:43 AM, Thomas Schatzl wrote:
> On 01.08.19 01:28, Per Liden wrote:
>> Hi Thomas,
>>
>> On 7/31/19 7:59 PM, Thomas Schatzl wrote:
>>> Hi,
>>>
>>> On 31.07.19 10:19, Per Liden wrote:
>>>> Hi,
>>>>
>>>> I found some time to benchmark the "GC clears pages"-approach, and 
>>>> it's fairly clear that it's not paying off. So ditching that idea.
>>>>
>>>> However, I'm still looking for something that would do segmented 
>>>> clearing not just for arrays in large zpages. Letting oop arrays 
>>>> temporarily be typed arrays while they're being cleared could be an 
>>>> option. I did a prototype for that, which looks like this:
>>>>
>>>> http://cr.openjdk.java.net/~pliden/8227226/webrev.1
>>>>
>>>> There's at least one issue here: the code doing allocation sampling 
>>>> will see that we allocated long arrays instead of oop arrays, so the 
>>>> reporting there will be skewed. That can be addressed if we go down 
>>>> this path. The code is otherwise fairly simple and contained. Feel 
>>>> free to spot any issues.
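
In rough terms, the idea is: publish the array with a primitive-array 
klass so concurrent scanners see no oops in it, zero the payload in 
segments with safepoint checks in between, and only then install the 
real oop-array klass. A minimal, self-contained sketch (illustrative 
names, not the code in the webrev):

#include <atomic>
#include <cstddef>
#include <cstring>

struct Klass {};                  // stand-in for a real Klass
static Klass long_array_klass;    // primitive-array klass: nothing to scan
static Klass oop_array_klass;     // the klass the array should end up with

struct ArrayHeader {
  std::atomic<Klass*> klass;      // concurrent readers load this with acquire
  size_t length;
};

// Placeholder for a safepoint/yield check between segments.
static void check_for_safepoint() { /* poll, possibly block */ }

void publish_and_clear(ArrayHeader* array, char* payload, size_t bytes) {
  const size_t segment_size = 4 * 1024;   // 4K segments, as in the prototype

  // Publish as a primitive array first, so concurrent oop iteration
  // sees an array without oop fields while it is being cleared.
  array->klass.store(&long_array_klass, std::memory_order_release);

  for (size_t cleared = 0; cleared < bytes; ) {
    size_t chunk = bytes - cleared;
    if (chunk > segment_size) chunk = segment_size;
    std::memset(payload + cleared, 0, chunk);
    cleared += chunk;
    check_for_safepoint();                // keep time-to-safepoint bounded
  }

  // Only after the payload is fully zeroed, install the real klass.
  array->klass.store(&oop_array_klass, std::memory_order_release);
}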
>>>
>>> That looks like a really neat way of doing this.
>>>
>>> Looking over this there does not seem to be any real dependency on 
>>> ZGC code, so if you went this way, would it be possible to provide 
>>> this solution for all collectors?
>>
>> This is potentially dangerous for any GC doing concurrent 
>> oop_iterate(), as in that case the klass pointer must only be read 
>> once, with acquire ordering.
>>
>> An example in G1 where this would break is 
>> HeapRegion::do_oops_on_memregion_in_humongous(), and I'm thinking 
>> there are more cases. 
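
In other words, an iteration that loads the klass more than once can 
see the array change type under it. A simplified sketch of the hazard 
(illustrative types, not G1's actual code):

#include <atomic>

struct Klass { bool is_obj_array; };
struct ArrayHeader { std::atomic<Klass*> klass; };

// Unsafe under the proposed scheme: the klass is loaded twice, and the
// second load may observe a different klass (long array vs. oop array)
// than the first, so the iteration decisions become inconsistent.
void iterate_unsafe(ArrayHeader* a) {
  if (a->klass.load(std::memory_order_acquire)->is_obj_array) {
    Klass* k = a->klass.load(std::memory_order_acquire);  // may differ!
    (void)k;  // ... iterate using a possibly different klass ...
  }
}

// Safe pattern: load the klass exactly once, with acquire ordering, and
// base every decision about the object on that single value.
void iterate_safe(ArrayHeader* a) {
  Klass* k = a->klass.load(std::memory_order_acquire);
  if (k->is_obj_array) {
    // ... iterate oops using only 'k' ...
  }
}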
> 
> Point taken, you are completely right, I was not thinking it through.
> 
> However, for humongous objects it might be sufficient to just zero 
> manually, with basically the same safepoint-polling loop, while the 
> klass is still NULL (and make sure it is not done again later).
> 
> Of course, also making sure that these seemingly empty regions are not 
> reclaimed during the safepoint somehow in a different way. :)

Yes, something like that could probably be done, and it's not completely 
different from Stefan's original patch for this, where he pinned the page 
(stopping it from being collected) while it was being cleared.

However, for ZGC, I'd really like to solve this problem for all arrays, 
not just those allocated in large zpages, which is why I've been keen on 
exploring some other options.

> 
>  > For example, when a half zeroed type array in young is
>  > promoted to old, and then we switch the klass pointer.
> 
> In G1 we are probably not so much worried about "large" objects in young 
> gen - while a 16M max object size takes some time to clear, only handling 
> the humongous objects would already help a lot, I believe.
> 
> Actually another approach could be the GC completing the zeroing in 
> parallel for young gen objects - at that time it does have all memory 
> bandwidth to itself. That would at least improve the situation unless 
> many threads do that at the same time (still, these objects may be 16M 
> in size at most).
> 
> Or just guaranteeing that such objects stay in survivor "zeroing" 
> regions during a gc (in case of evac failure, do the work in the pause). 
> Another option would be delaying refinement for cards in these regions 
> until zeroing is completed, if such objects remain after the gc (which 
> may not be enough due to memory visibility issues, but I just like that 
> idea right now :) ).
> 
> It is unclear if such a large effort makes sense though, and probably 
> there are better options with a bit more thought :).
> 
>> I wouldn't be surprised if CMS has similar problems, but I haven't checked.
> 
> At this time I would not spend time on any new feature for CMS that is 
> not absolutely necessary.
> 
>> However, this would probably work fine for Serial and Parallel. On the 
>> other hand, depending on the performance impact, it's not completely 
>> obvious that you'd want it there.
>>
>> We could perhaps add this code to the shared ObjArrayAllocator, and 
>> introduce a CollectedHeap::supports_segmented_array_clearing() so that 
>> GCs can easily opt-in when they are ready to do so.
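
For reference, such an opt-in could take roughly this shape (a 
hypothetical sketch; the query does not exist today, the name is just 
the one suggested above):

// Hypothetical sketch only, not an existing HotSpot API.
class CollectedHeap {
public:
  virtual ~CollectedHeap() {}
  // A collector overrides this once its concurrent oop iteration is
  // known to tolerate the temporary primitive-array klass.
  virtual bool supports_segmented_array_clearing() const { return false; }
};

class ZCollectedHeap : public CollectedHeap {
public:
  // ZGC opts in; the shared ObjArrayAllocator would check this and fall
  // back to single-pass clearing for collectors that return false.
  virtual bool supports_segmented_array_clearing() const { return true; }
};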
> 
> Not sure. It is probably worth looking into how this would work in the 
> other collectors in a different CR; I would keep it ZGC-local for now 
> after all.

I agree. I'd like to keep this ZGC-specific for now. A future RFE could 
look into bringing this feature (perhaps solved in a different way) to 
other collectors, if deemed important.

cheers,
Per

> 
>>>
>>> For other collectors, slightly larger segment sizes might be 
>>> sufficient too, to slightly favor performance.
>>>
>>> Did you measure the impact on zeroing throughput of this?
>>
>> I haven't done any performance measurements of this yet. The current 
>> 4K segment size was just an educated guess, but it might not be the 
>> optimal number.
>>
> 
> Okay, thanks.
> 
> Thomas.


