[15] RFR 8242216: ObjectSampler::weak_oops_do() should not trigger barrier

Wed Apr 8 08:17:48 UTC 2020

On 4/7/20 4:37 PM, Roman Kennke wrote:
> Hi Erik,
> 
> I'm trying to understand it. I am not deeply familiar with ZGC
> relocation, I am assuming (probably wrongly) that it's similar to
> Shenandoah evacuation.
> 
> Thread 1 tries to relocate an object in a barrier and fails.
> <switch>
> Thread 2 (Java or GC thread) can still relocate the object into its own
> GCLAB.
> <switch>
> Thread 1 pins the object in the original page and continues to
> read/write to it.
> 
> How's this coordinated?

When pinning an object, ZGC inserts the current object address into the 
forwarding table. The CAS into the forwarding table is the 
synchronization point (like any other forwarding/relocation).
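
To make the race in the question concrete, here is a minimal sketch of the
idea, not ZGC's actual code. The class name ForwardingSketch, the method
names, and the use of one table slot per object are all assumptions made
for illustration. A thread that managed to copy the object proposes the new
address; a thread that failed to allocate proposes the object's current
address (the pin). Both go through the same CAS, so exactly one proposal
wins and every thread then uses the winner's address.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative only: a toy forwarding table with one slot per object.
// Address 0 is assumed to mean "no forwarding entry yet".
final class ForwardingSketch {
    static final long EMPTY = 0L;

    private final AtomicLongArray table;

    ForwardingSketch(int entries) {
        table = new AtomicLongArray(entries);
    }

    // Try to publish 'toAddr' as the forwarding entry for the object in
    // slot 'index'. Returns the address that ended up in the table, which
    // is the caller's address only if its CAS won.
    long forward(int index, long toAddr) {
        if (table.compareAndSet(index, EMPTY, toAddr)) {
            return toAddr;
        }
        // Another thread installed an entry first; use that one.
        return table.get(index);
    }

    // 'newCopyAddr' is EMPTY when the allocation for the copy failed.
    // In that case the object is pinned by forwarding it to itself.
    long relocateOrPin(int index, long fromAddr, long newCopyAddr) {
        long candidate = (newCopyAddr != EMPTY) ? newCopyAddr : fromAddr;
        return forward(index, candidate);
    }
}
```

In this sketch, if Thread 2 copies the object and wins the CAS before
Thread 1 pins it, Thread 1's pin attempt returns Thread 2's address and
Thread 1 uses the copy. If Thread 1's pin wins, Thread 2 sees the original
address come back and has to discard its copy.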

> 
> In Shenandoah, we have a protocol in place that guards all evacuations,
> and if any one thread runs OOM, then it triggers that protocol which
> ensures that all threads have left the evac-scope before declaring the
> failed object the canonical one to use (by simply resolving the
> forwarding pointer, which is safe now). It's basically a readers-writer
> locking scheme.

In ZGC, we don't need any special synchronization for this, as it uses 
the same mechanism as a normal relocation. The only thing that is 
different is that we can't free a page if it has pinned objects.
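
As a rough illustration of that one difference, the sketch below keeps a
per-page flag that is set whenever an object on the page is pinned, and
checks it before the page is released. The class PinnedPageSketch and its
method names are hypothetical and are not taken from the ZGC sources.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a from-space page that remembers whether any of its
// objects was pinned (forwarded in place) during relocation.
final class PinnedPageSketch {
    private final AtomicBoolean pinned = new AtomicBoolean(false);

    // Called by whichever thread pins an object on this page.
    void markPinned() {
        pinned.set(true);
    }

    // A page with pinned objects still holds live data at the original
    // addresses, so it must be kept until a later GC cycle.
    boolean canFreeAfterRelocation() {
        return !pinned.get();
    }
}
```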

cheers,
Per

> 
> Roman
> 
>> When ZGC cannot satisfy an allocation needed to relocate an object in a
>> barrier, it pins the object to its from-space page so that it cannot
>> move. The object then has to sit tight until the next GC cycle before
>> it can be relocated.
>>
>> /Erik
>>
>> On 2020-04-07 15:26, Roman Kennke wrote:
>>> Hi Erik,
>>>
>>> I am wondering, how does ZGC deal with the situation when GC runs out of
>>> memory and cannot fulfil the relocation? How does it coordinate Java
>>> threads and GC threads (and possibly other threads) to get out of this
>>> without possibly causing heap corruption?
>>>
>>> Roman
>>>
>>>
>>>> Hi Zhengyu,
>>>>
>>>> This change breaks ZGC. The raw oop may not have been relocated. It was
>>>> not by accident that I used an access load instead of a raw load,
>>>> when I built the leak profiler support.
>>>> Since this kind of issue keeps on popping up, where you can't deal with
>>>> access barriers because of some Shenandoah OOM handler,
>>>> perhaps your barriers need to be fixed to deal with these issues
>>>> instead. I predict it is not the last time we have
>>>> to restructure the shared code because of Shenandoah's OOM handler.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2020-04-06 20:22, Zhengyu Gu wrote:
>>>>> Hi,
>>>>>
>>>>> This is a similar problem to JDK-8237396.
>>>>>
>>>>> Shenandoah does not expect barriers on its GC paths. Otherwise, it
>>>>> causes Shenandoah's OOM handler to fail.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8242216
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/JDK-8242216/webrev.00/index.html
>>>>>
>>>>> Test:
>>>>>     tier1 (fastdebug and release) on Linux x86_64
>>>>>     Submit tests.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
>>
>