ObjectSynchronizer iterate only in-use monitors?

Tue May 16 07:43:38 UTC 2017

Am 16.05.2017 um 08:52 schrieb David Holmes:
> Correction ...
>
> On 16/05/2017 2:49 PM, David Holmes wrote:
>> On 16/05/2017 12:52 AM, Roman Kennke wrote:
>>> Am 11.05.2017 um 09:44 schrieb Robbin Ehn:
>>>> Hi,
>>>>
>>>> We have actually been discussing this last few days:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> We are seeing the exact same issue as mentioned in the bug report too.
>>> Sometimes, time-to-safepoint is very high, in the area of ~200ms. The
>>> reason for this seems to be deflate_idle_monitors().
>>>
>>> In the current implementation, the VM thread iterates over all threads,
>>> and deflates each thread's idle monitors one after another.
>>>
>>> I would rather have each Java thread deflate its own idle monitors
>>> before arriving at a safepoint. This should be much faster, because a
>>> thread is more likely to have its own monitors in cache, and because
>>> deflation would then run in parallel. This requires some thought, e.g.
>>> what to do with non-runnable threads.
>>
>> This is risky and will need very careful analysis. The
>> type-stable-memory property of Monitors depends on very careful
>
> Monitors don't use TSM - ignore that part. :)
>
>> synchronization and the fact that deflation only happens at safepoints.
>
> This might not be as fragile as I was initially thinking, but a good
> design walkthrough of Monitor-lifecycle will be essential for
> understanding any subtleties of existing and proposed approaches.
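
The serial scheme from the quoted report (the VM thread walking every thread's monitors one after another) can be modeled roughly as below. This is a toy sketch: `Monitor`, `JavaThreadModel` and `deflate_idle_monitors_serial` are hypothetical names, not HotSpot's ObjectMonitor or ObjectSynchronizer code, and "idle" is simplified to "unowned and without waiters".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model; these types are hypothetical, not HotSpot's ObjectMonitor
// or JavaThread.
struct Monitor {
  bool owned = false;     // some thread currently holds the lock
  int waiters = 0;        // threads blocked on or waiting for the monitor
  bool deflated = false;  // handed back to the free list
  bool is_idle() const { return !owned && waiters == 0; }
};

struct JavaThreadModel {
  std::vector<Monitor> in_use;  // monitors inflated by this thread
};

// Serial scheme: one thread (the VM thread) walks every Java thread's
// in-use list, one list after another. Its cost grows with the total
// number of in-use monitors, and the pause cannot proceed until it is done.
std::size_t deflate_idle_monitors_serial(std::vector<JavaThreadModel>& threads,
                                         bool all_threads_at_safepoint) {
  // Deflation is only safe once no Java thread can touch a monitor.
  assert(all_threads_at_safepoint);
  std::size_t deflated = 0;
  for (JavaThreadModel& t : threads) {
    for (Monitor& m : t.in_use) {
      if (!m.deflated && m.is_idle()) {
        m.deflated = true;
        ++deflated;
      }
    }
  }
  return deflated;
}
```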

I realized that monitor deflation depends strictly on all Java threads
having arrived at the safepoint. If a thread starts deflating its monitors
while other threads are still on their way to the safepoint, those other
threads could still operate on the monitors that the first thread is
attempting to deflate. This is a no-go.
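
The problem is a check-then-act race. A minimal sketch that replays one such interleaving deterministically, using a hypothetical single-field `Monitor` (real monitors track owner, waiters and contention separately):

```cpp
#include <cassert>

// Hypothetical single-field monitor; real monitors track more state.
struct Monitor {
  int owner = 0;          // 0 = unowned, otherwise a thread id
  bool deflated = false;  // handed back to the free list, may be reused
};

// Replays one interleaving step by step. Thread 1 has reached the safepoint
// and deflates its idle monitors; thread 2 is still running Java code on its
// way to the safepoint and locks the same object.
// Returns true if thread 2 ends up owning a monitor that was deflated.
bool replay_early_deflation_race() {
  Monitor m;
  assert(m.owner == 0 && !m.deflated);  // starts out idle

  // Step 1 (thread 1): sees no owner and concludes the monitor is idle.
  bool looks_idle = (m.owner == 0);

  // Step 2 (thread 2, not yet at the safepoint): enters the monitor.
  m.owner = 2;

  // Step 3 (thread 1): acts on its now-stale decision and deflates.
  if (looks_idle) {
    m.deflated = true;
  }

  return m.owner == 2 && m.deflated;
}
```

Requiring all Java threads to be stopped removes step 2 from every possible interleaving, which is why deflation is tied to the safepoint.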

However, I devised a scheme that works well: instead of having the VM
thread deflate all idle monitors single-threaded after all threads have
arrived, I now let the GC worker threads deflate idle monitors right
before scanning/processing the synchronizer roots. See:

http://cr.openjdk.java.net/~rkennke/deflate-per-thread/webrev.01/
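
A rough sketch of the idea, assuming a toy model in which each GC worker claims whole threads from a shared counter, deflates that thread's idle monitors, and then treats the survivors as roots. All names are hypothetical and this is not the code in the webrev; like any deflation, it relies on all Java threads being stopped for the whole pause.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Toy model; all names are hypothetical.
struct Monitor {
  bool busy = false;      // owned, or has waiters/contenders
  bool deflated = false;  // handed back to the free list
  bool scanned = false;   // stands in for "object visited as a GC root"
};

struct ThreadMonitors {
  std::vector<Monitor> in_use;
};

// Runs inside the GC pause, when all Java threads are stopped. Each worker
// claims whole threads from a shared counter, so every per-thread list is
// touched by exactly one worker: first idle monitors are deflated, then
// the remaining ones are processed as roots. Returns the number deflated.
std::size_t deflate_then_scan_parallel(std::vector<ThreadMonitors>& threads,
                                       unsigned n_workers) {
  assert(n_workers > 0);
  std::atomic<std::size_t> next{0};
  std::atomic<std::size_t> deflated{0};

  auto worker = [&] {
    std::size_t local = 0;
    for (std::size_t i = next.fetch_add(1); i < threads.size();
         i = next.fetch_add(1)) {
      std::vector<Monitor>& list = threads[i].in_use;
      for (Monitor& m : list) {  // deflate idle monitors
        if (!m.busy) {
          m.deflated = true;
          ++local;
        }
      }
      for (Monitor& m : list) {  // scan what is left as roots
        if (!m.deflated) m.scanned = true;
      }
    }
    deflated.fetch_add(local);
  };

  std::vector<std::thread> pool;
  for (unsigned w = 0; w < n_workers; ++w) pool.emplace_back(worker);
  for (std::thread& t : pool) t.join();
  return deflated.load();
}
```

Claiming per thread rather than per monitor keeps the coordination to one atomic increment per Java thread; a real implementation may well partition the work differently.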

I can probably collapse monitor deflation and iteration into one pass too.
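
Collapsing the two steps could look like the following single pass over one thread's in-use list, sketched under the same toy assumptions (a hypothetical `Monitor` with a `busy` flag and an `int` standing in for the object): idle monitors are dropped from the list, and the rest are handed to the root-visiting callback.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy model; names are hypothetical.
struct Monitor {
  bool busy;   // owned, or has waiters/contenders
  int object;  // stands in for the object the monitor is attached to
};

// Single pass over one thread's in-use list: busy monitors are reported to
// the root visitor and compacted to the front; idle ones are dropped
// (deflated). Returns the number of monitors deflated.
std::size_t deflate_and_visit(std::vector<Monitor>& in_use,
                              const std::function<void(int)>& visit_root) {
  std::size_t kept = 0;
  for (std::size_t i = 0; i < in_use.size(); ++i) {
    if (in_use[i].busy) {
      visit_root(in_use[i].object);
      in_use[kept++] = in_use[i];
    }
  }
  const std::size_t deflated = in_use.size() - kept;
  in_use.resize(kept);
  return deflated;
}
```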

This approach should be safe, right? I tested it with jcstress, SPECjvm
and SPECjbb, with no ill effects so far.

What do you think?

Pause times also improve nicely in my experiments:

Before:

Initial Mark Pauses (G)     =    12.76 s (a =    85047 us)
Initial Mark Pauses (N)     =     0.96 s (a =     6393 us)

Patched:

Initial Mark Pauses (G)     =     4.49 s (a =    31814 us)
Initial Mark Pauses (N)     =     3.70 s (a =    26206 us)

The (G) timing includes time-to-safepoint (TTSP); the (N) timing is purely
GC work. Before, the pauses were very heavily dominated by TTSP, which was
mostly spent deflating idle monitors. That work now happens in the GC
phase (which is why the (N) average grows from 6.4 ms to 26.2 ms), and
since the GC workers do it in parallel, the average total pause drops to
less than half of what it was before (85.0 ms to 31.8 ms).
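
As a rough check on the figures above, writing G and N for the totals and a bar for the per-pause averages, and treating G - N as an approximation of the non-GC part of each pause (mostly TTSP, per the description, though it may include other safepoint overhead):

```latex
\begin{aligned}
\bar G_{\text{patched}} / \bar G_{\text{before}} &= 31814 / 85047 \approx 0.374 \\
G_{\text{patched}} / G_{\text{before}} &= 4.49\,\text{s} / 12.76\,\text{s} \approx 0.352 \\
(\bar G - \bar N)_{\text{before}} &= 85047 - 6393 = 78654\,\mu\text{s} \approx 0.92\,\bar G_{\text{before}} \\
(\bar G - \bar N)_{\text{patched}} &= 31814 - 26206 = 5608\,\mu\text{s} \approx 0.18\,\bar G_{\text{patched}} \\
\bar N_{\text{patched}} - \bar N_{\text{before}} &= 26206 - 6393 = 19813\,\mu\text{s}
\end{aligned}
```

So, if the extra GC work is mostly deflation, about 20 ms of added parallel work per pause replaces about 73 ms (78654 - 5608 = 73046 us) of non-GC time. The per-pause averages are the fairer comparison, since total / average gives slightly different pause counts for the two runs (about 150 vs. about 141).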

Roman