ObjectSynchronizer iterate only in-use monitors?

Thu May 11 01:53:45 UTC 2017

Hi Roman,

On 11/05/2017 6:41 AM, Roman Kennke wrote:
> Hello,
>
> I have a question related to ObjectSynchronizer. We (the Shenandoah GC
> devs) found that for some programs, scanning ObjectSynchronizer roots
> takes quite long. ObjectSynchronizer::oops_do() scans all the blocks in
> gBlockList. As far as I understand, this contains all the monitor blocks
> of all threads, both currently in-use and free blocks.
>
> If I understand it correctly, it would be sufficient to scan only in-use
> monitors. And since each thread has its own in-use list (at least with
> MonitorInUseLists), it should be ok to scan that during each thread's
> scan, plus one additional scan of the gOmInUseList.

That seems reasonable to me. It sounds like the scanning code was not 
updated to reflect the introduction of MonitorInUseLists.
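
A minimal sketch of the difference, using simplified stand-in types
rather than the actual HotSpot code. `Monitor`, `Thread`, `Synchronizer`
and the function names are hypothetical; only the idea of one list of
all blocks versus per-thread and global in-use lists comes from the
discussion above:

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT the HotSpot types.
struct Monitor {
    const void* object = nullptr;     // object guarded by this monitor, null if free
    Monitor* next_in_use = nullptr;   // link within an in-use list
};

struct Thread {
    Monitor* in_use_list = nullptr;   // monitors currently in use by this thread
};

struct Synchronizer {
    std::vector<std::vector<Monitor>> blocks;  // every monitor block, in use or free
    Monitor* global_in_use_list = nullptr;     // in-use monitors not owned by a thread list
};

// Scan every slot of every block; free monitors are touched and then skipped.
// Returns the number of monitors touched.
template <typename Visitor>
std::size_t scan_all_blocks(const Synchronizer& s, Visitor visit) {
    std::size_t touched = 0;
    for (const auto& block : s.blocks) {
        for (const auto& m : block) {
            ++touched;
            if (m.object != nullptr) visit(m.object);
        }
    }
    return touched;
}

// Walk a single in-use list and visit each monitor's object.
template <typename Visitor>
std::size_t scan_in_use_list(const Monitor* head, Visitor visit) {
    std::size_t touched = 0;
    for (const Monitor* m = head; m != nullptr; m = m->next_in_use) {
        ++touched;
        if (m->object != nullptr) visit(m->object);
    }
    return touched;
}

// Scan the global in-use list once, plus each thread's own in-use list.
template <typename Visitor>
std::size_t scan_in_use_only(const Synchronizer& s,
                             const std::vector<Thread>& threads,
                             Visitor visit) {
    std::size_t touched = scan_in_use_list(s.global_in_use_list, visit);
    for (const auto& t : threads) {
        touched += scan_in_use_list(t.in_use_list, visit);
    }
    return touched;
}
```

In this model both scans visit the same objects, but the second one
touches only monitors that are linked into an in-use list, so its cost
depends on the number of in-use monitors rather than on the total
number of monitors ever allocated.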

> I am writing here because I would like to get confirmation that what I'm
> doing is sane, or if there are any pitfalls that I'm not aware of. The
> webrev in question (against shenandoah/jdk9) is this:
>
> http://cr.openjdk.java.net/~rkennke/fastsyncroots/webrev.00/
>
> I tested it by running with SPECjvm2008 and jcstress and found no
> ill-effects.
>
> Performance-wise it makes a very significant difference (running
> gc-bench's roots.Sync test, which exaggerates synchronizer usage):
>
> baseline:
> [14,393s][info][gc,stats]     S: Thread Roots         =     0,34 s (a =    37748 us) (n =     9) (lvls, us =    36523,    36523,    36914,    37305,    42215)
> [14,393s][info][gc,stats]     S: Synchronizer Roots   =     0,14 s (a =    15115 us) (n =     9) (lvls, us =     9746,    10938,    14258,    14648,    25847)
> [14,393s][info][gc,stats]     UR: Thread Roots        =     0,22 s (a =    24967 us) (n =     9) (lvls, us =    12305,    24219,    25977,    27148,    27758)
> [14,393s][info][gc,stats]     UR: Synchronizer Roots  =     0,11 s (a =    11906 us) (n =     9) (lvls, us =     8340,     9082,    12109,    12695,    13787)
>
> patched:
> [14,293s][info][gc,stats]     S: Thread Roots         =     0,36 s (a =    40365 us) (n =     9) (lvls, us =    32031,    32031,    34570,    37109,    67224)
> [14,293s][info][gc,stats]     S: Synchronizer Roots   =     0,00 s (a =        0 us) (n =     9) (lvls, us =        0,        0,        0,        0,        0)
> [14,294s][info][gc,stats]     UR: Thread Roots        =     0,22 s (a =    24459 us) (n =     9) (lvls, us =    15820,    20508,    22070,    26172,    32573)
> [14,294s][info][gc,stats]     UR: Synchronizer Roots  =     0,00 s (a =        0 us) (n =     9) (lvls, us =        0,        0,        0,        0,        0)
>
> Notice how thread roots scanning goes up a little, but by far less than
> sync root scanning goes down.
>
> If you think what I'm doing is sane, this might even be useful for other
> GCs (although they're probably not as much bound by roots scanning as
> Shenandoah is).
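
To put the quoted figures side by side, here is a small sketch that
adds up the thread-roots and synchronizer-roots times for each phase
and compares baseline with patched. It assumes the `a =` values in the
log are per-pause averages in microseconds; the struct and function
names are made up for illustration:

```cpp
#include <cstdint>

// Values of "a = ... us" copied from the quoted gc,stats output.
struct PhaseAverages {
    std::int64_t thread_roots_us;
    std::int64_t sync_roots_us;
};

// Combined time spent on thread roots plus synchronizer roots.
constexpr std::int64_t combined_us(PhaseAverages p) {
    return p.thread_roots_us + p.sync_roots_us;
}

// Positive result means the patched run is faster overall.
constexpr std::int64_t net_saving_us(PhaseAverages before, PhaseAverages after) {
    return combined_us(before) - combined_us(after);
}

constexpr PhaseAverages s_baseline{37748, 15115};
constexpr PhaseAverages s_patched{40365, 0};
constexpr PhaseAverages ur_baseline{24967, 11906};
constexpr PhaseAverages ur_patched{24459, 0};

// "S" phase: thread roots rise by 2617 us, sync roots fall by 15115 us.
static_assert(net_saving_us(s_baseline, s_patched) == 12498, "S phase net saving");
// "UR" phase: thread roots fall by 508 us, sync roots fall by 11906 us.
static_assert(net_saving_us(ur_baseline, ur_patched) == 12414, "UR phase net saving");
```

On these averages the two root-scanning steps together take roughly
12.5 ms less in the "S" phase and 12.4 ms less in the "UR" phase.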

I would not want to see such a Shenandoah-specific patch if we can avoid
it. I think this may be of general benefit, but I am not a GC person, so
it is best to ask on the hotspot-gc-dev list as well (cc'd).

Thanks,
David
-----

> Thanks, Roman
>