ObjectSynchronizer iterate only in-use monitors?
David Holmes
david.holmes at oracle.com
Thu May 11 01:53:45 UTC 2017
Hi Roman,
On 11/05/2017 6:41 AM, Roman Kennke wrote:
> Hello,
>
> I have a question related to ObjectSynchronizer. We (the Shenandoah GC
> devs) found that for some programs, scanning ObjectSynchronizer roots
> takes quite long. ObjectSynchronizer::oops_do() scans all the blocks in
> gBlockList. As far as I understand, this contains all the monitor blocks
> of all threads, both currently in-use and free blocks.
>
> If I understand it correctly, it would be sufficient to scan only in-use
> monitors. And since each thread has its own in-use list (at least with
> MonitorInUseLists), it should be ok to scan that during each thread's
> scan, plus one additional scan of the gOmInUseList.
That seems reasonable to me. It sounds like the scanning code was not
updated to reflect the introduction of MonitorInUseLists.
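
To make the difference concrete, here is a minimal sketch of the two scanning strategies. It is a toy model, not HotSpot code: `Monitor`, `Thread`, `Synchronizer`, `scan_all_blocks`, `scan_in_use_list` and `run_demo` are hypothetical names chosen for illustration, and the block layout is assumed to be a plain array of monitors.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-ins only; these are NOT HotSpot's real declarations.
struct Monitor {
    const void* object = nullptr;    // reference a GC would visit; null when the monitor is free
    Monitor* next_in_use = nullptr;  // link within an in-use list
};

using OopClosure = std::function<void(const void*)>;

struct Thread {
    Monitor* in_use_list = nullptr;  // monitors currently in use by this thread
};

struct Synchronizer {
    std::vector<std::vector<Monitor>> blocks;  // every allocated block, free or in use
    Monitor* global_in_use_list = nullptr;     // in-use monitors not owned by any thread list

    // Baseline strategy: walk every monitor in every block.
    std::size_t scan_all_blocks(const OopClosure& f) const {
        std::size_t visited = 0;
        for (const auto& block : blocks) {
            for (const auto& m : block) {
                ++visited;
                if (m.object != nullptr) f(m.object);
            }
        }
        return visited;
    }
};

// Proposed strategy: walk only a linked list of in-use monitors.
std::size_t scan_in_use_list(const Monitor* head, const OopClosure& f) {
    std::size_t visited = 0;
    for (const Monitor* m = head; m != nullptr; m = m->next_in_use) {
        ++visited;
        if (m->object != nullptr) f(m->object);
    }
    return visited;
}

struct DemoResult {
    std::size_t baseline_visited, in_use_visited;  // monitors touched
    std::size_t baseline_oops, in_use_oops;        // references reported
};

DemoResult run_demo() {
    static int objects[3];  // dummy "Java objects"
    Synchronizer sync;
    sync.blocks.assign(2, std::vector<Monitor>(128));  // 256 monitors, mostly free

    // Two monitors in use by one thread.
    Thread t;
    Monitor& a = sync.blocks[0][5];
    Monitor& b = sync.blocks[1][7];
    a.object = &objects[0];
    b.object = &objects[1];
    a.next_in_use = &b;
    t.in_use_list = &a;

    // One monitor on the global in-use list.
    Monitor& c = sync.blocks[0][100];
    c.object = &objects[2];
    sync.global_in_use_list = &c;

    std::size_t baseline_oops = 0, in_use_oops = 0;
    std::size_t baseline_visited =
        sync.scan_all_blocks([&](const void*) { ++baseline_oops; });
    OopClosure count = [&](const void*) { ++in_use_oops; };
    std::size_t in_use_visited = scan_in_use_list(t.in_use_list, count) +
                                 scan_in_use_list(sync.global_in_use_list, count);
    return {baseline_visited, in_use_visited, baseline_oops, in_use_oops};
}
```

In this model both strategies report the same references, but the in-use walk touches only the monitors that actually hold one. The sketch ignores concurrency and list maintenance, which the real code would have to get right.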
> I am writing here because I would like to get confirmation that what I'm
> doing is sane, or if there are any pitfalls that I'm not aware of. The
> webrev in question (against shenandoah/jdk9) is this:
>
> http://cr.openjdk.java.net/~rkennke/fastsyncroots/webrev.00/
>
> I tested it by running with SPECjvm2008 and jcstress and found no
> ill-effects.
>
> Performance-wise it makes a very significant difference (running
> gc-bench's roots.Sync test, which exaggerates synchronizer usage):
>
> baseline:
> [14,393s][info][gc,stats] S: Thread Roots = 0,34 s (a = 37748 us) (n = 9) (lvls, us = 36523, 36523, 36914, 37305, 42215)
> [14,393s][info][gc,stats] S: Synchronizer Roots = 0,14 s (a = 15115 us) (n = 9) (lvls, us = 9746, 10938, 14258, 14648, 25847)
> [14,393s][info][gc,stats] UR: Thread Roots = 0,22 s (a = 24967 us) (n = 9) (lvls, us = 12305, 24219, 25977, 27148, 27758)
> [14,393s][info][gc,stats] UR: Synchronizer Roots = 0,11 s (a = 11906 us) (n = 9) (lvls, us = 8340, 9082, 12109, 12695, 13787)
>
> patched:
> [14,293s][info][gc,stats] S: Thread Roots = 0,36 s (a = 40365 us) (n = 9) (lvls, us = 32031, 32031, 34570, 37109, 67224)
> [14,293s][info][gc,stats] S: Synchronizer Roots = 0,00 s (a = 0 us) (n = 9) (lvls, us = 0, 0, 0, 0, 0)
> [14,294s][info][gc,stats] UR: Thread Roots = 0,22 s (a = 24459 us) (n = 9) (lvls, us = 15820, 20508, 22070, 26172, 32573)
> [14,294s][info][gc,stats] UR: Synchronizer Roots = 0,00 s (a = 0 us) (n = 9) (lvls, us = 0, 0, 0, 0, 0)
>
> Notice how thread roots scanning time goes up a little, but by far less
> than sync roots scanning time goes down.
>
> If you think what I'm doing is sane, this might even be useful for other
> GCs (although they're probably not as much bound by roots scanning as
> Shenandoah is).
I would not want to see such a Shenandoah-specific patch if we can avoid
it. I think this may be of general benefit, but I am not a GC person, so
it is best to ask on the hotspot-gc-dev list as well (cc'd).
Thanks,
David
-----
> Thanks, Roman
>