RFR: Fast synchronizer root scanning

Roman Kennke rkennke at redhat.com
Wed May 10 20:15:59 UTC 2017


When scanning synchronizer roots, we scan everything in gBlockList,
which, as far as I can tell, holds all active monitor blocks of all
threads, plus any free blocks that have piled up. This appears to be
quite inefficient.

I have implemented an alternative way to scan synchronizer roots: each
Thread maintains its own thread-local list of in-use monitors and a
free-list. The global in-use list is only used for 'moribund' threads. I
think it is sufficient to scan each Thread's local in-use list plus the
global list. I checked with SPECjvm and jcstress and found no issues.
Performance-wise, it looks *much* better:

baseline:
[14,393s][info][gc,stats]     S: Thread Roots         =     0,34 s (a =    37748 us) (n =     9) (lvls, us =    36523,    36523,    36914,    37305,    42215)
[14,393s][info][gc,stats]     S: Synchronizer Roots   =     0,14 s (a =    15115 us) (n =     9) (lvls, us =     9746,    10938,    14258,    14648,    25847)
[14,393s][info][gc,stats]     UR: Thread Roots        =     0,22 s (a =    24967 us) (n =     9) (lvls, us =    12305,    24219,    25977,    27148,    27758)
[14,393s][info][gc,stats]     UR: Synchronizer Roots  =     0,11 s (a =    11906 us) (n =     9) (lvls, us =     8340,     9082,    12109,    12695,    13787)

patched:
[14,293s][info][gc,stats]     S: Thread Roots         =     0,36 s (a =    40365 us) (n =     9) (lvls, us =    32031,    32031,    34570,    37109,    67224)
[14,293s][info][gc,stats]     S: Synchronizer Roots   =     0,00 s (a =        0 us) (n =     9) (lvls, us =        0,        0,        0,        0,        0)
[14,294s][info][gc,stats]     UR: Thread Roots        =     0,22 s (a =    24459 us) (n =     9) (lvls, us =    15820,    20508,    22070,    26172,    32573)
[14,294s][info][gc,stats]     UR: Synchronizer Roots  =     0,00 s (a =        0 us) (n =     9) (lvls, us =        0,        0,        0,        0,        0)


I.e., even though the work of scanning synchronizer roots is added to
thread-roots scanning, thread-roots scanning does not take significantly
longer, while sync-roots scanning drops to 0 (there's usually nothing
to do for the global in-use list).
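
To illustrate the scanning scheme described above, here is a minimal
standalone sketch. It is not the code in the webrev: Monitor,
ThreadState, push, retire_thread and scan_synchronizer_roots are
hypothetical names invented for this example, and handing a moribund
thread's monitors over to the global list at exit is my reading of the
description, not something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an object monitor (not HotSpot's ObjectMonitor).
struct Monitor {
    const void* object = nullptr;  // the root the GC has to visit
    Monitor* next = nullptr;       // link within an in-use list
};

// Hypothetical per-thread state: only the thread-local in-use list.
struct ThreadState {
    Monitor* in_use_list = nullptr;
};

// Push an unlinked monitor onto the head of an intrusive list.
inline void push(Monitor*& head, Monitor* m) {
    assert(m != nullptr && m->next == nullptr);
    m->next = head;
    head = m;
}

// Assumption: an exiting (moribund) thread hands its remaining in-use
// monitors to the global in-use list, so they stay reachable as roots.
inline void retire_thread(ThreadState& t, Monitor*& global_in_use) {
    while (t.in_use_list != nullptr) {
        Monitor* m = t.in_use_list;
        t.in_use_list = m->next;
        m->next = nullptr;
        push(global_in_use, m);
    }
}

// Visit the object of every monitor on one list; returns the number visited.
template <typename Visitor>
std::size_t scan_list(const Monitor* head, Visitor&& visit) {
    std::size_t visited = 0;
    for (const Monitor* m = head; m != nullptr; m = m->next) {
        if (m->object != nullptr) {
            visit(m->object);
            ++visited;
        }
    }
    return visited;
}

// Scan each thread's local in-use list, then the global in-use list.
// Free lists are never walked.
template <typename Visitor>
std::size_t scan_synchronizer_roots(const std::vector<ThreadState*>& threads,
                                    const Monitor* global_in_use,
                                    Visitor&& visit) {
    std::size_t visited = 0;
    for (const ThreadState* t : threads) {
        visited += scan_list(t->in_use_list, visit);
    }
    visited += scan_list(global_in_use, visit);
    return visited;
}
```

In this sketch the per-thread part can run as part of each thread's own
root scan, and the global list is normally empty, which would match
sync roots dropping to 0 in the numbers above.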

I guarded this new behaviour with -XX:+ShenandoahFastSyncRoots, which
defaults to false for now. This will allow us to do more testing and
get feedback from hotspot-runtime-dev before enabling it by default.

http://cr.openjdk.java.net/~rkennke/fastsyncroots/webrev.00/

Testing: hotspot_gc_shenandoah, jcstress -m quick, jmh-specjvm

Ok?




More information about the shenandoah-dev mailing list