RFR (M): Optimize object/array marking with bit-stealing task encoding
Aleksey Shipilev
shade at redhat.com
Mon Jan 16 15:46:20 UTC 2017
Hi,
Our mark stack contains ObjArrayFromToTask instances, which are <oop, from, to>
tuples. For arrays, from/to describe the chunk to process. For objects, from is
always -1, indicating that no chunk is expected.
Since the HS taskqueue uses copy constructors to poll/push tasks from/to the
queue, we always copy the from/to fields, and the queue footprint always includes
them. This is excessive for the prevailing case of regular oop marking. This
patch attempts to improve the regular-oop case without regressing parallel array
processing:
http://cr.openjdk.java.net/~shade/shenandoah/mark-objtask-regular/webrev.02/
This patch improves concurrent mark times significantly for regular oops:
retain.Tree -p size=50000000:
Baseline: Concurrent Marking = 99.17 s (a = 826446 us) (n = 120)
(lvls, us = 806641, 826172, 839844, 841797, 887344)
Patched: Concurrent Marking = 93.77 s (a = 774975 us) (n = 121)
(lvls, us = 753906, 771484, 785156, 787109, 837818)
...and also improves the per-cycle average ever so slightly for object arrays:
retain.RefArray -p size=2000000000:
Baseline: Concurrent Marking = 157.29 s (a = 741921 us) (n = 212)
(lvls, us = 720703, 740234, 753906, 755859, 822552)
Patched: Concurrent Marking = 158.64 s (a = 734448 us) (n = 216)
(lvls, us = 720703, 734375, 744141, 746094, 764200)
Less targeted workloads also see improved concurrent mark times, e.g. Compiler.compiler:
Baseline: Concurrent Marking = 3.87 s (a = 168337 us) (n = 23)
(lvls, us = 93750, 103516, 154297, 232422, 439476)
Patched: Concurrent Marking = 2.53 s (a = 120386 us) (n = 21)
(lvls, us = 76953, 93164, 103516, 125000, 400385)
Testing: hotspot_gc_shenandoah, jcstress tests-all.
Thanks,
-Aleksey