RFR (M): Thread-local buffers for liveness data
Roman Kennke
rkennke at redhat.com
Thu Jan 5 10:44:33 UTC 2017
Good work, please push!
Roman
On Wednesday, 04.01.2017, at 23:23 +0100, Aleksey Shipilev wrote:
> Hi,
>
> We know from mark-compact performance work that liveness computation
> takes a non-negligible part of marking time.
>
> If you look at profiles for an application with a large dataset, you
> can clearly see the atomic "lock xadd" from SHRegion::increase_live_data
> among the hotspots. It is a hotspot both for plain latency and for
> contention reasons, even on a moderately sized x86.
>
> Let's upgrade the one-slot cache to full-blown thread-local buffers for
> liveness data:
> http://cr.openjdk.java.net/~shade/shenandoah/liveness-threadlocal/webrev.01/
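A minimal C++ sketch of the idea, not the code in the webrev: each marking thread keeps one 16-bit counter per region and only performs the shared atomic add when a counter would overflow or when the buffer is flushed at the end of marking. Region, LiveDataBuffer, live_words and flush() are hypothetical names; std::atomic::fetch_add stands in for the "lock xadd" in increase_live_data, and counts are assumed to be in HeapWords.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a heap region; live data is counted in HeapWords.
struct Region {
  std::atomic<std::size_t> live_words{0};
  void increase_live_data(std::size_t words) {
    // The shared, contended atomic add ("lock xadd" on x86).
    live_words.fetch_add(words, std::memory_order_relaxed);
  }
};

// Hypothetical per-thread buffer: one 16-bit (jushort-sized) slot per region.
class LiveDataBuffer {
 public:
  explicit LiveDataBuffer(std::vector<Region>& regions)
      : regions_(regions), counts_(regions.size(), 0) {}

  // Buffer `words` live words for region `idx` without touching shared state,
  // unless the 16-bit slot would overflow.
  void add(std::size_t idx, std::size_t words) {
    assert(idx < counts_.size());
    std::size_t sum = counts_[idx] + words;
    if (sum > std::numeric_limits<std::uint16_t>::max()) {
      // Rare overflow: publish this slot's total with one atomic add.
      regions_[idx].increase_live_data(sum);
      counts_[idx] = 0;
    } else {
      counts_[idx] = static_cast<std::uint16_t>(sum);
    }
  }

  // Publish all remaining buffered counts, e.g. when this thread finishes marking.
  void flush() {
    for (std::size_t i = 0; i < counts_.size(); i++) {
      if (counts_[i] != 0) {
        regions_[i].increase_live_data(counts_[i]);
        counts_[i] = 0;
      }
    }
  }

 private:
  std::vector<Region>& regions_;
  std::vector<std::uint16_t> counts_;
};
```

The trade-off is a small per-thread table in exchange for removing most contended atomic adds; in cache terms, a thread mostly touches only the slots of the regions it actually marks.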
>
> Observations:
>
> a) The one-slot cache gives a ~20-40% hit rate on most workloads, which
> means that at least every second object still does the atomic xadd. My
> attempts at smarter N-slot/history caching were not fruitful: the long
> tail flaps happily all over the place.
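For contrast, a hypothetical sketch of a one-slot cache: it buffers only the last region touched, so every switch to a different region costs an atomic add, which with the hit rates above happens for most objects. OneSlotLiveCache and its hit/miss counters are illustrative names, not Shenandoah code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical one-slot cache: remembers only the last region and its
// pending live-word count; switching regions forces an atomic flush.
class OneSlotLiveCache {
 public:
  static constexpr std::size_t kNone = std::numeric_limits<std::size_t>::max();

  // `live_words` points to one shared counter per region.
  explicit OneSlotLiveCache(std::atomic<std::size_t>* live_words)
      : live_words_(live_words) {
    assert(live_words != nullptr);
  }

  void add(std::size_t region, std::size_t words) {
    if (region == last_) {
      pending_ += words;  // hit: stays thread-local
      hits_++;
      return;
    }
    flush();              // miss: publish the previous region atomically
    last_ = region;
    pending_ = words;
    misses_++;
  }

  // Publish whatever is pending for the cached region.
  void flush() {
    if (last_ != kNone && pending_ != 0) {
      live_words_[last_].fetch_add(pending_, std::memory_order_relaxed);
    }
    pending_ = 0;
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  std::atomic<std::size_t>* live_words_;
  std::size_t last_ = kNone;
  std::size_t pending_ = 0;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
};
```

Alternating between just two regions already turns every access into a miss, which is why a bigger per-region table works better than smarter single-slot replacement.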
>
> b) size_t and jint are overkill for the table. Each thread would
> potentially touch a ${regions}*${sizeof(element)}-sized local table. On
> my machine, 2K size_t elements add up to 16KB, which is half of L1. With
> jushort, it is only 4KB. In reality, most threads would touch only a few
> elements, and fall back to the atomic add only on rare overflows.
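The table-size arithmetic above, written as compile-time checks. The 2048-region count mirrors the "2K" figure; table_bytes is a hypothetical helper, and the size_t check assumes a 64-bit target.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed region count, matching the "2K" figure above.
constexpr std::size_t kRegions = 2048;

// Hypothetical helper: bytes of one per-thread liveness table.
template <typename Elem>
constexpr std::size_t table_bytes(std::size_t regions) {
  return regions * sizeof(Elem);
}

// 64-bit size_t: 2048 * 8 bytes = 16KB, half of a 32KB L1d.
static_assert(sizeof(std::size_t) != 8 ||
                  table_bytes<std::size_t>(kRegions) == 16 * 1024,
              "size_t table should be 16KB on 64-bit");
// jint-sized (32-bit) slots: 2048 * 4 bytes = 8KB.
static_assert(table_bytes<std::int32_t>(kRegions) == 8 * 1024,
              "jint table should be 8KB");
// jushort-sized (16-bit) slots: 2048 * 2 bytes = 4KB.
static_assert(table_bytes<std::uint16_t>(kRegions) == 4 * 1024,
              "jushort table should be 4KB");
```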
>
> c) Switching live_data from bytes to HeapWords helps to expand the
> buffering capacity: a jushort slot then holds up to 65535 words instead
> of 65535 bytes.
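A compile-time check of why counting HeapWords helps, assuming a 64-bit VM where a HeapWord is 8 bytes: a 16-bit slot that counts words can buffer eight times more live data before it overflows and has to fall back to the atomic add. The constant names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption: 64-bit VM, one HeapWord == 8 bytes.
constexpr std::size_t kHeapWordBytes = 8;
// Largest value a jushort-sized slot can hold.
constexpr std::size_t kSlotMax = std::numeric_limits<std::uint16_t>::max();  // 65535

// Live data one slot can buffer before overflowing, in bytes.
constexpr std::size_t kCapacityCountingBytes = kSlotMax;                   // ~64KB
constexpr std::size_t kCapacityCountingWords = kSlotMax * kHeapWordBytes;  // ~512KB

static_assert(kCapacityCountingWords == 8 * kCapacityCountingBytes,
              "counting HeapWords multiplies per-slot capacity by the word size");
```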
>
> d) With 8 threads, we take up 4KB*8 = +32KB of additional space. I
> would expect our region count to grow sub-linearly with thread count, so
> for 128 threads, it would be +512KB for all threads.
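The footprint estimate as arithmetic, assuming the 4KB per-thread jushort table from (b), i.e. 2048 regions; total_table_bytes is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 2048 regions with 16-bit slots, i.e. 4KB per thread.
constexpr std::size_t kPerThreadBytes = 2048 * sizeof(std::uint16_t);

// Hypothetical helper: extra space for all thread-local tables combined.
constexpr std::size_t total_table_bytes(std::size_t threads) {
  return threads * kPerThreadBytes;
}

static_assert(kPerThreadBytes == 4 * 1024, "4KB per thread");
static_assert(total_table_bytes(8) == 32 * 1024, "8 threads: +32KB");
static_assert(total_table_bytes(128) == 512 * 1024,
              "128 threads: +512KB, if the region count stays the same");
```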
>
> e) Performance-wise, SPECjvm2008 is not affected (its live data set,
> LDS, is way too small).
>
> f) Mark tests that retain large object graphs benefit a lot. With
> "aggressive" heuristics and a large tree with 10M nodes:
>
> Baseline, conc mark times:
> 35.99 s (avg = 105.24 ms) (num = 342)
> 35.90 s (avg = 108.47 ms) (num = 331)
> 35.98 s (avg = 103.69 ms) (num = 347)
> 36.08 s (avg = 104.89 ms) (num = 344)
> 36.09 s (avg = 104.90 ms) (num = 344)
>
> Patched, conc mark times:
> 33.68 s (avg = 83.37 ms) (num = 404)
> 33.69 s (avg = 84.64 ms) (num = 398)
> 33.67 s (avg = 83.77 ms) (num = 402)
> 33.71 s (avg = 82.01 ms) (num = 411)
> 33.65 s (avg = 85.41 ms) (num = 394)
>
> (lower per-cycle times => more frequent marks under "aggressive", hence
> the higher num)
>
> Testing: hotspot_gc_shenandoah, SPECjvm2008, targeted benchmarks
>
> Thanks,
> -Aleksey
>
More information about the shenandoah-dev mailing list