RFR (M): Thread-local buffers for liveness data

Roman Kennke rkennke at redhat.com
Thu Jan 5 10:44:33 UTC 2017


Good work, please push!

Roman

On Wednesday, 04.01.2017 at 23:23 +0100, Aleksey Shipilev wrote:
> Hi,
> 
> We know from mark-compact performance work that liveness computation
> takes a non-negligible part of marking time.
> 
> If you look into profiles for an application with a large dataset, you
> can clearly see the atomic "lock xadd" from
> SHRegion::increase_live_data among the hotspots. It is a hotspot for
> both plain latency and contention reasons, even on a moderately sized
> x86.
> 
> Let's upgrade the one-slot cache into the full-blown thread-local
> buffers for liveness data:
>   http://cr.openjdk.java.net/~shade/shenandoah/liveness-threadlocal/webrev.01/
> 
> Observations:
> 
>  a) One-slot cache gives ~20-40% cache hit rate on most workloads,
> which means that at least every second object still does the atomic
> xadd. My attempts at smarter N-slot/history caching were not fruitful:
> the long tail flaps happily all over the place.
> 
>  b) size_t and jint are overkill for the table. Each thread would
> potentially touch a ${regions}*${sizeof(element)}-sized local table. On
> my machine, 2K size_t adds up to 16KB, which is half of L1. With
> jushort, it is only 4KB. In reality, most threads would touch only a
> few elements, and fall back to the atomic add only on rare overflows.
> 
>  c) Switching live_data from bytes to HeapWords helps to expand the
> buffering capacity.
> 
>  d) With 8 threads, we take up 4*8 = +32KB of additional space. I
> would expect our region count to grow sub-linearly with thread count,
> and so for 128 threads, it would be +512KB for all threads.
> 
>  e) Performance-wise, SPECjvm2008 is not affected (LDS is way too low).
> 
>  f) Mark tests that retain large object graphs benefit a lot. With
> "aggressive" heuristics and a large tree with 10M nodes:
> 
> Baseline, conc mark times:
>   35.99 s (avg =   105.24 ms)  (num =   342)
>   35.90 s (avg =   108.47 ms)  (num =   331)
>   35.98 s (avg =   103.69 ms)  (num =   347)
>   36.08 s (avg =   104.89 ms)  (num =   344)
>   36.09 s (avg =   104.90 ms)  (num =   344)
> 
> Patched, conc mark times:
>   33.68 s (avg =    83.37 ms)  (num =   404)
>   33.69 s (avg =    84.64 ms)  (num =   398)
>   33.67 s (avg =    83.77 ms)  (num =   402)
>   33.71 s (avg =    82.01 ms)  (num =   411)
>   33.65 s (avg =    85.41 ms)  (num =   394)
> 
> (lower times => more frequent marks under "aggressive")
> 
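For readers who have not opened the webrev, the buffering idea in (a)-(c) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: Region, LivenessBuffer, add(), flush() and overflow_flushes() are hypothetical names, and std::uint16_t stands in for jushort. Each marking thread accumulates live words in a narrow per-region slot and touches the shared atomic counter only when the slot would overflow, plus once at the end of marking.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a heap region. The shared counter is what an
// unbuffered marker would bump with an atomic add for every object.
struct Region {
    std::atomic<std::size_t> live_words{0};
};

// Hypothetical per-thread buffer: one 16-bit slot per region.
class LivenessBuffer {
public:
    explicit LivenessBuffer(std::vector<Region>& regions)
        : regions_(regions), local_(regions.size(), 0) {}

    // Record `words` live heap words for region `idx`.
    void add(std::size_t idx, std::size_t words) {
        assert(idx < local_.size());
        const std::size_t max = std::numeric_limits<std::uint16_t>::max();
        const std::size_t sum = local_[idx] + words;
        if (sum <= max) {
            // Fast path: plain thread-local store, no atomics.
            local_[idx] = static_cast<std::uint16_t>(sum);
        } else {
            // Rare overflow: publish the buffered amount plus this object.
            regions_[idx].live_words.fetch_add(sum, std::memory_order_relaxed);
            local_[idx] = 0;
            ++flushes_;
        }
    }

    // Publish whatever is still buffered, e.g. at the end of marking.
    void flush() {
        for (std::size_t i = 0; i < local_.size(); ++i) {
            if (local_[i] != 0) {
                regions_[i].live_words.fetch_add(local_[i],
                                                 std::memory_order_relaxed);
                local_[i] = 0;
            }
        }
    }

    // Number of times add() had to take the atomic slow path.
    std::size_t overflow_flushes() const { return flushes_; }

private:
    std::vector<Region>& regions_;
    std::vector<std::uint16_t> local_;
    std::size_t flushes_ = 0;
};
```

In this sketch a thread would own one LivenessBuffer for the whole marking cycle and call flush() before the liveness data is read. Counting in words rather than bytes lets the same 16-bit slot absorb more objects before it overflows.
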
> Testing: hotspot_gc_shenandoah, SPECjvm2008, targeted benchmarks
> 
> Thanks,
> -Aleksey
> 
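The footprint figures quoted in (b)-(d) can be re-derived with a few compile-time checks. This is a worked-arithmetic sketch only: table_bytes, slot_capacity_bytes and the constants are hypothetical helpers, std::uint64_t and std::uint16_t stand in for a 64-bit size_t and jushort, an 8-byte HeapWord is assumed, and the region count is held at the quoted 2K for every thread count.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes one thread's local table may touch: one slot per region.
constexpr std::size_t table_bytes(std::size_t regions, std::size_t slot_size) {
    return regions * slot_size;
}

// Largest amount of live data one 16-bit slot can buffer, in bytes,
// when each count unit covers `unit_bytes` bytes.
constexpr std::size_t slot_capacity_bytes(std::size_t unit_bytes) {
    return 65535 * unit_bytes;
}

constexpr std::size_t KB = 1024;
constexpr std::size_t kRegions = 2048;  // the "2K" regions quoted above

// (b) 8-byte slots: 16KB per thread; 2-byte slots: 4KB per thread.
static_assert(table_bytes(kRegions, sizeof(std::uint64_t)) == 16 * KB, "16KB");
static_assert(table_bytes(kRegions, sizeof(std::uint16_t)) == 4 * KB, "4KB");

// (c) Counting 8-byte words instead of bytes gives 8x the buffering capacity.
static_assert(slot_capacity_bytes(1) == 65535, "just under 64KB in bytes");
static_assert(slot_capacity_bytes(8) == 524280, "just under 512KB in words");

// (d) 8 threads: +32KB; 128 threads at the same region count: +512KB.
static_assert(8 * table_bytes(kRegions, sizeof(std::uint16_t)) == 32 * KB, "32KB");
static_assert(128 * table_bytes(kRegions, sizeof(std::uint16_t)) == 512 * KB, "512KB");
```
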


More information about the shenandoah-dev mailing list