RFR(XL): 8203469: Faster safepoints
Robbin Ehn
robbin.ehn at oracle.com
Wed Jan 16 09:45:50 UTC 2019
Thanks Aleksey!
Great that you are seeing such an improvement, and thanks for the build test.
I added a new logging line in this patch, e.g.:
[93.734s][info][safepoint,stats] Deoptimize: Synchronization: 4899 Operation: 457865 - Total: 462764 Application: 8270801 (ns)
It might be useful (it is for me); let me know if it needs reformatting, is missing
information, or is simply useless.
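To make the format concrete, here is a minimal, standalone sketch (plain C++, not
HotSpot code) that pulls the numeric fields out of the example line above and checks
that Synchronization + Operation adds up to Total; the field names are simply taken
from that one example:

  // Standalone sketch (not HotSpot code): parse the example safepoint,stats
  // line and check that Synchronization + Operation == Total (all in ns).
  #include <cstdio>
  #include <cstring>

  int main() {
    const char* line =
        "[93.734s][info][safepoint,stats] Deoptimize: Synchronization: 4899 "
        "Operation: 457865 - Total: 462764 Application: 8270801 (ns)";

    long long sync = 0, op = 0, total = 0, app = 0;
    const char* p = std::strstr(line, "Synchronization:");
    if (p != nullptr &&
        std::sscanf(p, "Synchronization: %lld Operation: %lld - Total: %lld "
                       "Application: %lld", &sync, &op, &total, &app) == 4) {
      std::printf("sync=%lld op=%lld total=%lld app=%lld (ns)\n",
                  sync, op, total, app);
      std::printf("sync + op == total ? %s\n", (sync + op == total) ? "yes" : "no");
    }
    return 0;
  }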
We also don't measure the early prolog with:
  Universe::heap()->safepoint_synchronize_begin();

  // By getting the Threads_lock, we assure that no threads are about to start or
  // exit. It is released again in SafepointSynchronize::end().
  Threads_lock->lock();
I had a measurement there, but there was never any time spent there; something to
keep in mind at least.
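If anyone wants to re-check that, the pattern is trivial to put back; purely as an
illustration, here is a standalone sketch using std::chrono, with a std::mutex and an
empty callback standing in for Threads_lock and the GC hook (this is not the actual
HotSpot code):

  // Illustration only: time the safepoint prologue (GC hook + lock acquisition)
  // separately from the rest, the way my temporary measurement did.
  #include <chrono>
  #include <cstdio>
  #include <mutex>

  static std::mutex threads_lock;  // stand-in for Threads_lock
  static void safepoint_synchronize_begin() { /* stand-in for the GC hook */ }

  int main() {
    auto t0 = std::chrono::steady_clock::now();

    safepoint_synchronize_begin();
    threads_lock.lock();           // corresponds to Threads_lock->lock()

    auto t1 = std::chrono::steady_clock::now();
    long long prologue_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::printf("prologue: %lld ns\n", prologue_ns);

    threads_lock.unlock();
    return 0;
  }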
/Robbin
On 2019-01-15 21:34, Aleksey Shipilev wrote:
> On 1/15/19 12:33 PM, Robbin Ehn wrote:
>> On 2019-01-15 11:48, Andrew Haley wrote:
>>> On 1/15/19 10:39 AM, Robbin Ehn wrote:
>>>> - On a 16 strand machine, synchronization and un-synchronization/starting are at
>>>> least 3x faster (in a non-trivial test). Synchronization ~600 -> ~100 us and
>>>> starting ~400 -> ~100 us.
>>>
>>> Thanks. Could you share that benchmark? It'd be interesting to try it on
>>> other architectures.
>>>
>>
>> Those numbers are calculated from Linux perf data; I couldn't remember what I ran,
>> but apparently it was also SPECjvm2008 serial with 16 benchmark threads.
>> The reason for serial is that Aleksey tested a couple of previews with
>> Shenandoah and indicated that serial was a good test.
>> (Aleksey, if you have time please test again)
>
> Hey. I indeed remember testing an early version of the patch, discovering some Serial throughput
> hit, and Robbin fixed that back then. Unfortunately, SPECjvm does not run cleanly with the current
> jdk/jdk. We have a Serial-like workload driven by JMH. I re-ran it with the current jdk/jdk first to
> get a bearing on possible throughput loss (i7-7820X, 16 threads, -Xms4g -Xmx4g):
>
> ZGC, baseline: 19779 ± 79 ops/s
> ZGC, fast safepoints: 19858 ± 89 ops/s ; no loss!
> Shenandoah, baseline: 19929 ± 57 ops/s
> Shenandoah, fast safepoints: 19932 ± 49 ops/s ; no loss!
>
> Excellent, both collectors indicate no throughput loss.
>
> Now, there are two ways to measure the safepointing overhead. First, Shenandoah measures both "net"
> pause time (basically, the VM operation time) and "gross" pause time (which includes both entering
> and leaving the safepoint, as well as the VM operation time). With a longer Serial run to get more
> GC cycles going:
>
> # Shenandoah, baseline
> [info][gc,stats] Total Pauses (G) = 0.47 s (a = 1139 us) (n = 416)
> (lvls, us = 203, 381, 426, 2324, 4705)
> [info][gc,stats] Total Pauses (N) = 0.09 s (a = 216 us) (n = 416)
> (lvls, us = 27, 182, 203, 223, 1258)
>
> # Shenandoah, fast safepoints
> [info][gc,stats] Total Pauses (G) = 0.21 s (a = 500 us) (n = 414)
> (lvls, us = 191, 398, 418, 443, 8992)
> [info][gc,stats] Total Pauses (N) = 0.10 s (a = 229 us) (n = 414)
> (lvls, us = 26, 184, 217, 244, 2327)
>
> Ta-da! Gross pause times dropped to less than half, which means that before we spent ~3/4 ms in
> the safepoint infra, and now we only spend ~1/4 ms.
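(Side note from me: the "safepoint infra" part above is roughly the gross average minus
the net average; a trivial standalone sketch of that arithmetic, using the quoted
Shenandoah averages, just as a sanity check:)

  // Rough estimate: time spent entering + leaving the safepoint
  // ~= gross average pause - net (VM operation) average pause.
  #include <cstdio>

  int main() {
    const int gross_baseline_us = 1139, net_baseline_us = 216;  // Shenandoah, baseline
    const int gross_fast_us     = 500,  net_fast_us     = 229;  // Shenandoah, fast safepoints

    std::printf("baseline infra: ~%d us\n", gross_baseline_us - net_baseline_us);  // ~923 us
    std::printf("fast     infra: ~%d us\n", gross_fast_us - net_fast_us);          // ~271 us
    return 0;
  }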
>
> The second way is to parse the safepoint logs. JMH has -prof safepoints, which does it
> automatically (it also prints percentiles and event counts, omitted here for brevity):
>
> Benchmark Mode Cnt Score Error Units
>
> # ZGC, baseline
> Serial.test thrpt 10 19710.077 ± 96.073 ops/s
> Serial.test:·safepoints.pause thrpt 1260 1310.022 ms
> Serial.test:·safepoints.pause.avg thrpt 1.040 ms
> Serial.test:·safepoints.ttsp thrpt 1260 1053.717 ms
> Serial.test:·safepoints.ttsp.avg thrpt 0.836 ms
>
> # ZGC, fast safepoints
> Serial.test thrpt 10 19904.687 ± 39.679 ops/s
> Serial.test:·safepoints.pause thrpt 1275 291.520 ms
> Serial.test:·safepoints.pause.avg thrpt 0.229 ms
> Serial.test:·safepoints.ttsp thrpt 1275 92.065 ms
> Serial.test:·safepoints.ttsp.avg thrpt 0.072 ms ; 11.6x lower!
>
> # Shenandoah, baseline
> Serial.test thrpt 10 19874.508 ± 47.175 ops/s
> Serial.test:·safepoints.pause thrpt 389 425.723 ms
> Serial.test:·safepoints.pause.avg thrpt 1.094 ms
> Serial.test:·safepoints.ttsp thrpt 389 312.413 ms
> Serial.test:·safepoints.ttsp.avg thrpt 0.803 ms
>
> # Shenandoah, fast safepoints
> Serial.test thrpt 10 19918.094 ± 48.058 ops/s
> Serial.test:·safepoints.pause thrpt 387 159.440 ms
> Serial.test:·safepoints.pause.avg thrpt 0.412 ms
> Serial.test:·safepoints.ttsp thrpt 387 41.941 ms
> Serial.test:·safepoints.ttsp.avg thrpt 0.108 ms ; 7.4x lower!
>
> So, TTSP (time to safepoint) has improved majestically. The rest of the "gross" pause time
> measured by Shenandoah is quite probably the safepoint-leaving latency.
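(And the "11.6x" / "7.4x" annotations above are just the ratios of the ttsp.avg rows;
a trivial check:)

  // Ratio of baseline to patched ttsp.avg (ms), numbers taken from the
  // JMH -prof safepoints output quoted above.
  #include <cstdio>

  int main() {
    std::printf("ZGC:        %.1fx lower\n", 0.836 / 0.072);  // ~11.6x
    std::printf("Shenandoah: %.1fx lower\n", 0.803 / 0.108);  // ~7.4x
    return 0;
  }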
>
> Also did spot build checks: the patch builds in fastdebug on aarch64, s390x, and x86_32.
>
> -Aleksey
>