JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Peter Levart
peter.levart at gmail.com
Mon Mar 24 09:21:55 UTC 2014
On 03/20/2014 08:49 AM, Aleksey Shipilev wrote:
> On 03/20/2014 11:06 AM, Peter Levart wrote:
>> I was thinking about this last night. The answer to the question "Why is
>> this double-checked non-volatile-then-volatile trick not any faster than
>> the pure volatile variant, even on the ARM platform, where a volatile read
>> should have some penalty compared to a normal read?" might lie in the fact
>> that the Raspberry Pi is a single-core/single-thread "machine". Would
>> anyone with JVM JIT compiler expertise care to share some insight? I
>> suspect that on such a platform, the compiler optimizes volatile accesses
>> so that they are performed without the otherwise necessary memory fences...
> Yes, at least C2 is known to not emit memory fences on uniprocessor
> machines. You need to have a multicore ARM. If you are still interested,
> contact me privately and I can arrange the access to my personal
> quad-core Cortex-A9.
>
> -Aleksey.
Hi,
Thanks to Aleksey for re-establishing access. Here are the results of
the microbenchmark run on his quad-core Cortex-A9:
JDK 8 options: -client
org.openjdk.jmh.Main parameters: ".*" -i 10 -r 5 -wi 5 -w 1 -f 1 [-t 1|max]
--- Baseline, 1-thread ---
Benchmark                           Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString  avgt       10   69292.305     299.516  ns/op
o.t.Bench6375303.testToString       avgt       10    *20.003*       0.433  ns/op
--- Baseline, 4-threads ---
Benchmark                           Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString  avgt       10  100390.024    2158.132  ns/op
o.t.Bench6375303.testToString       avgt       10    *20.151*       0.677  ns/op
--- double-checked nonvolatile-then-volatile-read+CAS, 1-thread ---
Benchmark                           Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString  avgt       10   69951.406     221.516  ns/op
o.t.Bench6375303.testToString       avgt       10    *19.681*       0.025  ns/op
--- double-checked nonvolatile-then-volatile-read+CAS, 4-threads ---
Benchmark                           Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString  avgt       10  104231.335    3842.095  ns/op
o.t.Bench6375303.testToString       avgt       10    *20.030*       0.595  ns/op
--- classic volatile read+CAS, 1-thread ---
Benchmark                           Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString  avgt       10   69753.542     180.110  ns/op
o.t.Bench6375303.testToString       avgt       10    *23.285*       0.267  ns/op
--- classic volatile read+CAS, 4-threads ---
Benchmark                           Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString  avgt       10   99664.256    1814.090  ns/op
o.t.Bench6375303.testToString       avgt       10    *23.491*       0.606  ns/op
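As can be seen, on this multicore ARM machine the double-checked
nonvolatile-then-volatile-read+CAS trick is about 15% faster than the
classic volatile-read+CAS variant for testToString (roughly 19.7 vs.
23.3 ns/op with 1 thread and 20.0 vs. 23.5 ns/op with 4 threads), and it
is on par with the baseline.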
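For readers who have not followed the whole thread, here is a minimal sketch of what a "double-checked nonvolatile-then-volatile-read+CAS" cache can look like. It is not the code from the webrev: the class name CachedToString, the long payload and the use of VarHandle (which needs JDK 9 or later) are all assumptions made for illustration. The plain first read is only safe here because the cached object is an immutable String.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not part of the JDK or of the patch under review.
final class CachedToString {
    private static final VarHandle CACHE;
    static {
        try {
            CACHE = MethodHandles.lookup()
                    .findVarHandle(CachedToString.class, "cache", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final long value;
    // Deliberately NOT volatile: volatile access goes through the VarHandle only.
    private String cache;

    CachedToString(long value) {
        this.value = value;
    }

    @Override
    public String toString() {
        String s = cache;                          // 1. plain read, the hot path
        if (s == null) {
            s = (String) CACHE.getVolatile(this);  // 2. volatile re-check
            if (s == null) {
                String computed = Long.toString(value);
                // 3. CAS so that all threads agree on a single cached instance
                if (CACHE.compareAndSet(this, (String) null, computed)) {
                    s = computed;
                } else {
                    s = (String) CACHE.getVolatile(this);
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        CachedToString c = new CachedToString(42L);
        System.out.println(c.toString());
        // The second call returns the very same cached String instance.
        System.out.println(c.toString() == c.toString());
    }
}
```

The "classic volatile read+CAS" variant would simply drop step 1 and always start with the volatile read, which is where the extra cost on a multicore ARM would come from.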
Regards, Peter