JDK 9 RFR of 6375303: Review use of caching in BigDecimal

Mon Mar 24 09:21:55 UTC 2014

On 03/20/2014 08:49 AM, Aleksey Shipilev wrote:
> On 03/20/2014 11:06 AM, Peter Levart wrote:
>> I was thinking about last night, for question: "Why is this
>> double-checked non-volatile-then-volatile trick not any faster than pure
>> volatile variant even on ARM platform where volatile read should have
>> some penalty compared to normal read?", might be in the fact that
>> Raspberry Pi is a single-core/single-thread "machine". Would anyone with
>> JVM JIT compiler expertise care to share some insight? I suspect that on
>> such platform, the compiler optimizes volatile accesses so that they are
>> performed without otherwise necessary memory fences...
> Yes, at least C2 is known to not emit memory fences on uniprocessor
> machines. You need to have a multicore ARM. If you are still interested,
> contact me privately and I can arrange the access to my personal
> quad-core Cortex-A9.
>
> -Aleksey.

Hi,

Thanks to Aleksey for re-establishing access, here are the results of
the microbenchmark run on his quad-core Cortex-A9:

JDK 8 options: -client, org.openjdk.jmh.Main parameters: ".*" -i 10 -r 5 -wi 5 -w 1 -f 1 [-t 1|max]

--- Baseline, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    69292.305      299.516    ns/op
o.t.Bench6375303.testToString          avgt        10       20.003        0.433    ns/op

--- Baseline, 4-threads ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10   100390.024     2158.132    ns/op
o.t.Bench6375303.testToString          avgt        10       20.151        0.677    ns/op

--- double-checked nonvolatile-then-volatile-read+CAS, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    69951.406      221.516    ns/op
o.t.Bench6375303.testToString          avgt        10       19.681        0.025    ns/op

--- double-checked nonvolatile-then-volatile-read+CAS, 4-threads ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10   104231.335     3842.095    ns/op
o.t.Bench6375303.testToString          avgt        10       20.030        0.595    ns/op

--- classic volatile read+CAS, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    69753.542      180.110    ns/op
o.t.Bench6375303.testToString          avgt        10       23.285        0.267    ns/op

--- classic volatile read+CAS, 4-threads ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    99664.256     1814.090    ns/op
o.t.Bench6375303.testToString          avgt        10       23.491        0.606    ns/op

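...as can be seen, in testToString the double-checked
nonvolatile-then-volatile-read+CAS trick takes about 15% less time per
call than classic volatile-read+CAS (19.7 vs. 23.3 ns/op with 1 thread,
20.0 vs. 23.5 ns/op with 4 threads) and is within measurement error of
the baseline.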
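
For readers who have not followed the whole thread, here is a minimal
sketch of what "double-checked nonvolatile-then-volatile-read+CAS" means.
It is not the BigDecimal patch itself: the class CachedValue and its
fields are hypothetical, and it uses VarHandle (available from JDK 9)
where code targeting JDK 8 would have to use other means for the
volatile read and the CAS. It also assumes the cached object is safe to
publish through a data race, as String is.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not taken from the BigDecimal sources.
final class CachedValue {
    private static final VarHandle CACHE;
    static {
        try {
            CACHE = MethodHandles.lookup()
                    .findVarHandle(CachedValue.class, "cache", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final long value;
    private String cache;   // deliberately NOT volatile

    CachedValue(long value) {
        this.value = value;
    }

    @Override
    public String toString() {
        String s = cache;                            // 1. plain read: the common fast path
        if (s == null) {
            s = (String) CACHE.getVolatile(this);    // 2. volatile re-read before doing work
            if (s == null) {
                String computed = Long.toString(value);
                // 3. CAS so that every thread ends up returning the same instance
                if (CACHE.compareAndSet(this, (String) null, computed)) {
                    s = computed;
                } else {
                    s = (String) CACHE.getVolatile(this);   // another thread won the race
                }
            }
        }
        return s;
    }
}
```

Once the cache is populated, every call returns after step 1 without a
volatile access, which is the path testToString measures. The "classic"
variant would instead declare the field volatile and pay for a volatile
read on every call.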

Regards, Peter