JDK 9 RFR of 6375303: Review use of caching in BigDecimal

Sat Mar 22 09:01:10 UTC 2014

On 03/22/2014 02:01 AM, Brian Burkhalter wrote:
> On Mar 20, 2014, at 12:49 AM, Aleksey Shipilev 
> <aleksey.shipilev at oracle.com <mailto:aleksey.shipilev at oracle.com>> wrote:
>
>> On 03/20/2014 11:06 AM, Peter Levart wrote:
>>> I was thinking about last night, for question: "Why is this
>>> double-checked non-volatile-then-volatile trick not any faster than pure
>>> volatile variant even on ARM platform where volatile read should have
>>> some penalty compared to normal read?", might be in the fact that
>>> Raspberry Pi is a single-core/single-thread "machine". Would anyone with
>>> JVM JIT compiler expertise care to share some insight? I suspect that on
>>> such platform, the compiler optimizes volatile accesses so that they are
>>> performed without otherwise necessary memory fences...
>>
>> Yes, at least C2 is known to not emit memory fences on uniprocessor
>> machines. You need to have a multicore ARM. If you are still interested,
>> contact me privately and I can arrange the access to my personal
>> quad-core Cortex-A9.
>
> This is all very interesting but I would like to have closure so to 
> speak on the patches that I posted.

Hi Brian,

> I think the questions still unresolved are:
>
> 1) Is a change of toString() necessary?

I can't speak of that since I don't have empirical data in what 
proportion of code it would be beneficial. I can imagine situations 
where returned strings are used as keys in HashMap and String's equals 
is optimized for same-instance comparison, but I don't know if such 
situations happen in practice...

> 2) If #1 is “yes” then which version of the patch? (I think the most 
> recent)

Aleksey Shipilev kindly let me access it's quad-code ARM "Big Iron" and 
before the connection to it broke yesterday, I managed to get the 
results of a single-threaded run of the microbenchmark. While 
differences were not spectacular, they were measurable (sorry I haven't 
saved the results). The variant with volatile stringCache field and CAS 
was a little slower than the double-checked 
nonvolatile-then-volatile-read-and-CAS trick.

It's also important that the fast-path method does not exceed the 
maximum bytecode size for inlining (35 bytes by default, I think) so the 
variant with two methods toString/toStringSlow is necessary to avoid 
performance regression.

> 3) Are the other changes OK which are unrelated to toString()?

Looks good. Just a nit. In the following method:

3726     private static void matchScale(BigDecimal[] val) {
3727         if (val[0].scale == val[1].scale) {
3728             // return
3729         } else if (val[0].scale < val[1].scale) {
3730             val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
3731         } else if (val[1].scale < val[0].scale) {
3732             val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
3733         }
3734     }

One of 3 ifs is superfluous. Either the 1st one:

     private static void matchScale(BigDecimal[] val) {
         */* if (val[0].scale == val[1].scale) {
             // return
         } else */*  if (val[0].scale < val[1].scale) {
             val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
         } else if (val[1].scale < val[0].scale) {
             val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
         }
     }

...or the last one:

     private static void matchScale(BigDecimal[] val) {
         if (val[0].scale == val[1].scale) {
             // return
         } else if (val[0].scale < val[1].scale) {
             val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
         } else*/* if (val[1].scale < val[0].scale) */*  {
             val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
         }
     }

Regards, Peter

>
> Thanks,
>
> Brian