RFR: 8310929: Optimization for Integer.toString [v13]

Thu Aug 31 20:08:04 UTC 2023

On Tue, 18 Jul 2023 01:49:17 GMT, 温绍锦 <duke at openjdk.org> wrote:

>> Optimization for:
>> Integer.toString
>> Long.toString
>> StringBuilder#append(int)
>> 
>> # Benchmark Result
>> 
>> 
>> sh make/devkit/createJMHBundle.sh
>> bash configure --with-jmh=build/jmh/jars
>> make test TEST="micro:java.lang.Integers.toString*" 
>> make test TEST="micro:java.lang.Longs.toString*" 
>> make test TEST="micro:java.lang.StringBuilders.toStringCharWithInt8"
>> 
>> 
>> ## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
>> * cpu : intel xeon sapphire rapids (x64)
>> 
>> ``` diff
>> -Benchmark               (size)  Mode  Cnt  Score   Error  Units (baseline)
>> -Integers.toStringBig       500  avgt   15  6.825 ± 0.023  us/op
>> -Integers.toStringSmall     500  avgt   15  4.823 ± 0.023  us/op
>> -Integers.toStringTiny      500  avgt   15  3.878 ± 0.101  us/op
>> 
>> +Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
>> +Integers.toStringBig       500  avgt   15  6.002 ± 0.054  us/op (+13.71%)
>> +Integers.toStringSmall     500  avgt   15  4.025 ± 0.020  us/op (+19.82%)
>> +Integers.toStringTiny      500  avgt   15  3.874 ± 0.067  us/op (+0.10%)
>> 
>> -Benchmark            (size)  Mode  Cnt  Score   Error  Units (baseline)
>> -Longs.toStringBig       500  avgt   15  9.224 ± 0.021  us/op
>> -Longs.toStringSmall     500  avgt   15  4.621 ± 0.087  us/op
>> 
>> +Benchmark            (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
>> +Longs.toStringBig       500  avgt   15  7.483 ± 0.018  us/op (+23.26%)
>> +Longs.toStringSmall     500  avgt   15  4.020 ± 0.016  us/op (+14.95%)
>> 
>> -Benchmark                           Mode  Cnt     Score    Error  Units (baseline)
>> -StringBuilders.toStringCharWithInt8 avgt   15    89.327 ±  0.733  ns/op
>> 
>> +Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
>> +StringBuilders.toStringCharWithInt8  avgt   15  36.639 ± 0.422  ns/op (+143.80%)
>> 
>> 
>> 
>> ## 2. [aliyun_ecs_c8a.xlarge](https://help.aliyun.com/document_detail/25378.html#c8a)
>> * cpu : amd epc genoa (x64)
>> 
>> ``` diff
>> -Benchmark               (size)  Mode  Cnt  Score   Error  Units (baseline)
>> -Integers.toStringBig       500  avgt   15  6.753 ± 0.007  us/op
>> -Integers.toStringSmall     500  avgt   15  4.470 ± 0.005  us/op
>> -Integers.toStringTiny      500  avgt   15  2.764 ± 0.020  us/op
>> 
>> +Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
>> +Integers.toStringBig       500  avgt   15  5.036 ± 0.005  us/op (+...
>
> 温绍锦 has updated the pull request incrementally with one additional commit since the last revision:
> 
>   assert bounds check

While I share the wariness against lookup tables (and the microbenchmarks that lend them too much credence), I think the approach in this PR is mostly an incremental improvement on an existing design and should be evaluated as such. By packing two lookup tables into one we reduce the opportunity for cache misses. The new lookup table is no larger as the two existing ones, so we're not wasting memory or sacrificing density. Having it in a single table makes it less likely that the arrays will end up in different locations on the heap or in compiled code. I see mostly positives here.

The main drawback is the proliferation of `Unsafe` usage.

Switching out the lookup table-based algorithm for something clever without a lookup table is laudable, but comes with a new set of challenges and should be done as a follow up. Since these tables can be aggressively constant-folded into the code section by our JITs it might even turn out to be a wash. 

If @RogerRiggs is happy with asserts in lieu of explicit bounds checking then I move to approve this.

src/java.base/share/classes/java/lang/StringUTF16.java line 1585:

> 1583:                     buf,
> 1584:                     Unsafe.ARRAY_BYTE_BASE_OFFSET + (charPos << 1),
> 1585:                     PACKED_DIGITS_UTF16[r]);

What performance would you get if you used the same lookup table as the other implementations, but inflate the value to UTF-16 on the fly?

int packed = (int)Integer.PACKED_DIGITS[r];
int inflated = ((packed & 0xFF00) << 8) & (packed & 0xFF));
UNSAFE.putIntUnaligned(
                    buf,
                    Unsafe.ARRAY_BYTE_BASE_OFFSET + (charPos << 1),
                    inflated);

This would avoid juggling more lookup table data around than before, alleviating some of the concerns voiced in this PR comment thread.

-------------

PR Review: https://git.openjdk.org/jdk/pull/14699#pullrequestreview-1605608821
PR Review Comment: https://git.openjdk.org/jdk/pull/14699#discussion_r1312156127