RFR: 8315585: Optimization for decimal to string [v7]

Sat Oct 21 11:59:40 UTC 2023

On Thu, 19 Oct 2023 11:58:26 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> @cl4es 
>> 
>>> Good, narrows it down to what's going on in `prepend(long, byte[], String)`. Might boil down to `System.arraycopy`. This method might not be optimized for tiny arrays on all platforms. Specializing for single-char case:
>>> 
>>> ```java
>>> diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
>>> index 9b19d7e2ac1..6eb70925dab 100644
>>> --- a/src/java.base/share/classes/java/lang/String.java
>>> +++ b/src/java.base/share/classes/java/lang/String.java
>>> @@ -4723,7 +4723,11 @@ static void repeatCopyRest(byte[] buffer, int offset, int limit, int copied) {
>>>       */
>>>      void getBytes(byte[] dst, int dstBegin, byte coder) {
>>>          if (coder() == coder) {
>>> -            System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
>>> +            if (value.length == 1) {
>>> +                dst[(dstBegin << coder)] = value[0];
>>> +            } else {
>>> +                System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
>>> +            }
>>>          } else {    // this.coder == LATIN && coder == UTF16
>>>              StringLatin1.inflate(value, 0, dst, dstBegin, value.length);
>>>          }
>>> ```
>>> 
>>> .. seem to help the JIT do the right thing consistently, too:
>>> 
>>> ```
>>> Benchmark                                 Mode  Cnt   Score   Error  Units
>>> BigDecimals.testSmallToEngineeringString  avgt   50  11,757 ± 0,480  ns/op
>>> ```
>> 
>> In addition to #16244, will you submit a PR for this?
>
>> In addition to #16244, will you submit a PR for this?
> 
> Once both #16244 and this has been integrated I want to revisit this. The effect of changing `getBytes(byte[], int, byte)` might have disappeared with #16244 since it better guarantees the JIT will constant fold the prefixes thoroughly. We've observed related issues with System.arraycopy, however, see https://bugs.openjdk.org/browse/JDK-8295496 - so I want to evaluate a few different options here, time allowing.

@cl4es I found that StringConcatFactory.makeConcatWithConstants will become slower when using recipe("\1.\1", long.class, long.class). If we treat the prefix of length 1 as char, it will become faster.

In this branch https://github.com/wenshao/jdk/tree/optim_decimal_to_string_x2 , if we remove the code that treats the prefix of length 1 as char in the preparer(String prefix, Class<?> cl) method, it will become slower.

package java.lang.invoke;

class StringConcatFactory {
    static MethodHandle prepender(String prefix, Class<?> cl) {
        if (prefix == null || prefix.isEmpty()) {
            return noPrefixPrepender(cl);
        } else {
            // remove this will slower
            if (prefix.length() == 1
                    && (cl == char.class || cl == int.class || cl == long.class)) {
                char ch = prefix.charAt(0);
                MethodHandle prepend = JLA.stringConcatHelper(
                        "prepend",
                        methodType(long.class, long.class, byte[].class, cl, char.class)).rebind();
                return MethodHandles.insertArguments(prepend, 3, ch);
            }

            return MethodHandles.insertArguments(
                    prepender(cl), 3, prefix);
        }
    }
}

Here are the performance numbers running on the MaxBook M1 Max:

-Benchmark                             Mode  Cnt   Score   Error  Units (String prefix)
-BigDecimals.smallScale3ToPlainString  avgt   15  28.898 ? 0.527  ns/op

+Benchmark                             Mode  Cnt   Score   Error  Units (char prefix)
+BigDecimals.smallScale3ToPlainString  avgt   15  23.182 ? 0.477  ns/op

Maybe you should submit another PR to improve StringConcatFactory.makeConcatWithConstants

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16006#issuecomment-1773768539