RFR: 8333893: Optimization for StringBuilder append boolean & null
Emanuel Peter
epeter at openjdk.org
Tue Jun 11 09:37:12 UTC 2024
On Tue, 11 Jun 2024 09:17:00 GMT, Shaojin Wen <duke at openjdk.org> wrote:
>> @wenshao
>>> @eme64 It seems that when the following code uses StringUTF16.putChar, C2's optimization does not perform as well as manually merging the stores.
>>
>> As I asked above, you will need to provide some evidence / generated assembly / perf data, and logs from `TraceMergeStores`. I currently do not have time to produce these myself, and I think they would be crucial to determine where the missing performance has gone. See my earlier comment:
>> https://github.com/openjdk/jdk/pull/19626#issuecomment-2158533469
>>
>> And please also try @cl4es's advice here:
>> https://github.com/openjdk/jdk/pull/19626#issuecomment-2159509806
>>
>> And sure, maybe you need some public API for setting multiple bytes at once, which the `MergeStores` optimization can optimize. I'm a C2 engineer, so I leave that up to the library folks ;)
>
> @eme64 The assembly information is below, can you take a look and see if it can help you diagnose the problem?
>
> * JavaCode
>
> class AbstractStringBuilder {
>     private AbstractStringBuilder appendNull() {
>         int count = this.count;
>         ensureCapacityInternal(count + 4);
>         byte[] val = this.value;
>         if (isLatin1()) {
>             val[count    ] = 'n';
>             val[count + 1] = 'u';
>             val[count + 2] = 'l';
>             val[count + 3] = 'l';
>         } else {
>             StringUTF16.putCharsAt(val, count, 'n', 'u', 'l', 'l');
>         }
>         this.count = count + 4;
>         return this;
>     }
> }
>
> class StringUTF16 {
>     public static void putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4) {
>         putChar(value, i    , c1);
>         putChar(value, i + 1, c2);
>         putChar(value, i + 2, c3);
>         putChar(value, i + 3, c4);
>     }
> }
>
>
> * Apple M1 StringBuilder.appendNull PrintAssembly
>
> /Users/wenshao/Work/git/jdk/build/macosx-aarch64-server-release/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,*StringBuilder.appendNull -XX:-TieredCompilation -XX:TieredStopAtLevel=4 -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=61041:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8 ....
>
> Compiled method (n/a) 96 1 n java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native)
> total in heap [0x0000000102efba08,0x0000000102efbb20] = 280
> relocation [0x0000000102efbae0,0x0000000102efbae8] = 8
> main code [0x0000000102efbb00,0x0000000102efbb20] = 32
>
> [Disassembly]
> --------------------------------------------------------------------------------
> [Constant Pool (empty)]
>
> --------------------------------------------------------------------------------
>
> [Verified Entry Point]
> # {method} {0x000000011c3e1c80} 'linkToStatic' '(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;' in 'java/lang/invoke/MethodHandle'
> # parm0: c_rarg1:c_rarg1
> = 'java/lang/Object'
> # parm1: c_rarg2:c_rarg2
> = 'java/lang/Object'
> # parm2: c_rarg3:c_rarg3
> = 'java/lang/Object'
> # parm3: c_rarg4:c_rarg4
> = 'java/lang/Object'
> # parm4: c_rarg5:c_rarg5
> = 'j...
@wenshao This is just an assembly dump. You need profiling data that tells you where the time is spent. I'm not going to do the analysis work for you, I'm sorry. I gave you some pointers on how to do that. If you have more questions about how to do it, feel free to ask. You also have not yet provided the `TraceMergeStores` log that I asked for.
Can you investigate WHY there is a performance difference? Which `loads`, `branches`, etc. are generated?
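
For reference, the store pattern under discussion can be reduced to a small standalone sketch. The class and method names below (`MergeStoresSketch`, `putNullBytewise`, `putNullMerged`) are hypothetical and not part of the JDK. The sketch only checks that four adjacent constant byte stores and one manually merged little-endian int store leave the same bytes in the array. It says nothing about which form C2 compiles better; that still needs the profiling data and the `TraceMergeStores` log requested above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class MergeStoresSketch {
    // View of a byte[] as little-endian ints, used for the manually merged store.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Four adjacent constant byte stores: the same shape as the Latin-1
    // branch of appendNull quoted above.
    static void putNullBytewise(byte[] val, int count) {
        val[count    ] = 'n';
        val[count + 1] = 'u';
        val[count + 2] = 'l';
        val[count + 3] = 'l';
    }

    // Hand-merged variant (an assumption about what "manual merging" means
    // in this thread): pack the four bytes into one int and store it once.
    static void putNullMerged(byte[] val, int count) {
        int packed = 'n' | ('u' << 8) | ('l' << 16) | ('l' << 24);
        INT_LE.set(val, count, packed);
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        byte[] b = new byte[8];
        putNullBytewise(a, 2);
        putNullMerged(b, 2);
        System.out.println(new String(a, 2, 4, StandardCharsets.ISO_8859_1));
        System.out.println(new String(b, 2, 4, StandardCharsets.ISO_8859_1));
    }
}
```

Wrapping each method in a benchmark loop and comparing the generated code for the two variants would be one way to see which loads, stores, and branches differ.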
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2160251736