RFR: 8148937: (str) Adapt StringJoiner for Compact Strings
Сергей Цыпанов
github.com+10835776+stsypanov at openjdk.java.net
Mon Mar 15 13:56:08 UTC 2021
On Mon, 15 Mar 2021 12:36:24 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> Hi @cl4es,
>>> Some of these changes conflict with #2334, which suggest removing the `coder` and `isLatin1` methods from `String`.
>>
>> I've checked out Aleksey's branch and applied my changes onto it, the only thing that I changed to make it work is replacing
>> public boolean isLatin1(String str) {
>> return str.isLatin1();
>> }
>> with
>> public boolean isLatin1(String str) {
>> return str.coder == String.LATIN1;
>> }
>> The rest of the code was left intact. `jdk:tier1` is OK after the change.
>>> As a more general point I think it would be good to explore options that does not increase leakage of the implementation detail that `Strings` are latin1- or utf16-encoded outside of java.lang.
>>
>> Apart from `JavaLangAccess` the only thing that comes to my mind is reflection, but it will destroy all the improvement. Otherwise I cannot figure out any other way to access somehow package-private latin/non-latin functionality of `j.l.String` in `java.util` package. I wonder, whether I'm missing any other opportunities?
>
> A less intrusive alternative would be to use a `StringBuilder`, see changes in this branch: https://github.com/openjdk/jdk/compare/master...cl4es:stringjoin_improvement?expand=1 (I adapted your StringJoinerBenchmark to work with the ascii-only build constraint)
>
> This underperforms compared to your patch since StringBuilder.toString needs to do a copy, but improves over the baseline:
> Benchmark (count) (length) (mode) Mode Cnt Score Error Units
> StringJoinerBenchmark.stringJoiner 100 64 latin avgt 5 5420.701 ± 1433.485 ns/op
> StringJoinerBenchmark.stringJoiner:·gc.alloc.rate.norm 100 64 latin avgt 5 20640.428 ± 0.130 B/op
> Patch:
> Benchmark (count) (length) (mode) Mode Cnt Score Error Units
> StringJoinerBenchmark.stringJoiner 100 64 latin avgt 5 4271.401 ± 677.560 ns/op
> StringJoinerBenchmark.stringJoiner:·gc.alloc.rate.norm 100 64 latin avgt 5 14136.294 ± 0.095 B/op
>
> The comparative benefit is that we'd avoid punching more holes into String implementation details for now. Not ruling that out indefinitely, but I think it needs a stronger motivation than to improve StringJoiner alone.
I was thinking about `StringBuilder` at the very beginning but then decided to have no bounds checks and reallocations. Indeed from maintainability point of view your solution is much more attractive. I have one minor comment about `append()`: I think we should do more chaining, e.g.
StringBuilder sb = new StringBuilder(len);
sb.append(elts[0]);
should be
StringBuilder sb = new StringBuilder(len).append(elts[0]);
etc. to have less bytecode.
E.g. if we take
public String sb() {
StringBuilder sb = new StringBuilder();
sb.append("a");
sb.append("b");
return sb.toString();
}
`sb.append()` gets compiled into
L1
LINENUMBER 23 L1
ALOAD 1
LDC "a"
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
POP
L2
LINENUMBER 24 L2
ALOAD 1
LDC "b"
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
POP
```
and in case we have
```
sb.append("a").append("b");
```
the amount of byte code is reduced:
```
L1
LINENUMBER 23 L1
ALOAD 1
LDC "a"
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
LDC "b"
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
POP
```
Also I'd change
if (addLen == 0) {
compactElts();
return size == 0 ? "" : elts[0];
}
to
if (size == 0) {
if (addLen == 0) {
return "";
}
return prefix + suffix;
}
The second variant is more understandable to me and likely to be slightly faster.
And finally, should I close this PR and you'll create a new one from your branch, or I need to copy your changes here?
-------------
PR: https://git.openjdk.java.net/jdk/pull/2627
More information about the core-libs-dev
mailing list