String.charAt vs StringBuilder.charAt performance
Brett Okken
brett.okken.os at gmail.com
Mon Jul 21 21:01:46 UTC 2025
Splitting the benchmark into separate test methods for each representation
did remove the difference for the non-ascii String case on the jdk 21+
releases. However, the ascii (latin) strings are still slower with String
than with StringBuilder.
How does C2 then handle something like StringCharBuffer wrapping a
CharSequence for all of its get operations:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/StringCharBuffer.java#L88-L97
Which is then used by CharBufferSpliterator
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/CharBufferSpliterator.java
And by many CharsetEncoder impls when either source or destination is not
backed by an array (which would be the case if StringCharBuffer is used):
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L517
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UnicodeEncoder.java#L81
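To make the access pattern I am asking about concrete, here is a minimal sketch. The class and method names are hypothetical and not taken from the JDK or from the benchmark below; it only assumes the public CharBuffer.wrap(CharSequence) and absolute get(int) APIs. It is not a benchmark, just an illustration that the wrapped path reads every char through the buffer, which in turn reads from the backing CharSequence.

```java
import java.nio.CharBuffer;

// Hypothetical illustration class; names are not from the JDK or the benchmark.
final class WrappedCharSequenceSum {

    // Reads each char straight from the CharSequence.
    static int sumDirect(CharSequence seq) {
        int sum = 0;
        for (int i = 0, n = seq.length(); i < n; ++i) {
            sum += seq.charAt(i);
        }
        return sum;
    }

    // Reads each char through a read-only CharBuffer that wraps the CharSequence.
    static int sumViaCharBuffer(CharSequence seq) {
        CharBuffer buf = CharBuffer.wrap(seq);
        int sum = 0;
        for (int i = 0, n = buf.limit(); i < n; ++i) {
            sum += buf.get(i); // absolute get, does not move the position
        }
        return sum;
    }

    public static void main(String[] args) {
        String s = "hello";
        StringBuilder sb = new StringBuilder(s);
        // Both paths should agree for either backing type.
        System.out.println(sumDirect(s) == sumViaCharBuffer(s));
        System.out.println(sumDirect(sb) == sumViaCharBuffer(sb));
    }
}
```

In a loop like sumViaCharBuffer the buffer may be backed by a String in one caller and a StringBuilder in another, which is the situation the question above is about.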
jdk 17
Benchmark                                      (data)     Mode  Cnt     Score      Error  Units
CharSequenceCharAtBenchmark.testString         ascii      avgt    3  1429.358 ±  623.424  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   705.282 ±  233.453  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  ascii      avgt    3   724.138 ±  267.346  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   718.357 ±  864.066  ns/op
jdk 21
Benchmark                                      (data)     Mode  Cnt     Score      Error  Units
CharSequenceCharAtBenchmark.testString         ascii      avgt    3  1087.024 ±  235.082  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   687.520 ±  747.532  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  ascii      avgt    3   672.802 ±   29.740  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   689.964 ±  791.175  ns/op
jdk 25
Benchmark                                      (data)     Mode  Cnt     Score      Error  Units
CharSequenceCharAtBenchmark.testString         ascii      avgt    3  1176.057 ± 1157.979  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   697.382 ±  231.144  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  ascii      avgt    3   692.970 ±  105.112  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   703.178 ±  446.019  ns/op
jdk 26
Benchmark                                      (data)     Mode  Cnt     Score      Error  Units
CharSequenceCharAtBenchmark.testString         ascii      avgt    3  1132.971 ±  350.786  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   688.201 ±  175.797  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  ascii      avgt    3   704.380 ±  101.763  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   673.622 ±   51.462  ns/op
@Warmup(iterations = 2, time = 7, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Fork(value = 1, jvmArgsPrepend = {"-Xms512M", "-Xmx512M"})
public class CharSequenceCharAtBenchmark {

    @Param(value = {"ascii", "non-ascii"})
    public String data;

    private String string;
    private StringBuilder stringBuilder;

    @Setup(Level.Trial)
    public void setup() throws Exception {
        StringBuilder sb = new StringBuilder(3152);
        for (int i=0; i<3152; ++i) {
            char c = (char) i;
            if ("ascii".equals(data)) {
                c = (char) (i & 0x7f);
            }
            sb.append(c);
        }
        string = sb.toString();
        stringBuilder = sb;
    }

    @Benchmark
    public int testString() {
        String sequence = this.string;
        int sum = 0;
        for (int i=0, j=sequence.length(); i<j; ++i) {
            sum += sequence.charAt(i);
        }
        return sum;
    }

    @Benchmark
    public int testStringBuilder() {
        StringBuilder sequence = this.stringBuilder;
        int sum = 0;
        for (int i=0, j=sequence.length(); i<j; ++i) {
            sum += sequence.charAt(i);
        }
        return sum;
    }
}

On Mon, Jul 21, 2025 at 1:12 PM Roger Riggs <roger.riggs at oracle.com> wrote:
> Hi Brett,
>
> I'd suggest separate initialization and test methods for the two cases to
> get more reliable numbers.
>
> By using @Trial and using a common field for the test data, I think you
> have handicapped C2.
> The training runs JMH does to warm up C2 are 'seeing' two different types
> for the value of sequence.
> Making the test runs independent will remove doubt about interactions due
> to the test setup.
>
> Roger
>
> On 7/21/25 1:43 PM, Brett Okken wrote:
>
> > output labeled as StringBuffer but the jmh creates StringBuilder.
>
> Ugh - sorry about that. But yes - this is about StringBuilder vs String.
>
> > I would not be surprised that C2 has more optimizations for String than
> for StringBuilder.
>
> If that were true, it would not surprise me. However, these tests show the
> opposite. String is /slower/ than StringBuilder.
>
> On Mon, Jul 21, 2025 at 12:34 PM Roger Riggs <roger.riggs at oracle.com>
> wrote:
>
>> Hi Brett,
>>
>> The labeling of the output is confusing, the test output labeled as
>> StringBuffer but the jmh creates StringBuilder.
>> (StringBuffer methods are all synchronized and could explain why they are
>> slower).
>>
>> Also, I would not be surprised that C2 has more optimizations for String
>> than for StringBuilder.
>>
>> Regards, Roger
>>
>> On 7/19/25 6:09 PM, Brett Okken wrote:
>>
>> Making sequence a local variable does improve things (especially for
>> ascii), but a substantial difference remains. It appears that the
>> performance difference for ascii goes all the way back to jdk 11. The
>> difference for non-ascii showed up in jdk 21. I wonder if this is related
>> to the index checks?
>>
>> jdk 11
>>
>> Benchmark (data) (source) Mode Cnt Score Error Units
>> test ascii String avgt 3 1137.348 ± 12.835 ns/op
>> test ascii StringBuffer avgt 3 712.874 ± 509.320 ns/op
>> test non-ascii String avgt 3 668.657 ± 246.550 ns/op
>> test non-ascii StringBuffer avgt 3 897.344 ± 4353.414 ns/op
>>
>>
>> jdk 17
>> Benchmark (data) (source) Mode Cnt Score Error Units
>> test ascii String avgt 3 1321.497 ± 2107.466 ns/op
>> test ascii StringBuffer avgt 3 715.936 ± 412.189 ns/op
>> test non-ascii String avgt 3 722.986 ± 443.389 ns/op
>> test non-ascii StringBuffer avgt 3 722.787 ± 771.816 ns/op
>>
>>
>> jdk 21
>> Benchmark (data) (source) Mode Cnt Score Error Units
>> test ascii String avgt 3 1150.301 ± 918.549 ns/op
>> test ascii StringBuffer avgt 3 713.183 ± 543.850 ns/op
>> test non-ascii String avgt 3 4642.667 ± 11481.029 ns/op
>> test non-ascii StringBuffer avgt 3 728.027 ± 936.521 ns/op
>>
>>
>> jdk 25
>> Benchmark (data) (source) Mode Cnt Score Error Units
>> test ascii String avgt 3 1184.513 ± 2057.498 ns/op
>> test ascii StringBuffer avgt 3 786.611 ± 411.657 ns/op
>> test non-ascii String avgt 3 4197.585 ± 2761.388 ns/op
>> test non-ascii StringBuffer avgt 3 716.375 ± 815.349 ns/op
>>
>>
>> jdk 26
>> Benchmark (data) (source) Mode Cnt Score Error Units
>> test ascii String avgt 3 1107.207 ± 423.072 ns/op
>> test ascii StringBuffer avgt 3 742.780 ± 178.890 ns/op
>> test non-ascii String avgt 3 4043.914 ± 498.439 ns/op
>> test non-ascii StringBuffer avgt 3 712.535 ± 583.255 ns/op
>>
>>
>> On Sat, Jul 19, 2025 at 4:17 PM Chen Liang <liangchenblue at gmail.com>
>> wrote:
>>
>>> Without looking at C2 IRs, I think there are a few potential culprits we
>>> can look into:
>>> 1. JDK-8351000 and JDK-8351443 updated StringBuilder
>>> 2. Sequence field is read in the loop; I wonder if making it an explicit
>>> immutable local variable changes anything here.
>>>
>>> On Sat, Jul 19, 2025 at 2:34 PM Brett Okken <brett.okken.os at gmail.com>
>>> wrote:
>>>
>>>> I was looking at the performance of StringCharBuffer for various
>>>> backing CharSequence types and was surprised to see a significant
>>>> performance difference between String and StringBuffer. I wrote a
>>>> small jmh which shows that the String implementation of charAt is
>>>> significantly slower than StringBuilder. Is this expected?
>>>>
>>>> Benchmark (data) (source) Mode Cnt Score Error Units
>>>> CharSequenceCharAtBenchmark.test ascii String avgt 3 2537.311 ± 8952.197 ns/op
>>>> CharSequenceCharAtBenchmark.test ascii StringBuffer avgt 3 852.004 ± 2532.958 ns/op
>>>> CharSequenceCharAtBenchmark.test non-ascii String avgt 3 5115.381 ± 13822.592 ns/op
>>>> CharSequenceCharAtBenchmark.test non-ascii StringBuffer avgt 3 836.230 ± 1154.191 ns/op
>>>>
>>>>
>>>>
>>>> @Measurement(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
>>>> @Warmup(iterations = 2, time = 7, timeUnit = TimeUnit.SECONDS)
>>>> @BenchmarkMode(Mode.AverageTime)
>>>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>>> @State(Scope.Benchmark)
>>>> @Fork(value = 1, jvmArgsPrepend = {"-Xms512M", "-Xmx512M"})
>>>> public class CharSequenceCharAtBenchmark {
>>>>
>>>>     @Param(value = {"ascii", "non-ascii"})
>>>>     public String data;
>>>>
>>>>     @Param(value = {"String", "StringBuffer"})
>>>>     public String source;
>>>>
>>>>     private CharSequence sequence;
>>>>
>>>>     @Setup(Level.Trial)
>>>>     public void setup() throws Exception {
>>>>         StringBuilder sb = new StringBuilder(3152);
>>>>         for (int i=0; i<3152; ++i) {
>>>>             char c = (char) i;
>>>>             if ("ascii".equals(data)) {
>>>>                 c = (char) (i & 0x7f);
>>>>             }
>>>>             sb.append(c);
>>>>         }
>>>>
>>>>         switch(source) {
>>>>             case "String":
>>>>                 sequence = sb.toString();
>>>>                 break;
>>>>             case "StringBuffer":
>>>>                 sequence = sb;
>>>>                 break;
>>>>             default:
>>>>                 throw new IllegalArgumentException(source);
>>>>         }
>>>>     }
>>>>
>>>>     @Benchmark
>>>>     public int test() {
>>>>         int sum = 0;
>>>>         for (int i=0, j=sequence.length(); i<j; ++i) {
>>>>             sum += sequence.charAt(i);
>>>>         }
>>>>         return sum;
>>>>     }
>>>> }
>>>>
>>>
>>
>