String.charAt vs StringBuilder.charAt performance

Mon Jul 21 21:01:46 UTC 2025

Updating the benchmark to use a separate test method for each representation
did remove the difference for the non-ascii String case on the jdk 21+
releases. However, ascii (latin1) strings are still slower with String than
with StringBuilder.
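Some background on why the two data sets can take different paths: since
compact strings (JDK 9), both String and StringBuilder keep their characters
in a byte[] plus a coder, LATIN1 when every char fits in one byte and UTF16
otherwise. The benchmark's ascii data masks each char to 7 bits, so it is
stored as LATIN1, while the non-ascii data (chars 0 to 3151) needs UTF16. The
sketch below is a simplified, hypothetical model of that layout
(`CompactChars` is not a JDK class, and it fixes UTF16 to big-endian where the
JDK uses native byte order). It only shows that each charAt call branches on
the coder and does an index check; it says nothing about how C2 compiles the
real methods.

```java
import java.util.Objects;

// Hypothetical, simplified model of compact-string storage; not JDK code.
final class CompactChars {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    CompactChars(CharSequence cs) {
        int n = cs.length();
        boolean latin1 = true;
        for (int i = 0; i < n; i++) {
            if (cs.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            // One byte per char.
            coder = LATIN1;
            value = new byte[n];
            for (int i = 0; i < n; i++) {
                value[i] = (byte) cs.charAt(i);
            }
        } else {
            // Two bytes per char, high byte first (big-endian in this model).
            coder = UTF16;
            value = new byte[2 * n];
            for (int i = 0; i < n; i++) {
                char c = cs.charAt(i);
                value[2 * i] = (byte) (c >> 8);
                value[2 * i + 1] = (byte) c;
            }
        }
    }

    byte coder() {
        return coder;
    }

    int length() {
        return coder == LATIN1 ? value.length : value.length >> 1;
    }

    char charAt(int index) {
        // Each call pays for an index check and a coder branch.
        Objects.checkIndex(index, length());
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        System.out.println(new CompactChars("abc").coder());    // 0 (LATIN1)
        System.out.println(new CompactChars("\u0100").coder()); // 1 (UTF16)
    }
}
```

Feeding it the benchmark's ascii data yields a LATIN1 instance, and the
non-ascii data a UTF16 one.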

How does C2 then handle something like StringCharBuffer, which wraps a
CharSequence and delegates all of its get operations to it:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/StringCharBuffer.java#L88-L97
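That path can be reached with public API alone: `CharBuffer.wrap(CharSequence)`
returns a read-only buffer over the sequence (the StringCharBuffer linked
above), so each absolute `get(int)` is answered by the wrapped sequence's
`charAt`. In this sketch (the class `WrappedSum` is hypothetical), one `sum`
call site is fed buffers wrapping both a String and a StringBuilder, the same
mixed-receiver situation the question is about:

```java
import java.nio.CharBuffer;

class WrappedSum {
    // Sums chars through CharBuffer's absolute get(int); for a buffer that
    // wraps a CharSequence, each get delegates to that sequence's charAt.
    static int sum(CharBuffer buf) {
        int sum = 0;
        for (int i = 0, n = buf.limit(); i < n; ++i) {
            sum += buf.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3152; ++i) {
            sb.append((char) (i & 0x7f)); // same ascii data as the benchmark
        }
        CharBuffer fromString = CharBuffer.wrap(sb.toString());
        CharBuffer fromBuilder = CharBuffer.wrap(sb);
        System.out.println(fromString.isReadOnly());              // true
        System.out.println(sum(fromString) == sum(fromBuilder));  // true
    }
}
```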

StringCharBuffer's get path is also used by CharBufferSpliterator:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/CharBufferSpliterator.java
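As far as I can tell from the linked source, CharBuffer's `chars()` stream is
built on that spliterator, so streaming over a wrapped sequence also goes
through per-char reads on the wrapper. A minimal sketch (class and method
names hypothetical) that checks it agrees with `CharSequence.chars()` on the
builder directly:

```java
import java.nio.CharBuffer;

class WrappedChars {
    // Sums chars by streaming over a CharBuffer that wraps the sequence.
    static int sumViaBuffer(CharSequence cs) {
        return CharBuffer.wrap(cs).chars().sum();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3152; ++i) {
            sb.append((char) i); // same non-ascii data as the benchmark
        }
        // Compare against CharSequence.chars() on the builder itself.
        System.out.println(sumViaBuffer(sb) == sb.chars().sum()); // true
    }
}
```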

The same get path is also taken by many CharsetEncoder impls when either the
source or the destination is not backed by an array (which is the case when a
StringCharBuffer is the source):
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L517
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UnicodeEncoder.java#L81
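Which loop an encoder picks depends on `hasArray()`. A buffer from
`CharBuffer.wrap(CharSequence)` is read-only and reports `hasArray() == false`,
while `CharBuffer.wrap(char[])` is array-backed, and the linked UTF_8 encoder
uses its array loop only when both source and destination report `hasArray()`.
The sketch below (class name hypothetical) encodes the same non-ascii data
from both kinds of source through the public `CharsetEncoder.encode(CharBuffer)`
method. The bytes should be identical either way; per the linked source, only
the loop used internally differs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeSources {
    // Encodes the remaining chars of src to UTF-8 with a fresh encoder.
    static byte[] encode(CharBuffer src) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = enc.encode(src); // heap buffer, position 0
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3152; ++i) {
            sb.append((char) i); // same non-ascii data as the benchmark
        }
        CharBuffer wrapped = CharBuffer.wrap(sb);                         // reads via charAt
        CharBuffer backed = CharBuffer.wrap(sb.toString().toCharArray()); // array-backed
        System.out.println(wrapped.hasArray()); // false
        System.out.println(backed.hasArray());  // true
        System.out.println(Arrays.equals(encode(wrapped), encode(backed))); // true
    }
}
```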

jdk 17
Benchmark                                         (data)  Mode  Cnt     Score     Error  Units
CharSequenceCharAtBenchmark.testString             ascii  avgt    3  1429.358 ± 623.424  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   705.282 ± 233.453  ns/op
CharSequenceCharAtBenchmark.testStringBuilder      ascii  avgt    3   724.138 ± 267.346  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   718.357 ± 864.066  ns/op

jdk 21
Benchmark                                         (data)  Mode  Cnt     Score     Error  Units
CharSequenceCharAtBenchmark.testString             ascii  avgt    3  1087.024 ± 235.082  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   687.520 ± 747.532  ns/op
CharSequenceCharAtBenchmark.testStringBuilder      ascii  avgt    3   672.802 ±  29.740  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   689.964 ± 791.175  ns/op

jdk 25
Benchmark                                         (data)  Mode  Cnt     Score      Error  Units
CharSequenceCharAtBenchmark.testString             ascii  avgt    3  1176.057 ± 1157.979  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   697.382 ±  231.144  ns/op
CharSequenceCharAtBenchmark.testStringBuilder      ascii  avgt    3   692.970 ±  105.112  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   703.178 ±  446.019  ns/op

jdk 26
Benchmark                                         (data)  Mode  Cnt     Score     Error  Units
CharSequenceCharAtBenchmark.testString             ascii  avgt    3  1132.971 ± 350.786  ns/op
CharSequenceCharAtBenchmark.testString         non-ascii  avgt    3   688.201 ± 175.797  ns/op
CharSequenceCharAtBenchmark.testStringBuilder      ascii  avgt    3   704.380 ± 101.763  ns/op
CharSequenceCharAtBenchmark.testStringBuilder  non-ascii  avgt    3   673.622 ±  51.462  ns/op

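import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@Measurement(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
@Warmup(iterations = 2, time = 7, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Fork(value = 1, jvmArgsPrepend = {"-Xms512M", "-Xmx512M"})
public class CharSequenceCharAtBenchmark {

    @Param(value = {"ascii", "non-ascii"})
    public String data;

    private String string;

    private StringBuilder stringBuilder;

    @Setup(Level.Trial)
    public void setup() {
        // ascii: every char is <= 0x7f, so (with compact strings, the default)
        // both String and StringBuilder use LATIN1, one byte per char.
        // non-ascii: chars 0..3151 include values > 0xff, forcing UTF16.
        StringBuilder sb = new StringBuilder(3152);
        for (int i=0; i<3152; ++i) {
            char c = (char) i;
            if ("ascii".equals(data)) {
                c = (char) (i & 0x7f);
            }
            sb.append(c);
        }

        string = sb.toString();
        stringBuilder = sb;
    }

    @Benchmark
    public int testString() {
        // Copy the field into a local so the loop works on a fixed reference.
        String sequence = this.string;
        int sum = 0;
        for (int i=0, j=sequence.length(); i<j; ++i) {
            sum += sequence.charAt(i);
        }
        // Returned so JMH consumes the result and the loop is not eliminated.
        return sum;
    }

    @Benchmark
    public int testStringBuilder() {
        // Same loop as testString(), but the receiver type is StringBuilder.
        StringBuilder sequence = this.stringBuilder;
        int sum = 0;
        for (int i=0, j=sequence.length(); i<j; ++i) {
            sum += sequence.charAt(i);
        }
        return sum;
    }
}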
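Since the two benchmark methods are meant to do identical work, a quick check
outside JMH helps rule out a setup mistake. This plain-Java sketch (no JMH
dependency; `CharAtSanityCheck` and its methods are hypothetical names) repeats
the setup() logic and both loop bodies:

```java
// Hypothetical sanity check: mirrors the benchmark's setup() and loops without JMH.
class CharAtSanityCheck {
    // Same data construction as setup() in the benchmark.
    static StringBuilder buildData(String data) {
        StringBuilder sb = new StringBuilder(3152);
        for (int i = 0; i < 3152; ++i) {
            char c = (char) i;
            if ("ascii".equals(data)) {
                c = (char) (i & 0x7f);
            }
            sb.append(c);
        }
        return sb;
    }

    // Same loop as testString().
    static int sumString(String sequence) {
        int sum = 0;
        for (int i = 0, j = sequence.length(); i < j; ++i) {
            sum += sequence.charAt(i);
        }
        return sum;
    }

    // Same loop as testStringBuilder().
    static int sumBuilder(StringBuilder sequence) {
        int sum = 0;
        for (int i = 0, j = sequence.length(); i < j; ++i) {
            sum += sequence.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (String data : new String[] {"ascii", "non-ascii"}) {
            StringBuilder sb = buildData(data);
            System.out.println(data + ": " + sumString(sb.toString()) + " " + sumBuilder(sb));
        }
    }
}
```

It prints `ascii: 198232 198232` and `non-ascii: 4965976 4965976`, so both
loops see the same characters for each data set.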

On Mon, Jul 21, 2025 at 1:12 PM Roger Riggs <roger.riggs at oracle.com> wrote:

> Hi Brett,
>
> I'd suggest separate initialization and test methods for the two cases to
> get more reliable numbers.
>
> By using @Trial and using a common field for the test data, I think you
> have handicapped C2.
> The training runs JMH does to warm up C2 are 'seeing' two different types
> for the value of sequence.
> Making the test runs independent will remove doubt about interactions due
> to the test setup.
>
> Roger
>
> On 7/21/25 1:43 PM, Brett Okken wrote:
>
> >  output labeled as StringBuffer but the jmh creates StringBuilder.
>
> Ugh - sorry about that. But yes - this is about StringBuilder vs String.
>
> > I would not be surprised that C2 has more optimizations for String than
> > for StringBuilder.
>
> If that were true, it would not surprise me. However, these tests show the
> opposite. String is /slower/ than StringBuilder.
>
> On Mon, Jul 21, 2025 at 12:34 PM Roger Riggs <roger.riggs at oracle.com>
> wrote:
>
>> Hi Brett,
>>
>> The labeling of the output is confusing, the test output labeled as
>> StringBuffer but the jmh creates StringBuilder.
>> (StringBuffer methods are all synchronized and could explain why they are
>> slower).
>>
>> Also, I would not be surprised that C2 has more optimizations for String
>> than for StringBuilder.
>>
>> Regards, Roger
>>
>> On 7/19/25 6:09 PM, Brett Okken wrote:
>>
>> Making sequence a local variable does improve things (especially for
>> ascii), but a substantial difference remains. It appears that the
>> performance difference for ascii goes all the way back to jdk 11. The
>> difference for non-ascii showed up in jdk 21. I wonder if this is related
>> to the index checks?
>>
>> jdk 11
>>
>> Benchmark  (data)      (source)  Mode  Cnt     Score      Error  Units
>> test        ascii        String  avgt    3  1137.348 ±   12.835  ns/op
>> test        ascii  StringBuffer  avgt    3   712.874 ±  509.320  ns/op
>> test    non-ascii        String  avgt    3   668.657 ±  246.550  ns/op
>> test    non-ascii  StringBuffer  avgt    3   897.344 ± 4353.414  ns/op
>>
>>
>> jdk 17
>> Benchmark  (data)      (source)  Mode  Cnt     Score      Error  Units
>> test        ascii        String  avgt    3  1321.497 ± 2107.466  ns/op
>> test        ascii  StringBuffer  avgt    3   715.936 ±  412.189  ns/op
>> test    non-ascii        String  avgt    3   722.986 ±  443.389  ns/op
>> test    non-ascii  StringBuffer  avgt    3   722.787 ±  771.816  ns/op
>>
>>
>> jdk 21
>> Benchmark  (data)      (source)  Mode  Cnt     Score       Error  Units
>> test        ascii        String  avgt    3  1150.301 ±   918.549  ns/op
>> test        ascii  StringBuffer  avgt    3   713.183 ±   543.850  ns/op
>> test    non-ascii        String  avgt    3  4642.667 ± 11481.029  ns/op
>> test    non-ascii  StringBuffer  avgt    3   728.027 ±   936.521  ns/op
>>
>>
>> jdk 25
>> Benchmark  (data)      (source)  Mode  Cnt     Score      Error  Units
>> test        ascii        String  avgt    3  1184.513 ± 2057.498  ns/op
>> test        ascii  StringBuffer  avgt    3   786.611 ±  411.657  ns/op
>> test    non-ascii        String  avgt    3  4197.585 ± 2761.388  ns/op
>> test    non-ascii  StringBuffer  avgt    3   716.375 ±  815.349  ns/op
>>
>>
>> jdk 26
>> Benchmark  (data)      (source)  Mode  Cnt     Score     Error  Units
>> test        ascii        String  avgt    3  1107.207 ± 423.072  ns/op
>> test        ascii  StringBuffer  avgt    3   742.780 ± 178.890  ns/op
>> test    non-ascii        String  avgt    3  4043.914 ± 498.439  ns/op
>> test    non-ascii  StringBuffer  avgt    3   712.535 ± 583.255  ns/op
>>
>>
>> On Sat, Jul 19, 2025 at 4:17 PM Chen Liang <liangchenblue at gmail.com>
>> wrote:
>>
>>> Without looking at C2 IRs, I think there are a few potential culprits we
>>> can look into:
>>> 1. JDK-8351000 and JDK-8351443 updated StringBuilder
>>> 2. Sequence field is read in the loop; I wonder if making it an explicit
>>> immutable local variable changes anything here.
>>>
>>> On Sat, Jul 19, 2025 at 2:34 PM Brett Okken <brett.okken.os at gmail.com>
>>> wrote:
>>>
>>>> I was looking at the performance of StringCharBuffer for various
>>>> backing CharSequence types and was surprised to see a significant
>>>> performance difference between String and StringBuffer. I wrote a
>>>> small jmh which shows that the String implementation of charAt is
>>>> significantly slower than StringBuilder. Is this expected?
>>>>
>>>> Benchmark                            (data)      (source)  Mode  Cnt     Score       Error  Units
>>>> CharSequenceCharAtBenchmark.test      ascii        String  avgt    3  2537.311 ±  8952.197  ns/op
>>>> CharSequenceCharAtBenchmark.test      ascii  StringBuffer  avgt    3   852.004 ±  2532.958  ns/op
>>>> CharSequenceCharAtBenchmark.test  non-ascii        String  avgt    3  5115.381 ± 13822.592  ns/op
>>>> CharSequenceCharAtBenchmark.test  non-ascii  StringBuffer  avgt    3   836.230 ±  1154.191  ns/op
>>>>
>>>>
>>>>
>>>> @Measurement(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
>>>> @Warmup(iterations = 2, time = 7, timeUnit = TimeUnit.SECONDS)
>>>> @BenchmarkMode(Mode.AverageTime)
>>>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>>> @State(Scope.Benchmark)
>>>> @Fork(value = 1, jvmArgsPrepend = {"-Xms512M", "-Xmx512M"})
>>>> public class CharSequenceCharAtBenchmark {
>>>>
>>>>     @Param(value = {"ascii", "non-ascii"})
>>>>     public String data;
>>>>
>>>>     @Param(value = {"String", "StringBuffer"})
>>>>     public String source;
>>>>
>>>>     private CharSequence sequence;
>>>>
>>>>     @Setup(Level.Trial)
>>>>     public void setup() throws Exception {
>>>>         StringBuilder sb = new StringBuilder(3152);
>>>>         for (int i=0; i<3152; ++i) {
>>>>             char c = (char) i;
>>>>             if ("ascii".equals(data)) {
>>>>                 c = (char) (i & 0x7f);
>>>>             }
>>>>             sb.append(c);
>>>>         }
>>>>
>>>>         switch(source) {
>>>>             case "String":
>>>>                 sequence = sb.toString();
>>>>                 break;
>>>>             case "StringBuffer":
>>>>                 sequence = sb;
>>>>                 break;
>>>>             default:
>>>>                 throw new IllegalArgumentException(source);
>>>>         }
>>>>     }
>>>>
>>>>     @Benchmark
>>>>     public int test() {
>>>>         int sum = 0;
>>>>         for (int i=0, j=sequence.length(); i<j; ++i) {
>>>>             sum += sequence.charAt(i);
>>>>         }
>>>>         return sum;
>>>>     }
>>>> }
>>>>
>>>
>>
>