RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]
Shaojin Wen
swen at openjdk.org
Wed Jul 2 08:52:43 UTC 2025
On Tue, 1 Jul 2025 00:01:21 GMT, Shaojin Wen <swen at openjdk.org> wrote:
>> BufferedWriter -> OutputStreamWriter -> StreamEncoder
>>
>> In this call chain, BufferedWriter has a char[] buffer and StreamEncoder has a ByteBuffer, so there are two layers of buffering; the BufferedWriter layer can be removed. In addition, when the charset is UTF8 and the content passed to write(String) is LATIN1, it is converted from LATIN1 to UTF16 and then to UTF8:
>>
>> LATIN1 -> UTF16 -> UTF8
>>
>> We can improve BufferedWriter: when the Writer passed in is an instance of OutputStreamWriter, skip the buffer and call the OutputStreamWriter directly. In addition, improve write(String) in StreamEncoder to avoid unnecessary encoding conversion.
>
> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert "BufferedWriter buffer use StringBuilder"
>
> This reverts commit da902ca0b0bd6acc003deb8ad1ca0d6485a29a27.
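For readers who want to see the writer chain from the quoted description in isolation, here is a minimal sketch using only public java.io API. The class and method names are hypothetical and not part of the PR; StreamEncoder is JDK-internal and is only reached indirectly through OutputStreamWriter.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not taken from the PR.
class WriterChainSketch {
    // Pushes text through BufferedWriter -> OutputStreamWriter, which
    // delegates to the internal StreamEncoder, and returns the UTF-8 bytes.
    static byte[] encodeViaWriterChain(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            // The String may be stored compactly (LATIN1) inside the JVM,
            // but this public API hands the writer UTF-16 chars.
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // every char fits in LATIN1
        byte[] bytes = encodeViaWriterChain(text);
        System.out.println(text.length() + " chars -> " + bytes.length + " bytes");
    }
}
```

The sketch only shows the layering; it does not measure or reproduce the intermediate copies that the PR aims to remove.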
Following the suggestions from liach and xx, I explored having BufferedWriter use a StringBuilder as its buffer together with ArrayEncoder, on top of the current PR. This gives a good performance improvement in non-UTF8 scenarios.
The code is in this branch: https://github.com/wenshao/jdk/tree/utf8_writer_202506_x4
The code changes are extensive, so they should go into a separate PR.
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao
## Baseline
# https://github.com/wenshao/jdk/tree/utf8_writer_202506_test
git checkout 2758d6ad7767832db004d28f10cc764f33fa438e
make test TEST="micro:java.io.BufferedWriterBench" MICRO="OPTIONS=-p charset=ISO_8859_1,ASCII,UTF8,UTF16,GB18030"
# Current (PR 26022 plus BufferedWriter using StringBuilder as its buffer)
# https://github.com/wenshao/jdk/tree/utf8_writer_202506_x4
git checkout 77c5996b6a7b7ea74d03b64c4c8e827a7d76f05a
make test TEST="micro:java.io.BufferedWriterBench" MICRO="OPTIONS=-p charset=ISO_8859_1,ASCII,UTF8,UTF16,GB18030"
## Benchmark Numbers on Aliyun ECS c9i (Intel x64 CPU)
Benchmark       (charType)    (charset)   Units  Base_Score  Current_Score  Improvement(%)
writeCharArray  ascii         ISO_8859_1  us/op       3.128          3.027          +3.23%
writeCharArray  ascii         ASCII       us/op       3.126          3.351          -6.88%
writeCharArray  ascii         UTF8        us/op       3.125          3.716         -18.91%
writeCharArray  ascii         UTF16       us/op      32.469         11.404         +64.87%
writeCharArray  ascii         GB18030     us/op       9.642          7.296         +24.34%
writeCharArray  utf8_2_bytes  ISO_8859_1  us/op       3.137          3.016          +3.86%
writeCharArray  utf8_2_bytes  ASCII       us/op      96.779          8.725         +90.99%
writeCharArray  utf8_2_bytes  UTF8        us/op      17.346         12.966         +25.25%
writeCharArray  utf8_2_bytes  UTF16       us/op      32.407         11.267         +65.19%
writeCharArray  utf8_2_bytes  GB18030     us/op      82.994         12.401         +85.14%
writeCharArray  utf8_3_bytes  ISO_8859_1  us/op     100.063          7.486         +92.51%
writeCharArray  utf8_3_bytes  ASCII       us/op      96.061          9.236         +90.40%
writeCharArray  utf8_3_bytes  UTF8        us/op      28.340         13.358         +52.86%
writeCharArray  utf8_3_bytes  UTF16       us/op      32.468         11.785         +63.70%
writeCharArray  utf8_3_bytes  GB18030     us/op      40.864         37.012          +9.66%
writeCharArray  emoji         ISO_8859_1  us/op     190.547         10.149         +94.67%
writeCharArray  emoji         ASCII       us/op     187.803         12.774         +93.17%
writeCharArray  emoji         UTF8        us/op      41.493         23.473         +43.49%
writeCharArray  emoji         UTF16       us/op      48.248         16.227         +66.36%
writeCharArray  emoji         GB18030     us/op     147.360         63.437         +57.01%
writeString     ascii         ISO_8859_1  us/op       3.340          2.770         +17.09%
writeString     ascii         ASCII       us/op       3.340          3.069          +8.11%
writeString     ascii         UTF8        us/op       3.324          2.944         +11.43%
writeString     ascii         UTF16       us/op      32.503         11.214         +65.49%
writeString     ascii         GB18030     us/op       9.023          6.999         +22.43%
writeString     utf8_2_bytes  ISO_8859_1  us/op       3.338          2.827         +15.31%
writeString     utf8_2_bytes  ASCII       us/op      95.964          8.542         +91.10%
writeString     utf8_2_bytes  UTF8        us/op      17.660         10.155         +42.44%
writeString     utf8_2_bytes  UTF16       us/op      32.516         11.173         +65.63%
writeString     utf8_2_bytes  GB18030     us/op      82.369         12.231         +85.14%
writeString     utf8_3_bytes  ISO_8859_1  us/op     100.280          7.363         +92.66%
writeString     utf8_3_bytes  ASCII       us/op      95.279          9.060         +90.48%
writeString     utf8_3_bytes  UTF8        us/op      28.344         18.366         +35.19%
writeString     utf8_3_bytes  UTF16       us/op      32.672         11.284         +65.43%
writeString     utf8_3_bytes  GB18030     us/op      43.798         37.145         +15.16%
writeString     emoji         ISO_8859_1  us/op     189.574          9.904         +94.75%
writeString     emoji         ASCII       us/op     187.021         12.427         +93.35%
writeString     emoji         UTF8        us/op      41.775         25.875         +37.98%
writeString     emoji         UTF16       us/op      48.240         15.696         +67.10%
writeString     emoji         GB18030     us/op     147.097         63.587         +56.78%
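As a reading aid for the (charType) column, the sketch below shows how many UTF-8 bytes one character of each kind takes, and whether ISO_8859_1 or ASCII can encode it. The sample characters, class name, and helper names are assumptions chosen for illustration; the actual data used by BufferedWriterBench may differ. Whether a charset can encode the input is one plausible reason some baseline rows are much slower than others, but the post itself does not state this.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; sample characters are assumptions,
// not the benchmark's real input data.
class CharTypeSketch {
    static final String ASCII_SAMPLE = "a";            // U+0061
    static final String TWO_BYTE_SAMPLE = "\u00e9";    // U+00E9, within LATIN1
    static final String THREE_BYTE_SAMPLE = "\u4e2d";  // U+4E2D
    static final String EMOJI_SAMPLE = "\uD83D\uDE00"; // U+1F600, surrogate pair

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the charset can represent every character of the text.
    static boolean canEncode(Charset cs, String s) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String[] names = {"ascii", "utf8_2_bytes", "utf8_3_bytes", "emoji"};
        String[] samples = {ASCII_SAMPLE, TWO_BYTE_SAMPLE, THREE_BYTE_SAMPLE, EMOJI_SAMPLE};
        for (int i = 0; i < samples.length; i++) {
            System.out.println(names[i]
                    + ": utf8 bytes=" + utf8Length(samples[i])
                    + ", ISO_8859_1=" + canEncode(StandardCharsets.ISO_8859_1, samples[i])
                    + ", ASCII=" + canEncode(StandardCharsets.US_ASCII, samples[i]));
        }
    }
}
```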
## Benchmark Numbers on MacBook M1 Pro (aarch64)
Benchmark                           (charType)    (charset)   Units  Base_Score  Current_Score  Improvement(%)
BufferedWriterBench.writeCharArray  ascii         ISO_8859_1  us/op       2.815          2.133         +24.20%
BufferedWriterBench.writeCharArray  ascii         ASCII       us/op       2.742          2.352         +14.22%
BufferedWriterBench.writeCharArray  ascii         UTF8        us/op       2.704          2.616          +3.25%
BufferedWriterBench.writeCharArray  ascii         UTF16       us/op      31.294          8.489         +72.87%
BufferedWriterBench.writeCharArray  ascii         GB18030     us/op       8.932          3.820         +57.20%
BufferedWriterBench.writeCharArray  utf8_2_bytes  ISO_8859_1  us/op       2.828          2.210         +21.85%
BufferedWriterBench.writeCharArray  utf8_2_bytes  ASCII       us/op     109.255          5.669         +94.80%
BufferedWriterBench.writeCharArray  utf8_2_bytes  UTF8        us/op      22.353         14.039         +37.15%
BufferedWriterBench.writeCharArray  utf8_2_bytes  UTF16       us/op      31.268          8.349         +73.28%
BufferedWriterBench.writeCharArray  utf8_2_bytes  GB18030     us/op      90.835          6.816         +92.50%
BufferedWriterBench.writeCharArray  utf8_3_bytes  ISO_8859_1  us/op     109.734          7.834         +92.88%
BufferedWriterBench.writeCharArray  utf8_3_bytes  ASCII       us/op     106.981          7.906         +92.60%
BufferedWriterBench.writeCharArray  utf8_3_bytes  UTF8        us/op      21.453         16.076         +25.07%
BufferedWriterBench.writeCharArray  utf8_3_bytes  UTF16       us/op      31.294          6.945         +77.75%
BufferedWriterBench.writeCharArray  utf8_3_bytes  GB18030     us/op      49.007         27.891         +43.02%
BufferedWriterBench.writeCharArray  emoji         ISO_8859_1  us/op     223.538         11.189         +94.54%
BufferedWriterBench.writeCharArray  emoji         ASCII       us/op     264.875         11.384         +95.69%
BufferedWriterBench.writeCharArray  emoji         UTF8        us/op      35.704         21.672         +39.29%
BufferedWriterBench.writeCharArray  emoji         UTF16       us/op      45.979         11.255         +75.51%
BufferedWriterBench.writeCharArray  emoji         GB18030     us/op     148.829         57.625         +61.33%
BufferedWriterBench.writeString     ascii         ISO_8859_1  us/op       2.898          2.159         +25.49%
BufferedWriterBench.writeString     ascii         ASCII       us/op       2.876          2.591          +9.91%
BufferedWriterBench.writeString     ascii         UTF8        us/op       2.894          2.466         +14.79%
BufferedWriterBench.writeString     ascii         UTF16       us/op      31.236          8.759         +71.82%
BufferedWriterBench.writeString     ascii         GB18030     us/op       9.010          3.899         +56.70%
BufferedWriterBench.writeString     utf8_2_bytes  ISO_8859_1  us/op       2.894          2.178         +24.71%
BufferedWriterBench.writeString     utf8_2_bytes  ASCII       us/op     108.426          5.611         +94.82%
BufferedWriterBench.writeString     utf8_2_bytes  UTF8        us/op      22.206         12.225         +44.93%
BufferedWriterBench.writeString     utf8_2_bytes  UTF16       us/op      31.305          8.773         +71.98%
BufferedWriterBench.writeString     utf8_2_bytes  GB18030     us/op      90.820          6.907         +92.40%
BufferedWriterBench.writeString     utf8_3_bytes  ISO_8859_1  us/op     108.983          7.931         +92.66%
BufferedWriterBench.writeString     utf8_3_bytes  ASCII       us/op     107.064          7.836         +92.66%
BufferedWriterBench.writeString     utf8_3_bytes  UTF8        us/op      21.664         13.102         +39.47%
BufferedWriterBench.writeString     utf8_3_bytes  UTF16       us/op      31.546          6.930         +78.00%
BufferedWriterBench.writeString     utf8_3_bytes  GB18030     us/op      52.688         27.698         +47.17%
BufferedWriterBench.writeString     emoji         ISO_8859_1  us/op     221.930         11.160         +94.95%
BufferedWriterBench.writeString     emoji         ASCII       us/op     236.791         11.116         +95.30%
BufferedWriterBench.writeString     emoji         UTF8        us/op      35.025         23.210         +33.73%
BufferedWriterBench.writeString     emoji         UTF16       us/op      45.988         11.334         +75.32%
BufferedWriterBench.writeString     emoji         GB18030     us/op     148.202         57.472         +61.23%
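The Improvement(%) column appears to be (Base_Score - Current_Score) / Base_Score * 100, so a positive value means less time per operation. This formula is inferred from the rows and is not stated in the post; a few rows differ from it in the last digits, possibly because they were computed from unrounded scores. A small sketch, with hypothetical class and method names:

```java
import java.util.Locale;

// Hypothetical helper; the formula is an assumption inferred from the tables.
class ImprovementSketch {
    // Relative reduction in time per operation, in percent.
    // Scores are in us/op, so lower is better and positive means faster.
    static double improvementPercent(double baseScore, double currentScore) {
        return (baseScore - currentScore) / baseScore * 100.0;
    }

    // Formats the value the way the tables show it, with an explicit sign.
    static String format(double baseScore, double currentScore) {
        return String.format(Locale.ROOT, "%+.2f%%",
                improvementPercent(baseScore, currentScore));
    }

    public static void main(String[] args) {
        // x64 table: writeCharArray, ascii, ISO_8859_1
        System.out.println(format(3.128, 3.027));
        // x64 table: writeCharArray, ascii, UTF8 (a regression)
        System.out.println(format(3.125, 3.716));
    }
}
```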
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26022#issuecomment-3027011273