RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]

Shaojin Wen swen at openjdk.org
Wed Jul 2 08:52:43 UTC 2025


On Tue, 1 Jul 2025 00:01:21 GMT, Shaojin Wen <swen at openjdk.org> wrote:

>> BufferedWriter -> OutputStreamWriter -> StreamEncoder
>> 
>> In this call chain, BufferedWriter has a char[] buffer and StreamEncoder has a ByteBuffer, so there are two layers of buffering; the BufferedWriter layer could be removed. In addition, when the charset is UTF8 and the content passed to write(String) is LATIN1, the data is converted from LATIN1 to UTF16 and then to UTF8:
>> 
>> LATIN1 -> UTF16 -> UTF8
>> 
>> We can improve BufferedWriter: when the Writer passed in is an instance of OutputStreamWriter, skip the buffer and call it directly. In addition, improve write(String) in StreamEncoder to avoid the unnecessary encoding conversion.
>
> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert "BufferedWriter buffer use StringBuilder"
>   
>   This reverts commit da902ca0b0bd6acc003deb8ad1ca0d6485a29a27.
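
For readers who want to see the call chain from the quoted description in concrete form, here is a minimal sketch using only public java.io APIs. The class and method names (WriterChainSketch, writeThroughChain) are made up for illustration. StreamEncoder is internal to OutputStreamWriter and is not referenced directly.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the PR.
class WriterChainSketch {
    // Sends text through BufferedWriter -> OutputStreamWriter (-> StreamEncoder internally).
    static byte[] writeThroughChain(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            // With compact strings (JDK 9+), a Latin-1-only String is stored as LATIN1 bytes.
            // BufferedWriter copies it into its char[] buffer (UTF16), and the
            // encoder then converts those chars to UTF8 bytes.
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "café": three 1-byte characters plus one 2-byte character in UTF-8.
        System.out.println(writeThroughChain("caf\u00e9").length); // prints 5
    }
}
```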

Following the suggestions from liach and xx, I explored, on top of the current PR, having BufferedWriter use a StringBuilder as its buffer together with ArrayEncoder. This gives a good performance improvement in non-UTF8 scenarios.

The code is in this branch: https://github.com/wenshao/jdk/tree/utf8_writer_202506_x4

The code changes are extensive, so they should go into a separate PR.
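
To give a rough idea of the "StringBuilder as buffer" approach, here is a minimal sketch. It is not the code in the branch above. It uses the public String.getBytes(Charset) as a stand-in for the internal ArrayEncoder path, and the class name StringBuilderWriterSketch and its threshold parameter are hypothetical. It also assumes a charset whose encoding is stateless and emits no byte order mark (for example UTF_8 or ISO_8859_1), because each drain encodes independently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the class from the PR branch.
class StringBuilderWriterSketch extends Writer {
    private final OutputStream out;
    private final Charset charset;   // assumed stateless and BOM-free
    private final int threshold;     // drain once this many chars are buffered
    private final StringBuilder buffer = new StringBuilder();

    StringBuilderWriterSketch(OutputStream out, Charset charset, int threshold) {
        this.out = out;
        this.charset = charset;
        this.threshold = threshold;
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        // Appending a String avoids an intermediate char[] copy.
        buffer.append(s, off, off + len);
        if (buffer.length() >= threshold) drain(false);
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        buffer.append(cbuf, off, len);
        if (buffer.length() >= threshold) drain(false);
    }

    // Encodes the buffered chars in one call and writes the bytes out.
    private void drain(boolean atClose) throws IOException {
        int end = buffer.length();
        // Hold back a trailing high surrogate so a pair is never split across two encodes.
        if (!atClose && end > 0 && Character.isHighSurrogate(buffer.charAt(end - 1))) {
            end--;
        }
        if (end == 0) return;
        out.write(buffer.substring(0, end).getBytes(charset));
        buffer.delete(0, end);
    }

    @Override
    public void flush() throws IOException {
        drain(false);
        out.flush();
    }

    @Override
    public void close() throws IOException {
        drain(true);
        out.flush();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new StringBuilderWriterSketch(sink, StandardCharsets.UTF_8, 8192)) {
            w.write("hello, caf\u00e9");
        }
        System.out.println(sink.size());
    }
}
```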


git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

## Baseline
# https://github.com/wenshao/jdk/tree/utf8_writer_202506_test
git checkout 2758d6ad7767832db004d28f10cc764f33fa438e
make test TEST="micro:java.io.BufferedWriterBench" MICRO="OPTIONS=-p charset=ISO_8859_1,ASCII,UTF8,UTF16,GB18030"

# Current (PR 26022 plus BufferedWriter using StringBuilder as its buffer)
# https://github.com/wenshao/jdk/tree/utf8_writer_202506_x4
git checkout 77c5996b6a7b7ea74d03b64c4c8e827a7d76f05a
make test TEST="micro:java.io.BufferedWriterBench" MICRO="OPTIONS=-p charset=ISO_8859_1,ASCII,UTF8,UTF16,GB18030"


## Benchmark Numbers on Aliyun ECS c9i (Intel x64 CPU)

Benchmark         (charType)   (charset)  Units Base_Score Current_Score Improvement(%)
writeCharArray         ascii  ISO_8859_1  us/op      3.128         3.027 +3.23%
writeCharArray         ascii       ASCII  us/op      3.126         3.351 -6.88%
writeCharArray         ascii        UTF8  us/op      3.125         3.716 -18.91%
writeCharArray         ascii       UTF16  us/op     32.469        11.404 +64.87%
writeCharArray         ascii     GB18030  us/op      9.642         7.296 +24.34%
writeCharArray  utf8_2_bytes  ISO_8859_1  us/op      3.137         3.016 +3.86%
writeCharArray  utf8_2_bytes       ASCII  us/op     96.779         8.725 +90.99%
writeCharArray  utf8_2_bytes        UTF8  us/op     17.346        12.966 +25.25%
writeCharArray  utf8_2_bytes       UTF16  us/op     32.407        11.267 +65.19%
writeCharArray  utf8_2_bytes     GB18030  us/op     82.994        12.401 +85.14%
writeCharArray  utf8_3_bytes  ISO_8859_1  us/op    100.063         7.486 +92.51%
writeCharArray  utf8_3_bytes       ASCII  us/op     96.061         9.236 +90.40%
writeCharArray  utf8_3_bytes        UTF8  us/op     28.340        13.358 +52.86%
writeCharArray  utf8_3_bytes       UTF16  us/op     32.468        11.785 +63.70%
writeCharArray  utf8_3_bytes     GB18030  us/op     40.864        37.012 +9.66%
writeCharArray         emoji  ISO_8859_1  us/op    190.547        10.149 +94.67%
writeCharArray         emoji       ASCII  us/op    187.803        12.774 +93.17%
writeCharArray         emoji        UTF8  us/op     41.493        23.473 +43.49%
writeCharArray         emoji       UTF16  us/op     48.248        16.227 +66.36%
writeCharArray         emoji     GB18030  us/op    147.360        63.437 +57.01%
writeString            ascii  ISO_8859_1  us/op      3.340         2.770 +17.09%
writeString            ascii       ASCII  us/op      3.340         3.069 +8.11%
writeString            ascii        UTF8  us/op      3.324         2.944 +11.43%
writeString            ascii       UTF16  us/op     32.503        11.214 +65.49%
writeString            ascii     GB18030  us/op      9.023         6.999 +22.43%
writeString     utf8_2_bytes  ISO_8859_1  us/op      3.338         2.827 +15.31%
writeString     utf8_2_bytes       ASCII  us/op     95.964         8.542 +91.10%
writeString     utf8_2_bytes        UTF8  us/op     17.660        10.155 +42.44%
writeString     utf8_2_bytes       UTF16  us/op     32.516        11.173 +65.63%
writeString     utf8_2_bytes     GB18030  us/op     82.369        12.231 +85.14%
writeString     utf8_3_bytes  ISO_8859_1  us/op    100.280         7.363 +92.66%
writeString     utf8_3_bytes       ASCII  us/op     95.279         9.060 +90.48%
writeString     utf8_3_bytes        UTF8  us/op     28.344        18.366 +35.19%
writeString     utf8_3_bytes       UTF16  us/op     32.672        11.284 +65.43%
writeString     utf8_3_bytes     GB18030  us/op     43.798        37.145 +15.16%
writeString            emoji  ISO_8859_1  us/op    189.574         9.904 +94.75%
writeString            emoji       ASCII  us/op    187.021        12.427 +93.35%
writeString            emoji        UTF8  us/op     41.775        25.875 +37.98%
writeString            emoji       UTF16  us/op     48.240        15.696 +67.10%
writeString            emoji     GB18030  us/op    147.097        63.587 +56.78%


## Benchmark Numbers on MacBook M1 Pro (aarch64)

Benchmark                             (charType)   (charset)  Units Base_Score Current_Score Improvement(%)
BufferedWriterBench.writeCharArray         ascii  ISO_8859_1  us/op      2.815         2.133 +24.20%
BufferedWriterBench.writeCharArray         ascii       ASCII  us/op      2.742         2.352 +14.22%
BufferedWriterBench.writeCharArray         ascii        UTF8  us/op      2.704         2.616 +3.25%
BufferedWriterBench.writeCharArray         ascii       UTF16  us/op     31.294         8.489 +72.87%
BufferedWriterBench.writeCharArray         ascii     GB18030  us/op      8.932         3.820 +57.20%
BufferedWriterBench.writeCharArray  utf8_2_bytes  ISO_8859_1  us/op      2.828         2.210 +21.85%
BufferedWriterBench.writeCharArray  utf8_2_bytes       ASCII  us/op    109.255         5.669 +94.80%
BufferedWriterBench.writeCharArray  utf8_2_bytes        UTF8  us/op     22.353        14.039 +37.15%
BufferedWriterBench.writeCharArray  utf8_2_bytes       UTF16  us/op     31.268         8.349 +73.28%
BufferedWriterBench.writeCharArray  utf8_2_bytes     GB18030  us/op     90.835         6.816 +92.50%
BufferedWriterBench.writeCharArray  utf8_3_bytes  ISO_8859_1  us/op    109.734         7.834 +92.88%
BufferedWriterBench.writeCharArray  utf8_3_bytes       ASCII  us/op    106.981         7.906 +92.60%
BufferedWriterBench.writeCharArray  utf8_3_bytes        UTF8  us/op     21.453        16.076 +25.07%
BufferedWriterBench.writeCharArray  utf8_3_bytes       UTF16  us/op     31.294         6.945 +77.75%
BufferedWriterBench.writeCharArray  utf8_3_bytes     GB18030  us/op     49.007        27.891 +43.02%
BufferedWriterBench.writeCharArray         emoji  ISO_8859_1  us/op    223.538        11.189 +94.54%
BufferedWriterBench.writeCharArray         emoji       ASCII  us/op    264.875        11.384 +95.69%
BufferedWriterBench.writeCharArray         emoji        UTF8  us/op     35.704        21.672 +39.29%
BufferedWriterBench.writeCharArray         emoji       UTF16  us/op     45.979        11.255 +75.51%
BufferedWriterBench.writeCharArray         emoji     GB18030  us/op    148.829        57.625 +61.33%
BufferedWriterBench.writeString            ascii  ISO_8859_1  us/op      2.898         2.159 +25.49%
BufferedWriterBench.writeString            ascii       ASCII  us/op      2.876         2.591 +9.91%
BufferedWriterBench.writeString            ascii        UTF8  us/op      2.894         2.466 +14.79%
BufferedWriterBench.writeString            ascii       UTF16  us/op     31.236         8.759 +71.82%
BufferedWriterBench.writeString            ascii     GB18030  us/op      9.010         3.899 +56.70%
BufferedWriterBench.writeString     utf8_2_bytes  ISO_8859_1  us/op      2.894         2.178 +24.71%
BufferedWriterBench.writeString     utf8_2_bytes       ASCII  us/op    108.426         5.611 +94.82%
BufferedWriterBench.writeString     utf8_2_bytes        UTF8  us/op     22.206        12.225 +44.93%
BufferedWriterBench.writeString     utf8_2_bytes       UTF16  us/op     31.305         8.773 +71.98%
BufferedWriterBench.writeString     utf8_2_bytes     GB18030  us/op     90.820         6.907 +92.40%
BufferedWriterBench.writeString     utf8_3_bytes  ISO_8859_1  us/op    108.983         7.931 +92.66%
BufferedWriterBench.writeString     utf8_3_bytes       ASCII  us/op    107.064         7.836 +92.66%
BufferedWriterBench.writeString     utf8_3_bytes        UTF8  us/op     21.664        13.102 +39.47%
BufferedWriterBench.writeString     utf8_3_bytes       UTF16  us/op     31.546         6.930 +78.00%
BufferedWriterBench.writeString     utf8_3_bytes     GB18030  us/op     52.688        27.698 +47.17%
BufferedWriterBench.writeString            emoji  ISO_8859_1  us/op    221.930        11.160 +94.95%
BufferedWriterBench.writeString            emoji       ASCII  us/op    236.791        11.116 +95.30%
BufferedWriterBench.writeString            emoji        UTF8  us/op     35.025        23.210 +33.73%
BufferedWriterBench.writeString            emoji       UTF16  us/op     45.988        11.334 +75.32%
BufferedWriterBench.writeString            emoji     GB18030  us/op    148.202        57.472 +61.23%
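
A note on reading the Improvement(%) column in both tables: the values are consistent with (Base_Score - Current_Score) / Base_Score, so a positive number means fewer us/op than the baseline. This formula is inferred from the rows, not stated by the benchmark itself, and the helper name below is hypothetical. A minimal sketch that reproduces two rows of the x64 table:

```java
import java.util.Locale;

// Hypothetical helper; the formula is inferred from the published rows.
class ImprovementSketch {
    // Relative reduction in time per operation, formatted like the table column.
    static String improvement(double baseScore, double currentScore) {
        double percent = (baseScore - currentScore) / baseScore * 100.0;
        return String.format(Locale.ROOT, "%+.2f%%", percent);
    }

    public static void main(String[] args) {
        // writeCharArray / ascii / ISO_8859_1 on x64
        System.out.println(improvement(3.128, 3.027)); // prints +3.23%
        // writeCharArray / ascii / UTF8 on x64 (a regression)
        System.out.println(improvement(3.125, 3.716)); // prints -18.91%
    }
}
```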

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26022#issuecomment-3027011273


More information about the nio-dev mailing list