RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]

Wed Jul 2 14:32:41 UTC 2025

On Tue, 1 Jul 2025 00:01:21 GMT, Shaojin Wen <swen at openjdk.org> wrote:

>> BufferedWriter -> OutputStreamWriter -> StreamEncoder
>> 
>> In this call chain, BufferedWriter has a char[] buffer and StreamEncoder has a ByteBuffer, so the data is buffered twice; the BufferedWriter layer can be removed. In addition, when the charset is UTF8 and the content passed to write(String) is LATIN1, the data is converted from LATIN1 to UTF16 and then to UTF8:
>> 
>> LATIN1 -> UTF16 -> UTF8
>> 
>> We can improve BufferedWriter: when the Writer passed in is an instance of OutputStreamWriter, skip BufferedWriter's own buffer and delegate to it directly. In addition, we can improve write(String) in StreamEncoder to avoid the unnecessary encoding conversion.
>
> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert "BufferedWriter buffer use StringBuilder"
>   
>   This reverts commit da902ca0b0bd6acc003deb8ad1ca0d6485a29a27.
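To illustrate the LATIN1 -> UTF16 -> UTF8 detour described above, here is a minimal sketch, assuming the content is already available as Latin-1 bytes: every Latin-1 code point is below U+0100, so it can be written as UTF-8 straight from the byte[] without first inflating it to a UTF-16 char[]. The class and the `latin1ToUtf8` helper are hypothetical and are not the JDK's String or StreamEncoder code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1ToUtf8 {
    // Hypothetical helper: encodes Latin-1 bytes directly to UTF-8,
    // skipping the intermediate UTF-16 char[] step.
    static byte[] latin1ToUtf8(byte[] latin1) {
        // Each Latin-1 byte becomes at most 2 UTF-8 bytes.
        byte[] out = new byte[latin1.length * 2];
        int dp = 0;
        for (byte b : latin1) {
            int c = b & 0xFF;                 // unsigned code point U+0000..U+00FF
            if (c < 0x80) {
                out[dp++] = (byte) c;         // ASCII is copied unchanged
            } else {
                out[dp++] = (byte) (0xC0 | (c >> 6));   // lead byte 110xxxxx
                out[dp++] = (byte) (0x80 | (c & 0x3F)); // continuation 10xxxxxx
            }
        }
        return Arrays.copyOf(out, dp);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \u00fcber";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] direct = latin1ToUtf8(latin1);
        byte[] viaUtf16 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(direct, viaUtf16)); // true
    }
}
```

The direct path produces the same bytes as the usual `getBytes(UTF_8)` route; the saving is the temporary UTF-16 copy.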

This latest prototype looks great! It means we can get rid of the old `BufferedImpl` by adopting `WriterImpl` as the new code path, and remove `StreamEncoder.UTF8Impl`.
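The idea of dropping the second buffer when the target already buffers (an OutputStreamWriter, whose StreamEncoder holds a ByteBuffer) can be sketched as a small Writer wrapper. This is a hypothetical class written for illustration under that assumption; it is not the JDK's BufferedWriter, `WriterImpl`, or `BufferedImpl`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: buffers chars only when the target does not already buffer.
public class BypassingWriter extends Writer {
    private final Writer out;
    private final char[] buf;   // null when buffering is bypassed
    private int count;

    BypassingWriter(Writer out, int size) {
        this.out = out;
        // An OutputStreamWriter already buffers encoded bytes,
        // so a second char[] buffer is skipped for it.
        this.buf = (out instanceof OutputStreamWriter) ? null : new char[size];
    }

    boolean isBuffered() { return buf != null; }

    private void flushBuffer() throws IOException {
        if (buf != null && count > 0) {
            out.write(buf, 0, count);
            count = 0;
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        if (buf == null) {               // bypass: hand chars straight through
            out.write(cbuf, off, len);
            return;
        }
        if (len >= buf.length) {         // large write: copying would not help
            flushBuffer();
            out.write(cbuf, off, len);
            return;
        }
        if (len > buf.length - count) {  // not enough room left: drain first
            flushBuffer();
        }
        System.arraycopy(cbuf, off, buf, count, len);
        count += len;
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        flushBuffer();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new BypassingWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8), 8192)) {
            w.write("h\u00e9llo");       // Writer.write(String) ends up in write(char[], int, int)
        }
        System.out.println(bytes.size()); // 6
    }
}
```

The `instanceof` check happens once at construction, so the per-write cost of deciding whether to buffer is a single null test.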

I think this prototype can be split this way:
1. Update ArrayEncoder to take `dp` (the destination offset), open up StringBuilder in JLA, and make BufferedWriter + StreamEncoder use ArrayEncoder. As a first-step proof of concept, we can use a benchmark that writes with an encoding such as CESU.
2. Add UTF8/ISO88591 array encoders. This step will open up a few of String's UTF8 encoding methods in JLA.
3. Add more array encoders. For example, your patch adds a new array encoder for GB18030.
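Passing `dp` means an encoder can append into an existing byte[] at a given position instead of always starting at index 0. The sketch below shows that shape with a hypothetical `PositionalArrayEncoder` interface and an ISO-8859-1 implementation; it is an assumption for illustration, not the signature of the JDK's internal `sun.nio.cs.ArrayEncoder`. As a simplification, each unmappable char becomes `'?'`; a real encoder must also treat a surrogate pair as a single unmappable character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ArrayEncoderSketch {
    // Hypothetical interface: encodes src[sp, sp + len) into dst starting
    // at dp and returns the new destination position.
    interface PositionalArrayEncoder {
        int encode(char[] src, int sp, int len, byte[] dst, int dp);
    }

    // ISO-8859-1: one byte per char; chars above U+00FF become '?'.
    static final PositionalArrayEncoder ISO_8859_1 = (src, sp, len, dst, dp) -> {
        for (int i = sp; i < sp + len; i++) {
            char c = src[i];
            dst[dp++] = (c <= '\u00FF') ? (byte) c : (byte) '?';
        }
        return dp;
    };

    public static void main(String[] args) {
        char[] a = "abc".toCharArray();
        char[] b = "\u00e9\u20ac".toCharArray();   // é maps, € does not
        byte[] dst = new byte[a.length + b.length];
        int dp = ISO_8859_1.encode(a, 0, a.length, dst, 0);
        dp = ISO_8859_1.encode(b, 0, b.length, dst, dp); // append after the first chunk
        System.out.println(dp); // 5
        byte[] expected = "abc\u00e9?".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(Arrays.copyOf(dst, dp), expected)); // true
    }
}
```

Returning the new position lets a caller such as a buffered writer chain several encode calls into one destination array without intermediate copies.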

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26022#issuecomment-3028104142