OutputStreamWriter (not) flushing stateful CharsetEncoder

Wed Nov 10 14:12:26 UTC 2021

(I thought this was discussed a while back on an OpenJDK mailing list, but I can't find it. Apologies if this is a duplicate; I might have seen it on Apache Commons IO instead, which fixed a similar issue on the reader side.)

The problem: I have code using an OutputStreamWriter with a customer-defined charset name. This writer is flushed, and the code expects all pending bytes to be written. However, when a stateful charset like cp930 is used, this is not the case. The final unshift byte, for example, is only written when the writer is closed. This is probably because flush() does not signal end of input to the encoder (the final CharsetEncoder.encode() call with endOfInput set to true, followed by CharsetEncoder.flush()).

The class documentation does not clearly state what the correct behavior is. However, the description of flush() is worded in a way that one could expect it to produce a complete stream.

So, is this a bug in the implementation? If not, should the behavior be added to the documentation?

Here is a small JShell reproducer. The extra unshift byte (decimal 15) only shows up after close():

var b = new byte[] { 0x31, (byte)0xef, (byte)0xbc, (byte)0x91 };
var s = new String(b, "UTF-8"); // „1１“ (ASCII 1 followed by fullwidth 1, U+FF11)
var bos = new ByteArrayOutputStream();
var w = new OutputStreamWriter(bos, "cp930"); // stateful ebcdic with Shift chars
w.write(s);
w.flush();
bos.toByteArray()
$8 ==> byte[4] { -15, 14, 66, -15 }
w.close();
bos.toByteArray()
$10 ==> byte[5] { -15, 14, 66, -15, 15 }
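
For comparison, here is a minimal sketch of what I would consider a "complete" encode when driving the CharsetEncoder directly: a final encode() call with endOfInput set to true, followed by CharsetEncoder.flush(). The class name StatefulEncoderFlushDemo and the helper encodeCompletely are made up for illustration, and the buffer sizing is a rough assumption. For cp930 I would expect this to produce the same bytes as the writer after close(), including the trailing shift byte. This is not a claim about how OutputStreamWriter is implemented internally.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.Arrays;

public class StatefulEncoderFlushDemo {

    // Hypothetical helper: encodes s and finishes the encoder, so that any
    // pending state (such as a final shift byte) ends up in the output.
    static byte[] encodeCompletely(String s, String charsetName) throws Exception {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();

        // Rough sizing assumption: worst case per char plus some slack
        // for bytes emitted by the final flush.
        int capacity = (int) Math.ceil(s.length() * enc.maxBytesPerChar()) + 8;
        ByteBuffer out = ByteBuffer.allocate(capacity);

        // endOfInput = true tells the encoder that no more chars will follow.
        CoderResult r = enc.encode(CharBuffer.wrap(s), out, true);
        if (!r.isUnderflow()) {
            r.throwException();
        }

        // flush() gives a stateful encoder the chance to write its closing bytes.
        r = enc.flush(out);
        if (!r.isUnderflow()) {
            r.throwException();
        }

        out.flip();
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // cp930 lives in the extended charsets; fall back if it is not installed.
        String cs = Charset.isSupported("cp930") ? "cp930" : "UTF-8";
        String s = "1\uFF11"; // ASCII 1 followed by fullwidth 1
        System.out.println(cs + ": " + Arrays.toString(encodeCompletely(s, cs)));
    }
}
```

Note that once encode() has been called with endOfInput set to true, the encoder has to be reset() before it can be used again, which may be one reason a writer cannot simply do this on every flush().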

--
http://bernd.eckenfels.net