RFR: 8292043: Incorrect decoding near EOF for stateful decoders like UTF-16

Sat Aug 20 08:07:30 UTC 2022

On Fri, 19 Aug 2022 16:32:02 GMT, Naoto Sato <naoto at openjdk.org> wrote:

> Fixing incorrect state handling with EOF in `StreamDecoder`. There's a `reset()` call to the decoder seeing the EOF before the last `decode()` operation to handle the state correctly. Removing the call should not affect other cases because `reset()` is issued down the execution.

test/jdk/java/io/InputStreamReader/StatefulDecoderNearEOF.java line 53:

> 51:                     StandardCharsets.UTF_16.newDecoder().onMalformedInput(CodingErrorAction.REPORT))) {
> 52:                 System.out.printf("%04x%n", r.read()); // \u00d8 (wrong, uses UTF-16BE)
> 53:                 System.out.printf("%04x%n", r.read()); // EOF

As written, this will pass if either read fails. I think the test should check that the first call to read() throws MalformedInputException.
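
A rough sketch of the tighter check, in case it helps. The class and method names are made up for illustration, and the byte array is my assumption about the test input (little-endian BOM followed by a lone high surrogate), not copied from the PR:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the PR.
class FirstReadCheck {

    // True only if the very first read() throws MalformedInputException.
    // A successful first read returns false, even if a later read would throw.
    static boolean firstReadThrowsMalformed(Reader r) {
        try {
            r.read();
            return false;
        } catch (MalformedInputException expected) {
            return true;
        } catch (IOException other) {
            throw new UncheckedIOException(other);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed input: little-endian BOM (FF FE), then the lone high
        // surrogate U+D800 encoded as 00 D8, then EOF.
        byte[] ba = {(byte) 0xFF, (byte) 0xFE, 0x00, (byte) 0xD8};
        CharsetDecoder dec = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(ba), dec)) {
            // A real jtreg test would throw here instead of printing.
            System.out.println("first read threw MalformedInputException: "
                    + firstReadThrowsMalformed(r));
        }
    }
}
```

Because the try block wraps only the first read(), a failure on the second read can no longer make the test pass.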

Would it be feasible to add a second test where there are characters between the BOM and the truncated high surrogate? It should be possible to decode those characters before the decoder fails at the end of the stream.
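
Something along these lines is what I have in mind for the second test. Again only a sketch: the names are hypothetical, and the input (BOM, then "ab", then a truncated high surrogate) is an assumed example rather than anything taken from the PR:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the PR.
class PrefixThenTruncated {

    // Reads one char at a time until MalformedInputException is thrown and
    // returns the chars decoded before it. Returns null if EOF is reached
    // without any exception.
    static String prefixBeforeMalformed(Reader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return null;
        } catch (MalformedInputException expected) {
            return sb.toString();
        } catch (IOException other) {
            throw new UncheckedIOException(other);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed input: little-endian BOM, 'a', 'b', then the lone high
        // surrogate U+D800 (00 D8) cut off by EOF.
        byte[] ba = {(byte) 0xFF, (byte) 0xFE, 0x61, 0x00, 0x62, 0x00,
                     0x00, (byte) 0xD8};
        CharsetDecoder dec = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(ba), dec)) {
            // A real test would compare the result with "ab" and fail on
            // null, which means EOF was reached with no exception at all.
            System.out.println("decoded before failure: " + prefixBeforeMalformed(r));
        }
    }
}
```

Checking the decoded prefix as well as the exception would cover both halves: the valid characters come through, and the truncated surrogate is still reported at the end of the stream.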

-------------

PR: https://git.openjdk.org/jdk/pull/9945