RFR: 8266013: Unexpected replacement character handling on stateful CharsetEncoder [v2]

Tue May 11 18:13:55 UTC 2021

On Fri, 30 Apr 2021 16:11:30 GMT, Ichiroh Takiguchi <itakiguchi at openjdk.org> wrote:

>> When the getBytes() method converts an invalid character, the character is replaced with the charset's replacement bytes.
>> With the EBCDIC Mix charsets, the shift codes (SO/SI) may not be inserted in the right place.
>> The EBCDIC Mix charset encoders are stateful.
>> A shift code should be emitted whenever the character set is switched.
>> On x-IBM1364, "\u3000\uD800" should be converted to "\x0E\x40\x40\x0F\x6F", but the actual result is "\x0E\x40\x40\x6F\x0F":
>> SI is not in the right place.
>> 
>> The ISO2022-related charsets use escape sequences to switch character sets,
>> and they have the same kind of issue.
>
> Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8266013: Unexpected replacement character handling on stateful CharsetEncoder
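
For reference, the case in the description can be checked with a small program along these lines. This is only a sketch: the class and helper names (ShiftCodeRepro, toHex, encodeToHex) are made up for illustration, and x-IBM1364 is assumed to be available only when the runtime ships the extended charsets, so the program checks for it first. The output depends on whether the runtime contains the fix, so no expected output is given here beyond what the description states.

```java
import java.nio.charset.Charset;

class ShiftCodeRepro {
    // Format bytes as space-separated upper-case hex, e.g. "0E 40 40 0F 6F".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // String.getBytes(Charset) replaces malformed input (such as a lone
    // surrogate) with the charset's default replacement bytes.
    static String encodeToHex(String s, Charset cs) {
        return toHex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String name = "x-IBM1364";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        // U+3000 is a double-byte character; U+D800 is a lone high surrogate.
        // Per the description, SI (0x0F) should come before the replacement byte.
        System.out.println(encodeToHex("\u3000\uD800", Charset.forName(name)));
    }
}
```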

src/java.base/share/classes/java/nio/charset/Charset-X-Coder.java.template line 632:

> 630:             if (action == CodingErrorAction.REPLACE) {
> 631: #if[encoder]
> 632:                 if (maxBytesPerChar > 3.0) {

Does this check imply that the encoder is stateful? Since the fix is for incorrect SO/SI handling, should it be localized in the EBCDIC/ISO2022 encoders rather than in the generic Charset-X-Coder?
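
To make the concern concrete, a rough probe like the one below prints maxBytesPerChar() for a few encoders and whether each would pass the quoted threshold. It is a sketch only: MaxBytesProbe and exceedsThreshold are hypothetical names, the charset list is an arbitrary sample, and the comparison merely mirrors the condition on line 632 rather than the patch logic around it. Any encoder whose value is above 3.0 would take that branch whether or not it uses shift codes or escape sequences.

```java
import java.nio.charset.Charset;

class MaxBytesProbe {
    // Mirrors the threshold in the quoted template line; illustration only.
    static boolean exceedsThreshold(Charset cs) {
        return cs.newEncoder().maxBytesPerChar() > 3.0;
    }

    public static void main(String[] args) {
        // Arbitrary sample; some names may be missing from a given runtime.
        String[] names = {
            "US-ASCII", "UTF-8", "UTF-16", "UTF-32BE", "x-IBM1364", "ISO-2022-JP"
        };
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available");
                continue;
            }
            Charset cs = Charset.forName(name);
            if (!cs.canEncode()) {
                // newEncoder() is not supported for decode-only charsets.
                continue;
            }
            System.out.printf("%-12s maxBytesPerChar=%.1f exceedsThreshold=%b%n",
                    name, cs.newEncoder().maxBytesPerChar(), exceedsThreshold(cs));
        }
    }
}
```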

-------------

PR: https://git.openjdk.java.net/jdk/pull/3719