RFR: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF

Claes Redestad redestad at openjdk.org
Wed Feb 14 11:32:03 UTC 2024


On Wed, 14 Feb 2024 10:41:17 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:

>> Adding a fast-path for ASCII-only modified UTF-8 strings deserialized via Data- and ObjectInputStream
>> 
>> Testing: tier1-3
>
> src/java.base/share/classes/java/io/DataInputStream.java line 604:
> 
>> 602:                 // For ASCII ISO-8859-1 is equivalent to UTF-8, while avoiding a redundant
>> 603:                 // scan
>> 604:                 return new String(bytearr, 0, utflen, StandardCharsets.ISO_8859_1);
> 
> Not sure this is correct.
> If `bytearr` contains some `(byte)0`, that is, if `in` is malformed, this doesn't throw `UTFDataFormatException`, but it should: modified UTF-8 cannot contain zeros.

While properly encoded modified UTF-8 strings won't have embedded zeros (`\u0000` is encoded as the two bytes `0xC0 0x80`), the decoding routines in `DataInputStream` and `ObjectInputStream` allow them and do not throw an exception when an embedded zero is encountered. This PR does not change semantics here AFAICT. If you think we need to be stricter in these decoders, that could be done as a separate RFE, and I'll put this on hold.
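A minimal sketch to check the claim above, assuming JDK 17+ (for `HexFormat`). The helpers `writeUtf`, `readUtf` and `asciiFastPath` are hypothetical, written only for illustration; `asciiFastPath` is not the code in the PR, it just scans for a byte with the high bit set and otherwise decodes with ISO-8859-1. It shows that `writeUTF` never emits a raw zero, that `readUTF` still accepts one without throwing `UTFDataFormatException`, and that an ASCII-only fast path decodes such a payload to the same string as the general path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class EmbeddedZeroDemo {

    // Hypothetical helper: encode a string with DataOutputStream.writeUTF
    // (2-byte length prefix followed by modified UTF-8 payload).
    static byte[] writeUtf(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bos);
            dos.writeUTF(s);
            dos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: decode a length-prefixed stream with readUTF.
    static String readUtf(byte[] stream) {
        try {
            return new DataInputStream(new ByteArrayInputStream(stream)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for an ASCII fast path: if no byte has the high
    // bit set, ISO-8859-1 maps each byte to the char with the same value.
    // Returns null when the payload is not pure ASCII.
    static String asciiFastPath(byte[] payload) {
        for (byte b : payload) {
            if (b < 0) { // high bit set: not ASCII
                return null;
            }
        }
        return new String(payload, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+0000 is written as C0 80, never as a raw zero byte.
        System.out.println(HexFormat.of().formatHex(writeUtf("\u0000"))); // 0002c080

        // Malformed stream: length 3, payload 'a', 0x00, 'b'. No exception.
        String decoded = readUtf(new byte[] {0x00, 0x03, 'a', 0x00, 'b'});
        System.out.println(decoded.equals("a\u0000b")); // true

        // The fast path yields the same string for the same payload.
        System.out.println(decoded.equals(asciiFastPath(new byte[] {'a', 0x00, 'b'}))); // true
    }
}
```

Since 0x00 is below 0x80, it falls on the ASCII side of either path, which is why adding the fast path neither introduces nor removes the acceptance of embedded zeros.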

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17734#discussion_r1489325376


More information about the core-libs-dev mailing list