RFR: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF
Claes Redestad
redestad at openjdk.org
Wed Feb 14 14:32:55 UTC 2024
On Wed, 14 Feb 2024 11:35:10 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:
>> While properly encoded modified UTF-8 strings won't have embedded zeros (`\u0000` will be encoded as `0xC0, 0x80`), the decoding routines in `DataInputStream` and `ObjectInputStream` allow them and do not throw an exception if an embedded zero is encountered. This PR does not change semantics here AFAICT. If you think we need to be stricter in these decoders, that could be done as a separate RFE and I'll put this on hold.
>
> Ah OK.
>
> I didn't check the current code, only the proposed one.
> Although the specification clearly states that the method should throw, if the current code does not throw on zeros, then it makes sense that the proposed one shouldn't either.
The specification is somewhat ambiguous:
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/DataInput.html#readUTF()
There's a sweeping `Throws UTFDataFormatException - if the bytes do not represent a valid modified UTF-8 encoding of a string`, but also: `If the first byte of a group matches the bit pattern 0xxxxxxx (where x means "may be 0 or 1"), then the group consists of just that byte. The byte is zero-extended to form a character.` I think the latter gives decoders some leeway to be lenient about embedded zeros, even though it's made clear elsewhere that valid encoders must replace zeros with the `0xC0, 0x80` sequence.
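
To make the asymmetry concrete, here is a minimal sketch (the class and method names are made up for illustration and are not part of the PR). It assumes the current lenient decoder behavior described above: `DataOutputStream.writeUTF` emits `0xC0, 0x80` for `\u0000`, while `DataInputStream.readUTF` also accepts a hand-built payload containing a raw zero byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class, not taken from the JDK or the PR.
public class EmbeddedZeroDemo {

    // Encodes with writeUTF: a 2-byte length prefix followed by modified UTF-8.
    public static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Decodes a length-prefixed modified UTF-8 payload with readUTF.
    public static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String expected = "a\u0000b";

        // A conforming encoder writes the NUL char as the two bytes 0xC0, 0x80.
        byte[] wellFormed = encode(expected);
        System.out.println("writeUTF bytes: " + Arrays.toString(wellFormed));

        // Hand-built payload: length 3, then 'a', a raw zero byte, 'b'.
        // No conforming encoder produces this, yet the decoder accepts it.
        byte[] rawZero = {0, 3, 'a', 0, 'b'};
        System.out.println("raw zero accepted: " + decode(rawZero).equals(expected));
        System.out.println("round trip ok:     " + decode(wellFormed).equals(expected));
    }
}
```

Both payloads decode to the same three-character string, which is the leniency the `0xxxxxxx` wording arguably permits.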
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17734#discussion_r1489564324
More information about the nio-dev mailing list