RFR: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF [v5]

Raffaello Giulietti rgiulietti at openjdk.org
Thu Feb 15 10:57:54 UTC 2024


On Wed, 14 Feb 2024 14:30:02 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Ah OK.
>> 
>> I didn't check the current code, only the proposed one.
>> Although the specification clearly states that the method should throw, if the current code does not throw on zeros, then it makes sense that the proposed one shouldn't either.
>
> The specification is somewhat ambiguous:
> https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/DataInput.html#readUTF()
> 
> There's a sweeping `Throws UTFDataFormatException - if the bytes do not represent a valid modified UTF-8 encoding of a string` but also: `If the first byte of a group matches the bit pattern 0xxxxxxx (where x means "may be 0 or 1"), then the group consists of just that byte. The byte is zero-extended to form a character.` I think the latter gives some leeway on being lenient on embedded zeros, even if it's made clear elsewhere that valid encoders need to replace zeros with the `0xC0, 0x80` sequence.

In fact, the implementations of `readUTF*()` in `DataInputStream` and `ObjectInputStream` are much more lenient than that. They also accept ASCII characters that are encoded with 2 bytes instead of 1: for example, the sequence `0xC1, 0x81` decodes to `A`. There's no check that the encoding is "minimal length".
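
To make the leniency concrete, here is a minimal sketch that feeds `DataInputStream.readUTF()` a hand-built payload. The class name `LenientReadUtfDemo` and the helper `decode` are made up for illustration and are not part of the JDK or of the PR. The helper only prepends the 2-byte big-endian length that `readUTF()` expects. As far as I can tell from the decoding rules, both calls should return normally instead of throwing `UTFDataFormatException`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LenientReadUtfDemo {

    // Hypothetical helper: frames the payload with the unsigned 16-bit
    // big-endian length prefix and hands it to DataInputStream.readUTF().
    static String decode(byte... payload) {
        byte[] framed = new byte[payload.length + 2];
        framed[0] = (byte) (payload.length >>> 8);
        framed[1] = (byte) payload.length;
        System.arraycopy(payload, 0, framed, 2, payload.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            // Includes UTFDataFormatException, which is an IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Overlong form: 110_00001 10_000001 carries the 7-bit value 0x41 ('A'),
        // which a minimal-length encoder would write as the single byte 0x41.
        String overlong = decode((byte) 0xC1, (byte) 0x81);
        System.out.println("overlong decodes to 'A': " + overlong.equals("A"));

        // Embedded zero: a conforming encoder would emit 0xC0, 0x80 instead of 0x00.
        String withZero = decode((byte) 'a', (byte) 0x00, (byte) 'b');
        System.out.println("embedded zero kept as U+0000: "
                + (withZero.length() == 3 && withZero.charAt(1) == 0));
    }
}
```

Neither input is something a conforming modified UTF-8 encoder would produce, so the sketch only illustrates what the decoder tolerates, not what writers should emit.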

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17734#discussion_r1490820923


More information about the nio-dev mailing list