<i18n dev> RFR: JDK-8039751: UTF-8 decoder fails to handle some edge cases correctly

Wed Apr 9 23:43:23 UTC 2014

On 09/04/2014 15:51, Xueming Shen wrote:
> Hi,
> 
> Please help review the fix for JDK-8039751.
> 
> Issue:     https://bugs.openjdk.java.net/browse/JDK-8039751
> webrev:  http://cr.openjdk.java.net/~sherman/8039751/webrev/
> 
> 
> This is the corner case (in 4-byte sequences) that we missed when fixing
> 7096080 [1].
> The UTF_8 decoder correctly returns the malformed length for some malformed
> 4-byte sequences (via Decoder.malformedN(...)), but it fails to do so
> if there are not enough bytes (< 4) in the input buffer (via
> isMalformed4_2(...)).
> 
> The proposed change fixes these corner cases.
> 
> Hey Mark, my reading of Tomcat's test case suggests that "malformed 4-byte
> sequence" is the only thing left after the jdk8 fix, right?

Thanks for such a quick response.

I agree with your reading of the Tomcat test case. There are two
slightly different edge cases here.

The first is the one I explained in detail in the bug report, where you
know from the first two bytes that, whatever the next two bytes are, the
result is going to be larger than the largest valid code point (U+10FFFF).

The second is where you know from the first two bytes that whatever the
next two bytes are, the code point should have been encoded in fewer bytes.

If I am reading your additional test cases correctly, you have both of
these covered.
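
To make the two cases concrete, here is a minimal sketch of how the
first two bytes alone can decide the outcome. It is an illustration
only, not the code in the webrev: the class name Utf8FourBytePrefix,
the Verdict enum and the classify method are hypothetical names. The
idea is to compute the smallest and largest code point still reachable
once the top 9 payload bits are known.

```java
// Illustrative only: hypothetical helper, not the JDK's UTF_8 decoder code.
final class Utf8FourBytePrefix {

    enum Verdict { POSSIBLY_VALID, TOO_LARGE, OVERLONG, NOT_FOUR_BYTE_PREFIX }

    // Classify the first two bytes of a would-be 4-byte UTF-8 sequence.
    static Verdict classify(int b1, int b2) {
        b1 &= 0xFF;
        b2 &= 0xFF;
        // Lead byte must look like 11110xxx, second byte like 10xxxxxx.
        if ((b1 & 0xF8) != 0xF0 || (b2 & 0xC0) != 0x80) {
            return Verdict.NOT_FOUR_BYTE_PREFIX;
        }
        // The two bytes fix the top 9 payload bits; 12 bits are still unknown.
        int high = ((b1 & 0x07) << 18) | ((b2 & 0x3F) << 12);
        int min = high;          // remaining 12 bits all zero
        int max = high | 0xFFF;  // remaining 12 bits all one
        if (min > 0x10FFFF) {
            return Verdict.TOO_LARGE;   // first case: beyond U+10FFFF
        }
        if (max < 0x10000) {
            return Verdict.OVERLONG;    // second case: fits in fewer bytes
        }
        return Verdict.POSSIBLY_VALID;  // depends on the last two bytes
    }

    public static void main(String[] args) {
        System.out.println(classify(0xF4, 0x90)); // first edge case
        System.out.println(classify(0xF0, 0x80)); // second edge case
        System.out.println(classify(0xF0, 0x90)); // could still be valid
    }
}
```

With this check, F4 90 falls into the first case and F0 80 into the
second, so a decoder that has seen only those two bytes could already
report the input as malformed rather than wait for more bytes.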

Many thanks,

Mark

> 
> Thanks!
> -Sherman
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-7096080