RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string

David Holmes dholmes at openjdk.org
Fri Jul 26 21:46:42 UTC 2024


On Fri, 26 Jul 2024 08:16:14 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024-byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. When that happens, we should ensure we truncate to a valid UTF-8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> src/hotspot/share/utilities/utf8.cpp line 440:
> 
>> 438:         // Could be first or fourth byte. If fourth
>> 439:         // then 2 bytes before will have second byte pattern (0b1010xxxx)
>> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
> 
> Suggestion:
> 
>         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) {

Sorry, I don't understand the rationale for the suggestion.
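
For reference, the two masks accept different sets of bytes. The following is a minimal standalone sketch, not code from the PR; the function names are hypothetical. It assumes the intent of the check is to recognise the second-byte pattern 0b1010xxxx mentioned in the code comment.

```cpp
// Hypothetical helpers comparing the two masks from the review thread.

// Original check: true whenever bits 7 and 5 are both set, whatever
// bits 6 and 4 hold. That covers 0xA0-0xBF and 0xE0-0xFF.
constexpr bool matches_original_mask(unsigned char b) {
  return (b & 0xA0) == 0xA0;
}

// Suggested check: true only when the high nibble is exactly 0b1010,
// i.e. the byte is in the range 0xA0-0xAF.
constexpr bool matches_suggested_mask(unsigned char b) {
  return (b & 0xF0) == 0xA0;
}

// 0xA5 has the 0b1010xxxx pattern, so both checks accept it.
static_assert(matches_original_mask(0xA5), "original accepts 0xA5");
static_assert(matches_suggested_mask(0xA5), "suggested accepts 0xA5");

// 0xB8 is an ordinary continuation byte (for example the middle byte of
// E4 B8 AD, U+4E2D). Only the original check accepts it.
static_assert(matches_original_mask(0xB8), "original accepts 0xB8");
static_assert(!matches_suggested_mask(0xB8), "suggested rejects 0xB8");

// 0xE4 is a 3-byte lead byte. Only the original check accepts it.
static_assert(matches_original_mask(0xE4), "original accepts 0xE4");
static_assert(!matches_suggested_mask(0xE4), "suggested rejects 0xE4");
```

Under that assumption, the original mask would also match bytes such as 0xB8 that do not have the 0b1010xxxx pattern, which may be what the suggestion is aiming to rule out.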

> src/hotspot/share/utilities/utf8.cpp line 442:
> 
>> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
>> 441:           // it was fourth byte so truncate 3 bytes earlier
>> 442:           assert(buffer[index - 3] == 0xED, "malformed sequence");
> 
> This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800)

Will need to re-examine this part.
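
One way the check could look with the assert turned into a condition is sketched below. This is only an illustration under stated assumptions, not the eventual fix: the function and array names are hypothetical, the buffer is taken as `unsigned char`, and `index` is assumed to point at a 0xED byte where truncation is being considered.

```cpp
// Hypothetical sketch: decide whether the 0xED byte at `index` is the fourth
// byte of a 6-byte surrogate-pair encoding (ED Ax xx ED Bx xx), or the first
// byte of an independent 3-byte sequence.
constexpr bool is_fourth_byte_of_pair(const unsigned char* buffer, int index) {
  return (index - 3) >= 0
      && buffer[index] == 0xED
      && (buffer[index - 2] & 0xF0) == 0xA0   // second byte is 0b1010xxxx
      && buffer[index - 3] == 0xED;           // tested, not asserted
}

// Returns the index at which the string would be cut: three bytes earlier if
// the 0xED is the second half of a pair, otherwise at the 0xED itself.
constexpr int truncation_point(const unsigned char* buffer, int index) {
  return is_fourth_byte_of_pair(buffer, index) ? index - 3 : index;
}

// EC A0 80 is the valid 3-byte encoding of U+C800; the following ED starts a
// new sequence, so it is a first byte and truncation happens at index 3.
constexpr unsigned char kThreeByteThenEd[] = {0xEC, 0xA0, 0x80, 0xED};
static_assert(truncation_point(kThreeByteThenEd, 3) == 3, "first byte case");

// ED A0 80 is an encoded high surrogate; the following ED is the fourth byte
// of the pair, so truncation moves back to index 0.
constexpr unsigned char kHighSurrogateThenEd[] = {0xED, 0xA0, 0x80, 0xED};
static_assert(truncation_point(kHighSurrogateThenEd, 3) == 0, "fourth byte case");
```

With the ec-a0-80 input from the review comment, `buffer[index - 3]` is 0xEC, so an assert on 0xED would fire for a well-formed string, whereas the condition simply falls through to the first-byte case.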

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693629740
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693630625

