RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string

Thomas Stuefe stuefe at openjdk.org
Fri Jul 26 05:39:33 UTC 2024


On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

This is a neat technique, but it won't work for very short strings (e.g. consisting of just one or two multi-byte sequences, the latter being invalid). Reason is that you need a minimal buffer length to do the check safely.

What you could do alternatively is to allocate the `msg` buffer with 5 leading bytes that you don't use, just zero-initialize. So the string start would be at msg+5. But that way, you can safely overstep the array with e.g. index - 5 without corruption.

src/hotspot/share/utilities/exceptions.cpp line 275:

> 273:   // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
> 274:   // sequence is valid.
> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {

You should test for length >= 5 since it is the farthest you could access in `UTF8::truncate_to_legal_utf8` later:


  for (int index = length - 2; index > 0; index--) {
...
    assert(buffer[index - 3] == 0xED, "malformed sequence");

src/hotspot/share/utilities/exceptions.cpp line 276:

> 274:   // sequence is valid.
> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");

Would this always be true? For a formatting error, too?
Maybe just to be sure, instead of asserting set the last byte to zero.

src/hotspot/share/utilities/utf8.cpp line 407:

> 405: // To avoid that the caller can choose to check for validity first.
> 406: // The incoming buffer is still expected to be NUL-terminated.
> 407: void UTF8::truncate_to_legal_utf8(unsigned char* buffer, int length) {

Lets make buffer length size_t and avoid awkward casting

-------------

PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2200961895
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526390
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526795
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692524657


More information about the hotspot-dev mailing list