RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string
Thomas Stuefe
stuefe at openjdk.org
Fri Jul 26 05:39:33 UTC 2024
On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:
> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>
> Testing:
> - new gtest exercises the truncation code with the different possibilities for bad truncation
> - tiers 1-3 sanity testing
>
> Thanks.
This is a neat technique, but it won't work for very short strings (e.g. consisting of just one or two multi-byte sequences, the latter being invalid). Reason is that you need a minimal buffer length to do the check safely.
What you could do alternatively is to allocate the `msg` buffer with 5 leading bytes that you don't use, just zero-initialize. So the string start would be at msg+5. But that way, you can safely overstep the array with e.g. index - 5 without corruption.
src/hotspot/share/utilities/exceptions.cpp line 275:
> 273: // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
> 274: // sequence is valid.
> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
You should test for length >= 5 since it is the farthest you could access in `UTF8::truncate_to_legal_utf8` later:
for (int index = length - 2; index > 0; index--) {
...
assert(buffer[index - 3] == 0xED, "malformed sequence");
src/hotspot/share/utilities/exceptions.cpp line 276:
> 274: // sequence is valid.
> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
> 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
Would this always be true? For a formatting error, too?
Maybe just to be sure, instead of asserting set the last byte to zero.
src/hotspot/share/utilities/utf8.cpp line 407:
> 405: // To avoid that the caller can choose to check for validity first.
> 406: // The incoming buffer is still expected to be NUL-terminated.
> 407: void UTF8::truncate_to_legal_utf8(unsigned char* buffer, int length) {
Lets make buffer length size_t and avoid awkward casting
-------------
PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2200961895
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526390
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526795
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692524657
More information about the hotspot-dev
mailing list