RFR: 8366421: ModifiedUtf.utfLen may overflow for giant string
Guanqiang Han
ghan at openjdk.org
Wed Sep 24 00:39:08 UTC 2025
On Thu, 18 Sep 2025 13:09:10 GMT, Chen Liang <liach at openjdk.org> wrote:
>> Hi @RogerRiggs @liach
>> Thanks for the suggestion.
>> Creating a string of Integer.MAX_VALUE/2 characters would require enormous memory, even using a file, since the JVM still needs to hold the string content in memory when reading it back.
>> Instead, I used a small string chunk with 1-, 2-, and 3-byte UTF-8 characters and repeatedly called ModifiedUtf.utfLen() in a loop, accumulating the total in a long. This safely simulates a total length exceeding Integer.MAX_VALUE and verifies that the change to long prevents overflow.
>> Could you please take another look when you have time? Thanks!
>
>> This safely simulates a total length exceeding Integer.MAX_VALUE and verifies that the change to long prevents overflow.
>
> Can you try to derive a **regression** test that fails on mainline and passes with your fix? You can generate a string with `chunk.repeat(iterations)` and run `ModifiedUtf.utfLen` on it.
@liach Thank you for the approval. I combined these tests into a single class mainly to reuse the large string.
@RogerRiggs Do you have any further suggestions? If not, I’ll proceed with integrating it.
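
For readers following the thread, the overflow can be illustrated with a minimal sketch. It is not the actual test or the JDK implementation: `UtfLenSketch`, its `utfLen`, and `simulatedTotal` are hypothetical stand-ins written for this example, since `ModifiedUtf` is an internal class. The sketch assumes the usual modified UTF-8 sizing (1 byte for U+0001 to U+007F, 2 bytes for U+0000 and U+0080 to U+07FF, 3 bytes otherwise) and multiplies the per-chunk length instead of allocating a giant string.

```java
public class UtfLenSketch {
    // Hypothetical stand-in for the internal ModifiedUtf.utfLen.
    // Returns a long so the total cannot wrap for very long strings.
    static long utfLen(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;            // ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;            // NUL and U+0080..U+07FF
            } else {
                len += 3;            // everything else, including surrogates
            }
        }
        return len;
    }

    // Encoded length of `chunk` repeated `iterations` times, computed
    // without materializing the repeated string (a simplification).
    static long simulatedTotal(String chunk, long iterations) {
        return utfLen(chunk) * iterations;
    }

    public static void main(String[] args) {
        String chunk = "a\u00e9\u4e2d";                // 1 + 2 + 3 = 6 bytes
        long perChunk = utfLen(chunk);
        long iterations = Integer.MAX_VALUE / perChunk + 1;

        long asLong = simulatedTotal(chunk, iterations);
        int asInt = (int) asLong;                      // what an int accumulator would hold

        System.out.println("long total: " + asLong);   // exceeds Integer.MAX_VALUE
        System.out.println("int total:  " + asInt);    // wraps to a negative value
    }
}
```

A regression test along the lines suggested above would instead build the real string with `chunk.repeat(iterations)` and call the actual method, which is what makes it fail on mainline.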
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27285#issuecomment-3326012201