RFR: 8366421: ModifiedUtf.utfLen may overflow for giant string

Wed Sep 24 00:39:08 UTC 2025

On Thu, 18 Sep 2025 13:09:10 GMT, Chen Liang <liach at openjdk.org> wrote:

>> Hi @RogerRiggs @liach 
>> Thanks for the suggestion. 
>> Creating a string of Integer.MAX_VALUE/2 characters would require enormous memory, even using a file, since the JVM still needs to hold the string content in memory when reading it back.
>> Instead, I used a small string chunk containing 1-, 2-, and 3-byte UTF-8 characters and repeatedly called ModifiedUtf.utfLen() on it in a loop, accumulating the total in a long. This safely simulates a total length exceeding Integer.MAX_VALUE and verifies that the change to long prevents overflow.
>> Could you please take another look when you have time? Thanks!
>
>> This safely simulates a total length exceeding Integer.MAX_VALUE and verifies that the change to long prevents overflow.
> 
> Can you try to derive a **regression** test that fails on mainline and passes with your fix? You can generate a string with `chunk.repeat(iterations)` and run `ModifiedUtf.utfLen` on it.

@liach Thank you for the approval. I combined these tests into a single class mainly to reuse the large string.
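
For readers following the thread, here is a minimal sketch of the overflow being discussed. It is not the test in the PR: `UtfLenOverflowSketch`, `utfLen`, `totalAsLong`, and `totalAsInt` are hypothetical names, and `utfLen` is a stand-in that counts modified UTF-8 bytes the way `DataOutputStream.writeUTF` is specified to encode them, since the internal `ModifiedUtf` class is not accessible from ordinary code. To avoid allocating a giant string, it multiplies the per-chunk length by the repeat count, which is the total that `chunk.repeat(repeats)` would have.

```java
final class UtfLenOverflowSketch {
    /** Hypothetical stand-in for the internal utfLen: modified UTF-8 byte count of s. */
    static long utfLen(CharSequence s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;          // ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;          // two-byte form, also used for U+0000
            } else {
                len += 3;          // everything else, including surrogates
            }
        }
        return len;
    }

    /** Total encoded length of chunk repeated 'repeats' times, computed in long. */
    static long totalAsLong(String chunk, int repeats) {
        return utfLen(chunk) * repeats;
    }

    /** Same total computed in int arithmetic, which wraps past Integer.MAX_VALUE. */
    static int totalAsInt(String chunk, int repeats) {
        return (int) utfLen(chunk) * repeats;
    }

    public static void main(String[] args) {
        String chunk = "a\u00e9\u20ac";   // one 1-byte, one 2-byte, one 3-byte char
        int repeats = 400_000_000;        // 1.2 billion chars, still a legal String length
        System.out.println("long total: " + totalAsLong(chunk, repeats));
        System.out.println("int total:  " + totalAsInt(chunk, repeats));
    }
}
```

With these values the long total stays positive while the int total wraps to a negative number, which is the kind of mismatch a regression test on the real large string would be expected to catch.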

@RogerRiggs Do you have any further suggestions? If not, I’ll proceed with integrating it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27285#issuecomment-3326012201