RFR: 8366421: ModifiedUtf.utfLen may overflow for giant string

Thu Sep 18 13:02:16 UTC 2025

On Wed, 17 Sep 2025 13:32:01 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> Please review this patch.
>> 
>> **Description:**
>> 
>> Currently, ModifiedUtf.utfLen returns a signed int. For very large strings, this may overflow and produce negative values, leading to incorrect behavior in code that relies on the UTF length. This patch changes the return type to long, which fully resolves the issue and allows safe handling of giant strings.
>> 
>> **Test:**
>> 
>> GHA
>
> Can you add a test of the maximum-length UTF-8 encoded string?
> That would be a string of Integer.MAX_VALUE/2 characters that were > 0xff.
> It will likely have to write it to a file and read it back, ByteArrayIn/OutStream wouldn't be big enough.

Hi @RogerRiggs @liach 
Thanks for the suggestion. 
Creating a string of Integer.MAX_VALUE/2 characters would require an enormous amount of memory, even when using a file, since the JVM still needs to hold the string content in memory when reading it back.
Instead, I used a small string chunk containing 1-, 2-, and 3-byte UTF-8 characters and called ModifiedUtf.utfLen() on it repeatedly in a loop, accumulating the total in a long. This safely simulates a total length exceeding Integer.MAX_VALUE and verifies that the change to long prevents overflow.
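
For readers following along, here is a rough sketch of the accumulation idea. It is not the test from the PR: ModifiedUtf is an internal class, so the `utfLen` method below is a hypothetical stand-in that counts modified UTF-8 bytes the way the format is commonly described (1 byte for U+0001..U+007F, 2 bytes for U+0000 and U+0080..U+07FF, 3 bytes otherwise). The class name, the chunk contents, and the iteration count are likewise assumptions chosen for illustration.

```java
public class UtfLenOverflowSketch {

    // Hypothetical stand-in for the internal ModifiedUtf.utfLen.
    // Returns the modified UTF-8 byte count of s as a long.
    static long utfLen(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;          // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;          // NUL (U+0000) and U+0080..U+07FF
            } else {
                len += 3;          // everything else, including surrogates
            }
        }
        return len;
    }

    // Calls utfLen on the same chunk repeatedly and sums the results in a long.
    static long accumulate(String chunk, long iterations) {
        long total = 0;
        for (long i = 0; i < iterations; i++) {
            total += utfLen(chunk);
        }
        return total;
    }

    public static void main(String[] args) {
        // One 1-byte, one 2-byte, and one 3-byte character: 6 bytes per group.
        String chunk = "a\u00e9\u4e2d".repeat(1000);   // 6000 bytes per chunk
        long iterations = 357_914L;                    // 357914 * 6000 > Integer.MAX_VALUE

        long total = accumulate(chunk, iterations);
        System.out.println("long total:  " + total);
        // Narrowing the same total to int shows the wrap-around the patch avoids.
        System.out.println("as int:      " + (int) total);
        System.out.println("exceeds int: " + (total > Integer.MAX_VALUE));
    }
}
```

With these assumed values the long total ends up above Integer.MAX_VALUE, while the narrowed int value is negative, which is the failure mode described in the patch description.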
Could you please take another look when you have time? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27285#issuecomment-3307322322