RFR: 8366421: ModifiedUtf.utfLen may overflow for giant string

Guanqiang Han ghan at openjdk.org
Sun Sep 21 11:15:14 UTC 2025


On Wed, 17 Sep 2025 13:32:01 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> Please review this patch.
>> 
>> **Description:**
>> 
>> Currently, ModifiedUtf.utfLen returns a signed int. For very large strings, this may overflow and produce negative values, leading to incorrect behavior in code that relies on the UTF length. This patch changes the return type to long, which fully resolves the issue and allows safe handling of giant strings.
>> 
>> **Test:**
>> 
>> GHA
>
> Can you add a test of the maximum length UTF-8 encoded string?
> That would be a string of Integer.MAX_VALUE/2 characters that are > 0xff.
> The test will likely have to write it to a file and read it back; ByteArrayIn/OutStream wouldn't be big enough.

@RogerRiggs @liach @dholmes-ora 

Thank you for your suggestion!
I’ve optimized the regression test as requested. Could you please take another look when you have time? Thanks!
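
For readers following the thread, here is a minimal sketch of the overflow described in the quoted PR description. It is not the JDK's `ModifiedUtf` code: the class `UtfLenOverflowSketch` and its helper methods are hypothetical names, and the length is computed arithmetically for a string made of one repeated character so that no giant string has to be allocated. The per-character byte counts follow the modified UTF-8 rules documented for `java.io.DataInput`.

```java
// Hypothetical illustration only; not the actual ModifiedUtf implementation.
class UtfLenOverflowSketch {

    // Bytes needed to encode one char in modified UTF-8:
    // U+0001..U+007F -> 1 byte, U+0000 and U+0080..U+07FF -> 2 bytes,
    // U+0800..U+FFFF -> 3 bytes.
    static int bytesPerChar(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // also covers U+0000
        }
        return 3;
    }

    // Encoded length of `count` copies of `c`, accumulated in an int.
    // The multiplication silently wraps once the result exceeds Integer.MAX_VALUE.
    static int utfLenAsInt(char c, int count) {
        return count * bytesPerChar(c);
    }

    // Same computation widened to long, so the result cannot wrap
    // for any count that fits in an int.
    static long utfLenAsLong(char c, int count) {
        return (long) count * bytesPerChar(c);
    }

    public static void main(String[] args) {
        int count = Integer.MAX_VALUE / 2;

        // Two-byte characters: the total still fits in an int.
        System.out.println("2-byte chars, int:  " + utfLenAsInt('\u0100', count));
        System.out.println("2-byte chars, long: " + utfLenAsLong('\u0100', count));

        // Three-byte characters: the int result wraps to a negative value.
        System.out.println("3-byte chars, int:  " + utfLenAsInt('\u0800', count));
        System.out.println("3-byte chars, long: " + utfLenAsLong('\u0800', count));
    }
}
```

With `Integer.MAX_VALUE / 2` two-byte characters the total is just under the `int` limit, while the same number of three-byte characters pushes the `int` result negative and the `long` result stays correct.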

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27285#issuecomment-3315926338

