RFR: 8366421: ModifiedUtf.utfLen may overflow for giant string [v2]

Guanqiang Han ghan at openjdk.org
Fri Sep 19 05:29:15 UTC 2025


On Fri, 19 Sep 2025 01:39:52 GMT, Chen Liang <liach at openjdk.org> wrote:

>> String.repeat() cannot generate a string whose total length exceeds Integer.MAX_VALUE due to internal limits. That’s why I used a small chunk and accumulated the UTF-8 length in a loop. It seems that the String type cannot hold a string whose length exceeds Integer.MAX_VALUE.
>> https://github.com/openjdk/jdk/blob/e3a4c28409ac62feee9efe069e3a3482e7e2cdd2/src/java.base/share/classes/java/lang/String.java#L4875
>
> jshell --add-exports java.base/jdk.internal.util=ALL-UNNAMED
> |  Welcome to JShell -- Version 24
> |  For an introduction type: /help intro
> 
> jshell> import jdk.internal.util.ModifiedUtf;
> 
> jshell> var s = "\u0100\u0100\u2600".repeat(Integer.MAX_VALUE / 6 - 1);
> s ==> "???????????????????????????????????????????????? ... ?????????????????????????"
> 
> jshell> ModifiedUtf.utfLen(s)
> |  Error:
> |  method utfLen in class jdk.internal.util.ModifiedUtf cannot be applied to given types;
> |    required: java.lang.String,int
> |    found:    java.lang.String
> |    reason: actual and formal argument lists differ in length
> |  ModifiedUtf.utfLen(s)
> |  ^----------------^
> 
> jshell> ModifiedUtf.utfLen(s, 0)
> $3 ==> -1789569716
> 
> 
> You can construct such a string if the number of bytes in the Modified UTF-8 form is more than the number of bytes in the UTF-16 form, such as if you use all 3-byte characters.

Got it!
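
For reference, the negative value in the JShell session above can be reproduced without allocating the giant string. The following is only a minimal sketch, not the JDK implementation: `ModifiedUtfLenSketch` and `utfLenLong` are hypothetical names introduced here, and the helper simply applies the Modified UTF-8 sizing rules (1 byte for U+0001..U+007F, 2 bytes for U+0000 and U+0080..U+07FF, 3 bytes otherwise) while accumulating in a long.

```java
class ModifiedUtfLenSketch {
    // Hypothetical helper (not jdk.internal.util.ModifiedUtf.utfLen):
    // counts Modified UTF-8 bytes per UTF-16 code unit, accumulating in a long.
    static long utfLenLong(CharSequence s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;           // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;           // U+0000 and U+0080..U+07FF
            } else {
                len += 3;           // everything else, including surrogates
            }
        }
        return len;
    }

    public static void main(String[] args) {
        String unit = "\u0100\u0100\u2600";        // 2 + 2 + 3 bytes
        long repeats = Integer.MAX_VALUE / 6 - 1;  // same count as in the session above
        long total = utfLenLong(unit) * repeats;   // projected length, no giant string needed

        System.out.println(utfLenLong(unit));      // 7
        System.out.println(total);                 // 2505397580
        System.out.println((int) total);           // -1789569716, the value JShell printed
    }
}
```

The repeated string has 3 * 357913940 = 1073741820 chars, which fits in a String, but its Modified UTF-8 length is 7 * 357913940 = 2505397580 bytes, which does not fit in an int and wraps to the negative value shown.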

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27285#discussion_r2361812961

