Unexpected behaviour with larger Strings

Raffaello Giulietti raffaello.giulietti at gmail.com
Mon Apr 20 18:56:58 UTC 2020


Hi,

I'm on Linux, but the following explanation probably applies to your
case as well.

An easier way to obtain the same error on OpenJDK8 + HotSpot is to execute
     byte[] b = new byte[Integer.MAX_VALUE];
which is exactly what happens behind the scenes in the UTF-8 case.
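
For anyone who wants to try this without crashing the program, here is a
minimal sketch that wraps the allocation in a try/catch. The class name
MaxArrayProbe and the helper tryAllocate are made up for illustration.
The exact error message, and whether the allocation fails at all,
depends on the JVM and the heap settings, so no particular output is
claimed.

```java
public class MaxArrayProbe {

    // Hypothetical helper: attempts to allocate a byte[] of the given
    // length and reports what happened instead of letting the error escape.
    static String tryAllocate(int length) {
        try {
            byte[] b = new byte[length];
            return "allocated " + b.length + " bytes";
        } catch (OutOfMemoryError e) {
            // The message text is JVM-specific.
            return "OutOfMemoryError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The same request the UTF-8 encoding path ends up making.
        System.out.println(tryAllocate(Integer.MAX_VALUE));
    }
}
```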

The encoder pessimistically assumes that each char may need up to 3
bytes. The expansion factor 3, however, is expressed as the float
3.0f. This, in turn, is first converted to the double 3.0, multiplied
by your length 1 << 30, and the result is cast to int. As the product,
3 * 2^30 = 3221225472, exceeds the int range, the cast saturates and
produces Integer.MAX_VALUE.
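
The arithmetic can be reproduced in isolation. The sketch below only
mirrors the computation as described above; the class name
EncoderSizeOverflow and the helper worstCaseBytes are hypothetical and
are not the JDK's own code.

```java
public class EncoderSizeOverflow {

    // Hypothetical helper: widens the float factor to double, multiplies
    // by the char count, and narrows the product to int. A double that is
    // too large for int narrows to Integer.MAX_VALUE rather than wrapping.
    static int worstCaseBytes(int charCount, float maxBytesPerChar) {
        return (int) (charCount * (double) maxBytesPerChar);
    }

    public static void main(String[] args) {
        int len = 1 << 30;

        // 3 * 2^30 does not fit in an int, so the cast saturates.
        System.out.println(worstCaseBytes(len, 3.0f)); // prints 2147483647

        // A small length is unaffected.
        System.out.println(worstCaseBytes(10, 3.0f));  // prints 30
    }
}
```

Note that the cast does not wrap around to a negative number, as int
multiplication would; that is why the requested size ends up being
exactly Integer.MAX_VALUE.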

While Integer.MAX_VALUE should be considered a legal array size, I
recall reading somewhere that implementations are allowed to be a
little more restrictive. Experimentally, the maximum length of a
byte[] on OpenJDK8 + HotSpot / Linux is Integer.MAX_VALUE - 2. I guess
it is the same on macOS.

Hope this helps.


Greetings
Raffaello


More information about the core-libs-dev mailing list