RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

Roger Riggs rriggs at openjdk.org
Mon Apr 8 13:49:11 UTC 2024


On Mon, 8 Apr 2024 13:39:34 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143:
>> 
>>> 141:         // Strings of size min+1...min+2, throw OOME
>>> 142:         // The resulting byte array would exceed implementation limits
>>> 143:         for (int count = min + 1; count < max; count++) {
>> 
>> The case `min + 1` cannot lead to a `NegativeArraySizeException` in the current code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should succeed by returning the encoded `byte[]`, although it throws `OOME` for exceeding VM limits. That is, this case does not trigger the invocation of `computeSizeUTF8_UTF16()` in the proposed fix.
>> 
>> Only `min + 2` throws `NegativeArraySizeException` in the current code, and thus triggers the invocation of `computeSizeUTF8_UTF16()` in the proposed fix.
>
> Indeed, different OOMEs are thrown in the two cases, triggered by different limits: `min + 2` is due to integer overflow, while `min + 1` is due to a VM limit on the size of `byte[Integer.MAX_VALUE]`. Different VM implementations may have different limits on the maximum size of a byte array.
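
For readers following the arithmetic, here is a minimal sketch of how a worst-case estimate of three bytes per char wraps negative in `int` arithmetic. The class and method names are hypothetical and this is not the JDK's code; it also makes no assumption about the value of `min` in the test.

```java
// Illustrative only: hypothetical names, not taken from java.lang.String.
final class WorstCaseSizeSketch {

    // Worst-case UTF-8 size (3 bytes per char) in int arithmetic.
    // For large inputs the product wraps around and becomes negative.
    static int worstCaseInt(int charCount) {
        return 3 * charCount;
    }

    // The same estimate in long arithmetic, which cannot overflow here.
    static long worstCaseLong(int charCount) {
        return 3L * charCount;
    }

    public static void main(String[] args) {
        int largestSafe = Integer.MAX_VALUE / 3;             // 715827882
        System.out.println(worstCaseInt(largestSafe));       // 2147483646
        System.out.println(worstCaseInt(largestSafe + 1));   // -2147483647
        System.out.println(worstCaseLong(largestSafe + 1));  // 2147483649
    }
}
```

Passing the wrapped (negative) value to `new byte[...]` is what produces a `NegativeArraySizeException`; a non-negative value that is still too large for the VM would instead fail with an `OutOfMemoryError`.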

There might be some merit in lowering the threshold at which an exact size computation is triggered.
The oversized allocation "wastes" quite a bit of memory, causes extra GC work, and usually triggers a second copy into an array of the final size.
However, some guess or heuristic would be needed to choose that threshold, weighing the extra CPU work of computing the exact size against some metric for the "cost" of the wasted memory (and the saving on the copy).
Most guesses would be somewhat arbitrary: bigger than 1MB, 1GB, etc.?
Choosing that number would be out of scope for the issue raised by this bug.
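
To make the trade-off concrete, a rough sketch of what such a heuristic could look like follows. Everything here is an assumption for illustration: the class, the method names, and the `thresholdChars` parameter are hypothetical, and the counting loop is a plain re-implementation, not the `computeSizeUTF8_UTF16()` method from the PR.

```java
// Illustrative only: hypothetical names, not the JDK implementation.
final class ExactSizeThresholdSketch {

    // Counts the bytes String.getBytes(UTF_8) would produce, without allocating.
    // Unpaired surrogates count as 1 byte because they are replaced by '?'.
    static long exactUtf8Length(String s) {
        long size = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                size += 1;
            } else if (c < 0x800) {
                size += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                size += 4;   // valid surrogate pair, one supplementary code point
                i++;         // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                size += 1;   // unpaired surrogate, encoded as '?'
            } else {
                size += 3;
            }
        }
        return size;
    }

    // Decide whether to pay for an exact count up front.
    // thresholdChars is the arbitrary tuning knob discussed above.
    static boolean useExactSize(int charCount, int thresholdChars) {
        return charCount >= thresholdChars
                || 3L * charCount > Integer.MAX_VALUE; // worst case does not fit in an int
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        System.out.println(exactUtf8Length(s));            // 10
        System.out.println(useExactSize(s.length(), 1 << 20)); // false
    }
}
```

Only the second condition in `useExactSize` is required for correctness; the first is the optional, tunable part whose value would have to be justified by measurement.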

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555875973

