JEP 254: Compact Strings - length limits
John Rose
john.r.rose at oracle.com
Tue Sep 6 21:11:35 UTC 2016
On Sep 6, 2016, at 12:58 PM, Charles Oliver Nutter <headius at headius.com> wrote:
>
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen <xueming.shen at oracle.com>
> wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was that this is something we can trade off
>> for the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>>
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?
>
> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).
>
> - Charlie
The most basic scale requirement for strings is that they support class-file
constants, which top out at a UTF8-length of 2**16 - 1 bytes. Lengths beyond
that, up to the limit of the 'int' return value of String::length, are less
well specified.
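One way to observe that two-byte length ceiling from plain Java is DataOutputStream.writeUTF, which also writes modified UTF-8 behind a two-byte length prefix. This is only a sketch of the limit, not how class files are produced; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Illustrative sketch only; Utf8LimitSketch is a hypothetical name.
final class Utf8LimitSketch {
    // Returns true if a string of n ASCII chars fits behind a two-byte
    // length prefix, false if writeUTF rejects it as too long.
    static boolean fitsInU2Length(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a'); // ASCII: one encoded byte per char
        try (DataOutputStream out =
                 new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(new String(chars));
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form needs more than 65535 bytes
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInU2Length(65535)); // true
        System.out.println(fitsInU2Length(65536)); // false
    }
}
```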
FTR, we could have chosen char[], int[], or long[] (not byte[]) as the backing
store for string data. With long[] we could have strings above 4G-chars.
But it would have come with a perf. tax, since the T[].length field would need
to be combined with an extra bit or two (from a flag byte) to complete the length.
That's 2-3 extra instructions for loading a string length, or else a redundant
length field. So it's a trade-off.
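To make the instruction-count argument concrete, here is a minimal sketch of the two length computations. The first follows the byte[]-plus-coder idea (array length shifted by the coder). The second is a hypothetical long[] layout, assumed here to pack four 16-bit chars per long, with two flag bits recording how many slots of the last long are unused. All names and the long[] layout are assumptions for illustration, not JDK source.

```java
// Illustrative sketch only; names and layouts are assumptions.
final class LengthSketch {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char

    // byte[] backing store: load the array length, then one shift.
    static int compactLength(byte[] value, byte coder) {
        return value.length >> coder;
    }

    // Hypothetical long[] backing store, four 16-bit chars per long.
    // The array length only counts whole longs, so two extra bits
    // (0..3 unused slots in the last long) must be loaded and combined.
    static long longBackedLength(long[] value, byte unused) {
        return ((long) value.length << 2) - (unused & 3);
    }

    public static void main(String[] args) {
        System.out.println(compactLength(new byte[10], LATIN1));     // 10
        System.out.println(compactLength(new byte[10], UTF16));      // 5
        System.out.println(longBackedLength(new long[3], (byte) 1)); // 11
        // Longest UTF16 length a byte[] can describe, assuming the array
        // could reach Integer.MAX_VALUE elements:
        System.out.println(Integer.MAX_VALUE >> UTF16);              // 1073741823
    }
}
```

The last line is the Integer.MAX_VALUE / 2 ceiling raised earlier in the thread: two bytes per char halves the number of chars a byte[] can hold.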
Likewise, choosing a third format deepens branch depth in order to get to payload.
Likewise, making the second format (of two) have a length field embedded in the
payload section requires a conditional load or branch, in order to load the string
length. Again, more instructions.
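A rough sketch of the embedded-length variant, to show where the branch comes from. It assumes a made-up second format whose payload begins with a four-byte big-endian char count; the first format still takes its length from the array. None of this is a format the JDK uses, and the names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; the WIDE format here is invented.
final class EmbeddedLengthSketch {
    static final byte LATIN1 = 0; // length is the array length
    static final byte WIDE = 1;   // length is stored in the payload

    // Reading the length now depends on the format: a branch (or a
    // conditional load) sits in front of every length query.
    static int length(byte[] value, byte coder) {
        if (coder == LATIN1) {
            return value.length;
        }
        return ByteBuffer.wrap(value, 0, 4).getInt();
    }

    // Encodes a string in the invented WIDE format: count, then chars.
    static byte[] wide(String s) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 * s.length());
        buf.putInt(s.length());
        for (int i = 0; i < s.length(); i++) {
            buf.putChar(s.charAt(i));
        }
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(length(new byte[7], LATIN1)); // 7
        System.out.println(length(wide("hello"), WIDE)); // 5
    }
}
```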
The team has looked at 20 possibilities like these. The current design is fastest.
I hope it flies.
— John