question regarding the java.lang.String design
Xiaobin Lu
Xiaobin.Lu at Sun.COM
Fri Jan 30 22:20:06 PST 2009
Hi David,
I was ignoring the fact that substring could use the offset & count for
sharing purposes. I am wondering whether we should have a flag like
"isCharArrayShared" which would be set to true only for strings
returned from a substring call. That way, many other methods in
String could skip loading the offset & count fields, which are mostly
set to 0 and val.length anyway (val is the character array).
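
A rough sketch of what I have in mind is below. This is only an illustration, not the actual java.lang.String code: the class name SketchString, the method contentEquals and the private constructor are made up for the example, and only the field names val, offset, count and isCharArrayShared come from the discussion.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not the JDK's java.lang.String.
final class SketchString {
    private final char[] val;
    private final int offset;
    private final int count;
    // Proposed flag: true only for instances produced by substring().
    private final boolean isCharArrayShared;

    SketchString(String s) {
        this(s.toCharArray(), 0, s.length(), false);
    }

    private SketchString(char[] val, int offset, int count, boolean shared) {
        this.val = val;
        this.offset = offset;
        this.count = count;
        this.isCharArrayShared = shared;
    }

    // Shares the backing array instead of copying it, which is safe
    // because the contents are never modified after construction.
    SketchString substring(int begin, int end) {
        if (begin < 0 || end > length() || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SketchString(val, offset + begin, end - begin, true);
    }

    int length() {
        // Unshared strings never need to read the count field.
        return isCharArrayShared ? count : val.length;
    }

    char charAt(int i) {
        if (i < 0 || i >= length()) {
            throw new StringIndexOutOfBoundsException();
        }
        return isCharArrayShared ? val[offset + i] : val[i];
    }

    boolean contentEquals(SketchString other) {
        if (!isCharArrayShared && !other.isCharArrayShared) {
            // Fast path: compare whole arrays, offset and count are not read.
            return Arrays.equals(val, other.val);
        }
        // Slow path: at least one side is a view into a larger array.
        if (length() != other.length()) {
            return false;
        }
        for (int i = 0; i < count; i++) {
            if (val[offset + i] != other.val[other.offset + i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public String toString() {
        return new String(val, offset, count);
    }
}
```

The cost is one extra branch on the flag in each method, so whether this is a win over simply loading offset & count would have to be measured.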
Regards,
-Xiaobin
David Holmes - Sun Microsystems wrote:
> Hi Xiaobin,
>
> As you've probably gleaned by now the count and offset fields are to
> allow sharing of the underlying char[] - which is a safe thing to do
> exactly because a string is immutable. I've often thought this
> particular optimization was under-utilized.
>
> As others have said optimization of strings has been a recurring theme
> for many years now - there was even a paper on it at last year's ACM
> OOPSLA conference. IBM Research's Tokyo labs do a lot in this area -
> see for example "RT0750 A Quantitative Analysis of Space Waste from
> Java Strings and its Elimination at GC Time".
>
> I've occasionally thought that with all the duplicate strings that
> readily occur in Java it might be an option to have a few large tables
> of "text" containing all the characters, and then to define a String
> as one or more pairs of indices into these tables. But that's as far
> as I've thought about it :)
>
> Cheers,
> David Holmes
>
>
> Xiaobin Lu said the following on 01/31/09 04:42:
>> Resending the email to hotspot-dev at openjdk.java.net.
>> -Xiaobin
>>
>> Xiaobin Lu wrote:
>>> Hi folks,
>>>
>>> While looking at the java.lang.String implementation, I noticed
>>> that it has "offset" and "count" fields. For the
>>> offset field, I only found two places which set that field, but I
>>> believe they can be got rid of too. The two places are
>>> String(StringBuffer buffer) & String(StringBuilder builder).
>>>
>>> My question is: if String is immutable, why do we need to carry
>>> these two fields? String could be more compact without these two
>>> fields. The equals method could be implemented more efficiently by
>>> just calling java.util.Arrays.equals(v1, v2), which is intrinsified
>>> on x86 at least.
>>>
>>> Another crazy thought is that we can compact the character array to
>>> a byte array if we don't have any characters other than ASCII (we
>>> might use a boolean flag to indicate that).
>>>
>>> I'd appreciate your insight on this.
>>>
>>> -Xiaobin
>>>
>>>
>>>
>>
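
For reference, the byte-array idea from the original message quoted above could look roughly like the following. This is a minimal sketch under stated assumptions, not a proposal for the real java.lang.String layout: the class name CompactText, the asciiOnly flag and the storageBytes and contentEquals helpers are hypothetical names introduced only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; this is not the JDK's java.lang.String.
final class CompactText {
    private final byte[] bytes;      // used when asciiOnly is true
    private final char[] chars;      // used otherwise
    private final boolean asciiOnly; // the boolean flag from the message

    CompactText(String s) {
        boolean ascii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                ascii = false;
                break;
            }
        }
        this.asciiOnly = ascii;
        this.bytes = ascii ? s.getBytes(StandardCharsets.US_ASCII) : null;
        this.chars = ascii ? null : s.toCharArray();
    }

    int length() {
        return asciiOnly ? bytes.length : chars.length;
    }

    char charAt(int i) {
        // ASCII bytes are in 0..127, so widening to char is lossless.
        return asciiOnly ? (char) bytes[i] : chars[i];
    }

    // Payload size of the backing array, ignoring object headers.
    int storageBytes() {
        return asciiOnly ? bytes.length : 2 * chars.length;
    }

    boolean contentEquals(CompactText other) {
        // The representation is canonical: ASCII-only text is always stored
        // as bytes, so differing flags imply differing contents.
        if (asciiOnly != other.asciiOnly) {
            return false;
        }
        return asciiOnly ? Arrays.equals(bytes, other.bytes)
                         : Arrays.equals(chars, other.chars);
    }
}
```

With no offset and count fields, equality reduces to a single Arrays.equals call on whichever array is in use, at the price of a branch on the flag in every accessor.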