question regarding the java.lang.String design

Xiaobin Lu Xiaobin.Lu at Sun.COM
Fri Jan 30 22:20:06 PST 2009


Hi David,

I had overlooked the fact that substring can use offset & count to share
the underlying character array. I am wondering whether we should add a
flag like "isCharArrayShared", which would be set to true only for
strings returned from a substring call. That way, many other methods in
String could skip loading the offset & count fields, which are mostly
set to 0 and val.length anyway (val is the character array).
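To make the flag idea concrete, here is a minimal sketch. It uses a hypothetical SketchString class, not the real java.lang.String, and the field names value, offset, count and isCharArrayShared are illustrative assumptions. Only substring() produces shared instances; length(), charAt() and equals() take a fast path that ignores offset and count when the array is not shared, and in that case equals() can delegate to java.util.Arrays.equals.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT java.lang.String. Field names are assumptions.
final class SketchString {
    private final char[] value;
    private final int offset;               // only meaningful when shared
    private final int count;                // only meaningful when shared
    private final boolean isCharArrayShared; // true only for substring() results

    SketchString(String s) {
        this(s.toCharArray(), 0, s.length(), false);
    }

    private SketchString(char[] value, int offset, int count, boolean shared) {
        this.value = value;
        this.offset = offset;
        this.count = count;
        this.isCharArrayShared = shared;
    }

    int length() {
        // Fast path: an unshared array is used in full, so skip count.
        return isCharArrayShared ? count : value.length;
    }

    char charAt(int index) {
        if (!isCharArrayShared) {
            // Fast path: no offset; the array access itself is bounds-checked
            // (throws ArrayIndexOutOfBoundsException).
            return value[index];
        }
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[offset + index];
    }

    SketchString substring(int begin, int end) {
        if (begin < 0 || end > length() || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        int base = isCharArrayShared ? offset : 0;
        // Share the parent's array instead of copying; safe because immutable.
        return new SketchString(value, base + begin, end - begin, true);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchString)) return false;
        SketchString other = (SketchString) o;
        if (!isCharArrayShared && !other.isCharArrayShared) {
            // Neither side shares its array: compare the whole arrays.
            return Arrays.equals(value, other.value);
        }
        int n = length();
        if (n != other.length()) return false;
        for (int i = 0; i < n; i++) {
            if (charAt(i) != other.charAt(i)) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Same polynomial as String.hashCode, so equal contents hash equally.
        int h = 0;
        for (int i = 0, n = length(); i < n; i++) {
            h = 31 * h + charAt(i);
        }
        return h;
    }

    @Override
    public String toString() {
        int base = isCharArrayShared ? offset : 0;
        return new String(value, base, length());
    }
}
```

The trade-off is one branch on the flag per method, in exchange for not reading offset and count on the common unshared path; whether that pays off would need measuring.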

Regards,
-Xiaobin

David Holmes - Sun Microsystems wrote:
> Hi Xiaobin,
>
> As you've probably gleaned by now, the count and offset fields exist to 
> allow sharing of the underlying char[], which is safe to do 
> precisely because a string is immutable. I've often thought this 
> particular optimization was under-utilized.
>
> As others have said, optimization of strings has been a recurring theme 
> for many years now; there was even a paper on it at last year's ACM 
> OOPSLA conference. IBM Research's Tokyo labs do a lot in this area; 
> see, for example, "RT0750 A Quantitative Analysis of Space Waste from 
> Java Strings and its Elimination at GC Time".
>
> I've occasionally thought that, with all the duplicate strings that 
> readily occur in Java, it might be an option to have a few large tables 
> of "text" containing all the characters, and then to define a String 
> as one or more pairs of indices into these tables. But that's as far 
> as I've thought about it :)
>
> Cheers,
> David Holmes
>
>
> Xiaobin Lu said the following on 01/31/09 04:42:
>> Resending this email to hotspot-dev at openjdk.java.net.
>> -Xiaobin
>>
>> Xiaobin Lu wrote:
>>> Hi folks,
>>>
>>> While looking at the java.lang.String implementation, I noticed 
>>> that it has "offset" and "count" fields. For the offset field, I 
>>> found only two places that set it, but I believe they can be 
>>> eliminated too. The two places are 
>>> String(StringBuffer buffer) & String(StringBuilder builder).
>>>
>>> My question is: if String is immutable, why do we need to carry 
>>> these two fields? String could be more compact without them. The 
>>> equals method could then be implemented more efficiently by just 
>>> calling java.util.Arrays.equals(v1, v2), which is intrinsified on 
>>> x86 at least.
>>>
>>> Another crazy thought: we could compact the character array to 
>>> a byte array if a string contains only ASCII characters, using a 
>>> boolean flag to indicate which representation is in use.
>>>
>>> I'd appreciate your insight on this.
>>>
>>> -Xiaobin
>>
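The byte-array compaction idea from the original question could look roughly like the sketch below. CompactString, payloadBytes() and the field names are hypothetical; the sketch only shows a representation switch guarded by a boolean flag, not how java.lang.String is or would be implemented. Strings whose characters are all ASCII (at most 0x7F) are stored one byte per character; anything else falls back to a char[].

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the "compact to byte[]" idea, NOT java.lang.String.
final class CompactString {
    private final byte[] ascii;     // non-null only when isAscii is true
    private final char[] utf16;     // non-null only when isAscii is false
    private final boolean isAscii;  // the boolean flag from the proposal

    CompactString(String s) {
        boolean allAscii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                allAscii = false;
                break;
            }
        }
        isAscii = allAscii;
        if (allAscii) {
            // Safe: every char was checked to be in the ASCII range.
            ascii = s.getBytes(StandardCharsets.US_ASCII);
            utf16 = null;
        } else {
            ascii = null;
            utf16 = s.toCharArray();
        }
    }

    int length() {
        return isAscii ? ascii.length : utf16.length;
    }

    char charAt(int index) {
        // Bytes are 0..0x7F here, so widening back to char is lossless.
        return isAscii ? (char) ascii[index] : utf16[index];
    }

    // Size of the character payload in bytes (ignores object headers).
    int payloadBytes() {
        return isAscii ? ascii.length : utf16.length * 2;
    }

    @Override
    public String toString() {
        return isAscii ? new String(ascii, StandardCharsets.US_ASCII)
                       : new String(utf16);
    }
}
```

For an all-ASCII string this halves the character payload (one byte per char instead of two), at the cost of a flag check in every accessor.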




More information about the hotspot-dev mailing list