Request for Review : CR#6924259: Remove String.count/String.offset
Rémi Forax
forax at univ-mlv.fr
Fri May 25 13:57:22 UTC 2012
Hi Mike, Hi Alan, Hi all,
in my opinion, EMPTY_STRING_VALUE is a premature optimization,
the idea is, I think, to avoid to create an empty char but if you want
to do that,
it's better to do that in StringBuilder.toString() to avoid to create
the String at all.
I'm worried about one pathological case where a lot of non empty String
will be created and then one empty will be created after the code will
be JITed,
in that case Hotspot will deoptimize all codes that have inlined the
constructor call.
I think it's better to have a simple implementation first, with no if at all
and see if using EMPTY_STRING_VALUE is better after.
Also, the changeset remplace several usages of
this.value = Arrays.copyOf(value, size);
by
this.value = new char[len];
System.arraycopy(value, 0, this.value, 0, len);
I think it's not a good idea, Arrays.copyOf is recognized as an intrinsics
and avoid to initialize the array with zeroes. I know that c2 is able
to transform new + arraycopy to copyOf but I don't think c1 does that.
String(StringBuilder) and String(StringBuffer) can be simplified
by using length() and getValue() (AbstractStringBuilder.getValue())
to extract the info.
this.value = Arrays.copyOf(builder.getValue(), builder.length());
and avoid to create a temporary String.
String(char value[], boolean share) should be
String(char[] value, boolean share) and is there a perf reason to
not use a static method like createSharedString() instead.
Removing String(int offset, int count, char value[]) will create trouble,
I've seen several libraries that use it by reflection to avoid to create
a copy of the array. I think this method should not disappear
but use Arrays.copyOfRange if the offset is not 0.
This method should be marked deprecated, and removed in Java 9.
Recently, _getChars(char[], int) was replaced by
getChars(char[],int). It was a stupid change because now
one can think that getChar(char[], int) and getChar(int,int,char[],int)
do the same things. but getChat(char[],int) don't do any bounds check.
So concat() and toCharArray() doesn't do any bound check !
toCharArray() should use Arrays.copyOf()
Overloads of encode in StringCoding are in my opinion not necessary
because each method is called once. So this doesn't really share code
but just add one level to the depth of the call stack.
cheers,
Rémi
On 05/25/2012 02:08 PM, Alan Bateman wrote:
> On 24/05/2012 21:45, Mike Duigou wrote:
>> Hello all;
>>
>> For a long time preparations and planing have been underway to remove
>> the offset and count fields from java.lang.String. These two fields
>> enable multiple String instances to share the same backing character
>> buffer. Shared character buffers were an important optimization for
>> old benchmarks but with current real world code and benchmarks it's
>> actually better to not share backing buffers. Shared char array
>> backing buffers only "win" with very heavy use of String.substring.
>> The negatively impacted situations can include parsers and compilers
>> however current testing shows that overall this change is beneficial.
>>
>> The current plan is to commit this change to jdk8 before build 40
>> followed by jdk7u6. Removing offset/count is a complicated process
>> because HotSpot has special case code for String that depends upon
>> the two fields. Vladimir Kozlov has committed the required HotSpot
>> changes already
>> (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/8f972594effc).
>>
>> A patch of the changes to java.lang.String for JDK 8 are
>> at<http://cr.openjdk.java.net/~mduigou/6924259/0/webrev/>. The
>> changes for JDK 7 have only superficial differences (line offsets in
>> the patch).
>>
>> Comments are welcome.
>>
>> Mike
> I went through the latest webrev and don't see anything obviously
> wrong. EMPTY_STRING_VALUE is new, at least to me, as I don't think it
> was there when removing these fields was prototyped a few years ago.
>
> A minor point in the String(char[]) constructor is that you could move
> the "int len = .." to the top of the method to be consistent with
> other places where a local holds the value.length.
>
> The "use name version for caching" comment in StringCoding.java might
> be confusing to readers, maybe "use charset name as this is faster due
> to caching".
>
> A non-material comment is that there are a couple of style changes
> that I found annoying. Everyone is an expert on such matters, I'm not,
> but the one that bugged me a bit was adding the space after the cast, eg:
> (toffset > (long) value.length - len)
> I found myself needing to re-read it to see if the cast applied to
> value.length or value.length-len. It's a minor point but you know what
> I mean.
>
> -Alan.
More information about the core-libs-dev
mailing list