String.subSequence and CR#6924259: Remove offset and count fields from java.lang.String

Mike Duigou mike.duigou at oracle.com
Tue Jun 26 18:10:41 UTC 2012


On Jun 26 2012, at 07:13 , Martin Desruisseaux wrote:

> If String.substring(int, int) now performs a copy of the underlying char[] array and if there is no String.subSequence(int, int) providing the old functionality, maybe the following implications should be investigated?
> 
> 
> StringBuilder.append(...)
> --------------------
> Since, in order to avoid a useless array copy, the users may be advised to replace the following pattern:
> 
>      StringBuilder.append(string.substring(lower, upper));
> by:
>      StringBuilder.append(string, lower, upper);

This would seem to be a good refactoring regardless of the substring implementation as it avoids creation of a temporary object.
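
As a minimal sketch of the two patterns side by side (the class and method names here are illustrative only, not from any patch), both produce the same result, but the second never creates the temporary String:

```java
final class AppendRangeDemo {
    // Old pattern: substring() creates a temporary String that is appended and then discarded.
    static String viaSubstring(String s, int lower, int upper) {
        StringBuilder sb = new StringBuilder();
        sb.append(s.substring(lower, upper));
        return sb.toString();
    }

    // Suggested pattern: append(CharSequence, int, int) reads the range directly from s.
    static String viaRange(String s, int lower, int upper) {
        StringBuilder sb = new StringBuilder();
        sb.append(s, lower, upper);
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "key=value";
        System.out.println(viaSubstring(line, 4, 9));
        System.out.println(viaRange(line, 4, 9));
    }
}
```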

> 
> would it be worth adding a special case to the AbstractStringBuilder.append(CharSequence, int, int) implementation for the String case, in order to match the efficiency of the AbstractStringBuilder.append(String) method? The latter copies the data with a single call to System.arraycopy, whereas the former invokes CharSequence.charAt(int) in a loop.

I think a microbenchmark comparing StringBuilder.append(string.substring(lower, upper)) with AbstractStringBuilder.append(CharSequence, int, int) would help. I wouldn't be surprised if the latter is faster when a substring has to be created but slower when the string is an existing string.
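
A rough sketch of what such a comparison might look like follows. It is only a crude System.nanoTime() loop with a warm-up pass, so the numbers it prints would be indicative at best; the class name, input string, and iteration counts are all assumptions chosen for illustration.

```java
final class AppendBench {
    // Appends a freshly created substring on every iteration.
    static long viaSubstring(String s, int lower, int upper, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            StringBuilder sb = new StringBuilder(upper - lower);
            sb.append(s.substring(lower, upper));
            total += sb.length(); // consume the result so the loop is not trivially dead
        }
        return total;
    }

    // Appends the same range through append(CharSequence, int, int).
    static long viaRange(String s, int lower, int upper, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            StringBuilder sb = new StringBuilder(upper - lower);
            sb.append(s, lower, upper);
            total += sb.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the lazy dog";
        int iterations = 1_000_000;

        // Warm-up pass so both paths have a chance to be JIT-compiled before timing.
        viaSubstring(s, 4, 19, iterations);
        viaRange(s, 4, 19, iterations);

        long t0 = System.nanoTime();
        viaSubstring(s, 4, 19, iterations);
        long t1 = System.nanoTime();
        viaRange(s, 4, 19, iterations);
        long t2 = System.nanoTime();

        System.out.println("substring + append: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("append(s, lo, hi):  " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Varying the range length and the proportion of whole-string appends would be needed to check both halves of the guess above.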

> 
> Integer.parseInt(...)
> ----------------
> There was a thread one year ago about allowing Integer.parseInt(String) to accept a CharSequence.
> 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-April/thread.html#9801
> 
> One reason given was performance, since the cost of calling CharSequence.toString() has been measured with the NetBeans profiler as significant (assuming that the CharSequence is not already a String) when reading large ASCII files. Now if the new String.substring(...) implementation copies the internal array, we may expect a performance cost similar to StringBuilder.toString(). Would it be worth revisiting the Integer.parseInt(String) case - and similar methods in other wrapper classes - to allow CharSequence input?

Probably. 
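
To make the idea concrete, here is a minimal sketch of a decimal parser that reads a range of any CharSequence without first copying it into a String. The helper name and signature are hypothetical and only illustrate the shape such a method could take; this is not a proposed API.

```java
final class CharSequenceParse {
    // Hypothetical helper: parses the decimal integer in s[start, end) without calling toString().
    static int parseInt(CharSequence s, int start, int end) {
        if (s == null || start < 0 || end > s.length() || start >= end) {
            throw new NumberFormatException("empty or invalid range");
        }
        int i = start;
        boolean negative = false;
        char first = s.charAt(i);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        long value = 0;
        for (; i < end; i++) {
            int digit = Character.digit(s.charAt(i), 10);
            if (digit < 0) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            value = value * 10 + digit;
            // Allow one past Integer.MAX_VALUE so that Integer.MIN_VALUE can still be parsed.
            if (value > (long) Integer.MAX_VALUE + 1) {
                throw new NumberFormatException("out of int range");
            }
        }
        if (negative) {
            value = -value;
        }
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder("x=-42;");
        // Parses the characters "-42" in place; no intermediate String is created.
        System.out.println(parseInt(buffer, 2, 5));
    }
}
```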

>    Martin
> 
> 
> 
> On 23/06/12 00:15, Mike Duigou wrote:
>> I've made a test implementation of subSequence() utilizing an inner class with offset and count fields to try to understand all the parts that would be impacted. My observations thus far:
>> 
>> - The specification of the subSequence() method is currently too specific. It says that the result is a substring(). This would no longer be true. Hopefully nobody assumed that this meant they could cast the result to String. I know, why would you if you can just call substring() instead? I've learned to assume that somebody somewhere always does the most unexpected thing.
>> - The CharSequences returned by subSequence would follow only the general CharSequence rules for equals()/hashCode(). Any current usages of the result of subSequence for equals() or hashing, even though it's not advised, would break. We could add equals() and hashCode() implementations to the CharSequence returned but they would probably be expensive.
>> - In general I wonder if parsers will be satisfied with a CharSequence that only implements identity equals().
>> - I also worry about applications that currently use subSequence and which will fail when the result is not a String instance, as String.equals() will return false for all CharSequences that aren't Strings, i.e. CharSequence token = line.subSequence(start, end); if (keyword.equals(token)) ... This would now fail.
>> 
>> At this point I wonder if this is a feature worth pursuing.
>> 
>> Mike
> 
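
The equals() concern in the quoted observations can be illustrated with a small sketch. The view class below is a hypothetical stand-in for an inner class with offset and count fields, not the actual test implementation; it deliberately inherits identity equals() and hashCode() from Object.

```java
final class SubSequenceView implements CharSequence {
    private final String source;
    private final int offset;
    private final int count;

    SubSequenceView(String source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new StringIndexOutOfBoundsException();
        }
        this.source = source;
        this.offset = start;
        this.count = end - start;
    }

    @Override
    public int length() {
        return count;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return source.charAt(offset + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new StringIndexOutOfBoundsException();
        }
        // Shares the same backing String; no characters are copied.
        return new SubSequenceView(source, offset + start, offset + end);
    }

    @Override
    public String toString() {
        return source.substring(offset, offset + count);
    }

    public static void main(String[] args) {
        String line = "if (x) return;";
        String keyword = "if";
        CharSequence token = new SubSequenceView(line, 0, 2);

        // String.equals() requires the argument to be a String, so this is false.
        System.out.println(keyword.equals(token));
        // String.contentEquals(CharSequence) compares characters, so this is true.
        System.out.println(keyword.contentEquals(token));
    }
}
```

Code written against the current behaviour would need to switch to contentEquals() or call toString() first, which is the kind of silent breakage described in the last bullet.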



