String.subSequence and CR#6924259: Remove offset and count fields from java.lang.String
Martin Desruisseaux
martin.desruisseaux at geomatys.fr
Tue Jun 26 14:13:22 UTC 2012
If String.substring(int, int) now performs a copy of the underlying
char[] array and if there is no String.subSequence(int, int) providing
the old functionality, maybe the following implications should be
investigated?
StringBuilder.append(...)
--------------------
Since, in order to avoid a useless array copy, the users may be advised
to replace the following pattern:
StringBuilder.append(string.substring(lower, upper));
by:
StringBuilder.append(string, lower, upper);
would it be worth adding a special case in the
AbstractStringBuilder.append(CharSequence, int, int) implementation for
the String case in order to reach the efficiency of the
AbstractStringBuilder.append(String) method? The latter copies the data
with a single call to System.arraycopy, as opposed to the former which
invokes CharSequence.charAt(int) in a loop.
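A rough sketch of what such a special case might look like. SketchBuilder
is a hypothetical stand-in for AbstractStringBuilder (its fields and growth
policy are assumptions, not the JDK code), and null handling is omitted:

```java
import java.util.Arrays;

// Hypothetical stand-in for AbstractStringBuilder, for illustration only.
final class SketchBuilder {
    private char[] value = new char[16];
    private int count;

    SketchBuilder append(CharSequence s, int start, int end) {
        if (start < 0 || start > end || end > s.length()) {
            throw new IndexOutOfBoundsException(
                    "start " + start + ", end " + end + ", length " + s.length());
        }
        int len = end - start;
        ensureCapacity(count + len);
        if (s instanceof String) {
            // Special case: bulk copy straight into the internal buffer.
            ((String) s).getChars(start, end, value, count);
        } else {
            // General case: one charAt(int) call per character.
            for (int i = start, j = count; i < end; i++, j++) {
                value[j] = s.charAt(i);
            }
        }
        count += len;
        return this;
    }

    private void ensureCapacity(int minimum) {
        if (minimum > value.length) {
            value = Arrays.copyOf(value, Math.max(minimum, value.length * 2 + 2));
        }
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}
```

The instanceof test costs little when the argument is not a String, while
the String branch avoids both the temporary substring and the per-character
loop.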
Integer.parseInt(...)
----------------
There was a thread earlier this year about allowing Integer.parseInt(String)
to accept a CharSequence.
http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-April/thread.html#9801
One reason given was performance, since the cost of calling
CharSequence.toString() was measured with the NetBeans profiler as
significant (assuming that the CharSequence is not already a String)
when reading large ASCII files. Now if the new String.substring(...)
implementation copies the internal array, we may expect a performance
cost similar to StringBuilder.toString(). Would it be worth revisiting
the Integer.parseInt(String) case, and similar methods in other wrapper
classes, to allow CharSequence input?
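To make the idea concrete, here is a minimal sketch of a decimal parser
working directly on a range of a CharSequence, so that no intermediate
String is created. The class name and the (start, end) signature are
assumptions made for illustration, not an existing or proposed JDK API:

```java
// Hypothetical helper, for illustration only.
final class CharSequenceParsing {
    /** Parses the characters in [start, end) as a signed decimal int. */
    static int parseInt(CharSequence s, int start, int end) {
        if (s == null) {
            throw new NumberFormatException("null");
        }
        if (start < 0 || start > end || end > s.length()) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        int i = start;
        boolean negative = false;
        int limit = -Integer.MAX_VALUE;
        if (i < end) {
            char first = s.charAt(i);
            if (first == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
                i++;
            } else if (first == '+') {
                i++;
            }
        }
        if (i >= end) {
            throw new NumberFormatException("no digits");
        }
        // Accumulate negatively so that Integer.MIN_VALUE can be represented.
        int multmin = limit / 10;
        int result = 0;
        while (i < end) {
            int digit = Character.digit(s.charAt(i++), 10);
            if (digit < 0 || result < multmin) {
                throw new NumberFormatException("invalid or out-of-range input");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("out of range");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

With such a method, a reader could parse a field in place, for example
CharSequenceParsing.parseInt(buffer, lower, upper), instead of calling
Integer.parseInt(buffer.subSequence(lower, upper).toString()).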
Martin
On 23/06/12 00:15, Mike Duigou wrote:
> I've made a test implementation of subSequence() utilizing an inner class with offset and count fields to try to understand all the parts that would be impacted. My observations thus far:
>
> - The specification of the subSequence() method is currently too specific. It says that the result is a substring(). This would no longer be true. Hopefully nobody assumed that this meant they could cast the result to String. I know, why would you if you can just call substring() instead? I've learned to assume that somebody somewhere always does the most unexpected thing.
> - The CharSequences returned by subSequence would follow only the general CharSequence rules for equals()/hashCode(). Any current usages of the result of subSequence for equals() or hashing, even though it's not advised, would break. We could add equals() and hashCode() implementations to the CharSequence returned but they would probably be expensive.
> - In general I wonder if parsers will be satisfied with a CharSequence that only implements identity equals().
> - I also worry about applications that currently use subSequence and which will fail when the result is not a String instance, as String.equals() will return false for all CharSequences that aren't Strings, e.g. CharSequence token = line.subSequence(start, end); if (keyword.equals(token)) ... This would now fail.
>
> At this point I wonder if this is a feature worth pursuing.
>
> Mike
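
The equals() concern in the quoted message can be illustrated with any
CharSequence that is not a String. The sketch below uses a StringBuilder as
a stand-in for the proposed subSequence() result (an assumption, since that
inner class is not shown), and contrasts String.equals(Object) with
String.contentEquals(CharSequence):

```java
// Hypothetical helper, for illustration only.
final class EqualsSketch {
    /** Type-sensitive comparison: false whenever token is not a String. */
    static boolean equalsAsObject(String keyword, CharSequence token) {
        return keyword.equals(token);
    }

    /** Content comparison: works for any CharSequence implementation. */
    static boolean equalsByContent(String keyword, CharSequence token) {
        return keyword.contentEquals(token);
    }
}
```

Code that compares tokens with contentEquals would keep working whatever
type subSequence() returns, but existing callers of equals() would need to
be found and changed.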