RFR: 8077242: Add default method CharSequence.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
Hi everyone! The String class has getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) which is used to efficiently extract substrings, avoiding unnecessary copying. The AbstractStringBuilder has the method with the same signature. I propose to add a default implementation of this method to the CharSequence interface. It can used, for example, in AbstractStringBuilder.append(CharSequence s, int start, int end) instead of the for-loop. This, in turn, allows to simplify AbstractStringBuilder.append(CharSequence s). In the webrev, I also included a few optimizations of the kind: stringBuilder.append(string.substring(from, to)) => stringBuilder.append(string, from, to). The later variant becomes more effective after the proposed fix. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8077242 WEBREV: http://cr.openjdk.java.net/~igerasim/8077242/00/webrev/ If people agree that this change is useful, I'll file a CCC request. Sincerely yours, Ivan
On 09/05/2015 17:03, Ivan Gerasimov wrote:
Hi everyone!
The String class has getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) which is used to efficiently extract substrings, avoiding unnecessary copying.
This has come up a few times, here's the last thread (and patch) that I could find: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-April/015889.html
Thank you Alan for the pointer! I marked my bug as yet another duplicate of JDK-6813523. It's not clear, why Martin's fix hadn't been pushed then. Martin can you recollect if there were any concerns? Sincerely yours, Ivan On 09.05.2015 19:14, Alan Bateman wrote:
On 09/05/2015 17:03, Ivan Gerasimov wrote:
Hi everyone!
The String class has getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) which is used to efficiently extract substrings, avoiding unnecessary copying.
This has come up a few times, here's the last thread (and patch) that I could find:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-April/015889.html
On Sat, May 9, 2015 at 10:07 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Thank you Alan for the pointer!
I marked my bug as yet another duplicate of JDK-6813523. It's not clear, why Martin's fix hadn't been pushed then. Martin can you recollect if there were any concerns?
It's absolutely true that I dropped the ball on this in jdk8, discouraged by David's message here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-May/017174.html No one seemed to want to tackle the issue that caling getChars with an evil charsequence could result in the char[] being retained with possible nefarious consequences. David (or others), do you have an opinion on what we *should* do, if anything? Should we be writing ugliferous code of the form if (charSequence.getClass().getClassLoader() == null) /* trusted */ use getChars() (High level: I feel that trying to have untrusted code coexist safely in the same process with trusted code hasn't really worked out for Java)
Sincerely yours, Ivan
On 09.05.2015 19:14, Alan Bateman wrote:
On 09/05/2015 17:03, Ivan Gerasimov wrote:
Hi everyone!
The String class has getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) which is used to efficiently extract substrings, avoiding unnecessary copying.
This has come up a few times, here's the last thread (and patch) that I could find:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-April/015889.html
Thanks Martin for quick reply! On 10.05.2015 1:17, Martin Buchholz wrote:
Martin can you recollect if there were any concerns?
It's absolutely true that I dropped the ball on this in jdk8, discouraged by David's message here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-May/017174.html
Ah, yes. I see the problem. I'm not going to suggest a general solution, but would it make sense to at least create an overloaded method AbstractStringBuilder.append(String str, int start, int end)? Here's a draft webrev for this proposal: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8077242 WEBREV: http://cr.openjdk.java.net/~igerasim/8077242/01/webrev/ Sincerely yours, Ivan
On 05/10/2015 01:40 AM, Ivan Gerasimov wrote:
Thanks Martin for quick reply!
On 10.05.2015 1:17, Martin Buchholz wrote:
Martin can you recollect if there were any concerns?
It's absolutely true that I dropped the ball on this in jdk8, discouraged by David's message here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-May/017174.html
Ah, yes. I see the problem. I'm not going to suggest a general solution, but would it make sense to at least create an overloaded method AbstractStringBuilder.append(String str, int start, int end)?
Here's a draft webrev for this proposal:
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8077242 WEBREV: http://cr.openjdk.java.net/~igerasim/8077242/01/webrev/
Sincerely yours, Ivan
Hi Ivan and others, I think it would be nice to give users back an official way to handle sub-strings as views without copying the characters. After offset & length were dropped from String, String.substring and String.subSequence started copying characters. That's ok for small sub-sequences, but unacceptable for large. If there was an explicit API that allowed old behaviour it could be used in scenarios where this has performance advantage. Here's what I'm thinking about: http://cr.openjdk.java.net/~plevart/jdk9-dev/ArrayCharSequence/webrev.01/ I know that this somehow conflicts with the efforts of JEP 254: Compact Strings - as it assumes that internal String.value character array can be shared between a String and an ArrayCharSequence, but above proposal could be tweaked to accommodate that too: instead of ArrayCharSequence(String, ...) constructors, there could be an internal or public implementation of a special ConstantLengthCharSequence that was used for the purpose of String sub-sequence views only. In that case the ArrayCharSequence as a public class could be dropped all together and new public String method(s) introduced: public CharSequence toCharSequenceView(); // and/or public CharSequence subSequenceView(int start, int end); ...that returned private implementation of a ConstantLengthCharSequence. I wonder how the above proposal when combined with StringJoiner based solution of optimizing String.replace compares with your index-array based specialized solution. Regards, Peter
Peter, Have we considered just embracing CharBuffer.wrap(CharSequence, int, int) work as way to support the old substring behavior? If you dig through the code, that creates a java.nio.StringCharBuffer which is a view of the wrapped string. We could then optimize all of the friend classes to special case CharBuffer and use the bulk get methods. The nice thing is that there should be no way to create a malicious CharBuffer which should fix the concern mentioned here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-May/017174.html. Regards, Jason
Hi Ivan and others,
I think it would be nice to give users back an official way to handle sub-strings as views without copying the characters. After offset & length were dropped from String, String.substring and String.subSequence started copying characters. That's ok for small sub-sequences, but unacceptable for large. If there was an explicit API that allowed old behaviour it could be used in scenarios where this has performance advantage. Here's what I'm thinking about:
http://cr.openjdk.java.net/~plevart/jdk9-dev/ArrayCharSequence/webrev.01/
I know that this somehow conflicts with the efforts of JEP 254: Compact Strings - as it assumes that internal String.value character array can be shared between a String and an ArrayCharSequence, but above proposal could be tweaked to accommodate that too: instead of ArrayCharSequence(String, ...) constructors, there could be an internal or public implementation of a special ConstantLengthCharSequence that was used for the purpose of String sub-sequence views only. In that case the ArrayCharSequence as a public class could be dropped all together and new public String method(s) introduced:
public CharSequence toCharSequenceView(); // and/or public CharSequence subSequenceView(int start, int end);
...that returned private implementation of a ConstantLengthCharSequence.
I wonder how the above proposal when combined with StringJoiner based solution of optimizing String.replace compares with your index-array based specialized solution.
Regards, Peter
On 05/28/2015 04:44 PM, Jason Mehrens wrote:
Peter,
Have we considered just embracing CharBuffer.wrap(CharSequence, int, int) work as way to support the old substring behavior? If you dig through the code, that creates a java.nio.StringCharBuffer which is a view of the wrapped string. We could then optimize all of the friend classes to special case CharBuffer and use the bulk get methods. The nice thing is that there should be no way to create a malicious CharBuffer which should fix the concern mentioned here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-May/017174.html.
Regards,
Jason
Thanks Jason for pointing me to this. So all we need is specialization in StringBuilder and perhaps StringJoiner (to not make the String from CharSequence immediately if it is a StringCharBuffer). I'll try to prepare that specialization patch to see what it takes... Regards, Peter
Hi Ivan and others,
I think it would be nice to give users back an official way to handle sub-strings as views without copying the characters. After offset & length were dropped from String, String.substring and String.subSequence started copying characters. That's ok for small sub-sequences, but unacceptable for large. If there was an explicit API that allowed old behaviour it could be used in scenarios where this has performance advantage. Here's what I'm thinking about:
http://cr.openjdk.java.net/~plevart/jdk9-dev/ArrayCharSequence/webrev.01/
I know that this somehow conflicts with the efforts of JEP 254: Compact Strings - as it assumes that internal String.value character array can be shared between a String and an ArrayCharSequence, but above proposal could be tweaked to accommodate that too: instead of ArrayCharSequence(String, ...) constructors, there could be an internal or public implementation of a special ConstantLengthCharSequence that was used for the purpose of String sub-sequence views only. In that case the ArrayCharSequence as a public class could be dropped all together and new public String method(s) introduced:
public CharSequence toCharSequenceView(); // and/or public CharSequence subSequenceView(int start, int end);
...that returned private implementation of a ConstantLengthCharSequence.
I wonder how the above proposal when combined with StringJoiner based solution of optimizing String.replace compares with your index-array based specialized solution.
Regards, Peter
participants (5)
-
Alan Bateman
-
Ivan Gerasimov
-
Jason Mehrens
-
Martin Buchholz
-
Peter Levart