FYC: 7197183 : Provide CharSequence.subSequenceView which allows for sub-sequence views of character sequences.
Claes Redestad
claes.redestad at oracle.com
Thu Jul 17 14:53:04 UTC 2014
Hi Mike,
while this nicely abstracts the problem, encouraging developers to
re-introduce some of the leakiness that the 7u6 String changes helped
remove is something I think we should be wary of. If anything, I think
such an object should not be Serializable; maybe hashCode() should even
throw UnsupportedOperationException to make sure it's only ever used
as a temporary object and to prevent abuse? Call it CharSequenceSlice, perhaps?
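
A minimal sketch of what such a temporary slice might look like. The class name CharSequenceSlice and every detail below are assumptions for illustration only; none of this is part of the proposed patch.

```java
import java.util.Objects;

/**
 * Hypothetical temporary view over a CharSequence.
 * Deliberately NOT Serializable, and hashCode() throws so that the
 * view cannot be used as a key in hash-based collections.
 */
final class CharSequenceSlice implements CharSequence {
    private final CharSequence source;
    private final int from; // inclusive
    private final int to;   // exclusive

    CharSequenceSlice(CharSequence source, int from, int to) {
        this.source = Objects.requireNonNull(source);
        if (from < 0 || to < from || to > source.length()) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        this.from = from;
        this.to = to;
    }

    @Override
    public int length() {
        return to - from;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return source.charAt(from + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end < start || end > length()) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        // Stay a view: no characters are copied here.
        return new CharSequenceSlice(source, from + start, from + end);
    }

    @Override
    public String toString() {
        // The only place where characters are copied.
        return new StringBuilder(length()).append(source, from, to).toString();
    }

    @Override
    public int hashCode() {
        throw new UnsupportedOperationException("temporary view; not usable as a hash key");
    }
}
```

With this shape, putting a slice into a HashMap or HashSet fails immediately rather than silently pinning the source sequence in memory.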
The alternative (and existing) approach, which admittedly isn't
pretty, is to provide methods that take offsets, such as
Appendable.append(CharSequence, int, int). This avoids the need for
substring/subSequence, or for creating any temporary object at all. The
IntSupplier idea is pretty neat, though, and could be adopted
for the existing offset methods as well.
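
A rough sketch of the offset-based approach, using the existing Appendable.append(CharSequence, int, int). The class OffsetAppendDemo and its helpers are hypothetical names made up for this example; appendFrom only illustrates how an IntSupplier end index could be layered on top of an offset method, and is not an existing JDK API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.IntSupplier;

final class OffsetAppendDemo {

    /**
     * Appends the text after the first '=' in line to out, without
     * creating a substring. If there is no '=', the whole line is appended.
     */
    static <A extends Appendable> A appendValue(A out, CharSequence line) {
        int eq = -1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '=') {
                eq = i;
                break;
            }
        }
        return appendRange(out, line, eq + 1, line.length());
    }

    /**
     * Hypothetical IntSupplier variant: the end index is resolved at call
     * time, so it can follow a growing source such as a StringBuilder.
     */
    static <A extends Appendable> A appendFrom(A out, CharSequence source,
                                               int start, IntSupplier end) {
        return appendRange(out, source, start, end.getAsInt());
    }

    private static <A extends Appendable> A appendRange(A out, CharSequence cs,
                                                        int start, int end) {
        try {
            out.append(cs, start, end); // existing offset method, no temporary object
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, appendValue(new StringBuilder(), "name=Claes") appends only the characters after the '=' and never materializes an intermediate String.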
/Claes
On 07/17/2014 02:09 AM, Mike Duigou wrote:
> Hello all;
>
> In Java 7u6 there was a significant change in the implementation of java.lang.String (JDK-6924259). This was done to reduce the size of String instances and it has been generally regarded as a positive change. As with almost any significant change to a class as core to Java as String there have also been applications negatively impacted. Most of the problems involve applications which make heavy use of String.substring() as sub-string instances now involve creation of their own copies of the backing characters.
>
> There have been previous discussions of mitigations to the 6924259 change in String.substring() behaviour. These discussions haven't come to positive conclusions mostly because they generally require too many changes to the specification or behaviour of String. So here's another proposal (enclosed) that doesn't change the behaviour of any existing classes. It adds two new methods to CharSequence to create sub-sequence views of character sequences. The size of sub-sequence instances very closely matches the size of pre-6924259 String instances and indeed the implementation has the same pre-6924259 limitations, namely that the entire source CharSequence remains alive as long as the sub-sequence is referenced.
>
> Unlike pre-6924259 the CharSubSequenceView can not be reliably compared via equals() to String instances and it is unsuitable for use as a hash map key.
>
> With these benefits and caveats in mind, would you use this?
>
> Mike
>
> diff -r 66f582158e1c src/share/classes/java/lang/CharSequence.java
> --- a/src/share/classes/java/lang/CharSequence.java Wed Jul 16 20:43:53 2014 +0100
> +++ b/src/share/classes/java/lang/CharSequence.java Wed Jul 16 16:58:52 2014 -0700
> @@ -25,11 +25,14 @@
>
> package java.lang;
>
> +import java.io.Serializable;
> import java.util.NoSuchElementException;
> +import java.util.Objects;
> import java.util.PrimitiveIterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.function.IntConsumer;
> +import java.util.function.IntSupplier;
> import java.util.stream.IntStream;
> import java.util.stream.StreamSupport;
>
> @@ -231,4 +234,114 @@
> Spliterator.ORDERED,
> false);
> }
> +
> + /**
> + * Provides a sub-sequence view on a character sequence. Changes in the
> + * source will be reflected in the sub-sequence. The sub-sequence must, at
> + * all times, be a proper sub-sequence of the source character sequence.
> + *
> + * @since 1.9
> + */
> + static final class CharSubSequenceView implements CharSequence, Serializable {
> +
> + private final CharSequence source;
> + private final int fromInclusive;
> + private final IntSupplier toExclusive;
> +
> + CharSubSequenceView(CharSequence source, int fromInclusive, int toExclusive) {
> + this(source, fromInclusive, () -> toExclusive);
> + }
> +
> + CharSubSequenceView(CharSequence source, int fromInclusive, IntSupplier toExclusive) {
> + this.source = Objects.requireNonNull(source);
> + if(fromInclusive < 0 || fromInclusive >= source.length() ||
> + toExclusive.getAsInt() < fromInclusive || toExclusive.getAsInt() > source.length()) {
> + throw new IllegalArgumentException("Invalid index");
> + }
> + this.fromInclusive = fromInclusive;
> + this.toExclusive = toExclusive;
> + }
> +
> + @Override
> + public int length() {
> + return toExclusive.getAsInt() - fromInclusive;
> + }
> +
> + @Override
> + public char charAt(int index) {
> + if(index >= length()) {
> + throw new IllegalArgumentException("Invalid Index");
> + }
> + //
> + return source.charAt(fromInclusive + index);
> + }
> +
> + @Override
> + public CharSequence subSequence(int start, int end) {
> + if (end > length()) {
> + throw new IllegalArgumentException("Invalid Index");
> + }
> + return source.subSequence(fromInclusive + start, fromInclusive + end);
> + }
> +
> + @Override
> + public String toString() {
> + int len = length();
> + char[] chars = new char[len];
> + for(int each = 0; each < len; each++) {
> + chars[each] = charAt(each);
> + }
> + return new String(chars, true);
> + }
> + }
> +
> + /**
> + * Returns as a character sequence the specified sub-sequence view of the
> + * provided source character sequence. Changes in the source will be
> + * reflected in the sub-sequence. The sub-sequence must, at all times, be
> + * a proper sub-sequence of the source character sequence.
> + *
> + * @param source The character sequence from which the sub-sequence is
> + * derived.
> + * @param startInclusive The index of the character in the source character
> + * sequence which will be the first character in the sub-sequence.
> + * @param endExclusive The index after the last character in the source
> + * character sequence which will be the last character in the sub-sequence
> + * @return the character sub-sequence.
> + * @since 1.9
> + */
> + static CharSequence subSequenceView(CharSequence source, int startInclusive, int endExclusive) {
> + return new CharSubSequenceView(source, startInclusive, endExclusive);
> + }
> +
> + /**
> + * Returns as a character sequence the specified sub-sequence view of the
> + * provided source character sequence. Changes in the source will be
> + * reflected in the sub-sequence. The sub-sequence must, at all times, be
> + * a proper sub-sequence of the source character sequence. This variation
> + * allows for the size of the sub-sequence to vary, usually to follow the
> + * size of a growing character sequence.
> + *
> + * @apiNote The most common usage of this subSequence is to follow changes
> + * in the size of the source.
> + * {@code
> + * StringBuilder source = new StringBuilder("prefix:");
> + * CharSequence toEnd = CharSequence.subSequenceView(source, 7, source::length);
> + * }
> + * In this example the value of {@code toEnd} will always be a sub-sequence
> + * of {@code source} but will omit the first 7 characters.
> + *
> + * @param source The character sequence from which the sub-sequence is
> + * derived.
> + * @param startInclusive The index of the character in the source character
> + * sequence which will be the first character in the sub-sequence.
> + * @param endExclusive A supplier which returns the index after the last
> + * character in the source character sequence which will be the last
> + * character in the sub-sequence
> + * @return the character sub-sequence.
> + * @since 1.9
> + */
> + static CharSequence subSequenceView(CharSequence source, int startInclusive, IntSupplier endExclusive) {
> + return new CharSubSequenceView(source, startInclusive, endExclusive);
> + }
> }
>