FYC: 7197183 : Provide CharSequence.subSequenceView which allows for sub-sequence views of character sequences.

Michael Kay mike at saxonica.com
Thu Jul 17 07:26:22 UTC 2014


In my own product (Saxon) I have a class CharSlice which is pretty much identical to your CharSubSequenceView. So yes, I think it is useful.

Michael Kay
Saxonica
mike at saxonica.com
+44 (0118) 946 5893



On 17 Jul 2014, at 01:09, Mike Duigou <mike.duigou at oracle.com> wrote:

> Hello all;
> 
> In Java 7u6 there was a significant change in the implementation of java.lang.String (JDK-6924259). This was done to reduce the size of String instances, and it has generally been regarded as a positive change. As with almost any significant change to a class as core to Java as String, some applications have been negatively impacted. Most of the problems involve applications that make heavy use of String.substring(), because each sub-string instance now creates its own copy of the backing characters.
> 
> There have been previous discussions of mitigations to the 6924259 change in String.substring() behaviour. These discussions haven't come to positive conclusions mostly because they generally require too many changes to the specification or behaviour of String. So here's another proposal (enclosed) that doesn't change the behaviour of any existing classes. It adds two new methods to CharSequence to create sub-sequence views of character sequences. The size of sub-sequence instances very closely matches the size of pre-6924259 String instances and indeed the implementation has the same pre-6924259 limitations, namely that the entire source CharSequence remains alive as long as the sub-sequence is referenced.
> 
> Unlike pre-6924259 sub-strings, a CharSubSequenceView cannot be reliably compared via equals() to String instances, and it is unsuitable for use as a hash map key.
> 
> With these benefits and caveats in mind, would you use this?
> 
> Mike
> 
> diff -r 66f582158e1c src/share/classes/java/lang/CharSequence.java
> --- a/src/share/classes/java/lang/CharSequence.java	Wed Jul 16 20:43:53 2014 +0100
> +++ b/src/share/classes/java/lang/CharSequence.java	Wed Jul 16 16:58:52 2014 -0700
> @@ -25,11 +25,14 @@
> 
> package java.lang;
> 
> +import java.io.Serializable;
> import java.util.NoSuchElementException;
> +import java.util.Objects;
> import java.util.PrimitiveIterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.function.IntConsumer;
> +import java.util.function.IntSupplier;
> import java.util.stream.IntStream;
> import java.util.stream.StreamSupport;
> 
> @@ -231,4 +234,114 @@
>                 Spliterator.ORDERED,
>                 false);
>     }
> +
> +    /**
> +     * Provides a sub-sequence view on a character sequence. Changes in the
> +     * source will be reflected in the sub-sequence. The sub-sequence must, at
> +     * all times, be a proper sub-sequence of the source character sequence.
> +     *
> +     * @since 1.9
> +     */
> +    static final class CharSubSequenceView implements CharSequence, Serializable {
> +
> +        private final CharSequence source;
> +        private final int fromInclusive;
> +        private final IntSupplier toExclusive;
> +
> +        CharSubSequenceView(CharSequence source, int fromInclusive, int toExclusive) {
> +            this(source, fromInclusive, () -> toExclusive);
> +        }
> +
> +        CharSubSequenceView(CharSequence source, int fromInclusive, IntSupplier toExclusive) {
> +            this.source = Objects.requireNonNull(source);
> +            if(fromInclusive < 0 || fromInclusive > source.length() ||
> +               toExclusive.getAsInt() < fromInclusive || toExclusive.getAsInt() > source.length()) {
> +                throw new IllegalArgumentException("Invalid index");
> +            }
> +            this.fromInclusive = fromInclusive;
> +            this.toExclusive = toExclusive;
> +        }
> +
> +        @Override
> +        public int length() {
> +            return toExclusive.getAsInt() - fromInclusive;
> +        }
> +
> +        @Override
> +        public char charAt(int index) {
> +            if(index < 0 || index >= length()) {
> +                throw new IndexOutOfBoundsException("Invalid Index");
> +            }
> +            // Translate the view-relative index into an index in the source.
> +            return source.charAt(fromInclusive + index);
> +        }
> +
> +        @Override
> +        public CharSequence subSequence(int start, int end) {
> +           if (start < 0 || start > end || end > length()) {
> +               throw new IndexOutOfBoundsException("Invalid Index");
> +           }
> +           return source.subSequence(fromInclusive + start, fromInclusive + end);
> +        }
> +
> +        @Override
> +        public String toString() {
> +            int len = length();
> +            char[] chars = new char[len];
> +            for(int each = 0; each < len; each++) {
> +                chars[each] = charAt(each);
> +            }
> +            return new String(chars, true);
> +        }
> +    }
> +
> +    /**
> +     * Returns as a character sequence the specified sub-sequence view of the
> +     * provided source character sequence. Changes in the source will be
> +     * reflected in the sub-sequence. The sub-sequence must, at all times, be
> +     * a proper sub-sequence of the source character sequence.
> +     *
> +     * @param source The character sequence from which the sub-sequence is
> +     * derived.
> +     * @param startInclusive The index of the character in the source character
> +     * sequence which will be the first character in the sub-sequence.
> +     * @param endExclusive The index after the last character in the source
> +     * character sequence which will be included in the sub-sequence.
> +     * @return the character sub-sequence.
> +     * @since 1.9
> +     */
> +    static CharSequence subSequenceView(CharSequence source, int startInclusive, int endExclusive) {
> +        return new CharSubSequenceView(source, startInclusive, endExclusive);
> +    }
> +
> +    /**
> +     * Returns as a character sequence the specified sub-sequence view of the
> +     * provided source character sequence. Changes in the source will be
> +     * reflected in the sub-sequence. The sub-sequence must, at all times, be
> +     * a proper sub-sequence of the source character sequence. This variation
> +     * allows for the size of the sub-sequence to vary, usually to follow the
> +     * size of a growing character sequence.
> +     *
> +     * @apiNote The most common usage of this subSequence is to follow changes
> +     * in the size of the source.
> +     * {@code
> +     * StringBuilder source = new StringBuilder("prefix:");
> +     * CharSequence toEnd = CharSequence.subSequenceView(source, 7, source::length);
> +     * }
> +     * In this example the value of {@code toEnd} will always be a sub-sequence
> +     * of {@code source} but will omit the first 7 characters.
> +     *
> +     * @param source The character sequence from which the sub-sequence is
> +     * derived.
> +     * @param startInclusive The index of the character in the source character
> +     * sequence which will be the first character in the sub-sequence.
> +     * @param endExclusive A supplier which returns the index after the
> +     * last character in the source character sequence which will be
> +     * included in the sub-sequence.
> +     * @return the character sub-sequence.
> +     * @since 1.9
> +     */
> +    static CharSequence subSequenceView(CharSequence source, int startInclusive, IntSupplier endExclusive) {
> +        return new CharSubSequenceView(source, startInclusive, endExclusive);
> +    }
> }
> 
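For readers who want to try the idea outside the JDK, here is a minimal standalone sketch of a sub-sequence view that follows a growing source. It is not the patch above and not Saxon's CharSlice: the names SubSequenceViewSketch, View and viewOf are hypothetical, and the bounds checks reflect my assumption that an empty view at the end of the source should be allowed.

```java
import java.util.Objects;
import java.util.function.IntSupplier;

public class SubSequenceViewSketch {

    /** Hypothetical stand-in for the proposed view class; shares the source, copies nothing. */
    static final class View implements CharSequence {
        private final CharSequence source;
        private final int from;        // inclusive start in the source
        private final IntSupplier to;  // exclusive end, re-evaluated on every call

        View(CharSequence source, int from, IntSupplier to) {
            this.source = Objects.requireNonNull(source);
            this.to = Objects.requireNonNull(to);
            if (from < 0 || from > to.getAsInt() || to.getAsInt() > source.length()) {
                throw new IndexOutOfBoundsException("Invalid range");
            }
            this.from = from;
        }

        @Override
        public int length() {
            return to.getAsInt() - from;
        }

        @Override
        public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException("Invalid index: " + index);
            }
            return source.charAt(from + index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            if (start < 0 || start > end || end > length()) {
                throw new IndexOutOfBoundsException("Invalid range");
            }
            return source.subSequence(from + start, from + end);
        }

        @Override
        public String toString() {
            // Materializes a copy; only here are characters actually duplicated.
            return source.subSequence(from, to.getAsInt()).toString();
        }
    }

    /** Hypothetical factory mirroring the shape of the proposed subSequenceView. */
    public static CharSequence viewOf(CharSequence source, int from, IntSupplier to) {
        return new View(source, from, to);
    }

    public static void main(String[] args) {
        StringBuilder source = new StringBuilder("prefix:");
        CharSequence toEnd = viewOf(source, 7, source::length);
        System.out.println(toEnd.length()); // prints 0
        source.append("value");
        System.out.println(toEnd);          // prints value
    }
}
```

As in the proposal, the view keeps the whole source reachable for as long as the view itself is referenced.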


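The equals() and hash-key caveat can be seen with any non-String CharSequence. The sketch below uses java.nio.CharBuffer.wrap as a stand-in for the proposed view (my substitution, since subSequenceView is not in the JDK); the class name ViewEqualityCaveat and the lookup helper are hypothetical.

```java
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;

public class ViewEqualityCaveat {

    /** Looks up a key by content by first materializing the view as a String. */
    public static Integer lookup(Map<String, Integer> map, CharSequence key) {
        return map.get(key.toString());
    }

    public static void main(String[] args) {
        String text = "hello world";
        // A read-only view over the first five characters of text.
        CharSequence view = CharBuffer.wrap(text, 0, 5);

        System.out.println("hello".equals(view));        // prints false
        System.out.println("hello".contentEquals(view)); // prints true

        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 1);
        System.out.println(counts.get(view));     // prints null
        System.out.println(lookup(counts, view)); // prints 1
    }
}
```

Comparing by content with String.contentEquals, or converting to a String before a map lookup, sidesteps the caveat at the cost of a scan or a copy.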


More information about the core-libs-dev mailing list