RFR: 8012665: CharSequence.chars, CharSequence.codePoints

Mon Apr 29 13:00:16 PDT 2013

Sure, micro-optimization doesn't necessarily buy much and might be useless
with a sufficiently smart JIT, and micro-benchmarking is known to be
difficult and surprising.  Nevertheless, those of us who have been writing
performance-critical code in the core libraries have developed a style that
HotSpot is known to love.

The biggest question for me is why length() is being called repeatedly.
Can't you just copy the length into a local once?  Since it's an interface
call, it might not always be easy for the JIT to inline.
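
To make the length() point concrete, here is a minimal sketch contrasting a loop that calls length() on every iteration with one that copies it into a local first. The class and method names are made up for illustration and are not part of the patch. The two forms are only equivalent if the sequence is not mutated during the loop, and whether the JIT would hoist the call by itself depends on whether it can inline the interface call; the second form simply doesn't rely on that.

```java
class LengthHoisting {
    // Calls the interface method length() on every loop iteration.
    public static int countUpperRepeated(CharSequence cs) {
        int n = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isUpperCase(cs.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Copies length() into a local once, so the loop bound is a plain int.
    public static int countUpperHoisted(CharSequence cs) {
        final int length = cs.length();
        int n = 0;
        for (int i = 0; i < length; i++) {
            if (Character.isUpperCase(cs.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        CharSequence cs = new StringBuilder("Hello World");
        System.out.println(countUpperRepeated(cs));
        System.out.println(countUpperHoisted(cs));
    }
}
```

Both methods return the same count for an unchanging sequence; the difference is only in how much work the loop condition asks of the JIT.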

Untested, but I would write this in bonkers-for-performance style like this:

    public default IntStream codePoints3() {
        class CodePointIterator implements PrimitiveIterator.OfInt {
            int cur = 0;
            final int length = length();

            @Override
            public void forEachRemaining(IntConsumer block) {
                final int length = this.length;
                for (int i = cur; i < length;) {
                    char c = charAt(i++);
                    if (!isHighSurrogate(c))
                        block.accept(c);
                    else {
                        forEachRemainingSupplementary(block, i);
                        break;
                    }
                }
                cur = length;
            }

            private void forEachRemainingSupplementary(IntConsumer block,
                                                       int i) {
                int cp;
                // Back up to the high surrogate, then decode from i
                // (cur is not advanced while this loop runs).
                for (i--; i < length; i += charCount(cp)) {
                    cp = codePointAt(CharSequence.this, i);
                    block.accept(cp);
                }
            }

            public boolean hasNext() {
                return cur < length;
            }

            public int nextInt() {
                if (cur >= length) {
                    throw new NoSuchElementException();
                }
                char c = charAt(cur++);
                return isHighSurrogate(c) ? nextIntSupplementary(c) : c;
            }

            private int nextIntSupplementary(char highSurrogate) {
                if (cur < length) {
                    char c = charAt(cur);
                    if (isLowSurrogate(c)) {
                        cur++;
                        return toCodePoint(highSurrogate, c);
                    }
                }
                return highSurrogate;
            }
        }

        // The number of code points is not known up front, so the stream
        // should not claim SIZED or SUBSIZED.
        return StreamSupport.intStream(() ->
                Spliterators.spliteratorUnknownSize(
                        new CodePointIterator(),
                        Spliterator.ORDERED),
                Spliterator.ORDERED,
                false);
    }
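
For anyone who wants to sanity-check an implementation like the one above, here is a small harness. It is a sketch under assumptions: it calls the codePoints() default method as it exists in released JDK 8 and later rather than the codePoints3() draft, and the class names and test strings are made up. The cases it exercises are the ones the iterator has to get right: a plain BMP run, a well-formed surrogate pair, and unpaired surrogates, which should be passed through unchanged.

```java
import java.util.Arrays;

class CodePointsCheck {
    // Hypothetical minimal CharSequence, so that the interface's default
    // codePoints() is what runs rather than an override such as String's.
    static final class ArraySeq implements CharSequence {
        private final char[] chars;

        ArraySeq(String s) {
            this.chars = s.toCharArray();
        }

        @Override
        public int length() {
            return chars.length;
        }

        @Override
        public char charAt(int index) {
            return chars[index];
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new ArraySeq(new String(chars, start, end - start));
        }

        @Override
        public String toString() {
            return new String(chars);
        }
    }

    // Collects the code points produced by the default method.
    public static int[] codePointsOf(String s) {
        return new ArraySeq(s).codePoints().toArray();
    }

    public static void main(String[] args) {
        // BMP only, one well-formed pair (U+1F600), a lone high surrogate
        // in the middle, and a lone low surrogate at the start.
        String[] inputs = { "abc", "a\uD83D\uDE00b", "a\uD83Db", "\uDE00a" };
        for (String s : inputs) {
            System.out.println(Arrays.toString(codePointsOf(s)));
        }
    }
}
```

A candidate iterator can be dropped in behind codePointsOf() and compared against the same inputs, for both the nextInt() path and the forEachRemaining() path.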

On Mon, Apr 29, 2013 at 9:47 AM, Henry Jen <henry.jen at oracle.com> wrote:

> Hi Martin,
>
> Thanks for the comment. I looked at this when I first saw a similar
> comment in the code, and didn't change it because charCount() is a
> small operation. The code is just:
>
> > codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
>
> Another reason I didn't change it is to avoid repeated code.
>
> I doubt there is much to gain. Can we follow up on this in a separate
> issue and get this version in first, before feature freeze?
>
> Cheers,
> Henry
>
> On 04/25/2013 04:14 PM, Martin Buchholz wrote:
> > I think core library code should use slightly lower-level code for
> > performance. Instead of
> >
> > +                int cp = Character.codePointAt(CharSequence.this, cur);
> > +                cur += Character.charCount(cp);
> >
> > I would write something like:
> >
> > int length = length();
> > if (cur == length) throw new NoSuchElementException();
> > char c1 = charAt(cur++), c2;
> > if (!isHighSurrogate(c1) || cur == length
> >         || !isLowSurrogate(c2 = charAt(cur)))
> >   return c1;
> > cur++;
> > return toCodePoint(c1, c2);
> >
> >
> >
> >
> > On Thu, Apr 25, 2013 at 1:25 PM, Henry Jen <henry.jen at oracle.com> wrote:
> >
> >     Hi,
> >
> >     Please review two default methods added to CharSequence that return
> >     an IntStream of char values or code point values.
> >
> >     http://cr.openjdk.java.net/~henryjen/tl/8012665.0/webrev/
> >
> >     The synchronization test is relaxed so that lambdas and other
> >     synthetic methods are not tested. If synchronization is needed for
> >     those two default methods, subclasses should override them.
> >
> >     With charAt and codePointAt properly synchronized, the default
> >     implementation is sufficient. However as noted, if the sequence is
> >     mutated while the stream is being read, the result is undefined.
> >
> >     Cheers,
> >     Henry
> >
> >
>
>