Proposed: StringUTF16 bug fix with optimization - Part 1 of 2, StringUTF16 Patch

Wed Mar 31 21:20:55 UTC 2021

Hi Chris,

Thank you for your contribution. I believe this can be divided into two 
parts, one is the bug in the current implementation, and the other is 
the enhancement to refactor the whole implementation for performance. I 
have created two JIRA issues for each:

https://bugs.openjdk.java.net/browse/JDK-8264544
https://bugs.openjdk.java.net/browse/JDK-8264545

The former one is really an embarassing one that I would want to fix 
right away. I think the latter one need more eyes. Comments from others 
are welcome.

Naoto

On 3/30/21 4:44 PM, Chris Johnson wrote:
> Historically, the methods currently known as "compareToCI" and
> "regionMatchesCI", and located in "java.lang.StringUTF16", have lacked
> support for Supplementary Multilingual Plane code-points. (I've seen no
> associated bug.)
> 
> On July 23, 2020 the first fix for the bug was committed. However, it
> includes two simple bugs of its own. They're not much more than typos,
> but they break some things nonetheless, as demonstrated by the unit
> tests comprising part 2 of this contribution.
> 
> (Those two bugs: In "StringUTF16.compareToCIImpl", change statements
> "cp1 -= cp1;" and "cp2 -= cp2;" to, respectively, "cp1 = -cp1;" and
> "cp2 = -cp2;", and those bugs are history.)
> 
> The fixed "compareToCI" and "regionMatchesCI" methods in this patch are
> different, however.[1] Notably:
> 
>    1. Case-insensitive UTF-16 string comparison and equality-testing is
>       2.2 to 2.9 times faster than the current methods, depending on data
>       set composition. (So, meaningfully faster, but the degree varies
>       with the strings compared.)
> 
>    2. 100% TestNG unit test coverage.
> 
>    3. A utility method, "compareCodePointsIgnoringCase", created for these
>       methods increased their performance by 24%, and could, in the future,
>       be applied easily to "StringLatin1" methods "compareToCI*", and
>       "regionMatchesCI*". (My guess is the performance gain there will be
>       smaller, but still useful.)
> 
> This patch to "StringUTF16" may appear quite large. For better or worse
> (your call, gentle reader), the vast majority of it is comments. The code
> itself is minimal by comparison.
> 
> Thanks in advance for your consideration,
> 
> ----Chris
> 
> Chris W. Johnson
> chriswjohnson.jdk at gmail.com
> http://www.panojohnson.com/
> 
> 
> Footnote:
> 
> 1. Work on this code began around 4 years ago, when I failed to find a
>     bug report link on the OpenJDK page, and supposed the message to devs
>     might be something like "contribute or go away". Since then, life,
>     death, work, learning to build & test OpenJDK, registering as a
>     contributor, social anxiety, unit test development, benchmarking,
>     experimentation, and being unable to look at my own code without
>     seeing *something* to improve (especially considering it might end-up
>     in the JDK), are among the reasons for the long delay in attempting
>     this contribution.
> 
> 
> 
> 
> 
> 
> Index: src/java.base/share/classes/java/lang/StringUTF16.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>US-ASCII
> ===================================================================
> diff --git a/src/java.base/share/classes/java/lang/StringUTF16.java
> b/src/java.base/share/classes/java/lang/StringUTF16.java
> --- a/src/java.base/share/classes/java/lang/StringUTF16.java    (revision
> 60819:ee1d592a9f5389725a0338a4b5dfcf4fc3fcf20c)
> +++ b/src/java.base/share/classes/java/lang/StringUTF16.java    (revision
> 60819+:ee1d592a9f53+)
> @@ -41,7 +41,32 @@
>   import static java.lang.String.LATIN1;
> 
>   final class StringUTF16 {
> -
> +
> +    /**
> +     * Number of low-order bits storing the payload of Unicode surrogate
> code-units.
> +     * (The number of surrogate header bits is {@link Character#SIZE}
> minus this value.)
> +     */
> +    private static final int    SURROGATE_PAYLOAD_BIT_COUNT = 10;
> +
> +    /**
> +     * The high six bits common to all high-surrogate code-units (bits
> 15-10) right-shifted into bits
> +     * 5-0 of an "{@code int}" primitive. ({@code 0b1101_10**_****_****
>>>> 10 == 0b11_0110 == 0x36})
> +     */
> +    private static final int    SURROGATE_HI_HEADER_RIGHT_SHIFTED_10 =
> Character.MIN_HIGH_SURROGATE >>> SURROGATE_PAYLOAD_BIT_COUNT;  // == 0x36
> +
> +    /**
> +     * Produces a Supplementary Multilingual Plane code-point when added
> to a valid surrogate-pair
> +     * combined as follows: {@code (highSurrogate <<
> SURROGATE_PAYLOAD_BIT_COUNT) + lowSurrogate}.
> +     * The result is undefined if either would-be surrogate is invalid
> (not a surrogate code-unit,
> +     * or the wrong surrogate type [low instead of high, or vice versa]).
> +     *
> +     * @see Character#toCodePoint(char, char)
> +     */
> +    private static final int    SURROGATE_COMBO_TO_CODE_POINT_DELTA =
>   Character.MIN_SUPPLEMENTARY_CODE_POINT
> +                                                                    -
> (Character.MIN_HIGH_SURROGATE << SURROGATE_PAYLOAD_BIT_COUNT)
> +                                                                    -
>   Character.MIN_LOW_SURROGATE;  // -56_613_888
> +
> +
>       public static byte[] newBytesFor(int len) {
>           if (len < 0) {
>               throw new NegativeArraySizeException();
> @@ -318,93 +343,429 @@
>           return -StringLatin1.compareToUTF16(other, value, len2, len1);
>       }
> 
> -    public static int compareToCI(byte[] value, byte[] other) {
> -        return compareToCIImpl(value, 0, length(value), other, 0,
> length(other));
> +    /**
> +     * Compares case-insensitively UTF-16 sequences in {@code byte}
> arrays. Performs
> +     * {@linkplain #compareCodePointsIgnoringCase case conversions
> equivalent to
> +     *
> <code>Character.toLowerCase(Character.toUpperCase(codePoint))</code>} before
> +     * comparing unequal code-points.
> +     *
> +     * @param charsA array of byte pairs representing 16-bit ({@code
> char}) quantities,
> +     *               with byte order mandated by {@code
> StringUTF16.isBigEndian()}
> +     * @param charsB array of byte pairs representing 16-bit ({@code
> char}) quantities,
> +     *               with byte order mandated by {@code
> StringUTF16.isBigEndian()}
> +     *
> +     * @return negative if {@code charsA} is lexicographically less than
> {@code charsB},
> +     * positive if it is greater, or zero if they are equal
> +     */
> +    static int compareToCI
> +    (
> +        final byte[]            charsA,
> +        final byte[]            charsB
> +    ){
> +    //  Consider: If the G1 garbage collector's string-deduplication mode
> is enabled, performance may
> +    //            be improved by first testing "charsA" and "charsB" for
> reference equality, and
> +    //            returning zero in that case. However, the same is
> equally true for all invocations
> +    //            of "String.compareToIgnoreCase" (not just those
> operating on two UTF-16 strings),
> +    //            so the test belongs in that method (ideally with magic
> causing the array reference
> +    //            equality test to occur only when G1 and its
> string-deduplication mode are active).
> +    //                Similarly modifying methods
> "String.equalsIgnoreCase" and "String.equals" may be
> +    //            even more beneficial, because they already include
> "String" object reference equality
> +    //            fast-paths, unlike "String.compareToIgnoreCase".
> +    //                Indeed, string-deduplication makes reference
> equality more likely between backing-
> +    //            store arrays than "String" objects, and therefore the
> more important fast-path test,
> +    //            provided real-world use of those methods frequently
> involves "String" objects whose
> +    //            tenure (number of garbage-collections survived) is >=
> the deduplication threshold (3
> +    //            by default, as of JDK 15, or the
> "-XX:StringDeduplicationAgeThreshold" value).
> +
> +        final int               lengthSignum = charsA.length -
> charsB.length;
> +
> +    //  If the array lengths are identical, return the "compareToCI"
> result; no additional
> +    //  considerations apply.
> +
> +        if (lengthSignum == 0)
> +            return compareToCI(charsA, 0, charsB, 0, length(charsA));
> +
> +    //  The array lengths differ. Compare the elements present in both
> "charsA" and "charsB".
> +    //  If the comparison result is inequality, return the comparison
> result. If the result
> +    //  is equality, the shorter array is considered less than the longer
> array.
> +
> +        final int               compareSignum = compareToCI(charsA, 0,
> charsB, 0, length(lengthSignum <= 0 ? charsA : charsB));
> +
> +    //  The right-shift (division by two) applied to "lengthSignum" below
> is necessary only
> +    //  if compatibility is required with code expecting specific return
> values (instead of
> +    //  any negative or positive value) from known comparisons.
> +
> +        return compareSignum != 0 ? compareSignum : lengthSignum >> 1;
>       }
> -
> -    private static int compareToCIImpl(byte[] value, int toffset, int tlen,
> -                                      byte[] other, int ooffset, int olen)
> {
> -        int tlast = toffset + tlen;
> -        int olast = ooffset + olen;
> -        assert toffset >= 0 && ooffset >= 0;
> -        assert tlast <= length(value);
> -        assert olast <= length(other);
> -
> -        for (int k1 = toffset, k2 = ooffset; k1 < tlast && k2 < olast;
> k1++, k2++) {
> -            int cp1 = (int)getChar(value, k1);
> -            int cp2 = (int)getChar(other, k2);
> -
> -            if (cp1 == cp2 || compareCodePointCI(cp1, cp2) == 0) {
> +
> +    /**
> +     * Compares case-insensitively UTF-16 sequences in {@code byte} array
> subsections.
> +     * Performs {@linkplain #compareCodePointsIgnoringCase case
> conversions equivalent
> +     * to
> <code>Character.toLowerCase(Character.toUpperCase(codePoint))</code>} before
> +     * comparing unequal code-points.
> +     *
> +     * @param charsA      array of byte pairs representing 16-bit ({@code
> char}) quantities,
> +     *                    with byte order mandated by {@code
> StringUTF16.isBigEndian()}
> +     * @param charsAIndex index, in 16-bit quantities, at which comparison
> of "{@code charsA}"
> +     *                    begins. Caller must enforce constraints "{@code
> 0 <= charsAIndex}"
> +     *                    and "{@code 0 <= charsAIndex + totalChars <=
> charsA.length / 2}".
> +     * @param charsB      array of byte pairs representing 16-bit ({@code
> char}) quantities,
> +     *                    with byte order mandated by {@code
> StringUTF16.isBigEndian()}
> +     * @param charsBIndex index, in 16-bit quantities, at which comparison
> of "{@code charsB}"
> +     *                    begins. Caller must enforce constraints "{@code
> 0 <= charsBIndex}"
> +     *                    and "{@code 0 <= charsBIndex + totalChars <=
> charsB.length / 2}".
> +     * @param totalChars  maximum number of {@code char}s to compare.
> Caller must enforce
> +     *                    constraint "{@code 0 <= totalChars}".
> +     *
> +     * @return negative if the compared portion of {@code charsA} is
> lexicographically less
> +     * than that of {@code charsB}, positive if it is greater, or zero if
> they are equal
> +     *
> +     * @implNote <p>This method relies on the following "<b>Unicode
> Empirical Properties</b>":</p>
> +     * <ol>
> +     *     <li>
> +     *         <p>Only code-points encoded using the same UTF-16
> high-surrogate may be case-insensitively equal.</p>
> +     *         <p>Several facts suggest sufficient proximity of code-point
> case-variants within the Supplementary
> +     *         Multilingual Plane (SMP):</p>
> +     *         <ul>
> +     *             <li>Related code-points are grouped into a Unicode
> "block".</li>
> +     *             <li>Blocks start at multiples of 256.</li>
> +     *             <li>High-surrogates encode 1,024 code-points starting
> with multiples of 1,024.</li>
> +     *             <li>Few SMP code-points are case-sensitive (450, as of
> Unicode 13.0.0).</li>
> +     *         </ul>
> +     *         <p>Thus, it is not surprising this property has held for 25
> years, as of February, 2021.</p>
> +     *     <li><p>For all SMP code-points, both {@code
> Character.toLowerCase(Character.toUpperCase(int))} and
> +     *         {@code Character.toUpperCase(int)} return a SMP code-point,
> never a Basic Multilingual Plane (BMP)
> +     *         code-point.</p>
> +     *         <p>Supporting points:</p>
> +     *         <ul>
> +     *             <li>This is expected due to related code-point
> grouping.</li>
> +     *             <li>If this property were ever invalidated, every
> Unicode case-insensitive equality test
> +     *             which requires equal-length input sequences (measured
> in {@code byte}s or {@code char}s)
> +     *             would break. In other words, this property is already
> utilized more-or-less universally by
> +     *             Unicode case-insensitive equality tests, whether or not
> their authors realize it. The
> +     *             Unicode Consortium has likely been aware of, and
> sensitive to, this issue for years.
> +     *             </li>
> +     *         </ul>
> +     *     </li>
> +     *     <li><p>For all BMP code-points, both {@code
> Character.toLowerCase(Character.toUpperCase(int))} and
> +     *         {@code Character.toUpperCase(int)} return a BMP code-point,
> never a SMP code-point.</p>
> +     *         <p>Supporting points:</p>
> +     *         <ul>
> +     *             <li>This is expected due to related code-point
> grouping.</li>
> +     *             <li>Most scripts affected by letter case were added
> early in Unicode's history, and therefore
> +     *             placed in the BMP. (As of Unicode 13.0.0, there are
> 2,349 case-sensitive code-points in the
> +     *             BMP, but only 450 in the SMP.)
> +     *             </li>
> +     *             <li>If this property were ever invalidated, every
> Unicode case-insensitive equality test
> +     *             which requires equal-length input sequences (measured
> in {@code byte}s or {@code char}s)
> +     *             would break. In other words, this property is already
> utilized more-or-less universally by
> +     *             Unicode case-insensitive equality tests, whether or not
> their authors realize it. The
> +     *             Unicode Consortium has likely been aware of, and
> sensitive to, this issue for years.
> +     *             </li>
> +     *         </ul>
> +     *     </li>
> +     * </ol>
> +     * <p>Validation of each property is performed automatically by unit
> tests in {@code CompareIC}
> +     * ("test/jdk/java/lang/String/CompareIC.java") during normal,
> post-build JDK testing. Those unit tests
> +     * methods are, respectively:
> +     * </p>
> +     * <ol>
> +     *     <li>{@code
> validatePremise_AllSMPCodePointCaseVariantsUseTheSameHighSurrogate}</li>
> +     *     <li>{@code
> validatePremise_AllSMPCodePointCaseVariantsAreSMPCodePoints}</li>
> +     *     <li>{@code
> validatePremise_AllBMPCodePointCaseVariantsAreBMPCodePoints}</li>
> +     * </ol>
> +     * <p>Those tests' Javadocs include discussion of the consequences
> should they ever fail.
> +     * </p>
> +     */
> +    private static int compareToCI
> +    (
> +        final byte[] charsA, int charsAIndex,
> +        final byte[] charsB, int charsBIndex, int totalChars
> +    ){
> +    //  Assertions carried-over from the previous "regionMatchesCI"
> implementation, with additional
> +    //  negative value testing. (Are these assertions still necessary or
> desirable?)
> +
> +        assert (charsAIndex | charsBIndex | totalChars) >= 0;  // Ensure
> these parameters are non-negative.
> +        assert  charsAIndex + totalChars <= length(charsA);
> +        assert  charsBIndex + totalChars <= length(charsB);
> +
> +    //  Variable "priorExactlyEqualChar" (PEEC) is used by the following
> loop to store the "char" value read
> +    //  by its prior iteration, if the same "char" value was read from
> both inputs ("charsA" and "charsB").
> +    //  Otherwise, it is set to zero. PEEC's value is used only during the
> UTF-16 decoding attempt triggered
> +    //  by reading a pair of unequal low-surrogates (see Unicode empirical
> property no. 1). It removes from
> +    //  this algorithm all read-behind/-ahead operations, along with their
> extra tests & branches, and
> +    //  redundant array accesses.
> +    //      Note: PEEC can be zero for three reasons: (1) the loop is in
> its first iteration, (2) the prior
> +    //  iteration read code-point zero from both inputs, or (3) the prior
> iteration read different, but case-
> +    //  insensitively equal, Basic Multilingual Plane code-points. In all
> three cases, zero signifies absence
> +    //  of preceding, identical high-surrogates (as does any
> non-high-surrogate value). Thus, a zero value's
> +    //  cause is ambiguous, but its meaning is clear.
> +
> +        int                     priorExactlyEqualChar = 0;  // AKA "PEEC".
> +
> +        while (--totalChars >= 0) {
> +            int                     cA = getChar(charsA, charsAIndex++);
> +            int                     cB = getChar(charsB, charsBIndex++);
> +
> +            if (cA == cB) {
> +
> +            //  If the next iteration's "cA" and "cB" are different
> low-surrogate code-units, it will need this
> +            //  iteration's value of "cA"/"cB" (their presumed
> high-surrogate code-unit), so it is stored in
> +            //  "priorExactlyEqualChar" (PEEC). In all other cases, this
> caching of "cA"/"cB" in PEEC is a small
> +            //  waste of time. However, the time saved by eliminating the
> tests, branches and reads necessary to
> +            //  reacquire pairs of high-surrogates should compensate for
> many unnecessary writes to PEEC.
> +
> +                priorExactlyEqualChar = cA;
>                   continue;
>               }
> -
> -            // Check for supplementary characters case
> -            cp1 = codePointIncluding(value, cp1, k1, toffset, tlast);
> -            if (cp1 < 0) {
> -                k1++;
> -                cp1 -= cp1;
> -            }
> -            cp2 = codePointIncluding(other, cp2, k2, ooffset, olast);
> -            if (cp2 < 0) {
> -                k2++;
> -                cp2 -= cp2;
> -            }
> -
> -            int diff = compareCodePointCI(cp1, cp2);
> -            if (diff != 0) {
> -                return diff;
> -            }
> -        }
> -        return tlen - olen;
> -    }
> -
> -    // Case insensitive comparison of two code points
> -    private static int compareCodePointCI(int cp1, int cp2) {
> -        // try converting both characters to uppercase.
> -        // If the results match, then the comparison scan should
> -        // continue.
> -        cp1 = Character.toUpperCase(cp1);
> -        cp2 = Character.toUpperCase(cp2);
> -        if (cp1 != cp2) {
> -            // Unfortunately, conversion to uppercase does not work
> properly
> -            // for the Georgian alphabet, which has strange rules about
> case
> -            // conversion.  So we need to make one last check before
> -            // exiting.
> -            cp1 = Character.toLowerCase(cp1);
> -            cp2 = Character.toLowerCase(cp2);
> -            if (cp1 != cp2) {
> -                return cp1 - cp2;
> +
> +        //  KNOWN: "cA" and "cB" are not precisely equal.
> +
> +        //  When "priorExactlyEqualChar" is a high-surrogate, and both
> "cA" and "cB" are low-surrogates, the
> +        //  following "if" block performs UTF-16 decoding, replacing the
> code-units in "cA" and "cB" with
> +        //  code-points. The "if" block then falls-through to the
> code-point case-insensitive comparison.
> +        //
> +        //  If any UTF-16 decoding precondition is unmet, "cA" and "cB"
> will contain Basic Multilingual Plane
> +        //  (BMP) code-points suitable for case-insensitive comparison,
> unless UTF-16 encoding is corrupt in
> +        //  "charsA" and/or "charsB". The following cases are possible:
> +        //      * "priorExactlyEqualChar" is not a high-surrogate, so the
> "if" block is not entered.
> +        //         Possible states of "cA" and "cB":
> +        //          * "cA" and "cB" are unequal BMP code-points,
> potentially case-insensitively equal.
> +        //          * "cA" and "cB" are unequal high-surrogates.
> Case-insensitive equality testing inevitably
> +        //            fails for those code-units, terminating this method.
> Low-surrogate retrieval and UTF-16
> +        //            decoding are unnecessary due to Unicode empirical
> property no. 1.
> +        //          * "cA" and "cB" are unequal low-surrogates. UTF-16
> encoding is corrupt in both "charsA"
> +        //            and "charsB" (low-surrogates not preceded by
> high-surrogates). Case-insensitive equality
> +        //            testing inevitably fails for those code-units,
> terminating this method.
> +        //          * Either "cA" or "cB" is a low-surrogate, while the
> other is a high-surrogate or BMP code-
> +        //            point. UTF-16 encoding is corrupt in the array
> supplying the low-surrogate (no preceding
> +        //            high-surrogate). Case-insensitive equality testing
> inevitably fails, terminating this
> +        //            method.
> +        //      * "priorExactlyEqualChar" is a high-surrogate present in
> both "charsA" and "charsB".
> +        //        Possible states of "cA" and "cB":
> +        //          * "cA" and "cB" are unequal low-surrogates. The "if"
> block is entered, converting "cA"
> +        //            and "cB" into Supplementary Multilingual Plane (SMP)
> code-points, potentially case-
> +        //            insensitively equal.
> +        //          * "cA" and "cB" are unequal high-surrogates. UTF-16
> encoding is corrupt in both "charsA"
> +        //            and "charsB" (high-surrogates following
> high-surrogates). Even if the next loop
> +        //            iteration would retrieve low-surrogates, Unicode
> empirical property no. 1 rules-out
> +        //            case-insensitive equality based on the unequal
> high-surrogates alone.
> +        //          * "cA" and "cB" are unequal BMP code-points,
> potentially case-insensitively equal. UTF-16
> +        //            encoding is corrupt in both "charsA" and "charsB"
> (high-surrogates not followed by low-
> +        //            surrogates), but the corruption is ignored because
> the high-surrogates were identical
> +        //            (equality exists thus far, corrupt encoding
> notwithstanding).
> +        //          * Either "cA" or "cB" is a low-surrogate, while the
> other is a high-surrogate or BMP code-
> +        //            point. UTF-16 encoding is corrupt in the array
> supplying the non-low-surrogate (high-
> +        //            surrogate not followed by low-surrogate). UTF-16
> decoding using the lone low-surrogate
> +        //            is unnecessary, because either:
> +        //              1. The non-low-surrogate is a high-surrogate,
> ruling-out case-insensitive equality.
> +        //                 (Two independent rationales: [1] A
> high-surrogate is not a low-surrogate, by
> +        //                 definition. Also by definition, [2] an SMP
> code-point--decoded from high-surrogate
> +        //                 "priorExactlyEqualChar" and the lone
> low-surrogate--is not a high-surrogate code-
> +        //                 unit.)
> +        //              2. The non-low-surrogate is a BMP code-point.
> Unicode empirical properties 2 and 3
> +        //                 rule-out case-insensitive equality between SMP
> and BMP code-points.
> +        //                     (Alternately, if those properties did not
> exist, deciding equality exists based
> +        //                 on code-points at different positions within
> the compared string sections--whose
> +        //                 comparison requires overlooking an inconvenient
> preceding code-unit in one string--
> +        //                 would be logically dubious on multiple grounds.
> For instance, if one preceding
> +        //                 orphan surrogate can be ignored to prove
> equality, is the same true for two or more
> +        //                 orphans consecutively and/or separately? If so,
> existing code breaks. Notably, most
> +        //                 string equality tests would break due to
> invalidation of their equal-length strings
> +        //                 precondition. [A precondition also invalidated
> if the Unicode Consortium ever
> +        //                 asserts case-insensitive equality between a BMP
> and SMP code-point. Thus, Unicode
> +        //                 empirical properties 2 and 3 will likely
> persist.])
> +        //
> +        //  (Aside: Of the following three tests, only the test of
> "priorExactlyEqualChar" would be necessary
> +        //  if Unicode [non-compact] "String" objects guaranteed valid
> UTF-16 encoding. It seems certain
> +        //  enforcing that guarantee during creation of every "String" [or
> string-like object] is infeasible
> +        //  due to runtime cost, and backwards-incompatibility.
> Nonetheless, this writer [CWJ] is tantalized
> +        //  by the many code simplifications guaranteed-correct "String"
> objects would permit, as well as the
> +        //  clues they could confer on some developers [including this
> writer, once upon a time].)
> +
> +        //  Branch Reduction
> +        //
> +        //  The "if" statement used below has two tests/branches, instead
> of three, due to combining two
> +        //  low-surrogate tests into one. Multiple comparative benchmarks
> (each using 300 random data sets
> +        //  of 200,000 comparisons, repeated 800 times [160 million
> comparisons per data set; 48 billion
> +        //  total comparisons]), run on two different OS & hardware
> combinations, yielded respective best-
> +        //  case improvements of 1.24% and 0.98%, and mean improvements of
> 1.79% and 0.62%, relative to
> +        //  three-branch "if" statement:
> +        //
> +        //      if (priorExactlyEqualChar >>> SURROGATE_PAYLOAD_BIT_COUNT
> == SURROGATE_HI_HEADER_RIGHT_SHIFTED_10
> +        //      &&  cA                    >>> SURROGATE_PAYLOAD_BIT_COUNT
> == SURROGATE_LO_HEADER_RIGHT_SHIFTED_10
> +        //      &&  cB                    >>> SURROGATE_PAYLOAD_BIT_COUNT
> == SURROGATE_LO_HEADER_RIGHT_SHIFTED_10)
> +        //
> +        //  Conversely, combining all three tests into one yields worse
> performance than the three-test
> +        //  "if" shown above, because high-surrogate testing of
> "priorExactlyEqualChar" fails frequently
> +        //  (50% minimum failure rate for well-formed UTF-16), and, when
> it fails, low-surrogate testing
> +        //  of "cA" and "cB" wastes time.
> +
> +            if (priorExactlyEqualChar >>> SURROGATE_PAYLOAD_BIT_COUNT ==
> SURROGATE_HI_HEADER_RIGHT_SHIFTED_10
> +            && (cA ^ Character.MIN_LOW_SURROGATE | cB ^
> Character.MIN_LOW_SURROGATE) >>> SURROGATE_PAYLOAD_BIT_COUNT == 0
> +            ){
> +            //  KNOWN: "cA" and "cB" are not precisely equal.
> +            //  KNOWN: "priorExactlyEqualChar" is a high-surrogate
> code-unit.
> +            //  KNOWN: "cA" and "cB" are low-surrogate code-units. (Bits
> 15 to 10 of both are 110111.)
> +
> +            //  Perform UTF-16 decoding, accelerated by repurposing
> "priorExactlyEqualChar" (PEEC) to store
> +            //  the result of decoding operations common to both "cA" and
> "cB" (because they share the high-
> +            //  surrogate in PEEC). Near-total commonality of operations
> enables decoding two code-points
> +            //  using only one more addition operation than decoding a
> single code-point. (PEEC's repurposing
> +            //  is brief; it is always set to zero immediately after this
> "if" block.)
> +
> +                priorExactlyEqualChar = (priorExactlyEqualChar <<
> SURROGATE_PAYLOAD_BIT_COUNT) + SURROGATE_COMBO_TO_CODE_POINT_DELTA;
> +
> +                cA += priorExactlyEqualChar;
> +                cB += priorExactlyEqualChar;
>               }
> +
> +        //  "cA" and "cB" are not precisely equal, therefore set
> "priorExactlyEqualChar" (PEEC) to zero for
> +        //  the next iteration's potential benefit. ("Potential benefit,"
> because the next iteration, if any,
> +        //  uses PEEC's value only when its "cA" and "cB" are different
> low-surrogate code-units.)
> +
> +            priorExactlyEqualChar = 0;
> +
> +        //  Having performed UTF-16 decoding (if necessary and possible),
> compare case-insensitively code-
> +        //  points "cA" and "cB".
> +        //
> +        //  Note: Code-units are also possible. Normal terminating
> conditions include unequal high-surrogates
> +        //        (see Unicode empirical property no. 1), and a
> high-surrogate/code-point combination. An
> +        //        abnormal terminating condition is a low-surrogate and a
> code-point, meaning at least one
> +        //        input string's UTF-16 encoding is corrupt (a
> low-surrogate not preceded by a high-surrogate,
> +        //        or matching high-surrogates not followed--in one
> string--by a low-surrogate). None of those
> +        //        conditions undermines the code below, because
> "compareCodePointsIgnoringCase" evaluates
> +        //        "caseless" inputs unchanged, including code-units.
> +
> +            cA = compareCodePointsIgnoringCase(cA, cB);
> +            if (cA != 0)
> +                return cA;
>           }
> +
>           return 0;
>       }
> -
> -    // Returns a code point from the code unit pointed by "index". If it is
> -    // not a surrogate or an unpaired surrogate, then the code unit is
> -    // returned as is. Otherwise, it is combined with the code unit before
> -    // or after, depending on the type of the surrogate at index, to make a
> -    // supplementary code point. The return value will be negated if the
> code
> -    // unit pointed by index is a high surrogate, and index + 1 is a low
> surrogate.
> -    private static int codePointIncluding(byte[] ba, int cp, int index,
> int start, int end) {
> -        // fast check
> -        if (!Character.isSurrogate((char)cp)) {
> -            return cp;
> -        }
> -        if (Character.isLowSurrogate((char)cp)) {
> -            if (index > start) {
> -                char c = getChar(ba, index - 1);
> -                if (Character.isHighSurrogate(c)) {
> -                    return Character.toCodePoint(c, (char)cp);
> -                }
> -            }
> -        } else if (index + 1 < end) { // cp == high surrogate
> -            char c = getChar(ba, index + 1);
> -            if (Character.isLowSurrogate(c)) {
> -                // negate the code point
> -                return - Character.toCodePoint((char)cp, c);
> -            }
> -        }
> -        return cp;
> +
> +    /**
> +     * Gets the case-insensitive, lexicographical relationship between
> code-points.
> +     * Though not required, passing this method only unequal code-points
> ("{@code
> +     * codePointA != codePointB}") is most efficient.
> +     *
> +     * @param codePointA first code-point
> +     * @param codePointB second code-point
> +     *
> +     * @return "{@code codePointA - codePointB}" after both undergo
> case-conversion equivalent
> +     * to, but faster than, {@code
> Character.toLowerCase(Character.toUpperCase(codePoint))}
> +     *
> +     * @implNote This method uses two, duplicate {@code switch}
> expressions, because
> +     * benchmarking showed a noticeable performance improvement over a
> single {@code
> +     * switch} wrapped in the most minimal of two-iteration loops. The
> duplication
> +     * should create no maintenance problems, because, as detailed by a
> comment within
> +     * the method, the {@code switch} source code is programmatically
> generated when a
> +     * "CompareIC" unit test detects a mishandled "triple-case"
> code-point. Otherwise,
> +     * modifying these expressions is unnecessary. (The unit test does not
> alter this,
> +     * or any, file; it merely includes updated {@code switch} source in
> its error
> +     * message, leaving a human responsible for pasting it here, twice.)
> +     */
> +    static int compareCodePointsIgnoringCase
> +    (
> +        final int               codePointA,
> +        final int               codePointB
> +    ){
> +    /*  A code-point is here dubbed "triple-case" if "codePoint !=
> Character.toUpperCase(codePoint) &&
> +        codePoint !=
> Character.toLowerCase(Character.toUpperCase(codePoint))". Such code-points
> are rare
> +        (see "switch" expressions below), but force most code-point
> comparisons to invoke both "toUpperCase"
> +        and "toLowerCase". If "triple-case" code-points did not exist,
> invoking only "toLowerCase" would
> +        suffice.
> +
> +        This method comes close to realizing the "no triple-case
> code-points" scenario by directly converting
> +        "triple-case" code-points to their final, lowercase forms using
> "switch" expressions, and invoking
> +        only "toLowerCase" on all other code-points. The effect is a
> benchmark performance gain of 24.3%.
> +
> +        Note: The Georgian alphabet includes no "triple-case" code-points,
> as shown below by the "progression"
> +              columns of the "switch" expressions' comments. (As of
> Unicode 13.0.0, there are three Georgian
> +              blocks: "Georgian" 10A0-10FF, "Georgian Extended" 1C90-1CBF,
> and "Georgian Supplement" 2D00-2D2F.)
> +
> +
> +        TESTING AND CODE GENERATION
> +
> +        A unit test in "CompareIC"
> ("test/jdk/java/lang/String/CompareIC.java") searches all 1,114,112 Unicode
> +        code-points/-units for triple-case code-points, then validates
> their handling by "String.equalsIgnoreCase"
> +        and "String.compareToIgnoreCase". If errors are detected, those
> tests fail with a message detailing each
> +        error and supplying freshly generated source code for the "switch"
> expression used twice below. (If the
> +        new and existing set of "case" clauses differ, pasting the former
> over the latter will fix the problem.
> +        If "case" clauses are identical, the problem lies elsewhere.)
> +    */
> +                                          //  Triple-Case Code-Points, as
> of Java 15.0.2. (Written by
> "java.lang.CompareIC.generateTripleCaseCodePointsSwitchExpression".)
> +                                          //
> +                                          //  Code-Point Progression
>     Code-Point Progression by Name
> +        return switch (codePointA) {      //  --------------------------
> 
>   -------------------------------------------------------------------------------------------------------------------------------------
> +            case '\u00B5' -> '\u03BC';    //  U+00B5 -> U+039C -> U+03BC
>     MICRO SIGN                                            -> GREEK CAPITAL
> LETTER MU               -> GREEK SMALL LETTER MU
> +            case '\u0131' -> '\u0069';    //  U+0131 -> U+0049 -> U+0069
>     LATIN SMALL LETTER DOTLESS I                          -> LATIN CAPITAL
> LETTER I                -> LATIN SMALL LETTER I
> +            case '\u017F' -> '\u0073';    //  U+017F -> U+0053 -> U+0073
>     LATIN SMALL LETTER LONG S                             -> LATIN CAPITAL
> LETTER S                -> LATIN SMALL LETTER S
> +            case '\u01C5' -> '\u01C6';    //  U+01C5 -> U+01C4 -> U+01C6
>     LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON -> LATIN CAPITAL
> LETTER DZ WITH CARON    -> LATIN SMALL LETTER DZ WITH CARON
> +            case '\u01C8' -> '\u01C9';    //  U+01C8 -> U+01C7 -> U+01C9
>     LATIN CAPITAL LETTER L WITH SMALL LETTER J            -> LATIN CAPITAL
> LETTER LJ               -> LATIN SMALL LETTER LJ
> +            case '\u01CB' -> '\u01CC';    //  U+01CB -> U+01CA -> U+01CC
>     LATIN CAPITAL LETTER N WITH SMALL LETTER J            -> LATIN CAPITAL
> LETTER NJ               -> LATIN SMALL LETTER NJ
> +            case '\u01F2' -> '\u01F3';    //  U+01F2 -> U+01F1 -> U+01F3
>     LATIN CAPITAL LETTER D WITH SMALL LETTER Z            -> LATIN CAPITAL
> LETTER DZ               -> LATIN SMALL LETTER DZ
> +            case '\u0345' -> '\u03B9';    //  U+0345 -> U+0399 -> U+03B9
>     COMBINING GREEK YPOGEGRAMMENI                         -> GREEK CAPITAL
> LETTER IOTA             -> GREEK SMALL LETTER IOTA
> +            case '\u03C2' -> '\u03C3';    //  U+03C2 -> U+03A3 -> U+03C3
>     GREEK SMALL LETTER FINAL SIGMA                        -> GREEK CAPITAL
> LETTER SIGMA            -> GREEK SMALL LETTER SIGMA
> +            case '\u03D0' -> '\u03B2';    //  U+03D0 -> U+0392 -> U+03B2
>     GREEK BETA SYMBOL                                     -> GREEK CAPITAL
> LETTER BETA             -> GREEK SMALL LETTER BETA
> +            case '\u03D1' -> '\u03B8';    //  U+03D1 -> U+0398 -> U+03B8
>     GREEK THETA SYMBOL                                    -> GREEK CAPITAL
> LETTER THETA            -> GREEK SMALL LETTER THETA
> +            case '\u03D5' -> '\u03C6';    //  U+03D5 -> U+03A6 -> U+03C6
>     GREEK PHI SYMBOL                                      -> GREEK CAPITAL
> LETTER PHI              -> GREEK SMALL LETTER PHI
> +            case '\u03D6' -> '\u03C0';    //  U+03D6 -> U+03A0 -> U+03C0
>     GREEK PI SYMBOL                                       -> GREEK CAPITAL
> LETTER PI               -> GREEK SMALL LETTER PI
> +            case '\u03F0' -> '\u03BA';    //  U+03F0 -> U+039A -> U+03BA
>     GREEK KAPPA SYMBOL                                    -> GREEK CAPITAL
> LETTER KAPPA            -> GREEK SMALL LETTER KAPPA
> +            case '\u03F1' -> '\u03C1';    //  U+03F1 -> U+03A1 -> U+03C1
>     GREEK RHO SYMBOL                                      -> GREEK CAPITAL
> LETTER RHO              -> GREEK SMALL LETTER RHO
> +            case '\u03F5' -> '\u03B5';    //  U+03F5 -> U+0395 -> U+03B5
>     GREEK LUNATE EPSILON SYMBOL                           -> GREEK CAPITAL
> LETTER EPSILON          -> GREEK SMALL LETTER EPSILON
> +            case '\u1C80' -> '\u0432';    //  U+1C80 -> U+0412 -> U+0432
>     CYRILLIC SMALL LETTER ROUNDED VE                      -> CYRILLIC
> CAPITAL LETTER VE            -> CYRILLIC SMALL LETTER VE
> +            case '\u1C81' -> '\u0434';    //  U+1C81 -> U+0414 -> U+0434
>     CYRILLIC SMALL LETTER LONG-LEGGED DE                  -> CYRILLIC
> CAPITAL LETTER DE            -> CYRILLIC SMALL LETTER DE
> +            case '\u1C82' -> '\u043E';    //  U+1C82 -> U+041E -> U+043E
>     CYRILLIC SMALL LETTER NARROW O                        -> CYRILLIC
> CAPITAL LETTER O             -> CYRILLIC SMALL LETTER O
> +            case '\u1C83' -> '\u0441';    //  U+1C83 -> U+0421 -> U+0441
>     CYRILLIC SMALL LETTER WIDE ES                         -> CYRILLIC
> CAPITAL LETTER ES            -> CYRILLIC SMALL LETTER ES
> +            case '\u1C84' -> '\u0442';    //  U+1C84 -> U+0422 -> U+0442
>     CYRILLIC SMALL LETTER TALL TE                         -> CYRILLIC
> CAPITAL LETTER TE            -> CYRILLIC SMALL LETTER TE
> +            case '\u1C85' -> '\u0442';    //  U+1C85 -> U+0422 -> U+0442
>     CYRILLIC SMALL LETTER THREE-LEGGED TE                 -> CYRILLIC
> CAPITAL LETTER TE            -> CYRILLIC SMALL LETTER TE
> +            case '\u1C86' -> '\u044A';    //  U+1C86 -> U+042A -> U+044A
>     CYRILLIC SMALL LETTER TALL HARD SIGN                  -> CYRILLIC
> CAPITAL LETTER HARD SIGN     -> CYRILLIC SMALL LETTER HARD SIGN
> +            case '\u1C87' -> '\u0463';    //  U+1C87 -> U+0462 -> U+0463
>     CYRILLIC SMALL LETTER TALL YAT                        -> CYRILLIC
> CAPITAL LETTER YAT           -> CYRILLIC SMALL LETTER YAT
> +            case '\u1C88' -> '\uA64B';    //  U+1C88 -> U+A64A -> U+A64B
>     CYRILLIC SMALL LETTER UNBLENDED UK                    -> CYRILLIC
> CAPITAL LETTER MONOGRAPH UK  -> CYRILLIC SMALL LETTER MONOGRAPH UK
> +            case '\u1E9B' -> '\u1E61';    //  U+1E9B -> U+1E60 -> U+1E61
>     LATIN SMALL LETTER LONG S WITH DOT ABOVE              -> LATIN CAPITAL
> LETTER S WITH DOT ABOVE -> LATIN SMALL LETTER S WITH DOT ABOVE
> +            case '\u1FBE' -> '\u03B9';    //  U+1FBE -> U+0399 -> U+03B9
>     GREEK PROSGEGRAMMENI                                  -> GREEK CAPITAL
> LETTER IOTA             -> GREEK SMALL LETTER IOTA
> +                                          //  --------------------------
> 
>   -------------------------------------------------------------------------------------------------------------------------------------
> +        //  All other case-sensitive code-points are either uppercase (in
> which case they are changed
> +        //  below), or lowercase already (in which case they are not).
> Therefore, only "toLowerCase"
> +        //  is necessary. Code-units and case-insensitive code-points are
> unchanged by "toLowerCase".
> +            default -> Character.toLowerCase(codePointA);
> +
> +        } - switch (codePointB) {         //  --------------------------
> 
>   -------------------------------------------------------------------------------------------------------------------------------------
> +            case '\u00B5' -> '\u03BC';    //  U+00B5 -> U+039C -> U+03BC
>     MICRO SIGN                                            -> GREEK CAPITAL
> LETTER MU               -> GREEK SMALL LETTER MU
> +            case '\u0131' -> '\u0069';    //  U+0131 -> U+0049 -> U+0069
>     LATIN SMALL LETTER DOTLESS I                          -> LATIN CAPITAL
> LETTER I                -> LATIN SMALL LETTER I
> +            case '\u017F' -> '\u0073';    //  U+017F -> U+0053 -> U+0073
>     LATIN SMALL LETTER LONG S                             -> LATIN CAPITAL
> LETTER S                -> LATIN SMALL LETTER S
> +            case '\u01C5' -> '\u01C6';    //  U+01C5 -> U+01C4 -> U+01C6
>     LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON -> LATIN CAPITAL
> LETTER DZ WITH CARON    -> LATIN SMALL LETTER DZ WITH CARON
> +            case '\u01C8' -> '\u01C9';    //  U+01C8 -> U+01C7 -> U+01C9
>     LATIN CAPITAL LETTER L WITH SMALL LETTER J            -> LATIN CAPITAL
> LETTER LJ               -> LATIN SMALL LETTER LJ
> +            case '\u01CB' -> '\u01CC';    //  U+01CB -> U+01CA -> U+01CC
>     LATIN CAPITAL LETTER N WITH SMALL LETTER J            -> LATIN CAPITAL
> LETTER NJ               -> LATIN SMALL LETTER NJ
> +            case '\u01F2' -> '\u01F3';    //  U+01F2 -> U+01F1 -> U+01F3
>     LATIN CAPITAL LETTER D WITH SMALL LETTER Z            -> LATIN CAPITAL
> LETTER DZ               -> LATIN SMALL LETTER DZ
> +            case '\u0345' -> '\u03B9';    //  U+0345 -> U+0399 -> U+03B9
>     COMBINING GREEK YPOGEGRAMMENI                         -> GREEK CAPITAL
> LETTER IOTA             -> GREEK SMALL LETTER IOTA
> +            case '\u03C2' -> '\u03C3';    //  U+03C2 -> U+03A3 -> U+03C3
>     GREEK SMALL LETTER FINAL SIGMA                        -> GREEK CAPITAL
> LETTER SIGMA            -> GREEK SMALL LETTER SIGMA
> +            case '\u03D0' -> '\u03B2';    //  U+03D0 -> U+0392 -> U+03B2
>     GREEK BETA SYMBOL                                     -> GREEK CAPITAL
> LETTER BETA             -> GREEK SMALL LETTER BETA
> +            case '\u03D1' -> '\u03B8';    //  U+03D1 -> U+0398 -> U+03B8
>     GREEK THETA SYMBOL                                    -> GREEK CAPITAL
> LETTER THETA            -> GREEK SMALL LETTER THETA
> +            case '\u03D5' -> '\u03C6';    //  U+03D5 -> U+03A6 -> U+03C6
>     GREEK PHI SYMBOL                                      -> GREEK CAPITAL
> LETTER PHI              -> GREEK SMALL LETTER PHI
> +            case '\u03D6' -> '\u03C0';    //  U+03D6 -> U+03A0 -> U+03C0
>     GREEK PI SYMBOL                                       -> GREEK CAPITAL
> LETTER PI               -> GREEK SMALL LETTER PI
> +            case '\u03F0' -> '\u03BA';    //  U+03F0 -> U+039A -> U+03BA
>     GREEK KAPPA SYMBOL                                    -> GREEK CAPITAL
> LETTER KAPPA            -> GREEK SMALL LETTER KAPPA
> +            case '\u03F1' -> '\u03C1';    //  U+03F1 -> U+03A1 -> U+03C1
>     GREEK RHO SYMBOL                                      -> GREEK CAPITAL
> LETTER RHO              -> GREEK SMALL LETTER RHO
> +            case '\u03F5' -> '\u03B5';    //  U+03F5 -> U+0395 -> U+03B5
>     GREEK LUNATE EPSILON SYMBOL                           -> GREEK CAPITAL
> LETTER EPSILON          -> GREEK SMALL LETTER EPSILON
> +            case '\u1C80' -> '\u0432';    //  U+1C80 -> U+0412 -> U+0432
>     CYRILLIC SMALL LETTER ROUNDED VE                      -> CYRILLIC
> CAPITAL LETTER VE            -> CYRILLIC SMALL LETTER VE
> +            case '\u1C81' -> '\u0434';    //  U+1C81 -> U+0414 -> U+0434
>     CYRILLIC SMALL LETTER LONG-LEGGED DE                  -> CYRILLIC
> CAPITAL LETTER DE            -> CYRILLIC SMALL LETTER DE
> +            case '\u1C82' -> '\u043E';    //  U+1C82 -> U+041E -> U+043E
>     CYRILLIC SMALL LETTER NARROW O                        -> CYRILLIC
> CAPITAL LETTER O             -> CYRILLIC SMALL LETTER O
> +            case '\u1C83' -> '\u0441';    //  U+1C83 -> U+0421 -> U+0441
>     CYRILLIC SMALL LETTER WIDE ES                         -> CYRILLIC
> CAPITAL LETTER ES            -> CYRILLIC SMALL LETTER ES
> +            case '\u1C84' -> '\u0442';    //  U+1C84 -> U+0422 -> U+0442
>     CYRILLIC SMALL LETTER TALL TE                         -> CYRILLIC
> CAPITAL LETTER TE            -> CYRILLIC SMALL LETTER TE
> +            case '\u1C85' -> '\u0442';    //  U+1C85 -> U+0422 -> U+0442
>     CYRILLIC SMALL LETTER THREE-LEGGED TE                 -> CYRILLIC
> CAPITAL LETTER TE            -> CYRILLIC SMALL LETTER TE
> +            case '\u1C86' -> '\u044A';    //  U+1C86 -> U+042A -> U+044A
>     CYRILLIC SMALL LETTER TALL HARD SIGN                  -> CYRILLIC
> CAPITAL LETTER HARD SIGN     -> CYRILLIC SMALL LETTER HARD SIGN
> +            case '\u1C87' -> '\u0463';    //  U+1C87 -> U+0462 -> U+0463
>     CYRILLIC SMALL LETTER TALL YAT                        -> CYRILLIC
> CAPITAL LETTER YAT           -> CYRILLIC SMALL LETTER YAT
> +            case '\u1C88' -> '\uA64B';    //  U+1C88 -> U+A64A -> U+A64B
>     CYRILLIC SMALL LETTER UNBLENDED UK                    -> CYRILLIC
> CAPITAL LETTER MONOGRAPH UK  -> CYRILLIC SMALL LETTER MONOGRAPH UK
> +            case '\u1E9B' -> '\u1E61';    //  U+1E9B -> U+1E60 -> U+1E61
>     LATIN SMALL LETTER LONG S WITH DOT ABOVE              -> LATIN CAPITAL
> LETTER S WITH DOT ABOVE -> LATIN SMALL LETTER S WITH DOT ABOVE
> +            case '\u1FBE' -> '\u03B9';    //  U+1FBE -> U+0399 -> U+03B9
>     GREEK PROSGEGRAMMENI                                  -> GREEK CAPITAL
> LETTER IOTA             -> GREEK SMALL LETTER IOTA
> +                                          //  --------------------------
> 
>   -------------------------------------------------------------------------------------------------------------------------------------
> +        //  All other case-sensitive code-points are either uppercase (in
> which case they are changed
> +        //  below), or lowercase already (in which case they are not).
> Therefore, only "toLowerCase"
> +        //  is necessary. Code-units and case-insensitive code-points are
> unchanged by "toLowerCase".
> +            default -> Character.toLowerCase(codePointB);
> +        };
>       }
> 
>       public static int compareToCI_Latin1(byte[] value, byte[] other) {
> @@ -780,10 +1141,35 @@
>           }
>           return new String(result, UTF16);
>       }
> -
> -    public static boolean regionMatchesCI(byte[] value, int toffset,
> -                                          byte[] other, int ooffset, int
> len) {
> -        return compareToCIImpl(value, toffset, len, other, ooffset, len)
> == 0;
> +
> +    /**
> +     * Tests case-insensitive equality of UTF-16 sequences in {@code byte}
> array subsections.
> +     * Performs {@linkplain #compareCodePointsIgnoringCase case
> conversions equivalent to
> +     *
> <code>Character.toLowerCase(Character.toUpperCase(codePoint))</code>}
> before comparing
> +     * unequal code-points.
> +     *
> +     * @param charsA      array of byte pairs representing 16-bit ({@code
> char}) quantities,
> +     *                    with byte order mandated by {@code
> StringUTF16.isBigEndian()}
> +     * @param charsAIndex index, in 16-bit quantities, at which comparison
> of "{@code charsA}"
> +     *                    begins. Caller must enforce constraints "{@code
> 0 <= charsAIndex}"
> +     *                    and "{@code 0 <= charsAIndex + totalChars <=
> charsA.length / 2}".
> +     * @param charsB      array of byte pairs representing 16-bit ({@code
> char}) quantities,
> +     *                    with byte order mandated by {@code
> StringUTF16.isBigEndian()}
> +     * @param charsBIndex index, in 16-bit quantities, at which comparison
> of "{@code charsB}"
> +     *                    begins. Caller must enforce constraints "{@code
> 0 <= charsBIndex}"
> +     *                    and "{@code 0 <= charsBIndex + totalChars <=
> charsB.length / 2}".
> +     * @param totalChars  maximum number of {@code char}s to compare.
> Caller must enforce
> +     *                    constraint "{@code 0 <= totalChars}".
> +     *
> +     * @return {@code true} if the tested portions of {@code charsA} and
> {@code charsB} are
> +     * case-insensitively equal, otherwise {@code false}
> +     */
> +    static boolean regionMatchesCI
> +    (
> +        final byte[] charsA, final int charsAIndex,
> +        final byte[] charsB, final int charsBIndex, final int totalChars
> +    ){
> +        return compareToCI(charsA, charsAIndex, charsB, charsBIndex,
> totalChars) == 0;
>       }
> 
>       public static boolean regionMatchesCI_Latin1(byte[] value, int toffset,
>