RFR: 8367129: Move input validation checks to Java for java.lang.StringLatin1 intrinsics [v2]
Volkan Yazici
vyazici at openjdk.org
Fri Feb 13 18:20:26 UTC 2026
On Fri, 13 Feb 2026 12:46:01 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:
>> Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Apply review feedback
>
> src/java.base/share/classes/java/lang/StringLatin1.java line 174:
>
>> 172: String.checkOffset(len2, length(other));
>> 173: int lim = Math.min(len1, len2);
>> 174: int k = ArraysSupport.mismatch(value, other, lim);
>
> From the `mismatch()` doc it isn't clear that the returned index (if not negative) is the _smallest_ one locating a mismatch.
> Without this guarantee, this `compareTo()` method might return an incorrect value.
Created [JDK-8377903] to address this.
[JDK-8377903]: https://bugs.openjdk.org/browse/JDK-8377903
> src/java.base/share/classes/java/lang/StringLatin1.java line 199:
>
>> 197: // inline_string_compareTo
>> 198: @IntrinsicCandidate
>> 199: private static int compareToUTF16_0(byte[] value, byte[] other) {
>
> I guess the `_` is to visually separate the 16 from the conventional 0?
> Other intrinsics names, e.g., `compareTo0()` do not use a separator.
Correct. I follow the convention of suffixing `@IntrinsicCandidate`-annotated methods with `0`. In this particular case, I thought `compareToUTF160` would be too ambiguous, and hence opted for the `_0` suffix.
> src/java.base/share/classes/java/lang/StringLatin1.java line 242:
>
>> 240: static int compareToCI(byte[] value, byte[] other) {
>> 241: int len1 = value.length; // Implicit null check on `value`
>> 242: int len2 = other.length; // Implicit null check on `other`
>
> Alternatively, if you prefer
>
> Suggestion:
>
> int len1 = value.length; // fail fast on null
> int len2 = other.length; // fail fast on null
The `implicit null check` phrase was recommended by @stuart-marks, and I've been sticking to it since then in [JDK-8156534].
[JDK-8156534]: https://bugs.openjdk.org/browse/JDK-8156534
> src/java.base/share/classes/java/lang/StringLatin1.java line 265:
>
>> 263: static int compareToCI_UTF16(byte[] value, byte[] other) {
>> 264: Objects.requireNonNull(value);
>> 265: Objects.requireNonNull(other);
>
> You could leave these checks out and rely on the following two lines for a fast fail on `null`, similarly to the latin1 case above.
> But adding the checks looks cleaner.
You're right, both `length` and `StringUTF16::length` indirectly check for nullability. But in this particular case, I've leaned towards explicit checks (hence, less cognitive load while looking at the method body) and flagging the invalid input as early as possible in the call stack.
> src/java.base/share/classes/java/lang/StringLatin1.java line 428:
>
>> 426: *
>> 427: * @throws NullPointerException if {@code value} is null
>> 428: * @throws StringIndexOutOfBoundsException if the sub-ranges are out of bounds
>
> Suggestion:
>
> * @throws StringIndexOutOfBoundsException if the sub-range is out of bounds
Fixed in 7efdc944aa3.
> src/java.base/share/classes/java/lang/StringLatin1.java line 508:
>
>> 506: @IntrinsicCandidate
>> 507: private static int indexOf0(byte[] value, int valueToIndex, byte[] str, int strToIndex, int valueFromIndex) {
>> 508: byte first = str[0];
>
> There's no protection in the 5 parameter caller agains an empty (0 length) `str`, so this can throw.
> I didn't look at the intrinsic to verify whether it assumes `str.length` > 0.
Added necessary guards in 7efdc944aa3.
> src/java.base/share/classes/java/lang/StringLatin1.java line 1025:
>
>> 1023:
>> 1024: /**
>> 1025: * Exhaustively copies characters from a Latin-1 string byte array sub-range
>
> Stylistically, why "Exhaustively"?
Indeed redundant. Removed in 7efdc944aa3.
> src/java.base/share/classes/java/lang/StringUTF16.java line 407:
>
>> 405: * @param srcOff the index (inclusive) of the first character in {@code src}
>> 406: * @param dst the target Latin-1 string byte array
>> 407: * @param srcOff the index (inclusive) of the first character in {@code dst}
>
> Suggestion:
>
> * @param dstOff the index (inclusive) of the first character in {@code dst}
Fixed in 7efdc944aa3.
> src/java.base/share/classes/java/lang/StringUTF16.java line 437:
>
>> 435:
>> 436: /**
>> 437: * Copies the prefix of Latin-1 characters from a UTF-16 string byte array
>
> Since there are offsets, strictly speaking this is not a prefix but a general subrange.
_"Copies the prefix"_ was requested by @RogerRiggs: https://github.com/openjdk/jdk/pull/28832/files/6e5273ed7df61479b37e491bc272837cca36b192#r2789099768
Note that we say _"Copies the prefix of ... a ... sub-range"_, since the copy might terminate earlier upon encounter with a non-Latin-1 character in the given sub-range.
> src/java.base/share/classes/java/lang/StringUTF16.java line 499:
>
>> 497:
>> 498: @IntrinsicCandidate
>> 499: static void getChars(byte[] value, int srcBegin, int srcEnd, char[] dst, int dstBegin) {
>
> What about this intrinsic? It doesn't seem to follow the conventions about `private` and a gatekeeper method.
As stated in the PR description:
> Moves input validation checks to Java for `java.lang.StringLatin1` intrinsics. While doing so, affected `java.lang.StringUTF16` methods needed to be updated due to relaxed checks at intrinsics.
That is, `StringUTF16` is only touched if it is affected by an associated `StringLatin1` change.
> src/java.base/share/classes/java/lang/StringUTF16.java line 623:
>
>> 621: Objects.requireNonNull(other);
>> 622: checkOffset(len1, value);
>> 623: String.checkOffset(len2, StringLatin1.length(other));
>
> FYI, all these checks are already performed in the called (non-intrinsic) method.
I was on the fence with these checks — I've removed them in 7efdc944aa3.
> src/java.base/share/classes/java/lang/StringUTF16.java line 726:
>
>> 724: static int compareToCI_Latin1(byte[] value, byte[] other) {
>> 725: Objects.requireNonNull(value);
>> 726: Objects.requireNonNull(other);
>
> FYI, checks are already performed in the called method.
I was on the fence with these checks — I've removed them in 7efdc944aa3.
> src/java.base/share/classes/java/lang/StringUTF16.java line 892:
>
>> 890: }
>> 891:
>> 892: private static int indexOfUnsafe(byte[] value, int valueToIndex, byte[] str, int strToIndex, int valueFromIndex) {
>
> I think this method requires documentation about what it assumes about the arguments.
> While it can be somehow derived from its usages, it would help to have it located here.
Added documentation in 7efdc944aa3.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2804891667
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2804701905
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2804729709
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2804747200
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805487865
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805486104
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805483785
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805481393
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805352323
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805362487
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805479873
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805479527
PR Review Comment: https://git.openjdk.org/jdk/pull/28832#discussion_r2805478651
More information about the core-libs-dev
mailing list