RFR: 8361842: Validate input in both Java and C++ for java.lang.StringCoding intrinsics
Volkan Yazici
vyazici at openjdk.org
Thu Jul 10 12:58:54 UTC 2025
On Thu, 26 Jun 2025 10:48:37 GMT, Volkan Yazici <vyazici at openjdk.org> wrote:
> Validate input in `java.lang.StringCoding` intrinsic Java wrappers, improve their documentation, enhance the checks in the associated C++ methods, and adapt them to cause VM crash on invalid input.
>
> ## Implementation notes
>
> The goal of the associated umbrella issue [JDK-8156534](https://bugs.openjdk.org/browse/JDK-8156534) is to, for `java.lang.String*` classes,
>
> 1. Move `@IntrinsicCandidate`-annotated `public` methods<sup>1</sup> (in Java code) to `private` ones, and wrap them with a `public` ["front door" method](https://github.com/openjdk/jdk/pull/24982#discussion_r2087493446)
> 2. Since we moved the `@IntrinsicCandidate` annotation to a new method, intrinsic mappings – i.e., associated `do_intrinsic()` calls in `vmIntrinsics.hpp` – need to be updated too
> 3. Add necessary input validation (range, null, etc.) checks to the newly created public front door method
> 4. Place all input validation checks in the intrinsic code (add if missing!) behind a `VerifyIntrinsicChecks` VM flag
>
> Following preliminary work needs to be carried out as well:
>
> 1. Add a new `VerifyIntrinsicChecks` VM flag
> 2. Update `generate_string_range_check` to produce a `HaltNode`. That is, crash the VM if `VerifyIntrinsicChecks` is set and a Java wrapper fails to spot an invalid input.
>
> <sup>1</sup> `@IntrinsicCandidate`-annotated constructors are not subject to this change, since they are a special case.
>
> ## Functional and performance tests
>
> - `tier1` (which includes `test/hotspot/jtreg/compiler/intrinsics/string`) passes on several platforms. Further tiers will be executed after integrating reviewer feedback.
>
> - Performance impact is still actively monitored using `test/micro/org/openjdk/bench/java/lang/String{En,De}code.java`, among other tests. If you have suggestions on benchmarks, please share in the comments.
>
> ## Verification of the VM crash
>
> I've tested the VM crash scenario as follows:
>
> 1. Created the following test program:
>
> public class StrIntri {
> public static void main(String[] args) {
> Exception lastException = null;
> for (int i = 0; i < 1_000_000; i++) {
> try {
> jdk.internal.access.SharedSecrets.getJavaLangAccess().countPositives(new byte[]{1,2,3}, 2, 5);
> } catch (Exception exception) {
> lastException = exception;
> }
> }
> if (lastException != null) {
> lastException.printStackTrace();
> } else {
> System.out.println("completed");
> }
> }
> }
>
> 2. Comp...
src/hotspot/share/opto/library_call.cpp line 963:
> 961:
> 962: if (bailout->req() > 1) {
> 963: if (halt) {
Toggle to force a VM crash to be used to surface intrinsic Java wrappers failing to spot invalid input.
src/java.base/share/classes/java/lang/StringCoding.java line 140:
> 138: *
> 139: * @param sa the source byte array containing characters encoded in UTF-16
> 140: * @param sp the index of the <em>byte (not character!)</em> from the source array to start reading from
Note the `byte (not character!)` emphasis here and below.
src/java.base/share/classes/java/lang/StringCoding.java line 150:
> 148: */
> 149: static int encodeISOArray(byte[] sa, int sp, byte[] da, int dp, int len) {
> 150: checkFromIndexSize(sp, len << 1, requireNonNull(sa, "sa").length, AIOOBE_FORMATTER);
`sa` contains 2-byte `char`s, and `sp` points to an index of this inflated array. Though, `len` denotes the codepoint count, hence the `len << 1` while checking `sp` and `len` bounds.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25998#discussion_r2197576759
PR Review Comment: https://git.openjdk.org/jdk/pull/25998#discussion_r2197566783
PR Review Comment: https://git.openjdk.org/jdk/pull/25998#discussion_r2197572078
More information about the core-libs-dev
mailing list