RFR: 8361842: Move input validation checks to Java for String-related intrinsics

Volkan Yazici vyazici at openjdk.org
Fri Jul 11 08:40:40 UTC 2025


On Thu, 10 Jul 2025 20:54:24 GMT, John R Rose <jrose at openjdk.org> wrote:

>> Validate input in `java.lang.StringCoding` intrinsic Java wrappers, improve their documentation, enhance the checks in the associated IR or assembly code, and adapt them to cause VM crash on invalid input.
>> 
>> ## Implementation notes
>> 
>> The goal of the associated umbrella issue [JDK-8156534](https://bugs.openjdk.org/browse/JDK-8156534) is to, for `java.lang.String*` classes,
>> 
>> 1. Move `@IntrinsicCandidate`-annotated `public` methods<sup>1</sup> (in Java code) to `private` ones, and wrap them with a `public` ["front door" method](https://github.com/openjdk/jdk/pull/24982#discussion_r2087493446)
>> 2. Since we moved the `@IntrinsicCandidate` annotation to a new method, intrinsic mappings – i.e., associated `do_intrinsic()` calls in `vmIntrinsics.hpp` – need to be updated too
>> 3. Add necessary input validation (range, null, etc.) checks to the newly created public front door method
>> 4. Place all input validation checks in the intrinsic code (add if missing!) behind a `VerifyIntrinsicChecks` VM flag
>> 
>> Following preliminary work needs to be carried out as well:
>> 
>> 1. Add a new `VerifyIntrinsicChecks` VM flag
>> 2. Update `generate_string_range_check` to produce a `HaltNode`.  That is, crash the VM if `VerifyIntrinsicChecks` is set and a Java wrapper fails to spot an invalid input.
>> 
>> <sup>1</sup>  `@IntrinsicCandidate`-annotated constructors are not subject to this change, since they are a special case.
>> 
>> ## Functional and performance tests
>> 
>> - `tier1` (which includes `test/hotspot/jtreg/compiler/intrinsics/string`) passes on several platforms. Further tiers will be executed after integrating reviewer feedback.
>> 
>> - Performance impact is still actively monitored using `test/micro/org/openjdk/bench/java/lang/String{En,De}code.java`, among other tests. If you have suggestions on benchmarks, please share in the comments.
>> 
>> ## Verification of the VM crash
>> 
>> I've tested the VM crash scenario as follows:
>> 
>> 1. Created the following test program:
>> 
>> public class StrIntri {
>>     public static void main(String[] args) {
>>         Exception lastException = null;
>>         for (int i = 0; i < 1_000_000; i++) {
>>             try {
>>                 jdk.internal.access.SharedSecrets.getJavaLangAccess().countPositives(new byte[]{1,2,3}, 2, 5);
>>             } catch (Exception exception) {
>>                 lastException = exception;
>>             }
>>         }
>>         if (lastException != null) {
>>             lastException.printStackTrace...
>
> I disagree with a small part of the statement of goals:
> 
>> Always validate all input at the intrinsic (but preferably behind a VM flag)
> 
> As formulated above, this is a violation of DRY and if embraced the wrong way will lead to code that is harder to review and prove bug-free.  Performing 100% accurate range/null/validation checks is deeply impractical for an assembly-based or IR-based intrinsic.  It’s too hard to verify by code review, and coverage testing is suspect.
> 
> We must frankly put all the weight of verification on Java code, including Java bytecode intrinsic behaviors.  Java code is high-level and can be read mostly as a declarative spec, if clearly written (as straight-line code, then the intrinsic call).  Also, such simple Java code shapes (and their underlying bytecodes) are tested many orders of magnitude more than any given intrinsic.
> 
> I see two bits of evidence that you agree with me on this: 1. The intrinsic-local validation (IR or assembly) is allowed to Halt instead of throw, and 2. the intrinsic-local validation is optional, turned on only by a stress test mode.  This tells me that the extra optional testing is also not required to be 100%.
> 
> Thus, I think the above goal would be better stated this way:
> 
>> Validate input in the IR or assembly code of the intrinsic in an ad hoc manner to catch bugs in the Java validation.
> 
>> Note:  IR or assembly based validation code should not obscure the code or add large maintenance costs, and under a VM diagnostic flag (or debug flag), and causing a VM halt instead of a Java throw.
> 
> I think I'm agreeing with you on the material points.  It is important to summarize our intentions accurately at the top, for those readers that are reading only the top as a summary.

@rose00, thanks so much for the feedback. I agree with your remarks and get your points on _"Always validate all input at the intrinsic"_ is a violation of DRY and an impractical goal.

I incorporated your suggestions as follows:

1. Renamed the ticket to `Move input validation checks to Java for String-related intrinsics` (to better reflect the goal)
2. Replaced `Always validate all input at the intrinsic...` with your suggestion

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25998#issuecomment-3061201145


More information about the hotspot-dev mailing list