RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6]

Wed Nov 10 17:12:44 UTC 2021

On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is is not true in genral for bytecode which can cause an implicit exception. 
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.

Thanks for adding a test. Your new additions look basically good, but I have a few remarks and questions.

src/hotspot/share/prims/whitebox.cpp line 987:

> 985:     bool overflow = false;
> 986:     for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
> 987:       if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {

Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better.

src/hotspot/share/prims/whitebox.cpp line 1016:

> 1014:   }
> 1015:   ResourceMark rm(THREAD);
> 1016:   char *reason_str = (reason_obj == NULL) ?

I think we should use `const char*` as far as possible.

src/hotspot/share/runtime/deoptimization.cpp line 2695:

> 2693:   return 0;
> 2694: }
> 2695: 

Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful.

test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78:

> 76:     private static final WhiteBox WB = WhiteBox.getWhiteBox();
> 77:     // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter.
> 78:     private static final boolean JDK8275908_fixed = false;

I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488