RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v11]
Vladimir Kozlov
kvn at openjdk.java.net
Thu Dec 2 00:58:32 UTC 2021
On Wed, 1 Dec 2021 11:09:55 GMT, Volker Simonis <simonis at openjdk.org> wrote:
>> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions such as `ArrayIndexOutOfBoundsException` (AIOOBE) or `NullPointerException` in compiled code. Methods that raise such exceptions are therefore always deoptimized and re-executed in the interpreter whenever one occurs.
>>
>> If implicit exceptions are used for normal control flow, this can have a dramatic impact on performance. A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code that allocates and initializes the corresponding exceptions right in compiled code. This yields a roughly tenfold performance improvement for the code above:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt     Score     Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5     1.432 ±   0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   355.723 ±  16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   887.068 ± 166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  1274.418 ±  88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes that can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions occur at all), the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
> - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStackTraceInFastThrow
> - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions
> - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
>   Rebased on top of '8275908: Record null_check traps for calls and array_check traps in the interpreter'
> - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
> - Minor updates as requested by @TheRealMDoerr
> - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow
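
For readers without the Tomcat sources at hand, the pattern quoted above can be sketched as a small self-contained class. This is only an illustration under assumptions: the class name, the 128-entry table contents, `isAlphaChecked()` and `countAlpha()` are hypothetical and are not taken from the PR or from Tomcat.

```java
public class ImplicitExceptionSketch {
    // Assumption: a 128-entry lookup table marking ASCII letters.
    private static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Pattern from the PR description: out-of-range input is handled
    // by catching the implicit ArrayIndexOutOfBoundsException.
    public static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Hypothetical equivalent with an explicit range check, so no
    // exception is ever created for out-of-range input.
    public static boolean isAlphaChecked(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    // Hypothetical driver: counts letters in a mixed input array.
    public static int countAlpha(int[] input) {
        int n = 0;
        for (int c : input) {
            if (isAlpha(c)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Two letters, one digit, two out-of-range values.
        int[] input = { 'a', 'Z', '0', -1, 200 };
        System.out.println(countAlpha(input)); // prints 2
    }
}
```

Both variants return the same result for every input. They differ only in whether the out-of-range case goes through an implicit exception, which is the path this PR is about.
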
Volker, what does the MDO (bytecodes and counters) look like for your test case method (-XX:CompileCommand=print,ImplicitException.isAlphaWithException)?
src/hotspot/share/opto/graphKit.cpp line 627:
> 625: const TypeKlassPtr *ex_type = TypeKlassPtr::make(ex_ciInstKlass);
> 626: kill_dead_locals();
> 627: Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true);
What happens if deoptimization occurs during this allocation (which is a safepoint)?
Which bytecode will be executed in the interpreter after the deopt?
src/hotspot/share/opto/graphKit.cpp line 629:
> 627: Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true);
> 628: set_argument(0, ex_node);
> 629: ciMethod* init = ex_ciInstKlass->find_method(ciSymbol::make("<init>"), ciSymbol::make("()V"));
I know that all exception classes have such a constructor, but in general you need to check for `nullptr`. I think the lookup could be moved before the check at line 624.
src/hotspot/share/opto/graphKit.cpp line 640:
> 638: address target = SharedRuntime::get_resolve_opt_virtual_call_stub();
> 639:
> 640: CallStaticJavaNode *call = new CallStaticJavaNode(kit.C, TypeFunc::make(init), target, init);
In the end, `<init>()` will call the native `fillInStackTrace()` and nothing else:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Throwable.java#L255
Should we optimize this by inlining it here, so that EA can eliminate the allocation above if the exception does not escape?
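
To make the point about the constructor concrete, here is a minimal Java-level sketch. It only shows the behaviour visible from Java, not what C2 does. The class `FillInStackTraceSketch`, the nested `CountingException`, the `fillCalls` counter and `probe()` are hypothetical names introduced for illustration.

```java
public class FillInStackTraceSketch {
    // Hypothetical counter, incremented on every fillInStackTrace() call.
    static int fillCalls = 0;

    // Hypothetical exception type that records how often the stack
    // trace is filled in. It relies on the implicit no-arg constructor.
    static class CountingException extends RuntimeException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            fillCalls++;
            return super.fillInStackTrace();
        }
    }

    // The exception is created, thrown and caught locally. The caught
    // object is never stored or returned, so it does not escape.
    public static boolean probe() {
        try {
            throw new CountingException();
        } catch (CountingException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        probe();
        // The no-arg constructor chain triggered exactly one call.
        System.out.println(fillCalls); // prints 1
    }
}
```

The sketch only illustrates that constructing the exception triggers one `fillInStackTrace()` call and that, in code shaped like `isAlpha()`, the exception object is not used after the catch. Whether EA could actually remove the allocation in compiled code is the open question above.
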
-------------
PR: https://git.openjdk.java.net/jdk/pull/5488