RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code [v2]

Vladimir Ivanov vlivanov at openjdk.org
Wed Mar 22 21:50:47 UTC 2023


On Mon, 20 Mar 2023 17:17:13 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code.
>> 
>> It has <1% difference from not generated these stubs at all and we will not win on 1 core VMs but it is simpler and safer solution, I think. It also automatically (no need for new code) do not generate these stubs if C2 is not used (-Xint or low TieredStopAt Level.
>> 
>> On demand stubs generation requires synchronization between threads during application run which may introduce some instability and may be other issues. But it could be beneficial for Interpreter and C1 if we want more intrinsics stubs to be used by C1 and Interpreter (they use CRC32 only now).  I filed separate RFE  [8304422](https://bugs.openjdk.org/browse/JDK-8304422).
>> 
>> Changes:
>>  - Added new platform specific diagnostic flag `-XX:+MoveIntrinsicStubsGen`. It is ON by default if VM is built with C2 or JVMCI compilers except Zero and 32-bit Arm VMs which have no or few intrinsics.
>>  - Split `StubGenerator::generate_all()` method into two: `generate_final_stubs()` and `generate_compiler_stubs()`. Moved only C2 (and JVMCI) intrinsic stubs generation to new method.
>>  - I renamed methods and stubs buffer sizes according to new code. Now we have 4 separate **named** stubs buffers and corresponding methods: _Initial, Continuation, Compiler, Final_.
>>  - I added new UL printing to find new sizes for buffers and adjusted them on `aarch64` and `x86`. On other platforms I used the same as before value for `compiler_stubs` and `final_stubs`:
>> 
>>> java -Xlog:stubs -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:+VerifyStackAtCalls -version
>> [0.006s][info][stubs] StubRoutines (initial stubs)	 [0x00007f94900fcc00, 0x00007f9490101b60] used: 16152, free: 4168
>> [0.026s][info][stubs] StubRoutines (continuation stubs)	 [0x00007f9490102580, 0x00007f9490102e90] used: 741, free: 1579
>> [0.051s][info][stubs] StubRoutines (final stubs)	 [0x00007f9490155600, 0x00007f949015cc70] used: 26484, free: 3836
>> [0.090s][info][stubs] StubRoutines (compiler stubs)	 [0x00007f94904ccc00, 0x00007f94904d9bd0] used: 46988, free: 6212
>> java version "21-internal" 2023-09-19 LTS
>> 
>> -Xlog:stubs=debug will print size information for each stub:
>> [0.005s][debug][stubs] ICache::flush_icache_stub [0x00007fb2d3828080, 0x00007fb2d382809d] (29 bytes)
>> [0.005s][debug][stubs] VM_Version::get_cpu_info_stub [0x00007fb2d3828380, 0x00007fb2d3828714] (916 bytes)
>> [0.005s][debug][stubs] VM_Version::detect_virt_stub [0x00007fb2d3828714, 0x00007fb2d382872e] (26 bytes)
>> [0.005s][debug][stubs] StubRoutines::forward exception [0x00007fb2d3828c00, 0x00007fb2d3828c92] (146 bytes)
>> 
>> 
>> Testing: tier1-7, Xcomp, stress on x64 and aarch64.
>> 
>> I have changes for all platforms. Please test it on platforms you support.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address Claes comments

Overall, looks good.

Just minor comments about code style.

src/hotspot/cpu/aarch64/globals_aarch64.hpp line 39:

> 37: define_pd_global(bool, UncommonNullCast,         true);  // Uncommon-trap NULLs past to check cast
> 38: 
> 39: #if COMPILER2_OR_JVMCI

I find it easier to read when only the default value is guarded. In that respect, `COMPILER2_OR_JVMCI` code diverges from the rest of the code base.

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4778:

> 4776: 
> 4777:  public:
> 4778:   StubGenerator(CodeBuffer* code, StubsKind kind) : StubCodeGenerator(code) {

The code is repeated on multiple platforms. It makes sense to lift it to `StubCodeGenerator`.

src/hotspot/share/runtime/globals.hpp line 369:

> 367:           "Enables intrinsification of Math.signum")                        \
> 368:                                                                             \
> 369:   product_pd(bool, MoveIntrinsicStubsGen, DIAGNOSTIC,                       \

The flag name looks confusing. What about `LazyCompilerStubGeneration` or even `LazyStubGeneration`?

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13096#pullrequestreview-1353508986
PR Review Comment: https://git.openjdk.org/jdk/pull/13096#discussion_r1145441707
PR Review Comment: https://git.openjdk.org/jdk/pull/13096#discussion_r1145436835
PR Review Comment: https://git.openjdk.org/jdk/pull/13096#discussion_r1145431118


More information about the hotspot-dev mailing list