RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code
Claes Redestad
redestad at openjdk.org
Mon Mar 20 15:46:16 UTC 2023
On Mon, 20 Mar 2023 07:05:23 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
> Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code.
>
> It has <1% difference from not generating these stubs at all, and we will not win on 1-core VMs, but it is a simpler and safer solution, I think. It also automatically (no need for new code) does not generate these stubs if C2 is not used (-Xint or a low TieredStopAtLevel).
>
> On-demand stub generation requires synchronization between threads during the application run, which may introduce some instability and possibly other issues. But it could be beneficial for the Interpreter and C1 if we want them to use more intrinsic stubs (they use only CRC32 now). I filed a separate RFE [8304422](https://bugs.openjdk.org/browse/JDK-8304422).
>
> Changes:
> - Added new platform specific diagnostic flag `-XX:+MoveIntrinsicStubsGen`. It is ON by default if VM is built with C2 or JVMCI compilers except Zero and 32-bit Arm VMs which have no or few intrinsics.
> - Split `StubGenerator::generate_all()` method into two: `generate_final_stubs()` and `generate_compiler_stubs()`. Moved only C2 (and JVMCI) intrinsic stubs generation to new method.
> - I renamed methods and stubs buffer sizes according to new code. Now we have 4 separate **named** stubs buffers and corresponding methods: _Initial, Continuation, Compiler, Final_.
> - I added new UL printing to find new sizes for buffers and adjusted them on `aarch64` and `x86`. On other platforms I used the same value as before for `compiler_stubs` and `final_stubs`:
>
>> java -Xlog:stubs -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:-VerifyStackAtCalls -version
> [0.006s][info][stubs] StubRoutines (initial stubs) [0x00007f94900fcc00, 0x00007f9490101b60] used: 16152, free: 4168
> [0.026s][info][stubs] StubRoutines (continuation stubs) [0x00007f9490102580, 0x00007f9490102e90] used: 741, free: 1579
> [0.051s][info][stubs] StubRoutines (final stubs) [0x00007f9490155600, 0x00007f949015cc70] used: 26484, free: 3836
> [0.090s][info][stubs] StubRoutines (compiler stubs) [0x00007f94904ccc00, 0x00007f94904d9bd0] used: 46988, free: 6212
> java version "21-internal" 2023-09-19 LTS
>
> -Xlog:stubs=debug will print size information for each stub:
> [0.005s][debug][stubs] ICache::flush_icache_stub [0x00007fb2d3828080, 0x00007fb2d382809d] (29 bytes)
> [0.005s][debug][stubs] VM_Version::get_cpu_info_stub [0x00007fb2d3828380, 0x00007fb2d3828714] (916 bytes)
> [0.005s][debug][stubs] VM_Version::detect_virt_stub [0x00007fb2d3828714, 0x00007fb2d382872e] (26 bytes)
> [0.005s][debug][stubs] StubRoutines::forward exception [0x00007fb2d3828c00, 0x00007fb2d3828c92] (146 bytes)
>
>
> Testing: tier1-7, Xcomp, stress on x64 and aarch64.
>
> I have changes for all platforms. Please test it on platforms you support.
FWIW this looks good to me!
Perhaps there are some improvements that can be made (see inline comments regarding the `count_positives` stub), but it might be prudent not to spend more time than necessary on this if anyone will be looking at https://bugs.openjdk.org/browse/JDK-8304422 soon.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8093:
> 8091:
> 8092: // countPositives stub for large arrays.
> 8093: StubRoutines::aarch64::_count_positives = generate_count_positives(StubRoutines::aarch64::_count_positives_long);
A small detail, but I am pretty certain this stub is only used by C2 and could be moved to `generate_compiler_stubs`. It does raise the question of whether there are more stubs that look shared but are really only used by C2.
For historical reasons this intrinsic was implemented with a macro+stub on aarch64, unlike on x64 et al. When doing so the macro was defined in MacroAssembler and not C2_MacroAssembler, but it is effectively only used from aarch64.ad.
It might be interesting to make C1 (and possibly interpreter) use this stub when available, but if/when that happens moving it back to `generate_final_stubs` is relatively straightforward.
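To make the "move it between phases" point concrete, here is a toy model of phased stub generation. It is only a sketch: `StubPhase`, `StubPlan` and `start_vm` are hypothetical names that do not exist in HotSpot, and the real `StubGenerator` is structured differently. The point is just that a stub registered under the compiler phase is never generated when no optimizing compiler runs, and that moving it back is a one-line change.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and do not exist in HotSpot.
enum class StubPhase { Initial, Continuation, Final, Compiler };

class StubPlan {
public:
    // Registers a stub generator under the phase that should emit it.
    void add(StubPhase phase, std::string name, std::function<void()> generate) {
        assert(generate);  // an empty generator would be a registration bug
        phases_[phase].push_back({std::move(name), std::move(generate)});
    }

    // Runs all generators of one phase, in registration order,
    // and returns the names of the stubs that were generated.
    std::vector<std::string> run(StubPhase phase) {
        std::vector<std::string> generated;
        for (auto& stub : phases_[phase]) {
            stub.generate();
            generated.push_back(stub.name);
        }
        return generated;
    }

private:
    struct Stub {
        std::string name;
        std::function<void()> generate;
    };
    std::map<StubPhase, std::vector<Stub>> phases_;
};

// Models VM startup: the compiler phase only runs when an optimizing
// compiler is in use, so its stubs cost nothing under -Xint.
std::vector<std::string> start_vm(StubPlan& plan, bool compiler_in_use) {
    std::vector<std::string> all = plan.run(StubPhase::Final);
    if (compiler_in_use) {
        for (auto& name : plan.run(StubPhase::Compiler)) all.push_back(name);
    }
    return all;
}
```

In this model, registering `count_positives` with `StubPhase::Compiler` instead of `StubPhase::Final` is the whole change, and `start_vm(plan, false)` then skips it.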
src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp line 42:
> 40: _continuation_stubs_code_size = 2000,
> 41: _compiler_stubs_code_size = 30000,
> 42: _final_stubs_code_size = 20000
The tricky part when updating these is knowing which set of CPU features and VM flags will generate the largest possible stubs, but it looks like you've added ample free space with these estimates.
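One way to keep an eye on the headroom across flag combinations is to parse the `-Xlog:stubs` info lines quoted in the PR description. The following is a minimal sketch that assumes the line format stays exactly as shown there; `StubBufferUsage` and `parse_stub_line` are hypothetical helper names, not part of the JDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// One "[info][stubs] StubRoutines (...)" line as printed by -Xlog:stubs.
struct StubBufferUsage {
    std::string name;               // e.g. "compiler stubs"
    unsigned long long start;       // first address of the buffer
    unsigned long long end;         // end address of the buffer
    unsigned long long used_bytes;  // bytes occupied by generated stubs
    unsigned long long free_bytes;  // bytes left over

    // Share of the buffer that is still unused, in percent.
    double headroom_percent() const {
        assert(used_bytes + free_bytes > 0);
        return 100.0 * static_cast<double>(free_bytes) /
               static_cast<double>(used_bytes + free_bytes);
    }
};

// Hypothetical helper: returns std::nullopt for lines in any other format.
std::optional<StubBufferUsage> parse_stub_line(const std::string& line) {
    const std::size_t pos = line.find("StubRoutines (");
    if (pos == std::string::npos) return std::nullopt;

    char name[64];
    unsigned long long start = 0, end = 0, used = 0, free_bytes = 0;
    // %llx accepts the leading "0x" of the printed addresses.
    const int fields = std::sscanf(
        line.c_str() + pos,
        "StubRoutines (%63[^)]) [%llx, %llx] used: %llu, free: %llu",
        name, &start, &end, &used, &free_bytes);
    if (fields != 5) return std::nullopt;
    return StubBufferUsage{name, start, end, used, free_bytes};
}
```

Fed the `compiler stubs` line from the x86 output above, this reports 6212 free bytes out of 53200, or a bit under 12% headroom. Running it over logs from several CPU feature and flag combinations would show which configuration comes closest to the limit.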
src/hotspot/share/runtime/stubRoutines.cpp line 413:
> 411: void compiler_stubs_init(bool in_compiler_thread) {
> 412: if (in_compiler_thread && MoveIntrinsicStubsGen) {
> 413: // Temporare revert state of stubs generation because
"Temporarily"
-------------
Marked as reviewed by redestad (Reviewer).
PR: https://git.openjdk.org/jdk/pull/13096