RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code
Vladimir Kozlov
kvn at openjdk.org
Mon Mar 20 17:04:48 UTC 2023
On Mon, 20 Mar 2023 14:42:02 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code.
>>
>> It has <1% difference from not generated these stubs at all and we will not win on 1 core VMs but it is simpler and safer solution, I think. It also automatically (no need for new code) do not generate these stubs if C2 is not used (-Xint or low TieredStopAt Level.
>>
>> On demand stubs generation requires synchronization between threads during application run which may introduce some instability and may be other issues. But it could be beneficial for Interpreter and C1 if we want more intrinsics stubs to be used by C1 and Interpreter (they use CRC32 only now). I filed separate RFE [8304422](https://bugs.openjdk.org/browse/JDK-8304422).
>>
>> Changes:
>> - Added new platform specific diagnostic flag `-XX:+MoveIntrinsicStubsGen`. It is ON by default if VM is built with C2 or JVMCI compilers except Zero and 32-bit Arm VMs which have no or few intrinsics.
>> - Split `StubGenerator::generate_all()` method into two: `generate_final_stubs()` and `generate_compiler_stubs()`. Moved only C2 (and JVMCI) intrinsic stubs generation to new method.
>> - I renamed methods and stubs buffer sizes according to new code. Now we have 4 separate **named** stubs buffers and corresponding methods: _Initial, Continuation, Compiler, Final_.
>> - I added new UL printing to find new sizes for buffers and adjusted them on `aarch64` and `x86`. On other platforms I used the same as before value for `compiler_stubs` and `final_stubs`:
>>
>>> java -Xlog:stubs -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:-VerifyStackAtCalls -version
>> [0.006s][info][stubs] StubRoutines (initial stubs) [0x00007f94900fcc00, 0x00007f9490101b60] used: 16152, free: 4168
>> [0.026s][info][stubs] StubRoutines (continuation stubs) [0x00007f9490102580, 0x00007f9490102e90] used: 741, free: 1579
>> [0.051s][info][stubs] StubRoutines (final stubs) [0x00007f9490155600, 0x00007f949015cc70] used: 26484, free: 3836
>> [0.090s][info][stubs] StubRoutines (compiler stubs) [0x00007f94904ccc00, 0x00007f94904d9bd0] used: 46988, free: 6212
>> java version "21-internal" 2023-09-19 LTS
>>
>> -Xlog:stubs=debug will print size information for each stub:
>> [0.005s][debug][stubs] ICache::flush_icache_stub [0x00007fb2d3828080, 0x00007fb2d382809d] (29 bytes)
>> [0.005s][debug][stubs] VM_Version::get_cpu_info_stub [0x00007fb2d3828380, 0x00007fb2d3828714] (916 bytes)
>> [0.005s][debug][stubs] VM_Version::detect_virt_stub [0x00007fb2d3828714, 0x00007fb2d382872e] (26 bytes)
>> [0.005s][debug][stubs] StubRoutines::forward exception [0x00007fb2d3828c00, 0x00007fb2d3828c92] (146 bytes)
>>
>>
>> Testing: tier1-7, Xcomp, stress on x64 and aarch64.
>>
>> I have changes for all platforms. Please test it on platforms you support.
>
> src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp line 42:
>
>> 40: _continuation_stubs_code_size = 2000,
>> 41: _compiler_stubs_code_size = 30000,
>> 42: _final_stubs_code_size = 20000
>
> The tricky part when updating these is knowing which set of CPU features and VM flags will generate the largest possible stubs, but it looks like you've added ample of free space with these estimates.
Yes, that is why I ran with `-XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:-VerifyStackAtCalls` flags which increase generated code size.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13096#discussion_r1142437121
More information about the hotspot-dev
mailing list