RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)

Damon Fenacci dfenacci at openjdk.org
Thu Feb 20 16:32:18 UTC 2025


# Issue
The test `test/hotspot/jtreg/compiler/startup/StartupOutput.java` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`.

# Causes
There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are:
1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`
2. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`
3. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob`
4. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob`

# Solution
Instead of fatally crashing, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2, and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly.
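
The general pattern can be illustrated with a small standalone sketch. This is not the HotSpot code: `ToyCodeCache`, `ToyStub`, `ToyRuntime`, `new_toy_stub` and `init_toy_runtime` are hypothetical names made up for illustration, and only the `alloc_fail_is_fatal` parameter name is borrowed from `RuntimeStub::new_runtime_stub`. The sketch assumes the caller can treat a null result as "initialization failed" and bail out instead of aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stand-in for a fixed-size code cache (not HotSpot's CodeCache).
class ToyCodeCache {
 public:
  explicit ToyCodeCache(std::size_t capacity) : _capacity(capacity), _used(0) {}

  // Reserves `size` bytes if they still fit; returns false otherwise.
  bool reserve(std::size_t size) {
    assert(size > 0 && "zero-sized blobs are not meaningful in this sketch");
    if (size > _capacity - _used) return false;
    _used += size;
    return true;
  }

 private:
  std::size_t _capacity;
  std::size_t _used;
};

// Hypothetical stub record; a real stub would carry generated code.
struct ToyStub {
  const char* name;
  std::size_t size;
};

// Allocation with an opt-out from the fatal path: when alloc_fail_is_fatal
// is false, a full cache yields nullptr instead of aborting the process.
std::unique_ptr<ToyStub> new_toy_stub(ToyCodeCache& cache, const char* name,
                                      std::size_t size,
                                      bool alloc_fail_is_fatal = true) {
  if (!cache.reserve(size)) {
    if (alloc_fail_is_fatal) {
      std::fprintf(stderr, "fatal: no code cache space for %s\n", name);
      std::abort();
    }
    return nullptr;
  }
  return std::unique_ptr<ToyStub>(new ToyStub{name, size});
}

// Hypothetical holder for the stubs created during initialization.
struct ToyRuntime {
  std::vector<std::unique_ptr<ToyStub>> stubs;
};

// Initialization that reports failure to its caller rather than crashing.
bool init_toy_runtime(ToyCodeCache& cache, ToyRuntime& runtime,
                      std::size_t stub_size, int stub_count) {
  for (int i = 0; i < stub_count; i++) {
    std::unique_ptr<ToyStub> stub =
        new_toy_stub(cache, "toy_stub", stub_size, /*alloc_fail_is_fatal=*/false);
    if (stub == nullptr) {
      return false;  // Caller decides how to continue, e.g. without this compiler.
    }
    runtime.stubs.push_back(std::move(stub));
  }
  return true;
}
```

In this sketch the default of `alloc_fail_is_fatal = true` keeps existing callers unchanged, while initialization code opts out explicitly and propagates the failure upward as a `bool`.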

Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue, but that wasn't entirely accurate: doing so may reduce the chance of a failed allocation in these 4 places but doesn't rule it out.

# Testing
The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion.

Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode)

-------------

Commit messages:
 - JDK-8347406: reduce number of tests again
 - JDK-8347406: update copyright year
 - Merge branch 'master' into JDK-8347406
 - Merge branch 'master' into JDK-8347406
 - JDK-8347406: reduce number of test processes
 - JDK-8347406: set the C2 uncommon and exception trap blobs in OptoRuntime::generate
 - JDK-8347406: fix c2 runtime init return condition
 - JDK-8347406: reduce number of processes in test
 - JDK-8347406: make startup processes run in parallel
 - JDK-8347406: reduce number of startup test attempts
 - ... and 8 more: https://git.openjdk.org/jdk/compare/efbad00c...e930df47

Changes: https://git.openjdk.org/jdk/pull/23630/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347406
  Stats: 114 lines in 27 files changed: 38 ins; 3 del; 73 mod
  Patch: https://git.openjdk.org/jdk/pull/23630.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630

PR: https://git.openjdk.org/jdk/pull/23630

