RFR: Fix SCCache closing races

duke duke at openjdk.org
Thu Sep 5 09:10:30 UTC 2024


On Wed, 4 Sep 2024 12:46:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I have been scratching my head why Leyden builds crash with multiple iterations of `HelloStream` example.
> 
> The crashes are of various shapes, but they have a similar clue: it happens when we touch `SCCache` when it was already closed: `hs_err` shows we have called `before_exit`, `gdb` shows `SCCache::_cache` is already `nullptr`, but the crashes are in compiler code that accesses `SCCache`.
> 
> Looking at `SCCache` closing code, I found two major bugs:
> 
> 1. For cache writer side, we piggyback on `Compile_lock` to abitrate the access to maybe-closing cache. That does not work: in `ciEnv::register_method`, we already have `SCEntry*` in our hands, and once we acquire the `Compile_lock`, the whole `SCCache` might be gone, `SCEntry*` points to garbage, and we SEGV. This one is easy to solve: check that cache is still on, under the lock.
> 
> 2. For cache reader side, we have a `reading_nmethod` counter, which is supposed to track how many nmethod readers are currently active. Unfortunately, this is still racy: nothing prevents new nmethod readers to appear _"after"_ we closed the cache. So those nmethod readers can access the cache in invalid state. There is also piggybacking on `Compilation_lock`, which AFAICS is for WhiteBox paths! So this is not great. I replaced the whole thing with carefully tracking the nmethod reader counts, and making sure we have the phasing right.
> 
> Eliminating `Compilation_lock` wait improves performance and variance on short workloads as well. Observe:
> 
> 
> $ rm -f app.cds*; taskset -c 0-7 hyperfine -w 50 -r 100 "build/linux-x86_64-server-release/images/jdk/bin/java -XX:CacheDataStore=app.cds -Xmx256m -Xms256m -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -cp hellostream.jar HelloStream"
> 
> # BEFORE (this also crashes every so often)
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:CacheDataStore=app.cds -Xmx256m -Xms256m -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -cp hellostream.jar HelloStream
>   Time (mean ± σ):      24.0 ms ±   4.3 ms    [User: 14.1 ms, System: 9.8 ms]
>   Range (min … max):    17.0 ms …  27.3 ms    100 runs
>  
>   Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
> 
> # AFTER
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:CacheDataStore=app.cds -Xmx256m -Xms256m -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -cp hellostream....

@shipilev 
Your change (at version 36efd4babda0907d59ffa7ab8d15f3c589766110) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/leyden/pull/13#issuecomment-2331000260


More information about the leyden-dev mailing list