RFR: 8349713: [leyden] Memory map the cached code file

Francesco Nigro duke at openjdk.org
Mon Feb 10 13:04:42 UTC 2025


On Mon, 10 Feb 2025 12:15:01 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Profiles for many applications show that reading the SC cache file at startup is costly. In the JavacBenchApp example, loading ~25M of code takes about 30ms. This is ~1 GB/sec, so it is I/O limited.
>> 
>> We should really mmap the SC cache file to alleviate these costs. Let the actual SC readers (separate threads) eat the cost of reading from the backing file.
>> 
>> I was not entirely sure COW for file mappings works correctly on Windows, so I excepted that one.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `runtime/cds`
>
> It demonstrably improves performance on Linux, taking the 30ms out of the critical startup path.
> 
> 
> # Without mmap (legacy code)
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseParallelGC -cp JavacBenchApp.jar -XX:-MmapCachedCode JavacBenchApp 50 1
>   Time (mean ± σ):     408.0 ms ±   2.5 ms    [User: 1231.7 ms, System: 196.2 ms]
>   Range (min … max):   404.6 ms … 412.8 ms    10 runs
> 
> # With mmap
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseParallelGC -cp JavacBenchApp.jar JavacBenchApp 50 1
>   Time (mean ± σ):     382.1 ms ±   2.6 ms    [User: 1229.9 ms, System: 181.6 ms]
>   Range (min … max):   378.9 ms … 388.0 ms    10 runs

@shipilev 

Do the numbers still hold with

sync; echo 3 > /proc/sys/vm/drop_caches

before the benchmark (single shot) run?
On big servers, unless some sharing is expected (i.e. multiple processes using the same archive) to further boost it, I would expect direct I/O to benefit loading the archive, since it avoids the extra copy through the OS page cache that a buffered read requires (OS page cache + extra copy).
The other concern re mmap is the munmap cost, which on the kernel side relies (IIRC) on a single (!) VM page lock to guard it, and which usually slows down process termination.
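
To make the comparison concrete, here is a minimal user-space sketch (Python, not the HotSpot code) of the two loading strategies being discussed: copying the whole file into a private buffer versus mapping it copy-on-write. The helper names (`load_with_read`, `load_with_private_mmap`, `make_sample_file`) and the 1 MB random sample file are made up for illustration. It does not use direct I/O and does not drop the page cache, so the timings it prints are warm-cache numbers only.

```python
import mmap
import os
import tempfile
import time


def load_with_read(path):
    """Copy the whole file into a private heap buffer (the non-mmap path)."""
    with open(path, "rb") as f:
        return bytearray(f.read())


def load_with_private_mmap(path):
    """Map the file copy-on-write; pages are faulted in lazily on first touch."""
    with open(path, "rb") as f:
        # ACCESS_COPY is a private (MAP_PRIVATE on POSIX) mapping: writes stay
        # in this process and never reach the backing file.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def make_sample_file(size=1 << 20):
    """Create a temporary file of random bytes standing in for the archive."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


def time_call(fn, path):
    """Return (result, seconds) for one call, touching the first byte."""
    start = time.perf_counter()
    data = fn(path)
    _ = data[0]  # force at least one page to be resident for the mmap case
    return data, time.perf_counter() - start


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        _, t_read = time_call(load_with_read, sample)
        mapped, t_mmap = time_call(load_with_private_mmap, sample)
        print(f"read: {t_read * 1e3:.3f} ms, private mmap: {t_mmap * 1e3:.3f} ms")
        mapped.close()
    finally:
        os.remove(sample)
```

The mapping returns almost immediately because the cost of reading is deferred to whoever touches the pages, which is the effect the PR relies on. Whether that still wins with a cold page cache is exactly the open question above.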

-------------

PR Comment: https://git.openjdk.org/leyden/pull/34#issuecomment-2647922569


More information about the leyden-dev mailing list