RFR: 8283199: Linux os::cpu_microcode_revision() stalls cold startup [v3]
Thomas Stuefe
stuefe at openjdk.java.net
Mon Mar 21 13:04:37 UTC 2022
On Mon, 21 Mar 2022 10:33:07 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> Noticed this when staring at timeline profiles for JVM startup. If you run any small startup workload on Linux, there is a significant time gap where JVM does nothing. I pinpointed that to fopen of `/proc/cpuinfo` due to [JDK-8249672](https://bugs.openjdk.java.net/browse/JDK-8249672).
>>
>> This *does not* reproduce if you run startup workloads continuously, as it looks as if `/proc/cpuinfo` is cached for about a second, and stalls reads after that, I suspect for CPU info updates, like frequency. This reproduces on at least two of my systems running Linux kernels 5.4 and 5.15.
>>
>> Observe:
>>
>>
>> $ for I in `seq 1 5`; do sleep 0.99; time cat /proc/cpuinfo > /dev/null; done
>> real 0m0.022s
>> real 0m0.003s
>> real 0m0.004s
>> real 0m0.004s
>> real 0m0.004s
>>
>> $ for I in `seq 1 5`; do sleep 1.00; time cat /proc/cpuinfo > /dev/null; done
>> real 0m0.022s
>> real 0m0.022s
>> real 0m0.022s
>> real 0m0.022s
>> real 0m0.022s
>>
>>
>> It directly impacts JVM startup:
>>
>>
>> $ for I in `seq 1 3`; do sleep 1; time build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m -version > /dev/null 2>&1 ; done
>>
>> real 0m0.046s
>> real 0m0.045s
>> real 0m0.046s
>>
>>
>> Without the sleep:
>>
>>
>> $ for I in `seq 1 3`; do time build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m -version > /dev/null 2>&1 ; done
>>
>> real 0m0.026s
>> real 0m0.026s
>> real 0m0.023s
>>
>>
>> There is another way to do this: read `/sys/devices/system/cpu/cpu0/microcode/version`, this is what this patch does. With `sleep 1`:
>>
>>
>> $ for I in `seq 1 3`; do sleep 1; time build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m -version > /dev/null 2>&1 ; done
>>
>> real 0m0.025s
>> real 0m0.024s
>> real 0m0.024s
>>
>>
>> Which means it improves startup time from ~45ms to ~25ms, or about 1.8x!
>>
>> `os::cpu_microcode_revision()` is currently used to generate the VM features string. It raises a bigger question if VM features string should be generated on startup, but it seems to take negligible time otherwise, see [JDK-8283200](https://bugs.openjdk.java.net/browse/JDK-8283200). Given [JDK-8249672](https://bugs.openjdk.java.net/browse/JDK-8249672) had been backported to update releases, let's do a pointed fix for this regression first.
>>
>> Additional testing:
>> - [x] Linux x86_64 fastdebug `tier1`
>> - [x] Eyeballing reported microcode number with/without the fix -- same thing reported
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Partially revert the buffer trim
Interesting find.
Do you know _why_ access to /proc is slower than to /sys/devices? Is it all /proc accesses or just /proc/cpuinfo?
My first thought was that since we use /proc for many things, we'd probably end up paying for it anyway.
Just idle thoughts, I'm fine with the patch in general (had no close look yet).
src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 473:
> 471: if (fgets(data, sizeof(data), fp)) {
> 472: sscanf(data, "%x", &result);
> 473: }
Just use fscanf? Would not need a buffer and be more compact.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7825
More information about the hotspot-runtime-dev
mailing list