RFR: 8255716: AArch64: Regression: JVM crashes if manually offline a core

Anton Kozlov akozlov at openjdk.java.net
Mon Nov 2 12:39:59 UTC 2020


On Mon, 2 Nov 2020 10:08:25 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> Please review this change that removed the unnecessary check **cpu_lines == os::processor_count()** from VM_Version::get_os_cpu_info inside src/hotspot/os_cpu/linux_aarch64/vm_version_linux_aarch64.cpp. 
>> 
>> This **assertion** is practically not needed. While a test system with some cores intentionally turned off, e.g. via echo 0 > /sys/devices/system/cpu/cpu1/online, **java -version** would crash, building openjdk on this system would fail too. 
>> 
>> This regression issue was introduced by https://github.com/openjdk/jdk/commit/ec9bee68
>> 
>> Testing:
>> - java -version, building openjdk as smoke tests
>> - jtreg tier1 as sanity check
>
>> This assertion is practically not needed. While a test system with some cores intentionally turned off, e.g. via echo 0 > /sys/devices/system/cpu/cpu1/online, java -version would crash, building openjdk on this system would fail too.
> 
> I don't follow this comment. I am not saying the check is redundant but the rationale provided above does not really cut it. Sure, a build might also fail. But that misses several possibilities. The image could be built on a different machine which didn't have the cpu switched off. The cpu switch off could happen on the same machine between building and running. If there is a good reason for not including this check the one cited above is not it.
> 
> Can you just state in simple terms why it is ok to continue when the info retrieved from /proc/cpuinfo and os::processor_count() do not match up?

Hi, I've introduced this check, so I feel responsible. The number of cores reported in /proc/cpuinfo was used to decide CPU_A53MAC feature https://github.com/openjdk/jdk/commit/ec9bee68660acd6abf0b4dd4023ae69514542256#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R184. After the change, os::processor_count is used instead and the guarantee checks the meaning of the CPU_A53MAC has not changed.

processor_count is initialized with sysconf https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/os_linux.cpp#L364, which now I see may be affected by disabling cores https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/getsysstats.c;h=6d4c9c06e8a5de1b97da448909aa0874df5f6c88;hb=6d4c9c06e8a5de1b97da448909aa0874df5f6c88#l113. So processor_count can be actually not equal to number of CPUs in /proc/cpuinfo.

Assert can be relaxed (`<=` should be correct). But even after that, it's incorrect to set the feature with only one active core. It does not look critical (the feature should be harmless), but does not look right as well.

-------------

PR: https://git.openjdk.java.net/jdk/pull/983


More information about the hotspot-runtime-dev mailing list