RFR: 8255716: AArch64: Regression: JVM crashes if manually offline a core

Patrick Zhang qpzhang at openjdk.java.net
Mon Nov 2 13:14:58 UTC 2020


On Mon, 2 Nov 2020 13:03:01 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:

>>> This assertion is practically not needed. While a test system with some cores intentionally turned off, e.g. via echo 0 > /sys/devices/system/cpu/cpu1/online, java -version would crash, building openjdk on this system would fail too.
>> 
>> I don't follow this comment. I am not saying the check is redundant but the rationale provided above does not really cut it. Sure, a build might also fail. But that misses several possibilities. The image could be built on a different machine which didn't have the cpu switched off. The cpu switch off could happen on the same machine between building and running. If there is a good reason for not including this check the one cited above is not it.
>> 
>> Can you just state in simple terms why it is ok to continue when the info retrieved from /proc/cpuinfo and os::processor_count() do not match up?
>
> The JVM works well with the number of enabled cores, which is the value of os::processor_count(); it uses this, for example, to decide the number of threads for parallel GC. Counting the lines in /proc/cpuinfo, by contrast, yields a static number.

> /proc/cpuinfo was used to decide CPU_A53MAC feature [ec9bee6#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R184](https://github.com/openjdk/jdk/commit/ec9bee68660acd6abf0b4dd4023ae69514542256#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R184).

Thanks for your comment. Yes, I noticed this part as well. One possible way of solving that would be to have a duplicated but simpler '_counting /proc/cpuinfo lines_' logic here.  
I want to solve the [guarantee call](https://github.com/openjdk/jdk/pull/983/files#diff-7e6fa90a7bcdbe41687eb8d39c6c6232e6518b019937a87aab75284166ef67bdL171) issue first, as it is breaking my daily build & test (where I have only half of the cores online).

-------------

PR: https://git.openjdk.java.net/jdk/pull/983

