RFR: 8255716: AArch64: Regression: JVM crashes if manually offline a core
Patrick Zhang
qpzhang at openjdk.java.net
Mon Nov 2 13:14:58 UTC 2020
On Mon, 2 Nov 2020 13:03:01 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:
>>> This assertion is practically not needed. While a test system with some cores intentionally turned off, e.g. via echo 0 > /sys/devices/system/cpu/cpu1/online, java -version would crash, building openjdk on this system would fail too.
>>
>> I don't follow this comment. I am not saying the check is redundant but the rationale provided above does not really cut it. Sure, a build might also fail. But that misses several possibilities. The image could be built on a different machine which didn't have the cpu switched off. The cpu switch off could happen on the same machine between building and running. If there is a good reason for not including this check the one cited above is not it.
>>
>> Can you just state in simple terms why it is ok to continue when the info retrieved from /proc/cpuinfo and os::processor_count() do not match up?
>
> The JVM works well with the number of enabled cores, which is equivalent to the value of os::processor_count(); it uses that value, for example, to decide the number of threads for parallel GC. The count of lines from /proc/cpuinfo, by contrast, is a static number.
> /proc/cpuinfo was used to decide CPU_A53MAC feature [ec9bee6#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R184](https://github.com/openjdk/jdk/commit/ec9bee68660acd6abf0b4dd4023ae69514542256#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R184).
Thanks for your comment. Yes, I noticed this part as well. One possible way of solving that could be to have a duplicated but simpler '_counting /proc/cpuinfo lines_' logic here.
I want to solve the [guarantee call](https://github.com/openjdk/jdk/pull/983/files#diff-7e6fa90a7bcdbe41687eb8d39c6c6232e6518b019937a87aab75284166ef67bdL171) issue first, as it is breaking my daily build & test (where I have only half of the cores online).
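For illustration only, the simpler '_counting /proc/cpuinfo lines_' idea mentioned above might look roughly like the sketch below. This is not the code in the PR or in HotSpot: the helper name `count_cpuinfo_processors` is hypothetical, and it assumes each CPU entry in the file begins with a line whose key is `processor`.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not HotSpot code: counts the entries in a stream
// laid out like /proc/cpuinfo, assuming each CPU entry starts with a line
// whose key is "processor".
int count_cpuinfo_processors(std::istream& in) {
  int count = 0;
  std::string line;
  while (std::getline(in, line)) {
    // Match only at the start of the line, so a value that merely
    // contains the word "processor" is not counted.
    if (line.compare(0, 9, "processor") == 0) {
      count++;
    }
  }
  return count;
}
```

A caller could pass a `std::ifstream` opened on /proc/cpuinfo and compare the result with the runtime's processor count, reporting a mismatch as a warning rather than a fatal error so that a system with offlined cores can still start up.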
-------------
PR: https://git.openjdk.java.net/jdk/pull/983
More information about the hotspot-runtime-dev mailing list