[jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system
Per Liden
pliden at openjdk.java.net
Fri Jan 15 22:05:12 UTC 2021
On Fri, 15 Jan 2021 15:07:30 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:
>> Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count().
>>
>> We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM until the underlying problem is fixed. Without this workaround, ZGC can't be used in this environment.
>>
>> This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future.
>>
>> Just to clarify: in a Virtuozzo/OpenVZ environment, the underlying problem seems to be not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns too low a number. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker.
>>
>> This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. A warning is also printed the first time this happens.
>>
>> Testing: Manual testing with various fake/incorrect values returned from sched_getcpu().
>
> src/hotspot/os/linux/os_linux.cpp line 4784:
>
>> 4782: "(got processor id %d, valid processor id range is 0-%d)",
>> 4783: id, processor_count() - 1);
>> 4784: log_warning(os)("Falling back so assuming processor id is 0. "
>
> s/so/to/
Will fix!
> src/hotspot/os/linux/os_linux.cpp line 4769:
>
>> 4767: const int id = Linux::sched_getcpu();
>> 4768:
>> 4769: if (id >= 0 && id < processor_count()) {
>
> Do we really need to check if the returned processor ID is negative? That seems a whole new level of environment screwup to me.
I'm thinking we should make this safe to call in all cases. God knows what a broken environment might return.
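To illustrate the "safe in all cases" argument, here is a minimal standalone sketch (not the HotSpot code) of the range check being discussed. It assumes glibc on Linux for sched_getcpu(); the names sanitize_processor_id() and current_processor_id() are hypothetical, and the warning text is only loosely modeled on the patch. Rejecting negative ids costs nothing and also covers sched_getcpu() failing, since it returns -1 on error.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // sched_getcpu() is a GNU extension
#endif
#include <sched.h>   // sched_getcpu()
#include <unistd.h>  // sysconf(), _SC_NPROCESSORS_CONF
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical helper: any id outside [0, count) is mapped to 0.
// Negative ids are rejected as well, since a broken environment
// (or a failing sched_getcpu(), which returns -1) may produce them.
inline int sanitize_processor_id(int id, int count) {
  if (id >= 0 && id < count) {
    return id;
  }
  return 0;  // Always a valid index, but all threads now share slot 0.
}

// Hypothetical stand-in for os::processor_id(): warns only the first time.
inline int current_processor_id() {
  static const int count = [] {
    const long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? static_cast<int>(n) : 1;  // Guard against sysconf() failing.
  }();
  static std::atomic<bool> warned{false};

  const int id = sched_getcpu();
  const int safe = sanitize_processor_id(id, count);
  if (safe != id && !warned.exchange(true)) {
    std::fprintf(stderr,
                 "Invalid processor id reported by the operating system "
                 "(got processor id %d, valid processor id range is 0-%d). "
                 "Falling back to assuming processor id is 0.\n",
                 id, count - 1);
  }
  assert(safe >= 0 && safe < count);  // Safe to index a CPU-local array.
  return safe;
}
```

The result can index an array of length equal to the configured processor count without going out of bounds, at the cost of contention on slot 0 when the fallback is taken.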
-------------
PR: https://git.openjdk.java.net/jdk16/pull/124
More information about the hotspot-gc-dev mailing list