[jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2]

Per Liden pliden at openjdk.java.net
Thu Jan 21 08:40:50 UTC 2021


On Wed, 20 Jan 2021 13:12:53 GMT, Per Liden <pliden at openjdk.org> wrote:

>> @dholmes-ora 
>> 
>>> So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then?
>> 
>> Not sure what you have in mind here? Having an indirect function call would not result in a lower overhead than the test/branch I've introduced. It's also not necessarily trivial to detect this error at startup, as you would need a reliable way to enumerate all processors (something that seems semi-broken in this environment, which is the root of the problem), bind the current thread to each of them and then check the processor id.
>> 
>>> Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment?
>> 
>> That's of course always a judgement call/trade-off. I can't say I have a super good understanding of how common this environment is, but there's at least one "Java cloud provider" that uses it.
>
> It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR.
> 
> From the mailing list:
>>> Glibc's tst-getcpu.c (which I assume is the test you are referring
>>> to?) fails in their environment, so it seems like the affinity mask
>>> isn't reliable either.
>> 
>> What's the nature of the failure?  If it's due to a non-changing
>> affinity mask, then using sched_getaffinity data would still be okay.
> 
> Glibc's tst-getcpu fails with some version of "getcpu results X should be Y".
> 
> There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns.
> 
> Example (container with 1 CPU):
> 
> 1. sysconf(_SC_NPROCESSORS_CONF) returns 1
> 2. sysconf(_SC_NPROCESSORS_ONLN) returns 1
> 3. sched_getaffinity() returns the mask 00000001
> 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 7(!). Should have returned 0.
> 
> Another example (container with 2 CPUs):
> 
> 1. sysconf(_SC_NPROCESSORS_CONF) returns 2
> 2. sysconf(_SC_NPROCESSORS_ONLN) returns 2
> 3. sched_getaffinity() returns the mask 00000011
> 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0.
> 5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1.
> 
> It looks like CPUs are virtualized on some level, but not in sched_getcpu(). I'm guessing sched_getcpu() is returning the CPU id of the physical CPU, and not the virtual CPU, or something. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU.
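
For anyone who wants to reproduce this, the observations above (and the startup probe idea mentioned earlier, i.e. bind the thread to each processor in the affinity mask and compare against sched_getcpu()) boil down to something like the small C program below. This is only a sketch for illustration, not code from the patch:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void) {
      printf("_SC_NPROCESSORS_CONF: %ld, _SC_NPROCESSORS_ONLN: %ld\n",
             sysconf(_SC_NPROCESSORS_CONF), sysconf(_SC_NPROCESSORS_ONLN));

      cpu_set_t mask;
      if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
          perror("sched_getaffinity");
          return 1;
      }

      /* Bind to each CPU in the affinity mask and check what
         sched_getcpu() reports for it. */
      for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
          if (!CPU_ISSET(cpu, &mask)) {
              continue;
          }
          cpu_set_t single;
          CPU_ZERO(&single);
          CPU_SET(cpu, &single);
          if (sched_setaffinity(0, sizeof(single), &single) != 0) {
              perror("sched_setaffinity");
              continue;
          }
          int reported = sched_getcpu();
          printf("bound to CPU %d, sched_getcpu() returned %d%s\n",
                 cpu, reported, (reported == cpu) ? "" : "  <-- mismatch");
      }
      return 0;
  }

On a well-behaved kernel every line reports a matching id. In the environment described above the reported id can even fall outside the configured range, which is the situation the patch has to handle.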

> Does sched_getaffinity actually change the affinity mask?

(assuming you meant sched_setaffinity here...)

You seem to be right. sched_setaffinity() returns success, but a subsequent call to sched_getaffinity() shows it had no effect.
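
A quick way to see that (again only a sketch, not part of the patch) is to compare the mask that comes back from sched_getaffinity() with the one that was just requested:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void) {
      cpu_set_t want, after;
      CPU_ZERO(&want);
      CPU_SET(0, &want);  /* request CPU 0 only */

      int rc = sched_setaffinity(0, sizeof(want), &want);
      sched_getaffinity(0, sizeof(after), &after);

      printf("sched_setaffinity rc=%d, new mask matches request: %s\n",
             rc, CPU_EQUAL(&want, &after) ? "yes" : "no");
      return 0;
  }

In the 2-CPU container above, rc is 0 but the reported mask still contains both CPUs.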

> I wonder if it just reports a 2**N - 1 unconditionally, with N being the
> number of configured vCPUs for the container.  It probably does that so
> that the population count of the affinity mask matches the vCPU count.
> Likewise for the CPU entries under /sys (currently ignored by glibc
> because of a parser bug) and /proc/stat (the fallback actually used by
> glibc).  There is no virtualization of CPU IDs whatsoever, it looks like
> it's all done to communicate the vCPU count, without taking into account
> how badly this interacts with sched_getcpu.

Yep, that's what it looks like.
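
For readers following the thread, the "test/branch" mentioned at the top boils down to a guard of roughly this shape. The function name is made up here, and this is not the code from the PR linked below, just an illustration of the idea:

  #define _GNU_SOURCE
  #include <sched.h>

  /* Illustration only, not the code from the PR: if the OS reports a
     processor id outside the expected range, fall back to a known-good
     value instead of using it to index per-CPU data. */
  static inline unsigned int checked_processor_id(unsigned int processor_count) {
      const int id = sched_getcpu();
      if (id < 0 || (unsigned int)id >= processor_count) {
          return 0;  /* hypothetical fallback for bogus ids */
      }
      return (unsigned int)id;
  }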

-------------

PR: https://git.openjdk.java.net/jdk16/pull/124


