[jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2]

David Holmes david.holmes at oracle.com
Thu Jan 21 06:43:12 UTC 2021


Hi Per,

On 20/01/2021 11:16 pm, Per Liden wrote:
> It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR.
> 
>  From the mailing list:
>>> Glibc's tst-getcpu.c (which I assume is the test you are referring
>>> to?) fails in their environment, so it seems like the affinity mask
>>> isn't reliable either.
>>
>> What's the nature of the failure?  If it's due to a non-changing
>> affinity mask, then using sched_getaffinity data would still be okay.
> 
> Glibc's tst-getcpu fails with some version of "getcpu results X should be Y".
> 
> There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns.
> 
> Example (container with 1 CPU):
> 
> 1. sysconf(_SC_NPROCESSORS_CONF) returns 1
> 2. sysconf(_SC_NPROCESSORS_ONLN) returns 1
> 3. sched_getaffinity() returns the mask 00000001
> 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 7(!) Should have returned 0.
> 
> Another example (container with 2 CPUs):
> 
> 1. sysconf(_SC_NPROCESSORS_CONF) returns 2
> 2. sysconf(_SC_NPROCESSORS_ONLN) returns 2
> 3. sched_getaffinity() returns the mask 00000011
> 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0.
> 5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1.
> 
> It looks like CPUs are virtualized at some level, but not in sched_getcpu(). My guess is that sched_getcpu() returns the id of the physical CPU rather than the virtual CPU. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU.
> 
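The experiment described above can be reproduced with a small C program along the following lines. It is a sketch in the spirit of glibc's tst-getcpu.c, not a copy of it, and the function name `count_getcpu_mismatches` is hypothetical. It assumes Linux with glibc, where `sched_getaffinity()`, `sched_setaffinity()` and `sched_getcpu()` are GNU extensions that require `_GNU_SOURCE`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Report what sysconf() and the affinity mask say, then pin the calling
 * thread to each CPU in its affinity mask in turn and check that
 * sched_getcpu() reports that CPU. Returns the number of mismatches,
 * or -1 if the affinity mask cannot be read. */
int count_getcpu_mismatches(void) {
    printf("_SC_NPROCESSORS_CONF = %ld\n", sysconf(_SC_NPROCESSORS_CONF));
    printf("_SC_NPROCESSORS_ONLN = %ld\n", sysconf(_SC_NPROCESSORS_ONLN));

    cpu_set_t original;
    if (sched_getaffinity(0, sizeof(original), &original) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    /* A thread's affinity mask always contains at least one CPU. */
    assert(CPU_COUNT(&original) > 0);
    printf("CPUs in affinity mask = %d\n", CPU_COUNT(&original));

    int mismatches = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &original)) {
            continue;
        }
        cpu_set_t single;
        CPU_ZERO(&single);
        CPU_SET(cpu, &single);
        if (sched_setaffinity(0, sizeof(single), &single) != 0) {
            perror("sched_setaffinity");
            continue;
        }
        /* On Linux the calling thread is normally migrated before
         * sched_setaffinity() returns, so sched_getcpu() should now
         * report exactly `cpu`. */
        int reported = sched_getcpu();
        if (reported != cpu) {
            printf("pinned to CPU %d, but sched_getcpu() returned %d\n",
                   cpu, reported);
            mismatches++;
        }
    }

    /* Restore the original mask so the caller is not left pinned. */
    if (sched_setaffinity(0, sizeof(original), &original) != 0) {
        perror("sched_setaffinity (restore)");
    }
    return mismatches;
}
```

Calling it from `main()` and printing the return value is enough; on a correctly behaving system the count should be 0, while in containers like the ones described above one would expect lines such as "pinned to CPU 0, but sched_getcpu() returned 7".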

So the problem isn't that sysconf(_SC_NPROCESSORS_CONF) returns too low
a number, as stated in the PR, but rather that sched_getcpu() is broken
after calling sched_setaffinity()? Either way, won't that breakage
potentially affect the NUMA code as well?
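Any consumer that uses the reported id to index an array sized by the processor count (per-CPU data or NUMA bookkeeping alike) is exposed to this. A minimal defensive sketch follows, assuming that falling back to slot 0 is acceptable; the function names and the fallback policy are hypothetical and are not taken from the PR.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical policy: accept a reported CPU id only if it is a valid index
 * into an array with processor_count per-CPU slots; otherwise fall back to
 * slot 0, which always exists because processor_count is at least 1. */
unsigned int sanitize_cpu_id(int reported, long processor_count) {
    assert(processor_count > 0);
    if (reported >= 0 && reported < processor_count) {
        return (unsigned int)reported;
    }
    return 0;
}

/* Hypothetical wrapper around sched_getcpu() whose result is always in
 * [0, sysconf(_SC_NPROCESSORS_CONF)), even if the OS reports an id such as
 * 7 in a 1-CPU container. sched_getcpu() returns -1 on failure, which is
 * also mapped to 0. */
unsigned int safe_processor_id(void) {
    long count = sysconf(_SC_NPROCESSORS_CONF);
    if (count < 1) {
        count = 1; /* sysconf() returns -1 on error; assume a single CPU. */
    }
    return sanitize_cpu_id(sched_getcpu(), count);
}
```

Falling back to 0 keeps indexing in bounds but funnels every affected thread onto the same slot, so on a multi-CPU machine with this bug it trades contention for correctness. A real implementation would also cache the processor count rather than calling sysconf() on every lookup.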

Thanks,
David

> -------------
> 
> PR: https://git.openjdk.java.net/jdk16/pull/124
> 



More information about the hotspot-gc-dev mailing list