[jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2]
Ruslan Synytsky
rs at jelastic.com
Mon Jan 25 09:20:06 UTC 2021
Hi, sharing comments provided by the Virtuozzo team (cc'd).
*Question (**Florian**): It would be good to have someone from Virtuozzo
comment to indicate whether the affinity mask is actually reliable for
this. But they will see test failures in low-level test suites if the
affinity mask and sched_getcpu are incompatible (I actually wrote a glibc
test case for this).*
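To make the compatibility question concrete: on Linux, the CPU that sched_getcpu() reports for a thread should be a member of that thread's affinity mask. The Python sketch below checks exactly that. It assumes a Linux host whose C library exports sched_getcpu (glibc does), and the helper names `current_cpu` and `cpu_in_affinity_mask` are illustrative only; this is not the glibc test case mentioned above.

```python
from __future__ import annotations

import ctypes
import ctypes.util
import os

# Assumption: Linux, with a C library that exports sched_getcpu(3) (e.g. glibc).
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def current_cpu() -> int:
    """Return the CPU the calling thread is running on, per sched_getcpu(3)."""
    cpu = _libc.sched_getcpu()
    if cpu < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return cpu


def cpu_in_affinity_mask(cpu: int, mask: set[int]) -> bool:
    """A CPU reported by sched_getcpu() should be one the thread may run on."""
    return cpu in mask


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    cpu = current_cpu()
    mask = os.sched_getaffinity(0)  # allowed CPUs of the calling thread
    print(f"sched_getcpu() = {cpu}, affinity mask = {sorted(mask)}")
    if not cpu_in_affinity_mask(cpu, mask):
        print("inconsistent: reported CPU is outside the affinity mask")
```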
*Answer (**Denis**): The sched_setaffinity syscall does not work inside
containers. On the one hand, we cannot return an error, as this would
immediately break a lot of software; on the other hand, we cannot allow
binding the process to a specific CPU, as that would open a DoS attack
vector. Thus it returns success but actually does nothing. The rest is a
consequence of that.*
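Given that behaviour, a successful return from sched_setaffinity() proves nothing on its own. One way to detect the no-op is to pin the calling thread to a single allowed CPU and check whether sched_getcpu() then agrees. The Python sketch below assumes Linux and a C library exporting sched_getcpu; `affinity_is_enforced` is a hypothetical helper, and its callables can be swapped for fakes so the logic can be exercised without touching the real scheduler.

```python
from __future__ import annotations

import ctypes
import ctypes.util
import os
from typing import Callable, Iterable, Optional

# Assumption: Linux, with a C library that exports sched_getcpu(3) (e.g. glibc).
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def _sched_getcpu() -> int:
    return _libc.sched_getcpu()


def affinity_is_enforced(
    getaffinity: Optional[Callable[[int], set[int]]] = None,
    setaffinity: Optional[Callable[[int, Iterable[int]], None]] = None,
    getcpu: Optional[Callable[[], int]] = None,
    samples: int = 10,
) -> bool:
    """Pin the calling thread to each allowed CPU in turn and check that
    sched_getcpu() reports that CPU. Returns False if a pin appears to be
    silently ignored. The original affinity mask is always restored."""
    getaffinity = os.sched_getaffinity if getaffinity is None else getaffinity
    setaffinity = os.sched_setaffinity if setaffinity is None else setaffinity
    getcpu = _sched_getcpu if getcpu is None else getcpu

    original = set(getaffinity(0))  # pid 0 means the calling thread
    try:
        for cpu in sorted(original):
            setaffinity(0, {cpu})  # may "succeed" without binding anything
            for _ in range(samples):
                if getcpu() != cpu:
                    return False
        return True
    finally:
        setaffinity(0, original)
```

On a correctly functioning Linux host, `affinity_is_enforced()` would be expected to return True; in the environment described above, the very first sample after pinning would disagree with the requested CPU.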
Hope it's helpful.
Regards
> ---------- Forwarded message ----------
> From: David Holmes <david.holmes at oracle.com>
> To: Per Liden <pliden at openjdk.java.net>, hotspot-gc-dev at openjdk.java.net,
> hotspot-runtime-dev at openjdk.java.net
> Date: Mon, 25 Jan 2021 07:07:23 +1000
> Subject: Re: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id
> reported by the operating system [v2]
> On 22/01/2021 9:21 pm, Per Liden wrote:
> > On Sat, 16 Jan 2021 13:00:04 GMT, David Holmes <dholmes at openjdk.org>
> > wrote:
> >
> >>> Per Liden has updated the pull request incrementally with one
> >>> additional commit since the last revision:
> >>>
> >>> Review
> >>
> >> So we have to penalize all correctly functioning users because of one
> >> broken environment? Can we not detect this broken environment at startup
> >> and inject a workaround then?
> >>
> >> Why is this environment important enough that OpenJDK has to make
> >> changes to deal with it being broken?
> >>
> >> Cheers,
> >> David
> >
> > @dholmes-ora Do you still have questions or concerns here, or can I go
> > ahead and integrate this?
>
> I remain concerned about the justification for putting in this
> workaround for a broken virtualization system. I would be happier if the
> bug were acknowledged and a fix were in the pipeline, so we would know how
> long we will have to carry this workaround.
>
> > I've gone through all our uses of sysconf(_SC_NPROCESSORS_*) and
> > sched_getaffinity(), and they look fine. I've also looked at how the
> > OSContainer stuff behaves in this environment, and it also looks fine.
> > In summary, the only problem I can spot is related to sched_getcpu().
>
> So IIUC, what we suspect is that sched_getcpu is reporting physical IDs
> rather than virtualized ones. I find it hard to imagine that only one API
> in this area could be affected by such a bug, but if that does appear to
> be the case, then that is reassuring.
>
> I won't "block" this, but I'm not happy about it.
>
> Thanks,
> David
>
> > -------------
> >
> > PR: https://git.openjdk.java.net/jdk16/pull/124
> >
>
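The failure mode discussed in the thread, sched_getcpu() returning a physical ID that is larger than the number of processors the container sees, matters wherever the processor ID is used to index a per-CPU array sized by the processor count. The Python sketch below illustrates a defensive lookup of that kind. It is not the change made in the PR above; the names `safe_processor_id` and `PerCpuCounters`, and the choice to fall back to slot 0, are assumptions for illustration.

```python
import os


def processor_count() -> int:
    """Configured processors, as reported by sysconf(_SC_NPROCESSORS_CONF)."""
    return os.sysconf("SC_NPROCESSORS_CONF")


def safe_processor_id(raw_id: int, count: int) -> int:
    """Map a processor id from sched_getcpu() to a valid index in [0, count).

    Ids outside the range (e.g. a host-physical id leaking into a container)
    fall back to 0, which is always valid because count >= 1.
    """
    if 0 <= raw_id < count:
        return raw_id
    return 0


class PerCpuCounters:
    """Hypothetical per-CPU counters indexed by processor id."""

    def __init__(self, count: int) -> None:
        self.slots = [0] * count

    def add(self, raw_id: int, n: int = 1) -> None:
        # Without the guard, an id >= count would raise IndexError here
        # (in C or C++ it would index past the end of the array).
        self.slots[safe_processor_id(raw_id, len(self.slots))] += n
```

Falling back to a fixed slot is always in range; `raw_id % count` would be an alternative that spreads threads across slots instead. Either way the ID is only a hint, since sched_getcpu(3) documents that the returned information may already be out of date when the call returns.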
--
Ruslan Synytsky
CEO @ Jelastic Multi-Cloud PaaS <https://jelastic.com/>