[crac] RFR: RFC: -XX:CPUFeatures=0xnumber for CPU migration [v2]

Jan Kratochvil duke at openjdk.org
Thu Feb 2 06:11:56 UTC 2023


On Wed, 1 Feb 2023 14:32:01 GMT, Dan Heidinga <heidinga at openjdk.org> wrote:

> I'm concerned about the maintainability and correctness of the approach though, when running on different systems / glibcs and over time as glibc is recompiled.

Me too. This is why I also called it just "RFC". But it is not up to me whether this temporary and/or a bit fragile solution gets accepted for CRaC or not.

> Ashutosh suggested an alternative approach back in Feb 2022 [0] [1] of providing a single "-XX:[+-]PortableCode" with a common enough baseline set of features

IIUC you can just use `-XX:CPUFeatures=0` with this patch for the same effect. As my employer is all about Java Performance I find useful to provide the option of more optimal runtime JIT features.

> and the use of GLIBC_TUNABLES to control the glibc bindings.

This is what I say in Comment 0:

> If upstream glibc maintainers do not like the IFUNC reset idea [...]. In such case I believe one should switch to using GLIBC_TUNABLES environment variable, [...]

> This is very similar to the approach the OpenJ9 team has taken in their CRIU efforts and have found that it works reasonable well for the use cases they have targeted (mostly restore inside a container).

Could you send some more specific link for that functionality?

> src/hotspot/os/linux/os_linux.cpp line 6177:
> 
>> 6175:   assert(sig == RESTORE_SIGNAL, "got what requested");
>> 6176: 
>> 6177:   linux_ifunc_reset();
> 
> Assuming we're freezing all the threads to allow resetting the linux_ifuncs, it would be good to comment explicitly about why that's the right approach and/or necessary.
> 
> Also, what if we stopped a thread that was about to be call one of these ifuncs - that had already loaded the old address into a register - before we did the reset?  Wouldn't it still have the wrong value and with the right (wrong?) thread scheduling, still abort after the restore?
> 
> Basically, I think there is probably a race here still.

You are right, there is a race, I will fix that, thanks.

> src/hotspot/os/linux/os_linux_ifunc.cpp line 43:
> 
>> 41: // These offsets are for Debian 12 x86_64:
>> 42: // gdb -batch /lib64/ld-linux-x86-64.so.2 -ex
>> 43: static unsigned   l_scope_offset = 0x3b0;
> 
> Committing this into the repo seems very suspect unless we have some guarantee that these offsets are stable.  Is there such a guarantee?

These offsets are not stable, I hope they won't change in one stable OS release but still they can. If there are any problems one can always install `*-debuginfo.rpm` or `*-dbg.deb` which will override all these assumed values. Maybe if there is no debuginfo available there should be a database of `libc.so.`6 binary hashes and refuse to run if none is found.
Still I find this patch only as a proof of concept and a some codebase to ask upstream glibc for inclusion of an IFUNC-reset functionality into glibc mainline. As I stated in Comment 0 if glibc refuses such upstreaming I believe CRaC should rather use `GLIBC_TUNABLES` anyway.
This IFUNC-reset functionality looked simple originally but in the end it got much more complicated than I expected. I would not probably implement it if I could see it will not be simple enough. But when it was already mostly written I have finished it.

> src/hotspot/os/linux/os_linux_ifunc.cpp line 145:
> 
>> 143:   unsigned sht;
>> 144: };
>> 145: static int symtab_lookup_iterate_phdr(struct dl_phdr_info *info, size_t size, void *data_voidp) {
> 
> This seems like some really cool "I made it work"-style hacking and something that would be utterly unsupportable long term.
> 
> I'd be pretty concerned if we merged this approach into the repo in this form.  If glibc provided an API to reset the ifunc bindings, then I'd be way more comfortable with the approach.

Already discussed above.

-------------

PR: https://git.openjdk.org/crac/pull/41


More information about the crac-dev mailing list