RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v5]

Wed Oct 12 16:13:12 UTC 2022

On Wed, 12 Oct 2022 14:37:37 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics.
>> 
>> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522
>> 
>> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. 
>> 
>> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8295159: DSO created with -ffast-math breaks Java floating-point arithmetic

I think the Java arithmetic correctness is the thing we are after. Allowing a stray library to break it seems very wrong. The alternative to this fix would be detecting the "bad" FPU flags like FTZ/DAZ, provided we can do that without messing with a whole lot of arch-specific code, and then asking users to opt-in for potentially breaking behavior.

But I believe current patch is the minimal thing we should do. Some nits:

src/hotspot/os/linux/os_linux.cpp line 113:

> 111: # include <sys/times.h>
> 112: # include <sys/types.h>
> 113: # include <sys/utsname.h>

Do we need to shuffle^W sort includes in this patch? I presume you'd want this patch to be cleanly backportable, which means it should probably be as point-y as it can get.

src/hotspot/os/linux/os_linux.cpp line 1747:

> 1745: void * os::Linux::dlopen_helper(const char *filename, char *ebuf,
> 1746:                                 int ebuflen) {
> 1747:   // JDK-8295159: Protect floating-point environment.

We need to be more verbose in these comments. Say something like:

There are known cases where global library initialization sets the FPU flags
that affect computation accuracy, for example, enabling Flush-To-Zero and
Denormals-Are-Zero. Do not let those libraries break the Java arithmetic.
Unfortunately, this might break the libraries that might depend on these FPU
features for performance and/or numerical "accuracy", but we need to protect 
the Java semantics first and foremost. See JDK-8295159.

src/hotspot/os/linux/os_linux.cpp line 1750:

> 1748:   fenv_t curr_fenv;
> 1749:   int rtn = fegetenv(&curr_fenv);
> 1750:   assert(rtn == 0, "must be.");

Suggestion:

  assert(rtn == 0, "fegetnv must succeed");

src/hotspot/os/linux/os_linux.cpp line 1753:

> 1751:   void * result = ::dlopen(filename, RTLD_LAZY);
> 1752:   rtn = fesetenv(&curr_fenv);
> 1753:   assert(rtn == 0, "must be.");

Suggestion:

  assert(rtn == 0, "fesetenv must succeed");

test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 28:

> 26:  * @bug 8295159
> 27:  * @summary DSO created with -ffast-math breaks Java floating-point arithmetic
> 28:  * @run main/othervm compiler.floatingpoint.TestDenormalFloat

Should it have `/native` somewhere here?

test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 39:

> 37: //      at compiler.floatingpoint.TestDenormalFloat.testFloats(TestDenormalFloat.java:47)
> 38: //      at compiler.floatingpoint.TestDenormalFloat.main(TestDenormalFloat.java:57)
> 39: 

This comment is redundant, I think. There seem to be little point to run these tests separately?

test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 60:

> 58:     public static void main(String[] args) {
> 59:         testFloats();
> 60:         System.out.println("Loading libfast-math.so");

I propose we have the same logging statement in `TestDoubles`.

-------------

PR: https://git.openjdk.org/jdk/pull/10661