RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7]
Jorn Vernee
jvernee at openjdk.org
Wed Oct 26 20:57:44 UTC 2022
On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics.
>>
>> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522
>>
>> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now.
>>
>> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
> 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic
I agree with David. Unconditionally doing a check on every call seems to be overkill, since it's a mostly theoretical problem at this point, and in general I think we should be able to assume that foreign code respects the ABI.
There are other things that can go wrong as well, such as foreign code installing a signal handler, which can break implicit null checks. Foreign code returning with corrupted register state, which then leads to further corruption, is also a possibility. In other words, there seem to be many more things that can go wrong if we expect native code to violate the ABI.
Even though the check can be pretty fast, we've seen that people watch the performance in this area closely, and care about every nanosecond spent here. On my own box, the `panama_blank` benchmark takes just 3.4ns, so the relative overhead of a check could be larger depending on the machine. A flag was also recently added to speed up native calls, namely `-XX:+UseSystemMemoryBarrier`, which could make the relative overhead of a check larger still.
All in all, I think `-Xcheck:jni` is a better place to test this kind of stuff, and we should encourage people to run tests with `-Xcheck:jni` before deploying to production.
At the same time, loading libraries is a known problematic situation, and performance matters far less there. Always checking and restoring the FPU control state when a library is loaded, and perhaps emitting a warning message to spur people on to fix the issue in the long term, seems like a good solution to me.
-------------
PR: https://git.openjdk.org/jdk/pull/10661