Problems with double remainder on Windows/x86_64 and Visual Studio 2005

Wed Aug 27 08:01:47 PDT 2008

Hi David,

I don't think that the problems are related, although they are similar.
In my case the problem is caused by the FPSW register (that's its name
on Windows; on Linux, gdb calls it "fstat"), which is the status word
for the x87 FPU registers (the ones the MMX registers are aliased onto).

The problem is that MSVC apparently doesn't support these registers
any more, in the sense that it doesn't generate code which uses them.
This is probably also the reason why their "_clearfp()" function does
not clear the FPSW register but only the more modern MXCSR register
(which is the control and status register for the SSE/SSE2 unit that
operates on the XMM registers).
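
To make the FPSW/MXCSR distinction concrete, here is a minimal sketch of
how the x87 status word can be inspected and cleared directly. It assumes
GCC or Clang on x86/x86_64, since it relies on GNU-style inline assembler
(which MSVC does not offer for x86_64). The helper names are made up for
illustration and are not part of HotSpot or of any runtime library:

```c
/* Bit 2 of the x87 status word is ZE, the sticky "Zero Divide" flag. */
#define X87_SW_ZE 0x0004u

/* Hypothetical helper: read the x87 status word (FPSW) with FNSTSW. */
static unsigned x87_status_word(void) {
    unsigned short sw;
    __asm__ volatile ("fnstsw %0" : "=a" (sw));
    return sw;
}

/* Hypothetical helper: compute log10(2) * log2(0) on the x87 unit, which
   sets ZE. The trailing FSTP pops the -Infinity result so that the x87
   register stack is left balanced. */
static void x87_log10_of_zero(void) {
    __asm__ volatile ("fldlg2\n\t"
                      "fldz\n\t"
                      "fyl2x\n\t"
                      "fstp %%st(0)" : : : "cc");
}

/* Hypothetical helper: clear all x87 exception flags with FNCLEX.
   This does not touch MXCSR. */
static void x87_clear_exceptions(void) {
    __asm__ volatile ("fnclex");
}
```

Calling x87_log10_of_zero() and then testing x87_status_word() & X87_SW_ZE
should show the flag set, and it should be gone again after
x87_clear_exceptions().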

On the other hand, Microsoft is STILL using the old FPU registers in
their runtime functions (particularly in the "fmod()" function). And
this usage interferes with the usage of the old FPU registers by the
code generated by the HotSpot JIT compiler. I have no idea what the
ABI conventions for the FPSW register are, or whether such conventions
even exist (I would be thankful for any pointers here). As Tom
commented, the code generated by the JIT is aware of this behaviour,
but other C/library/JNI code may not be.

Regards,
Volker

On 8/27/08, David Holmes - Sun Microsystems <David.Holmes at sun.com> wrote:
> Hi Volker,
>
>  I don't know if this is related, or just coincidental timing but a new bug
> report has just been filed:
>
>  6741940 Nonvolatile XMM registers not preserved across JNI calls
>
>  "Calls to the JNI entry point "CallVoidMethod" [in test program] do not
> preserve the nonvolatile XMM registers, unless running with -Xint.  This is
> in violation of the Windows 64-bit ABI:
>  http://msdn.microsoft.com/en-us/library/ms794547.aspx"
>
>  This won't show up on BugParade for a day or so.
>
>  Regards,
>  David Holmes
>
>  Volker Simonis said the following on 08/26/08 05:19:
>
>
> > Hi,
> >
> > we had a strange problem which led to failures in the JCK test
> > Math2012. The problem only occurred if some other JCK tests were
> > compiled and executed in a particular order before the tests in
> > Math2012.
> >
> > I could finally track down the problem to the following simple test case:
> >
> > ====================================================
> > public class Log10 {
> >
> >  public static double log10(double d) {
> >    return Math.log10(d);
> >  }
> >
> >  public static double drem2(double d) {
> >    return d % 2;
> >  }
> >
> >  public static void main(String args[]) {
> >    System.out.println("log10(0) = " + Math.log10(0.0d));
> >    System.out.println("log10(0) = " + log10(0.0d));
> >    System.out.println("drem2(4.0) = " + drem2(4.0d));
> >  }
> > }
> > ====================================================
> >
> > which always fails on Windows/x86_64 (i.e. prints "NaN" for the result
> > of 4.0 % 2.0 which should be 0.0) if executed like this:
> >
> > java -Xcomp -Xbatch -XX:CompileCommand="compileonly Log10 log10" -XX:+PrintCompilation Log10
> >
> > VM option 'CompileCommand=compileonly Log10 log10'
> > VM option '+PrintCompilation'
> > CompilerOracle: compileonly Log10.log10
> > log10(0) = -Infinity
> >  1   b   Log10::log10 (5 bytes)
> > log10(0) = -Infinity
> > drem2(4.0) = NaN
> >
> > Notice however that we are using a version of the Java 6 HotSpot
> > compiled with Visual Studio 2005.
> >
> > I couldn't reproduce the problem with
> > jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008; however, I could verify
> > that the code generated by both our JDK 6 and the latest jdk-7 is
> > virtually the same. The interesting part is the compiled version of
> > the method Log10.log10():
> >
> > 000     pushq   rbp
> >        subq    rsp, #16        # Create frame
> >        nop     # nop for patch_verified_entry
> > 006     fldlg2                  #Log10
> >        fyl2x                   # Q=Log10*Log_2(x)
> > 024     addq    rsp, 16 # Destroy frame
> >        popq    rbp
> >        testl   rax, [rip + #offset_to_poll_page]       # Safepoint: poll for GC
> > 02f     ret
> >
> > The computation of Math.log10(0.0d), which correctly returns -Infinity,
> > sets the "Zero Divide" flag in the FP status word as described in
> > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf.
> >
> > After this computation "SharedRuntime::drem()" is called for the
> > computation of the double remainder in the method "drem2()" of the
> > above example. "SharedRuntime::drem()" itself just delegates the
> > computation to the "fmod()" function (declared in <math.h>) of the
> > underlying platform.
> >
> > The presence of the "Zero Divide" flag in the FP status word seems to
> > be no problem for the "fmod()" which is used by the
> > jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008 executable (from
> > msvcr71.dll) and it is no problem on Linux/x86_64 either, but it IS
> > definitely a problem for the "fmod()" from the "msvcr80d.dll" which is
> > used in our MSVC 2005 build.
> >
> > I have two questions now:
> >
> > 1. Is it OK that the intrinsic for Math.log10() leaves the exception
> > bits as they are in the FP status word?
> > 2. Can somebody confirm that the described behaviour of "fmod()" from
> > "msvcr80d.dll" as used by MSVC 2005 is buggy? (I couldn't find any bug
> > report and I also couldn't find any reference on whether "fmod()"
> > should depend on the FP status word or not.)
> >
> > It would also be nice if somebody who has a recent OpenJDK built with
> > MSVC 2005 could confirm the above problem, or if somebody could just
> > confirm or disprove the "fmod()" problem with different versions of
> > MSVC. Here's a small C program which can be used to test whether
> > "fmod()" depends on the FP status word:
> >
> > ====================== fmod.c ====================
> > #include <math.h>
> > #include <stdio.h>
> >
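> > /* Implemented in fpu_asm.asm: leaves the x87 "Zero Divide" flag set. */
> > extern void fpu_asm(void);
> >
> > int main(int argc, char* argv[]) {
> >
> >    /* Baseline: the FP status word is still clean. */
> >    printf("fmod(4.0, 2.0) = %f\n",  fmod(4.0, 2.0));
> >
> >    fpu_asm();
> >
> >    /* Same call again, now with the flag set. */
> >    printf("fmod(4.0, 2.0) = %f\n",  fmod(4.0, 2.0));
> >
> >    return 0;
> > }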
> > =====================================================
> >
> > ===================== fpu_asm.asm ====================
> > PUBLIC fpu_asm
> > .CODE
> >        ALIGN   8
> > fpu_asm PROC
> >        fldlg2
> >        fldz
> >        fyl2x
> >        ret
> >        ALIGN 8
> > fpu_asm ENDP
> > END
> > ======================================================
> >
> > Compile and run with:
> >
> > ml64 /c fpu_asm.asm
> > cl fmod.c fpu_asm.obj
> > fmod.exe
> > fmod(4.0, 2.0) = 0.000000
> > fmod(4.0, 2.0) = -1.#IND00
> >
> > Regards,
> > Volker
> >
> > PS: the obvious solution of calling "_clearfp()" as declared in
> > <float.h> just before a call to "fmod()" unfortunately doesn't work,
> > because "_clearfp()" (at least in MSVC 2005) only clears the SSE
> > status register MXCSR. The only solution I see right now is using the
> > FCLEX assembler instruction, and because MSVC 2005 has no inline
> > assembler for x86_64 I'll probably have to write a whole assembler
> > function just for that one instruction. Or does somebody have a
> > smarter solution?
> >
> > PPS: this is a nice example of how switching compilers can get you a
> > lot of fun (isn't it, Kelly :) ...
> >
>