Problems with double remainder on Windows/x86_64 and Visual Studio 2005

Tue Sep 16 01:46:47 PDT 2008

On 9/16/08, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
> Volker Simonis wrote:
>
> > Hi David,
> >
> > I don't think that the problems are related although they are similar.
> > In my case the problem is caused by the FPSW (on Windows, on Linux in
> > gdb it's "fstat") register which is the status word for the FPU
> > (formerly known as x87 and later as MMX) registers.
> >
> > The problem is that MSVC apparently doesn't support these registers
> > any more, in the sense that it doesn't generate code which uses them.
> > This is probably also the reason why their "_clearfp()" function does
> > not clear the FPSW register but only the more modern MXCSR register (which
> > is the status register for the XMM register set (also known as
> > SSE2/SSE3)).
> >
> > On the other hand, Microsoft is STILL using the old FPU registers in
> > their runtime functions (particularly in the "fmod()" function). And
> > this usage interferes with the usage of the old FPU registers by the
> > code generated by the HotSpot JIT compiler. I have no idea what the
> > ABI conventions for the FPSW register are, or whether such conventions
> > even exist (I would be thankful for any pointer here). As Tom commented,
> > the code generated by the JIT is aware of this behaviour but other
> > C/library/JNI code may not be.
> >
> >
>
>  IIRC, the ABI specification for x64 allows the x87 control word to *not* be
> saved/restored across context switches in 64-bit mode depending on the OS.
> Again IIRC, the Unix ABI for x64 *does* preserve the x87 control word in
> these cases but the Windows ABI does not.
>

Hi Joe,

I don't understand what you mean by context switch. Do you mean a
function call?

My experiments show that on Windows, the control word is indeed not
preserved across function calls. If I call "fmod()" a second time in
the example from my previous mail, the second call returns the correct
result because the first call cleared the "Zero Divide" flag in the
control word.

The only question is whether fmod() should depend on the control
word at all, or whether it should rather clear it right after it is
invoked.

By the way, I've now checked my example with MSVC 2008 and it fails as
well. This means that if you're going to switch to MSVC 2008 (as
discussed on the lists recently), you'll face this problem as well.
I've fixed it by introducing a new stub routine
"clear_fp_exceptions()" in "stubRoutines_amd64.{cpp,hpp}" which
simply executes WAIT and FNCLEX to clear the pending floating point
exceptions. This stub is called in "sharedRuntime.cpp" in
SharedRuntime::drem() just before the call to ::fmod() if on Windows.
If you're interested, I can post the patch to the list.

Regards,
Volker

>  -Joe
>
>
>
> > Regards,
> > Volker
> >
> > On 8/27/08, David Holmes - Sun Microsystems <David.Holmes at sun.com> wrote:
> >
> >
> > > Hi Volker,
> > >
> > >  I don't know if this is related, or just coincidental timing but a
> > > new bug report has just been filed:
> > >
> > >  6741940 Nonvolatile XMM registers not preserved across JNI calls
> > >
> > >  "Calls to the JNI entry point "CallVoidMethod" [in test program] do not
> > > preserve the nonvolatile XMM registers, unless running with -Xint.  This
> > > is in violation of the Windows 64-bit ABI:
> > >  http://msdn.microsoft.com/en-us/library/ms794547.aspx"
> > >
> > >  This won't show up on BugParade for a day or so.
> > >
> > >  Regards,
> > >  David Holmes
> > >
> > >  Volker Simonis said the following on 08/26/08 05:19:
> > >
> > >
> > >
> > >
> > > > Hi,
> > > >
> > > > we had a strange problem which led to failures in the JCK test
> > > > Math2012. The problem only occurred if some other JCK tests were
> > > > compiled and executed in a special order before the tests in
> > > > Math2012.
> > > >
> > > > I could finally track down the problem to the following simple test
> > > > case:
> > > >
> > > > ====================================================
> > > > public class Log10 {
> > > >
> > > >  public static double log10(double d) {
> > > >   return Math.log10(d);
> > > >  }
> > > >
> > > >  public static double drem2(double d) {
> > > >   return d % 2;
> > > >  }
> > > >
> > > >  public static void main(String args[]) {
> > > >   System.out.println("log10(0) = " + Math.log10(0.0d));
> > > >   System.out.println("log10(0) = " + log10(0.0d));
> > > >   System.out.println("drem2(4.0) = " + drem2(4.0d));
> > > >  }
> > > > }
> > > > ====================================================
> > > >
> > > > which always fails on Windows/x86_64 (i.e. prints "NaN" for the result
> > > > of 4.0 % 2.0 which should be 0.0) if executed like this:
> > > >
> > > > java -Xcomp -Xbatch -XX:CompileCommand="compileonly Log10 log10" -XX:+PrintCompilation Log10
> > > >
> > > > VM option 'CompileCommand=compileonly Log10 log10'
> > > > VM option '+PrintCompilation'
> > > > CompilerOracle: compileonly Log10.log10
> > > > log10(0) = -Infinity
> > > >  1   b   Log10::log10 (5 bytes)
> > > > log10(0) = -Infinity
> > > > drem2(4.0) = NaN
> > > >
> > > > Notice however that we are using a version of the Java 6 HotSpot
> > > > compiled with Visual Studio 2005.
> > > >
> > > > I couldn't reproduce the problem with
> > > > jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008; however, I could verify
> > > > that the code generated by both our JDK 6 and the latest jdk-7 is
> > > > virtually the same. The interesting part is the compiled version of
> > > > the method Log10.log10():
> > > >
> > > > 000     pushq   rbp
> > > >       subq    rsp, #16        # Create frame
> > > >       nop     # nop for patch_verified_entry
> > > > 006     fldlg2                  #Log10
> > > >       fyl2x                   # Q=Log10*Log_2(x)
> > > > 024     addq    rsp, 16 # Destroy frame
> > > >       popq    rbp
> > > >       testl   rax, [rip + #offset_to_poll_page]       # Safepoint: poll for GC
> > > > 02f     ret
> > > >
> > > > The computation of Math.log10(0.0d) which correctly returns -Infinity
> > > > sets the "Zero Divide" flag in the FP status word as described in
> > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf.
> > > >
> > > > After this computation "SharedRuntime::drem()" is called for the
> > > > computation of the double remainder in the method "drem2()" of the
> > > > above example. "SharedRuntime::drem()" itself just delegates the
> > > > computation to the "fmod()" function (defined in <math.h>) of the
> > > > underlying platform.
> > > >
> > > > The presence of the "Zero Divide" flag in the FP status word seems to
> > > > be no problem for the "fmod()" which is used by the
> > > > jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008 executable (from
> > > > msvcr71.dll) and it is no problem on Linux/x86_64 either, but it IS
> > > > definitely a problem for the "fmod()" from the "msvcr80d.dll" which is
> > > > used in our MSVC 2005 build.
> > > >
> > > > I have two questions now:
> > > >
> > > > 1. Is it ok that the intrinsic for Math.log10() leaves the exception
> > > > bits as they are in the FP status word?
> > > > 2. Can somebody confirm that the described behaviour of "fmod()" from
> > > > "msvcr80d.dll" as used by MSVC 2005 is buggy? (I couldn't find any bug
> > > > report and I also couldn't find any reference on whether "fmod()"
> > > > should depend on the FP status word or not.)
> > > >
> > > > It would also be nice if somebody who has recent OpenJDK built with
> > > > MSVC 2005 could confirm the above problem or if somebody could just
> > > > confirm or disprove the "fmod()" problem within different versions of
> > > > MSVC. Here's a small C-program which can be used to test if "fmod()"
> > > > is dependent on the FP status word:
> > > >
> > > > ====================== fmod.c ====================
> > > > #include <math.h>
> > > > #include <stdio.h>
> > > >
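> > > > /* Implemented in fpu_asm.asm; leaves "Zero Divide" set in the FP status word. */
> > > > extern void fpu_asm(void);
> > > >
> > > > int main(void) {
> > > >
> > > >   /* Baseline: the FP status word is clean, expect 0.000000. */
> > > >   printf("fmod(4.0, 2.0) = %f\n",  fmod(4.0, 2.0));
> > > >
> > > >   fpu_asm();
> > > >
> > > >   /* Same call again, now with "Zero Divide" pending. */
> > > >   printf("fmod(4.0, 2.0) = %f\n",  fmod(4.0, 2.0));
> > > >
> > > >   return 0;
> > > > }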
> > > > =====================================================
> > > >
> > > > ===================== fpu_asm.asm ====================
> > > > PUBLIC fpu_asm
> > > > .CODE
> > > >       ALIGN   8
> > > > fpu_asm PROC
> > > >       fldlg2
> > > >       fldz
> > > >       fyl2x
> > > >       ret
> > > >       ALIGN 8
> > > > fpu_asm ENDP
> > > > END
> > > > ======================================================
> > > >
> > > > Compile and run with:
> > > >
> > > > ml64 /c fpu_asm.asm
> > > > cl fmod.c fpu_asm.obj
> > > > fmod.exe
> > > > fmod(4.0, 2.0) = 0.000000
> > > > fmod(4.0, 2.0) = -1.#IND00
> > > >
> > > > Regards,
> > > > Volker
> > > >
> > > > PS: the obvious solution of calling "_clearfp()" as defined in
> > > > <float.h> just before a call to "fmod()" unfortunately doesn't work,
> > > > because "_clearfp()" (at least in MSVC 2005) only clears the SSE
> > > > status register MXCSR. The only solution I see right now is using the
> > > > FCLEX assembler instruction, and because MSVC 2005 has no inline
> > > > assembler for x86_64 I'll probably have to write the whole assembler
> > > > function for the assembler instruction. Or does somebody have a
> > > > smarter solution?
> > > >
> > > > PPS: this is a nice example of how a compiler switch can give you a
> > > > lot of fun (isn't it, Kelly :) ...
> > > >
> > > >
> > > >
> > >
> >
>
>