Problems with double remainder on Windows/x86_64 and Visual Studio 2005

Mon Sep 15 19:56:54 PDT 2008

Volker Simonis wrote:
> Hi David,
>
> I don't think that the problems are related, although they are similar.
> In my case the problem is caused by the FPSW register (that is its name
> on Windows; on Linux, gdb calls it "fstat"), which is the status word
> for the x87 FPU registers (the ones the MMX registers are aliased onto).
>
> The problem is that MSVC apparently doesn't support these registers
> any more, in the sense that it doesn't generate code which uses them.
> This is probably also the reason why their "_clearfp()" function does
> not clear the FPSW register but only the more modern MXCSR register
> (which is the control and status register for the XMM register set
> used by SSE/SSE2/SSE3).
>
> On the other hand, Microsoft is STILL using the old FPU registers in
> their runtime functions (particularly in the "fmod()" function). And
> this usage interferes with the usage of the old FPU registers by the
> code generated by the HotSpot JIT compiler. I have no idea what the
> ABI conventions for the FPSW register are, or whether such conventions
> even exist (I would be thankful for any pointer here). As Tom commented,
> the code generated by the JIT is aware of this behaviour but other
> C/library/JNI code may not be.
>   

IIRC, the ABI specification for x64 allows the x87 control word to *not* 
be saved/restored across context switches in 64-bit mode depending on 
the OS.  Again IIRC, the Unix ABI for x64 *does* preserve the x87 
control word in these cases but the Windows ABI does not.

-Joe

> Regards,
> Volker
>
> On 8/27/08, David Holmes - Sun Microsystems <David.Holmes at sun.com> wrote:
>   
>> Hi Volker,
>>
>>  I don't know if this is related, or just coincidental timing but a new bug
>> report has just been filed:
>>
>>  6741940 Nonvolatile XMM registers not preserved across JNI calls
>>
>>  "Calls to the JNI entry point "CallVoidMethod" [in test program] do not
>> preserve the nonvolatile XMM registers, unless running with -Xint.  This is
>> in violation of the Windows 64-bit ABI:
>>  http://msdn.microsoft.com/en-us/library/ms794547.aspx"
>>
>>  This won't show up on BugParade for a day or so.
>>
>>  Regards,
>>  David Holmes
>>
>>  Volker Simonis said the following on 08/26/08 05:19:
>>
>>
>>     
>>> Hi,
>>>
>>> we had a strange problem which led to failures in the JCK test
>>> Math2012. The problem only occurred if some other JCK tests were
>>> compiled and executed in a special order before the tests in
>>> Math2012.
>>>
>>> I could finally track down the problem to the following simple test case:
>>>
>>> ====================================================
>>> public class Log10 {
>>>
>>>  public static double log10(double d) {
>>>    return Math.log10(d);
>>>  }
>>>
>>>  public static double drem2(double d) {
>>>    return d % 2;
>>>  }
>>>
>>>  public static void main(String args[]) {
>>>    System.out.println("log10(0) = " + Math.log10(0.0d));
>>>    System.out.println("log10(0) = " + log10(0.0d));
>>>    System.out.println("drem2(4.0) = " + drem2(4.0d));
>>>  }
>>> }
>>> ====================================================
>>>
>>> which always fails on Windows/x86_64 (i.e. prints "NaN" for the result
>>> of 4.0 % 2.0 which should be 0.0) if executed like this:
>>>
>>> java -Xcomp -Xbatch -XX:CompileCommand="compileonly Log10 log10"
>>> -XX:+PrintCompilation Log10
>>>
>>> VM option 'CompileCommand=compileonly Log10 log10'
>>> VM option '+PrintCompilation'
>>> CompilerOracle: compileonly Log10.log10
>>> log10(0) = -Infinity
>>>  1   b   Log10::log10 (5 bytes)
>>> log10(0) = -Infinity
>>> drem2(4.0) = NaN
>>>
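
The expected values in the test case above can also be written down as an explicit check rather than read off the console. What follows is only a minimal sketch: the class name RemainderCheck and its method remainderSurvivesLog10() are hypothetical and not part of the original report. It uses nothing beyond Math.log10() and the % operator from the test case, and it does not force JIT compilation by itself, so it would still have to be run with the -Xcomp/-XX:CompileCommand flags shown above to exercise the compiled log10() path.

```java
// Hypothetical helper, not part of the original report.
class RemainderCheck {

    // Same shape as Log10.log10() in the test case above.
    static double log10(double d) {
        return Math.log10(d);
    }

    // Same shape as Log10.drem2() in the test case above.
    static double drem2(double d) {
        return d % 2;
    }

    // Runs log10(0.0) first and the remainder second, mirroring the failing order.
    static boolean remainderSurvivesLog10() {
        double log = log10(0.0d);
        double rem = drem2(4.0d);
        return log == Double.NEGATIVE_INFINITY && rem == 0.0d;
    }

    public static void main(String[] args) {
        System.out.println(remainderSurvivesLog10()
            ? "OK"
            : "FAILED: drem2(4.0) is not 0.0");
    }
}
```

A VM affected by the problem described here would be expected to take the FAILED branch, because drem2(4.0) comes back as NaN.
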
>>> Notice however that we are using a version of the Java 6 HotSpot
>>> compiled with Visual Studio 2005.
>>>
>>> I couldn't reproduce the problem with
>>> jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008; however, I could verify
>>> that the code generated by both our JDK 6 and the latest jdk-7 is
>>> virtually the same. The interesting part is the compiled version of
>>> the method Log10.log10():
>>>
>>> 000     pushq   rbp
>>>        subq    rsp, #16        # Create frame
>>>        nop     # nop for patch_verified_entry
>>> 006     fldlg2                  #Log10
>>>        fyl2x                   # Q=Log10*Log_2(x)
>>> 024     addq    rsp, 16 # Destroy frame
>>>        popq    rbp
>>>        testl   rax, [rip + #offset_to_poll_page]       # Safepoint: poll for GC
>>> 02f     ret
>>>
>>> The computation of Math.log10(0.0d) which correctly returns -Infinity
>>> sets the "Zero Divide" flag in the FP status word as described in
>>>
>>> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf.
>>> After this computation "SharedRuntime::drem()" is called for the
>>> computation of the double remainder in the method "drem2()" of the
>>> above example. "SharedRuntime::drem()" itself just delegates the
>>> computation to the "fmod()" function (defined in <math.h>) of the
>>> underlying platform.
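
When looking at FPSW (or "fstat" in gdb) in a debugger, the sticky exception flags sit in the low six bits of the 16-bit status word, with "Zero Divide" at bit 2. The following is a small sketch of a decoder for such a value, assuming the standard x87 status word layout; the class and method names are hypothetical, and the value has to be copied from the debugger by hand because Java has no way to read the register itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for decoding a status word value copied from a debugger.
class X87StatusWord {

    // Exception flag names for bits 0..5 of the x87 status word.
    private static final String[] FLAGS = {
        "Invalid Operation", "Denormalized Operand", "Zero Divide",
        "Overflow", "Underflow", "Precision"
    };

    // Returns the names of the exception flags set in the given status word.
    static List<String> exceptionFlags(int statusWord) {
        List<String> set = new ArrayList<>();
        for (int bit = 0; bit < FLAGS.length; bit++) {
            if ((statusWord & (1 << bit)) != 0) {
                set.add(FLAGS[bit]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // 0x0004 has only bit 2 set, so this prints [Zero Divide].
        System.out.println(exceptionFlags(0x0004));
    }
}
```
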
>>>
>>> The presence of the "Zero Divide" flag in the FP status word seems to
>>> be no problem for the "fmod()" which is used by the
>>> jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008 executable (from
>>> msvcr71.dll) and it is no problem on Linux/x86_64 either, but it IS
>>> definitely a problem for the "fmod()" from the "msvcr80d.dll" which is
>>> used in our MSVC 2005 build.
>>>
>>> I have two questions now:
>>>
>>> 1. Is it ok that the intrinsic for Math.log10() leaves the exception
>>> bits as they are in the FP status word?
>>> 2. Can somebody confirm that the described behaviour of "fmod()" from
>>> "msvcr80d.dll" as used by MSVC 2005 is buggy? (I couldn't find any bug
>>> report and I also couldn't find any reference on whether "fmod()"
>>> should depend on the FP status word or not.)
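
On the Java side at least, the rules for when % on doubles may yield NaN are fixed by the language: only if an operand is NaN, the dividend is infinite, or the divisor is zero. None of these depend on flags left behind by an earlier operation. A minimal sketch that encodes those rules and compares them with what the running VM actually returns (the class and method names are hypothetical):

```java
// Hypothetical helper; names are illustrative only.
class RemainderRules {

    // True when the Java rules for floating-point % require a NaN result.
    static boolean nanExpected(double dividend, double divisor) {
        return Double.isNaN(dividend) || Double.isNaN(divisor)
            || Double.isInfinite(dividend) || divisor == 0.0d;
    }

    // True when the result observed on this VM agrees with the rules above.
    static boolean consistent(double dividend, double divisor) {
        return Double.isNaN(dividend % divisor) == nanExpected(dividend, divisor);
    }

    public static void main(String[] args) {
        double[][] cases = {
            {4.0, 2.0}, {-4.5, 2.0}, {4.0, 0.0},
            {Double.POSITIVE_INFINITY, 2.0}, {4.0, Double.POSITIVE_INFINITY}
        };
        for (double[] c : cases) {
            System.out.println(c[0] + " % " + c[1] + " = " + (c[0] % c[1])
                + (consistent(c[0], c[1]) ? "" : "  <-- unexpected"));
        }
    }
}
```

By these rules 4.0 % 2.0 is not a NaN case, so a NaN there would be flagged as unexpected.
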
>>>
>>> It would also be nice if somebody who has a recent OpenJDK build made
>>> with MSVC 2005 could confirm the above problem, or if somebody could just
>>> confirm or disprove the "fmod()" problem with different versions of
>>> MSVC. Here's a small C program which can be used to test whether "fmod()"
>>> is dependent on the FP status word:
>>>
>>> ====================== fmod.c ====================
>>> #include <math.h>
>>> #include <stdio.h>
>>>
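>>> /* Implemented in fpu_asm.asm; sets the x87 "Zero Divide" flag. */
>>> extern void fpu_asm(void);
>>>
>>> int main(int argc, char* argv[]) {
>>>
>>>    /* Baseline: no x87 exception flag has been raised yet. */
>>>    printf("fmod(4.0, 2.0) = %f\n",  fmod(4.0, 2.0));
>>>
>>>    fpu_asm();
>>>
>>>    /* Same call again, now with the "Zero Divide" flag set. */
>>>    printf("fmod(4.0, 2.0) = %f\n",  fmod(4.0, 2.0));
>>>
>>>    return 0;
>>> }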
>>> =====================================================
>>>
>>> ===================== fpu_asm.asm ====================
>>> PUBLIC fpu_asm
>>> .CODE
>>>        ALIGN   8
>>> fpu_asm PROC
>>>        fldlg2
>>>        fldz
>>>        fyl2x
>>>        ret
>>>        ALIGN 8
>>> fpu_asm ENDP
>>> END
>>> ======================================================
>>>
>>> Compile and run with:
>>>
>>> ml64 /c fpu_asm.asm
>>> cl fmod.c fpu_asm.obj
>>> fmod.exe
>>> fmod(4.0, 2.0) = 0.000000
>>> fmod(4.0, 2.0) = -1.#IND00
>>>
>>> Regards,
>>> Volker
>>>
>>> PS: the obvious solution of calling "_clearfp()" as defined in
>>> <float.h> just before a call to "fmod()" unfortunately doesn't work,
>>> because "_clearfp()" (at least in MSVC 2005) only clears the SSE
>>> status register MXCSR. The only solution I see right now is using the
>>> FCLEX assembler instruction, and because MSVC 2005 has no inline
>>> assembler for x86_64 I'll probably have to write a whole assembler
>>> function just for that one instruction. Or does somebody have a
>>> smarter solution?
>>>
>>> PPS: this is a nice example of how a compiler switch can give you a lot
>>> of fun (isn't it, Kelly :) ...
>>>
>>>