Problems with double remainder on Windows/x86_64 and Visual Studio 2005

Tue Sep 16 12:16:11 PDT 2008

Volker Simonis wrote:
> On 9/16/08, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
>   
>> Volker Simonis wrote:
>>
>>     
>>> Hi David,
>>>
>>> I don't think the problems are related, although they are similar.
>>> In my case the problem is caused by the FPSW register (as it is called
>>> on Windows; gdb on Linux calls it "fstat"), which is the status word of
>>> the x87 FPU (whose register stack is also aliased by the MMX registers).
>>>
>>> The problem is that MSVC apparently doesn't use these registers any
>>> more, in the sense that it doesn't generate code that uses them.
>>> This is probably also the reason why its "_clearfp()" function does
>>> not clear the FPSW register but only the more modern MXCSR register,
>>> which is the status register for the XMM register set (used by
>>> SSE2/SSE3).
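The claim that FPSW and MXCSR keep separate exception flags can be checked with a small sketch. It assumes GCC or Clang on x86-64 (MSVC has no x64 inline assembler) and uses the `_mm_getcsr`/`_mm_setcsr` intrinsics from `<xmmintrin.h>`; the helper names are made up for illustration. `clear_sse_flags_only()` plays the role the thread attributes to `_clearfp()` under MSVC 2005: it clears the sticky flags in MXCSR only, so the x87 "Zero Divide" flag survives.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* "Zero Divide" is bit 2 in both the x87 status word and MXCSR. */
#define ZE_FLAG     0x0004u
/* Bits 0-5 of MXCSR are the SSE sticky exception flags. */
#define MXCSR_FLAGS 0x003Fu

/* Read the x87 FPU status word (FPSW). */
static unsigned read_fpsw(void) {
    uint16_t sw;
    __asm__ volatile ("fnstsw %0" : "=m"(sw));
    return sw;
}

/* Mirror the fldlg2/fyl2x sequence of the JIT'd Log10.log10() with x = 0.0.
   The masked divide-by-zero sets ZE in FPSW; the -Inf result is popped. */
static void x87_log10_of_zero(void) {
    __asm__ volatile ("fldlg2\n\t"
                      "fldz\n\t"
                      "fyl2x\n\t"
                      "fstp %%st(0)"
                      ::: "st", "st(1)");
}

/* Clear only the SSE sticky flags in MXCSR; FPSW is not touched. */
static void clear_sse_flags_only(void) {
    _mm_setcsr(_mm_getcsr() & ~MXCSR_FLAGS);
}
```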
>>>
>>> On the other hand, Microsoft is STILL using the old FPU registers in
>>> their runtime functions (particularly in "fmod()"), and this usage
>>> interferes with the use of the same registers by code generated by the
>>> HotSpot JIT compiler. I have no idea what the ABI conventions for the
>>> FPSW register are, or whether such conventions even exist (I would be
>>> thankful for any pointer here). As Tom commented, the code generated by
>>> the JIT is aware of this behaviour, but other C/library/JNI code may
>>> not be.
>>>
>>>
>>>       
>>  IIRC, the ABI specification for x64 allows the x87 control word to *not* be
>> saved/restored across context switches in 64-bit mode depending on the OS.
>> Again IIRC, the Unix ABI for x64 *does* preserve the x87 control word in
>> these cases but the Windows ABI does not.
>>
>>     
>
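Two different x87 registers come up in this exchange: the control word holds the exception *masks* (plus precision and rounding control), while sticky exception flags such as "Zero Divide" live in the status word. A minimal sketch for reading the control word, assuming GCC or Clang on x86-64; the helper names are hypothetical:

```c
#include <stdint.h>

/* x87 control word: bits 0-5 are exception masks (1 = masked). */
#define X87_CW_EXC_MASKS 0x003Fu
#define X87_CW_ZM        0x0004u  /* mask bit for "Zero Divide" */

/* Read the x87 FPU control word (FPCW). */
static unsigned read_fpcw(void) {
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Nonzero if a divide-by-zero only sets the ZE status flag
   instead of raising a floating-point fault. */
static int x87_zero_divide_masked(void) {
    return (read_fpcw() & X87_CW_ZM) != 0;
}
```

After FNINIT the control word is 0x037F (all exceptions masked), which is why log10(0.0) quietly yields -Infinity and merely leaves a flag behind in the status word.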
> Hi Joe,
>
> I don't understand what you mean by context switch - do you mean
> function call?
>   

I was thinking of OS context switches between processes, but it is 
certainly also possible that the function call register conventions 
changed too in the 64-bit ABI.

-Joe

> My experiments show that on Windows, the status word is indeed not
> preserved across function calls. If I call "fmod()" a second time in
> the example from my previous mail, the second call returns the correct
> result because the first call cleared the "Zero Divide" flag in the
> status word.
>
> The only question is whether fmod() should depend on the status word
> at all, or whether it should rather clear it as soon as it is
> invoked.
>
> By the way, I've now checked my example with MSVC 2008 and it fails as
> well. This means that if you're going to switch to MSVC 2008 (as
> discussed on the lists recently), you'll face this problem too.
> I've fixed it by introducing a new stub routine
> "clear_fp_exceptions()" in "stubRoutines_amd64.{cpp,hpp}" which
> simply executes WAIT and FNCLEX to clear the pending floating-point
> exceptions. This stub is called in "sharedruntime.cpp" in
> SharedRuntime::drem() just before the call to ::fmod() if on Windows.
> If you're interested, I can post the patch to the list.
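The stub described above can be approximated outside HotSpot with GCC/Clang inline assembly on x86-64. This is a sketch under that assumption, not the actual patch: `clear_fp_exceptions_sketch()` and the helpers are hypothetical names, and only the WAIT/FNCLEX pair comes from the mail.

```c
#include <stdint.h>

#define X87_SW_ZE 0x0004u  /* "Zero Divide" flag, bit 2 of the x87 status word */

/* Read the x87 FPU status word (FPSW). */
static unsigned read_fpsw(void) {
    uint16_t sw;
    __asm__ volatile ("fnstsw %0" : "=m"(sw));
    return sw;
}

/* Mirror the JIT'd log10(0.0): sets ZE in FPSW, then pops the -Inf result. */
static void x87_log10_of_zero(void) {
    __asm__ volatile ("fldlg2\n\t"
                      "fldz\n\t"
                      "fyl2x\n\t"
                      "fstp %%st(0)"
                      ::: "st", "st(1)");
}

/* Hypothetical stand-in for the clear_fp_exceptions() stub:
   FWAIT lets any pending unmasked x87 exception be reported first,
   FNCLEX then clears the sticky exception flags in FPSW. */
static void clear_fp_exceptions_sketch(void) {
    __asm__ volatile ("fwait\n\t"
                      "fnclex");
}
```

In the fix described in the mail, the equivalent call sits in `SharedRuntime::drem()` immediately before `::fmod()`, so the runtime's `fmod()` always starts with a clean x87 status word.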
>
> Regards,
> Volker
>
>
>   
>>  -Joe
>>
>>
>>
>>     
>>> Regards,
>>> Volker
>>>
>>> On 8/27/08, David Holmes - Sun Microsystems <David.Holmes at sun.com> wrote:
>>>
>>>
>>>       
>>>> Hi Volker,
>>>>
>>>>  I don't know if this is related, or just coincidental timing, but a new
>>>> bug report has just been filed:
>>>>
>>>>  6741940 Nonvolatile XMM registers not preserved across JNI calls
>>>>
>>>>  "Calls to the JNI entry point "CallVoidMethod" [in test program] do not
>>>> preserve the nonvolatile XMM registers, unless running with -Xint. This is
>>>> in violation of the Windows 64-bit ABI:
>>>>  http://msdn.microsoft.com/en-us/library/ms794547.aspx"
>>>>
>>>>  This won't show up on BugParade for a day or so.
>>>>
>>>>  Regards,
>>>>  David Holmes
>>>>
>>>>  Volker Simonis said the following on 08/26/08 05:19:
>>>>
>>>>> Hi,
>>>>>
>>>>> we had a strange problem which led to failures in the JCK test
>>>>> Math2012. The problem only occurred if some other JCK tests were
>>>>> compiled and executed in a particular order before the tests in
>>>>> Math2012.
>>>>>
>>>>> I could finally track down the problem to the following simple test
>>>>> case:
>>>>> ====================================================
>>>>> public class Log10 {
>>>>>
>>>>>  public static double log10(double d) {
>>>>>   return Math.log10(d);
>>>>>  }
>>>>>
>>>>>  public static double drem2(double d) {
>>>>>   return d % 2;
>>>>>  }
>>>>>
>>>>>  public static void main(String args[]) {
>>>>>   System.out.println("log10(0) = " + Math.log10(0.0d));
>>>>>   System.out.println("log10(0) = " + log10(0.0d));
>>>>>   System.out.println("drem2(4.0) = " + drem2(4.0d));
>>>>>  }
>>>>> }
>>>>> ====================================================
>>>>>
>>>>> which always fails on Windows/x86_64 (i.e. it prints "NaN" for the result
>>>>> of 4.0 % 2.0, which should be 0.0) if executed like this:
>>>>>
>>>>> java -Xcomp -Xbatch -XX:CompileCommand="compileonly Log10 log10" -XX:+PrintCompilation Log10
>>>>>
>>>>> VM option 'CompileCommand=compileonly Log10 log10'
>>>>> VM option '+PrintCompilation'
>>>>> CompilerOracle: compileonly Log10.log10
>>>>> log10(0) = -Infinity
>>>>>  1   b   Log10::log10 (5 bytes)
>>>>> log10(0) = -Infinity
>>>>> drem2(4.0) = NaN
>>>>>
>>>>> Notice, however, that we are using a version of the Java 6 HotSpot VM
>>>>> compiled with Visual Studio 2005.
>>>>>
>>>>> I couldn't reproduce the problem with
>>>>> jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008, but I could verify
>>>>> that the code generated by both our JDK 6 and the latest JDK 7 is
>>>>> virtually the same. The interesting part is the compiled version of
>>>>> the method Log10.log10():
>>>>>
>>>>> 000     pushq   rbp
>>>>>       subq    rsp, #16        # Create frame
>>>>>       nop     # nop for patch_verified_entry
>>>>> 006     fldlg2                  #Log10
>>>>>       fyl2x                   # Q=Log10*Log_2(x)
>>>>> 024     addq    rsp, 16 # Destroy frame
>>>>>       popq    rbp
>>>>>       testl   rax, [rip + #offset_to_poll_page]       # Safepoint: poll for GC
>>>>> 02f     ret
>>>>>
>>>>> The computation of Math.log10(0.0d), which correctly returns -Infinity,
>>>>> sets the "Zero Divide" flag in the FP status word as described in
>>>>> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf.
>>>>>
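To make "the Zero Divide flag in the FP status word" concrete: the six sticky exception flags occupy bits 0-5 of the x87 status word, and bits 11-13 hold TOP, the index of the current top of the register stack. The decoder below is a portable, hypothetical helper (no assembly involved) that turns a raw FPSW value into flag names.

```c
#include <string.h>

#define FPSW_DESC_MAX 18  /* "IE DE ZE OE UE PE" plus the terminating NUL */

/* Names of the sticky exception flags in bits 0-5 of the x87 status word. */
static const char *const fpsw_flag_names[6] = {
    "IE",  /* bit 0: invalid operation */
    "DE",  /* bit 1: denormalized operand */
    "ZE",  /* bit 2: zero divide */
    "OE",  /* bit 3: overflow */
    "UE",  /* bit 4: underflow */
    "PE"   /* bit 5: precision (inexact result) */
};

/* Write the names of the set flags into out (e.g. "ZE" or "ZE PE")
   and return how many are set. */
static int fpsw_describe(unsigned sw, char out[FPSW_DESC_MAX]) {
    int count = 0;
    out[0] = '\0';
    for (int bit = 0; bit < 6; bit++) {
        if (sw & (1u << bit)) {
            if (count > 0)
                strcat(out, " ");
            strcat(out, fpsw_flag_names[bit]);
            count++;
        }
    }
    return count;
}

/* TOP (bits 11-13): 0 after FNINIT, 7 after one push from there. */
static unsigned fpsw_top(unsigned sw) {
    return (sw >> 11) & 0x7u;
}
```

For example, 0x3804 decodes to "ZE" with TOP = 7: the Zero Divide flag is set and one value sits on the register stack. That is roughly the kind of state `fpu_asm()` in the C test program leaves behind, since it does not pop its result.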
>>>>> After this computation, "SharedRuntime::drem()" is called to compute
>>>>> the double remainder in the method "drem2()" of the above example.
>>>>> "SharedRuntime::drem()" itself just delegates the computation to the
>>>>> "fmod()" function (declared in <math.h>) of the underlying platform.
>>>>>
>>>>> The presence of the "Zero Divide" flag in the FP status word seems to
>>>>> be no problem for the "fmod()" (from msvcr71.dll) used by the
>>>>> jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008 executable, and it is no
>>>>> problem on Linux/x86_64 either, but it IS definitely a problem for the
>>>>> "fmod()" from "msvcr80d.dll" which is used in our MSVC 2005 build.
>>>>>
>>>>> I have two questions now:
>>>>>
>>>>> 1. Is it OK that the intrinsic for Math.log10() leaves the exception
>>>>> bits in the FP status word as they are?
>>>>> 2. Can somebody confirm that the described behaviour of "fmod()" from
>>>>> "msvcr80d.dll" as used by MSVC 2005 is buggy? (I couldn't find any bug
>>>>> report, nor any reference on whether "fmod()" should depend on the FP
>>>>> status word or not.)
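One conceivable answer to question 1 is for the intrinsic to clean up after itself. The sketch below, assuming GCC or Clang on x86-64, computes log10(x) with the same FLDLG2/FYL2X pair as the compiled Log10.log10() and then executes FNCLEX before returning, so no sticky flag leaks into later C code such as fmod(). It only illustrates that option; it is not HotSpot's intrinsic, and all names are hypothetical. Java code has no API for observing these flags, so discarding them would not change any Java-visible result.

```c
#include <stdint.h>

#define X87_SW_ZE 0x0004u  /* "Zero Divide" flag in the x87 status word */

/* Read the x87 FPU status word (FPSW). */
static unsigned read_fpsw(void) {
    uint16_t sw;
    __asm__ volatile ("fnstsw %0" : "=m"(sw));
    return sw;
}

/* log10(x) via FLDLG2/FYL2X, then FNCLEX so no sticky flags are left behind. */
static double x87_log10_clean(double x) {
    double r;
    __asm__ volatile ("fldlg2\n\t"    /* ST0 = log10(2) */
                      "fldl %1\n\t"   /* ST0 = x, ST1 = log10(2) */
                      "fyl2x\n\t"     /* ST0 = log10(2) * log2(x) = log10(x) */
                      "fstpl %0\n\t"  /* store as double and pop: stack empty again */
                      "fnclex"        /* clear ZE, PE, ... raised above */
                      : "=m"(r)
                      : "m"(x)
                      : "st", "st(1)");
    return r;
}
```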
>>>>>
>>>>> It would also be nice if somebody who has a recent OpenJDK built with
>>>>> MSVC 2005 could confirm the above problem, or if somebody could just
>>>>> confirm or disprove the "fmod()" problem across different versions of
>>>>> MSVC. Here's a small C program which can be used to test whether
>>>>> "fmod()" depends on the FP status word:
>>>>>
>>>>> ====================== fmod.c ====================
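>>>>> #include <math.h>
>>>>> #include <stdio.h>
>>>>>
>>>>> /* Defined in fpu_asm.asm: computes log10(0.0) on the x87 FPU, which
>>>>>    sets the "Zero Divide" flag in the FPU status word. */
>>>>> extern void fpu_asm(void);
>>>>>
>>>>> int main(void) {
>>>>>
>>>>>   printf("fmod(4.0, 2.0) = %f\n", fmod(4.0, 2.0));
>>>>>
>>>>>   fpu_asm();  /* leaves the Zero Divide flag set */
>>>>>
>>>>>   /* should still print 0.000000 if fmod() ignores the status word */
>>>>>   printf("fmod(4.0, 2.0) = %f\n", fmod(4.0, 2.0));
>>>>>
>>>>>   return 0;
>>>>> }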
>>>>> =====================================================
>>>>>
>>>>> ===================== fpu_asm.asm ====================
>>>>> PUBLIC fpu_asm
>>>>> .CODE
>>>>>       ALIGN   8
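>>>>> fpu_asm PROC
>>>>>       fldlg2          ; push log10(2)
>>>>>       fldz            ; push 0.0
>>>>>       fyl2x           ; log10(2) * log2(0.0) = -Inf, sets ZE in the FPSW
>>>>>       ret             ; note: the result stays on the x87 register stack
>>>>>       ALIGN 8
>>>>> fpu_asm ENDP
>>>>> END
>>>>> ======================================================
>>>>>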
>>>>> Compile and run with:
>>>>>
>>>>> ml64 /c fpu_asm.asm
>>>>> cl fmod.c fpu_asm.obj
>>>>> fmod.exe
>>>>> fmod(4.0, 2.0) = 0.000000
>>>>> fmod(4.0, 2.0) = -1.#IND00
>>>>>
>>>>> Regards,
>>>>> Volker
>>>>>
>>>>> PS: the obvious solution of calling "_clearfp()" (declared in
>>>>> <float.h>) just before a call to "fmod()" unfortunately doesn't work,
>>>>> because "_clearfp()" (at least in MSVC 2005) only clears the SSE
>>>>> status register MXCSR. The only solution I see right now is to use the
>>>>> FCLEX assembler instruction, and because MSVC 2005 has no inline
>>>>> assembler for x86_64, I'll probably have to write a separate assembler
>>>>> function just for this one instruction. Or does somebody have a
>>>>> smarter solution?
>>>>>
>>>>> PPS: this is a nice example of how a compiler switch can bring you a
>>>>> lot of fun (right, Kelly? :) ...
