Problems with double remainder on Windows/x86_64 and Visual Studio 2005

Mon Aug 25 12:19:17 PDT 2008

Hi,

we had a strange problem which led to failures in the JCK test
Math2012. The problem only occurred if some other JCK tests were
compiled and executed in a particular order before the tests in
Math2012.

I finally tracked the problem down to the following simple test case:

====================================================
public class Log10 {

  public static double log10(double d) {
    return Math.log10(d);
  }

  public static double drem2(double d) {
    return d % 2;
  }

  public static void main(String args[]) {
    System.out.println("log10(0) = " + Math.log10(0.0d));
    System.out.println("log10(0) = " + log10(0.0d));
    System.out.println("drem2(4.0) = " + drem2(4.0d));
  }
}
====================================================

which always fails on Windows/x86_64 (i.e. prints "NaN" for the result
of 4.0 % 2.0 which should be 0.0) if executed like this:

java -Xcomp -Xbatch -XX:CompileCommand="compileonly Log10 log10"
-XX:+PrintCompilation Log10

VM option 'CompileCommand=compileonly Log10 log10'
VM option '+PrintCompilation'
CompilerOracle: compileonly Log10.log10
log10(0) = -Infinity
  1   b   Log10::log10 (5 bytes)
log10(0) = -Infinity
drem2(4.0) = NaN

Note, however, that we are using a version of the Java 6 HotSpot
compiled with Visual Studio 2005.

I couldn't reproduce the problem with
jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008; however, I could verify
that the code generated by both our JDK 6 and the latest jdk-7 is
virtually the same. The interesting part is the compiled version of
the method Log10.log10():

000     pushq   rbp
        subq    rsp, #16        # Create frame
        nop     # nop for patch_verified_entry
006     fldlg2                  #Log10
        fyl2x                   # Q=Log10*Log_2(x)
024     addq    rsp, 16 # Destroy frame
        popq    rbp
        testl   rax, [rip + #offset_to_poll_page]       # Safepoint: poll for GC
02f     ret

The computation of Math.log10(0.0d), which correctly returns -Infinity,
sets the "Zero Divide" flag in the x87 FP status word as described in
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf.

After this computation "SharedRuntime::drem()" is called for the
computation of the double remainder in the method "drem2()" of the
above example. "SharedRuntime::drem()" itself just delegates the
computation to the "fmod()" function (defined in <math.h>) of the
underlying platform.

The presence of the "Zero Divide" flag in the FP status word does not
seem to be a problem for the "fmod()" used by the
jdk-7-ea-bin-b32-windows-x64-debug-04_aug_2008 executable (from
msvcr71.dll), and it is no problem on Linux/x86_64 either, but it
definitely IS a problem for the "fmod()" from the "msvcr80d.dll" which is
used in our MSVC 2005 build.

I have two questions now:

1. Is it OK that the intrinsic for Math.log10() leaves the exception
bits as they are in the FP status word?
2. Can somebody confirm that the described behaviour of "fmod()" from
"msvcr80d.dll" as used by MSVC 2005 is buggy? (I couldn't find any bug
report, and I also couldn't find any reference on whether "fmod()"
should depend on the FP status word.)

It would also be nice if somebody who has a recent OpenJDK built with
MSVC 2005 could confirm the above problem, or if somebody could just
confirm or disprove the "fmod()" problem with different versions of
MSVC. Here's a small C program which can be used to test whether "fmod()"
depends on the FP status word:

====================== fmod.c ====================
#include <math.h>
#include <stdio.h>

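/* Defined in fpu_asm.asm: runs fldlg2/fldz/fyl2x, which sets the
   "Zero Divide" flag in the x87 FP status word. */
extern void fpu_asm(void);

int main(void) {

    /* FP status word still clean: should print 0.000000. */
    printf("fmod(4.0, 2.0) = %f\n", fmod(4.0, 2.0));

    fpu_asm();

    /* Same call again, now with the "Zero Divide" flag set. */
    printf("fmod(4.0, 2.0) = %f\n", fmod(4.0, 2.0));

    return 0;
}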
=====================================================

===================== fpu_asm.asm ====================
PUBLIC fpu_asm
.CODE
	ALIGN	8
fpu_asm PROC
	fldlg2
	fldz
	fyl2x
	ret
	ALIGN 8
fpu_asm ENDP
END
======================================================

Compile and run with:

ml64 /c fpu_asm.asm
cl fmod.c fpu_asm.obj
fmod.exe
fmod(4.0, 2.0) = 0.000000
fmod(4.0, 2.0) = -1.#IND00

Regards,
Volker

PS: the obvious solution of calling "_clearfp()" (defined in
<float.h>) just before the call to "fmod()" unfortunately doesn't work,
because "_clearfp()" (at least in MSVC 2005) only clears the SSE
status register MXCSR. The only solution I see right now is to use the
FCLEX assembler instruction, and because MSVC 2005 has no inline
assembler for x86_64, I'll probably have to write a separate assembler
function just for that one instruction. Or does somebody have a
smarter solution?

PPS: this is a nice example of how a compiler switch can give you a lot
of fun (isn't it, Kelly :) ...