PPC64: Poor StrictMath performance due to non-optimized compilation

joe darcy joe.darcy at oracle.com
Thu Nov 17 17:35:05 UTC 2016


Hello,


On 11/16/2016 5:45 PM, Gustavo Romero wrote:
> Hi,
>
> Currently, optimization for building fdlibm is disabled, except for the
> "solaris" OS target [1].

The reason is that, historically, the Solaris compilers have had 
sufficient discipline and control over floating-point semantics and 
compiler optimizations to still produce the Java-mandated results when 
optimization was enabled. The gcc family of compilers, for example, 
has lacked such discipline.
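
As an illustration of how an optimization can change a floating-point 
result, the sketch below contrasts a separately rounded multiply and add 
with a fused multiply-add, which rounds only once. That fusion is the 
transformation affecting fdlibm under gcc is an assumption made here for 
illustration; the existing AIX flag -qfloat=nomaf in the quoted patch 
suggests it is at least one relevant case. The class name FmaDemo and the 
chosen values are hypothetical, and Math.fma requires Java 9 or later.

```java
// Illustrative sketch only; FmaDemo and its inputs are not taken from fdlibm.
final class FmaDemo {
    // Java semantics for a * b + c: the product is rounded to double,
    // then the sum is rounded again.
    static double separate(double a, double b, double c) {
        return a * b + c;
    }

    // Fused multiply-add: the exact product plus c is rounded once.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1p-52;   // 1 + 2^-52
        double b = 1.0 - 0x1p-52;   // 1 - 2^-52
        double c = -1.0;
        // The exact product is 1 - 2^-104, which rounds to 1.0 as a double,
        // so the separately rounded form yields 0.0 ...
        System.out.println(separate(a, b, c));
        // ... while the fused form keeps the -2^-104 term.
        System.out.println(fused(a, b, c));
    }
}
```

A compiler that silently replaces the first form with the second produces 
a result that is arguably more accurate, but it is not the bit pattern the 
StrictMath specification requires.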

>
> As a consequence on PPC64 (Linux) StrictMath methods like, but not limited to,
> sin(), cos(), and tan() perform very poorly in comparison to the same methods
> in the Math class [2]:

If you are doing your work against JDK 9, note that the pow, hypot, and 
cbrt fdlibm methods required by StrictMath have been ported to Java 
(JDK-8134780: Port fdlibm to Java). I intend to port the remaining 
methods to Java, but it is unclear whether this will happen for JDK 9.

Methods in the Math class, such as pow, are often intrinsified and use a 
different algorithm, so a straight performance comparison may not be 
fair or meaningful in those cases.
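
To see whether the two classes actually return the same values on a given 
JVM, a small check along the following lines can count arguments for which 
Math.pow and StrictMath.pow differ bitwise. This is only a sketch: the 
class name PowCompare and the sampled arguments are hypothetical, and the 
count depends on the platform and on which intrinsics the JVM uses.

```java
// Illustrative sketch only; PowCompare and its sample grid are hypothetical.
final class PowCompare {
    // Counts sample points where Math.pow and StrictMath.pow return
    // different bit patterns.
    static int countBitDifferences(int samples) {
        int differing = 0;
        for (int i = 1; i <= samples; i++) {
            double base = 1.0 + i / (double) samples;  // values in (1, 2]
            double exponent = 0.5 + i * 0.25;
            long m = Double.doubleToRawLongBits(Math.pow(base, exponent));
            long s = Double.doubleToRawLongBits(StrictMath.pow(base, exponent));
            if (m != s) {
                differing++;
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        int samples = 1000;
        System.out.println(countBitDifferences(samples)
                + " of " + samples + " results differ bitwise");
    }
}
```

A nonzero count would be consistent with Math.pow using a different 
algorithm, in which case its timing is not directly comparable to 
StrictMath.pow.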

>
>                     Math  StrictMath
>                =========  ==========
> sin           0m29.984s   1m41.184s
> cos           0m30.031s   1m41.200s
> tan           0m31.772s   1m46.976s
> asin           0m4.577s    0m4.543s
> acos           0m4.539s    0m4.525s
> atan          0m12.929s   0m12.896s
> exp            0m1.071s    0m4.570s
> log            0m3.272s   0m14.239s
> log10          0m4.362s   0m20.236s
> sqrt           0m0.913s    0m0.981s
> cbrt          0m10.786s   0m10.808s
> sinh           0m4.438s    0m4.433s
> cosh           0m4.496s    0m4.478s
> tanh           0m3.360s    0m3.353s
> expm1          0m4.076s    0m4.094s
> log1p          0m13.518s  0m13.527s
> IEEEremainder  0m38.803s  0m38.909s
> atan2          0m20.100s  0m20.057s
> pow            0m14.096s  0m19.938s
> hypot          0m5.136s    0m5.122s
>
>
> Switching on the -O3 optimization can damage the precision of those methods.
> Nonetheless, it's possible to avoid that side effect and still get the large
> benefits of -O3 on PPC64 if -fno-expensive-optimizations is passed in
> addition to the -O3 optimization flag.
>
> In that sense the following change is proposed to resolve the issue:
>
> diff -r 81eb4bd34611 make/lib/CoreLibraries.gmk
> --- a/make/lib/CoreLibraries.gmk	Wed Nov 09 13:37:19 2016 +0100
> +++ b/make/lib/CoreLibraries.gmk	Wed Nov 16 19:11:11 2016 -0500
> @@ -33,10 +33,16 @@
>   # libfdlibm is statically linked with libjava below and not delivered into the
>   # product on its own.
>
> -BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +BUILD_LIBFDLIBM_OPTIMIZATION := NONE
>
> -ifneq ($(OPENJDK_TARGET_OS), solaris)
> -  BUILD_LIBFDLIBM_OPTIMIZATION := NONE
> +ifeq ($(OPENJDK_TARGET_OS), solaris)
> +  BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +endif
> +
> +ifeq ($(OPENJDK_TARGET_OS), linux)
> +  ifeq ($(OPENJDK_TARGET_CPU_ARCH), ppc)
> +    BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +  endif
>   endif
>
>   LIBFDLIBM_SRC := $(JDK_TOPDIR)/src/java.base/share/native/libfdlibm
> @@ -51,6 +57,7 @@
>         CFLAGS := $(CFLAGS_JDKLIB) $(LIBFDLIBM_CFLAGS), \
>         CFLAGS_windows_debug := -DLOGGING, \
>         CFLAGS_aix := -qfloat=nomaf, \
> +      CFLAGS_linux_ppc := -fno-expensive-optimizations, \
>         DISABLED_WARNINGS_gcc := sign-compare, \
>         DISABLED_WARNINGS_microsoft := 4146 4244 4018, \
>         ARFLAGS := $(ARFLAGS), \
>
>
> diff -r 2a1f97c0ad3d make/common/NativeCompilation.gmk
> --- a/make/common/NativeCompilation.gmk	Wed Nov 09 15:32:39 2016 +0100
> +++ b/make/common/NativeCompilation.gmk	Wed Nov 16 19:08:06 2016 -0500
> @@ -569,16 +569,19 @@
>     $1_ALL_OBJS := $$(sort $$($1_EXPECTED_OBJS) $$($1_EXTRA_OBJECT_FILES))
>
>     # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CFLAGS.
> -  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS))
> +  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS)) \
> +      $$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH))
>     ifneq ($(DEBUG_LEVEL),release)
>       # Pickup extra debug dependent variables for CFLAGS
>       $1_EXTRA_CFLAGS+=$$($1_CFLAGS_debug)
>       $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_debug)
>       $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_debug)
> +    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_debug)
>     else
>       $1_EXTRA_CFLAGS+=$$($1_CFLAGS_release)
>       $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_release)
>       $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_release)
> +    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_release)
>     endif
>
>     # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CXXFLAGS.
>
>
> After enabling the optimization it's possible to gain up to 3x in performance
> for the aforementioned methods without losing precision:
>
>                             StrictMath, original              StrictMath, optimized
>                     ============================       ============================
> sin                1.7136493465700542 1m41.184s      1.7136493465700542  0m33.895s
> cos                0.1709843554185943 1m41.200s      0.1709843554185943  0m33.884s
> tan             -5.5500322522995315E7 1m46.976s   -5.5500322522995315E7  0m36.461s
> asin                              NaN  0m4.543s                     NaN   0m3.175s
> acos                              NaN  0m4.525s                     NaN   0m3.211s
> atan             1.5707961389886132E8 0m12.896s    1.5707961389886132E8   0m7.100s
> exp                          Infinity  0m4.570s                Infinity   0m3.187s
> log              1.7420680845245087E9 0m14.239s    1.7420680845245087E9   0m7.170s
> log10             7.565705562087342E8 0m20.236s     7.565705562087342E8   0m9.610s
> sqrt              6.66666671666567E11  0m0.981s     6.66666671666567E11   0m0.948s
> cbrt             3.481191648389617E10 0m10.808s    3.481191648389617E10  0m10.786s
> sinh                         Infinity  0m4.433s                Infinity   0m3.179s
> cosh                         Infinity  0m4.478s                Infinity   0m3.174s
> tanh              9.999999971990079E7  0m3.353s     9.999999971990079E7   0m3.208s
> expm1                        Infinity  0m4.094s                Infinity   0m3.185s
> log1p            1.7420681029451895E9 0m13.527s    1.7420681029451895E9   0m8.756s
> IEEEremainder                502000.0 0m38.909s                502000.0  0m14.055s
> atan2             1.570453905253704E8 0m20.057s     1.570453905253704E8  0m10.510s
> pow                          Infinity 0m19.938s                Infinity  0m20.204s
> hypot            5.000000099033372E15  0m5.122s    5.000000099033372E15   0m5.130s
>
>
> I believe that, as FC has passed but FEC has not, the change can, after due
> scrutiny and review, be pushed if a special exception approval is granted. Once
> it is in 9, I'll request the downport to 8.

Accumulating the results of the functions and comparing the sums is not 
a sufficiently robust way of checking whether the optimized versions are 
indeed equivalent to the non-optimized ones. The specification of 
StrictMath requires a particular result for each set of floating-point 
arguments, and sums can round away low-order bits that differ.
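
A minimal sketch of the problem, and of a per-argument check that avoids 
it, might look like the following. The class name SumVersusBits and the 
sample arrays are hypothetical stand-ins for the outputs of a reference 
build and an optimized build. They are not real fdlibm results.

```java
// Illustrative sketch only; SumVersusBits and its data are hypothetical.
final class SumVersusBits {
    // Accumulates results the way a sum-based benchmark check would.
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Compares each result bit for bit and counts the mismatches.
    static int bitMismatches(double[] expected, double[] actual) {
        int mismatches = 0;
        for (int i = 0; i < expected.length; i++) {
            if (Double.doubleToRawLongBits(expected[i])
                    != Double.doubleToRawLongBits(actual[i])) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        double[] reference = {1e16, 1.0};
        // Second element is one ulp below 1.0, standing in for a wrong result.
        double[] optimized = {1e16, Math.nextDown(1.0)};
        // Near 1e16 adjacent doubles are 2 apart, so the small term is absorbed.
        System.out.println(sum(reference) == sum(optimized));     // true
        System.out.println(bitMismatches(reference, optimized));  // 1
    }
}
```

The sums agree even though one individual result is wrong. Sums of 
Infinity or NaN, as several rows of the quoted table report, hide 
differences even more completely.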

Running the JDK math library regression tests and corresponding JCK 
tests is recommended for work in this area.

Cheers,

-Joe


More information about the ppc-aix-port-dev mailing list