PPC64: Poor StrictMath performance due to non-optimized compilation

David Holmes david.holmes at oracle.com
Thu Nov 17 02:31:48 UTC 2016


Adding in build-dev as they need to scrutinize all build changes.

David

On 17/11/2016 11:45 AM, Gustavo Romero wrote:
> Hi,
>
> Currently, optimization for building fdlibm is disabled, except for the
> "solaris" OS target [1].
>
> As a consequence, on PPC64 (Linux) StrictMath methods such as, but not limited
> to, sin(), cos(), and tan() perform very poorly in comparison to the same
> methods in the Math class [2]:
>
>                    Math  StrictMath
>               =========  ==========
> sin           0m29.984s   1m41.184s
> cos           0m30.031s   1m41.200s
> tan           0m31.772s   1m46.976s
> asin           0m4.577s    0m4.543s
> acos           0m4.539s    0m4.525s
> atan          0m12.929s   0m12.896s
> exp            0m1.071s    0m4.570s
> log            0m3.272s   0m14.239s
> log10          0m4.362s   0m20.236s
> sqrt           0m0.913s    0m0.981s
> cbrt          0m10.786s   0m10.808s
> sinh           0m4.438s    0m4.433s
> cosh           0m4.496s    0m4.478s
> tanh           0m3.360s    0m3.353s
> expm1          0m4.076s    0m4.094s
> log1p          0m13.518s  0m13.527s
> IEEEremainder  0m38.803s  0m38.909s
> atan2          0m20.100s  0m20.057s
> pow            0m14.096s  0m19.938s
> hypot          0m5.136s    0m5.122s
>
>
> Switching on -O3 optimization can damage the precision of those methods.
> Nonetheless, it's possible to avoid that side effect and still get the large
> benefits of -O3 on PPC64 if -fno-expensive-optimizations is passed in
> addition to the -O3 optimization flag.
>
> In that sense the following change is proposed to resolve the issue:
>
> diff -r 81eb4bd34611 make/lib/CoreLibraries.gmk
> --- a/make/lib/CoreLibraries.gmk	Wed Nov 09 13:37:19 2016 +0100
> +++ b/make/lib/CoreLibraries.gmk	Wed Nov 16 19:11:11 2016 -0500
> @@ -33,10 +33,16 @@
>  # libfdlibm is statically linked with libjava below and not delivered into the
>  # product on its own.
>
> -BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +BUILD_LIBFDLIBM_OPTIMIZATION := NONE
>
> -ifneq ($(OPENJDK_TARGET_OS), solaris)
> -  BUILD_LIBFDLIBM_OPTIMIZATION := NONE
> +ifeq ($(OPENJDK_TARGET_OS), solaris)
> +  BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +endif
> +
> +ifeq ($(OPENJDK_TARGET_OS), linux)
> +  ifeq ($(OPENJDK_TARGET_CPU_ARCH), ppc)
> +    BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +  endif
>  endif
>
>  LIBFDLIBM_SRC := $(JDK_TOPDIR)/src/java.base/share/native/libfdlibm
> @@ -51,6 +57,7 @@
>        CFLAGS := $(CFLAGS_JDKLIB) $(LIBFDLIBM_CFLAGS), \
>        CFLAGS_windows_debug := -DLOGGING, \
>        CFLAGS_aix := -qfloat=nomaf, \
> +      CFLAGS_linux_ppc := -fno-expensive-optimizations, \
>        DISABLED_WARNINGS_gcc := sign-compare, \
>        DISABLED_WARNINGS_microsoft := 4146 4244 4018, \
>        ARFLAGS := $(ARFLAGS), \
>
>
> diff -r 2a1f97c0ad3d make/common/NativeCompilation.gmk
> --- a/make/common/NativeCompilation.gmk	Wed Nov 09 15:32:39 2016 +0100
> +++ b/make/common/NativeCompilation.gmk	Wed Nov 16 19:08:06 2016 -0500
> @@ -569,16 +569,19 @@
>    $1_ALL_OBJS := $$(sort $$($1_EXPECTED_OBJS) $$($1_EXTRA_OBJECT_FILES))
>
>    # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CFLAGS.
> -  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS))
> +  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS)) \
> +      $$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH))
>    ifneq ($(DEBUG_LEVEL),release)
>      # Pickup extra debug dependent variables for CFLAGS
>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_debug)
>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_debug)
>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_debug)
> +    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_debug)
>    else
>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_release)
>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_release)
>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_release)
> +    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_release)
>    endif
>
>    # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CXXFLAGS.
>
>
> After enabling the optimization it's possible to gain up to 3x in performance
> for the aforementioned methods without losing precision (each column pair
> below shows the computed result followed by the elapsed time):
>
>                            StrictMath, original              StrictMath, optimized
>                    ============================       ============================
> sin                1.7136493465700542 1m41.184s      1.7136493465700542  0m33.895s
> cos                0.1709843554185943 1m41.200s      0.1709843554185943  0m33.884s
> tan             -5.5500322522995315E7 1m46.976s   -5.5500322522995315E7  0m36.461s
> asin                              NaN  0m4.543s                     NaN   0m3.175s
> acos                              NaN  0m4.525s                     NaN   0m3.211s
> atan             1.5707961389886132E8 0m12.896s    1.5707961389886132E8   0m7.100s
> exp                          Infinity  0m4.570s                Infinity   0m3.187s
> log              1.7420680845245087E9 0m14.239s    1.7420680845245087E9   0m7.170s
> log10             7.565705562087342E8 0m20.236s     7.565705562087342E8   0m9.610s
> sqrt              6.66666671666567E11  0m0.981s     6.66666671666567E11   0m0.948s
> cbrt             3.481191648389617E10 0m10.808s    3.481191648389617E10  0m10.786s
> sinh                         Infinity  0m4.433s                Infinity   0m3.179s
> cosh                         Infinity  0m4.478s                Infinity   0m3.174s
> tanh              9.999999971990079E7  0m3.353s     9.999999971990079E7   0m3.208s
> expm1                        Infinity  0m4.094s                Infinity   0m3.185s
> log1p            1.7420681029451895E9 0m13.527s    1.7420681029451895E9   0m8.756s
> IEEEremainder                502000.0 0m38.909s                502000.0  0m14.055s
> atan2             1.570453905253704E8 0m20.057s     1.570453905253704E8  0m10.510s
> pow                          Infinity 0m19.938s                Infinity  0m20.204s
> hypot            5.000000099033372E15  0m5.122s    5.000000099033372E15   0m5.130s
>
>
> I believe that, since FC has passed but FEC has not, the change can be pushed
> after due scrutiny and review, provided a special exception approval is
> granted. Once it is in 9, I'll request the downport to 8.
>
> Could I open a bug to address that issue?
>
> Thank you very much.
>
>
> Regards,
> Gustavo
>
> [1] http://hg.openjdk.java.net/jdk9/hs/jdk/file/81eb4bd34611/make/lib/CoreLibraries.gmk#l39
> [2] https://github.com/gromero/strictmath (comparison script used to get the results)
>
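
For readers who want to reproduce the kind of Math vs. StrictMath comparison
quoted above, here is a minimal sketch. It is not the script from [2]. The
class name, the helper methods, the input range, and the iteration count are
all assumptions made for illustration, and a simple wall-clock loop like this
is only a rough indicator compared to a proper benchmark harness.

```java
public class StrictMathTiming {
    // Hypothetical helper: accumulate sin(x) over n inputs so the result is
    // actually used and the calls cannot simply be discarded.
    static double sumSin(int n, boolean strict) {
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            double x = i * 1.0e-3; // assumed input range, chosen arbitrarily
            acc += strict ? StrictMath.sin(x) : Math.sin(x);
        }
        return acc;
    }

    // Wall-clock time in nanoseconds for one run of sumSin.
    static long timeNanos(int n, boolean strict) {
        long start = System.nanoTime();
        double result = sumSin(n, strict);
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(result)) {
            throw new IllegalStateException("unexpected NaN result");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 10_000_000; // assumed iteration count

        // Warm-up runs so both paths are JIT-compiled before timing.
        timeNanos(n, false);
        timeNanos(n, true);

        long mathNs = timeNanos(n, false);
        long strictNs = timeNanos(n, true);
        System.out.printf("Math.sin:       %d ms%n", mathNs / 1_000_000);
        System.out.printf("StrictMath.sin: %d ms%n", strictNs / 1_000_000);
    }
}
```

Running the same class against a JDK built with and without the proposed
fdlibm flags would show whether the StrictMath timing moves while the
accumulated result stays identical.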

