PPC64: Poor StrictMath performance due to non-optimized compilation
David Holmes
david.holmes at oracle.com
Thu Nov 17 02:31:48 UTC 2016
Adding in build-dev as they need to scrutinize all build changes.
David
On 17/11/2016 11:45 AM, Gustavo Romero wrote:
> Hi,
>
> Currently, optimization for building fdlibm is disabled, except for the
> "solaris" OS target [1].
>
> As a consequence, on PPC64 (Linux) StrictMath methods such as, but not limited
> to, sin(), cos(), and tan() perform very poorly in comparison to the same
> methods in the Math class [2]:
>
> Method         Math       StrictMath
> =============  =========  ==========
> sin            0m29.984s  1m41.184s
> cos            0m30.031s  1m41.200s
> tan            0m31.772s  1m46.976s
> asin           0m4.577s   0m4.543s
> acos           0m4.539s   0m4.525s
> atan           0m12.929s  0m12.896s
> exp            0m1.071s   0m4.570s
> log            0m3.272s   0m14.239s
> log10          0m4.362s   0m20.236s
> sqrt           0m0.913s   0m0.981s
> cbrt           0m10.786s  0m10.808s
> sinh           0m4.438s   0m4.433s
> cosh           0m4.496s   0m4.478s
> tanh           0m3.360s   0m3.353s
> expm1          0m4.076s   0m4.094s
> log1p          0m13.518s  0m13.527s
> IEEEremainder  0m38.803s  0m38.909s
> atan2          0m20.100s  0m20.057s
> pow            0m14.096s  0m19.938s
> hypot          0m5.136s   0m5.122s
>
>
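For readers who want to reproduce a comparison of this kind, here is a minimal Java sketch that times Math.sin against StrictMath.sin over the same inputs. It is not the script referenced in [2]; the class name SinTiming, the iteration count, and the input spacing are hypothetical choices, and a single System.nanoTime() pass is much cruder than a dedicated harness such as JMH.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical timing sketch; NOT the comparison script referenced in [2].
// Iteration count and input spacing are arbitrary assumptions.
public class SinTiming {

    // Sums f over n inputs spaced 0.001 apart; returning the sum keeps the
    // calls observable so the JIT cannot discard them as dead code.
    static double sumOver(DoubleUnaryOperator f, int n) {
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += f.applyAsDouble(i * 0.001);
        }
        return acc;
    }

    // Times one pass of sumOver, prints the result, and returns elapsed ns.
    static long timeNanos(String label, DoubleUnaryOperator f, int n) {
        long start = System.nanoTime();
        double sink = sumOver(f, n);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%-16s %8d ms  (checksum %s)%n", label, elapsed / 1_000_000, sink);
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        // Warm-up passes so both methods are JIT-compiled before measuring.
        sumOver(Math::sin, n);
        sumOver(StrictMath::sin, n);
        timeNanos("Math.sin", Math::sin, n);
        timeNanos("StrictMath.sin", StrictMath::sin, n);
    }
}
```

Running it once against the original build and once against the optimized build would show whether the StrictMath gap narrows; absolute numbers depend heavily on the machine and JVM.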
> Switching on -O3 optimization can damage the precision of those methods.
> Nonetheless, on PPC64 it is possible to avoid that side effect and still get
> the large benefits of -O3 by passing -fno-expensive-optimizations in
> addition to the -O3 flag.
>
> To that end, the following change is proposed to resolve the issue:
>
> diff -r 81eb4bd34611 make/lib/CoreLibraries.gmk
> --- a/make/lib/CoreLibraries.gmk Wed Nov 09 13:37:19 2016 +0100
> +++ b/make/lib/CoreLibraries.gmk Wed Nov 16 19:11:11 2016 -0500
> @@ -33,10 +33,16 @@
> # libfdlibm is statically linked with libjava below and not delivered into the
> # product on its own.
>
> -BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +BUILD_LIBFDLIBM_OPTIMIZATION := NONE
>
> -ifneq ($(OPENJDK_TARGET_OS), solaris)
> - BUILD_LIBFDLIBM_OPTIMIZATION := NONE
> +ifeq ($(OPENJDK_TARGET_OS), solaris)
> + BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> +endif
> +
> +ifeq ($(OPENJDK_TARGET_OS), linux)
> + ifeq ($(OPENJDK_TARGET_CPU_ARCH), ppc)
> + BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
> + endif
> endif
>
> LIBFDLIBM_SRC := $(JDK_TOPDIR)/src/java.base/share/native/libfdlibm
> @@ -51,6 +57,7 @@
> CFLAGS := $(CFLAGS_JDKLIB) $(LIBFDLIBM_CFLAGS), \
> CFLAGS_windows_debug := -DLOGGING, \
> CFLAGS_aix := -qfloat=nomaf, \
> + CFLAGS_linux_ppc := -fno-expensive-optimizations, \
> DISABLED_WARNINGS_gcc := sign-compare, \
> DISABLED_WARNINGS_microsoft := 4146 4244 4018, \
> ARFLAGS := $(ARFLAGS), \
>
>
> diff -r 2a1f97c0ad3d make/common/NativeCompilation.gmk
> --- a/make/common/NativeCompilation.gmk Wed Nov 09 15:32:39 2016 +0100
> +++ b/make/common/NativeCompilation.gmk Wed Nov 16 19:08:06 2016 -0500
> @@ -569,16 +569,19 @@
> $1_ALL_OBJS := $$(sort $$($1_EXPECTED_OBJS) $$($1_EXTRA_OBJECT_FILES))
>
> # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CFLAGS.
> - $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS))
> + $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS)) \
> + $$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH))
> ifneq ($(DEBUG_LEVEL),release)
> # Pickup extra debug dependent variables for CFLAGS
> $1_EXTRA_CFLAGS+=$$($1_CFLAGS_debug)
> $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_debug)
> $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_debug)
> + $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_debug)
> else
> $1_EXTRA_CFLAGS+=$$($1_CFLAGS_release)
> $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_release)
> $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_release)
> + $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_release)
> endif
>
> # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CXXFLAGS.
>
>
> After enabling the optimization, it is possible to gain up to 3x in performance
> for the aforementioned methods without losing precision:
>
>                StrictMath, original               StrictMath, optimized
> Method         Result                  Time       Result                  Time
> =============  ======================  =========  ======================  =========
> sin            1.7136493465700542      1m41.184s  1.7136493465700542      0m33.895s
> cos            0.1709843554185943      1m41.200s  0.1709843554185943      0m33.884s
> tan            -5.5500322522995315E7   1m46.976s  -5.5500322522995315E7   0m36.461s
> asin           NaN                     0m4.543s   NaN                     0m3.175s
> acos           NaN                     0m4.525s   NaN                     0m3.211s
> atan           1.5707961389886132E8    0m12.896s  1.5707961389886132E8    0m7.100s
> exp            Infinity                0m4.570s   Infinity                0m3.187s
> log            1.7420680845245087E9    0m14.239s  1.7420680845245087E9    0m7.170s
> log10          7.565705562087342E8     0m20.236s  7.565705562087342E8     0m9.610s
> sqrt           6.66666671666567E11     0m0.981s   6.66666671666567E11     0m0.948s
> cbrt           3.481191648389617E10    0m10.808s  3.481191648389617E10    0m10.786s
> sinh           Infinity                0m4.433s   Infinity                0m3.179s
> cosh           Infinity                0m4.478s   Infinity                0m3.174s
> tanh           9.999999971990079E7     0m3.353s   9.999999971990079E7     0m3.208s
> expm1          Infinity                0m4.094s   Infinity                0m3.185s
> log1p          1.7420681029451895E9    0m13.527s  1.7420681029451895E9    0m8.756s
> IEEEremainder  502000.0                0m38.909s  502000.0                0m14.055s
> atan2          1.570453905253704E8     0m20.057s  1.570453905253704E8     0m10.510s
> pow            Infinity                0m19.938s  Infinity                0m20.204s
> hypot          5.000000099033372E15    0m5.122s   5.000000099033372E15    0m5.130s
>
>
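The results table compares a single printed value per method. A broader bit-for-bit check could be sketched as below, assuming the same program is run once on the original build and once on the optimized build and the printed digests are diffed. The class name StrictMathDigest, the hash scheme, and the input set are hypothetical; a differing digest proves a difference, while matching digests are strong but not conclusive evidence, since hashes can collide.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical bit-exactness check: run on each build and diff the output.
// The input set and the hash scheme are assumptions, not from the thread.
public class StrictMathDigest {

    // Folds the exact IEEE 754 bit pattern of f(x) for every input into a
    // 64-bit digest. doubleToLongBits canonicalizes NaN payloads but still
    // distinguishes -0.0 from +0.0.
    static long digest(DoubleUnaryOperator f, double[] inputs) {
        long h = 17L;
        for (double x : inputs) {
            h = 31 * h + Double.doubleToLongBits(f.applyAsDouble(x));
        }
        return h;
    }

    // 100,000 values stepping by 0.01 from -500, plus a few large arguments
    // that exercise the argument reduction in sin/cos/tan.
    static double[] inputs() {
        int n = 100_000;
        double[] xs = new double[n + 3];
        for (int i = 0; i < n; i++) {
            xs[i] = -500.0 + i * 0.01;
        }
        xs[n] = 1.0e9;
        xs[n + 1] = 1.0e22;
        xs[n + 2] = Math.PI / 4;
        return xs;
    }

    public static void main(String[] args) {
        double[] xs = inputs();
        System.out.printf("sin    %016x%n", digest(StrictMath::sin, xs));
        System.out.printf("cos    %016x%n", digest(StrictMath::cos, xs));
        System.out.printf("tan    %016x%n", digest(StrictMath::tan, xs));
        System.out.printf("exp    %016x%n", digest(StrictMath::exp, xs));
        System.out.printf("log    %016x%n", digest(StrictMath::log, xs));
        System.out.printf("log10  %016x%n", digest(StrictMath::log10, xs));
    }
}
```

Hashing raw bit patterns, rather than comparing printed decimal strings, catches any last-bit difference across the whole input range while keeping the output small enough to diff by eye.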
> I believe that, since FC has passed but FEC has not, the change can be pushed
> after due scrutiny and review if a special exception approval grants it. Once
> it is in 9, I'll request the downport to 8.
>
> Could I open a bug to address that issue?
>
> Thank you very much.
>
>
> Regards,
> Gustavo
>
> [1] http://hg.openjdk.java.net/jdk9/hs/jdk/file/81eb4bd34611/make/lib/CoreLibraries.gmk#l39
> [2] https://github.com/gromero/strictmath (comparison script used to get the results)
>
More information about the ppc-aix-port-dev mailing list