PPC64: Poor StrictMath performance due to non-optimized compilation

Erik Joelsson erik.joelsson at oracle.com
Thu Nov 17 09:17:33 UTC 2016


Hello,

Overall this looks reasonable to me. However, if we want to introduce a 
new possible tuple for specifying compilation flags to 
SetupNativeCompilation, we (the build team) would prefer if we used 
OPENJDK_TARGET_CPU instead of OPENJDK_TARGET_CPU_ARCH.
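
As a rough sketch of what I mean (a stand-alone Makefile for
illustration only, not the real SetupNativeCompilation macro; the
FDLIBM_ prefix and the hard-coded target values are made up for the
example), the lookup would key on the CPU name rather than the CPU
architecture. Note that, as far as I know, OPENJDK_TARGET_CPU is ppc64
or ppc64le on these platforms while OPENJDK_TARGET_CPU_ARCH is ppc for
both, so each CPU would need its own entry:

```make
# Illustrative sketch only: mimics the per-OS/per-CPU CFLAGS lookup.
# In a real build these two values come from configure; they are
# hard-coded here (assumption) so the file can be run on its own.
OPENJDK_TARGET_OS := linux
OPENJDK_TARGET_CPU := ppc64le

# Hypothetical per-tuple flags, keyed by <os>_<cpu>.
FDLIBM_CFLAGS_linux_ppc64 := -fno-expensive-optimizations
FDLIBM_CFLAGS_linux_ppc64le := -fno-expensive-optimizations

# Pick up the OS-only variable (unset here, so it expands to nothing)
# followed by the OS+CPU variable, using computed variable names.
FDLIBM_EXTRA_CFLAGS := $(FDLIBM_CFLAGS_$(OPENJDK_TARGET_OS)) \
    $(FDLIBM_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU))

# Print the result at parse time; the empty rule keeps make happy.
$(info extra CFLAGS: $(strip $(FDLIBM_EXTRA_CFLAGS)))
all: ;
```

Running this with "make -f" on the file should show the ppc64le flag
being picked up; changing OPENJDK_TARGET_CPU to something without an
entry (say x86_64) should leave the extra flags empty.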

/Erik

On 2016-11-17 03:31, David Holmes wrote:
> Adding in build-dev as they need to scrutinize all build changes.
>
> David
>
> On 17/11/2016 11:45 AM, Gustavo Romero wrote:
>> Hi,
>>
>> Currently, optimization for building fdlibm is disabled, except for the
>> "solaris" OS target [1].
>>
>> As a consequence, on PPC64 (Linux) StrictMath methods such as sin(),
>> cos(), and tan(), among others, perform very poorly in comparison to
>> the same methods in the Math class [2]:
>>
>>                     Math  StrictMath
>>                =========  ==========
>> sin            0m29.984s   1m41.184s
>> cos            0m30.031s   1m41.200s
>> tan            0m31.772s   1m46.976s
>> asin            0m4.577s    0m4.543s
>> acos            0m4.539s    0m4.525s
>> atan           0m12.929s   0m12.896s
>> exp             0m1.071s    0m4.570s
>> log             0m3.272s   0m14.239s
>> log10           0m4.362s   0m20.236s
>> sqrt            0m0.913s    0m0.981s
>> cbrt           0m10.786s   0m10.808s
>> sinh            0m4.438s    0m4.433s
>> cosh            0m4.496s    0m4.478s
>> tanh            0m3.360s    0m3.353s
>> expm1           0m4.076s    0m4.094s
>> log1p          0m13.518s   0m13.527s
>> IEEEremainder  0m38.803s   0m38.909s
>> atan2          0m20.100s   0m20.057s
>> pow            0m14.096s   0m19.938s
>> hypot           0m5.136s    0m5.122s
>>
>>
>> Switching on the -O3 optimization can damage the precision of those
>> methods. Nonetheless, it's possible to avoid that side effect and
>> still get the huge benefits of -O3 on PPC64 if
>> -fno-expensive-optimizations is passed in addition to the -O3
>> optimization flag.
>>
>> In that sense the following change is proposed to resolve the issue:
>>
>> diff -r 81eb4bd34611 make/lib/CoreLibraries.gmk
>> --- a/make/lib/CoreLibraries.gmk    Wed Nov 09 13:37:19 2016 +0100
>> +++ b/make/lib/CoreLibraries.gmk    Wed Nov 16 19:11:11 2016 -0500
>> @@ -33,10 +33,16 @@
>>  # libfdlibm is statically linked with libjava below and not delivered into the
>>  # product on its own.
>>
>> -BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
>> +BUILD_LIBFDLIBM_OPTIMIZATION := NONE
>>
>> -ifneq ($(OPENJDK_TARGET_OS), solaris)
>> -  BUILD_LIBFDLIBM_OPTIMIZATION := NONE
>> +ifeq ($(OPENJDK_TARGET_OS), solaris)
>> +  BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
>> +endif
>> +
>> +ifeq ($(OPENJDK_TARGET_OS), linux)
>> +  ifeq ($(OPENJDK_TARGET_CPU_ARCH), ppc)
>> +    BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
>> +  endif
>>  endif
>>
>>  LIBFDLIBM_SRC := $(JDK_TOPDIR)/src/java.base/share/native/libfdlibm
>> @@ -51,6 +57,7 @@
>>        CFLAGS := $(CFLAGS_JDKLIB) $(LIBFDLIBM_CFLAGS), \
>>        CFLAGS_windows_debug := -DLOGGING, \
>>        CFLAGS_aix := -qfloat=nomaf, \
>> +      CFLAGS_linux_ppc := -fno-expensive-optimizations, \
>>        DISABLED_WARNINGS_gcc := sign-compare, \
>>        DISABLED_WARNINGS_microsoft := 4146 4244 4018, \
>>        ARFLAGS := $(ARFLAGS), \
>>
>>
>> diff -r 2a1f97c0ad3d make/common/NativeCompilation.gmk
>> --- a/make/common/NativeCompilation.gmk    Wed Nov 09 15:32:39 2016 +0100
>> +++ b/make/common/NativeCompilation.gmk    Wed Nov 16 19:08:06 2016 -0500
>> @@ -569,16 +569,19 @@
>>    $1_ALL_OBJS := $$(sort $$($1_EXPECTED_OBJS) $$($1_EXTRA_OBJECT_FILES))
>>
>>    # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CFLAGS.
>> -  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS))
>> +  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS)) \
>> +    $$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH))
>>    ifneq ($(DEBUG_LEVEL),release)
>>      # Pickup extra debug dependent variables for CFLAGS
>>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_debug)
>>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_debug)
>>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_debug)
>> +    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_debug)
>>    else
>>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_release)
>>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_release)
>>      $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_release)
>> +    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_release)
>>    endif
>>
>>    # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CXXFLAGS.
>>
>>
>> After enabling the optimization it's possible to gain up to 3x in
>> performance for the aforementioned methods without losing precision:
>>
>>               StrictMath, original              StrictMath, optimized
>>                              result       time                 result       time
>>               ================================  ================================
>> sin              1.7136493465700542  1m41.184s     1.7136493465700542  0m33.895s
>> cos              0.1709843554185943  1m41.200s     0.1709843554185943  0m33.884s
>> tan           -5.5500322522995315E7  1m46.976s  -5.5500322522995315E7  0m36.461s
>> asin                            NaN   0m4.543s                    NaN   0m3.175s
>> acos                            NaN   0m4.525s                    NaN   0m3.211s
>> atan           1.5707961389886132E8  0m12.896s   1.5707961389886132E8   0m7.100s
>> exp                        Infinity   0m4.570s               Infinity   0m3.187s
>> log            1.7420680845245087E9  0m14.239s   1.7420680845245087E9   0m7.170s
>> log10           7.565705562087342E8  0m20.236s    7.565705562087342E8   0m9.610s
>> sqrt            6.66666671666567E11   0m0.981s    6.66666671666567E11   0m0.948s
>> cbrt           3.481191648389617E10  0m10.808s   3.481191648389617E10  0m10.786s
>> sinh                       Infinity   0m4.433s               Infinity   0m3.179s
>> cosh                       Infinity   0m4.478s               Infinity   0m3.174s
>> tanh            9.999999971990079E7   0m3.353s    9.999999971990079E7   0m3.208s
>> expm1                      Infinity   0m4.094s               Infinity   0m3.185s
>> log1p          1.7420681029451895E9  0m13.527s   1.7420681029451895E9   0m8.756s
>> IEEEremainder              502000.0  0m38.909s               502000.0  0m14.055s
>> atan2           1.570453905253704E8  0m20.057s    1.570453905253704E8  0m10.510s
>> pow                        Infinity  0m19.938s               Infinity  0m20.204s
>> hypot          5.000000099033372E15   0m5.122s   5.000000099033372E15   0m5.130s
>>
>>
>> I believe that, as FC has passed but FEC has not, the change can,
>> after due scrutiny and review, be pushed if a special exception
>> approval is granted. Once it's in 9, I'll request the downport to 8.
>>
>> Could I open a bug to address that issue?
>>
>> Thank you very much.
>>
>>
>> Regards,
>> Gustavo
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs/jdk/file/81eb4bd34611/make/lib/CoreLibraries.gmk#l39
>> [2] https://github.com/gromero/strictmath (comparison script used to get the results)
>>


