PPC64: Poor StrictMath performance due to non-optimized compilation
White, Derek
Derek.White at cavium.com
Thu Nov 17 22:48:07 UTC 2016
Hi Joe,
Although I am neither a floating-point expert (as I think I've proven to you over the years) nor a gcc expert, I checked with our in-house gcc expert and got the following answer:
"Yes, using -fno-strict-aliasing fixes the issues. Also, there are many forks of fdlibm that have this fixed, including the code inside glibc."
FWIW,
- Derek
-----Original Message-----
From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Chris Plummer
Sent: Thursday, November 17, 2016 4:49 PM
To: joe darcy <joe.darcy at oracle.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net
Cc: build-dev <build-dev at openjdk.java.net>
Subject: Re: PPC64: Poor StrictMath performance due to non-optimized compilation
On 11/17/16 1:33 PM, joe darcy wrote:
> Hi Gustavo,
>
>
> On 11/17/2016 10:31 AM, Gustavo Romero wrote:
>> Hi Joe,
>>
>> Thanks a lot for your valuable comments.
>>
>> On 17-11-2016 15:35, joe darcy wrote:
>>>> Currently, optimization for building fdlibm is disabled, except for
>>>> the "solaris" OS target [1].
>>> The reason for that is because historically the Solaris compilers
>>> have had sufficient discipline and control regarding floating-point
>>> semantics and compiler optimizations to still implement the
>>> Java-mandated results when optimization was enabled. The gcc family
>>> of compilers, for example, has lacked such discipline.
>> Oh, I see. Thanks for clarifying that. I was wondering exactly why
>> fdlibm optimization is off even for x86_64, since, AFAICS with gcc 5
>> only, it does not affect the precision, even if setting -O3 does
>> not improve the performance as much as on PPC64.
>
> The fdlibm code relies on aliasing a two-element array of int with a
> double to do bit-level reads and writes of floating-point values. As I
> understand it, the C spec allows compilers to assume values of
> different types don't overlap in memory. The compilation environment
> has to be configured in such a way that the C compiler disables code
> generation and optimization techniques that would run afoul of these
> fdlibm coding practices.
This is the strict aliasing issue, right? It's a long-standing problem with fdlibm that kept getting worse as gcc got smarter. IIRC, compiling with -fno-strict-aliasing fixes it, but it's been more than 12 years since I last dealt with fdlibm and compiler aliasing issues.
Chris
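The aliasing pattern described above can be sketched in C as follows. This is only an illustration, not fdlibm's actual code: the macro in the comment shows the general int-pointer style of access under discussion, and high_word and set_high_word are hypothetical helper names chosen for this example. The sketch assumes IEEE 754 binary64 doubles.

```c
#include <stdint.h>
#include <string.h>

/* fdlibm-style access, shown for illustration only (little-endian form):
 *
 *     #define __HI(x) *(1 + (int *)&x)
 *
 * Reading or writing a double through an int pointer breaks the C
 * aliasing rules, so an optimizing compiler may reorder or drop such
 * accesses unless something like -fno-strict-aliasing is in effect. */

/* Hypothetical helper: return the high 32 bits of a double.
 * Copying the bytes with memcpy is well defined under the aliasing
 * rules, and compilers typically reduce it to a plain register move. */
static uint32_t high_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Hypothetical helper: return x with its high 32 bits replaced by hi. */
static double set_high_word(double x, uint32_t hi)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = ((uint64_t)hi << 32) | (bits & 0xffffffffu);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the helpers work on a 64-bit integer copy of the value rather than on an int array overlaid on the double, they do not depend on byte order or on how the compiler treats aliasing.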
>
>>>> As a consequence on PPC64 (Linux) StrictMath methods like, but not
>>>> limited to, sin(), cos(), and tan() perform very poorly in
>>>> comparison to the same methods in Math class [2]:
>>> If you are doing your work against JDK 9, note that the pow, hypot,
>>> and cbrt fdlibm methods required by StrictMath have been ported to
>>> Java (JDK-8134780: Port fdlibm to Java). I have intentions to port
>>> the remaining methods to Java, but it is unclear whether or not this
>>> will occur for JDK 9.
>> Yes, I'm doing my work against 9. So is there any problem if I
>> proceed with my change? I understand that there is no conflict as
>> JDK-8134780 progresses and replaces the StrictMath methods by their
>> counterparts in Java.
>> Please advise.
>
> If I manage to finish the fdlibm C -> Java port in JDK 9, the changes
> you are proposing would eventually be removed as unneeded since the C
> code wouldn't be there to get compiled anymore.
>
>>
>> Is there any plan to backport JDK-8134780 to 8?
>
> Such a backport would be technically possible, but we at Oracle don't
> currently plan to do so.
>
>>
>>
>>> Methods in the Math class, such as pow, are often intrinsified and
>>> use a different algorithm so a straight performance comparison may
>>> not be as fair or meaningful in those cases.
>> I agree. It's just that the issue on StrictMath methods was first
>> noted due to that huge gap (Math vs StrictMath) on PPC64, which is
>> not prominent on x64.
>
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling
> the fdlibm sin/cos with more aggressive optimizations should not be
> expected to close the performance gap. In particular, if Math.{sin,
> cos} is an intrinsic on PPC64 (I haven't checked the sources) that
> uses a platform-specific feature (say fused multiply-add instructions)
> then just compiling fdlibm more aggressively wouldn't necessarily make
> up that gap.
>
> To allow cross-platform and cross-release reproducibility, StrictMath
> is specified to use the particular fdlibm algorithms, which precludes
> using better algorithms developed more recently. If we were to start
> with a clean slate today, to get such reproducibility we would specify
> correctly-rounded behavior of all those methods, but such an approach
> was much less technically tractable 20+ years ago without the benefit of
> the research that has been done in the interim, such as the work of Prof.
> Muller and associates: https://lipforge.ens-lyon.fr/projects/crlibm/.
>
>>
>>
>>> Accumulating the results of the functions and comparing the
>>> sums is not a sufficiently robust way of checking to see if the
>>> optimized versions are indeed equivalent to the non-optimized ones.
>>> The specification of StrictMath requires a particular result for
>>> each set of floating-point arguments, and sums can round away
>>> low-order bits that differ.
>> That's a really good point, thanks for letting me know about that. I'll
>> re-test my change under that perspective.
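The point about sums hiding low-order differences can be illustrated with a small C sketch. It is not taken from any JDK test; sum_results, bits_equal, and count_bit_mismatches are hypothetical names, and the sketch assumes IEEE 754 binary64 doubles. It contrasts a sum-based check with a per-result bitwise comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive check: accumulate all results into one double. */
static double sum_results(const double *r, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += r[i];
    return s;
}

/* Compare two doubles bit for bit (this also distinguishes 0.0 from -0.0). */
static int bits_equal(double a, double b)
{
    uint64_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Robust check: count positions where the two result sets differ in any bit. */
static size_t count_bit_mismatches(const double *a, const double *b, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++)
        if (!bits_equal(a[i], b[i]))
            mismatches++;
    return mismatches;
}
```

As a usage example, take the result sets {1e16, 0.5} and {1e16, 0x1.0000000000001p-1}, where the second value is one ulp above 0.5. Both sets sum to 1e16, because the small term is rounded away, so a sum comparison reports no difference, while count_bit_mismatches reports one.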
>>
>>
>>> Running the JDK math library regression tests and corresponding JCK
>>> tests is recommended for work in this area.
>> Got it. By "the JDK math library regression tests", exactly which
>> test suite do you mean? The jtreg tests?
>
> Specifically, the regression tests under test/java/lang/Math and
> test/java/lang/StrictMath in the jdk repository. There are some other
> math library tests in the hotspot repo, but I don't know where they
> are offhand.
>
> A note on methodology: when writing tests for my port, I've
> tried to include test cases that exercise all the branch points in
> the code. Due to the large input space (~2^64 for a single-argument
> method), random sampling alone is an inefficient way to try to find
> differences in behavior.
>> For testing against JCK/TCK I'll need some help on that.
>>
>
> I believe the JCK/TCK does have additional testcases relevant here.
>
> HTH; thanks,
>
> -Joe