PPC64: Poor StrictMath performance due to non-optimized compilation
Chris Plummer
chris.plummer at oracle.com
Thu Nov 17 21:48:36 UTC 2016
On 11/17/16 1:33 PM, joe darcy wrote:
> Hi Gustavo,
>
>
> On 11/17/2016 10:31 AM, Gustavo Romero wrote:
>> Hi Joe,
>>
>> Thanks a lot for your valuable comments.
>>
>> On 17-11-2016 15:35, joe darcy wrote:
>>>> Currently, optimization for building fdlibm is disabled, except for
>>>> the
>>>> "solaris" OS target [1].
>>> The reason for that is because historically the Solaris compilers
>>> have had sufficient discipline and control regarding floating-point
>>> semantics and compiler optimizations to still implement the
>>> Java-mandated results when optimization was enabled. The gcc family
>>> of compilers, for example, has lacked such discipline.
>> Oh, I see. Thanks for clarifying that. I was wondering exactly why fdlibm
>> optimization is off even for x86_64, since, AFAICS with gcc 5 at least,
>> it does not affect the precision, even though setting -O3 does not
>> improve the performance as much as it does on PPC64.
>
> The fdlibm code relies on aliasing a two-element array of int with a
> double to do bit-level reads and writes of floating-point values. As I
> understand it, the C spec allows compilers to assume values of
> different types don't overlap in memory. The compilation environment
> has to be configured in such a way that the C compiler disables code
> generation and optimization techniques that would run afoul of these
> fdlibm coding practices.
This is the strict aliasing issue right? It's a long standing problem
with fdlibm that kept getting worse as gcc got smarter. IIRC, compiling
with -fno-strict-aliasing fixes it, but it's been more than 12 years
since I last dealt with fdlibm and compiler aliasing issues.
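For reference, a minimal sketch (not the actual fdlibm source) of the kind
of type punning Joe describes, and that -fno-strict-aliasing exists to
tolerate:

    #include <stdio.h>

    /*
     * Illustrative only -- not the actual fdlibm source.  fdlibm reads and
     * writes the halves of a double through int pointers.  Under C's
     * strict-aliasing rule the compiler may assume an int and a double
     * never overlap in memory, so at -O2/-O3 gcc can miscompile this
     * pattern unless the build passes -fno-strict-aliasing (or the code
     * is rewritten to use memcpy or a union).
     */
    static int high_word(double x) {
        return *(1 + (int *) &x);  /* high 32 bits on a little-endian machine */
    }

    int main(void) {
        /* 1.0 is 0x3ff0000000000000, so this prints 3ff00000 on little-endian. */
        printf("%08x\n", (unsigned) high_word(1.0));
        return 0;
    }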
Chris
>
>>>> As a consequence, on PPC64 (Linux) StrictMath methods like, but not
>>>> limited to, sin(), cos(), and tan() perform very poorly in comparison
>>>> to the same methods in the Math class [2]:
>>> If you are doing your work against JDK 9, note that the pow, hypot,
>>> and cbrt fdlibm methods required by StrictMath have been ported to
>>> Java (JDK-8134780: Port fdlibm to Java). I intend to port the
>>> remaining methods to Java, but it is unclear whether or not this
>>> will occur for JDK 9.
>> Yes, I'm doing my work against 9. So is there any problem if I proceed
>> with my change? I understand that there is no conflict as JDK-8134780
>> progresses and replaces the StrictMath methods with their counterparts
>> in Java. Please advise.
>
> If I manage to finish the fdlibm C -> Java port in JDK 9, the changes
> you are proposing would eventually be removed as unneeded since the C
> code wouldn't be there to get compiled anymore.
>
>>
>> Is it intended to backport JDK-8134780 to 8?
>
> Such a backport would be technically possible, but we at Oracle don't
> currently plan to do so.
>
>>
>>
>>> Methods in the Math class, such as pow, are often intrinsified and
>>> use a different algorithm, so a straight performance comparison may
>>> not be as fair or meaningful in those cases.
>> I agree. It's just that the issue with the StrictMath methods was first
>> noted due to that huge gap (Math vs StrictMath) on PPC64, which is not
>> prominent on x64.
>
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling
> the fdlibm sin/cos with more aggressive optimizations should not be
> expected to close the performance gap. In particular, if Math.{sin,
> cos} is an intrinsic on PPC64 (I haven't checked the sources) that
> uses platform-specific features (say, fused multiply-add instructions),
> then just compiling fdlibm more aggressively wouldn't necessarily make
> up that gap.
>
> To allow cross-platform and cross-release reproducibility, StrictMath
> is specified to use the particular fdlibm algorithms, which precludes
> using better algorithms developed more recently. If we were to start
> with a clean slate today, to get such reproducibility we would specify
> correctly-rounded behavior of all those methods, but such an approach
> was much less technically tractable 20+ years ago without the benefit of
> the research that has been done in the interim, such as the work of Prof.
> Muller and associates: https://lipforge.ens-lyon.fr/projects/crlibm/.
>
>>
>>
>>> Accumulating the results of the functions and comparing the sums is
>>> not a sufficiently robust way of checking whether the optimized
>>> versions are indeed equivalent to the non-optimized ones. The
>>> specification of StrictMath requires a particular result for each
>>> set of floating-point arguments, and sums can round away low-order
>>> bits that differ.
>> That's a really good point, thanks for letting me know about that. I'll
>> re-test my change from that perspective.
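To make that concrete, here is a sketch (not an existing JDK test) of the
kind of per-argument, bit-for-bit comparison Joe is recommending in place of
accumulated sums; the memcpy is also the aliasing-safe way to get at the bits:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Read the bits of a double in a way that is well-defined under
     * strict aliasing (unlike the pointer casts inside fdlibm itself). */
    static uint64_t bits_of(double d) {
        uint64_t b;
        memcpy(&b, &d, sizeof b);
        return b;
    }

    /* Per-argument check: an optimized build must reproduce the reference
     * result exactly, bit for bit -- a sum over many results would not
     * notice a 1-ulp difference on a single input. */
    static int same_result(double optimized, double reference) {
        return bits_of(optimized) == bits_of(reference);
    }

    int main(void) {
        /* Trivial demonstration; a real harness would feed many arguments
         * and compare an optimized library against a reference build. */
        printf("%d\n", same_result(sin(0.5), sin(0.5)));
        return 0;
    }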
>>
>>
>>> Running the JDK math library regression tests and corresponding JCK
>>> tests is recommended for work in this area.
>> Got it. By "the JDK math library regression tests", which test suite
>> exactly do you mean? The jtreg tests?
>
> Specifically, the regression tests under test/java/lang/Math and
> test/java/lang/StrictMath in the jdk repository. There are some other
> math library tests in the hotspot repo, but I don't know where they
> are offhand.
>
> A note on methodology: when writing tests for my port, I've tried to
> include test cases that exercise all the branch points in the code.
> Due to the large input space (~2^64 for a single-argument method),
> random sampling alone is an inefficient way to try to find
> differences in behavior.
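As a sketch of that methodology (the threshold below is hypothetical; real
branch points would come from reading the fdlibm source of the method under
test), one can probe each special class of input and both sides of every
branch boundary instead of sampling at random:

    #include <math.h>
    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        /* Hypothetical branch threshold; a real test would collect the
         * actual thresholds from the C code of the method under test. */
        double threshold = 0.25;

        double probes[] = {
            0.0, -0.0,                        /* signed zeros                */
            5e-324,                           /* smallest subnormal          */
            nextafter(threshold, -INFINITY),  /* just below the branch point */
            threshold,                        /* exactly at the branch point */
            nextafter(threshold, INFINITY),   /* just above the branch point */
            INFINITY, -INFINITY, NAN          /* special values              */
        };

        for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++)
            printf("probe: %a\n", probes[i]);
        return 0;
    }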
>> For testing against JCK/TCK I'll need some help on that.
>>
>
> I believe the JCK/TCK does have additional test cases relevant here.
>
> HTH; thanks,
>
> -Joe