PPC64: Poor StrictMath performance due to non-optimized compilation

Chris Plummer chris.plummer at oracle.com
Thu Nov 17 21:48:36 UTC 2016


On 11/17/16 1:33 PM, joe darcy wrote:
> Hi Gustavo,
>
>
> On 11/17/2016 10:31 AM, Gustavo Romero wrote:
>> Hi Joe,
>>
>> Thanks a lot for your valuable comments.
>>
>> On 17-11-2016 15:35, joe darcy wrote:
>>>> Currently, optimization for building fdlibm is disabled, except for 
>>>> the
>>>> "solaris" OS target [1].
>>> The reason for that is because historically the Solaris compilers 
>>> have had sufficient discipline and control regarding floating-point 
>>> semantics and compiler optimizations to still implement the
>>> Java-mandated results when optimization was enabled. The gcc family 
>>> of compilers, for example, has lacked such discipline.
>> oh, I see. Thanks for clarifying that. I was wondering exactly why 
>> fdlibm optimization is off even for x86_64, since (AFAICS, with gcc 
>> 5 at least) it does not affect the precision, even though setting 
>> -O3 does not improve the performance as much as on PPC64.
>
> The fdlibm code relies on aliasing a two-element array of int with a 
> double to do bit-level reads and writes of floating-point values. As I 
> understand it, the C spec allows compilers to assume values of 
> different types don't overlap in memory. The compilation environment 
> has to be configured in such a way that the C compiler disables code 
> generation and optimization techniques that would run afoul of these 
> fdlibm coding practices.
This is the strict aliasing issue, right? It's a long-standing problem 
with fdlibm that kept getting worse as gcc got smarter. IIRC, compiling 
with -fno-strict-aliasing fixes it, but it's been more than 12 years 
since I last dealt with fdlibm and compiler aliasing issues.

Chris
>
>>>> As a consequence on PPC64 (Linux) StrictMath methods like, but not 
>>>> limited to,
>>>> sin(), cos(), and tan() perform very poorly in comparison to the 
>>>> same methods
>>>> in Math class [2]:
>>> If you are doing your work against JDK 9, note that the pow, hypot, 
>>> and cbrt fdlibm methods required by StrictMath have been ported to 
>>> Java (JDK-8134780: Port fdlibm to Java). I have intentions to
>>> port the remaining methods to Java, but it is unclear whether or not 
>>> this will occur for JDK 9.
>> Yes, I'm doing my work against 9. So is there any problem if I 
>> proceed with my
>> change? I understand that there is no conflict as JDK-8134780 
>> progresses and
>> replaces the StrictMath methods with their counterparts in Java. 
>> Please advise.
>
> If I manage to finish the fdlibm C -> Java port in JDK 9, the changes 
> you are proposing would eventually be removed as unneeded since the C 
> code wouldn't be there to get compiled anymore.
>
>>
>> Is it intended to downport JDK-8134780 to 8?
>
> Such a backport would be technically possible, but we at Oracle don't 
> currently plan to do so.
>
>>
>>
>>> Methods in the Math class, such as pow, are often intrinsified and 
>>> use a different algorithm so a straight performance comparison may 
>>> not be as fair or meaningful in those cases.
>> I agree. It's just that the issue with the StrictMath methods was 
>> first noticed due to that huge gap (Math vs. StrictMath) on PPC64, 
>> which is not prominent on x64.
>
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling 
> the fdlibm sin/cos with more aggressive optimizations should not be 
> expected to close the performance gap. In particular, if Math.{sin, 
> cos} is an intrinsic on PPC64 (I haven't checked the sources) that 
> uses a platform-specific feature (say, fused multiply-add 
> instructions), then just compiling fdlibm more aggressively wouldn't 
> necessarily make up that gap.
>
> To allow cross-platform and cross-release reproducibility, StrictMath 
> is specified to use the particular fdlibm algorithms, which precludes 
> using better algorithms developed more recently. If we were to start 
> with a clean slate today, to get such reproducibility we would specify 
> correctly-rounded behavior of all those methods, but such an approach 
> was technically much less tractable 20+ years ago without the benefit 
> of the research that has been done in the interim, such as the work 
> of Prof. Muller and associates: https://lipforge.ens-lyon.fr/projects/crlibm/.
>
>>
>>
>>> Accumulating the results of the functions and comparing the sums 
>>> is not a sufficiently robust way of checking whether the optimized 
>>> versions are indeed equivalent to the non-optimized ones. The 
>>> specification of StrictMath requires a particular result for each 
>>> set of floating-point arguments, and sums round away low-order 
>>> bits that differ.
>> That's a really good point, thanks for letting me know about that. I'll 
>> re-test my
>> change under that perspective.
>>
>>
>>> Running the JDK math library regression tests and corresponding JCK 
>>> tests is recommended for work in this area.
>> Got it. By "the JDK math library regression tests", which test 
>> suite exactly do you mean? The jtreg tests?
>
> Specifically, the regression tests under test/java/lang/Math and 
> test/java/lang/StrictMath in the jdk repository. There are some other 
> math library tests in the hotspot repo, but I don't know where they 
> are offhand.
>
> A note on methodology: when writing tests for my port, I've tried to 
> include test cases that exercise all the branch points in the code. 
> Due to the large input space (~2^64 for a single-argument 
> method), random sampling alone is an inefficient way to try to find 
> differences in behavior.
>> For testing against JCK/TCK I'll need some help on that.
>>
>
> I believe the JCK/TCK does have additional testcases relevant here.
>
> HTH; thanks,
>
> -Joe




More information about the ppc-aix-port-dev mailing list