RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation
Andrew Haley
aph at redhat.com
Mon Sep 10 18:15:24 UTC 2018
On 09/10/2018 02:15 PM, Gustavo Romero wrote:
> Hi Severin,
>
> On 09/10/2018 06:27 AM, Severin Gehwolf wrote:
>> On Mon, 2018-09-10 at 10:05 +0100, Andrew Haley wrote:
>>> On 09/05/2018 02:12 PM, Severin Gehwolf wrote:
>>>> Is there a good
>>>> reason to not use -O3 -ffp-contract=off everywhere?
>>>
>>> Is there a good reason to use -O3 rather than -O2?
>>
>> Not sure. I was following what JDK-8170153 did, which was using
>> OPTIMIZATION := HIGH corresponding to -O3. cc'ing Gustavo. Gustavo,
>> would you know why HIGH was chosen over LOW?
>
> I don't remember exactly, but at least for ppc64 I discussed that a bit with
> the toolchain folks (also regarding the precision issue, etc) and they never
> said anything against using -O3. Unfortunately it was a long time ago, so I
> don't remember exactly the numbers on ppc64 for -O2 to check if it was
> worse and so I selected -O3 instead.
>
>>> -O3 can bloat the
>>> code which can increase cache pressure, which is not always noticeable
>>> in benchmarks but hurts real-world programs. Unless benchmarks are
>>> significantly better at -O3, -O2 is a good default choice.
>>
>> OK, thanks! I'll re-test and change to LOW (-O2) if it gives similar
>> results.
>
> That's interesting. Andrew, do you mean bloat in the sense of final code size
> (for instance, due to unrolling), right?
Yes. With one of my other hats on: I'm also an occasional GCC
maintainer, and we've always had the problem that people assume that
O3 > O2, therefore O3 is better. It can be, but inlining can cause
problems due to code size and high register pressure, so it's good to
check. Let's see.
> BTW (I just remembered that), on RISC the lack of optimization hurts way more
> than the lack of optimization on CISC,
Mmm, yes. Inlining is cool if you have a ton of registers, and can
cause frantic spilling if you don't.
> so I recall that it puzzled me the fact that turning on the
> optimization on x86_64 did not change much the scenario, contrary to
> the conspicuous gains on ppc64 when turning on the optimization.
> It took me some time to understand that the optimization flag was
> the culprit (a much simpler case, luckily), because I tried first to
> profile and optimize the fdlibm code (after extracting it from JVM
> for detailed analysis) and only after getting to a dead end I turned
> to look at simpler causes.
>
> Are you checking the difference between -O2 and -O3 only on x86_64?
x86_64 has hand-carved code for a lot of this stuff, so it might not
be much affected.
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671