RFR (M) CR 8050147: StoreLoad barrier interferes with stack usages
Aleksey Shipilev
aleksey.shipilev at oracle.com
Wed Jul 30 17:20:21 UTC 2014
On 07/23/2014 11:38 AM, Aleksey Shipilev wrote:
> On 07/23/2014 01:38 AM, Aleksey Shipilev wrote:
>> * 1x4x1 Intel Core (Haswell-DT) benefits from offsetting %esp as well.
>> There is an interesting hump on lower backoffs with addl (%esp-8), which
>> seems to be a second-order microarchitectural effect. Unfortunately, we
>> don't have large Haswells available at the moment to dive into this:
>> http://cr.openjdk.java.net/~shade/8050147/micros/Intel-Haswell.data.png
>
> Finally managed to get the profile on this Haswell. The hottest loops
> are identical, although the second variant is consistently better.
>
> This produces 1200.318 +- 18.520 ns/op:
>
> 2.65% 0x00007fdcb87190b0: mov (%rsp),%r10
> 0x00007fdcb87190b4: mov 0xc(%r10),%r10d
> 0.06% 0x00007fdcb87190b8: mov %r10d,0xc(%rbp)
> 2.66% 0x00007fdcb87190bc: lock addl $0x0,-0x8(%rsp)
> 80.08% 95.59% 0x00007fdcb87190c2: mov %rbp,%rsi
> 0x00007fdcb87190c5: xchg %ax,%ax
> 0x00007fdcb87190c7: callq 0x00007fdcb85d7fe0
> 0x00007fdcb87190cc: test %eax,%eax
> 0x00007fdcb87190ce: jne 0x00007fdcb87190b0
>
> This produces 891.427 +- 15.277 ns/op:
>
> 3.52% 0.02% 0x00007ffa3918f030: mov (%rsp),%r10
> 0x00007ffa3918f034: mov 0xc(%r10),%r10d
> 0.07% 0x00007ffa3918f038: mov %r10d,0xc(%rbp)
> 3.68% 0x00007ffa3918f03c: lock addl $0x0,-0x48(%rsp)
> 76.54% 97.13% 0x00007ffa3918f042: mov %rbp,%rsi
> 0x00007ffa3918f045: xchg %ax,%ax
> 0x00007ffa3918f047: callq 0x00007ffa39045fe0
> 0x00007ffa3918f04c: test %eax,%eax
> 0x00007ffa3918f04e: jne 0x00007ffa3918f030
>
> Full logs here:
> http://cr.openjdk.java.net/~shade/8050147/haswell-barrier2.perfasm
> http://cr.openjdk.java.net/~shade/8050147/haswell-barrier3.perfasm
>
> It is puzzling to me why we have the difference here. In the logs
> there, you may see the second-hottest method, looping(), which is called
> in this busy loop. It does:
>
> 2.60% 0x00007fdcb871b200: sub $0x18,%rsp
> 0x00007fdcb871b207: mov %rbp,0x10(%rsp)
>
> ...which seems to callee-save %rbp to (%rsp-8)? Any pointers how we
> manage the stack in callees? It would seem we need to step back more
> than -8 to dodge callee-saves, but how much?
>
> It seems odd this affects Haswell so much. I've checked on my
> SandyBridge laptop, and we have the same code, but performance is
> consistent. Barring that, it would seem like some second-order
> microarchitectural effect on Haswell. ...which makes me say this is the
> mode we should switch to:
>
>> ...or "lock addl (%esp-CL-8), 0", pessimistically padding away from
>> stack users:
>> http://cr.openjdk.java.net/~shade/8050147/webrev.02/
Ping. Anyone?
-Aleksey.
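
A minimal sketch of the offset arithmetic behind the callee-save question in the quoted message. It assumes that %rsp does not change between the "lock addl" and the "callq" (as in the loops shown), and that the call eventually reaches the looping() prologue quoted above with no other stack adjustment in between. The second point is an assumption: the callq target in the listing is a different address from the prologue. The helper name callee_writes and the slot labels are hypothetical, chosen only for illustration.

```python
WORD = 8  # bytes per stack slot on x86_64

def callee_writes(frame_size, saved_offsets):
    """Offsets, relative to the caller's %rsp at the call site, of the
    stack slots written by callq and by the callee prologue."""
    writes = {}
    rsp = 0                       # caller's %rsp at the barrier, taken as origin
    rsp -= WORD                   # callq pushes the return address
    writes[rsp] = "return address"
    rsp -= frame_size             # sub $frame_size,%rsp
    for name, off in saved_offsets.items():
        writes[rsp + off] = name  # mov %reg,off(%rsp)
    return writes

# Prologue from the message: sub $0x18,%rsp ; mov %rbp,0x10(%rsp)
writes = callee_writes(0x18, {"saved %rbp": 0x10})

# Barrier targets from the two loops: -0x8(%rsp) and -0x48(%rsp)
for barrier in (-0x8, -0x48):
    print(hex(barrier), writes.get(barrier, "not written by this prologue"))
```

Under these assumptions, the -0x8(%rsp) barrier target is the slot that callq then overwrites with the return address, %rbp is saved one slot below it (caller's %rsp-16), and -0x48(%rsp) is outside everything this prologue writes. Whether that overlap explains the Haswell difference is not something the sketch can show.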
More information about the hotspot-compiler-dev mailing list