VarHandles & LMAX Disruptor
Paul Sandoz
paul.sandoz at oracle.com
Fri Jul 31 12:18:56 UTC 2015
On 31 Jul 2015, at 07:43, Michael Barker <mikeb01 at gmail.com> wrote:
> Hi Paul,
>
> I've had a look at the patch that you mentioned and AFAICT it doesn't seem to be able to eliminate the bounds check, even if I remove the array padding. Just to confirm my analysis the assembler around the aaload instruction looks like:
>
> 0x00007f627d6b3aea: cmp %ebx,%edx
> 0x00007f627d6b3aec: jae 0x00007f627d6b3e0c
> 0x00007f627d6b3af2: mov %r8,%r13
> 0x00007f627d6b3af5: mov %r11d,%r8d
> 0x00007f627d6b3af8: mov 0x10(%rax,%rdx,4),%r11d ;*aaload
>
> I'm guessing that the cmp/jae pair is the bounds check?
>
Yes, i am assuming that is generated code of RingBuffer.elementAt (aaload) and not MultiProducerSequencer. I suspect the padding is throwing the compiler off the scent (it does not know that indexMask is less than entries.length - BUFFER_INDEX).
For an array access such as:
int j = i & (a.length - 1);
X x = a[j];
I would expect to observe something like:
test %edi,%edi
jbe …
The bound check should get strength reduced to checking if the array length is zero, and i would expect such a test to be hoisted out of any loop (and get folded into another dominating array length check, if any).
Here is an example, given this benchmark method:
@Benchmark
public int relaxed_r_aa() {
Value[] _receiver = receiver;
int sum = 0;
for (int i = start; i < end; i++) {
int j = i & (_receiver.length - 1);
sum += _receiver[j].i;
}
return sum;
}
Without the patch the following code is generated (when loop unrolling is switched off):
: mov 0xc(%r12,%rbx,8),%r9d ;*arraylength
: mov %r9d,%r11d
: dec %r11d
: lea (%r12,%rbx,8),%rcx
: xor %eax,%eax ;*iload_3
<loop>: mov %r11d,%ebp
: and %r10d,%ebp ;*iand
: cmp %r9d,%ebp
: jae <bounds-fail>
: mov 0x10(%rcx,%rbp,4),%edi ;*aaload
: add 0xc(%r12,%rdi,8),%eax ;*iadd
: inc %r10d ;*iinc
: cmp %r8d,%r10d
: jl <loop> ;*if_icmpge
When the patch is applied the following code is generated:
: mov 0xc(%r12,%rbp,8),%r9d ;*arraylength
: test %r9d,%r9d
: jbe <bounds-fail>
: dec %r9d
: lea (%r12,%rbp,8),%r8
: data32 nopw 0x0(%rax,%rax,1)
: data32 data32 xchg %ax,%ax ;*iload_3
<loop>: mov %r9d,%ebx
: and %r11d,%ebx
: mov 0x10(%r8,%rbx,4),%ebx ;*aaload
: add 0xc(%r12,%rbx,8),%eax ;*iadd
: inc %r11d ;*iinc
: cmp %r10d,%r11d
: jl <loop> ;*if_icmpge
> A quick note on the patch, it merged, but didn't compile. There seemed to be a signature change on the signature of ConINode::make, so I changed line 1311 in subnode.cpp from 'return ConINode::make(phase->C, 1);' to 'return ConINode::make(1);'. That let it compile, but I don't really understand the code so I'm not sure if it is semantically correct.
>
Oops sorry about that, i uploaded a new revision of the patch the same fix that you applied.
Paul.
More information about the valhalla-dev
mailing list