VarHandles & LMAX Disruptor

Paul Sandoz paul.sandoz at oracle.com
Fri Jul 31 12:18:56 UTC 2015


On 31 Jul 2015, at 07:43, Michael Barker <mikeb01 at gmail.com> wrote:

> Hi Paul,
> 
> I've had a look at the patch that you mentioned and AFAICT it doesn't seem to be able to eliminate the bounds check, even if I remove the array padding.  Just to confirm my analysis the assembler around the aaload instruction looks like:
> 
> 0x00007f627d6b3aea: cmp    %ebx,%edx
> 0x00007f627d6b3aec: jae    0x00007f627d6b3e0c
> 0x00007f627d6b3af2: mov    %r8,%r13
> 0x00007f627d6b3af5: mov    %r11d,%r8d
> 0x00007f627d6b3af8: mov    0x10(%rax,%rdx,4),%r11d  ;*aaload
> 
> I'm guessing that the cmp/jae pair is the bounds check?
> 

Yes, i am assuming that is generated code of RingBuffer.elementAt (aaload) and not MultiProducerSequencer. I suspect the padding is throwing the compiler off the scent (it does not know that indexMask is less than entries.length - BUFFER_INDEX).

For an array access such as:

  int j = i & (a.length - 1);
  X x = a[j];

I would expect to observe something like:

  test   %edi,%edi
  jbe    …

The bound check should get strength reduced to checking if the array length is zero, and i would expect such a test to be hoisted out of any loop (and get folded into another dominating array length check, if any).

Here is an example, given this benchmark method:

@Benchmark
public int relaxed_r_aa() {
    Value[] _receiver = receiver;
    int sum = 0;
    for (int i = start; i < end; i++) {
        int j = i & (_receiver.length - 1);
        sum += _receiver[j].i;
    }
    return sum;
}

Without the patch the following code is generated (when loop unrolling is switched off):

: mov    0xc(%r12,%rbx,8),%r9d  ;*arraylength
: mov    %r9d,%r11d
: dec    %r11d
: lea    (%r12,%rbx,8),%rcx
: xor    %eax,%eax          ;*iload_3

<loop>: mov    %r11d,%ebp
: and    %r10d,%ebp         ;*iand
: cmp    %r9d,%ebp
: jae    <bounds-fail>
: mov    0x10(%rcx,%rbp,4),%edi  ;*aaload
: add    0xc(%r12,%rdi,8),%eax  ;*iadd
: inc    %r10d              ;*iinc
: cmp    %r8d,%r10d
: jl     <loop>  ;*if_icmpge


When the patch is applied the following code is generated:

: mov    0xc(%r12,%rbp,8),%r9d  ;*arraylength
: test   %r9d,%r9d
: jbe    <bounds-fail>
: dec    %r9d
: lea    (%r12,%rbp,8),%r8
: data32 nopw 0x0(%rax,%rax,1)
: data32 data32 xchg %ax,%ax  ;*iload_3

<loop>: mov    %r9d,%ebx
: and    %r11d,%ebx
: mov    0x10(%r8,%rbx,4),%ebx  ;*aaload
: add    0xc(%r12,%rbx,8),%eax  ;*iadd
: inc    %r11d              ;*iinc
: cmp    %r10d,%r11d
: jl     <loop>  ;*if_icmpge



> A quick note on the patch, it merged, but didn't compile.  There seemed to be a signature change on the signature of ConINode::make, so I changed line 1311 in subnode.cpp from 'return ConINode::make(phase->C, 1);' to 'return ConINode::make(1);'.  That let it compile, but I don't really understand the code so I'm not sure if it is semantically correct.
> 

Oops sorry about that, i uploaded a new revision of the patch the same fix that you applied.

Paul.



More information about the valhalla-dev mailing list