VarHandles & LMAX Disruptor

Vitaly Davidovich vitalyd at gmail.com
Tue Aug 4 15:29:18 UTC 2015


TrustFinalNonStaticFields only works if the holder class is static final.
You can see my thread on hotspot-compiler about this flag from a few months
ago:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017739.html

On Tue, Aug 4, 2015 at 11:15 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:

> Hi Michael,
>
> Thanks for the detailed response.
>
> Not sure how we can solve this particular use-case, contended arrays,
> without either HotSpot getting smarter on such bounds checks or supporting
> alternative array layouts.
>
> You could try out #3 by setting the experimental flag:
>
>   -XX:+UnlockExperimentalVMOptions -XX:+TrustFinalNonStaticFields
>
> But i don’t know if HotSpot will correctly associate indexMask with the
> array length.
>
> Paul.
>
> On 3 Aug 2015, at 03:53, Michael Barker <mikeb01 at gmail.com> wrote:
>
> > Hi,
> >
> > Having played with it a bit more, I can get the bound check removed
> under one specific arrangement of the code, the following is for the
> RingBufferFields.elementAt:
> >
> > 1)  return (E) entries[((int) sequence) & (entries.length - 1)];
> >
> > This works and I get the pattern mentioned Paul's example.  There is a
> drawback to implementing it this way, details below.
> >
> > 2)  return (E) entries[(int) sequence & (entries.length - 1)];
> >
> > Note the position of the brackets.  In this case entries.length is being
> promoted to a long before applying the mask, then being cast down to an
> int, which seems to trip up the compiler.
> >
> > 3)  return (E) entries[((int) sequence) & indexMask];  // indexMask is a
> final field of value entries.length - 1
> >
> > I suspect this is because the compiler can't assume that final fields
> are actually final.
> >
> > 4)  return (E) entries[BUFFER_PAD + (((int) sequence) & (entries.length
> - ((BUFFER_PAD * 2) + 1)))];
> >
> > I'm guessing it gets just too complex for the compiler to figure our
> that the resulting value will be less that entries.length.
> >
> > One of the niggles with approach #1, is that I need to reference
> entries.length at the call site and can't use a separate field to store the
> value.  This can cause issue in a concurrent ring-buffer-like structures as
> the array length can end up sharing a cache line with the array itself.  So
> if a producer thread writes into one of the first 15 array slots (worst
> case), the whole line including the part containing the length field will
> be invalid on the consumer's core resulting in an additional cache miss in
> certain cases.  This is probably less of an issue for the default set up of
> the disruptor as we don't actually replace the references inside of the
> entries array, but a number of fast queue implementations would suffer.
> The developer is caught having to make a trade-off between reducing false
> sharing and removing bounds checking.
> >
> > Mike.
> >
> >
> > On 1 August 2015 at 00:18, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> >
> > On 31 Jul 2015, at 07:43, Michael Barker <mikeb01 at gmail.com> wrote:
> >
> > > Hi Paul,
> > >
> > > I've had a look at the patch that you mentioned and AFAICT it doesn't
> seem to be able to eliminate the bounds check, even if I remove the array
> padding.  Just to confirm my analysis the assembler around the aaload
> instruction looks like:
> > >
> > > 0x00007f627d6b3aea: cmp    %ebx,%edx
> > > 0x00007f627d6b3aec: jae    0x00007f627d6b3e0c
> > > 0x00007f627d6b3af2: mov    %r8,%r13
> > > 0x00007f627d6b3af5: mov    %r11d,%r8d
> > > 0x00007f627d6b3af8: mov    0x10(%rax,%rdx,4),%r11d  ;*aaload
> > >
> > > I'm guessing that the cmp/jae pair is the bounds check?
> > >
> >
> > Yes, i am assuming that is generated code of RingBuffer.elementAt
> (aaload) and not MultiProducerSequencer. I suspect the padding is throwing
> the compiler off the scent (it does not know that indexMask is less than
> entries.length - BUFFER_INDEX).
> >
> > For an array access such as:
> >
> >   int j = i & (a.length - 1);
> >   X x = a[j];
> >
> > I would expect to observe something like:
> >
> >   test   %edi,%edi
> >   jbe    …
> >
> > The bound check should get strength reduced to checking if the array
> length is zero, and i would expect such a test to be hoisted out of any
> loop (and get folded into another dominating array length check, if any).
> >
> > Here is an example, given this benchmark method:
> >
> > @Benchmark
> > public int relaxed_r_aa() {
> >     Value[] _receiver = receiver;
> >     int sum = 0;
> >     for (int i = start; i < end; i++) {
> >         int j = i & (_receiver.length - 1);
> >         sum += _receiver[j].i;
> >     }
> >     return sum;
> > }
> >
> > Without the patch the following code is generated (when loop unrolling
> is switched off):
> >
> > : mov    0xc(%r12,%rbx,8),%r9d  ;*arraylength
> > : mov    %r9d,%r11d
> > : dec    %r11d
> > : lea    (%r12,%rbx,8),%rcx
> > : xor    %eax,%eax          ;*iload_3
> >
> > <loop>: mov    %r11d,%ebp
> > : and    %r10d,%ebp         ;*iand
> > : cmp    %r9d,%ebp
> > : jae    <bounds-fail>
> > : mov    0x10(%rcx,%rbp,4),%edi  ;*aaload
> > : add    0xc(%r12,%rdi,8),%eax  ;*iadd
> > : inc    %r10d              ;*iinc
> > : cmp    %r8d,%r10d
> > : jl     <loop>  ;*if_icmpge
> >
> >
> > When the patch is applied the following code is generated:
> >
> > : mov    0xc(%r12,%rbp,8),%r9d  ;*arraylength
> > : test   %r9d,%r9d
> > : jbe    <bounds-fail>
> > : dec    %r9d
> > : lea    (%r12,%rbp,8),%r8
> > : data32 nopw 0x0(%rax,%rax,1)
> > : data32 data32 xchg %ax,%ax  ;*iload_3
> >
> > <loop>: mov    %r9d,%ebx
> > : and    %r11d,%ebx
> > : mov    0x10(%r8,%rbx,4),%ebx  ;*aaload
> > : add    0xc(%r12,%rbx,8),%eax  ;*iadd
> > : inc    %r11d              ;*iinc
> > : cmp    %r10d,%r11d
> > : jl     <loop>  ;*if_icmpge
> >
> >
> >
> > > A quick note on the patch, it merged, but didn't compile.  There
> seemed to be a signature change on the signature of ConINode::make, so I
> changed line 1311 in subnode.cpp from 'return ConINode::make(phase->C, 1);'
> to 'return ConINode::make(1);'.  That let it compile, but I don't really
> understand the code so I'm not sure if it is semantically correct.
> > >
> >
> > Oops sorry about that, i uploaded a new revision of the patch the same
> fix that you applied.
> >
> > Paul.
> >
> >
>
>


More information about the valhalla-dev mailing list