VarHandles & LMAX Disruptor
John Rose
john.r.rose at oracle.com
Tue Aug 18 06:44:54 UTC 2015
> On Aug 7, 2015, at 1:55 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>>
>> Therefore even I use separate field to store the indexMask (entries.length
>> - 1) to avoid false sharing, the bounds check will effectively undo that
>> effort?
>
>
> Yes, anything that touches array.length can induce false sharing -- I
> mentioned this earlier in this thread :) :
I don't see that this particular case is a big problem.
If the workload consists of padded arrays, and if the first several
elements of each padded array A are left idle, then the array header
(including the field A.length) will be shared read-only.
The main risk is that an unrelated object X will be allocated *before*
the array header, and a field X.F toward the end of X will be mutable,
and will therefore cause conflicts with reads of A.length.
This risk is lessened because A is allocated with a certain amount
of alignment. We could reduce it more if we added a rule (which we
don't have now) of aligning longer arrays more strongly. There are
other reasons to do this as well, having to do with vector processing.
Anybody want to prototype some stronger array alignment logic?
It's tricky, but mostly straightforward, except for the extra checks on object
reallocation in the GC promotion logic.
>
> I think any approach that requires reading the length of an array, whether
>> user or compiler inserted, is going to run afoul of possible false sharing;
>> only approach #3 would work since the mask field could be isolated to its
>> own cacheline.
>
> The approach #3 you mentioned in the initial email could work *iff* JIT
> started const propagating final fields and then seeing that your mask
> guarantees an index < array.length *and* constant propagating the final
> field holding the array and knowing its length > 0 (assuming we're talking
> about a class with a final field array, which is true for typical
> ringbuffers). Right now, only Unsafe can help avoiding range checks and
> false sharing.
Sorry, I think that's too many "iff"s. We would have to store algebraic relations
between fields. Doable but so complex that it would need a much more
spectacular payoff than just this use case.
Please, use the built-in A.length field as much as possible for your calculations,
because that is the field that all JVM versions are guaranteed to know about.
We know how to do the power-of-two range check trick, and intend to integrate it.
Side note #1: It's not just power-of-two. Any array length works the same, since
(a & b) <= b as long as (b >= 0), even if b+1 is not a power of two. Perhaps one
reason we have been slow to integrate this optimization is that I have been
too eager to "capture" the full set of related identities.
Side note #2: Paul Sandoz's talk last week at JVMLS showed that getting
better optimization for loops over unsafe accesses requires a surprisingly
small amount of special pleading. He showed you can get a lot from an
intrinsic that reifies the same range check that is built into aaload, with
no further coupling to the actual Unsafe load. It took me about five tries
to convince myself that it would really work.
— John
>
> On Fri, Aug 7, 2015 at 4:41 PM, Michael Barker <mikeb01 at gmail.com> wrote:
>
>> Hi Doug,
>>
>> Thank you for that info. I've just realised something but want to check my
>> reasoning. The bounds check is effectively doing:
>>
>> if (index >= entries.length)
>> throw ArrayOutOfBoundsException();
>>
>> Therefore even I use separate field to store the indexMask (entries.length
>> - 1) to avoid false sharing, the bounds check will effectively undo that
>> effort?
>>
>> Mike.
>>
>>
>> On 6 August 2015 at 02:12, Doug Lea <dl at cs.oswego.edu> wrote:
>>
>>> On 08/04/2015 05:25 PM, Michael Barker wrote:
>>>
>>> Thanks, I'll give that approach a try. While it retains the padding on
>> the
>>>> array entries it is lost for entries.length, so while it won't share a
>>>> cache line with the array entries it could share it with some other
>> random
>>>> unknown thing. I still have to run some more tests to see what the
>> actual
>>>> costs end up being. I suspect that I'll just suck up the cost of the
>>>> bounds checking,
>>>>
>>>
>>> That's what I've ended up doing in similar cases in java.util.concurrent.
>>> It does add variance depending on placement across runs of
>>> programs as well as within runs when GC moves things. But not
>>> much more variance than you see from other memory placement
>>> effects and JVM noise.
>>>
>>> It's worth reminding everyone that the need to do this
>>> stems from compiler limitations that in principle could
>>> be addressed someday. "Unsafe" access just means that
>>> the compiler cannot prove that an address is legal.
>>> It is frustrating that some cases that are obvious
>>> to their programmers cannot be proven -- as in an array
>>> that is only ever allocated with power-of-two size and
>>> accessed with masked indexing, but the allocation is outside
>>> the scope of compiler analysis.
>>>
>>> High-quality bounds-check optimizations generally require
>>> internal type systems and static analyses that are too
>>> expensive for JITs. (See for example the work done in the
>>> ahead-of-time X10 compiler.) But with the planned demise
>>> of Unsafe, perhaps some people interested in optimization
>>> will be more motivated to try to invent some more effective
>>> JIT-friendly approximations.
>>>
>>> -Doug
>>>
>>>
>>>
>>
More information about the valhalla-dev
mailing list