RFR: JDK-8199781: Don't use naked == for comparing oops

Thu Mar 29 09:14:16 UTC 2018

Am 29.03.2018 um 09:57 schrieb Erik Österlund:
> Hi Roman,
> 
> On 2018-03-29 09:04, Roman Kennke wrote:
>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/oops/accessBackend.new.hpp.html
>>>
>>>
>>> So against my better judgement, I was trying to follow the bouncing
>>> template instantiation ball.  And I guess we end up calling this:
>>>
>>> RawAccessBarrier<>
>>>
>>>   405   static bool equals(oop o1, oop o2) { return o1 == o2; }
>>>
>>>
>>> Can you show how this decomposes from:
>>>
>>> + inline static bool equals(oop o1, oop o2) { return
>>> Access<>::equals(o1, o2); }
>> It would also be nice to verify that this actually inlines the cmpq (or
>> function pointer call). Just because we think the compiler can do it,
>> doesn't mean it actually does it. It might just as well give up.
> 
> I verified a handful of callsites by disassembling a few .o files. It
> inlines as expected.
> While the compiler always has the choice to not inline things you tell
> it to, for absolutely no reason, only a compiler that tries hard to ruin
> your day would do that, when the cost of the comparison is lower than
> the cost of the callsite to wherever the not inlined code is.

Very good! Thank you.

I wouldn't necessarily trust the compiler to do the expected thing. It
wouldn't be the first time that I see a compiler not inline something
because it's blown some complexity limit.

Do you want to push the final patch (pending reviews)? Or should I do it?

Also, it would be nice to get ok from John and Andrew. :-)

Thanks, Roman

> 
> /Erik
> 
>> Roman
>>
>>
>>> Thank you for getting this to inline and supporting Roman's patch.
>>>
>>> Coleen
>>>
>>> On 3/28/18 4:11 PM, Erik Osterlund wrote:
>>>> Hi Roman,
>>>>
>>>> Thanks for the review.
>>>>
>>>> /Erik
>>>>
>>>>> On 28 Mar 2018, at 21:56, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>
>>>>>
>>>>>> Incremental webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
>>>>>>
>>>>>> Full webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/
>>>>> Looks good to me. (Well. As good as all this template-voodoo can look
>>>>> ;-) Better than before though).
>>>>>
>>>>> Cheers,
>>>>> Roman
>>>>>
>>>>>
>>>>>>> Thank you for solving the inline problem with oopDesc::equals with
>>>>>>> this.   I haven't fully reviewed it but don't wait for me.
>>>>>> Thank you for having a look at this.
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>> On 3/28/18 9:00 AM, Erik Österlund wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The access.hpp support has reincarnated.
>>>>>>>>
>>>>>>>> Here is an incremental patch on Roman's work (rebased on latest
>>>>>>>> jdk-hs) that makes it possible to include access.hpp to call
>>>>>>>> Access<>::equals specifically. The infrastructure allows picking
>>>>>>>> more
>>>>>>>> accesses that can be used from the hpp file, but I have kept it
>>>>>>>> down
>>>>>>>> to the minimum requirements now, so for now it only works for
>>>>>>>> equals.
>>>>>>>> Perhaps in the future, we might want to move more operations to be
>>>>>>>> reachable from hpp, but not today.
>>>>>>>>
>>>>>>>> I changed the involved header files to include access.hpp, and made
>>>>>>>> it possible to inline this again for non-Shenandoah builds, by
>>>>>>>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time
>>>>>>>> decorator.
>>>>>>>>
>>>>>>>> Builds that only have to-space invariant GCs inline equals to a
>>>>>>>> normal native comparison of the pointers. Builds that have at least
>>>>>>>> one to-space invariant GC instead generate a single function
>>>>>>>> pointer
>>>>>>>> call. This function pointer may be resolved in the access.cpp file,
>>>>>>>> instead of the access.inline.hpp file.
>>>>>>>>
>>>>>>>> Here is an explanation how this works:
>>>>>>>>
>>>>>>>> The template pipeline for each operation consists of 5 steps, and
>>>>>>>> they were previously all performed in the access.inline.hpp
>>>>>>>> file. The
>>>>>>>> steps are:
>>>>>>>>
>>>>>>>> 1) Set default decorators and decay types
>>>>>>>> 2) Reduce types
>>>>>>>> 3) Pre-runtime dispatch
>>>>>>>> 4) Runtime-dispatch
>>>>>>>> 5.a) Barrier resolution (check which GC backend accessor to be used
>>>>>>>> upon the first invocation, then patch function pointer to 5.b)
>>>>>>>> 5.b) Post-runtime dispatch
>>>>>>>>
>>>>>>>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>>>>>>>> included by access.hpp. This allows performing all steps until the
>>>>>>>> backend barrier resolution in only hpp files. This makes it
>>>>>>>> possible
>>>>>>>> to use, e.g. raw accesses (although without compressed oops, atomic
>>>>>>>> or order access) from only the access.hpp file. The resolution of
>>>>>>>> barriers in the GC backends (comprising steps 4-5) remains in the
>>>>>>>> access.inline.hpp file. But they can be instantiated from the
>>>>>>>> access.cpp file for well chosen accesses and decorator combinations
>>>>>>>> that can be pre-generated in the libjvm.so file. The function
>>>>>>>> pointers used in builds that have not to-space invariant GCs
>>>>>>>> will be
>>>>>>>> resolved to point right into these functions in the access.cpp file
>>>>>>>> at runtime.
>>>>>>>>
>>>>>>>> This change required moving the decorators to a new file called
>>>>>>>> accessDecorators.hpp. I kind of like having them there in a
>>>>>>>> separate
>>>>>>>> file.
>>>>>>>>
>>>>>>>> Incremental webrev to Roman's patch:
>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>>>>>>>
>>>>>>>> Full webrev:
>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Erik
>>>>>>>>
>>>>>>>>> On 2018-03-27 09:14, Roman Kennke wrote:
>>>>>>>>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>>>>>>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke
>>>>>>>>>>> <rkennke at redhat.com> wrote:
>>>>>>>>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>>>>>>>> ...
>>>>>>>>>>>> This is clearly one of those cases.  Let's code against
>>>>>>>>>>>> surprises
>>>>>>>>>>>> (both present and future) by abstracting simpler operations
>>>>>>>>>>>> with
>>>>>>>>>>>> inlines.
>>>>>>>>>>> Thanks John for clarifying. I generally do agree with you
>>>>>>>>>>> that it
>>>>>>>>>>> makes
>>>>>>>>>>> sense to inline something like an operator that will compile
>>>>>>>>>>> to a
>>>>>>>>>>> single
>>>>>>>>>>> instruction.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, this new abstraction will generate a
>>>>>>>>>>> non-inlineable
>>>>>>>>>>> call.
>>>>>>>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>>>>>>>> call to
>>>>>>>>>>> the right implementation, not even a virtual call, but it's a
>>>>>>>>>>> call
>>>>>>>>>>> nonetheless.
>>>>>>>>>> What would it take to inline the call, so there's no new
>>>>>>>>>> performance
>>>>>>>>>> hazard introduced?
>>>>>>>>>>
>>>>>>>>>> This might be a place to add #ifdefs, if the templates don't
>>>>>>>>>> let us
>>>>>>>>>> control the decision statically.
>>>>>>>>>>
>>>>>>>>>> (I confess I find the templates complex and hard to reason about.
>>>>>>>>>> But that's a potential maintenance issue with the access API, not
>>>>>>>>>> your work.)
>>>>>>>>> The Access API supports static inclusion/exclusion of stuff,
>>>>>>>>> and we've
>>>>>>>>> used it before to ensable barriers on primitive field accesses for
>>>>>>>>> Shenandoah-enabled builds. It works by passing
>>>>>>>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>>>>>>>> #ifdefs
>>>>>>>>> and does the same thing. I can add something similar for object
>>>>>>>>> equality.
>>>>>>>>>
>>>>>>>>>>> The reason why I'm proposing it is that the GC might want to
>>>>>>>>>>> have
>>>>>>>>>>> a say
>>>>>>>>>>> about object equality. In the case of Shenandoah, there may
>>>>>>>>>>> be two
>>>>>>>>>>> copies of any one object floating around, and comparing the two
>>>>>>>>>>> copies
>>>>>>>>>>> might otherwise result in false negatives.
>>>>>>>>>> 1. Of course we need a way for some GC algorithms to control
>>>>>>>>>> object identity.
>>>>>>>>>>
>>>>>>>>>> 2. It's good coding discipline to find all the places where
>>>>>>>>>> native
>>>>>>>>>> op== is used and upgrade them to the needed control.
>>>>>>>>> I hope I did that with the proposed patch (it's based on years of
>>>>>>>>> finding all those places in Shenandoah land ;-) ).
>>>>>>>>>
>>>>>>>>>> (2a. We also need a way to avoid having op== creep back in
>>>>>>>>>> in an uncontrolled way.)
>>>>>>>>> We have something for this in Shenandoah too. It works by
>>>>>>>>> overriding ==
>>>>>>>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>>>>>>>> operands are != NULL. This catches all uses of naked == at
>>>>>>>>> runtime,
>>>>>>>>> when
>>>>>>>>> running with +CheckUnhandledOops. It's not perfect, because it
>>>>>>>>> doesn't
>>>>>>>>> catch uses in places that are not executed in tests, but it's
>>>>>>>>> still
>>>>>>>>> incredibly useful. The caveat here is that we also need another
>>>>>>>>> abstraction for code paths that actually want naked ==, e.g. in
>>>>>>>>> the GC.
>>>>>>>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The
>>>>>>>>> other
>>>>>>>>> alternative would be to drop operators from oop to oopDesc* and
>>>>>>>>> compare
>>>>>>>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>>>>>>>
>>>>>>>>>> 3. We still need an unsurprising performance model for GC's
>>>>>>>>>> which don't require the extra identity hook.
>>>>>>>>> Yeah. Unfortunately this is C++ and not Java, and the decision
>>>>>>>>> which GC
>>>>>>>>> to use is made at runtime. The 'best' that I can see is to rip
>>>>>>>>> out the
>>>>>>>>> indirection in Shenandoah-disabled builds, as proposed above. I
>>>>>>>>> don't
>>>>>>>>> like it much, but if this is the only way we can get Shenandoah
>>>>>>>>> in,
>>>>>>>>> then
>>>>>>>>> let's go for it?
>>>>>>>>>
>>>>>>>>>> So we're not there yet, I think.  If there's no easy way to
>>>>>>>>>> adjust
>>>>>>>>>> the access API to get the last inlining, I suggest adding an
>>>>>>>>>> #ifdef
>>>>>>>>>> into oopDesc::equals.
>>>>>>>>>>
>>>>>>>>>> I suspect the happy final solution will be to templatize hot
>>>>>>>>>> loops
>>>>>>>>>> using making the inlinable adjusted op== be part of the
>>>>>>>>>> "decorations"
>>>>>>>>>> of those loops.  Then we will have two copies of hot loops in the
>>>>>>>>>> JVM, one with the out-of-line hook and one without.
>>>>>>>>>>
>>>>>>>>>> For now, I think it's enough to have separate builds of the JVM,
>>>>>>>>>> one with the #ifdef set "the old way" and one that allows more
>>>>>>>>>> flexibility.
>>>>>>>>>>
>>>>>>>>>> (Is this the first time we've run into an occasion to make a
>>>>>>>>>> separate
>>>>>>>>>> Shenandaoah build?  Surely not.)
>>>>>>>>> Primitive field accesses required similar jumping through hoops
>>>>>>>>> ;-)
>>>>>>>>>
>>>>>>>>>>> So... with inlining oopDesc::equals() we will get one
>>>>>>>>>>> straight call
>>>>>>>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>>>>>>>> (caller
>>>>>>>>>>> -> oopDesc::equals() -> impl).
>>>>>>>>>>>
>>>>>>>>>>> It's the best I could come up with that does what
>>>>>>>>>>> (Shenandoah) GC
>>>>>>>>>>> needs,
>>>>>>>>>>> and it hasn't shown up in any performance testing that we
>>>>>>>>>>> did, yet.
>>>>>>>>>>>
>>>>>>>>>>> Still good?
>>>>>>>>>> I think it's too risky to out-of-line the "cmpq" instruction.
>>>>>>>>>> Let's
>>>>>>>>>> find an agreeable way to handle that via a trade-off that keeps
>>>>>>>>>> the old code shape in VM builds which don't need the new code
>>>>>>>>>> shape.  I'm speaking up here because I don't believe this is the
>>>>>>>>>> last time we'll have to consider adding something gross like an
>>>>>>>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>>>>>>>> that keep the existing hot paths efficient, while still allowing
>>>>>>>>>> new code shapes to become part of the mix.
>>>>>>>>>>
>>>>>>>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>>>>>>>> course.  Seems to me that a constant non-product flag
>>>>>>>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>>>>>>>> binding of falseInShenandoah.)
>>>>>>>>> That would require coding 2 versions of hot loops that involve
>>>>>>>>> oop==oop
>>>>>>>>> and switch to one or the other based on that flag. I have
>>>>>>>>> checked all
>>>>>>>>> the uses of oopDesc::equals() in my patch, and none seemed
>>>>>>>>> particularily
>>>>>>>>> critical. I think this would be overkill.
>>>>>>>>>
>>>>>>>>> And notice also that none interpreter/c1/c2 compiled code will
>>>>>>>>> suffer
>>>>>>>>> this problem because we'll just generate the right code at
>>>>>>>>> runtime.
>>>>>>>>>
>>>>>>>>> Roman
>>>>>>>>>
>>
>