RFR: JDK-8199781: Don't use naked == for comparing oops

Thu Mar 29 11:51:11 UTC 2018


On 3/28/18 6:51 PM, Roman Kennke wrote:
> Hi Coleen,
>
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/prims/jvm.cpp.udiff.html
>>
>> + if ((!oopDesc::equals(previous_protection_domain, protection_domain))
>> && (protection_domain != NULL)) {
>>
>>
>> So we don't need oopDesc::equals for NULL?
> No. Unless we ever want to use an actual object for null, but in this
> case we wouldn't use NULL I suppose ;-)
>
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/oops/accessBackend.new.hpp.html
>>
>> So against my better judgement, I was trying to follow the bouncing
>> template instantiation ball.  And I guess we end up calling this:
>>
>> RawAccessBarrier<>
>>
>>   405   static bool equals(oop o1, oop o2) { return o1 == o2; }
>>
>>
>> Can you show how this decomposes from:
>>
>> + inline static bool equals(oop o1, oop o2) { return
>> Access<>::equals(o1, o2); }
> Uh-oh ... You really want to know, do you? ;-) (I leave it to Erik to
> give an actual answer ..)

I have to learn to be careful what I wish for.
Coleen
>
> Thanks Coleen for reviewing!
>
> Roman
>
>
>> Thank you for getting this to inline and supporting Roman's patch.
>>
>> Coleen
>>
>> On 3/28/18 4:11 PM, Erik Osterlund wrote:
>>> Hi Roman,
>>>
>>> Thanks for the review.
>>>
>>> /Erik
>>>
>>>> On 28 Mar 2018, at 21:56, Roman Kennke <rkennke at redhat.com> wrote:
>>>>
>>>>
>>>>> Incremental webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
>>>>>
>>>>> Full webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/
>>>> Looks good to me. (Well. As good as all this template-voodoo can look
>>>> ;-) Better than before though).
>>>>
>>>> Cheers,
>>>> Roman
>>>>
>>>>
>>>>>> Thank you for solving the inline problem with oopDesc::equals with
>>>>>> this.   I haven't fully reviewed it but don't wait for me.
>>>>> Thank you for having a look at this.
>>>>>
>>>>> /Erik
>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>> On 3/28/18 9:00 AM, Erik Österlund wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> The access.hpp support has reincarnated.
>>>>>>>
>>>>>>> Here is an incremental patch on Roman's work (rebased on latest
>>>>>>> jdk-hs) that makes it possible to include access.hpp to call
>>>>>>> Access<>::equals specifically. The infrastructure allows picking more
>>>>>>> accesses that can be used from the hpp file, but I have kept it down
>>>>>>> to the minimum requirements now, so for now it only works for equals.
>>>>>>> Perhaps in the future, we might want to move more operations to be
>>>>>>> reachable from hpp, but not today.
>>>>>>>
>>>>>>> I changed the involved header files to include access.hpp, and made
>>>>>>> it possible to inline this again for non-Shenandoah builds, by
>>>>>>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>>>>>>
>>>>>>> Builds that only have to-space invariant GCs inline equals to a
>>>>>>> normal native comparison of the pointers. Builds that have at least
>>>>>>> one to-space invariant GC instead generate a single function pointer
>>>>>>> call. This function pointer may be resolved in the access.cpp file,
>>>>>>> instead of the access.inline.hpp file.
>>>>>>>
>>>>>>> Here is an explanation how this works:
>>>>>>>
>>>>>>> The template pipeline for each operation consists of 5 steps, and
>>>>>>> they were previously all performed in the access.inline.hpp file. The
>>>>>>> steps are:
>>>>>>>
>>>>>>> 1) Set default decorators and decay types
>>>>>>> 2) Reduce types
>>>>>>> 3) Pre-runtime dispatch
>>>>>>> 4) Runtime-dispatch
>>>>>>> 5.a) Barrier resolution (check which GC backend accessor to be used
>>>>>>> upon the first invocation, then patch function pointer to 5.b)
>>>>>>> 5.b) Post-runtime dispatch
>>>>>>>
>>>>>>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>>>>>>> included by access.hpp. This allows performing all steps until the
>>>>>>> backend barrier resolution in only hpp files. This makes it possible
>>>>>>> to use, e.g. raw accesses (although without compressed oops, atomic
>>>>>>> or order access) from only the access.hpp file. The resolution of
>>>>>>> barriers in the GC backends (comprising steps 4-5) remains in the
>>>>>>> access.inline.hpp file. But they can be instantiated from the
>>>>>>> access.cpp file for well chosen accesses and decorator combinations
>>>>>>> that can be pre-generated in the libjvm.so file. The function
>>>>>>> pointers used in builds that have not to-space invariant GCs will be
>>>>>>> resolved to point right into these functions in the access.cpp file
>>>>>>> at runtime.
>>>>>>>
>>>>>>> This change required moving the decorators to a new file called
>>>>>>> accessDecorators.hpp. I kind of like having them there in a separate
>>>>>>> file.
>>>>>>>
>>>>>>> Incremental webrev to Roman's patch:
>>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>>>>>>
>>>>>>> Full webrev:
>>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>>>
>>>>>>>> On 2018-03-27 09:14, Roman Kennke wrote:
>>>>>>>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>>>>>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>>>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>>>>>>> ...
>>>>>>>>>>> This is clearly one of those cases.  Let's code against surprises
>>>>>>>>>>> (both present and future) by abstracting simpler operations with
>>>>>>>>>>> inlines.
>>>>>>>>>> Thanks John for clarifying. I generally do agree with you that it
>>>>>>>>>> makes
>>>>>>>>>> sense to inline something like an operator that will compile to a
>>>>>>>>>> single
>>>>>>>>>> instruction.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>>>>>>> call.
>>>>>>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>>>>>>> call to
>>>>>>>>>> the right implementation, not even a virtual call, but it's a call
>>>>>>>>>> nonetheless.
>>>>>>>>> What would it take to inline the call, so there's no new performance
>>>>>>>>> hazard introduced?
>>>>>>>>>
>>>>>>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>>>>>>> control the decision statically.
>>>>>>>>>
>>>>>>>>> (I confess I find the templates complex and hard to reason about.
>>>>>>>>> But that's a potential maintenance issue with the access API, not
>>>>>>>>> your work.)
>>>>>>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>>>>>>> used it before to ensable barriers on primitive field accesses for
>>>>>>>> Shenandoah-enabled builds. It works by passing
>>>>>>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>>>>>>> #ifdefs
>>>>>>>> and does the same thing. I can add something similar for object
>>>>>>>> equality.
>>>>>>>>
>>>>>>>>>> The reason why I'm proposing it is that the GC might want to have
>>>>>>>>>> a say
>>>>>>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>>>>>>> copies of any one object floating around, and comparing the two
>>>>>>>>>> copies
>>>>>>>>>> might otherwise result in false negatives.
>>>>>>>>> 1. Of course we need a way for some GC algorithms to control
>>>>>>>>> object identity.
>>>>>>>>>
>>>>>>>>> 2. It's good coding discipline to find all the places where native
>>>>>>>>> op== is used and upgrade them to the needed control.
>>>>>>>> I hope I did that with the proposed patch (it's based on years of
>>>>>>>> finding all those places in Shenandoah land ;-) ).
>>>>>>>>
>>>>>>>>> (2a. We also need a way to avoid having op== creep back in
>>>>>>>>> in an uncontrolled way.)
>>>>>>>> We have something for this in Shenandoah too. It works by overriding ==
>>>>>>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>>>>>>> operands are != NULL. This catches all uses of naked == at runtime,
>>>>>>>> when
>>>>>>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>>>>>>> catch uses in places that are not executed in tests, but it's still
>>>>>>>> incredibly useful. The caveat here is that we also need another
>>>>>>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>>>>>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>>>>>>> alternative would be to drop operators from oop to oopDesc* and compare
>>>>>>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>>>>>>
>>>>>>>>> 3. We still need an unsurprising performance model for GC's
>>>>>>>>> which don't require the extra identity hook.
>>>>>>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>>>>>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>>>>>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>>>>>>> like it much, but if this is the only way we can get Shenandoah in,
>>>>>>>> then
>>>>>>>> let's go for it?
>>>>>>>>
>>>>>>>>> So we're not there yet, I think.  If there's no easy way to adjust
>>>>>>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>>>>>>> into oopDesc::equals.
>>>>>>>>>
>>>>>>>>> I suspect the happy final solution will be to templatize hot loops
>>>>>>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>>>>>>> of those loops.  Then we will have two copies of hot loops in the
>>>>>>>>> JVM, one with the out-of-line hook and one without.
>>>>>>>>>
>>>>>>>>> For now, I think it's enough to have separate builds of the JVM,
>>>>>>>>> one with the #ifdef set "the old way" and one that allows more
>>>>>>>>> flexibility.
>>>>>>>>>
>>>>>>>>> (Is this the first time we've run into an occasion to make a separate
>>>>>>>>> Shenandaoah build?  Surely not.)
>>>>>>>> Primitive field accesses required similar jumping through hoops ;-)
>>>>>>>>
>>>>>>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>>>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>>>>>>> (caller
>>>>>>>>>> -> oopDesc::equals() -> impl).
>>>>>>>>>>
>>>>>>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>>>>>>> needs,
>>>>>>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>>>>>>
>>>>>>>>>> Still good?
>>>>>>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>>>>>>> find an agreeable way to handle that via a trade-off that keeps
>>>>>>>>> the old code shape in VM builds which don't need the new code
>>>>>>>>> shape.  I'm speaking up here because I don't believe this is the
>>>>>>>>> last time we'll have to consider adding something gross like an
>>>>>>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>>>>>>> that keep the existing hot paths efficient, while still allowing
>>>>>>>>> new code shapes to become part of the mix.
>>>>>>>>>
>>>>>>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>>>>>>> course.  Seems to me that a constant non-product flag
>>>>>>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>>>>>>> binding of falseInShenandoah.)
>>>>>>>> That would require coding 2 versions of hot loops that involve oop==oop
>>>>>>>> and switch to one or the other based on that flag. I have checked all
>>>>>>>> the uses of oopDesc::equals() in my patch, and none seemed
>>>>>>>> particularily
>>>>>>>> critical. I think this would be overkill.
>>>>>>>>
>>>>>>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>>>>>>> this problem because we'll just generate the right code at runtime.
>>>>>>>>
>>>>>>>> Roman
>>>>>>>>
>