RFR: JDK-8199781: Don't use naked == for comparing oops

Wed Mar 28 22:51:00 UTC 2018

Hi Coleen,

> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/prims/jvm.cpp.udiff.html
> 
> + if ((!oopDesc::equals(previous_protection_domain, protection_domain))
> && (protection_domain != NULL)) {
> 
> 
> So we don't need oopDesc::equals for NULL?

No. Unless we ever want to use an actual object for null, but in this
case we wouldn't use NULL I suppose ;-)

> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/oops/accessBackend.new.hpp.html
> 
> So against my better judgement, I was trying to follow the bouncing
> template instantiation ball.  And I guess we end up calling this:
> 
> RawAccessBarrier<>
> 
>  405   static bool equals(oop o1, oop o2) { return o1 == o2; }
> 
> 
> Can you show how this decomposes from:
> 
> + inline static bool equals(oop o1, oop o2) { return
> Access<>::equals(o1, o2); }

Uh-oh ... You really want to know, do you? ;-) (I leave it to Erik to
give an actual answer ..)

Thanks Coleen for reviewing!

Roman

> Thank you for getting this to inline and supporting Roman's patch.
> 
> Coleen
> 
> On 3/28/18 4:11 PM, Erik Osterlund wrote:
>> Hi Roman,
>>
>> Thanks for the review.
>>
>> /Erik
>>
>>> On 28 Mar 2018, at 21:56, Roman Kennke <rkennke at redhat.com> wrote:
>>>
>>>
>>>> Incremental webrev:
>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
>>>>
>>>> Full webrev:
>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/
>>> Looks good to me. (Well. As good as all this template-voodoo can look
>>> ;-) Better than before though).
>>>
>>> Cheers,
>>> Roman
>>>
>>>
>>>>> Thank you for solving the inline problem with oopDesc::equals with
>>>>> this.   I haven't fully reviewed it but don't wait for me.
>>>> Thank you for having a look at this.
>>>>
>>>> /Erik
>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>>> On 3/28/18 9:00 AM, Erik Österlund wrote:
>>>>>> Hi,
>>>>>>
>>>>>> The access.hpp support has reincarnated.
>>>>>>
>>>>>> Here is an incremental patch on Roman's work (rebased on latest
>>>>>> jdk-hs) that makes it possible to include access.hpp to call
>>>>>> Access<>::equals specifically. The infrastructure allows picking more
>>>>>> accesses that can be used from the hpp file, but I have kept it down
>>>>>> to the minimum requirements now, so for now it only works for equals.
>>>>>> Perhaps in the future, we might want to move more operations to be
>>>>>> reachable from hpp, but not today.
>>>>>>
>>>>>> I changed the involved header files to include access.hpp, and made
>>>>>> it possible to inline this again for non-Shenandoah builds, by
>>>>>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>>>>>
>>>>>> Builds that only have to-space invariant GCs inline equals to a
>>>>>> normal native comparison of the pointers. Builds that have at least
>>>>>> one to-space invariant GC instead generate a single function pointer
>>>>>> call. This function pointer may be resolved in the access.cpp file,
>>>>>> instead of the access.inline.hpp file.
>>>>>>
>>>>>> Here is an explanation how this works:
>>>>>>
>>>>>> The template pipeline for each operation consists of 5 steps, and
>>>>>> they were previously all performed in the access.inline.hpp file. The
>>>>>> steps are:
>>>>>>
>>>>>> 1) Set default decorators and decay types
>>>>>> 2) Reduce types
>>>>>> 3) Pre-runtime dispatch
>>>>>> 4) Runtime-dispatch
>>>>>> 5.a) Barrier resolution (check which GC backend accessor to be used
>>>>>> upon the first invocation, then patch function pointer to 5.b)
>>>>>> 5.b) Post-runtime dispatch
>>>>>>
>>>>>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>>>>>> included by access.hpp. This allows performing all steps until the
>>>>>> backend barrier resolution in only hpp files. This makes it possible
>>>>>> to use, e.g. raw accesses (although without compressed oops, atomic
>>>>>> or order access) from only the access.hpp file. The resolution of
>>>>>> barriers in the GC backends (comprising steps 4-5) remains in the
>>>>>> access.inline.hpp file. But they can be instantiated from the
>>>>>> access.cpp file for well chosen accesses and decorator combinations
>>>>>> that can be pre-generated in the libjvm.so file. The function
>>>>>> pointers used in builds that have not to-space invariant GCs will be
>>>>>> resolved to point right into these functions in the access.cpp file
>>>>>> at runtime.
>>>>>>
>>>>>> This change required moving the decorators to a new file called
>>>>>> accessDecorators.hpp. I kind of like having them there in a separate
>>>>>> file.
>>>>>>
>>>>>> Incremental webrev to Roman's patch:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>>>>>
>>>>>> Full webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>> /Erik
>>>>>>
>>>>>>> On 2018-03-27 09:14, Roman Kennke wrote:
>>>>>>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>>>>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>>>>>> ...
>>>>>>>>>> This is clearly one of those cases.  Let's code against surprises
>>>>>>>>>> (both present and future) by abstracting simpler operations with
>>>>>>>>>> inlines.
>>>>>>>>> Thanks John for clarifying. I generally do agree with you that it
>>>>>>>>> makes
>>>>>>>>> sense to inline something like an operator that will compile to a
>>>>>>>>> single
>>>>>>>>> instruction.
>>>>>>>>>
>>>>>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>>>>>> call.
>>>>>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>>>>>> call to
>>>>>>>>> the right implementation, not even a virtual call, but it's a call
>>>>>>>>> nonetheless.
>>>>>>>> What would it take to inline the call, so there's no new performance
>>>>>>>> hazard introduced?
>>>>>>>>
>>>>>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>>>>>> control the decision statically.
>>>>>>>>
>>>>>>>> (I confess I find the templates complex and hard to reason about.
>>>>>>>> But that's a potential maintenance issue with the access API, not
>>>>>>>> your work.)
>>>>>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>>>>>> used it before to ensable barriers on primitive field accesses for
>>>>>>> Shenandoah-enabled builds. It works by passing
>>>>>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>>>>>> #ifdefs
>>>>>>> and does the same thing. I can add something similar for object
>>>>>>> equality.
>>>>>>>
>>>>>>>>> The reason why I'm proposing it is that the GC might want to have
>>>>>>>>> a say
>>>>>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>>>>>> copies of any one object floating around, and comparing the two
>>>>>>>>> copies
>>>>>>>>> might otherwise result in false negatives.
>>>>>>>> 1. Of course we need a way for some GC algorithms to control
>>>>>>>> object identity.
>>>>>>>>
>>>>>>>> 2. It's good coding discipline to find all the places where native
>>>>>>>> op== is used and upgrade them to the needed control.
>>>>>>> I hope I did that with the proposed patch (it's based on years of
>>>>>>> finding all those places in Shenandoah land ;-) ).
>>>>>>>
>>>>>>>> (2a. We also need a way to avoid having op== creep back in
>>>>>>>> in an uncontrolled way.)
>>>>>>> We have something for this in Shenandoah too. It works by overriding ==
>>>>>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>>>>>> operands are != NULL. This catches all uses of naked == at runtime,
>>>>>>> when
>>>>>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>>>>>> catch uses in places that are not executed in tests, but it's still
>>>>>>> incredibly useful. The caveat here is that we also need another
>>>>>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>>>>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>>>>>> alternative would be to drop operators from oop to oopDesc* and compare
>>>>>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>>>>>
>>>>>>>> 3. We still need an unsurprising performance model for GC's
>>>>>>>> which don't require the extra identity hook.
>>>>>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>>>>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>>>>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>>>>>> like it much, but if this is the only way we can get Shenandoah in,
>>>>>>> then
>>>>>>> let's go for it?
>>>>>>>
>>>>>>>> So we're not there yet, I think.  If there's no easy way to adjust
>>>>>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>>>>>> into oopDesc::equals.
>>>>>>>>
>>>>>>>> I suspect the happy final solution will be to templatize hot loops
>>>>>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>>>>>> of those loops.  Then we will have two copies of hot loops in the
>>>>>>>> JVM, one with the out-of-line hook and one without.
>>>>>>>>
>>>>>>>> For now, I think it's enough to have separate builds of the JVM,
>>>>>>>> one with the #ifdef set "the old way" and one that allows more
>>>>>>>> flexibility.
>>>>>>>>
>>>>>>>> (Is this the first time we've run into an occasion to make a separate
>>>>>>>> Shenandaoah build?  Surely not.)
>>>>>>> Primitive field accesses required similar jumping through hoops ;-)
>>>>>>>
>>>>>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>>>>>> (caller
>>>>>>>>> -> oopDesc::equals() -> impl).
>>>>>>>>>
>>>>>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>>>>>> needs,
>>>>>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>>>>>
>>>>>>>>> Still good?
>>>>>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>>>>>> find an agreeable way to handle that via a trade-off that keeps
>>>>>>>> the old code shape in VM builds which don't need the new code
>>>>>>>> shape.  I'm speaking up here because I don't believe this is the
>>>>>>>> last time we'll have to consider adding something gross like an
>>>>>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>>>>>> that keep the existing hot paths efficient, while still allowing
>>>>>>>> new code shapes to become part of the mix.
>>>>>>>>
>>>>>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>>>>>> course.  Seems to me that a constant non-product flag
>>>>>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>>>>>> binding of falseInShenandoah.)
>>>>>>> That would require coding 2 versions of hot loops that involve oop==oop
>>>>>>> and switch to one or the other based on that flag. I have checked all
>>>>>>> the uses of oopDesc::equals() in my patch, and none seemed
>>>>>>> particularily
>>>>>>> critical. I think this would be overkill.
>>>>>>>
>>>>>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>>>>>> this problem because we'll just generate the right code at runtime.
>>>>>>>
>>>>>>> Roman
>>>>>>>
>>>
>