RFR: JDK-8199781: Don't use naked == for comparing oops

Wed Mar 28 19:56:00 UTC 2018

> 
> Incremental webrev:
> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
> 
> Full webrev:
> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/

Looks good to me. (Well. As good as all this template-voodoo can look
;-) Better than before though).

Cheers,
Roman

> 
>> Thank you for solving the inline problem with oopDesc::equals with
>> this.   I haven't fully reviewed it but don't wait for me.
> 
> Thank you for having a look at this.
> 
> /Erik
> 
>> Thanks,
>> Coleen
>>
>> On 3/28/18 9:00 AM, Erik Österlund wrote:
>>> Hi,
>>>
>>> The access.hpp support has reincarnated.
>>>
>>> Here is an incremental patch on Roman's work (rebased on latest
>>> jdk-hs) that makes it possible to include access.hpp to call
>>> Access<>::equals specifically. The infrastructure allows picking more
>>> accesses that can be used from the hpp file, but I have kept it down
>>> to the minimum requirements now, so for now it only works for equals.
>>> Perhaps in the future, we might want to move more operations to be
>>> reachable from hpp, but not today.
>>>
>>> I changed the involved header files to include access.hpp, and made
>>> it possible to inline this again for non-Shenandoah builds, by
>>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>>
>>> Builds that only have to-space invariant GCs inline equals to a
>>> normal native comparison of the pointers. Builds that have at least
>>> one to-space invariant GC instead generate a single function pointer
>>> call. This function pointer may be resolved in the access.cpp file,
>>> instead of the access.inline.hpp file.
>>>
>>> Here is an explanation how this works:
>>>
>>> The template pipeline for each operation consists of 5 steps, and
>>> they were previously all performed in the access.inline.hpp file. The
>>> steps are:
>>>
>>> 1) Set default decorators and decay types
>>> 2) Reduce types
>>> 3) Pre-runtime dispatch
>>> 4) Runtime-dispatch
>>> 5.a) Barrier resolution (check which GC backend accessor to be used
>>> upon the first invocation, then patch function pointer to 5.b)
>>> 5.b) Post-runtime dispatch
>>>
>>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>>> included by access.hpp. This allows performing all steps until the
>>> backend barrier resolution in only hpp files. This makes it possible
>>> to use, e.g. raw accesses (although without compressed oops, atomic
>>> or order access) from only the access.hpp file. The resolution of
>>> barriers in the GC backends (comprising steps 4-5) remains in the
>>> access.inline.hpp file. But they can be instantiated from the
>>> access.cpp file for well chosen accesses and decorator combinations
>>> that can be pre-generated in the libjvm.so file. The function
>>> pointers used in builds that have not to-space invariant GCs will be
>>> resolved to point right into these functions in the access.cpp file
>>> at runtime.
>>>
>>> This change required moving the decorators to a new file called
>>> accessDecorators.hpp. I kind of like having them there in a separate
>>> file.
>>>
>>> Incremental webrev to Roman's patch:
>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>>
>>> Full webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 2018-03-27 09:14, Roman Kennke wrote:
>>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>>> ...
>>>>>>> This is clearly one of those cases.  Let's code against surprises
>>>>>>> (both present and future) by abstracting simpler operations with
>>>>>>> inlines.
>>>>>> Thanks John for clarifying. I generally do agree with you that it
>>>>>> makes
>>>>>> sense to inline something like an operator that will compile to a
>>>>>> single
>>>>>> instruction.
>>>>>>
>>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>>> call.
>>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>>> call to
>>>>>> the right implementation, not even a virtual call, but it's a call
>>>>>> nonetheless.
>>>>> What would it take to inline the call, so there's no new performance
>>>>> hazard introduced?
>>>>>
>>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>>> control the decision statically.
>>>>>
>>>>> (I confess I find the templates complex and hard to reason about.
>>>>> But that's a potential maintenance issue with the access API, not
>>>>> your work.)
>>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>>> used it before to ensable barriers on primitive field accesses for
>>>> Shenandoah-enabled builds. It works by passing
>>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>>> #ifdefs
>>>> and does the same thing. I can add something similar for object
>>>> equality.
>>>>
>>>>>> The reason why I'm proposing it is that the GC might want to have
>>>>>> a say
>>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>>> copies of any one object floating around, and comparing the two
>>>>>> copies
>>>>>> might otherwise result in false negatives.
>>>>> 1. Of course we need a way for some GC algorithms to control
>>>>> object identity.
>>>>>
>>>>> 2. It's good coding discipline to find all the places where native
>>>>> op== is used and upgrade them to the needed control.
>>>> I hope I did that with the proposed patch (it's based on years of
>>>> finding all those places in Shenandoah land ;-) ).
>>>>
>>>>> (2a. We also need a way to avoid having op== creep back in
>>>>> in an uncontrolled way.)
>>>> We have something for this in Shenandoah too. It works by overriding ==
>>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>>> operands are != NULL. This catches all uses of naked == at runtime,
>>>> when
>>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>>> catch uses in places that are not executed in tests, but it's still
>>>> incredibly useful. The caveat here is that we also need another
>>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>>> alternative would be to drop operators from oop to oopDesc* and compare
>>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>>
>>>>> 3. We still need an unsurprising performance model for GC's
>>>>> which don't require the extra identity hook.
>>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>>> like it much, but if this is the only way we can get Shenandoah in,
>>>> then
>>>> let's go for it?
>>>>
>>>>> So we're not there yet, I think.  If there's no easy way to adjust
>>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>>> into oopDesc::equals.
>>>>>
>>>>> I suspect the happy final solution will be to templatize hot loops
>>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>>> of those loops.  Then we will have two copies of hot loops in the
>>>>> JVM, one with the out-of-line hook and one without.
>>>>>
>>>>> For now, I think it's enough to have separate builds of the JVM,
>>>>> one with the #ifdef set "the old way" and one that allows more
>>>>> flexibility.
>>>>>
>>>>> (Is this the first time we've run into an occasion to make a separate
>>>>> Shenandaoah build?  Surely not.)
>>>> Primitive field accesses required similar jumping through hoops ;-)
>>>>
>>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>>> (caller
>>>>>> -> oopDesc::equals() -> impl).
>>>>>>
>>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>>> needs,
>>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>>
>>>>>> Still good?
>>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>>> find an agreeable way to handle that via a trade-off that keeps
>>>>> the old code shape in VM builds which don't need the new code
>>>>> shape.  I'm speaking up here because I don't believe this is the
>>>>> last time we'll have to consider adding something gross like an
>>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>>> that keep the existing hot paths efficient, while still allowing
>>>>> new code shapes to become part of the mix.
>>>>>
>>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>>> course.  Seems to me that a constant non-product flag
>>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>>> binding of falseInShenandoah.)
>>>> That would require coding 2 versions of hot loops that involve oop==oop
>>>> and switch to one or the other based on that flag. I have checked all
>>>> the uses of oopDesc::equals() in my patch, and none seemed
>>>> particularily
>>>> critical. I think this would be overkill.
>>>>
>>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>>> this problem because we'll just generate the right code at runtime.
>>>>
>>>> Roman
>>>>
>>>
>>
>