RFR: JDK-8199781: Don't use naked == for comparing oops

Wed Mar 28 15:32:18 UTC 2018

Hi Coleen,

On 2018-03-28 15:57, coleen.phillimore at oracle.com wrote:
>
> Erik,
>
> I'm so glad you are splitting up the access.hpp file, and really like 
> the new accessDecorators.hpp file.

I am glad you like it. That is encouraging.

> access.hpp has AccessInternal in it but so does accessBackend.hpp. Can 
> the AccessInternal implementation be moved from access.hpp to 
> accessBackend.hpp ?   Then access.hpp will be somewhat understandable 
> to the casual reader and user, without having to dig into the template 
> details.

Sure.

Incremental webrev:
http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/

Full webrev:
http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/

> Thank you for solving the inline problem with oopDesc::equals with 
> this.   I haven't fully reviewed it but don't wait for me.

Thank you for having a look at this.

/Erik

> Thanks,
> Coleen
>
> On 3/28/18 9:00 AM, Erik Österlund wrote:
>> Hi,
>>
>> The access.hpp support has reincarnated.
>>
>> Here is an incremental patch on Roman's work (rebased on latest 
>> jdk-hs) that makes it possible to include access.hpp to call 
>> Access<>::equals specifically. The infrastructure allows picking more 
>> accesses that can be used from the hpp file, but I have kept it down 
>> to the minimum requirements now, so for now it only works for equals. 
>> Perhaps in the future, we might want to move more operations to be 
>> reachable from hpp, but not today.
>>
>> I changed the involved header files to include access.hpp, and made 
>> it possible to inline this again for non-Shenandoah builds, by 
>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>
>> Builds that only have to-space invariant GCs inline equals to a 
>> normal native comparison of the pointers. Builds that have at least 
>> one to-space invariant GC instead generate a single function pointer 
>> call. This function pointer may be resolved in the access.cpp file, 
>> instead of the access.inline.hpp file.
>>
>> Here is an explanation how this works:
>>
>> The template pipeline for each operation consists of 5 steps, and 
>> they were previously all performed in the access.inline.hpp file. The 
>> steps are:
>>
>> 1) Set default decorators and decay types
>> 2) Reduce types
>> 3) Pre-runtime dispatch
>> 4) Runtime-dispatch
>> 5.a) Barrier resolution (check which GC backend accessor to be used 
>> upon the first invocation, then patch function pointer to 5.b)
>> 5.b) Post-runtime dispatch
>>
>> I have moved steps 1-4 to the accessBackend.hpp file, which is 
>> included by access.hpp. This allows performing all steps until the 
>> backend barrier resolution in only hpp files. This makes it possible 
>> to use, e.g. raw accesses (although without compressed oops, atomic 
>> or order access) from only the access.hpp file. The resolution of 
>> barriers in the GC backends (comprising steps 4-5) remains in the 
>> access.inline.hpp file. But they can be instantiated from the 
>> access.cpp file for well chosen accesses and decorator combinations 
>> that can be pre-generated in the libjvm.so file. The function 
>> pointers used in builds that have not to-space invariant GCs will be 
>> resolved to point right into these functions in the access.cpp file 
>> at runtime.
>>
>> This change required moving the decorators to a new file called 
>> accessDecorators.hpp. I kind of like having them there in a separate 
>> file.
>>
>> Incremental webrev to Roman's patch:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>
>> Thanks,
>> /Erik
>>
>> On 2018-03-27 09:14, Roman Kennke wrote:
>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>> ...
>>>>>> This is clearly one of those cases.  Let's code against surprises
>>>>>> (both present and future) by abstracting simpler operations with
>>>>>> inlines.
>>>>> Thanks John for clarifying. I generally do agree with you that it 
>>>>> makes
>>>>> sense to inline something like an operator that will compile to a 
>>>>> single
>>>>> instruction.
>>>>>
>>>>> Unfortunately, this new abstraction will generate a non-inlineable 
>>>>> call.
>>>>> Thanks Erik's work with the Access API, it should be a straight 
>>>>> call to
>>>>> the right implementation, not even a virtual call, but it's a call
>>>>> nonetheless.
>>>> What would it take to inline the call, so there's no new performance
>>>> hazard introduced?
>>>>
>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>> control the decision statically.
>>>>
>>>> (I confess I find the templates complex and hard to reason about.
>>>> But that's a potential maintenance issue with the access API, not
>>>> your work.)
>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>> used it before to ensable barriers on primitive field accesses for
>>> Shenandoah-enabled builds. It works by passing
>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than 
>>> #ifdefs
>>> and does the same thing. I can add something similar for object 
>>> equality.
>>>
>>>>> The reason why I'm proposing it is that the GC might want to have 
>>>>> a say
>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>> copies of any one object floating around, and comparing the two 
>>>>> copies
>>>>> might otherwise result in false negatives.
>>>> 1. Of course we need a way for some GC algorithms to control
>>>> object identity.
>>>>
>>>> 2. It's good coding discipline to find all the places where native
>>>> op== is used and upgrade them to the needed control.
>>> I hope I did that with the proposed patch (it's based on years of
>>> finding all those places in Shenandoah land ;-) ).
>>>
>>>> (2a. We also need a way to avoid having op== creep back in
>>>> in an uncontrolled way.)
>>> We have something for this in Shenandoah too. It works by overriding ==
>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>> operands are != NULL. This catches all uses of naked == at runtime, 
>>> when
>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>> catch uses in places that are not executed in tests, but it's still
>>> incredibly useful. The caveat here is that we also need another
>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>> alternative would be to drop operators from oop to oopDesc* and compare
>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>
>>>> 3. We still need an unsurprising performance model for GC's
>>>> which don't require the extra identity hook.
>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>> like it much, but if this is the only way we can get Shenandoah in, 
>>> then
>>> let's go for it?
>>>
>>>> So we're not there yet, I think.  If there's no easy way to adjust
>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>> into oopDesc::equals.
>>>>
>>>> I suspect the happy final solution will be to templatize hot loops
>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>> of those loops.  Then we will have two copies of hot loops in the
>>>> JVM, one with the out-of-line hook and one without.
>>>>
>>>> For now, I think it's enough to have separate builds of the JVM,
>>>> one with the #ifdef set "the old way" and one that allows more
>>>> flexibility.
>>>>
>>>> (Is this the first time we've run into an occasion to make a separate
>>>> Shenandaoah build?  Surely not.)
>>> Primitive field accesses required similar jumping through hoops ;-)
>>>
>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2 
>>>>> (caller
>>>>> -> oopDesc::equals() -> impl).
>>>>>
>>>>> It's the best I could come up with that does what (Shenandoah) GC 
>>>>> needs,
>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>
>>>>> Still good?
>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>> find an agreeable way to handle that via a trade-off that keeps
>>>> the old code shape in VM builds which don't need the new code
>>>> shape.  I'm speaking up here because I don't believe this is the
>>>> last time we'll have to consider adding something gross like an
>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>> that keep the existing hot paths efficient, while still allowing
>>>> new code shapes to become part of the mix.
>>>>
>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>> course.  Seems to me that a constant non-product flag
>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>> binding of falseInShenandoah.)
>>> That would require coding 2 versions of hot loops that involve oop==oop
>>> and switch to one or the other based on that flag. I have checked all
>>> the uses of oopDesc::equals() in my patch, and none seemed 
>>> particularily
>>> critical. I think this would be overkill.
>>>
>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>> this problem because we'll just generate the right code at runtime.
>>>
>>> Roman
>>>
>>
>