RFR: JDK-8199781: Don't use naked == for comparing oops
Erik Osterlund
erik.osterlund at oracle.com
Wed Mar 28 20:11:06 UTC 2018
Hi Roman,
Thanks for the review.
/Erik
> On 28 Mar 2018, at 21:56, Roman Kennke <rkennke at redhat.com> wrote:
>
>
>>
>> Incremental webrev:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/
>
> Looks good to me. (Well. As good as all this template-voodoo can look
> ;-) Better than before though).
>
> Cheers,
> Roman
>
>
>>
>>> Thank you for solving the inline problem with oopDesc::equals with
>>> this. I haven't fully reviewed it but don't wait for me.
>>
>> Thank you for having a look at this.
>>
>> /Erik
>>
>>> Thanks,
>>> Coleen
>>>
>>>> On 3/28/18 9:00 AM, Erik Österlund wrote:
>>>> Hi,
>>>>
>>>> The access.hpp support has reincarnated.
>>>>
>>>> Here is an incremental patch on Roman's work (rebased on latest
>>>> jdk-hs) that makes it possible to include access.hpp to call
>>>> Access<>::equals specifically. The infrastructure allows picking more
>>>> accesses that can be used from the hpp file, but I have kept it down
>>>> to the minimum requirements now, so for now it only works for equals.
>>>> Perhaps in the future, we might want to move more operations to be
>>>> reachable from hpp, but not today.
>>>>
>>>> I changed the involved header files to include access.hpp, and made
>>>> it possible to inline this again for non-Shenandoah builds, by
>>>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>>>
>>>> Builds that only have to-space invariant GCs inline equals to a
>>>> normal native comparison of the pointers. Builds that have at least
>>>> one to-space invariant GC instead generate a single function pointer
>>>> call. This function pointer may be resolved in the access.cpp file,
>>>> instead of the access.inline.hpp file.
>>>>
>>>> Here is an explanation how this works:
>>>>
>>>> The template pipeline for each operation consists of 5 steps, and
>>>> they were previously all performed in the access.inline.hpp file. The
>>>> steps are:
>>>>
>>>> 1) Set default decorators and decay types
>>>> 2) Reduce types
>>>> 3) Pre-runtime dispatch
>>>> 4) Runtime-dispatch
>>>> 5.a) Barrier resolution (check which GC backend accessor to be used
>>>> upon the first invocation, then patch function pointer to 5.b)
>>>> 5.b) Post-runtime dispatch
>>>>
>>>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>>>> included by access.hpp. This allows performing all steps until the
>>>> backend barrier resolution in only hpp files. This makes it possible
>>>> to use, e.g. raw accesses (although without compressed oops, atomic
>>>> or order access) from only the access.hpp file. The resolution of
>>>> barriers in the GC backends (comprising steps 4-5) remains in the
>>>> access.inline.hpp file. But they can be instantiated from the
>>>> access.cpp file for well chosen accesses and decorator combinations
>>>> that can be pre-generated in the libjvm.so file. The function
>>>> pointers used in builds that have not to-space invariant GCs will be
>>>> resolved to point right into these functions in the access.cpp file
>>>> at runtime.
>>>>
>>>> This change required moving the decorators to a new file called
>>>> accessDecorators.hpp. I kind of like having them there in a separate
>>>> file.
>>>>
>>>> Incremental webrev to Roman's patch:
>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>>>
>>>> Full webrev:
>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>>> On 2018-03-27 09:14, Roman Kennke wrote:
>>>>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>>>> ...
>>>>>>>> This is clearly one of those cases. Let's code against surprises
>>>>>>>> (both present and future) by abstracting simpler operations with
>>>>>>>> inlines.
>>>>>>> Thanks John for clarifying. I generally do agree with you that it
>>>>>>> makes
>>>>>>> sense to inline something like an operator that will compile to a
>>>>>>> single
>>>>>>> instruction.
>>>>>>>
>>>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>>>> call.
>>>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>>>> call to
>>>>>>> the right implementation, not even a virtual call, but it's a call
>>>>>>> nonetheless.
>>>>>> What would it take to inline the call, so there's no new performance
>>>>>> hazard introduced?
>>>>>>
>>>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>>>> control the decision statically.
>>>>>>
>>>>>> (I confess I find the templates complex and hard to reason about.
>>>>>> But that's a potential maintenance issue with the access API, not
>>>>>> your work.)
>>>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>>>> used it before to ensable barriers on primitive field accesses for
>>>>> Shenandoah-enabled builds. It works by passing
>>>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>>>> #ifdefs
>>>>> and does the same thing. I can add something similar for object
>>>>> equality.
>>>>>
>>>>>>> The reason why I'm proposing it is that the GC might want to have
>>>>>>> a say
>>>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>>>> copies of any one object floating around, and comparing the two
>>>>>>> copies
>>>>>>> might otherwise result in false negatives.
>>>>>> 1. Of course we need a way for some GC algorithms to control
>>>>>> object identity.
>>>>>>
>>>>>> 2. It's good coding discipline to find all the places where native
>>>>>> op== is used and upgrade them to the needed control.
>>>>> I hope I did that with the proposed patch (it's based on years of
>>>>> finding all those places in Shenandoah land ;-) ).
>>>>>
>>>>>> (2a. We also need a way to avoid having op== creep back in
>>>>>> in an uncontrolled way.)
>>>>> We have something for this in Shenandoah too. It works by overriding ==
>>>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>>>> operands are != NULL. This catches all uses of naked == at runtime,
>>>>> when
>>>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>>>> catch uses in places that are not executed in tests, but it's still
>>>>> incredibly useful. The caveat here is that we also need another
>>>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>>>> alternative would be to drop operators from oop to oopDesc* and compare
>>>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>>>
>>>>>> 3. We still need an unsurprising performance model for GC's
>>>>>> which don't require the extra identity hook.
>>>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>>>> like it much, but if this is the only way we can get Shenandoah in,
>>>>> then
>>>>> let's go for it?
>>>>>
>>>>>> So we're not there yet, I think. If there's no easy way to adjust
>>>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>>>> into oopDesc::equals.
>>>>>>
>>>>>> I suspect the happy final solution will be to templatize hot loops
>>>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>>>> of those loops. Then we will have two copies of hot loops in the
>>>>>> JVM, one with the out-of-line hook and one without.
>>>>>>
>>>>>> For now, I think it's enough to have separate builds of the JVM,
>>>>>> one with the #ifdef set "the old way" and one that allows more
>>>>>> flexibility.
>>>>>>
>>>>>> (Is this the first time we've run into an occasion to make a separate
>>>>>> Shenandaoah build? Surely not.)
>>>>> Primitive field accesses required similar jumping through hoops ;-)
>>>>>
>>>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>>>> (caller
>>>>>>> -> oopDesc::equals() -> impl).
>>>>>>>
>>>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>>>> needs,
>>>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>>>
>>>>>>> Still good?
>>>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>>>> find an agreeable way to handle that via a trade-off that keeps
>>>>>> the old code shape in VM builds which don't need the new code
>>>>>> shape. I'm speaking up here because I don't believe this is the
>>>>>> last time we'll have to consider adding something gross like an
>>>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>>>> that keep the existing hot paths efficient, while still allowing
>>>>>> new code shapes to become part of the mix.
>>>>>>
>>>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>>>> course. Seems to me that a constant non-product flag
>>>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>>>> binding of falseInShenandoah.)
>>>>> That would require coding 2 versions of hot loops that involve oop==oop
>>>>> and switch to one or the other based on that flag. I have checked all
>>>>> the uses of oopDesc::equals() in my patch, and none seemed
>>>>> particularily
>>>>> critical. I think this would be overkill.
>>>>>
>>>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>>>> this problem because we'll just generate the right code at runtime.
>>>>>
>>>>> Roman
>>>>>
>>>>
>>>
>>
>
>
More information about the hotspot-runtime-dev
mailing list