RFR: JDK-8199781: Don't use naked == for comparing oops
Erik Österlund
erik.osterlund at oracle.com
Wed Mar 28 15:32:18 UTC 2018
Hi Coleen,
On 2018-03-28 15:57, coleen.phillimore at oracle.com wrote:
>
> Erik,
>
> I'm so glad you are splitting up the access.hpp file, and really like
> the new accessDecorators.hpp file.
I am glad you like it. That is encouraging.
> access.hpp has AccessInternal in it but so does accessBackend.hpp. Can
> the AccessInternal implementation be moved from access.hpp to
> accessBackend.hpp ? Then access.hpp will be somewhat understandable
> to the casual reader and user, without having to dig into the template
> details.
Sure.
Incremental webrev:
http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
Full webrev:
http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/
> Thank you for solving the inline problem with oopDesc::equals with
> this. I haven't fully reviewed it but don't wait for me.
Thank you for having a look at this.
/Erik
> Thanks,
> Coleen
>
> On 3/28/18 9:00 AM, Erik Österlund wrote:
>> Hi,
>>
>> The access.hpp support has reincarnated.
>>
>> Here is an incremental patch on Roman's work (rebased on latest
>> jdk-hs) that makes it possible to include access.hpp to call
>> Access<>::equals specifically. The infrastructure allows picking more
>> accesses that can be used from the hpp file, but I have kept it down
>> to the minimum requirements now, so for now it only works for equals.
>> Perhaps in the future, we might want to move more operations to be
>> reachable from hpp, but not today.
>>
>> I changed the involved header files to include access.hpp, and made
>> it possible to inline this again for non-Shenandoah builds, by
>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>
>> Builds that only have to-space invariant GCs inline equals to a
>> normal native comparison of the pointers. Builds that have at least
>> one to-space invariant GC instead generate a single function pointer
>> call. This function pointer may be resolved in the access.cpp file,
>> instead of the access.inline.hpp file.
>>
>> Here is an explanation how this works:
>>
>> The template pipeline for each operation consists of 5 steps, and
>> they were previously all performed in the access.inline.hpp file. The
>> steps are:
>>
>> 1) Set default decorators and decay types
>> 2) Reduce types
>> 3) Pre-runtime dispatch
>> 4) Runtime-dispatch
>> 5.a) Barrier resolution (check which GC backend accessor to be used
>> upon the first invocation, then patch function pointer to 5.b)
>> 5.b) Post-runtime dispatch
>>
>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>> included by access.hpp. This allows performing all steps until the
>> backend barrier resolution in only hpp files. This makes it possible
>> to use, e.g. raw accesses (although without compressed oops, atomic
>> or order access) from only the access.hpp file. The resolution of
>> barriers in the GC backends (comprising steps 4-5) remains in the
>> access.inline.hpp file. But they can be instantiated from the
>> access.cpp file for well chosen accesses and decorator combinations
>> that can be pre-generated in the libjvm.so file. The function
>> pointers used in builds that have not to-space invariant GCs will be
>> resolved to point right into these functions in the access.cpp file
>> at runtime.
>>
>> This change required moving the decorators to a new file called
>> accessDecorators.hpp. I kind of like having them there in a separate
>> file.
>>
>> Incremental webrev to Roman's patch:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>
>> Thanks,
>> /Erik
>>
>> On 2018-03-27 09:14, Roman Kennke wrote:
>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>> ...
>>>>>> This is clearly one of those cases. Let's code against surprises
>>>>>> (both present and future) by abstracting simpler operations with
>>>>>> inlines.
>>>>> Thanks John for clarifying. I generally do agree with you that it
>>>>> makes
>>>>> sense to inline something like an operator that will compile to a
>>>>> single
>>>>> instruction.
>>>>>
>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>> call.
>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>> call to
>>>>> the right implementation, not even a virtual call, but it's a call
>>>>> nonetheless.
>>>> What would it take to inline the call, so there's no new performance
>>>> hazard introduced?
>>>>
>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>> control the decision statically.
>>>>
>>>> (I confess I find the templates complex and hard to reason about.
>>>> But that's a potential maintenance issue with the access API, not
>>>> your work.)
>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>> used it before to ensable barriers on primitive field accesses for
>>> Shenandoah-enabled builds. It works by passing
>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>> #ifdefs
>>> and does the same thing. I can add something similar for object
>>> equality.
>>>
>>>>> The reason why I'm proposing it is that the GC might want to have
>>>>> a say
>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>> copies of any one object floating around, and comparing the two
>>>>> copies
>>>>> might otherwise result in false negatives.
>>>> 1. Of course we need a way for some GC algorithms to control
>>>> object identity.
>>>>
>>>> 2. It's good coding discipline to find all the places where native
>>>> op== is used and upgrade them to the needed control.
>>> I hope I did that with the proposed patch (it's based on years of
>>> finding all those places in Shenandoah land ;-) ).
>>>
>>>> (2a. We also need a way to avoid having op== creep back in
>>>> in an uncontrolled way.)
>>> We have something for this in Shenandoah too. It works by overriding ==
>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>> operands are != NULL. This catches all uses of naked == at runtime,
>>> when
>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>> catch uses in places that are not executed in tests, but it's still
>>> incredibly useful. The caveat here is that we also need another
>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>> alternative would be to drop operators from oop to oopDesc* and compare
>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>
>>>> 3. We still need an unsurprising performance model for GC's
>>>> which don't require the extra identity hook.
>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>> like it much, but if this is the only way we can get Shenandoah in,
>>> then
>>> let's go for it?
>>>
>>>> So we're not there yet, I think. If there's no easy way to adjust
>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>> into oopDesc::equals.
>>>>
>>>> I suspect the happy final solution will be to templatize hot loops
>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>> of those loops. Then we will have two copies of hot loops in the
>>>> JVM, one with the out-of-line hook and one without.
>>>>
>>>> For now, I think it's enough to have separate builds of the JVM,
>>>> one with the #ifdef set "the old way" and one that allows more
>>>> flexibility.
>>>>
>>>> (Is this the first time we've run into an occasion to make a separate
>>>> Shenandaoah build? Surely not.)
>>> Primitive field accesses required similar jumping through hoops ;-)
>>>
>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>> (caller
>>>>> -> oopDesc::equals() -> impl).
>>>>>
>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>> needs,
>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>
>>>>> Still good?
>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>> find an agreeable way to handle that via a trade-off that keeps
>>>> the old code shape in VM builds which don't need the new code
>>>> shape. I'm speaking up here because I don't believe this is the
>>>> last time we'll have to consider adding something gross like an
>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>> that keep the existing hot paths efficient, while still allowing
>>>> new code shapes to become part of the mix.
>>>>
>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>> course. Seems to me that a constant non-product flag
>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>> binding of falseInShenandoah.)
>>> That would require coding 2 versions of hot loops that involve oop==oop
>>> and switch to one or the other based on that flag. I have checked all
>>> the uses of oopDesc::equals() in my patch, and none seemed
>>> particularily
>>> critical. I think this would be overkill.
>>>
>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>> this problem because we'll just generate the right code at runtime.
>>>
>>> Roman
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list