RFR: JDK-8199781: Don't use naked == for comparing oops

Thu Mar 29 13:12:11 UTC 2018

Hi Coleen,

On 2018-03-29 14:27, coleen.phillimore at oracle.com wrote:
>
>
> On 3/29/18 5:46 AM, Erik Österlund wrote:
>> Hi Coleen,
>>
>> On 2018-03-28 22:41, coleen.phillimore at oracle.com wrote:
>>>
>>> I agree, this looks like template voodoo.   It seems like you might 
>>> be able to use more X macros to reduce some boilerplate code.  But 
>>> I'm not requesting you to try this for this change.
>>>
>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/oops/accessBackend.new.hpp.html
>>>
>>> Can you put a } // namespace AccessInternal
>>> at these places?   Long namespace declarations are not as bad as 
>>> #ifdef blocks but it's nice to see what the stray } goes to.
>>>
>>> Here:
>>>   175 }
>>> 1372 }
>>> I don't need to see a webrev for this minor change.
>>
>> Sure.
>>
>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/prims/jvm.cpp.udiff.html
>>>
>>> + if ((!oopDesc::equals(previous_protection_domain, 
>>> protection_domain)) && (protection_domain != NULL)) {
>>>
>>> So we don't need oopDesc::equals for NULL?
>>
>> As Roman pointed out: no. A large amount of code would be toast if 
>> NULL was not NULL. So I don't think we should even try to support 
>> that unless a new GC algorithm has a very compelling case for 
>> treating NULL as a logical NULL.
>
> Great.
>>
>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/src/hotspot/share/oops/accessBackend.new.hpp.html
>>>
>>> So against my better judgement, I was trying to follow the bouncing 
>>> template instantiation ball.  And I guess we end up calling this:
>>>
>>> RawAccessBarrier<>
>>>   405   static bool equals(oop o1, oop o2) { return o1 == o2; }
>>
>> Yup.
>>
>>> Can you show how this decomposes from:
>>>
>>> + inline static bool equals(oop o1, oop o2) { return 
>>> Access<>::equals(o1, o2); }
>>
>> Sure.
>>
>> First off, Access<>::equals in access.hpp performs some decorator 
>> verification (in this case, make sure no decorators were passed in), 
>
> access.hpp line 271
>     access.hpp line 294
>> then calls AccessInternal::equals in accessBackend.hpp.
>
> accessBackend.hpp line 1292
>    DecoratorFixup line 1293
>
>> Then, decorators are inflated with DecoratorFixup, which applies 
>> decorator rules (that are defined in one central place). Among these 
>> rules is to add build-time decorators. Builds with Shenandoah will 
>> not have the INTERNAL_BT_TO_SPACE_INVARIANT decorator set (configured 
>> in barrierSetConfig.hpp), and builds without it will have it set.
>> The inflated decorators are passed through a call to 
>> PreRuntimeDispatch::equals.
>
> accessBackend.hpp line 967-980
>     We have INTERNAL_BT_TO_SPACE_INVARIANT (what's BT backtrace? 
> barrier type,right?)  Don't change it. GC doesn't need longer names.

BT is for build-time.

>
>> In PreRuntimeDispatch, there are two overloads for equals, but only 
>> one is enabled in each build, by using EnableIf, and hence SFINAE 
>> (Substitution Failure Is Not An Error).
>>
>> The overload enabled when Shenandoah is not built forwards the call 
>> straight to RawAccessBarrier::equals, where the pointers are simply 
>> compared, and the journey ends there.
>
> accessBackend.hpp line 405
>
> My mental journey has ended as well.  And there's more ...
>>
>> The overload enabled when Shenandoah is built continues the journey 
>> to RuntimeDispatch::equals.
>> RuntimeDispatch::equals uses partial template specialization to 
>> define the behaviour for equals using RuntimeDispatch<decorators, T, 
>> BARRIER_EQUALS>::equals, where the call is forwarded to the 
>> _equals_func function pointer, which originally points at 
>> RuntimeDispatch<decorators, T, BARRIER_EQUALS>::equals_init.
>
> I see.   LInes 605 to 652 could be an X macro in a follow up RFE.

If you think breaking IDE navigation support and adding another layer of 
metaprogramming is what this code needs to be easier to understand, then 
sure. I'm not sure though if that would necessarily make it easier to 
understand.

>>
>> The RuntimeDispatch<decorators, T, BARRIER_EQUALS>::equals_init 
>> function performs the barrier resolution. But it is only declared in 
>> accessBackend.hpp. The definition resides in access.inline.hpp. So 
>> the accessBackend.hpp journey ends here.
>>
>> In access.inline.hpp, the journey continues.
>
> Do we need to include access.inline.hpp in all the files that have 
> oopDesc::equals calls so that they are inlined for Shenandoah?
> We is royal we, ie the shenandoah people.

No, it will behave the same; both will achieve optimal code possible for 
that build with just the .hpp file. But when having Shenandoah in the 
build, equals will call through a function pointer (materialized in 
access.cpp) to perform the comparison, as it can't know whether you have 
selected Shenandoah or not statically.

>>
>> The RuntimeDispatch<decorators, T, BARRIER_EQUALS>::equals_init 
>> definition here will be executed upon the first execution of the 
>> accessor. It firsts retrieves a function pointer from 
>> BarrierResolver<decorators, func_t, BARRIER_EQUALS>, where func_t is 
>> the type of the function pointer. It then patches the _equals_func 
>> function pointer to to this resolved function pointer, so that the 
>> next invocation will jump straight in there. The resolved function 
>> pointer forwards the call into the GC backend, through 
>> PostRuntimeDispatch. When the function pointer has been patched, the 
>> function is called.
>>
>> So how do we materialize these function pointers? The journey 
>> continues into BarrierResolver<decorators, func_t, BARRIER_EQUALS>. 
>> This mechanism is responsible for materializing the different 
>> possible run-time contexts for which the resolved GC backend barriers 
>> may run. Run-time context today involves whether UseCompressedOops is 
>> set or not, and which one is the currently chosen BarrierSet.
>>
>> resolve_barrier() returns the final function pointer. It starts by 
>> expanding the possible runtime-flags (compressed oops or not 
>> compressed oops) in resolve_barrier_rt(). This feeds into the next 
>> variation point in resolve_barrier_gc(), which expands the different 
>> possible BarrierSet selections there might be. A macro in 
>> barrierSetConfig.hpp does "something" for each BarrierSet::Name for 
>> each concrete BarrierSet there is. This macro is used to instantiate 
>> a PostRuntimeDispatch::access_barrier function for each BarrierSet, 
>> by passing the BarrierSet type as a parameter to PostRuntimeDispatch. 
>> The BarrierSet type is retrieved from the enum value using the 
>> BarrierSet<GetType> which is specialized for each concrete BarrierSet.
>>
>> Once we continue the journey to PostRuntimeDispatch, which gets the 
>> concrete BarrierSet::AccessBarrier as a parameter, it is time to call 
>> into the GC backends of Access.
>> In a Shenandoah-enabled build, this could land in any 
>> *BarrierSet::AccessBarrier::equals function, depending on which GC is 
>> selected. If the concrete BarrierSet::equals being expanded is not 
>> Shenandoah, this will end up in the top AccessBarrier in the 
>> hierarchy - at BarrierSet::AccessBarrier::equals, which calls 
>> Raw::equals, where the journey ends in a pointer comparison.
>>
>> When Shenandoah barriers are expanded, they will do their Brooks 
>> pointer thing there, and the journey ends there.
>>
>> Now, since the .hpp ride ended at the RuntimeDispatch::equals_init 
>> declaration, the access.inline.hpp file must be included to get its 
>> definition that materializes the different possible runtime paths. 
>> The macro in access.cpp expands the RuntimeDispatch pipeline required 
>> for equals specifically. That makes it okay to call the function 
>> pointer through access.hpp and let the equals_init function it 
>> originally points to be linked in at link time to the access.cpp 
>> template instantiation of it, that rolls out and materializes the 
>> different possible paths.
>>
>> I hope this explanation makes it easier to understand what happens.
>>
>
> Yikes.  I agree with Andrew and Andrew that this will be hard to 
> maintain.  It was hard to review.  With the exception of this code, I 
> think we should be careful with adding overly-templatized code, and 
> only so if there's a clear performance reason to do so.

Okay.

> Over time, I think we should try to find simplifications with this, at 
> least for reading it.   This accessBackend.hpp is a very long file 
> with a lot of template boilerplate.  Even splitting it further might 
> be helpful.   Not suggesting it now, just something to think about.

I am open to suggestions.

/Erik

> Thanks to Andrew Dinn for the entertaining reading for the morning.
>
> thanks,
> Coleen
>
>
>>> Thank you for getting this to inline and supporting Roman's patch.
>>
>> No problem!
>>
>> Thanks for reviewing.
>>
>> /Erik
>>
>>>
>>> Coleen
>>>
>>> On 3/28/18 4:11 PM, Erik Osterlund wrote:
>>>> Hi Roman,
>>>>
>>>> Thanks for the review.
>>>>
>>>> /Erik
>>>>
>>>>> On 28 Mar 2018, at 21:56, Roman Kennke<rkennke at redhat.com>  wrote:
>>>>>
>>>>>
>>>>>> Incremental webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01_02/
>>>>>>
>>>>>> Full webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.02/
>>>>> Looks good to me. (Well. As good as all this template-voodoo can look
>>>>> ;-) Better than before though).
>>>>>
>>>>> Cheers,
>>>>> Roman
>>>>>
>>>>>
>>>>>>> Thank you for solving the inline problem with oopDesc::equals with
>>>>>>> this.   I haven't fully reviewed it but don't wait for me.
>>>>>> Thank you for having a look at this.
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>> On 3/28/18 9:00 AM, Erik Österlund wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The access.hpp support has reincarnated.
>>>>>>>>
>>>>>>>> Here is an incremental patch on Roman's work (rebased on latest
>>>>>>>> jdk-hs) that makes it possible to include access.hpp to call
>>>>>>>> Access<>::equals specifically. The infrastructure allows picking more
>>>>>>>> accesses that can be used from the hpp file, but I have kept it down
>>>>>>>> to the minimum requirements now, so for now it only works for equals.
>>>>>>>> Perhaps in the future, we might want to move more operations to be
>>>>>>>> reachable from hpp, but not today.
>>>>>>>>
>>>>>>>> I changed the involved header files to include access.hpp, and made
>>>>>>>> it possible to inline this again for non-Shenandoah builds, by
>>>>>>>> checking for the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>>>>>>>
>>>>>>>> Builds that only have to-space invariant GCs inline equals to a
>>>>>>>> normal native comparison of the pointers. Builds that have at least
>>>>>>>> one to-space invariant GC instead generate a single function pointer
>>>>>>>> call. This function pointer may be resolved in the access.cpp file,
>>>>>>>> instead of the access.inline.hpp file.
>>>>>>>>
>>>>>>>> Here is an explanation how this works:
>>>>>>>>
>>>>>>>> The template pipeline for each operation consists of 5 steps, and
>>>>>>>> they were previously all performed in the access.inline.hpp file. The
>>>>>>>> steps are:
>>>>>>>>
>>>>>>>> 1) Set default decorators and decay types
>>>>>>>> 2) Reduce types
>>>>>>>> 3) Pre-runtime dispatch
>>>>>>>> 4) Runtime-dispatch
>>>>>>>> 5.a) Barrier resolution (check which GC backend accessor to be used
>>>>>>>> upon the first invocation, then patch function pointer to 5.b)
>>>>>>>> 5.b) Post-runtime dispatch
>>>>>>>>
>>>>>>>> I have moved steps 1-4 to the accessBackend.hpp file, which is
>>>>>>>> included by access.hpp. This allows performing all steps until the
>>>>>>>> backend barrier resolution in only hpp files. This makes it possible
>>>>>>>> to use, e.g. raw accesses (although without compressed oops, atomic
>>>>>>>> or order access) from only the access.hpp file. The resolution of
>>>>>>>> barriers in the GC backends (comprising steps 4-5) remains in the
>>>>>>>> access.inline.hpp file. But they can be instantiated from the
>>>>>>>> access.cpp file for well chosen accesses and decorator combinations
>>>>>>>> that can be pre-generated in the libjvm.so file. The function
>>>>>>>> pointers used in builds that have not to-space invariant GCs will be
>>>>>>>> resolved to point right into these functions in the access.cpp file
>>>>>>>> at runtime.
>>>>>>>>
>>>>>>>> This change required moving the decorators to a new file called
>>>>>>>> accessDecorators.hpp. I kind of like having them there in a separate
>>>>>>>> file.
>>>>>>>>
>>>>>>>> Incremental webrev to Roman's patch:
>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>>>>>>>
>>>>>>>> Full webrev:
>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Erik
>>>>>>>>
>>>>>>>>> On 2018-03-27 09:14, Roman Kennke wrote:
>>>>>>>>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>>>>>>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke<rkennke at redhat.com>  wrote:
>>>>>>>>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>>>>>>>> ...
>>>>>>>>>>>> This is clearly one of those cases.  Let's code against surprises
>>>>>>>>>>>> (both present and future) by abstracting simpler operations with
>>>>>>>>>>>> inlines.
>>>>>>>>>>> Thanks John for clarifying. I generally do agree with you that it
>>>>>>>>>>> makes
>>>>>>>>>>> sense to inline something like an operator that will compile to a
>>>>>>>>>>> single
>>>>>>>>>>> instruction.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>>>>>>>> call.
>>>>>>>>>>> Thanks Erik's work with the Access API, it should be a straight
>>>>>>>>>>> call to
>>>>>>>>>>> the right implementation, not even a virtual call, but it's a call
>>>>>>>>>>> nonetheless.
>>>>>>>>>> What would it take to inline the call, so there's no new performance
>>>>>>>>>> hazard introduced?
>>>>>>>>>>
>>>>>>>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>>>>>>>> control the decision statically.
>>>>>>>>>>
>>>>>>>>>> (I confess I find the templates complex and hard to reason about.
>>>>>>>>>> But that's a potential maintenance issue with the access API, not
>>>>>>>>>> your work.)
>>>>>>>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>>>>>>>> used it before to ensable barriers on primitive field accesses for
>>>>>>>>> Shenandoah-enabled builds. It works by passing
>>>>>>>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than
>>>>>>>>> #ifdefs
>>>>>>>>> and does the same thing. I can add something similar for object
>>>>>>>>> equality.
>>>>>>>>>
>>>>>>>>>>> The reason why I'm proposing it is that the GC might want to have
>>>>>>>>>>> a say
>>>>>>>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>>>>>>>> copies of any one object floating around, and comparing the two
>>>>>>>>>>> copies
>>>>>>>>>>> might otherwise result in false negatives.
>>>>>>>>>> 1. Of course we need a way for some GC algorithms to control
>>>>>>>>>> object identity.
>>>>>>>>>>
>>>>>>>>>> 2. It's good coding discipline to find all the places where native
>>>>>>>>>> op== is used and upgrade them to the needed control.
>>>>>>>>> I hope I did that with the proposed patch (it's based on years of
>>>>>>>>> finding all those places in Shenandoah land ;-) ).
>>>>>>>>>
>>>>>>>>>> (2a. We also need a way to avoid having op== creep back in
>>>>>>>>>> in an uncontrolled way.)
>>>>>>>>> We have something for this in Shenandoah too. It works by overriding ==
>>>>>>>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>>>>>>>> operands are != NULL. This catches all uses of naked == at runtime,
>>>>>>>>> when
>>>>>>>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>>>>>>>> catch uses in places that are not executed in tests, but it's still
>>>>>>>>> incredibly useful. The caveat here is that we also need another
>>>>>>>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>>>>>>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>>>>>>>> alternative would be to drop operators from oop to oopDesc* and compare
>>>>>>>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>>>>>>>
>>>>>>>>>> 3. We still need an unsurprising performance model for GC's
>>>>>>>>>> which don't require the extra identity hook.
>>>>>>>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>>>>>>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>>>>>>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>>>>>>>> like it much, but if this is the only way we can get Shenandoah in,
>>>>>>>>> then
>>>>>>>>> let's go for it?
>>>>>>>>>
>>>>>>>>>> So we're not there yet, I think.  If there's no easy way to adjust
>>>>>>>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>>>>>>>> into oopDesc::equals.
>>>>>>>>>>
>>>>>>>>>> I suspect the happy final solution will be to templatize hot loops
>>>>>>>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>>>>>>>> of those loops.  Then we will have two copies of hot loops in the
>>>>>>>>>> JVM, one with the out-of-line hook and one without.
>>>>>>>>>>
>>>>>>>>>> For now, I think it's enough to have separate builds of the JVM,
>>>>>>>>>> one with the #ifdef set "the old way" and one that allows more
>>>>>>>>>> flexibility.
>>>>>>>>>>
>>>>>>>>>> (Is this the first time we've run into an occasion to make a separate
>>>>>>>>>> Shenandaoah build?  Surely not.)
>>>>>>>>> Primitive field accesses required similar jumping through hoops ;-)
>>>>>>>>>
>>>>>>>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>>>>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>>>>>>>> (caller
>>>>>>>>>>> -> oopDesc::equals() -> impl).
>>>>>>>>>>>
>>>>>>>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>>>>>>>> needs,
>>>>>>>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>>>>>>>
>>>>>>>>>>> Still good?
>>>>>>>>>> I think it's too risky to out-of-line the "cmpq" instruction. Let's
>>>>>>>>>> find an agreeable way to handle that via a trade-off that keeps
>>>>>>>>>> the old code shape in VM builds which don't need the new code
>>>>>>>>>> shape.  I'm speaking up here because I don't believe this is the
>>>>>>>>>> last time we'll have to consider adding something gross like an
>>>>>>>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>>>>>>>> that keep the existing hot paths efficient, while still allowing
>>>>>>>>>> new code shapes to become part of the mix.
>>>>>>>>>>
>>>>>>>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>>>>>>>> course.  Seems to me that a constant non-product flag
>>>>>>>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>>>>>>>> binding of falseInShenandoah.)
>>>>>>>>> That would require coding 2 versions of hot loops that involve oop==oop
>>>>>>>>> and switch to one or the other based on that flag. I have checked all
>>>>>>>>> the uses of oopDesc::equals() in my patch, and none seemed
>>>>>>>>> particularily
>>>>>>>>> critical. I think this would be overkill.
>>>>>>>>>
>>>>>>>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>>>>>>>> this problem because we'll just generate the right code at runtime.
>>>>>>>>>
>>>>>>>>> Roman
>>>>>>>>>
>>>
>>
>