RFR: JDK-8199781: Don't use naked == for comparing oops

Tue Mar 27 08:09:27 UTC 2018

Hi,

This is indeed exactly what the build-time decorators were designed for. 
Specifically, the INTERNAL_BT_TO_SPACE_INVARIANT decorator fits the bill 
here. Access::equals should use the same hardwiring techniques that 
resolve did, where it inlines equals for builds that only have to-space 
invariant GCs. When Shenandoah is included in the build, there must be a 
static variation point. Then a function pointer call is performed. The 
overhead of that is small.

As for the include .inline.hpp problems, it is worth mentioning that 
previous incarnations of the Access API (before it went out for review) 
had a mechanism for generating the required backend functions that the 
function pointers (that indeed require a variation point in a given 
build) resolve to in .cpp files, and hence allow a chosen subset of 
common Access operations to be used directly from access.hpp instead. 
The main motivation then was reducing build times. I later optimized my 
templates so the build times were no longer a problem, and therefore 
removed the support for doing that as it seemed no longer necessary. But 
I did not consider using it to reduce .inline.hpp dependencies. It 
sounds like this is worth a shot. If you would like this, I can revive 
the support for doing that. Then you can include only access.hpp for 
equals, and solve the include problems.

Thanks,
/Erik

On 2018-03-27 09:14, Roman Kennke wrote:
> Am 27.03.2018 um 00:46 schrieb John Rose:
>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>> ...
>>>> This is clearly one of those cases.  Let's code against surprises
>>>> (both present and future) by abstracting simpler operations with
>>>> inlines.
>>> Thanks John for clarifying. I generally do agree with you that it makes
>>> sense to inline something like an operator that will compile to a single
>>> instruction.
>>>
>>> Unfortunately, this new abstraction will generate a non-inlineable call.
>>> Thanks Erik's work with the Access API, it should be a straight call to
>>> the right implementation, not even a virtual call, but it's a call
>>> nonetheless.
>> What would it take to inline the call, so there's no new performance
>> hazard introduced?
>>
>> This might be a place to add #ifdefs, if the templates don't let us
>> control the decision statically.
>>
>> (I confess I find the templates complex and hard to reason about.
>> But that's a potential maintenance issue with the access API, not
>> your work.)
> The Access API supports static inclusion/exclusion of stuff, and we've
> used it before to ensable barriers on primitive field accesses for
> Shenandoah-enabled builds. It works by passing
> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than #ifdefs
> and does the same thing. I can add something similar for object equality.
>
>>> The reason why I'm proposing it is that the GC might want to have a say
>>> about object equality. In the case of Shenandoah, there may be two
>>> copies of any one object floating around, and comparing the two copies
>>> might otherwise result in false negatives.
>> 1. Of course we need a way for some GC algorithms to control
>> object identity.
>>
>> 2. It's good coding discipline to find all the places where native
>> op== is used and upgrade them to the needed control.
> I hope I did that with the proposed patch (it's based on years of
> finding all those places in Shenandoah land ;-) ).
>
>> (2a. We also need a way to avoid having op== creep back in
>> in an uncontrolled way.)
> We have something for this in Shenandoah too. It works by overriding ==
> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
> operands are != NULL. This catches all uses of naked == at runtime, when
> running with +CheckUnhandledOops. It's not perfect, because it doesn't
> catch uses in places that are not executed in tests, but it's still
> incredibly useful. The caveat here is that we also need another
> abstraction for code paths that actually want naked ==, e.g. in the GC.
> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
> alternative would be to drop operators from oop to oopDesc* and compare
> that (which is exactly what oopDesc::unsafe_equals() does).
>
>> 3. We still need an unsurprising performance model for GC's
>> which don't require the extra identity hook.
> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
> to use is made at runtime. The 'best' that I can see is to rip out the
> indirection in Shenandoah-disabled builds, as proposed above. I don't
> like it much, but if this is the only way we can get Shenandoah in, then
> let's go for it?
>
>> So we're not there yet, I think.  If there's no easy way to adjust
>> the access API to get the last inlining, I suggest adding an #ifdef
>> into oopDesc::equals.
>>
>> I suspect the happy final solution will be to templatize hot loops
>> using making the inlinable adjusted op== be part of the "decorations"
>> of those loops.  Then we will have two copies of hot loops in the
>> JVM, one with the out-of-line hook and one without.
>>
>> For now, I think it's enough to have separate builds of the JVM,
>> one with the #ifdef set "the old way" and one that allows more
>> flexibility.
>>
>> (Is this the first time we've run into an occasion to make a separate
>> Shenandaoah build?  Surely not.)
> Primitive field accesses required similar jumping through hoops ;-)
>
>>> So... with inlining oopDesc::equals() we will get one straight call
>>> (from the caller of oopDesc::equals() to the impl), instead of 2 (caller
>>> -> oopDesc::equals() -> impl).
>>>
>>> It's the best I could come up with that does what (Shenandoah) GC needs,
>>> and it hasn't shown up in any performance testing that we did, yet.
>>>
>>> Still good?
>> I think it's too risky to out-of-line the "cmpq" instruction.  Let's
>> find an agreeable way to handle that via a trade-off that keeps
>> the old code shape in VM builds which don't need the new code
>> shape.  I'm speaking up here because I don't believe this is the
>> last time we'll have to consider adding something gross like an
>> #ifdef to support new features, and I want to aim at trade-offs
>> that keep the existing hot paths efficient, while still allowing
>> new code shapes to become part of the mix.
>>
>> (I'd be happy to see something cleaner than an #ifdef, of
>> course.  Seems to me that a constant non-product flag
>> -XX:+ShortCircuitOopEquals would work too, with a
>> binding of falseInShenandoah.)
> That would require coding 2 versions of hot loops that involve oop==oop
> and switch to one or the other based on that flag. I have checked all
> the uses of oopDesc::equals() in my patch, and none seemed particularily
> critical. I think this would be overkill.
>
> And notice also that none interpreter/c1/c2 compiled code will suffer
> this problem because we'll just generate the right code at runtime.
>
> Roman
>