RFR: JDK-8199781: Don't use naked == for comparing oops

Wed Mar 28 15:04:21 UTC 2018

Hi Roman,

On 2018-03-28 16:03, Roman Kennke wrote:
> Hi Erik,
>
> Thanks for this work!
>
> without having looked too deeply at the code (it might be in vain
> anyway, what with all the template voodoo stuff going on :-P), I have a
> few conceptual questions:
>
> - For the issue at hand (replacing naked == with oopDesc::equals() or
> such), wouldn't it have been enough to mark it with the
> INTERNAL_BT_TO_SPACE_INVARIANT decorator, to get inlining for
> non-Shenandoah (or more generally, non-to-space-invariant) builds. ?

I have done that as well to get inlining for non-to-space-invariant 
builds. But obviously, by moving e.g. Handles::operator == to the cpp 
file, and oopDesc::equals to the cpp file, you broke inlining anyway. To 
get the inlining back, choices were considering comparing two naked oops 
as not trivial and hence forcing all transitive dependencies to code 
that compares two naked oops to move to .inline.hpp files, or to make it 
allowed from .hpp files. This is my solution for allowing existing 
callsites that compare naked oops in .hpp files (because it was IMO 
rightfully considered trivial), to remain in .hpp files, and preserve 
all inlining. Therefore, only looking for that build-time decorator 
would not have been enough to make this legally allowed from .hpp files, 
because the very logic for folding away the non-to-space-invariant path 
was in access.inline.hpp.There must also exist a path for the 
non-to-space-invariant build from hpp files only to get to get to the GC 
backend implementation, which relies on .inline.hpp file support. That 
is why I have shuffled things around to be able to reach all the way up 
until but not including barrier resolution through the hpp files, and be 
able to finish the rest of the path in access.cpp for selected operations.

> - What is the advantage of this refactoring? It is a refactoring, right?
> or do we get new features here?

This is a refactoring that allows using Access<>::equals() from old 
header files that have been included all over the place, that have used 
raw oop comparisons in .hpp files in the past. One alternative is moving 
the caller of Access<>::equals to cpp files (and I heard objections to 
that from both Andrew Haley and John Rose, arguing that this should be a 
fast operation, and losing the inlining purely because of file 
dependency technicalities would be a shame). Another alternative is 
moving these callsites out to .inline.hpp files, and then expanding the 
transitive closure of required .inline.hpp dependency changes that 
creates. This is my attempt at making everyone happy - maximized 
inlining with or without to-space-invariant GCs in the build, without 
violating transitive .inline.hpp include dependency rules, and without 
having to move everything that ever transitively depends on comparing 
two oops to .inline.hpp files.

> - So now we can use the Access API by only including access.hpp. Fine.
> But this comes at the cost of having actual code in .hpp files? I am not
> sure about Hotspot style, but I try to avoid actual code in .hpp files
> whenever I can, except *maybe* for the most trivial of stuff.

For now, this is only intended for Access<>::equals to solve your 
problem at hand in this review: not violating transitive .inline.hpp 
dependencies, while preserving inlining all the way for 
non-to-space-invariant builds, and at the same time injecting variation 
points for Shenandoah without any GC backend .inline.hpp dependencies 
leaking into .hpp files. While this mechanism can be extended for more 
operations, there is currently no need for it.

About code style of calling "actual code" from .hpp files, I agree that 
it is a kind of gray area, where it is preferable to move non-trivial 
things to .inline.hpp files. Comparing naked oops has been considered 
trivial in the past by engineers, (despite calling oop::operator ==() 
when CHECK_UNHANDLED_OOPS is enabled), or now by calling 
oopDesc::equals, and I can see why. This patch allows existing code that 
considered oop1 == oop2 to be trivial enough to be in a header file to 
continue doing so, without violating transitive include dependency rules.

Thanks,
/Erik

>
> Thanks,
> Roman
>
>
>> Hi,
>>
>> The access.hpp support has reincarnated.
>>
>> Here is an incremental patch on Roman's work (rebased on latest jdk-hs)
>> that makes it possible to include access.hpp to call Access<>::equals
>> specifically. The infrastructure allows picking more accesses that can
>> be used from the hpp file, but I have kept it down to the minimum
>> requirements now, so for now it only works for equals. Perhaps in the
>> future, we might want to move more operations to be reachable from hpp,
>> but not today.
>>
>> I changed the involved header files to include access.hpp, and made it
>> possible to inline this again for non-Shenandoah builds, by checking for
>> the INTERNAL_BT_TO_SPACE_INVARIANT build-time decorator.
>>
>> Builds that only have to-space invariant GCs inline equals to a normal
>> native comparison of the pointers. Builds that have at least one
>> to-space invariant GC instead generate a single function pointer call.
>> This function pointer may be resolved in the access.cpp file, instead of
>> the access.inline.hpp file.
>>
>> Here is an explanation how this works:
>>
>> The template pipeline for each operation consists of 5 steps, and they
>> were previously all performed in the access.inline.hpp file. The steps are:
>>
>> 1) Set default decorators and decay types
>> 2) Reduce types
>> 3) Pre-runtime dispatch
>> 4) Runtime-dispatch
>> 5.a) Barrier resolution (check which GC backend accessor to be used upon
>> the first invocation, then patch function pointer to 5.b)
>> 5.b) Post-runtime dispatch
>>
>> I have moved steps 1-4 to the accessBackend.hpp file, which is included
>> by access.hpp. This allows performing all steps until the backend
>> barrier resolution in only hpp files. This makes it possible to use,
>> e.g. raw accesses (although without compressed oops, atomic or order
>> access) from only the access.hpp file. The resolution of barriers in the
>> GC backends (comprising steps 4-5) remains in the access.inline.hpp
>> file. But they can be instantiated from the access.cpp file for well
>> chosen accesses and decorator combinations that can be pre-generated in
>> the libjvm.so file. The function pointers used in builds that have not
>> to-space invariant GCs will be resolved to point right into these
>> functions in the access.cpp file at runtime.
>>
>> This change required moving the decorators to a new file called
>> accessDecorators.hpp. I kind of like having them there in a separate file.
>>
>> Incremental webrev to Roman's patch:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.00_01/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8199781/webrev.01/
>>
>> Thanks,
>> /Erik
>>
>> On 2018-03-27 09:14, Roman Kennke wrote:
>>> Am 27.03.2018 um 00:46 schrieb John Rose:
>>>> On Mar 26, 2018, at 2:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>> Am 26.03.2018 um 21:11 schrieb John Rose:
>>>>>> ...
>>>>>> This is clearly one of those cases.  Let's code against surprises
>>>>>> (both present and future) by abstracting simpler operations with
>>>>>> inlines.
>>>>> Thanks John for clarifying. I generally do agree with you that it makes
>>>>> sense to inline something like an operator that will compile to a
>>>>> single
>>>>> instruction.
>>>>>
>>>>> Unfortunately, this new abstraction will generate a non-inlineable
>>>>> call.
>>>>> Thanks Erik's work with the Access API, it should be a straight call to
>>>>> the right implementation, not even a virtual call, but it's a call
>>>>> nonetheless.
>>>> What would it take to inline the call, so there's no new performance
>>>> hazard introduced?
>>>>
>>>> This might be a place to add #ifdefs, if the templates don't let us
>>>> control the decision statically.
>>>>
>>>> (I confess I find the templates complex and hard to reason about.
>>>> But that's a potential maintenance issue with the access API, not
>>>> your work.)
>>> The Access API supports static inclusion/exclusion of stuff, and we've
>>> used it before to ensable barriers on primitive field accesses for
>>> Shenandoah-enabled builds. It works by passing
>>> -DSUPPORT_BARRIER_ON_PRIMITIVES to gcc. This is much better than #ifdefs
>>> and does the same thing. I can add something similar for object equality.
>>>
>>>>> The reason why I'm proposing it is that the GC might want to have a say
>>>>> about object equality. In the case of Shenandoah, there may be two
>>>>> copies of any one object floating around, and comparing the two copies
>>>>> might otherwise result in false negatives.
>>>> 1. Of course we need a way for some GC algorithms to control
>>>> object identity.
>>>>
>>>> 2. It's good coding discipline to find all the places where native
>>>> op== is used and upgrade them to the needed control.
>>> I hope I did that with the proposed patch (it's based on years of
>>> finding all those places in Shenandoah land ;-) ).
>>>
>>>> (2a. We also need a way to avoid having op== creep back in
>>>> in an uncontrolled way.)
>>> We have something for this in Shenandoah too. It works by overriding ==
>>> in oopsHierarch.hpp and running into ShouldNotReachHere() if both
>>> operands are != NULL. This catches all uses of naked == at runtime, when
>>> running with +CheckUnhandledOops. It's not perfect, because it doesn't
>>> catch uses in places that are not executed in tests, but it's still
>>> incredibly useful. The caveat here is that we also need another
>>> abstraction for code paths that actually want naked ==, e.g. in the GC.
>>> In Shenandoah we have an oopDesc::unsafe_equals() for that. The other
>>> alternative would be to drop operators from oop to oopDesc* and compare
>>> that (which is exactly what oopDesc::unsafe_equals() does).
>>>
>>>> 3. We still need an unsurprising performance model for GC's
>>>> which don't require the extra identity hook.
>>> Yeah. Unfortunately this is C++ and not Java, and the decision which GC
>>> to use is made at runtime. The 'best' that I can see is to rip out the
>>> indirection in Shenandoah-disabled builds, as proposed above. I don't
>>> like it much, but if this is the only way we can get Shenandoah in, then
>>> let's go for it?
>>>
>>>> So we're not there yet, I think.  If there's no easy way to adjust
>>>> the access API to get the last inlining, I suggest adding an #ifdef
>>>> into oopDesc::equals.
>>>>
>>>> I suspect the happy final solution will be to templatize hot loops
>>>> using making the inlinable adjusted op== be part of the "decorations"
>>>> of those loops.  Then we will have two copies of hot loops in the
>>>> JVM, one with the out-of-line hook and one without.
>>>>
>>>> For now, I think it's enough to have separate builds of the JVM,
>>>> one with the #ifdef set "the old way" and one that allows more
>>>> flexibility.
>>>>
>>>> (Is this the first time we've run into an occasion to make a separate
>>>> Shenandaoah build?  Surely not.)
>>> Primitive field accesses required similar jumping through hoops ;-)
>>>
>>>>> So... with inlining oopDesc::equals() we will get one straight call
>>>>> (from the caller of oopDesc::equals() to the impl), instead of 2
>>>>> (caller
>>>>> -> oopDesc::equals() -> impl).
>>>>>
>>>>> It's the best I could come up with that does what (Shenandoah) GC
>>>>> needs,
>>>>> and it hasn't shown up in any performance testing that we did, yet.
>>>>>
>>>>> Still good?
>>>> I think it's too risky to out-of-line the "cmpq" instruction.  Let's
>>>> find an agreeable way to handle that via a trade-off that keeps
>>>> the old code shape in VM builds which don't need the new code
>>>> shape.  I'm speaking up here because I don't believe this is the
>>>> last time we'll have to consider adding something gross like an
>>>> #ifdef to support new features, and I want to aim at trade-offs
>>>> that keep the existing hot paths efficient, while still allowing
>>>> new code shapes to become part of the mix.
>>>>
>>>> (I'd be happy to see something cleaner than an #ifdef, of
>>>> course.  Seems to me that a constant non-product flag
>>>> -XX:+ShortCircuitOopEquals would work too, with a
>>>> binding of falseInShenandoah.)
>>> That would require coding 2 versions of hot loops that involve oop==oop
>>> and switch to one or the other based on that flag. I have checked all
>>> the uses of oopDesc::equals() in my patch, and none seemed particularily
>>> critical. I think this would be overkill.
>>>
>>> And notice also that none interpreter/c1/c2 compiled code will suffer
>>> this problem because we'll just generate the right code at runtime.
>>>
>>> Roman
>>>
>