RFR: 8194312: Support parallel and concurrent JNI global handle processing

Thu Jan 11 00:23:03 UTC 2018

Hi Kim,

On 2018-01-10 21:57, Kim Barrett wrote:
>> On Jan 10, 2018, at 8:50 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>
>> Hi Kim,
>>
>> I have already seen earlier incarnations of this code so I only have a few comments regarding things that were not in that incarnation.
> Thanks for looking at it yet again.
>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/OopStorage.java:
>>
>> The Java implementation of OopStorage for the SA agent seems a bit empty and full of TODO comments to actually implement something. It seems like quite a bit of the other changes to the SA assumes that this actually does something. For example there are changes to use OopStorage oopsDo instead of JNIHandleBlock oopsDo, which I presume means there has to be some oopsDo implementation in place for OopStorage.
>> Is the intention to implement that later on in a separate RFE?
> Quoting from the original RFR email:
>
>    - Serviceability Agent updates for OopStorage and JNI usage.  Just
>    enough has been done to allow existing SA tests to "pass".  This
>    will be addressed as a followup.
>
> I haven't filed that RFE yet, but will be doing so.

Okay, I see. Thank you for the explanation. I missed that.

>> src/hotspot/share/gc/shared/oopStorage.hpp:
>>
>> The OopStorage::oops_do functions accept some kind of OopClosure and performs iteration with it. I'm curious though why it takes the specific type of OopClosure as a template parameter, passes it around, and eventually discards the information without making use of it? I was thinking perhaps the intention was to use the closure type information to devirtualize the do_oop() call, but that does not appear to be the case. To do that you would have to in OopStorage::OopFn::operator()() perform a qualified _cl->Closure::do_oop() call instead of _cl->do_oop() (and possibly take some care to prevent abstract oop closure types like OopClosure or ExtendedOopClosure from being devirtualized, which would be awkward (doable with some EnableIf<!IsSame<Closure, OopClosure>::value>::type overload, or something like that).
>> In other words, it seems like either this template parameter is unnecessary and you could just replace uses of it with OopClosure, or it should be made useful, e.g. by devirtualizing the calls or something. Same goes for the IsAliveClosure parameter for weak_oops_do.
> Sorry, but I'm not understanding what problem is being suggested.
> OopStorage doesn't depend on OopClosure as a type, at all, anywhere.
> There are no occurrences of the type OopClosure anywhere in the
> OopStorage implementation.  The types coming in to OopStorage
> iteration are preserved and used, without any need for odd syntax.
>
> With the type preserved like that, for common use-cases such as
>
>    MyClosureType my_closure;
>    storage->oops_do(&my_closure);
>
> the compiler can know the exact type of my_closure at the call site
> for do_oop(), in which case the compiler can devirtualize the call
> (and I would expect any decent compiler to do so; I've seen gcc do it).
>
> Maybe the issue is about JNIHandles::oops_do and friends, which
> presently are ordinary functions with OopClosure* arguments and the
> like?  Yes, the type information isn't flowing through there from the
> caller to the OopStorage infrastructure.  That's because that was
> going to introduce some additional fannout in this changeset.  I was
> going to move the inline definitions to oopStorage.inline.hpp (per
> Coleen's pre-review request), and all uses of iteration would then
> need to be changed to include that, which would have required a new
> jniHandles.inline.hpp, which would have required at least some
> additional downstream include changes.  I'm planning to make those
> changes as a followup.  Of course, I then forgot Coleen's pre-review
> request.

Your example may very well be devirtualized. But your assumption that 
passing around the type information is what allows that devirtualization 
is incorrect. In fact it does not make it any easier at all.
If you have a class A and a class B derived from A (and no other classes 
in your program), then seeing an arbitrary value of type B* at a call 
site does not make it safe to devirtualize calls to such objects in the 
general case. You need more than the type information here. There is no 
way for the compiler to prove that there is not another class C deriving 
from B in some other compilation unit that will be linked in at some 
later stage (link-time or even run-time), and hence unless otherwise 
proven through something other than the declared type, the B* might 
(from the compiler point of view) hypothetically point at a C* instance.
What the compiler may do though which is probably what you have 
observed, is to perform points-to analysis that proves that all possible 
values of the object at the call site have the exact type of B, by 
following the object as it gets passed around back to all possible 
allocation sites from the call site, using data-flow analysis, proving 
that all those allocation sites were of type B. And that points-to 
analysis is not being helped in any way by passing around the exact type 
in declarations. In fact, you could pass the type inaccurately as A* and 
the compiler will still see that all possible values of the object at 
the callsite are exactly B and devirtualize the call anyway. When 
points-to analysis can prove the type of the object, the passed around 
type information no longer matters. And when points-to analysis can not 
prove the type of the object, the passed around type information still 
does not matter.

Here is a small sample program you can compile to assembly to 
demonstrate my point with clang/g++ -O3 -S main.cpp:

#include <stdio.h>

class A {
public:
   virtual void foo() {
     printf("A::foo");
   }
};

class B: public A {
public:
   virtual void foo() {
     printf("B::foo");
   }
};

extern B& get_b();

int main(int argc, char* argv[]) {
   B& b_ref = get_b();
   B b;
   A& a_ref = b;
   b.foo();        // generates non-virtual call; points-to analysis 
(trivially) determines the derived type of all possible values of b is B
   a_ref.foo();    // generates non-virtual call; points-to analysis 
determines the derived type of all possible values of a_ref is B 
(despite declared type being A - it does not matter)
   b_ref.foo();    // generates virtual call; points-to analysis can not 
determine the derived type of all possible values of b_ref is B. The 
declared type B does not matter or help in any way, as another 
compilation unit could have derived B. Only with link-time optimization 
is it safe to remove the virtual call
   b_ref.B::foo(); // generates non-virtual call; qualifier forces B 
interpretation and devirtualizes the call; no need to worry about having 
no clue about the dynamic type

   return 0;
}

I disassembled with both clang and gcc to verify that my comments 
reflect reality.

What I am trying to say is that the compiler will not be able to 
devirtualize any more or less calls to the OopClosure by passing around 
its exact type as a Closure type parameter, unless you utilize that type 
information to devirtualize it with a qualified _cl->Closure::do_oop(), 
which is the proper way to devirtualize calls (without relying on the 
success of points-to analysis finding all allocation sites, which still 
is invariant of the passed around Closure type information). Therefore, 
it seems strange to me to pass around the Closure type as a template 
parameter, but then never make any use of that type information.

So my suggestion remains the same: either make use of the type 
information by explicitly devirtualizing the calls to do_oop, or remove 
that template type and replace it with OopClosure instead, as that is 
equally as accurate for virtual calls that are not explicitly 
devirtualized, and makes it a bit easier to read.

>
>> src/hotspot/share/gc/shared/oopStorage.cpp:
>>
>> I recall in the pre-review we had some discussion about whether the infrastructure for supporting early iteration termination was necessary or not. You had me convinced that JNI would need it for something. Is this still the case, or has that use case been revised? Reading through the new JNI code, I could not find uses of OopStorage::iterate_safepoint() from the JNI code, and hence no uses of early termination. Having said that, it does look rather simple now with only the sequential case supporting early termination.
>> However, if there is indeed no use any longer for OopStorage::iterate() and hence the only publicly available tool for early iteration exit, I wonder if perhaps it should be reconsidered if we want to remove that infrastructure. Or will there be other use cases down the road? Currently I find it at least a bit confusing that, e.g. OopStorage::iterate_safepoint() expects a function of type F that returns bool while OopStorage::BasicParState::iterate() expects a function of  type F that returns void. For the OopStorage::iterate_safepoint() case it is clearly described in the documentation, in the OopStorage::BasicParState::iterate() case I did not find it clear that this F is a different F to the safepoint one.
>> I guess I propose either removing early termination support alltogether if nobody is using it anyway (and hence having the F function type consistently return void), or documenting clearly that the sequential and parallel iteration functions expect different function types - one with bool return type and one with void return type, depending on whether early termination is supported or not.
> The SimpleRootsClosure used by
> VM_HeapWalkOperation::collect_simple_roots is a use-case for early
> termination.

I see. Sounds like it is still worthwhile to keep early termination then.

> iteration_safepoint documents a requirement on the result of the
> argument function (the result must be implicitly convertible to
> bool), and the associated behavior.  ParState::iterate doesn't specify
> any requirements on the result of the argument function, nor specify
> any associated behavior, so has none.  I should make that explicit
> though, e.g. document that any result returned by f is ignored.

Yes, documenting that explicitly would be fantastic.

> Since oops_do and weak_oops_do are expected to be the usual iteration
> entry points, and the early termination support doesn't affect them at
> all, I would like to leave it in place, to support use-cases like that
> one from JVMTI.  As we discussed during pre-review, ParState doesn't
> support early termination, since ParState is for exclusive use by GC,
> and we don't have or envision use-cases for early termination by GC.

I am okay with leaving it in place.

Thanks,
/Erik

>> Apart from the above minor remarks, I am delighted to see these changes. I am extra delighted to see deleted_handle() disappear. I am also delighted to see the new simplified synchronization for parallel iteration - that worked out nicely. Thank you for building a good oop storage solution.
> Thanks.
>