RFR(S): 8239072: subtype check macro node causes node budget to be exhausted

Roland Westrelin rwestrel at redhat.com
Wed Mar 11 10:23:23 UTC 2020


Hi Vladimir,

> The problem doesn't look specific to subtype checks, though new 
> SubTypeCheck node exacerbates it.
>
> The fix you propose adds an adhoc pass specifically for subtype checks 
> to force expansion for simple cases.
>
> Can we try to improve the situation for all macro nodes instead?
>
> Assuming worst case scenario (200 + 50%) for every macro node does look 
> too conservative.
>
> IMO the root cause of the problem is not in macro expansion but 
> preceding optimizations/inlining which don't take into account macro 
> nodes and just treat them as ordinary ideal nodes when deciding whether 
> to expand the graph or not and by how much.
>
> But, as a stop-the-gap solution, can we relax the problematic check and 
> perform it repeatedly during macro node expansion instead (possibly, 
> interleaved with some optimization passes akin to what we do during 
> incremental inlining [1]) in an attempt to squeeze expanded nodes into 
> the graph? The chance to hit the hard node limit won't completely go 
> away, but its likelihood should be much lower.
>
> What do you think?

My first attempt at a fix for this was to have all macro nodes report an
estimate of their size once expanded. The first issue with this is that
there's a maintenance cost associated with such a solution: estimates
must be kept accurate enough, which I did by comparing the estimate
reported by the node with the actual number of nodes added at expansion
time, with an assert.
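
As a rough, self-contained sketch of that shape (not the actual
prototype; all the names here are made up):

  // Sketch only -- not the actual prototype. Every name here is
  // hypothetical.
  #include <cassert>

  struct MacroNodeEstimate {
    // Upper bound on how many nodes this macro node is expected to add
    // to the graph when it is expanded.
    virtual unsigned expansion_estimate() const = 0;
    virtual ~MacroNodeEstimate() {}
  };

  // Called from the expansion loop: compare the promised estimate with
  // the number of nodes actually created, so a stale estimate fails
  // fast in debug builds.
  inline void check_estimate(const MacroNodeEstimate& n,
                             unsigned live_before, unsigned live_after) {
    assert(live_after - live_before <= n.expansion_estimate() &&
           "macro node expansion estimate is too small");
  }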

For subtype checks, my intent was to have an estimate that would take
the types of inputs into account so it would return a smaller number of
nodes if the expansion was expected to be a simple comparison.

Because a subtype check node can be shared between several ifs and is
then expanded by doing one expansion for each if, the estimate must take
into account the number of uses.
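
As a sketch of the shape of such an estimate (made-up constants and a
made-up flag, not the actual code):

  // Sketch only: a per-use estimate for a subtype check. The constants
  // and the statically_decided flag are illustrative.
  struct SubTypeCheckEstimate {
    bool statically_decided; // input types fold the check to a simple compare
    unsigned uses;           // the node can be shared by several If nodes

    unsigned expansion_estimate() const {
      // A check that folds to a simple comparison expands to a handful
      // of nodes; the full check (secondary super cache lookup, etc.)
      // to far more.
      const unsigned simple_case = 4;   // made up
      const unsigned full_check  = 200; // worst case discussed in this thread
      unsigned per_use = statically_decided ? simple_case : full_check;
      // The node is expanded once per use, so the estimate scales with
      // the number of uses.
      return per_use * uses;
    }
  };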

Note also that the way expansion is performed is not efficient at
minimizing the number of created nodes because it delays IGVN until
expansion is over. So for this code, for instance:

  Node *p1 = gvn.transform(new AddPNode(superklass, superklass, gvn.MakeConX(in_bytes(Klass::super_check_offset_offset()))));
  Node* m = C->immutable_memory();
  Node *chk_off = gvn.transform(new LoadINode(NULL, m, p1, gvn.type(p1)->is_ptr(), TypeInt::INT, MemNode::unordered));
  int cacheoff_con = in_bytes(Klass::secondary_super_cache_offset());
  bool might_be_cache = (gvn.find_int_con(chk_off, cacheoff_con) == cacheoff_con);

at expansion time might_be_cache is always true (granted, this condition
could be changed so it doesn't require a gvn to transform nodes). Also,
if a subtype check node has 2 uses, the logic is duplicated for both
uses and only after expansion are some of those nodes commoned. All this
to say that the way expansion works leads to a big increase in live
nodes until IGVN runs. So even with better estimates we would still need
to be cautious and somewhat pessimistic.
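
(For reference, the pessimistic pre-expansion check being discussed is,
from memory, roughly of this shape in
PhaseMacroExpand::expand_macro_nodes(); the exact constant and message
may differ:

  // Reserve the worst case for every remaining macro node: about 200
  // nodes per expansion, plus 50% slack for later optimization.
  if (C->check_node_count(C->macro_count() * 300,
                          "out of nodes before macro expansion")) {
    return true; // bail out of the compilation
  }

so every macro node is charged the full worst case up front, whatever
its actual expansion turns out to be.)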

I thought about feeding macro node estimates back into the live node
computations during inlining. Given how small changes to inlining derail
performance of the nashorn benchmarks, I wasn't too eager to go down
that path, to be fair.

At that point, I felt this was all becoming too complicated. That's when
I came up with the current patch, which is simple, robust, grows the
live node count incrementally and has no drawback that I can think of.
So I think it's the right fix for this particular issue.
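
As a sketch of that shape only (not the actual patch; both helpers
below are made up):

  // Sketch only -- NOT the actual patch. Before the pessimistic budget
  // check, expand the subtype checks that fold down to a simple
  // compare, running IGVN between rounds so the live node count only
  // grows a little at a time.
  void expand_simple_subtype_checks(Compile* C, PhaseIterGVN& igvn) {
    bool progress = true;
    while (progress) {
      progress = false;
      for (int i = C->macro_count() - 1; i >= 0; i--) {
        Node* n = C->macro_node(i);
        if (n->Opcode() == Op_SubTypeCheck &&
            expands_to_simple_compare(n)) { // hypothetical predicate
          expand_simple_check(n, igvn);     // hypothetical helper
          progress = true;
        }
      }
      // Clean up right away so duplicated nodes get commoned and dead
      // nodes get removed before the next round.
      igvn.optimize();
    }
  }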

I agree it only solves that particular issue where a lot of subtype
checks optimize down to a simple pattern and that there's a bigger issue
likely made worse by the addition of more macro nodes. But it's also
true that when c2 operates close to the live node limit, all bets are
off: the entire compilation process is no longer robust and you'd better
not be unlucky.

I don't feel comfortable adjusting heuristics if there's no way to
evaluate the result. And with that patch in, AFAICT, we don't have test
cases that trigger the problem.

Roland.


