A failed attempt to add Phi::exact_type() to C1

Tue Aug 9 06:14:37 PDT 2011

Hi all,

I tried to add an implementation of Phi::exact_type() to C1 last weekend,
but failed. I'd like to share my experience. Any comment would be
appreciated.

I reading a blog post with a microbenchmark [1]. The microbenchmark, when
run on the client VM, triggers an OSR compilation of Client1.main() by C1;
all method invocations on the local variable "list" were not inlined. At
first I thought it was weird: since there's only one definition of "list",
all of its uses should know its exact type, and thus should be able to get
inlined. I tried moving the code in main() to another method, so that it can
get a standard compilation, and found those methods were indeed inlined when
standard compiled.

So apparently the difference had to do with OSRs. The I realized it was the
extra Phi introduced by the OSR entry that lost the exact type information.
And it wasn't just in OSRs, Phi nodes in C1 HIR always loses exact type
information, becuase it doesn't override Instruction::exact_type(). C1 will
not inline "list.size()" in the code snippet below, which contains diamond
control flow, with different definitions of the same variable:

public static void test(String[] args) {
  List<?> list;
  if (args.length % 2 == 0) {
    list = new ArrayList<String>();   // a1
  } else {
    list = new ArrayList<String>(32); // a2
  }

  // a3 = Phi(a1, a2)
  int size = list.size(); // a3.invokeinterface() java/util/List.size()I
  System.out.println(size);
}

Even if the local variable "list" always holds a reference to an ArrayList
instance, the Phi node stops the exact type information to flow through it,
so the "list.size()" call site can't be inlined.

I thought I'd be able to fix the problem by adding an implementation of
Phi::exact_type(), and I made a patch, avaiable at [2].

The basic idea is simple: if all operands of a Phi node agrees on a single
exact type, use it as the exact type of this Phi.
And some assumptions:
1. Because C1's HIR is in SSA form, the only kind of nodes that can have
cycles in data dependence graph is Phi. Cycles have to be broken when
recursively traversing the operands of a Phi node. If a cycle is found, I'll
just give up  finding the exact type of this Phi.
2. Unlike C2, which prunes the part of the graph not reachable from the OSR
entry point in OSR compilations, C1 always sees the whole graph of a method,
regardless of standard or an OSR compilation. If a variable needs a Phi and
the operands don't agree on a single exact type, standard compilation would
have noticed; otherwise, if a variable doesn't need a Phi, or it needs a Phi
but the operands agree on a single exact type, it should still hold in an
OSR compilation. So, if an operand of a Phi node is a UnsafeGetRaw (which
can only be introduced in an OSR entry), skipping it should be safe.

Applying the patch did allow the affected call sites to get inlined, in the
microbenchmark in [1]. The diamond control flow example got the
"list.size()" call site inlined as well.

But, the patch had a fatal bug. The Java code example in [2] demonstrates
that bug.
C1 builds the HIR graph incrementally; inline decisions are made as a part
of the HIR building process. When C1's GraphBuilder sees an invoke*
bytecode, it'll try to devirtualize the call site by asking for the
receiver's exact_type(). But the relationships between Phi nodes may still
be incomplete by then, so the Phi::exact_type() in my patch may return
immature (thus incorrect) results.
In the example in [2], the "list.size()" call site at line 18 inlined
java.util.ArrayList.size(), which is incorrect. The HIR log shows that when
GraphBuilder tried to inline this call site, the receiver (list4 in code
comment) had only one operand (list3 in code comment), which covers the
first two definitions of "list" (list1 and list2) but missed the third one
(list5). The connection between list4 and list5 was added later, too late.

So, the patch doesn't work.

My questions:
1. Any ideas on how to implement a correct Phi::exact_type() that conforms
to the way HIR graph is built now?
2. If the inline decisions are decoupled from the HIR graph building phase,
and pushed to a later phase, would it significantly slow down/complicate C1?
If it was done later, it could have allowed a much better chance of inlining
more stuff. Besides, it might allow policy-controlled iterations of other
optimizations + inlining, so in tiered mode warm methods may get finer grain
control of optimizations, and result in better code quality (suppose it
couldn't go to tier 4, or got deopt'd and fell to tier 1).

Regards,
Kris Mok

[1]: http://icyfenix.iteye.com/blog/1110279
[2]: https://gist.github.com/1133678
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20110809/585331bc/attachment.html