Inlining difference when using G1GC instead of ParallelGC
Charlie Gracie
Charlie.Gracie at microsoft.com
Tue Jul 28 20:39:58 UTC 2020
Hi,
I have noticed an inlining difference in C2 when the JVM is using G1GC as compared to
ParallelGC. It is causing a measurable difference in performance since other optimizations
cannot take place if the method is not inlined. This is a code snippet from my small
example [1] that demonstrates the difference:
public class TypeCheck {
    public static void main(String[] args) {
        ...
        Handler handler1 = new Handler(new InnerImpl1());
        handler1.doIt();
        ...
        Handler handler2 = new Handler(new InnerImpl2());
        handler2.doIt();
        ...
        Handler handler3 = new Handler(new InnerImpl3());
        handler3.doIt();
    }
}

public class Handler {
    Inner inner;

    public Handler(Inner i) {
        inner = i;
    }

    public int doIt() {
        return inner.getValue();
    }
}

abstract class Inner {
    public abstract int getValue();
}
Handler.doIt() is invoked with Handler.inner having more than 2 different types, so
TypeSpeculation is not used. When the JVM is using ParallelGC, C2 determines a concrete
type because it can see the value stored to the `inner` field in the constructor instead of
reading it back from the field. I believe this happens because of
MemNode::can_see_stored_value(). With this optimization the concrete subclass type
is known and the method is inlined. When using G1GC, the GC write barrier contains
an Op_MemBarVolatile. I believe the volatile memory barrier generated for the field write
in the constructor prevents C2 from seeing the stored value past the write barrier. The
read of `inner` in doIt() therefore has to happen, and its result only has the type Inner,
so getValue() cannot be inlined.
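For anyone who wants to try this locally, here is a minimal self-contained sketch of the
setup described above. The class name TypeCheckSketch, the InnerImpl1..3 bodies, and the
loop that makes the code hot are assumptions made for illustration; the actual code is
in [1].

```java
// Sketch only: TypeCheckSketch, the InnerImpl bodies and the loop are assumptions,
// not the code from [1].
class TypeCheckSketch {
    // Calls doIt() on three Handlers, each wrapping a different Inner subclass,
    // so the receiver profile at inner.getValue() sees more than two types.
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            Handler handler1 = new Handler(new InnerImpl1());
            sum += handler1.doIt();
            Handler handler2 = new Handler(new InnerImpl2());
            sum += handler2.doIt();
            Handler handler3 = new Handler(new InnerImpl3());
            sum += handler3.doIt();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Enough iterations for run() to be compiled by C2 (assumed, not measured).
        System.out.println(run(5_000_000));
    }
}

abstract class Inner {
    public abstract int getValue();
}

// Hypothetical implementations; the original post does not show them.
class InnerImpl1 extends Inner {
    @Override public int getValue() { return 1; }
}

class InnerImpl2 extends Inner {
    @Override public int getValue() { return 2; }
}

class InnerImpl3 extends Inner {
    @Override public int getValue() { return 3; }
}

class Handler {
    Inner inner; // non-final, as in the snippet above

    public Handler(Inner i) {
        inner = i;
    }

    public int doIt() {
        return inner.getValue();
    }
}
```

After compiling with javac, the inlining decisions can be compared by running it once
with -XX:+UseParallelGC and once with -XX:+UseG1GC, adding
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining in both cases. Whether this reduced
version shows the same difference has not been verified and may depend on the JDK build.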
Is this a deficiency that should be investigated further to attempt a "fix"? I would like to
work on a solution, but I am looking for feedback on whether this is something the
community feels can and should be fixed.
Cheers,
Charlie Gracie
Extra information:
If the code is modified such that the allocations are on separate lines, then C2 can inline
the getValue() method with both G1GC and ParallelGC. This is because the constructor
call will then directly follow the allocation of the Handler object. When this happens,
the GC barrier can be elided, so the original value being stored can be used instead of
having to do the read. I have a 2nd example [2] which can be used to demonstrate this.
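A sketch of what that modification might look like, assuming "separate lines" means
the Inner object is allocated into a local before the Handler is created. All names
here are illustrative and the types are nested only to keep the sketch in one file;
the real code is TypeCheck2.java [2].

```java
// Sketch only: names and bodies are assumptions, not the code from [2].
class TypeCheck2Sketch {
    static abstract class Inner {
        public abstract int getValue();
    }

    // Hypothetical implementations.
    static class InnerImpl1 extends Inner {
        @Override public int getValue() { return 1; }
    }

    static class InnerImpl2 extends Inner {
        @Override public int getValue() { return 2; }
    }

    static class InnerImpl3 extends Inner {
        @Override public int getValue() { return 3; }
    }

    static class Handler {
        Inner inner;

        Handler(Inner i) {
            inner = i;
        }

        int doIt() {
            return inner.getValue();
        }
    }

    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            // The Inner is allocated first, so nothing else is allocated between
            // "new Handler" and the call to its constructor.
            Inner inner1 = new InnerImpl1();
            Handler handler1 = new Handler(inner1);
            sum += handler1.doIt();

            Inner inner2 = new InnerImpl2();
            Handler handler2 = new Handler(inner2);
            sum += handler2.doIt();

            Inner inner3 = new InnerImpl3();
            Handler handler3 = new Handler(inner3);
            sum += handler3.doIt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(5_000_000));
    }
}
```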
[1] https://github.com/charliegracie/code-examples/tree/master/java/InlineTests
[2] https://github.com/charliegracie/code-examples/blob/master/java/InlineTests/TypeCheck2.java