[Exp] First prototype of new acmp bytecode

Thu Mar 8 16:39:07 UTC 2018

Hi,

I've added type speculation on non-nullness to avoid the null check when emitting the new acmp (for
non-JSR 292 methods, it only helps if -XX:TypeProfileLevel is set to a value greater than 111). We
might want to add more speculation in the future, but I think this is enough for now. I've also fixed
the code in macro.cpp and converted the test to the jtreg format:
http://cr.openjdk.java.net/~thartmann/valhalla/exp/acmp.04/
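
For illustration, the semantics that the speculation has to preserve can be modelled in plain Java.
This is a minimal sketch, not the C2 code. It assumes that the new acmp returns true only for
identical references that are not value types. It also uses a hypothetical marker interface
ValueMarker to stand in for "is a value type" and models a failed speculation with a hypothetical
UncommonTrap exception (C2 would deoptimize instead).

```java
// Hypothetical marker standing in for "the class of this object is a value type".
interface ValueMarker {}

// Models a failed speculation; in C2 this would be an uncommon trap (deoptimization).
class UncommonTrap extends RuntimeException {
    UncommonTrap(String reason) {
        super(reason);
    }
}

class SpeculativeAcmp {
    static boolean isValue(Object o) {
        return o instanceof ValueMarker;
    }

    // General new acmp: 'a' must be null checked before its class can be inspected.
    static boolean newAcmp(Object a, Object b) {
        if (a == null) {
            return b == null; // a null 'a' can only match a null 'b'
        }
        return a == b && !isValue(a); // value types never compare equal
    }

    // With a profile saying 'a' was never null, the null path is not compiled;
    // the guard traps if the speculation turns out to be wrong.
    static boolean newAcmpSpeculatedNonNull(Object a, Object b) {
        if (a == null) {
            throw new UncommonTrap("speculated non-null, saw null");
        }
        return a == b && !isValue(a);
    }

    public static void main(String[] args) {
        Object o = new Object();
        ValueMarker v = new ValueMarker() {};
        System.out.println(newAcmp(o, o));                  // true
        System.out.println(newAcmp(v, v));                  // false
        System.out.println(newAcmp(null, null));            // true
        System.out.println(newAcmpSpeculatedNonNull(o, o)); // true
    }
}
```

The speculative variant computes the same result whenever 'a' is non-null; it only differs in what
happens when the profile was wrong.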

Here is a simple JMH benchmark that tests some common cases:
http://cr.openjdk.java.net/~thartmann/valhalla/exp/acmp.04/NewAcmpBenchmark.java

-XX:-TieredCompilation -XX:-UseNewAcmp
Benchmark                                Mode  Cnt    Score   Error   Units
NewAcmpBenchmark.newCmp                 thrpt  200  108.911 ± 0.086  ops/us
NewAcmpBenchmark.newCmpDoubleNull       thrpt  200   88.206 ± 4.792  ops/us
NewAcmpBenchmark.newCmpDoubleNullFalse  thrpt  200   72.742 ± 7.563  ops/us
NewAcmpBenchmark.newCmpField            thrpt  200  107.090 ± 0.083  ops/us
NewAcmpBenchmark.oldCmp                 thrpt  200  114.466 ± 0.077  ops/us

-XX:-TieredCompilation -XX:+UseNewAcmp -XX:ValueBasedClasses=compiler/valhalla/valuetypes/MyValue
Benchmark                                Mode  Cnt    Score   Error   Units
NewAcmpBenchmark.newCmp                 thrpt  200  101.480 ± 0.260  ops/us
NewAcmpBenchmark.newCmpDoubleNull       thrpt  200   90.429 ± 4.741  ops/us
NewAcmpBenchmark.newCmpDoubleNullFalse  thrpt  200   81.230 ± 4.115  ops/us
NewAcmpBenchmark.newCmpField            thrpt  200  102.224 ± 0.019  ops/us
NewAcmpBenchmark.oldCmp                 thrpt  200  114.336 ± 0.239  ops/us
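
The relative differences discussed below can be recomputed from the Score columns of the two runs.
A small sketch (the class name is made up; the numbers are copied from the tables above):

```java
class AcmpDeltas {
    // Relative throughput change of the candidate run versus the baseline run, in percent.
    static double changePercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"newCmp", "newCmpDoubleNull", "newCmpDoubleNullFalse",
                          "newCmpField", "oldCmp"};
        double[] oldAcmp = {108.911, 88.206, 72.742, 107.090, 114.466}; // -XX:-UseNewAcmp
        double[] newAcmp = {101.480, 90.429, 81.230, 102.224, 114.336}; // -XX:+UseNewAcmp
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-22s %+7.2f%%%n", names[i], changePercent(oldAcmp[i], newAcmp[i]));
        }
    }
}
```

This gives roughly -6.8% for newCmp, -4.5% for newCmpField and +11.7% for newCmpDoubleNullFalse.
The +2.5% for newCmpDoubleNull is smaller than the reported error (about ± 4.7 ops/us), so it is
noise.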

In the worst case, if we need to emit the new acmp and the first operand is not null, there is a
throughput regression of about 6.8% (see newCmp).

However, in many cases we can use static type information to optimize. For example, if we know that
one operand is a value type, the comparison can only be true if both operands are null, so we can
emit a "double null check". This makes the performance impact disappear into the noise (see
newCmpDoubleNull). If we additionally know that one operand is always non-null, we can emit a
static false. This improves performance by ~11.7% (with a high error) compared to the old acmp (see
newCmpDoubleNullFalse).
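
The double null check and static false cases can be checked against the general semantics with a
small model in which references are addresses (0 = null). This is only a sketch under that
assumption; the OR-based form mirrors the OrX folding mentioned in the earlier mail quoted below,
but the method names are hypothetical.

```java
class StaticAcmpShapes {
    // Semantics of the new acmp on modelled addresses (0 = null):
    // equal references match unless they refer to a value type.
    static boolean newAcmp(long a, long b, boolean aIsValue) {
        return a == b && (a == 0 || !aIsValue);
    }

    // 'b' is statically known to be a value type: the result can only be true
    // if both operands are null, which one OR plus a compare against 0 checks.
    static boolean doubleNullCheck(long a, long b) {
        return (a | b) == 0;
    }

    // Additionally, one operand is known to be non-null: both-null is impossible.
    static boolean staticFalse() {
        return false;
    }

    public static void main(String[] args) {
        long valueB = 0x1000; // a non-null value type instance
        long[] candidatesA = {0, 0x2000, valueB};
        for (long b : new long[] {0, valueB}) {
            for (long a : candidatesA) {
                boolean aIsValue = (a == valueB); // only 'b' itself is a value here
                if (doubleNullCheck(a, b) != newAcmp(a, b, aIsValue)) {
                    throw new AssertionError("mismatch for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("double null check agrees with the new acmp semantics");
    }
}
```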

There is one pitfall. If we compare two object fields, C2 optimizes the old acmp to directly compare
the narrow oops (no need to decode). With the new acmp, we need to decode the oop because we use
derived oops for perturbation. Surprisingly, the regression in the newCmpField benchmark (4.5%) is
even lower than in the newCmp case. That's probably because the comparison is always false and
therefore the CPU's branch prediction works better, mitigating the performance impact of the
additional instructions.
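
Why the field case needs a decode can be shown with a model of compressed oops. The sketch assumes
8-byte object alignment, an arbitrary heap base and a perturbation of +1 on the first operand when it
is a value type; the actual encoding in C2 may differ.

```java
class PerturbedAcmp {
    // Hypothetical compressed-oops setup: 8-byte object alignment, non-zero heap base.
    static final long HEAP_BASE = 0x8_0000_0000L;
    static final int SHIFT = 3;

    static long decode(int narrow) {
        // Narrow null decodes to null; otherwise base + (unsigned narrow << shift).
        return narrow == 0 ? 0 : HEAP_BASE + ((narrow & 0xFFFF_FFFFL) << SHIFT);
    }

    // Old acmp on two fields: the narrow values are compared without decoding.
    static boolean oldAcmpNarrow(int a, int b) {
        return a == b;
    }

    // New acmp: 'a' is decoded so that a derived oop (a + 1) can be formed when it is
    // a value type. Oops are 8-byte aligned, so a + 1 never equals a valid oop.
    static boolean newAcmpNarrow(int a, int b, boolean aIsValue) {
        long decodedA = decode(a);
        long perturbation = (decodedA != 0 && aIsValue) ? 1 : 0; // null check before klass load
        return decodedA + perturbation == decode(b);
    }

    public static void main(String[] args) {
        int x = 42, y = 43;
        System.out.println(oldAcmpNarrow(x, x));        // true
        System.out.println(newAcmpNarrow(x, x, false)); // true: not a value type
        System.out.println(newAcmpNarrow(x, x, true));  // false: perturbed
        System.out.println(newAcmpNarrow(x, y, false)); // false
    }
}
```

The extra work compared to oldAcmpNarrow is the decode plus the perturbation, which is the additional
instruction sequence mentioned above.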

The last benchmark (oldCmp) verifies that if C2 is able to determine that one operand is not a value
type, we can use the old acmp and performance is equal to the baseline.
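
Taken together, the cases above amount to choosing a code shape from static type information. The
following sketch assumes a hypothetical per-operand summary (can it be a value type, is it provably
a value type, is it provably non-null); it is not how C2 represents types.

```java
class AcmpShapeSelection {
    enum Shape { OLD_ACMP, NEW_ACMP, DOUBLE_NULL_CHECK, STATIC_FALSE }

    // Hypothetical summary of what the compiler knows statically about one operand.
    static final class Operand {
        final boolean canBeValue; // false: the type proves it is not a value type
        final boolean isValue;    // true: the type proves it is a value type
        final boolean nonNull;

        private Operand(boolean canBeValue, boolean isValue, boolean nonNull) {
            this.canBeValue = canBeValue;
            this.isValue = isValue;
            this.nonNull = nonNull;
        }

        static Operand unknown(boolean nonNull)  { return new Operand(true, false, nonNull); }
        static Operand value(boolean nonNull)    { return new Operand(true, true, nonNull); }
        static Operand notValue(boolean nonNull) { return new Operand(false, false, nonNull); }
    }

    static Shape select(Operand a, Operand b) {
        if (!a.canBeValue || !b.canBeValue) {
            // Equal references are one object; if it is provably not a value type,
            // the old pointer comparison already has the right semantics.
            return Shape.OLD_ACMP;
        }
        if (a.isValue || b.isValue) {
            // A value operand can only "match" if both operands are null.
            return (a.nonNull || b.nonNull) ? Shape.STATIC_FALSE : Shape.DOUBLE_NULL_CHECK;
        }
        return Shape.NEW_ACMP;
    }

    public static void main(String[] args) {
        System.out.println(select(Operand.unknown(false), Operand.unknown(false)));  // NEW_ACMP
        System.out.println(select(Operand.value(false), Operand.unknown(false)));    // DOUBLE_NULL_CHECK
        System.out.println(select(Operand.value(false), Operand.unknown(true)));     // STATIC_FALSE
        System.out.println(select(Operand.notValue(false), Operand.unknown(false))); // OLD_ACMP
    }
}
```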

I will re-run the tests with type speculation enabled to see how much of a difference that makes.

I think this is stable enough to be pushed to the Exp branch. Any objections?

Thanks,
Tobias

On 23.02.2018 14:22, Tobias Hartmann wrote:
> Hi John,
> 
> On 21.02.2018 22:04, John Rose wrote:
>> You might even be able to get rid of the special node type (arity=3),
>> if the cases where CmpP sees a derived oop can be recognized
>> as perturbations.  I don't think we do CmpP on derived oops in
>> any other circumstance (no C-style pointer/limit loops).
> 
> Yes, we only use CmpP with AddP inputs for raw pointer comparisons (for example, in
> PhaseMacroExpand::expand_allocate_common) and we can easily filter these out.
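
The filtering can be pictured with a toy IR. This is a hypothetical sketch: the node classes only
loosely mirror C2's node names. It assumes that a derived oop is an AddP whose base is an oop, while
raw pointer comparisons use AddPs over non-oop pointers.

```java
class CmpPClassifier {
    // Hypothetical, heavily simplified IR nodes (not C2's real classes).
    static class Node {}
    static final class OopNode extends Node {}    // a Java object reference
    static final class RawPtrNode extends Node {} // a raw, non-oop pointer
    static final class AddPNode extends Node {
        final Node base;
        final long offset;

        AddPNode(Node base, long offset) {
            this.base = base;
            this.offset = offset;
        }
    }

    // An AddP whose base is an oop is a derived oop, i.e. a perturbed acmp operand.
    static boolean isDerivedOop(Node n) {
        return n instanceof AddPNode && ((AddPNode) n).base instanceof OopNode;
    }

    // A pointer compare with a derived-oop input is treated as a new acmp;
    // AddPs over raw pointers (raw pointer comparisons) are filtered out.
    static boolean isPerturbedAcmp(Node in1, Node in2) {
        return isDerivedOop(in1) || isDerivedOop(in2);
    }

    public static void main(String[] args) {
        Node oop = new OopNode();
        Node raw = new RawPtrNode();
        System.out.println(isPerturbedAcmp(new AddPNode(oop, 1), oop));  // true
        System.out.println(isPerturbedAcmp(new AddPNode(raw, 16), raw)); // false
        System.out.println(isPerturbedAcmp(oop, oop));                   // false
    }
}
```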
> 
> Here's the new webrev:
> http://cr.openjdk.java.net/~thartmann/valhalla/exp/acmp.03/
> 
> Changes include:
> - Using derived oops for perturbation
> - Got rid of all CastX2P usages
> - Removed additional input edge from CmpP
> - Factored common code into separate methods
> - Swap operand optimization to avoid null checks in new acmp
> - Added code to ensure OrX is folded to null check or constant false if possible
> - Interface supertype support
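
The swap operand optimization relies on acmp being symmetric. A minimal sketch, assuming addresses
modelled as longs (0 = null, otherwise 8-byte aligned) and that only the first operand is perturbed,
so only it needs a null check; all names are hypothetical.

```java
class SwapOperands {
    // Hypothetical acmp operand: modelled address plus what the compiler knows about it.
    static final class Operand {
        final long addr;
        final boolean isValue;
        final boolean knownNonNull;

        Operand(long addr, boolean isValue, boolean knownNonNull) {
            this.addr = addr;
            this.isValue = isValue;
            this.knownNonNull = knownNonNull;
        }
    }

    // The first operand is perturbed; its klass is only inspected if it is non-null,
    // so a runtime null check is needed unless it is known to be non-null.
    static boolean newAcmp(Operand first, Operand second) {
        long perturbation = (first.addr != 0 && first.isValue) ? 1 : 0;
        return first.addr + perturbation == second.addr;
    }

    static boolean needsNullCheck(Operand first) {
        return !first.knownNonNull;
    }

    // acmp is symmetric, so move a known non-null operand into the first position.
    static Operand[] order(Operand a, Operand b) {
        return (!a.knownNonNull && b.knownNonNull) ? new Operand[] {b, a} : new Operand[] {a, b};
    }

    public static void main(String[] args) {
        Operand maybeNull = new Operand(0, false, false);
        Operand nonNull = new Operand(0x1000, true, true);
        Operand[] ops = order(maybeNull, nonNull);
        System.out.println(needsNullCheck(ops[0]));                                // false
        System.out.println(newAcmp(ops[0], ops[1]) == newAcmp(maybeNull, nonNull)); // true
    }
}
```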
> 
> I've executed performance runs with -XX:-TieredCompilation -XX:ValueBasedClasses= -XX:+UseNewAcmp
> and there is no significant performance difference with SPECjvm2008 and SPECjbb2015 (and some of our
> internal benchmarks).
> 
> TODOs:
> - JMH benchmarks
> - Type speculation on (non-)nullness
> - Fix changes in macro.cpp
> - Convert test to jtreg format
> 
> Best regards,
> Tobias
>