[aarch64-port-dev ] FP conversion and comparison in C2
Andrew Dinn
adinn at redhat.com
Fri Nov 8 03:30:00 PST 2013
I have looked into the behaviour of FCVTZ and FCMP on AArch64 and how it
affects C2. Having checked the ARM ARM, it appears that FCVTZ does
exactly what Java needs when converting NaNs, infinities and denormals,
so we can simply plant an fcvtzs instruction for ConvF2I (ditto for the
other 3 cases) and not bother with any clearing/testing of the FPSR nor
any consequent special-case handling. That result also applies to C1 and
interpreter code. They can just execute fcvtzs and rely on the hardware
to install the correct result in the destination register.
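For reference, the conversion semantics that fcvtzs reproduces in
hardware can be checked at the Java level. This is just an illustrative
sketch of the JLS 5.1.3 rules, not port code:

```java
// Java float-to-int conversion semantics (JLS 5.1.3), which a single
// fcvtzs instruction reproduces on AArch64: NaN converts to 0,
// out-of-range values saturate to Integer.MIN_VALUE/MAX_VALUE, and
// everything else truncates toward zero.
public class ConvF2I {
    public static void main(String[] args) {
        System.out.println((int) Float.NaN);               // 0
        System.out.println((int) Float.POSITIVE_INFINITY); // 2147483647
        System.out.println((int) Float.NEGATIVE_INFINITY); // -2147483648
        System.out.println((int) 1e10f);                   // 2147483647 (saturates)
        System.out.println((int) -2.9f);                   // -2 (truncates toward zero)
        System.out.println((int) Float.MIN_VALUE);         // 0 (denormal rounds to zero)
    }
}
```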
So, rather than patching C2 I have now fixed the simulator to do what
the ARM ARM says.
At the same time I have also been looking at FCMP and how it can be used
to implement the C2 CmpF and CmpD operations on AArch64. Now on x86 this
is quite complicated and heavyweight and took some understanding because
x86 has to detect unordered FP results by testing extra flags not used
for normal comparisons:
The ad file defines a special flags register rFlagsRegUCF for use as the
output from a CmpF or CmpD i.e.
instruct cmpF_cc_reg(rFlagsRegU cr, regF src1, regF src2)
%{
  match(Set cr (CmpF src1 src2));
  . . .
It also defines two associated compare operands, cmpOpUCF and cmpOpUCF2,
one used for ordered inequality comparisons and the other for EQ/NEQ tests.
// Floating comparisons that don't require any fixup for the unordered case
operand cmpOpUCF() %{
  match(Bool);
  predicate(n->as_Bool()->_test._test == BoolTest::lt ||
            n->as_Bool()->_test._test == BoolTest::ge ||
            n->as_Bool()->_test._test == BoolTest::le ||
            n->as_Bool()->_test._test == BoolTest::gt);
  format %{ "" %}
  interface(COND_INTER) %{
    equal(0x4, "e");
    . . .
// Floating comparisons that can be fixed up with extra conditional jumps
operand cmpOpUCF2() %{
  match(Bool);
  predicate(n->as_Bool()->_test._test == BoolTest::ne ||
            n->as_Bool()->_test._test == BoolTest::eq);
  format %{ "" %}
  interface(COND_INTER) %{
    equal(0x4, "e");
    . . .
Operand cmpOpUCF is used to type cop and rFlagsRegUCF to type cr when
matching
(CMoveX (Binary cop cr) (Binary dst src))
and
(CountedLoopEnd cop cmp)
The values encoded in the interface ensure that unordered results (i.e.
comparisons involving NaNs) never pass the relevant test. So, unordered
will not pass a GT, nor will it pass an LE if the test is inverted.
However, the use of these extra register/comparison operands means that
Intel requires an extra CMoveX rule for each value of X in {I, L, P, N,
F, D} and an extra CountedLoopEnd rule.
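The Java behaviour those condition codes have to preserve is that no
ordered relation holds when an operand is NaN. A quick illustration at
the Java level:

```java
// Java FP comparison semantics: every ordered relation involving NaN
// is false, so an unordered compare must never satisfy GT/GE/LT/LE.
public class UnorderedTests {
    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(nan > 0.0);   // false
        System.out.println(nan >= 0.0);  // false
        System.out.println(nan < 0.0);   // false
        System.out.println(nan <= 0.0);  // false
        System.out.println(nan == 0.0);  // false
        System.out.println(nan != 0.0);  // true: only != holds for NaN
    }
}
```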
cmpOpUCF2 is used to type cop and rFlagsRegUCF to type cr when matching
(If cop cr)
This also has its own rule, and it actually does something interesting.
Intel has to special-case this type of compare because it needs to add
extra code to detect the unordered case, either folding it into the
branch taken (NE) or skipping the branch taken (EQ).
instruct jmpConUCF2(cmpOpUCF2 cop, rFlagsRegUCF cmp, label labl) %{
  match(If cop cmp);
  effect(USE labl);
  ins_cost(200);
  format %{ $$template
    . . .
  %}
  ins_encode %{
    Label* l = $labl$$label;
    if ($cop$$cmpcode == Assembler::notEqual) {
      __ jcc(Assembler::parity, *l, false);
      __ jcc(Assembler::notEqual, *l, false);
    } else if ($cop$$cmpcode == Assembler::equal) {
      Label done;
      __ jccb(Assembler::parity, done);
      __ jcc(Assembler::equal, *l, false);
      __ bind(done);
    } else {
      ShouldNotReachHere();
    }
  %}
  ins_pipe(pipe_jcc);
%}
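To make the encoding concrete: after an x86 ucomiss, PF is set when the
compare was unordered and ZF when the operands were equal, so the jcc
sequences above implement Java's == and != in the presence of NaN. A
small model of that control flow (illustrative only, not HotSpot code):

```java
// Model of the jmpConUCF2 branch decisions. pf = x86 parity flag
// (set when the FP compare was unordered), zf = zero flag (equal).
public class JmpConUCF2Model {
    // NE case: jcc(parity) then jcc(notEqual) -- the branch is taken
    // if the compare was unordered OR the operands were not equal.
    static boolean branchNE(boolean pf, boolean zf) { return pf || !zf; }

    // EQ case: jccb(parity, done) skips over the jump, so the branch
    // is taken only when the compare was ordered AND equal.
    static boolean branchEQ(boolean pf, boolean zf) { return !pf && zf; }

    public static void main(String[] args) {
        // ucomiss with a NaN operand sets ZF = PF = CF = 1
        System.out.println(branchNE(true, true));   // true:  NaN != x in Java
        System.out.println(branchEQ(true, true));   // false: NaN == x is false
        System.out.println(branchEQ(false, true));  // true:  ordered equal
    }
}
```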
The need for an extra skip also seems to explain why it does not do a
CMoveX based on an EQ or NEQ test -- these cases would require either two
conditional moves or a skip and a conditional move in order to correctly
handle the unordered case.
So, what is the upshot for AArch64? Well, it turns out that we get all
of this for free if we make CmpF and CmpD output a normal rFlagsReg
result and, hence, do our comparisons using a standard cmpOp. This means
that ordered inequalities are performed using GT, GE, LT and LE (which
never capture the unordered (NaN) case). It also means that the other
two cases employ a standard EQ or NE and, mirabile dictu, ARM have
already seen fit to fold unordered into the NE case.
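Concretely, the ARM ARM specifies that fcmp with a NaN operand sets
NZCV to 0011 (N=0, Z=0, C=1, V=1), which makes NE true and EQ, GT and
GE all false. A sketch of those condition-code definitions (illustrative
only, not port code):

```java
// AArch64 condition-code evaluation after fcmp, per the NZCV flag
// definitions in the ARM ARM. An unordered compare sets N=0, Z=0,
// C=1, V=1.
public class FcmpConditions {
    static boolean eq(int n, int z, int c, int v) { return z == 1; }
    static boolean ne(int n, int z, int c, int v) { return z == 0; }
    static boolean ge(int n, int z, int c, int v) { return n == v; }
    static boolean gt(int n, int z, int c, int v) { return z == 0 && n == v; }

    public static void main(String[] args) {
        // flags produced by fcmp when an operand is NaN
        int n = 0, z = 0, c = 1, v = 1;
        System.out.println(ne(n, z, c, v)); // true: unordered folds into NE
        System.out.println(eq(n, z, c, v)); // false
        System.out.println(gt(n, z, c, v)); // false: N != V
        System.out.println(ge(n, z, c, v)); // false
    }
}
```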
So, we don't need an rFlagsRegUCF, we don't need a cmpOpUCF or cmpOpUCF2
and we don't need extra rules to match CMoveX, CountedLoopEnd or If with
these operands as input. We just need to make CmpF and CmpD output an
rFlagsReg e.g.
instruct compF_reg_reg(rFlagsReg cr, vRegF src1, vRegF src2)
%{
  match(Set cr (CmpF src1 src2));
  . . .
Nice one ARM!
I plan to check in the C2 fixes today (they resolve the remaining TCK
failures from Pavel's first round). I will leave the C1 and interpreter
code redundantly checking for and correcting FP exceptions for now -- we
can improve that in another checkin.
regards,
Andrew Dinn
-----------