[aarch64-port-dev] [PATCH] 8217561: X86: Add floating-point Math.min/max intrinsics, approval request
B. Blaser
bsrbnd at gmail.com
Sat Mar 2 14:55:55 UTC 2019
On Fri, 1 Mar 2019 at 11:21, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
>
> The current patch under review does not contain the above code change to bypass intrinsic creation for reduction patterns.
> On X86, performance degrades with the intrinsic w.r.t. the non-intrinsic implementation in reduction
> scenarios, with and without data variance (i.e. with and without branch prediction effects).
>
> I could not find the right hooks that could be called from common code to add any such target-specific checks during ideal (DAG) construction.
> Please share if you know any.
I guess if you want to back out all reduction scenarios, probably a
better way to do this would be to add predicates to your matching
rules:
instruct minF_random_reg(legRegF dst, legRegF a, legRegF b,
                         legRegF tmp, legRegF atmp, legRegF btmp) %{
  predicate(UseAVX > 0 && !n->is_reduction());
Reductions would then be marked properly here:
diff --git a/src/hotspot/share/opto/loopTransform.cpp b/src/hotspot/share/opto/loopTransform.cpp
--- a/src/hotspot/share/opto/loopTransform.cpp
+++ b/src/hotspot/share/opto/loopTransform.cpp
@@ -2039,7 +2039,8 @@
     if (n_ctrl != NULL && loop->is_member(get_loop(n_ctrl))) {
       // Now test it to see if it fits the standard pattern for a reduction operator.
       int opc = def_node->Opcode();
-      if (opc != ReductionNode::opcode(opc, def_node->bottom_type()->basic_type())) {
+      if (opc != ReductionNode::opcode(opc, def_node->bottom_type()->basic_type())
+          || opc == Op_MinD || opc == Op_MinF || opc == Op_MaxD || opc == Op_MaxF) {
         if (!def_node->is_reduction()) { // Not marked yet
           // To be a reduction, the arithmetic node must have the phi as input and provide a def to it
           bool ok = false;
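For reference, the pattern being marked corresponds to a Java loop
like the following (an illustrative sketch, not taken from the patch
or the test): the MinF node takes the loop phi as input and provides
its def.

static float minReduction(float[] a) {
    // Illustrative sketch: 'm' is the loop phi; the intrinsified
    // Math.min (a MinF node) takes it as input and defines its next
    // value, so the change above would mark it with is_reduction().
    float m = Float.POSITIVE_INFINITY;
    for (int i = 0; i < a.length; i++) {
        m = Math.min(m, a[i]);
    }
    return m;
}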
And if this is a reduction you could use alternative rules, see [0]:
instruct minF_reg(regF dst, regF a, regF b, rRegI tmp) %{
  predicate(UseAVX > 0 && n->is_reduction());
But I'm not sure if 'blend/min/max' is really preferable to a single 'ucomisd'?
To summarize:
              | blend/min/max | one ucomisd
--------------|---------------|-------------
predictable   | ! tiny loss ! | 10% gain
unpredictable | 50% gain      | 10% gain
reduction     | !!high loss!! | 10% gain
If we'd like to maximize the gain in the unpredictable case that we
see in examples like random search trees [1], we'd have to choose the
'blend/min/max' variant. To avoid regressions, we'd then have to rely
on branch statistics, which might be shared between architectures [2].
However, that data isn't collected per call site [3] and the
prediction might be wrong [4], in which case we'd have a tiny loss.
For reduction scenarios, it'd be safer to back them all out when
using 'blend/min/max'.
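To illustrate the difference (hypothetical example, not taken from
[1]): in an element-wise loop like the one below, each Math.min is
independent of the previous iteration, so it isn't a reduction, and
the branch behavior depends entirely on the data.

static void minEach(float[] a, float[] b, float[] out) {
    // With random inputs, a compare-and-branch min mispredicts about
    // half the time, which is where the branch-free 'blend/min/max'
    // sequence gets its ~50% gain; with sorted or constant inputs the
    // branch predicts well and 'blend/min/max' shows the tiny loss
    // from the table above.
    for (int i = 0; i < out.length; i++) {
        out[i] = Math.min(a[i], b[i]);
    }
}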
A safer variant would be to optimize the currently compiled code of
the API to use only one 'ucomisd' [0] instead of several as before
[5]. The gain would be a stable ~10% in every scenario, so no need
for predictions and no need to bail out any more!
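As a reminder of where the several compares and branches come from in
the first place, Math.min/max must preserve NaN and signed-zero
semantics; the float variant of the library code is roughly the
following (paraphrased from java.lang.Math, not verbatim):

static float min(float a, float b) {
    // The NaN and -0.0f special cases below are what the baseline
    // compiled code spends its extra compares/branches on [5], and
    // what [0] reduces to a single compare.
    if (a != a) {          // 'a' is NaN
        return a;
    }
    if (a == 0.0f && b == 0.0f
            && Float.floatToRawIntBits(b) == Float.floatToRawIntBits(-0.0f)) {
        return b;          // prefer -0.0f over +0.0f
    }
    return (a <= b) ? a : b;
}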
The discussion is still open... any opinion?
Thanks,
Bernard
[0] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-March/033011.html
[1] http://hg.openjdk.java.net/jdk/submit/file/d164e0b595e6/test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.java#l185
[2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032973.html
[3] https://bugs.openjdk.java.net/browse/JDK-8015416
[4] http://hg.openjdk.java.net/jdk/submit/file/d164e0b595e6/src/hotspot/share/opto/library_call.cpp#l6612
[5] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032564.html