RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Wed Mar 25 22:07:58 UTC 2015

Hi Michael,

I have few major concerns which you need to address.

Adding new field _attr to Node class should be avoided - it will 
increase significantly memory footprint of graph and not be used 
frequently (vectorization is rare case).

NodeFlags has only 16 bits and you used 2. And I don't see how 
Flag_is_loop_carried_dep is used.

All above goes to one question: why mark_reductions() is executed in 
loopopts before each unroll and not during superword processing?
If you do mark_reductions() in superword you can use VectorSet to 
indicate nodes which are reduction nodes.

And the same for _attr. Why to store alignment in Node and not use 
_node_info in packset_eval()?

Small note. Instead of:
+        Node *defNode = n->in(len - 1);
use:
+        Node *defNode = n->in(LoopNode::LoopBackControl);

Thanks,
Vladimir

On 3/25/15 1:09 PM, Berg, Michael C wrote:
> Christian/Nils: Any additional comments for the review, if not Thursday
> I will upload the final webrev with the requested change.
>
> Thanks,
>
> -Michael
>
> *From:*Berg, Michael C
> *Sent:* Thursday, March 19, 2015 5:55 PM
> *To:* Christian Thalinger
> *Cc:* hotspot-compiler-dev at openjdk.java.net
> *Subject:* RE: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
> Christian, yes we could rely on the base class definitions instead,
> since we are not augmenting arguments.
>
> I will remove the file changes after the review concludes in case there
> are any other modifications.
>
> Thanks,
>
> -Michael
>
> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
> *Sent:* Thursday, March 19, 2015 3:52 PM
> *To:* Berg, Michael C
> *Cc:* hotspot-compiler-dev at openjdk.java.net
> <mailto:hotspot-compiler-dev at openjdk.java.net>
> *Subject:* Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
>     On Mar 19, 2015, at 3:23 PM, Berg, Michael C
>     <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>
>     I have updated the webrev contents after some feedback(with no code
>     changes), and Vladimir has placed it in location everyone can access.
>     Anyone should be able to apply the patch or review the code from
>     this info:
>
>     http://cr.openjdk.java.net/~kvn/8074981/webrev.00/
>
> src/cpu/x86/vm/macroAssembler_x86.hpp:
>
> Why do we need these methods?  MacroAssembler extends Assembler.
>
>
>     this replaces the JBS version of the webrev files for 8074981.
>
>     Thanks,
>
>     -Michael
>
>     -----Original Message-----
>     From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>     Sent: Thursday, March 19, 2015 12:55 AM
>     To: Berg, Michael C
>     Cc: hotspot-compiler-dev at openjdk.java.net
>     <mailto:hotspot-compiler-dev at openjdk.java.net>
>     Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
>     Michael,
>
>     I've got it, thank you for explanation.
>
>     Regards,
>     Filipp.
>
>     On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C
>     <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>
>         Filipp, for large iteration loops, if I am taking your meaning
>         correctly, you could not do that without splitting the loop and
>         re-architecting it into a loop nest pair to manage the reduction
>         components.  Seems like the overhead from that scenario could
>         create cost issues where reductions could actually hamper
>         performance in small vector expressions.  Right now we never
>         degrade and generally benefit with the implementation as it
>         stands with the reductions stitched into the vector unit
>         computations directly.
>
>         Regarding sub/div/etc:
>         For now we have waived off on non-commuting operations like sub
>         and div, they would have to be very strictly managed via
>         pack-set placement.  But the answer is yes we could support them.
>
>         Thanks,
>         -Michael
>
>         -----Original Message-----
>         From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>         Sent: Wednesday, March 18, 2015 2:20 AM
>         To: Berg, Michael C
>         Cc: hotspot-compiler-dev at openjdk.java.net
>         <mailto:hotspot-compiler-dev at openjdk.java.net>
>         Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>         optimization
>         )
>
>         Hi Michael,
>
>         thank you for contributing such a great improvement!
>
>         Sorry if my question is silly, but I'm curious wouldn't it be
>         better to replace integer scalar reduction variable with a
>         vector "Rv" in loop's prologue, compile loop's body as a regular
>         vectorized addition/multiplication, and reduce "Rv" to a scalar
>         in loop's epilogue?
>
>         Why you didn't add SubReduction* nodes?
>
>         Best regards,
>         Filipp.
>
>
>         On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C
>         <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>
>             Hi All,
>
>
>
>             We would like to contribute the Integer/FP scalar reduction
>             optimization from Intel.
>
>             The contribution is referenced as Bug ID 8074981 as a
>             performance
>             enhancement.
>
>
>
>             Please review this patch:
>
>             Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>
>             webrev:
>             https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>
>
>
>             The optimization achieves as much as 2.3x on integer
>             reductions and
>             supports float and double precision optimizations
>
>             which also have significant optimization uplift an obey
>             strict fp
>             constraints.
>
>
>
>             Nils Eliasson has offered to sponsor this patch.
>
>
>
>             Thanks,
>
>
>
>             -Michael
>