RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Wed Apr 1 00:37:42 UTC 2015

On 3/31/15 5:36 PM, Christian Thalinger wrote:
>
>> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>>> Almost, it's more than that, there are missing components in long support in AVX2, so we only allow what superword can currently process safely and bypass the question of long support for reductions until AVX3, where support is complete enough to allow those forms of reductions.
>>
>> Okay.
>>
>>> Nils was the initial reviewer and sponsor, so Nils can you make another pass and comment on the current webrev for the review.
>>
>> Nils is out for few days. Christian looked on this too, let him do second review.
>
> The only comment I have is this opening brace should be one the same line:
>
> + void SuperWord::packset_sort(int n)
> + {

I will fix it before push.

Thanks,
Vladimir

>
> Otherwise this looks good.  Thanks.
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Thanks,
>>>
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Monday, March 30, 2015 6:47 PM
>>> To: Berg, Michael C
>>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>>>
>>> Here is updated webrev which addressed these and other issues:
>>>
>>> http://cr.openjdk.java.net/~kvn/8074981/webrev.01/
>>>
>>> Michael, I noticed that .ad file does not have matched instructions for AddReductionVL. I assume it is because there is no avx3 yet. Right?
>>>
>>> Otherwise this look good to me. You need second review from an other Reviewer since changes are big.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/25/15 4:00 PM, Vladimir Kozlov wrote:
>>>> Please, ignore previous email. I screwed up Michael's email address.
>>>>
>>>> Hi Michael,
>>>>
>>>> I have few major concerns which you need to address.
>>>>
>>>> Adding new field _attr to Node class should be avoided - it will
>>>> increase significantly memory footprint of graph and not be used
>>>> frequently (vectorization is rare case).
>>>>
>>>> NodeFlags has only 16 bits and you used 2. And I don't see how
>>>> Flag_is_loop_carried_dep is used.
>>>>
>>>> All above goes to one question: why mark_reductions() is executed in
>>>> loopopts before each unroll and not during superword processing?
>>>> If you do mark_reductions() in superword you can use VectorSet to
>>>> indicate nodes which are reduction nodes.
>>>>
>>>> And the same for _attr. Why to store alignment in Node and not use
>>>> _node_info in packset_eval()?
>>>>
>>>> Small note. Instead of:
>>>> +        Node *defNode = n->in(len - 1);
>>>> use:
>>>> +        Node *defNode = n->in(LoopNode::LoopBackControl);
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 3/25/15 1:09 PM, Berg, Michael C wrote:
>>>>> Christian/Nils: Any additional comments for the review, if not
>>>>> Thursday I will upload the final webrev with the requested change.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Michael
>>>>>
>>>>> *From:*Berg, Michael C
>>>>> *Sent:* Thursday, March 19, 2015 5:55 PM
>>>>> *To:* Christian Thalinger
>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>> *Subject:* RE: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>> optimization )
>>>>>
>>>>> Christian, yes we could rely on the base class definitions instead,
>>>>> since we are not augmenting arguments.
>>>>>
>>>>> I will remove the file changes after the review concludes in case
>>>>> there are any other modifications.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Michael
>>>>>
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:* Thursday, March 19, 2015 3:52 PM
>>>>> *To:* Berg, Michael C
>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>> *Subject:* Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>> optimization )
>>>>>
>>>>>      On Mar 19, 2015, at 3:23 PM, Berg, Michael C
>>>>>      <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>
>>>>>      I have updated the webrev contents after some feedback(with no code
>>>>>      changes), and Vladimir has placed it in location everyone can access.
>>>>>      Anyone should be able to apply the patch or review the code from
>>>>>      this info:
>>>>>
>>>>>      http://cr.openjdk.java.net/~kvn/8074981/webrev.00/
>>>>>
>>>>> src/cpu/x86/vm/macroAssembler_x86.hpp:
>>>>>
>>>>> Why do we need these methods?  MacroAssembler extends Assembler.
>>>>>
>>>>>
>>>>>      this replaces the JBS version of the webrev files for 8074981.
>>>>>
>>>>>      Thanks,
>>>>>
>>>>>      -Michael
>>>>>
>>>>>      -----Original Message-----
>>>>>      From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>>      Sent: Thursday, March 19, 2015 12:55 AM
>>>>>      To: Berg, Michael C
>>>>>      Cc: hotspot-compiler-dev at openjdk.java.net
>>>>>      <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>      Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>> optimization )
>>>>>
>>>>>      Michael,
>>>>>
>>>>>      I've got it, thank you for explanation.
>>>>>
>>>>>      Regards,
>>>>>      Filipp.
>>>>>
>>>>>      On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C
>>>>>      <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>
>>>>>          Filipp, for large iteration loops, if I am taking your meaning
>>>>>          correctly, you could not do that without splitting the loop and
>>>>>          re-architecting it into a loop nest pair to manage the reduction
>>>>>          components.  Seems like the overhead from that scenario could
>>>>>          create cost issues where reductions could actually hamper
>>>>>          performance in small vector expressions.  Right now we never
>>>>>          degrade and generally benefit with the implementation as it
>>>>>          stands with the reductions stitched into the vector unit
>>>>>          computations directly.
>>>>>
>>>>>          Regarding sub/div/etc:
>>>>>          For now we have waived off on non-commuting operations like sub
>>>>>          and div, they would have to be very strictly managed via
>>>>>          pack-set placement.  But the answer is yes we could support them.
>>>>>
>>>>>          Thanks,
>>>>>          -Michael
>>>>>
>>>>>          -----Original Message-----
>>>>>          From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>>          Sent: Wednesday, March 18, 2015 2:20 AM
>>>>>          To: Berg, Michael C
>>>>>          Cc: hotspot-compiler-dev at openjdk.java.net
>>>>>          <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>          Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>>          optimization
>>>>>          )
>>>>>
>>>>>          Hi Michael,
>>>>>
>>>>>          thank you for contributing such a great improvement!
>>>>>
>>>>>          Sorry if my question is silly, but I'm curious wouldn't it be
>>>>>          better to replace integer scalar reduction variable with a
>>>>>          vector "Rv" in loop's prologue, compile loop's body as a regular
>>>>>          vectorized addition/multiplication, and reduce "Rv" to a scalar
>>>>>          in loop's epilogue?
>>>>>
>>>>>          Why you didn't add SubReduction* nodes?
>>>>>
>>>>>          Best regards,
>>>>>          Filipp.
>>>>>
>>>>>
>>>>>          On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C
>>>>>          <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>> wrote:
>>>>>
>>>>>              Hi All,
>>>>>
>>>>>
>>>>>
>>>>>              We would like to contribute the Integer/FP scalar reduction
>>>>>              optimization from Intel.
>>>>>
>>>>>              The contribution is referenced as Bug ID 8074981 as a
>>>>>              performance
>>>>>              enhancement.
>>>>>
>>>>>
>>>>>
>>>>>              Please review this patch:
>>>>>
>>>>>              Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>>>>
>>>>>              webrev:
>>>>>
>>>>> https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>>>>
>>>>>
>>>>>
>>>>>              The optimization achieves as much as 2.3x on integer
>>>>>              reductions and
>>>>>              supports float and double precision optimizations
>>>>>
>>>>>              which also have significant optimization uplift an obey
>>>>>              strict fp
>>>>>              constraints.
>>>>>
>>>>>
>>>>>
>>>>>              Nils Eliasson has offered to sponsor this patch.
>>>>>
>>>>>
>>>>>
>>>>>              Thanks,
>>>>>
>>>>>
>>>>>
>>>>>              -Michael
>>>>>
>