RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5]
Jatin Bhateja
jbhateja at openjdk.org
Thu May 11 07:58:51 UTC 2023
On Thu, 11 May 2023 07:52:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> use is_counted and is_innermost
>
>> I agree with the phasing of the optimization, as it gives us an opportunity to perform a similar optimization for vectors created at parse time, i.e., through the VectorAPI.
>>
>> 
>>
>> A VectorAPI-based kernel will have a different graph shape, and the proposed pattern matching will not be able to handle it. Also, the trip count feeds into LoadVector, and the scalar reduction operation (the scalar add in the above example) is a secondary induction variable. I think we can still handle it in a follow-up patch by making two passes over the loop.
>>
>> * Scan the loop body and collect all the UnorderedReduction nodes and their users.
>> * Exit the optimization if any of the following conditions holds.
>>
>> * Different UnorderedReduction have different reduction opcodes.
>> * Reduction node has more than one user.
>> * If none of the above bail-out conditions holds, then your algorithm will have to traverse the chain of scalar operations and check that one of its inputs is an UnorderedReduction and that the other input is driven by the same graph pattern.
>> * Once we find a legal graph pattern, replace the reductions with their vector counterparts and move the reduction out of the loop, as is currently being done by your patch.
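
The first pass described above (collect the reductions, then apply the two bail-out checks) could look roughly like the sketch below. This is only a toy model: `ToyNode`, `isCandidate`, and the opcode strings are hypothetical names made up for illustration, not C2's actual `Node` classes or the code in the patch, and the second pass over the scalar chain is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReductionScanSketch {
    // Hypothetical stand-in for an IR node; NOT HotSpot's Node class.
    static final class ToyNode {
        final String opcode;               // plain label, e.g. "AddReduction"
        final boolean unorderedReduction;  // true if the node is a reorderable reduction
        final List<ToyNode> users = new ArrayList<>();

        ToyNode(String opcode, boolean unorderedReduction) {
            this.opcode = opcode;
            this.unorderedReduction = unorderedReduction;
        }

        void addUser(ToyNode user) {
            users.add(user);
        }
    }

    // Pass 1: scan the loop body, collect reduction opcodes, and bail out if
    // (a) reductions use different opcodes, or (b) a reduction has more than one user.
    static boolean isCandidate(List<ToyNode> loopBody) {
        Set<String> opcodes = new HashSet<>();
        for (ToyNode n : loopBody) {
            if (!n.unorderedReduction) {
                continue; // only reductions are of interest in this pass
            }
            if (n.users.size() > 1) {
                return false; // bail-out: reduction result is used more than once
            }
            opcodes.add(n.opcode);
        }
        // Exactly one opcode means there is something to optimize and no mix of opcodes.
        return opcodes.size() == 1;
    }
}
```

A loop body whose reductions all share one opcode and have a single user each passes the check; anything else is rejected before the more expensive traversal of the scalar chain would start.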
>>
>> 
>
> BTW, with the VectorAPI, users are expected to be more intelligent, and your optimization can be implemented directly in a kernel that performs VectorADD operations in the main loop, followed by a reduction outside the loop, e.g.
>
>
> outer_loop:
>   hand_unrolled_vector_loop:
>     v1 = VectorADD(broadcast(0))
>     v2 = v1.VectorADD(LoadVector)
>     v3 = v2.VectorADD(LoadVector)
>     ...
>     ...
>   inner_loop_end
>   res += v3.ReductionAdd()
> outer_loop_end
>
>
> So it's not a pressing issue for us anyway.
> @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler.
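
For readers of the archive, here is a rough plain-Java illustration of the shape being discussed: the reduction inside the loop versus an element-wise accumulator in the loop with a single reduction afterwards. It uses neither the Vector API nor any HotSpot internals; the class name, method names, and the `LANES` width are assumptions chosen for the sketch, and the `int[]` accumulator merely stands in for a vector register.

```java
final class ReductionAfterLoopSketch {
    static final int LANES = 4; // assumed vector width, for illustration only

    // Shape before: every iteration folds its value into one scalar accumulator.
    static int sumReduceInsideLoop(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Shape after: element-wise adds in the loop, one reduction after the loop.
    static int sumReduceAfterLoop(int[] a) {
        int[] acc = new int[LANES]; // stands in for a vector initialized to zero
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l]; // element-wise "vector add", no cross-lane work
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l]; // the single reduction, now outside the main loop
        }
        for (; i < a.length; i++) {
            sum += a[i]; // scalar tail for the leftover elements
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1003];
        for (int i = 0; i < a.length; i++) {
            a[i] = 3 * i - 7;
        }
        // Both shapes compute the same int sum, since int addition can be reordered freely.
        System.out.println(sumReduceInsideLoop(a) == sumReduceAfterLoop(a));
    }
}
```

The two methods agree for `int` because wrap-around integer addition is associative and commutative; floating-point addition is not associative in general, which is why the reordering is only safe for unordered reductions.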
Your changes look good to me. Thanks!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1543515005