CR for RFR 8149421

Thu Feb 11 06:04:43 UTC 2016

What are the changes in assembler_x86.cpp for? You changed the values of the no_mask_reg arguments. Was that a bug?

It looks like you copied code from insert_pre_post_loops(), which is fine.
One thing that worries me is that, because of the ratio between the unrolling done before vectorization and the vector
size, you can end up with several repeated vector operations. It would be nice if we first unrolled by a factor equal
to the vector size, then vectorized to generate one vector instruction, then cloned the loop to create the
vector_post_loop, and only then unrolled the main loop further.
Or are you already doing something like that?
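
To make the loop shapes I have in mind concrete, here is a rough sketch in plain Java. It is only an illustration, not C2 output and not code from the webrev: the class name, VLEN, UNROLL and vectorAdd() are all hypothetical, vectorAdd() is a scalar stand-in for a single vector instruction, and the alignment pre-loop is left out.

```java
// Illustrative sketch only: plain Java mimicking the loop structure, not what C2 emits.
final class PostLoopSketch {
    static final int VLEN = 8;   // assumed vector width in elements (hypothetical)
    static final int UNROLL = 4; // assumed extra unroll factor of the main loop (hypothetical)

    // Stand-in for one vector instruction: processes VLEN elements starting at index i.
    static void vectorAdd(int[] a, int[] b, int[] c, int i) {
        for (int k = 0; k < VLEN; k++) {
            c[i + k] = a[i + k] + b[i + k];
        }
    }

    // Computes c[i] = a[i] + b[i] and returns the trip counts
    // {main loop, vector post loop, scalar post loop}.
    static int[] add(int[] a, int[] b, int[] c) {
        int n = c.length;
        int i = 0;
        int mainTrips = 0, vectorPostTrips = 0, scalarTrips = 0;

        // Main loop: vectorized first, then unrolled further by UNROLL.
        for (; i + VLEN * UNROLL <= n; i += VLEN * UNROLL) {
            for (int u = 0; u < UNROLL; u++) {
                vectorAdd(a, b, c, i + u * VLEN);
            }
            mainTrips++;
        }

        // Vector post loop: a clone of the loop body with exactly one vector operation.
        for (; i + VLEN <= n; i += VLEN) {
            vectorAdd(a, b, c, i);
            vectorPostTrips++;
        }

        // Scalar post loop: handles the remaining (fewer than VLEN) elements.
        for (; i < n; i++) {
            c[i] = a[i] + b[i];
            scalarTrips++;
        }
        return new int[] { mainTrips, vectorPostTrips, scalarTrips };
    }

    public static void main(String[] args) {
        int n = 93;
        int[] a = new int[n], b = new int[n], c = new int[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }
        int[] trips = add(a, b, c);
        System.out.println("main=" + trips[0] + " vectorPost=" + trips[1] + " scalar=" + trips[2]);
    }
}
```

With these assumed sizes, 93 elements split into 2 main-loop trips (64 elements), 3 vector post-loop trips (24 elements) and 5 scalar iterations. The point is that the vector post loop body holds a single vector operation, so it can drain any remainder of at least VLEN elements.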

Thanks,
Vladimir

On 2/9/16 3:16 PM, Berg, Michael C wrote:
> Hi Folks,
>
> I would like to contribute vectorized post loops. This patch is initially targeted for x86.  The design is versatile so
> as to be portable to other targets as well. This code poses the addition of atomic unrolled drain loops which precede
> fix-up segments and which are significantly faster than scalar code. The requirement is that the main loop is super
> unrolled after vectorization. I see up to 54% uplift on micro benchmarks on x86 targets for loops which pass superword
> vectorization and which meet the above criteria.  Also scimark metrics in SpecJvm2008 like lu.small  and fft.small show
> the usage of this design for benefit on x86.
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8149421
>
>
> webrev:
>
> http://cr.openjdk.java.net/~mcberg/8149421/webrev.01/
>
> Thanks,
>
> Michael
>