RFR: 8320347: Emulate vblendvp[sd] on ECore

Jatin Bhateja jbhateja at openjdk.org
Tue Nov 21 19:07:06 UTC 2023


On Mon, 20 Nov 2023 21:32:54 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

> > Hi @vpaprotsk, please add checks to skip the special emulation for 128-bit vectors at the applicable places. As per section "4.1.8.4 256-bit Variable Blend Instructions" of the x86 optimization manual, variable blends are micro-coded only for 256-bit vectors.
> 
> I went and remeasured performance of 128-bit vectors with `-XX:MaxVectorSize=16`...
> 
> ```
> =============== BEFORE ===============
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.232 ± 0.034  us/op
> MaxMinOptimizeTest.dMax  avgt    3  149.242 ± 2.373  us/op
> MaxMinOptimizeTest.dMin  avgt    3  150.000 ± 1.763  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.237 ± 0.020  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.156 ± 0.012  us/op
> MaxMinOptimizeTest.fMax  avgt    3  110.729 ± 0.743  us/op
> MaxMinOptimizeTest.fMin  avgt    3  110.716 ± 0.157  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.157 ± 0.017  us/op
> Benchmark                 (SIZE)  Mode  Cnt    Score    Error  Units
> VectorSignum.floatSignum     256  avgt    3  134.137 ±  4.586  ns/op
> VectorSignum.floatSignum     512  avgt    3  258.117 ±  0.518  ns/op
> VectorSignum.floatSignum    1024  avgt    3  512.706 ±  5.924  ns/op
> VectorSignum.floatSignum    2048  avgt    3  979.276 ± 46.734  ns/op
> VectorSignum.doubleSignum     256  avgt    3   233.108 ±  5.314  ns/op
> VectorSignum.doubleSignum     512  avgt    3   457.757 ±  3.537  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   907.037 ±  2.768  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1816.200 ± 15.869  ns/op
> 
> =============== AFTER ===============
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.238 ± 0.092  us/op
> MaxMinOptimizeTest.dMax  avgt    3  106.636 ± 0.072  us/op
> MaxMinOptimizeTest.dMin  avgt    3  103.060 ± 0.129  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.233 ± 0.044  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.158 ± 0.021  us/op
> MaxMinOptimizeTest.fMax  avgt    3  105.256 ± 1.682  us/op
> MaxMinOptimizeTest.fMin  avgt    3  103.126 ± 0.049  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.155 ± 0.019  us/op
> Benchmark                 (SIZE)  Mode  Cnt    Score   Error  Units
> VectorSignum.floatSignum     256  avgt    3   60.523 ± 0.026  ns/op
> VectorSignum.floatSignum     512  avgt    3  118.415 ± 0.076  ns/op
> VectorSignum.floatSignum    1024  avgt    3  235.203 ± 0.323  ns/op
> VectorSignum.floatSignum    2048  avgt    3  467.230 ± 0.144  ns/op
> VectorSignum.doubleSignum     256  avgt    3  120.955 ± 0.217  ns/op
> VectorSignum.doubleSignum     512  avgt    3  241.753 ± 0.371  ns/op
> VectorSignum.doubleSignum    1024  avgt    3  498.055 ± 0.410  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  974.891 ± 1.472  ns/op
> ```
> 
> For Max/Min, keeping this patch gains us up to 40%, and for `VectorSignum.*Signum` the improvement is roughly 2x.

I see the following results on Cascade Lake:

-XX:+UnlockDiagnosticVMOptions -XX:-EnableX86ECoreOpts -XX:MaxVectorSize=16

Benchmark                    Mode  Cnt    Score   Error  Units
MaxMinOptimizeTest.dMax      avgt    2  119.131          us/op
MaxMinOptimizeTest.dMax:asm  avgt           NaN            ---
MaxMinOptimizeTest.dMin      avgt    2  117.812          us/op
MaxMinOptimizeTest.dMin:asm  avgt           NaN            ---


-XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:MaxVectorSize=16

Benchmark                    Mode  Cnt    Score   Error  Units
MaxMinOptimizeTest.dMax      avgt    2  128.076          us/op
MaxMinOptimizeTest.dMax:asm  avgt           NaN            ---
MaxMinOptimizeTest.dMin      avgt    2  126.978          us/op
MaxMinOptimizeTest.dMin:asm  avgt           NaN            ---
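
To put a number on the difference between the two runs, the relative change can be computed as (after - before) / before. Below is a minimal sketch, assuming a hypothetical helper class `RelativeChange` (not part of the JDK or of this PR) and using the dMax/dMin scores copied from the two tables above:

```java
// Hypothetical helper for illustration only: compares two JMH avgt scores.
final class RelativeChange {
    // Percent change from 'before' to 'after'.
    // For avgt (time per operation) a positive result means 'after' is slower.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores in us/op from the Cascade Lake runs with -XX:MaxVectorSize=16.
        // First argument: -EnableX86ECoreOpts, second argument: +EnableX86ECoreOpts.
        System.out.printf("dMax: %+.1f%%%n", percentChange(119.131, 128.076));
        System.out.printf("dMin: %+.1f%%%n", percentChange(117.812, 126.978));
    }
}
```

With these scores, both dMax and dMin come out roughly 7 to 8% slower with the emulation enabled at 128 bits. The runs have only two iterations and no error estimate, so this is indicative rather than conclusive.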

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1821505204

