RFR: 8286972: Support the new loop induction variable related PopulateIndex IR node on x86 [v3]

Fri May 20 04:49:58 UTC 2022

On Thu, 19 May 2022 23:34:59 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> This PR adds x86 backend support for the new loop induction variable related PopulateIndex IR node. 
>> This IR node was added as part of [JDK-8280510](https://bugs.openjdk.java.net/browse/JDK-8280510).
>> 
>> The performance numbers are as follows:
>> Before:
>> Benchmark                   (count)   Mode  Cnt       Score       Error  Units
>> IndexVector.exprWithIndex1    65536  thrpt    3   64556.552 ±  1126.396  ops/s
>> IndexVector.exprWithIndex2    65536  thrpt    3   22117.050 ± 11452.098  ops/s
>> IndexVector.indexArrayFill    65536  thrpt    3  117776.383 ±  1120.957  ops/s
>> 
>> After:
>> Benchmark                   (count)   Mode  Cnt       Score       Error  Units
>> IndexVector.exprWithIndex1    65536  thrpt    3  203180.290 ±  2147.807  ops/s
>> IndexVector.exprWithIndex2    65536  thrpt    3  274132.756 ±  6853.393  ops/s
>> IndexVector.indexArrayFill    65536  thrpt    3  374165.202 ± 46930.779  ops/s
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - remove warmup
>  - Add jtreg test

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2311:

> 2309:       case T_SHORT: evpbroadcastw(dst, src, vlen_enc); return;
> 2310:       case T_FLOAT: case T_INT: evpbroadcastd(dst, src, vlen_enc); return;
> 2311:       case T_DOUBLE: case T_LONG: evpbroadcastq(dst, src, vlen_enc); return;

Can't we use single- and double-precision broadcasts for the floating-point types, as you have done in the else branch?
That may avoid the domain-switch penalty (Section 3.5.2.2, "Bypass between Execution Domains", Intel® 64 and IA-32 Architectures Optimization Reference Manual).

src/hotspot/cpu/x86/x86.ad line 8269:

> 8267:   format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
> 8268:   ins_encode %{
> 8269:      int vlen_in_bytes = Matcher::vector_length_in_bytes(this);

Matcher::vector_length can be used directly instead of the following computation on line 8274:
    vlen_in_bytes/type2aelembytes(elem_bt)
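
To illustrate why the two are interchangeable, here is a minimal sketch of the arithmetic in Java. The class and method names are hypothetical and only model the division of vector size by element size; they are not the HotSpot Matcher API.

```java
public class VectorLanes {
    // Hypothetical model: the lane count of a vector is its size in bytes
    // divided by the size in bytes of one element.
    static int lanesFromBytes(int vlenInBytes, int elemBytes) {
        if (elemBytes <= 0 || vlenInBytes % elemBytes != 0) {
            throw new IllegalArgumentException("vector size must be a multiple of element size");
        }
        return vlenInBytes / elemBytes;
    }

    public static void main(String[] args) {
        // A 512-bit (64-byte) vector of int holds 16 lanes.
        System.out.println(lanesFromBytes(64, Integer.BYTES));
        // A 256-bit (32-byte) vector of long holds 4 lanes.
        System.out.println(lanesFromBytes(32, Long.BYTES));
        // A 128-bit (16-byte) vector of short holds 8 lanes.
        System.out.println(lanesFromBytes(16, Short.BYTES));
    }
}
```

If a query already returns the lane count, calling it directly avoids repeating this division at each use site.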

src/hotspot/cpu/x86/x86.ad line 8272:

> 8270:      int vlen_enc = vector_length_encoding(this);
> 8271:      BasicType elem_bt = Matcher::vector_element_basic_type(this);
> 8272:      assert($src2$$constant == 1, "required");

Ideally, the assertion should be the first statement in the block, since assertions state the preconditions under which the code should be executed.

src/hotspot/cpu/x86/x86.ad line 8288:

> 8286:   format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
> 8287:   ins_encode %{
> 8288:      int vlen_in_bytes = Matcher::vector_length_in_bytes(this);

Same as above: Matcher::vector_length can be used directly here.

src/hotspot/cpu/x86/x86.ad line 8291:

> 8289:      int vlen_enc = vector_length_encoding(this);
> 8290:      BasicType elem_bt = Matcher::vector_element_basic_type(this);
> 8291:      assert($src2$$constant == 1, "required");

Same as above: the assertion should be the first statement in the block.

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 26:

> 24: /**
> 25: * @test
> 26: * @summary Test vectorization of loop induction variable usage in the loop

The PR/bug ID is missing from the test header.

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 39:

> 37: 
> 38: public class TestPopulateIndex {
> 39:     private static final int count = 65536;

A smaller array size of around 10K may work; we can also tune CompileThresholdScaling.
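
A rough sketch of what I have in mind, assuming a simple index-fill loop. The class name, the 10240 element count, the repetition count and the scaling value in the comment are all illustrative assumptions on my side; they are not taken from the PR and have not been tuned.

```java
// Hypothetical jtreg run line (the scaling value is an untested assumption):
//   @run main/othervm -XX:CompileThresholdScaling=0.3 SmallIndexFill
public class SmallIndexFill {
    // Assumed smaller size (about 10K elements) instead of 65536.
    static final int COUNT = 10240;

    // Stores the loop induction variable into each array element.
    static void indexArrayFill(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
    }

    // Verifies that every element equals its own index.
    static boolean check(int[] a) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = new int[COUNT];
        // Repeat the kernel so the JIT has a chance to compile it;
        // the repetition count is arbitrary.
        for (int iter = 0; iter < 2000; iter++) {
            indexArrayFill(a);
        }
        if (!check(a)) {
            throw new RuntimeException("Unexpected result in indexArrayFill");
        }
        System.out.println("PASSED");
    }
}
```

Whether the loop still gets compiled and vectorized at this size would need to be confirmed on the actual test.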

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 71:

> 69: 
> 70:     public void checkResultIndexArrayFill() {
> 71:         for (int i = 0; i < count; ++i) {

Use post-increment here for consistency.

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 107:

> 105: 
> 106:     public void checkResultExprWithIndex2() {
> 107:         for (int i = 0; i < count; ++i) {

Use post-increment for the induction variable here as well.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8778