RFR: 8286972: Support the new loop induction variable related PopulateIndex IR node on x86 [v3]
Jatin Bhateja
jbhateja at openjdk.java.net
Fri May 20 04:49:58 UTC 2022
On Thu, 19 May 2022 23:34:59 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> This PR adds x86 backend support for the new loop induction variable related PopulateIndex IR node.
>> This IR node was added as part of [JDK-8280510](https://bugs.openjdk.java.net/browse/JDK-8280510).
>>
>> The performance numbers are as follows:
>> Before:
>> Benchmark                   (count)   Mode  Cnt       Score       Error  Units
>> IndexVector.exprWithIndex1    65536  thrpt    3   64556.552 ±  1126.396  ops/s
>> IndexVector.exprWithIndex2    65536  thrpt    3   22117.050 ± 11452.098  ops/s
>> IndexVector.indexArrayFill    65536  thrpt    3  117776.383 ±  1120.957  ops/s
>>
>> After:
>> Benchmark                   (count)   Mode  Cnt       Score       Error  Units
>> IndexVector.exprWithIndex1    65536  thrpt    3  203180.290 ±  2147.807  ops/s
>> IndexVector.exprWithIndex2    65536  thrpt    3  274132.756 ±  6853.393  ops/s
>> IndexVector.indexArrayFill    65536  thrpt    3  374165.202 ± 46930.779  ops/s
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with two additional commits since the last revision:
>
> - remove warmup
> - Add jtreg test
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2311:
> 2309: case T_SHORT: evpbroadcastw(dst, src, vlen_enc); return;
> 2310: case T_FLOAT: case T_INT: evpbroadcastd(dst, src, vlen_enc); return;
> 2311: case T_DOUBLE: case T_LONG: evpbroadcastq(dst, src, vlen_enc); return;
Can't we use the single- and double-precision broadcasts for the floating-point types, as you have done in the else part?
That may avoid the bypass delay incurred when switching execution domains (Section 3.5.2.2, "Bypass between Execution Domains", Intel® 64 and IA-32 Architectures Optimization Reference Manual).
src/hotspot/cpu/x86/x86.ad line 8269:
> 8267: format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
> 8268: ins_encode %{
> 8269: int vlen_in_bytes = Matcher::vector_length_in_bytes(this);
Matcher::vector_length can be used directly instead of the following computation on line 8274:
vlen_in_bytes/type2aelembytes(elem_bt)
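To spell out why the two are interchangeable, here is a minimal sketch of the arithmetic in Java rather than HotSpot C++. The class and method names are hypothetical and only illustrate that the lane count is the vector size in bytes divided by the element size in bytes, which is the value the matcher can already report directly.

```java
// Hypothetical helper mirroring the arithmetic in the comment above; this is not HotSpot code.
final class LaneCountSketch {
    /** Number of vector lanes = vector size in bytes / element size in bytes. */
    static int laneCount(int vectorBytes, int elemBytes) {
        if (elemBytes <= 0 || vectorBytes % elemBytes != 0) {
            throw new IllegalArgumentException("vector size must be a multiple of element size");
        }
        return vectorBytes / elemBytes;
    }

    public static void main(String[] args) {
        System.out.println(laneCount(64, Integer.BYTES)); // 512-bit vector of int: prints 16
        System.out.println(laneCount(32, Long.BYTES));    // 256-bit vector of long: prints 4
    }
}
```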
src/hotspot/cpu/x86/x86.ad line 8272:
> 8270: int vlen_enc = vector_length_encoding(this);
> 8271: BasicType elem_bt = Matcher::vector_element_basic_type(this);
> 8272: assert($src2$$constant == 1, "required");
Ideally, an assertion should be the first statement in a block, since assertions state the preconditions under which the code should be executed.
src/hotspot/cpu/x86/x86.ad line 8288:
> 8286: format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
> 8287: ins_encode %{
> 8288: int vlen_in_bytes = Matcher::vector_length_in_bytes(this);
Same as above.
src/hotspot/cpu/x86/x86.ad line 8291:
> 8289: int vlen_enc = vector_length_encoding(this);
> 8290: BasicType elem_bt = Matcher::vector_element_basic_type(this);
> 8291: assert($src2$$constant == 1, "required");
Same as above.
test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 26:
> 24: /**
> 25: * @test
> 26: * @summary Test vectorization of loop induction variable usage in the loop
PR id missing.
test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 39:
> 37:
> 38: public class TestPopulateIndex {
> 39: private static final int count = 65536;
A smaller array size of around 10K may work; we can also tune CompileThresholdScaling.
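A rough sketch of what I have in mind, under stated assumptions: the class name, the count of 10240, the scaling value of 0.3 and the iteration count are all placeholders and not taken from the PR. Whether these values are enough to get the loop compiled would need to be checked on the actual test.

```java
/*
 * @test
 * @summary Sketch only: smaller array plus a lower compile threshold (all values are assumptions)
 * @run main/othervm -XX:CompileThresholdScaling=0.3 SmallCountSketch
 */
class SmallCountSketch {
    // Assumed smaller size, roughly 10K elements instead of 65536.
    private static final int COUNT = 10240;

    // Stores the loop induction variable into the array.
    static int[] indexFill(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = null;
        // Repeat the kernel so that the JIT has a chance to compile it.
        for (int iter = 0; iter < 2000; iter++) {
            a = indexFill(COUNT);
        }
        // Verify the result of the last run.
        for (int i = 0; i < COUNT; i++) {
            if (a[i] != i) {
                throw new RuntimeException("Wrong value at index " + i + ": " + a[i]);
            }
        }
        System.out.println("PASSED");
    }
}
```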
test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 71:
> 69:
> 70: public void checkResultIndexArrayFill() {
> 71: for (int i = 0; i < count; ++i) {
Use post-increment here for consistency.
test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java line 107:
> 105:
> 106: public void checkResultExprWithIndex2() {
> 107: for (int i = 0; i < count; ++i) {
Same here: use post-increment for the induction variable.
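This is purely a style point. In the update clause of a for loop the two forms behave identically, as the small sketch below illustrates (class and method names are hypothetical, not from the test).

```java
import java.util.Arrays;

// Hypothetical sketch: pre- and post-increment in the for-loop update clause are equivalent.
class IncrementStyleSketch {
    // Pre-increment in the update clause.
    static int[] fillPre(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; ++i) {
            a[i] = i;
        }
        return a;
    }

    // Post-increment in the update clause.
    static int[] fillPost(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    public static void main(String[] args) {
        // Both loops visit the same indices and store the same values.
        System.out.println(Arrays.equals(fillPre(8), fillPost(8))); // prints true
    }
}
```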
-------------
PR: https://git.openjdk.java.net/jdk/pull/8778