RFR: 8264409: AArch64: generate better code for Vector API allTrue

Ningsheng Jian njian at openjdk.java.net
Fri Apr 2 03:00:23 UTC 2021


On Fri, 2 Apr 2021 02:26:52 GMT, Pengfei Li <pli at openjdk.org> wrote:

>> In the Vector API NEON implementation, we use a vector register to represent a vector mask, where an element value of -1 is a true mask and an element value of 0 is a false mask. The allTrue() API is used to check whether all the elements of the current mask are set.
>> 
>> Currently, the AArch64 NEON allTrue implementation looks like:
>> 
>>   andr  $tmp, T16B, $src1, $src2    # src2 is maskAllTrue
>>   notr  $tmp, T16B, $tmp
>>   addv  $tmp, T16B, $tmp
>>   umov  $dst, $tmp, B, 0
>>   cmp   $dst, 0
>>   cset  $dst
>> 
>> where $src2 is a preset all-true (-1) constant. We can optimize it to the code sequence below, which checks whether all bits are set:
>> 
>>   uminv $tmp, T16B, $src1
>>   umov  $dst, $tmp, B, 0
>>   cmp   $dst, 0xff
>>   cset  $dst
>> 
>> With this codegen improvement, we see a performance uplift of about 8% to 70% on different machines for Alibaba's Vector API bigdata benchmarks [1][2].
>> 
>> Tested with tier1 and Vector API jtreg tests.
>> 
>> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/BooleanArrayCheck.java#L61
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/ValueRangeCheckAndCastL2I.java#L93
>
> src/hotspot/cpu/aarch64/aarch64_neon.ad line 3571:
> 
>> 3569:   format %{ "uminv $tmp, T8B, $src1\n\t"
>> 3570:             "umov  $dst, $tmp, B, 0\n\t"
>> 3571:             "cmp   $dst, 0xff\n\t"
> 
> I think we should write "#0xff" here. But it looks like all other immediates in the format fields of aarch64_neon.ad omit the number sign as well.

Thanks for the review. I think either form (with or without the number sign) is OK.
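
For readers following the thread, here is a minimal scalar sketch in Java of why the two instruction sequences quoted above agree for valid masks. It is only a model, not code from the patch: the class and method names (AllTrueModel, allTrueOld, allTrueNew) are hypothetical, and it assumes each lane is a byte holding either 0 (false) or -1 (true).

```java
public class AllTrueModel {

    // Models the original sequence: andr, notr, addv, umov, cmp 0, cset.
    static boolean allTrueOld(byte[] mask) {
        int sum = 0;
        for (byte lane : mask) {
            int anded = lane & 0xff;        // andr with the all-true (-1) constant
            int inverted = ~anded & 0xff;   // notr: a true lane becomes 0, a false lane 0xff
            sum = (sum + inverted) & 0xff;  // addv: the byte-sized sum wraps modulo 256
        }
        return sum == 0;                    // cmp $dst, 0
    }

    // Models the proposed sequence: uminv, umov, cmp 0xff, cset.
    static boolean allTrueNew(byte[] mask) {
        int min = 0xff;
        for (byte lane : mask) {
            min = Math.min(min, lane & 0xff); // uminv: unsigned minimum across lanes
        }
        return min == 0xff;                   // cmp $dst, 0xff
    }

    public static void main(String[] args) {
        byte[] mask = new byte[16];
        java.util.Arrays.fill(mask, (byte) -1);
        System.out.println(allTrueOld(mask) + " " + allTrueNew(mask));
        mask[5] = 0; // clear one lane
        System.out.println(allTrueOld(mask) + " " + allTrueNew(mask));
    }
}
```

With 16 lanes that are each 0 or -1, the wrapped sum in the old sequence is zero only when no lane is false, so both checks give the same answer. The new sequence reaches it with one cross-lane instruction and without the constant operand.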

-------------

PR: https://git.openjdk.java.net/jdk/pull/3302

