RFR: 8303401: Add a Vector API equalsIgnoreCase micro benchmark [v6]
Eirik Bjorsnos
duke at openjdk.org
Wed Mar 1 14:22:17 UTC 2023
On Wed, 1 Mar 2023 09:10:47 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
>> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set of benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark serves as an example of how vectorization can be useful also in the area of text processing. It takes advantage of the fact that ASCII and Latin-1 were designed to optimize case-twiddling operations.
>>
>> The code came about during the work on #12632, where vectorization was deemed out of scope.
>>
>> Benchmark results:
>>
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> EqualsIgnoreCaseBenchmark.scalar 16 avgt 15 20.671 ± 0.718 ns/op
>> EqualsIgnoreCaseBenchmark.scalar 32 avgt 15 46.155 ± 3.258 ns/op
>> EqualsIgnoreCaseBenchmark.scalar 64 avgt 15 68.248 ± 1.767 ns/op
>> EqualsIgnoreCaseBenchmark.scalar 128 avgt 15 148.948 ± 0.890 ns/op
>> EqualsIgnoreCaseBenchmark.scalar 1024 avgt 15 1090.708 ± 7.540 ns/op
>> EqualsIgnoreCaseBenchmark.vectorized 16 avgt 15 21.872 ± 0.232 ns/op
>> EqualsIgnoreCaseBenchmark.vectorized 32 avgt 15 11.378 ± 0.097 ns/op
>> EqualsIgnoreCaseBenchmark.vectorized 64 avgt 15 13.703 ± 0.135 ns/op
>> EqualsIgnoreCaseBenchmark.vectorized 128 avgt 15 21.632 ± 0.735 ns/op
>> EqualsIgnoreCaseBenchmark.vectorized 1024 avgt 15 105.509 ± 7.493 ns/op
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
>
> The equal.allTrue check early if the loop does not cover cases where some bytes are equal, but not all. Reverting this change.
We can narrow down this performance difference to the following observations:
- `compare(LT, (byte) 0xDF)` is faster than `compare(LE, (byte) 0xDE)`
- `compare(GT, (byte) 0XBF)` is faster than `compare(GE, (byte) 0XC0)`
- `compare(EQ, (byte) 0xD7).not()` is faster than `compare(NE, (byte) 0xD7)`
LT, GT, EQ:
Benchmark (size) Mode Cnt Score Error Units
EqualsIgnoreCaseBenchmark.vectorized 1024 avgt 15 91.796 ± 1.083 ns/op
LE, GE, NE:
Benchmark (size) Mode Cnt Score Error Units
EqualsIgnoreCaseBenchmark.vectorized 1024 avgt 15 118.898 ± 5.480 ns/op
Code:
VectorMask<Byte> asciiLetter = upperA.compare(GT, (byte) '@').and(upperA.compare(LT, (byte) '['));
VectorMask<Byte> lat1Letter = upperA
.compare(LT, (byte) 0xDF) // <= Thorn
.and(upperA.compare(GT, (byte) 0XBF)) // >= A-grave
.and(upperA.compare(EQ, (byte) 0xD7).not()); // Excluding multiplication
vs.
VectorMask<Byte> asciiLetter = upperA.compare(GE, (byte) 'A').and(upperA.compare(LE, (byte) 'Z'));
VectorMask<Byte> lat1Letter = upperA
.compare(LE, (byte) 0xDE) // <= Thorn
.and(upperA.compare(GE, (byte) 0XC0)) // >= A-grave
.and(upperA.compare(NE, (byte) 0xD7)); // Excluding multiplication
-------------
PR: https://git.openjdk.org/jdk/pull/12790
More information about the core-libs-dev
mailing list