RFR: 8303401: Add a Vector API equalsIgnoreCase micro benchmark [v6]
Eirik Bjorsnos
duke at openjdk.org
Wed Mar 1 09:32:28 UTC 2023
On Wed, 1 Mar 2023 09:10:47 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
>> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set of benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark serves as an example of how vectorization can be useful also in the area of text processing. It takes advantage of the fact that ASCII and Latin-1 were designed to optimize case-twiddling operations.
>>
>> The code came about during the work on #12632, where vectorization was deemed out of scope.
>>
>> Benchmark results:
>>
>>
>> Benchmark                             (size)  Mode  Cnt     Score    Error  Units
>> EqualsIgnoreCaseBenchmark.scalar          16  avgt   15    20.671 ± 0.718  ns/op
>> EqualsIgnoreCaseBenchmark.scalar          32  avgt   15    46.155 ± 3.258  ns/op
>> EqualsIgnoreCaseBenchmark.scalar          64  avgt   15    68.248 ± 1.767  ns/op
>> EqualsIgnoreCaseBenchmark.scalar         128  avgt   15   148.948 ± 0.890  ns/op
>> EqualsIgnoreCaseBenchmark.scalar        1024  avgt   15  1090.708 ± 7.540  ns/op
>> EqualsIgnoreCaseBenchmark.vectorized      16  avgt   15    21.872 ± 0.232  ns/op
>> EqualsIgnoreCaseBenchmark.vectorized      32  avgt   15    11.378 ± 0.097  ns/op
>> EqualsIgnoreCaseBenchmark.vectorized      64  avgt   15    13.703 ± 0.135  ns/op
>> EqualsIgnoreCaseBenchmark.vectorized     128  avgt   15    21.632 ± 0.735  ns/op
>> EqualsIgnoreCaseBenchmark.vectorized    1024  avgt   15   105.509 ± 7.493  ns/op
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
>
> The equal.allTrue check early in the loop does not cover cases where some bytes are equal, but not all. Reverting this change.
While using the compare method with the LE/GE/NE operators allows for cleaner code, it also seems to come with a significant performance penalty.
Is this to be expected?
Before, using lt, not:
Benchmark                             (size)  Mode  Cnt    Score    Error  Units
EqualsIgnoreCaseBenchmark.vectorized    1024  avgt   15   98.903 ± 1.508  ns/op
After, using compare with LE, GE, NE:
Benchmark                             (size)  Mode  Cnt    Score    Error  Units
EqualsIgnoreCaseBenchmark.vectorized    1024  avgt   15  119.723 ± 2.903  ns/op
The lt, not version:
    // Determine which bytes represent ASCII or Latin-1 letters:
    VectorMask<Byte> asciiLetter = upperA.lt((byte) '[').and(upperA.lt((byte) '@').not());
    VectorMask<Byte> lat1Letter = upperA
            .lt((byte) 0xDF)                     // <= Thorn
            .and(upperA.lt((byte) 0xBF).not())   // >= A-grave
            .and(upperA.eq((byte) 0xD7).not());  // Excluding multiplication
And the LE, GE, NE version:
    // Determine which bytes represent ASCII or Latin-1 letters:
    VectorMask<Byte> asciiLetter = upperA.compare(GE, (byte) 'A')  // >= 'A'
            .and(upperA.compare(LE, (byte) 'Z'));                  // <= 'Z'
    VectorMask<Byte> lat1Letter = upperA.compare(GE, (byte) 0xC0)  // >= A-grave
            .and(upperA.compare(LE, (byte) 0xDE))                  // <= Thorn
            .and(upperA.compare(NE, (byte) 0xD7));                 // Excluding multiplication
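
For readers unfamiliar with the case-twiddling trick, here is a scalar sketch of the check that the masks above express per lane. It is an illustration only, not the benchmark's code: the class and method names are hypothetical, and it compares unsigned int values where the vector code compares signed bytes. It assumes, as the snippets suggest, that upperA holds the input bytes with the 0x20 case bit cleared.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scalar illustration; names are not taken from the benchmark.
final class CaseTwiddle {

    // True if b, with the 0x20 case bit cleared, falls in 'A'..'Z' or in
    // the Latin-1 range A-grave..Thorn, excluding the multiplication sign.
    static boolean isLetter(byte b) {
        int upper = b & 0xDF; // clears bit 0x20 and widens to an unsigned value
        boolean ascii = upper >= 'A' && upper <= 'Z';
        boolean lat1 = upper >= 0xC0 && upper <= 0xDE && upper != 0xD7;
        return ascii || lat1;
    }

    // Two bytes match if they are equal, or if they differ only in the
    // 0x20 case bit and represent a letter.
    static boolean equalsIgnoreCase(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                continue;
            }
            boolean onlyCaseBitDiffers = ((a[i] ^ b[i]) & 0xFF) == 0x20;
            if (!onlyCaseBitDiffers || !isLetter(a[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = "Hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] y = "hELLO".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(equalsIgnoreCase(x, y));
        // '@' and '`' differ only in the case bit but are not letters.
        System.out.println(equalsIgnoreCase(new byte[] {'@'}, new byte[] {'`'}));
    }
}
```

The range checks are what keep pairs such as '@'/'`' or the multiplication and division signs (0xD7/0xF7) from being treated as case variants of each other.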
-------------
PR: https://git.openjdk.org/jdk/pull/12790