RFR: 8335444: Generalize implementation of AndNode mul_ring
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Tue Aug 6 04:07:36 UTC 2024
On Mon, 5 Aug 2024 08:03:52 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:
>> Thank you for running testing @chhagedorn! I think I didn't run into this because my device doesn't support AVX-512. Does the failure have an ideal node printout as well? I think that could help in diagnosing the issue. Thanks!
>
> @jaskarth out of curiosity: could you by chance notice any measurable performance difference (e.g. for specific/ad-hoc benchmarks)?
@dafedafe I added a microbenchmark based on the case I saw above, and got these results:
Benchmark                                                   (COUNT)  (seed)  Mode  Cnt  Baseline (ns/op)     Patch (ns/op)  Improvement
TypeVectorOperations.TypeVectorOperationsNonSuperWord.andZ      512       0  avgt    8   155.288 ± 1.175   188.844 ± 4.189       +19.5%
TypeVectorOperations.TypeVectorOperationsNonSuperWord.andZ     2048       0  avgt    8   629.098 ± 7.489   732.558 ± 3.983       +15.2%
TypeVectorOperations.TypeVectorOperationsSuperWord.andZ         512       0  avgt    8    22.917 ± 0.338    23.578 ± 1.003        +2.8%
TypeVectorOperations.TypeVectorOperationsSuperWord.andZ        2048       0  avgt    8    35.683 ± 0.232    37.546 ± 1.063        +5.1%
In general, though, I've found it unfortunately pretty difficult to pinpoint specific places where performance improves, because rather than improving nodes locally, this analysis strengthens other idealizations that use int types. With a more precise type, we may be able to find more operations that evaluate to constants or prune redundant comparisons, either directly or through another node that transforms the type further. I've been wanting to make our type analysis stronger so that we can find more nontrivial optimizations without needing specialized idealization rules.
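To make the idea of "improving the type" concrete, here is a minimal sketch in C++ of range propagation for an integer AND, followed by a comparison that folds once the range is tight. It is a simplified model under stated assumptions: IntRange, and_range, Tri and fold_lt_const are hypothetical names, and this is not HotSpot's TypeInt or the actual AndINode::mul_ring from this patch, which may track bounds quite differently.

```cpp
// Simplified sketch of range (type) propagation for an integer AND.
// Hypothetical names: IntRange, and_range, Tri, fold_lt_const.
// NOT HotSpot's TypeInt or AndINode::mul_ring; it only models the idea
// of computing a sound [lo, hi] bound for x & y from the input bounds.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

struct IntRange {
  int32_t lo;
  int32_t hi;
};

// Sound bounds for (a & b), given a in [a.lo, a.hi] and b in [b.lo, b.hi].
IntRange and_range(IntRange a, IntRange b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  const int32_t kMin = std::numeric_limits<int32_t>::min();
  // AND only clears bits: if an operand is non-negative, the sign bit of
  // the result is clear and the result cannot exceed that operand.
  if (a.lo >= 0 && b.lo >= 0) return {0, std::min(a.hi, b.hi)};
  if (a.lo >= 0) return {0, a.hi};
  if (b.lo >= 0) return {0, b.hi};
  // Both operands always negative: the sign bit survives, and clearing
  // other bits of a negative number can only make it smaller.
  if (a.hi < 0 && b.hi < 0) return {kMin, std::min(a.hi, b.hi)};
  // Signs unknown: the result never exceeds the larger upper bound.
  return {kMin, std::max(a.hi, b.hi)};
}

enum class Tri { kFalse, kTrue, kUnknown };

// Decide "v < c" from v's range alone, as a constant-folding step might.
Tri fold_lt_const(IntRange v, int32_t c) {
  if (v.hi < c) return Tri::kTrue;
  if (v.lo >= c) return Tri::kFalse;
  return Tri::kUnknown;
}
```

For example, with x unconstrained, the range of x & 0xF narrows to [0, 15], so a later (x & 0xF) < 16 check folds to true. That win shows up at the comparison, not at the AND node, which is why such improvements are hard to attribute to one place.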
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2270339413
More information about the hotspot-compiler-dev mailing list