RFR: 8279508: Auto-vectorize Math.round API [v9]

Sat Feb 26 01:33:54 UTC 2022

On Fri, 25 Feb 2022 06:22:42 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend the auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
>> - Test creation using new IR testing framework.
>> 
>> The following are the performance numbers of a JMH microbenchmark included with the patch:
>> 
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>> 
>> 
>> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio
>> -- | -- | -- | -- | -- | -- | -- | --
>> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07
>> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93
>> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02
>> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8279508: Adding descriptive comments.

Other than the comments below, the patch looks good to me. What testing have you done?

src/hotspot/cpu/x86/x86.ad line 7263:

> 7261:     __ vector_round_float_avx($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister,
> 7262:                               $xtmp2$$XMMRegister, $xtmp3$$XMMRegister, $xtmp4$$XMMRegister,
> 7263:                               ExternalAddress(vector_float_signflip()), new_mxcsr, $scratch$$Register, vlen_enc);

The vector_float_signflip() here should be replaced by vector_all_bits_set().
cvtps2dq description:
If a converted result cannot be represented in the destination
format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value
(2^(w-1), where w represents the number of bits in the destination format) is returned.
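
To make the quoted description concrete for the float case (w = 32), here is a small sketch in plain Java. It only prints the 2^(w-1) bit pattern next to what scalar Math.round(float) returns for inputs that do not fit in an int. The class name IndefiniteFloat and the helper indefiniteInt() are made up for illustration and are not part of the patch.

```java
public class IndefiniteFloat {
    // Hypothetical helper: the integer "indefinite" value for a 32-bit
    // destination, i.e. 2^(w-1) with w = 32, viewed as a bit pattern.
    static int indefiniteInt() {
        return 1 << 31; // 0x80000000, the same bits as Integer.MIN_VALUE
    }

    public static void main(String[] args) {
        // Inputs whose rounded value is not representable as an int,
        // plus one ordinary value for comparison.
        float[] inputs = { Float.NaN, Float.POSITIVE_INFINITY, 3.0e10f, -3.0e10f, 2.5f };
        for (float f : inputs) {
            // Scalar Java semantics: NaN rounds to 0, and out-of-range
            // values saturate to Integer.MAX_VALUE or Integer.MIN_VALUE.
            System.out.printf("%-12s Math.round = %d%n", f, Math.round(f));
        }
        System.out.printf("indefinite value = 0x%08X%n", indefiniteInt());
    }
}
```

The Java results for NaN and for positive overflow differ from the 0x80000000 pattern, so a vectorized Math.round presumably has to treat lanes in which the conversion was invalid as special cases.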

src/hotspot/cpu/x86/x86.ad line 7280:

> 7278:     __ vector_round_float_evex($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister,
> 7279:                                $xtmp2$$XMMRegister, $ktmp1$$KRegister, $ktmp2$$KRegister,
> 7280:                                ExternalAddress(vector_float_signflip()), new_mxcsr, $scratch$$Register, vlen_enc);

The vector_float_signflip() here should be replaced by vector_all_bits_set().

src/hotspot/cpu/x86/x86.ad line 7295:

> 7293:     __ vector_round_double_evex($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister,
> 7294:                                 $xtmp2$$XMMRegister, $ktmp1$$KRegister, $ktmp2$$KRegister,
> 7295:                                 ExternalAddress(vector_double_signflip()), new_mxcsr, $scratch$$Register, vlen_enc);

The vector_double_signflip() here should be replaced by vector_all_bits_set().
vcvtpd2qq description:
If a converted result cannot be represented in the destination
format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value
(2^(w-1), where w represents the number of bits in the destination format) is returned.
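
For the double case the destination is 64 bits wide, so the same formula gives 2^63. A minimal sketch, again with made-up names (IndefiniteLong, indefiniteLong()) that do not appear in the patch:

```java
public class IndefiniteLong {
    // Hypothetical helper: 2^(w-1) with w = 64, viewed as a bit pattern.
    static long indefiniteLong() {
        return 1L << 63; // 0x8000000000000000, the same bits as Long.MIN_VALUE
    }

    public static void main(String[] args) {
        // Doubles whose rounded value is not representable as a long.
        double[] inputs = { Double.NaN, Double.POSITIVE_INFINITY, 1.0e300, -1.0e300 };
        for (double d : inputs) {
            // Scalar Java semantics: NaN rounds to 0, out-of-range values
            // saturate to Long.MAX_VALUE or Long.MIN_VALUE.
            System.out.printf("%-10s Math.round = %d%n", d, Math.round(d));
        }
        System.out.printf("indefinite value = 0x%016X%n", indefiniteLong());
    }
}
```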

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094