RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v6]
Srinivas Vamsi Parasa
duke at openjdk.java.net
Wed May 18 07:03:57 UTC 2022
On Wed, 18 May 2022 05:52:02 GMT, Srinivas Vamsi Parasa <duke at openjdk.java.net> wrote:
>> Impressive. A few comments.
>>
>> You are testing the performance of storing `boolean` results into an array, but usually these Java methods are used in conditions. Measuring that would be a more real-world case. Do this for both cases: with `avx512dq` on and off.
>>
>> And you need to post your perf results, at least in the RFE. Please also show which instructions are currently generated vs. with your changes. I don't get how you made `isNaN()` faster; you generate more instructions, it seems.
>>
>> Instead of 3 new Ideal nodes per type, you can use one and store the intrinsic id (or another enum) in a field, which you can read in the `.ad` file instructions. Instead, I suggest splitting those mach instructions based on `avx512dq` support to avoid killing unused registers.
>>
>> Why is Double type support limited to LP64? Why are there no `x86_32.ad` changes?
>>
>> You can reuse `tmp1` in `double_class_check()`.
>
> Hi Vladimir (@vnkozlov),
> Sorry for the delay!
> As per your suggestions, I updated the JMH benchmarks to use these Java methods in conditions and updated the RFE with performance data with and without the `vfpclassss`/`vfpclasssd` instructions. I also removed the redundant `temp2`, as you suggested.
> I will work on condensing the 3 nodes into one and adding support in `x86_32.ad`.
>
> Thanks,
> Vamsi
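
For readers following the thread, the two benchmark shapes discussed above (storing `boolean` results into an array vs. using the result in a condition) might look roughly like this. It is a plain-Java sketch, not the JMH code from the PR; the class and method names are hypothetical.

```java
// Hypothetical illustration only; the real benchmarks in the PR use JMH.
class FpClassifyShapes {
    // Shape 1: materialize each boolean result and store it into an array.
    static void storeResults(double[] in, boolean[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.isNaN(in[i]);
        }
    }

    // Shape 2: use the result directly as a branch condition,
    // which is how these methods are typically called in application code.
    static int countInCondition(double[] in) {
        int count = 0;
        for (double v : in) {
            if (Double.isNaN(v)) {
                count++;
            }
        }
        return count;
    }
}
```

The same two shapes apply to `isFinite` and `isInfinite`, and to the `Float` variants.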
> @vamsi-parasa Yes, #8525 changes the matching of `Bool` to `cmpOpUCF` for eq/ne where both inputs are the same and adds `CMove` rules for `cmpOpUCF2`. This should prevent floating-point comparisons from matching `cmpOpU`, which carries the significant overhead of fixing up the flags.
Glad to know that you removed the overhead of fixing up the flags register.
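
As background on the "eq/ne where both inputs are the same" case: at the Java level a NaN check can be written as a self-comparison, since NaN is the only value that is not equal to itself. The sketch below shows that form next to bit-level checks based on the IEEE 754 binary64 layout. It is an illustration under those assumptions, not the HotSpot implementation, and all names in it are hypothetical.

```java
// Hypothetical sketch; not the JDK or HotSpot code.
class FpClassSketch {
    static final long EXP_MASK  = 0x7FF0000000000000L; // 11 exponent bits
    static final long MANT_MASK = 0x000FFFFFFFFFFFFFL; // 52 mantissa bits

    // NaN is the only double for which v != v holds.
    static boolean isNaNBySelfCompare(double v) {
        return v != v;
    }

    // NaN: exponent all ones, mantissa non-zero.
    static boolean isNaNByBits(double v) {
        long bits = Double.doubleToRawLongBits(v);
        return (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0;
    }

    // Infinity: exponent all ones, mantissa zero (either sign).
    static boolean isInfiniteByBits(double v) {
        long bits = Double.doubleToRawLongBits(v);
        return (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) == 0;
    }

    // Finite: exponent not all ones.
    static boolean isFiniteByBits(double v) {
        long bits = Double.doubleToRawLongBits(v);
        return (bits & EXP_MASK) != EXP_MASK;
    }
}
```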
-------------
PR: https://git.openjdk.java.net/jdk/pull/8459