RFR: 8327147: optimized implementation of round operation for x86_64 CPUs
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Mar 1 22:27:52 UTC 2024
On Fri, 1 Mar 2024 21:46:37 GMT, Dean Long <dlong at openjdk.org> wrote:
>> The goal of this PR is to provide a ~4x faster implementation of Math.ceil, Math.floor and Math.rint for x86_64 CPUs.
>>
>> Below is the performance data on an Intel Tiger Lake machine.
>>
>> Benchmark | Stock JDK (ops/ms) | This PR (ops/ms) | Speedup
>> -- | -- | -- | --
>> MathBench.ceilDouble | 547979 | 2170198 | 3.96
>> MathBench.floorDouble | 547979 | 2167459 | 3.96
>> MathBench.rintDouble | 547962 | 2130499 | 3.89
>
> src/hotspot/cpu/x86/x86.ad line 3895:
>
>> 3893:
>> 3894: /*
>> 3895: instruct roundD_mem(legRegD dst, memory src, immU8 rmode) %{
>
> Don't we want roundD_mem enabled, for both roundsd (UseAVX == 0) and vroundsd (UseAVX > 0)?
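The `rmode` operand of roundD_mem and the `mode` immediate in the assembly select the rounding direction. Here is a sketch that assumes `rmode` carries the x86 ROUNDSD/VROUNDSD imm8 rounding-control value (bits 1:0, with bit 2 clear so the immediate rather than MXCSR.RC picks the mode). It maps each mode to the Java method with the same semantics. `RoundModeSketch` and `forMode` are hypothetical names used only for illustration.

```java
import java.util.function.DoubleUnaryOperator;

public class RoundModeSketch {
    // Hypothetical helper: returns the Java operation matching the rounding
    // direction selected by imm8 bits 1:0 of ROUNDSD/VROUNDSD.
    // Assumes imm8 bit 2 is clear, so the immediate (not MXCSR.RC) picks the mode.
    static DoubleUnaryOperator forMode(int imm8) {
        switch (imm8 & 0b11) {
            case 0:  return Math::rint;   // round to nearest, ties to even
            case 1:  return Math::floor;  // round toward negative infinity
            case 2:  return Math::ceil;   // round toward positive infinity
            default: return x -> x < 0 ? Math.ceil(x) : Math.floor(x); // toward zero
        }
    }

    public static void main(String[] args) {
        double[] inputs = {2.5, -2.5, 1.2, -1.2};
        for (int mode = 0; mode <= 3; mode++) {
            StringBuilder line = new StringBuilder("mode " + mode + ":");
            for (double x : inputs) {
                line.append(' ').append(forMode(mode).applyAsDouble(x));
            }
            System.out.println(line);
        }
    }
}
```

Mode 0 rounds 2.5 to 2.0 (ties to even), which is why Math.rint, not Math.round, is the matching Java method. Mode 3 (truncate) has no direct Math counterpart and is not one of the operations this PR targets.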
@dean-long The roundD_mem instruct is the cause of the slow performance because of a false dependency. It generates instructions of the following forms, each of which writes a 128-bit result:
roundsd xmm0, memory_src, mode ; UseAVX == 0
vroundsd xmm0, xmm0, memory_src, mode ; UseAVX > 0
xmm0 bits 0:63 hold the result of the round operation on memory_src.
xmm0 bits 64:127 are taken from the old value of xmm0, so the instruction must wait for that value even though only bits 0:63 are used (a false dependency).
Forcing the load of memory_src into a register before the round operation, as below, removes the false dependency:
vmovsd xmm0, memory_src ; bits 64 and above are cleared by vmovsd
vroundsd xmm0, xmm0, xmm0, mode
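The following toy Java model makes this data flow concrete. It is not a hardware simulation and captures nothing about timing. It only tracks which inputs each 64-bit half of xmm0 is computed from, following the documented behavior: roundsd leaves bits 64:127 of the destination unchanged, vroundsd copies them from its first source, and a movsd/vmovsd load from memory zeroes them. `FalseDependencySketch`, `Xmm`, and the method names are hypothetical. The record requires Java 16 or later.

```java
import java.util.function.DoubleUnaryOperator;

public class FalseDependencySketch {
    // Toy model of a 128-bit XMM register as two 64-bit halves:
    // lo = bits 0:63, hi = bits 64:127. Models data flow only, not timing.
    record Xmm(double lo, double hi) {}

    // roundsd xmm0, m64, imm / vroundsd xmm0, xmm0, m64, imm:
    // lo = rounded memory operand, hi = copied from the OLD xmm0,
    // so the result depends on whatever xmm0 held before.
    static Xmm roundFromMemory(Xmm oldXmm0, double mem, DoubleUnaryOperator round) {
        return new Xmm(round.applyAsDouble(mem), oldXmm0.hi());
    }

    // vmovsd xmm0, m64: lo = memory operand, hi = zero.
    static Xmm movsdLoad(double mem) {
        return new Xmm(mem, 0.0);
    }

    // vroundsd xmm0, xmm0, xmm0, imm: lo = round(src.lo), hi = src.hi.
    static Xmm roundRegister(Xmm src, DoubleUnaryOperator round) {
        return new Xmm(round.applyAsDouble(src.lo()), src.hi());
    }

    // Load-then-round sequence. oldXmm0 is accepted only to show it is never read.
    static Xmm loadThenRound(Xmm oldXmm0, double mem, DoubleUnaryOperator round) {
        return roundRegister(movsdLoad(mem), round);
    }

    public static void main(String[] args) {
        Xmm stale1 = new Xmm(7.0, 123.0);
        Xmm stale2 = new Xmm(7.0, -4.0);
        double mem = 2.7;
        // Memory form: the full result changes with the stale upper half.
        System.out.println(roundFromMemory(stale1, mem, Math::floor));
        System.out.println(roundFromMemory(stale2, mem, Math::floor));
        // Load-then-round form: same result whatever xmm0 held before.
        System.out.println(loadThenRound(stale1, mem, Math::floor));
        System.out.println(loadThenRound(stale2, mem, Math::floor));
    }
}
```

In all four cases the low half, the only part the compiled Java code consumes, is 2.0. Only the memory form carries the old upper half of xmm0 into its result, and that input is what the hardware has to wait for.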
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18089#discussion_r1509635469