RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4]

Srinivas Vamsi Parasa duke at openjdk.org
Thu Mar 28 00:45:33 UTC 2024


> The goal of this PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
> 
> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
> 
> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
> ComputePI.compute_pi_long_flt | 628.62 | 627.725 | 1.0
> 
> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0
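
For readers without the PR checked out, the following is a minimal sketch of the kind of kernel the benchmark names suggest: a series approximation of pi in which every iteration performs one numeric conversion (int to double, or float to double). It is not the ComputePI.java from the PR; the class name `ConvertKernelSketch`, the method names, and the choice of the Leibniz series are assumptions made purely for illustration.

```java
// Hypothetical illustration only; not the benchmark code from the PR.
public class ConvertKernelSketch {

    // Leibniz series: pi = 4 * sum((-1)^k / (2k + 1)).
    // The int denominator is converted to double on every iteration (I2D).
    static double computePiIntDbl(int terms) {
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 0; k < terms; k++) {
            sum += sign / (double) (2 * k + 1);
            sign = -sign;
        }
        return 4.0 * sum;
    }

    // Same series, but the denominator is kept as a float and widened
    // to double on every iteration (F2D). Odd integers stay exactly
    // representable in float up to 2^24, so keep terms below ~8 million.
    static double computePiFltDbl(int terms) {
        double sum = 0.0;
        double sign = 1.0;
        float denom = 1.0f;
        for (int k = 0; k < terms; k++) {
            sum += sign / (double) denom;
            denom += 2.0f;
            sign = -sign;
        }
        return 4.0 * sum;
    }

    public static void main(String[] args) {
        System.out.println(computePiIntDbl(1_000_000));
        System.out.println(computePiFltDbl(1_000_000));
    }
}
```

Because each loop iteration depends on a conversion, the cost of the convert instruction is likely to dominate such a loop, which would be consistent with the per-benchmark differences in the tables above. The Speedup column there is the Stock JDK time divided by the PR time, for example 2024.06 / 518.695 ≈ 3.9.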

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  fix L2F cvtsi2ssq

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18503/files
  - new: https://git.openjdk.org/jdk/pull/18503/files/fad7180e..970716f4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18503&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18503&range=02-03

  Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18503.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18503/head:pull/18503

PR: https://git.openjdk.org/jdk/pull/18503


More information about the hotspot-compiler-dev mailing list