RFR: 8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup [v2]
Corey Ashford
cashford at openjdk.java.net
Sat Jul 17 02:01:23 UTC 2021
> This series of commits was written to accomplish several cleanup and Power10 optimization tasks for the Base64 decodeBlock intrinsic for Power64:
> * Remove the ISA 3.1+ (Power10+) pextd instruction optimization in decodeBlock. This "optimization" turned out to actually cause a performance hit. Removing it gains back about 3% in performance.
> * Introduce a constant block, similar to that in use by encodeBlock() to speed up constant loading.
> * Add the ISA 3.1+ xxpermx instruction and align_prefix() method for use in a Power10 optimization for decodeBlock. Please see the commit log for my concerns about this change.
> * Implement the xxpermx-based decodeBlock algorithm for Power10+, which gives about a 5% performance boost.
>
> More details can be found in the commit logs.
>
> I want to note here that I looked into changing the loop_unrolls constant, and found that at large buffer sizes, the values of 2 and 4 give some extra performance gain. For example, on a 20001-byte destination buffer, I see an increase from 4.7X over intrinsic disabled (loop_unrolls=1), to 5.3X over intrinsic disabled (loop_unrolls=4), but on smaller buffer sizes, up to about 512, it causes performance degradation over loop_unrolls=1, so I have decided to stick with the original value of 1, since I don't know where to focus the performance versus buffer length tradeoff.
Corey Ashford has updated the pull request incrementally with one additional commit since the last revision:
stubGenerator_ppc.cpp: fixes for feedback from Kazunori Ogata and Martin Doerr
decodeBlock changes:
* Remove unroll loop and associated comments
* Change comments referring to "P10+" to "P10 (or later)" to remove ambiguity
* Make clear the lack of need for "aligh_prefix()" calls when using xxpermx due to the align(32) call
The following change isn't based on feedback:
encodeBlock changes:
* Fix a comment that still referred to "unrolled loop". Unrolling was removed in an earlier commit.
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/4762/files
- new: https://git.openjdk.java.net/jdk/pull/4762/files/bfa1d3be..1400e26d
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4762&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4762&range=00-01
Stats: 275 lines in 1 file changed: 103 ins; 127 del; 45 mod
Patch: https://git.openjdk.java.net/jdk/pull/4762.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4762/head:pull/4762
PR: https://git.openjdk.java.net/jdk/pull/4762
More information about the hotspot-dev
mailing list