RFR: 8318650: Optimized subword gather for x86 targets. [v9]
Jatin Bhateja
jbhateja at openjdk.org
Mon Dec 18 06:01:12 UTC 2023
> Hi All,
>
> This patch optimizes sub-word gather operations for x86 targets with AVX2 and AVX512 features.
>
> Following is a summary of the changes:
>
> 1) Intrinsify sub-word gather using a hybrid algorithm which first partially unrolls a scalar loop to accumulate values from gather indices into a quadword (64-bit) slice, then uses a vector permutation to place the slice into the appropriate vector lanes. This prevents code bloat and generates a compact JIT sequence. Coupled with the savings from avoiding the expensive array allocation in the existing Java implementation, this translates into significant performance gains of 1.5-10x with the included micro-benchmark on Intel Atom family CPUs and with the JVM option UseAVX=2.
>
> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>
>
> 2) For AVX512 targets, the algorithm uses integral gather instructions to load values from normalized indices, which are multiples of the integer size, followed by shuffling and packing the exact sub-word values from the integral lanes.
>
> 3) The patch was also compared against a modified Java fallback implementation that replaces the temporary array allocation with a zero-initialized vector and a scalar loop which inserts gathered values into the vector. However, a vector insert into the higher vector lanes is a three-step process: it first extracts the upper 128-bit vector lane, updates it with the gathered sub-word value, and then inserts the lane back at its original position. This makes inserts into higher-order lanes costly compared to the proposed solution. In addition, the JIT code generated for the modified fallback implementation was very bulky, which may impact inlining decisions in caller contexts.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
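
Items 1) and 2) above can be modelled in plain scalar Java to make the data movement concrete. This is only an illustrative sketch, not code from the patch: the class and method names are hypothetical, a `long[]` stands in for the vector register, and the reading of "normalized indices" as the byte index rounded down to a multiple of 4 is an assumption. The real intrinsic emits x86 instructions in the JIT and does not go through Java arrays like this.

```java
import java.util.Arrays;

// Hypothetical scalar models of a byte gather; none of these names come from the patch.
class SubwordGatherSketch {

    // Reference semantics: lane i receives src[offset + indexMap[i]].
    static byte[] gatherScalar(byte[] src, int offset, int[] indexMap) {
        byte[] lanes = new byte[indexMap.length];
        for (int i = 0; i < lanes.length; i++) {
            lanes[i] = src[offset + indexMap[i]];
        }
        return lanes;
    }

    // Model of item 1: accumulate eight gathered bytes into one 64-bit slice,
    // then store the whole slice. Assumes indexMap.length is a multiple of 8.
    static long[] gatherIntoSlices(byte[] src, int offset, int[] indexMap) {
        long[] slices = new long[indexMap.length / 8];
        for (int s = 0; s < slices.length; s++) {
            long slice = 0L;
            for (int j = 0; j < 8; j++) {
                long b = src[offset + indexMap[s * 8 + j]] & 0xFFL;
                slice |= b << (8 * j);   // byte j of the slice, little-endian order
            }
            slices[s] = slice;           // stands in for the permute that places the slice
        }
        return slices;
    }

    // Expands the 64-bit slices back into byte lanes for comparison.
    static byte[] unpackSlices(long[] slices) {
        byte[] lanes = new byte[slices.length * 8];
        for (int i = 0; i < lanes.length; i++) {
            lanes[i] = (byte) (slices[i / 8] >>> (8 * (i % 8)));
        }
        return lanes;
    }

    // Model of item 2 (assumed interpretation): load the 32-bit word that
    // contains the requested byte, then shift and narrow to extract it.
    static byte[] gatherViaIntLoads(byte[] src, int offset, int[] indexMap) {
        byte[] lanes = new byte[indexMap.length];
        for (int i = 0; i < lanes.length; i++) {
            int idx = offset + indexMap[i];
            int word = loadIntLE(src, idx & ~3);           // index rounded down to a multiple of 4
            lanes[i] = (byte) (word >>> (8 * (idx & 3)));  // pick the byte inside the word
        }
        return lanes;
    }

    // Little-endian 32-bit load; bytes past the end of the array read as zero.
    static int loadIntLE(byte[] a, int base) {
        int word = 0;
        for (int k = 0; k < 4 && base + k < a.length; k++) {
            word |= (a[base + k] & 0xFF) << (8 * k);
        }
        return word;
    }

    public static void main(String[] args) {
        byte[] src = new byte[64];
        for (int i = 0; i < src.length; i++) src[i] = (byte) (i * 7 - 100);
        int[] indexMap = {5, 0, 63, 17, 2, 2, 40, 9, 1, 8, 62, 33, 4, 7, 21, 13};
        byte[] expected = gatherScalar(src, 0, indexMap);
        // Both models should agree with the reference gather.
        System.out.println(Arrays.equals(expected, unpackSlices(gatherIntoSlices(src, 0, indexMap))));
        System.out.println(Arrays.equals(expected, gatherViaIntLoads(src, 0, indexMap)));
    }
}
```

The slice model shows why the hybrid approach avoids per-element vector inserts: each group of eight bytes is assembled in a general-purpose register and placed with a single operation, instead of paying the extract/update/insert cost described in item 3) for every lane.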
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
Removing JDK-8321648 related changes.
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/16354/files
- new: https://git.openjdk.org/jdk/pull/16354/files/a6f0f8cf..4af776e8
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=07-08
Stats: 16 lines in 1 file changed: 14 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/16354.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354
PR: https://git.openjdk.org/jdk/pull/16354