RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations

Jatin Bhateja jbhateja at openjdk.java.net
Fri Dec 4 18:34:24 UTC 2020


Newly allocated memory is initialized either with user-provided values for its fields or by setting the memory to zero, as required by Java semantics (system initialization).
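The two kinds of initialization can be observed from plain Java. The following is a minimal sketch; the class ZeroInitDemo and its nested Node class are hypothetical names used only for illustration and are not part of the patch.

```java
public class ZeroInitDemo {
    // Hypothetical class used only for this illustration.
    public static class Node {
        public int id;        // no initializer: defaults to 0
        public long weight;   // no initializer: defaults to 0L
        public Object next;   // no initializer: defaults to null
        public int level = 3; // user-provided initialization value
    }

    // Returns true if every element of the array is zero.
    public static boolean allZero(long[] a) {
        for (long v : a) {
            if (v != 0L) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] fresh = new long[7]; // array body is zeroed on allocation
        Node n = new Node();
        System.out.println("array all zero: " + allZero(fresh));
        System.out.println("id=" + n.id + " weight=" + n.weight
                + " next=" + n.next + " level=" + n.level);
    }
}
```

Only the zeroing part is what the rest of this mail refers to as system initialization.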

The C2 compiler creates a ClearArray node to perform system initialization. ClearArray accepts the number of heap words to be initialized; this number can be a constant or a non-constant value. For a constant number of heap words covering less than InitArrayShortSize (default value 64 bytes), the compiler currently generates StoreL nodes, which perform the initialization at a granularity of 8 bytes.

This patch vectorizes the initializing store operations for a constant number of heap words covering less than InitArrayShortSize by emitting a special instruction sequence for the various tail sizes.

In addition, the existing initialization implementation under UseXMMForObjInit is extended to use masked operations to optimize the tail initialization sequence. If AVX3Threshold is set to 0, the new initialization sequence uses 64-byte ZMM registers.
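To make the idea of a masked tail concrete, here is a rough arithmetic sketch in Java. It is not the code from the patch: the class ClearPlanSketch and its methods are hypothetical, and it assumes 8-byte heap words with one mask bit per 8-byte lane. For a given word count it compares the number of 8-byte scalar stores with the number of full vector stores plus the lane mask needed for the remaining tail.

```java
public class ClearPlanSketch {
    // Assumption: 64-bit VM, so one heap word is 8 bytes.
    static final int HEAP_WORD_BYTES = 8;

    // Number of 8-byte lanes in a vector register of the given width.
    public static int lanes(int vectorBytes) {
        return vectorBytes / HEAP_WORD_BYTES;
    }

    // Scalar approach: one 8-byte store per heap word.
    public static int scalarStores(int words) {
        return words;
    }

    // Number of unmasked, full-width vector stores.
    public static int fullVectorStores(int words, int vectorBytes) {
        return words / lanes(vectorBytes);
    }

    // Mask with the low `tail` bits set, selecting the leftover lanes
    // for a single masked store; 0 means no tail store is needed.
    public static int tailMask(int words, int vectorBytes) {
        int tail = words % lanes(vectorBytes);
        return (1 << tail) - 1;
    }

    public static void main(String[] args) {
        int[] widths = {32, 64}; // YMM-sized and ZMM-sized stores
        for (int words = 3; words <= 7; words++) {
            for (int w : widths) {
                System.out.printf("words=%d width=%d scalar=%d full=%d tailMask=0x%x%n",
                        words, w, scalarStores(words),
                        fullVectorStores(words, w), tailMask(words, w));
            }
        }
    }
}
```

Under these assumptions, 7 heap words need seven scalar stores, but only one masked 64-byte store (mask 0x7f), or one full 32-byte store followed by one masked 32-byte store (mask 0x7).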

The following performance numbers were collected using the micro-benchmark included with the patch.

Testing : Tier1-Tier3 level tests are clean.

System configuration: Cascade Lake, Intel Xeon Platinum 8280L @ 2.7 GHz, 2 sockets, 28 cores per socket.

Baseline:
Benchmark                      Mode  Cnt       Score       Error  Units
ClearMemory.testClearMemory3  thrpt   10  212508.522 ± 14071.493  ops/s
ClearMemory.testClearMemory4  thrpt   10  189530.643 ± 12882.421  ops/s
ClearMemory.testClearMemory5  thrpt   10  167878.803 ± 10307.163  ops/s
ClearMemory.testClearMemory6  thrpt   10  152732.184 ±  8740.128  ops/s
ClearMemory.testClearMemory7  thrpt   10  132111.536 ±  5493.043  ops/s

With Optimization:

Benchmark                      Mode  Cnt       Score       Error  Units
ClearMemory.testClearMemory3  thrpt   10  220378.082 ± 18533.701  ops/s
ClearMemory.testClearMemory4  thrpt   10  198023.913 ± 15995.780  ops/s
ClearMemory.testClearMemory5  thrpt   10  183476.886 ± 13488.821  ops/s
ClearMemory.testClearMemory6  thrpt   10  161710.750 ±  9270.182  ops/s
ClearMemory.testClearMemory7  thrpt   10  145059.426 ±  8217.484  ops/s
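
As a reading aid, the relative throughput change implied by the two tables can be computed from the Score columns. The sketch below only does that arithmetic on the numbers quoted above; the class name ClearMemoryGain is made up for illustration, and the calculation ignores the reported error margins, which are large relative to some of the differences.

```java
public class ClearMemoryGain {
    // Scores (ops/s) copied from the Baseline table, testClearMemory3..7.
    static final double[] BASELINE = {
        212508.522, 189530.643, 167878.803, 152732.184, 132111.536
    };
    // Scores (ops/s) copied from the With Optimization table.
    static final double[] OPTIMIZED = {
        220378.082, 198023.913, 183476.886, 161710.750, 145059.426
    };

    // Relative change of `after` over `before`, in percent.
    public static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BASELINE.length; i++) {
            System.out.printf("testClearMemory%d: %+.2f%%%n",
                    i + 3, gainPercent(BASELINE[i], OPTIMIZED[i]));
        }
    }
}
```

On these scores the change ranges from a little under 4% (testClearMemory3) to a little under 10% (testClearMemory7).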

-------------

Commit messages:
 - 8257772: Vectorizing clear memory operation using AVX-512 masked operations

Changes: https://git.openjdk.java.net/jdk/pull/1631/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1631&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8257772
  Stats: 349 lines in 7 files changed: 321 ins; 3 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1631.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1631/head:pull/1631

PR: https://git.openjdk.java.net/jdk/pull/1631


More information about the hotspot-compiler-dev mailing list