RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v4]

Jatin Bhateja jbhateja at openjdk.java.net
Tue Dec 8 18:19:24 UTC 2020


> Newly allocated memory is initialized either with user-provided values for individual fields or by setting the memory to zero, as required by Java semantics (system initialization).
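
As an illustration of the two kinds of initialization mentioned above, here is a minimal Java sketch. The class and member names (ZeroInitDemo, Point, freshArrayIsZero) are hypothetical and not part of the patch. Fields and array elements without an explicit initializer observe the default zero value, which is the case the JIT has to cover by clearing memory.

```java
public class ZeroInitDemo {
    // Hypothetical class, for illustration only.
    static class Point {
        int x;          // no initializer: default value 0
        long y = 42L;   // user-provided initialization value
        Object ref;     // no initializer: default value null
    }

    // Returns true if every element of a freshly allocated long[] is zero.
    static boolean freshArrayIsZero(int length) {
        long[] a = new long[length];
        for (long v : a) {
            if (v != 0L) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Point p = new Point();
        System.out.println(p.x + " " + p.y + " " + p.ref); // prints "0 42 null"
        System.out.println(freshArrayIsZero(7));            // prints "true"
    }
}
```
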
> 
> The C2 compiler creates a ClearArray node to perform system initialization. ClearArray accepts the number of heap words to be initialized; this number can be a constant or a non-constant value. For a constant number of heap words smaller than InitArrayShortSize (default value 64 bytes), the compiler currently generates StoreL nodes, which initialize the memory at a granularity of 8 bytes.
> 
> This patch vectorizes the initializing store operations for a constant number of heap words smaller than InitArrayShortSize by emitting special instruction sequences for the various tail sizes.
> 
> In addition, the existing initialization implementation under UseXMMForObjInit is extended to use masked operations to optimize the tail initialization sequence. If AVX3Threshold is set to 0, the new initialization sequence uses 64-byte ZMM registers.
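
The idea of a masked tail store can be modeled in plain Java. This is only a conceptual sketch, not the instruction sequence HotSpot emits: the class MaskedClearModel and its methods are hypothetical, a heap word is assumed to be 8 bytes (64-bit JVM), and the 32-byte vector width used in the example is an assumption chosen to contrast with the 64-byte ZMM case. A clear of N heap words becomes some number of full-width vector stores plus one store whose mask enables only the remaining 8-byte lanes.

```java
public class MaskedClearModel {
    static final int HEAP_WORD_BYTES = 8; // assumption: 64-bit JVM

    // Number of full-width vector stores needed for `words` heap words.
    static int fullVectorStores(int words, int vectorBytes) {
        int lanes = vectorBytes / HEAP_WORD_BYTES;
        return words / lanes;
    }

    // Lane mask for the tail: one set bit per remaining 8-byte lane.
    static int tailMask(int words, int vectorBytes) {
        int lanes = vectorBytes / HEAP_WORD_BYTES;
        int tail = words % lanes;
        return (1 << tail) - 1;
    }

    // Models the clear: full stores first, then a single masked store.
    static void clear(long[] mem, int words, int vectorBytes) {
        int lanes = vectorBytes / HEAP_WORD_BYTES;
        int pos = 0;
        int full = fullVectorStores(words, vectorBytes);
        for (int i = 0; i < full; i++) {
            for (int lane = 0; lane < lanes; lane++) {
                mem[pos + lane] = 0L;
            }
            pos += lanes;
        }
        int mask = tailMask(words, vectorBytes);
        for (int lane = 0; lane < lanes; lane++) {
            // Lanes whose mask bit is clear are never written.
            if ((mask & (1 << lane)) != 0) {
                mem[pos + lane] = 0L;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fullVectorStores(7, 32));                  // prints "1"
        System.out.println(Integer.toBinaryString(tailMask(7, 32)));  // prints "111"
        System.out.println(Integer.toBinaryString(tailMask(7, 64)));  // prints "1111111"
    }
}
```

In this model, 7 heap words with 32-byte vectors need one full store plus a masked store covering 3 lanes, while with 64-byte vectors a single masked store covering 7 lanes is enough.
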
> 
> The following performance numbers were collected using the micro-benchmark included with the patch.
> 
> Testing: Tier1-Tier3 tests are clean.
> 
> System configuration: Cascade Lake, Intel Xeon Platinum 8280L @ 2.7 GHz, 2 sockets, 28 cores per socket.
> 
> Baseline:
> Benchmark                      Mode  Cnt       Score       Error  Units
> ClearMemory.testClearMemory3  thrpt   10  212508.522 ± 14071.493  ops/s
> ClearMemory.testClearMemory4  thrpt   10  189530.643 ± 12882.421  ops/s
> ClearMemory.testClearMemory5  thrpt   10  167878.803 ± 10307.163  ops/s
> ClearMemory.testClearMemory6  thrpt   10  152732.184 ±  8740.128  ops/s
> ClearMemory.testClearMemory7  thrpt   10  132111.536 ±  5493.043  ops/s
> 
> With Optimization:
> 
> Benchmark                      Mode  Cnt       Score       Error  Units
> ClearMemory.testClearMemory3  thrpt   10  220378.082 ± 18533.701  ops/s
> ClearMemory.testClearMemory4  thrpt   10  198023.913 ± 15995.780  ops/s
> ClearMemory.testClearMemory5  thrpt   10  183476.886 ± 13488.821  ops/s
> ClearMemory.testClearMemory6  thrpt   10  161710.750 ±  9270.182  ops/s
> ClearMemory.testClearMemory7  thrpt   10  145059.426 ±  8217.484  ops/s
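
For readers who want the relative change rather than raw throughput, the small Java sketch below derives the percentage gain per benchmark from the scores quoted above. The class name ClearMemorySpeedup and its helper are hypothetical. It uses only the mean scores and ignores the reported error margins, which are wide enough that the baseline and optimized intervals overlap for each benchmark.

```java
import java.util.Locale;

public class ClearMemorySpeedup {
    static final String[] NAMES = {
        "testClearMemory3", "testClearMemory4", "testClearMemory5",
        "testClearMemory6", "testClearMemory7"
    };
    // Mean scores in ops/s, copied from the two tables above.
    static final double[] BASELINE = {
        212508.522, 189530.643, 167878.803, 152732.184, 132111.536
    };
    static final double[] OPTIMIZED = {
        220378.082, 198023.913, 183476.886, 161710.750, 145059.426
    };

    // Relative throughput gain in percent.
    static double gainPercent(double baseline, double optimized) {
        return (optimized / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf(Locale.ROOT, "%s: %+.1f%%%n",
                    NAMES[i], gainPercent(BASELINE[i], OPTIMIZED[i]));
        }
    }
}
```

On the mean scores alone, the gains fall between roughly 4% and 10%.
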

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8257772: Changing the default value for UseXMMForObjInit and UseFastStosb flags.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1631/files
  - new: https://git.openjdk.java.net/jdk/pull/1631/files/f96c01e7..af1bf755

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1631&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1631&range=02-03

  Stats: 90 lines in 3 files changed: 50 ins; 12 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1631.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1631/head:pull/1631

PR: https://git.openjdk.java.net/jdk/pull/1631

