Integrated: 8257772: Vectorizing clear memory operation using AVX-512 masked operations
Jatin Bhateja
jbhateja at openjdk.java.net
Thu Dec 17 04:44:59 UTC 2020
On Fri, 4 Dec 2020 18:28:44 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Newly allocated memory is initialized either with user-provided values for its fields or by setting it to zero, as Java semantics require (system initialization).
>
> The C2 compiler creates a ClearArray node to perform system initialization. ClearArray takes the number of heap words to initialize, which can be a constant or a non-constant value. When the count is a constant smaller than InitArrayShortSize (default value 64 bytes), the compiler currently generates StoreL nodes, which perform the initialization at a granularity of 8 bytes.
>
> This patch vectorizes the initializing stores for constant-sized clears smaller than InitArrayShortSize by emitting a special instruction sequence for each tail size.
>
> In addition, the existing initialization under UseXMMForObjInit is extended to use masked operations to optimize the tail initialization sequence. If AVX3Threshold is set to 0, the new initialization sequence uses 64-byte ZMM registers.
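
The masked tail handling described above can be modelled in plain Java. This is only a sketch under stated assumptions, not the code from the patch: it assumes one 64-byte ZMM register covers 8 heap words of 8 bytes each, and it models the opmask as the low `n` bits of an int. The class and method names (`MaskedClearSketch`, `tailMask`, `maskedZeroStore`, `clearWords`) are hypothetical and do not appear in HotSpot.

```java
import java.util.Arrays;

public class MaskedClearSketch {
    // Assumption: one 64-byte vector register holds 8 heap words of 8 bytes.
    static final int WORDS_PER_VECTOR = 8;

    // Hypothetical helper: lane mask with the low `lanes` bits set,
    // standing in for an AVX-512 opmask register.
    static int tailMask(int lanes) {
        return (1 << lanes) - 1;
    }

    // Models a masked vector store of zeros: only lanes whose mask bit
    // is set are written, the other lanes are left untouched.
    static void maskedZeroStore(long[] heap, int base, int mask) {
        for (int lane = 0; lane < WORDS_PER_VECTOR; lane++) {
            if ((mask & (1 << lane)) != 0) {
                heap[base + lane] = 0L;
            }
        }
    }

    // Clears `count` words starting at `start` and returns how many
    // (modelled) vector stores were needed.
    static int clearWords(long[] heap, int start, int count) {
        int stores = 0;
        int i = 0;
        // Full-width stores while at least 8 words remain.
        for (; i + WORDS_PER_VECTOR <= count; i += WORDS_PER_VECTOR) {
            maskedZeroStore(heap, start + i, tailMask(WORDS_PER_VECTOR));
            stores++;
        }
        // One masked store covers the remaining tail, if any.
        int tail = count - i;
        if (tail > 0) {
            maskedZeroStore(heap, start + i, tailMask(tail));
            stores++;
        }
        return stores;
    }

    public static void main(String[] args) {
        long[] heap = new long[16];
        Arrays.fill(heap, -1L);
        // 7 words = 56 bytes, the largest small size in the benchmark.
        int stores = clearWords(heap, 0, 7);
        System.out.println("vector stores used: " + stores);
        System.out.println(Arrays.toString(heap));
    }
}
```

In this model a 56-byte clear takes a single masked store, whereas 8-byte StoreL granularity would need seven stores. The real instruction selection in the patch may differ.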
>
> The following performance numbers were collected using the micro-benchmark included with the patch.
>
> Testing: Tier1 through Tier3 tests are clean.
>
> System configuration: Cascade Lake, Intel Xeon Platinum 8280L @ 2.7 GHz, 2 sockets, 28 cores per socket.
>
> ### Baseline:
> Benchmark                        Mode  Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K  thrpt    2    1427741.069         ops/s
> ClearMemory.testClearMemory1K   thrpt    2   47628368.596         ops/s
> ClearMemory.testClearMemory1M   thrpt    2      27388.979         ops/s
> ClearMemory.testClearMemory24B  thrpt    2  167681010.419         ops/s
> ClearMemory.testClearMemory2K   thrpt    2   22043948.290         ops/s
> ClearMemory.testClearMemory32B  thrpt    2  168599498.817         ops/s
> ClearMemory.testClearMemory32K  thrpt    2     775985.067         ops/s
> ClearMemory.testClearMemory40B  thrpt    2  153375273.800         ops/s
> ClearMemory.testClearMemory48B  thrpt    2  145328531.804         ops/s
> ClearMemory.testClearMemory4K   thrpt    2    6492257.452         ops/s
> ClearMemory.testClearMemory56B  thrpt    2  122376321.652         ops/s
> ClearMemory.testClearMemory8K   thrpt    2    2857444.413         ops/s
> ClearMemory.testClearMemory8M   thrpt    2       3461.674         ops/s
>
> ### With Optimization:
> Benchmark                        Mode  Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K  thrpt    2    2529701.368         ops/s
> ClearMemory.testClearMemory1K   thrpt    2   50276682.550         ops/s
> ClearMemory.testClearMemory1M   thrpt    2      27458.588         ops/s
> ClearMemory.testClearMemory24B  thrpt    2  178751174.642         ops/s
> ClearMemory.testClearMemory2K   thrpt    2   22574802.694         ops/s
> ClearMemory.testClearMemory32B  thrpt    2  176630844.950         ops/s
> ClearMemory.testClearMemory32K  thrpt    2    1297627.181         ops/s
> ClearMemory.testClearMemory40B  thrpt    2  167469550.653         ops/s
> ClearMemory.testClearMemory48B  thrpt    2  159391163.006         ops/s
> ClearMemory.testClearMemory4K   thrpt    2    9045158.643         ops/s
> ClearMemory.testClearMemory56B  thrpt    2  134550172.421         ops/s
> ClearMemory.testClearMemory8K   thrpt    2    4581450.664         ops/s
> ClearMemory.testClearMemory8M   thrpt    2       3446.834         ops/s
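
To compare the two runs, the throughput ratio (optimized score divided by baseline score) can be computed per size. The sketch below does that arithmetic using only the scores reported in this message; the class name `ClearMemorySpeedup` and its methods are hypothetical helpers, not part of the patch or of JMH.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClearMemorySpeedup {
    // Ratio above 1.0 means the optimized build had higher throughput.
    static double speedup(double baselineOpsPerSec, double optimizedOpsPerSec) {
        return optimizedOpsPerSec / baselineOpsPerSec;
    }

    // Scores copied from the two result tables: {baseline, optimized} in ops/s.
    static Map<String, double[]> reportedScores() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("24B", new double[] {167681010.419, 178751174.642});
        m.put("32B", new double[] {168599498.817, 176630844.950});
        m.put("40B", new double[] {153375273.800, 167469550.653});
        m.put("48B", new double[] {145328531.804, 159391163.006});
        m.put("56B", new double[] {122376321.652, 134550172.421});
        m.put("1K",  new double[] {47628368.596, 50276682.550});
        m.put("2K",  new double[] {22043948.290, 22574802.694});
        m.put("4K",  new double[] {6492257.452, 9045158.643});
        m.put("8K",  new double[] {2857444.413, 4581450.664});
        m.put("16K", new double[] {1427741.069, 2529701.368});
        m.put("32K", new double[] {775985.067, 1297627.181});
        m.put("1M",  new double[] {27388.979, 27458.588});
        m.put("8M",  new double[] {3461.674, 3446.834});
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : reportedScores().entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-4s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

Each benchmark ran with Cnt 2 and no error estimate is reported, so small ratios either side of 1.0 (for example the 1M and 8M cases) may well be within run-to-run noise.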
This pull request has now been integrated.
Changeset: c11525a4
Author: Jatin Bhateja <jbhateja at openjdk.org>
URL: https://git.openjdk.java.net/jdk/commit/c11525a4
Stats: 404 lines in 8 files changed: 376 ins; 3 del; 25 mod
8257772: Vectorizing clear memory operation using AVX-512 masked operations
Reviewed-by: thartmann, kvn
-------------
PR: https://git.openjdk.java.net/jdk/pull/1631