RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v5]
Jatin Bhateja
jbhateja at openjdk.java.net
Mon Dec 14 05:00:19 UTC 2020
> Newly allocated memory is initialized either with user-provided values for its fields or by zeroing it, as Java semantics require (system initialization).
>
> The C2 compiler creates a ClearArray node to perform system initialization. ClearArray takes the number of heap words to initialize, which may be a constant or a non-constant value. For a constant number of heap words smaller than InitArrayShortSize (default value 64 bytes), the compiler currently generates StoreL nodes, which initialize memory at a granularity of 8 bytes.
>
> This patch vectorizes the initializing stores for constant sizes smaller than InitArrayShortSize by emitting a special instruction sequence for the various tail sizes.
>
> In addition, the existing initialization under UseXMMForObjInit is extended to use masked operations to optimize the tail initialization sequence. If AVX3Threshold is set to 0, the new initialization sequence uses 64-byte ZMM registers.
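
To illustrate the masked-tail idea described above, here is a minimal sketch in plain Java. It is only a model, not the code C2 emits: the class `MaskedClearSketch`, its methods, and the assumption of 8 lanes of 8 bytes per 64-byte ZMM register are all hypothetical names and simplifications introduced for this example. Full 8-word chunks are cleared with an all-ones lane mask, and the remaining words are cleared with a single store whose mask has only the low `tail` bits set.

```java
import java.util.Arrays;

public class MaskedClearSketch {
    // Assumption: one 64-byte ZMM register is modeled as 8 lanes of 8 bytes each.
    static final int LANES = 8;

    // Lane mask with the low `tailWords` bits set (tailWords in 0..LANES).
    static int tailMask(int tailWords) {
        return (1 << tailWords) - 1;
    }

    // Models a masked vector store of zero: only lanes whose mask bit is set are written.
    static void maskedZeroStore(long[] mem, int base, int mask) {
        for (int lane = 0; lane < LANES; lane++) {
            if ((mask & (1 << lane)) != 0) {
                mem[base + lane] = 0L;
            }
        }
    }

    // Clears `words` heap words starting at `start`; returns the number of modeled vector stores.
    static int clear(long[] mem, int start, int words) {
        int stores = 0;
        int i = 0;
        // Full-width stores for every complete group of LANES words.
        for (; i + LANES <= words; i += LANES) {
            maskedZeroStore(mem, start + i, tailMask(LANES));
            stores++;
        }
        // One masked store covers the remaining tail, if any.
        int tail = words - i;
        if (tail > 0) {
            maskedZeroStore(mem, start + i, tailMask(tail));
            stores++;
        }
        return stores;
    }

    public static void main(String[] args) {
        long[] mem = new long[16];
        Arrays.fill(mem, -1L);
        int stores = clear(mem, 0, 11);
        System.out.println("modeled vector stores: " + stores);
        System.out.println(Arrays.toString(mem));
    }
}
```

In this model, clearing 11 words takes one full store plus one masked store, where 8-byte scalar stores would need 11. Words past the requested range are left untouched because their mask bits are clear.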
>
> The following performance numbers were collected using the micro-benchmark included with the patch.
>
> Testing: Tier1-Tier3 tests are clean.
>
> System Configuration: Cascade Lake, Intel Xeon Platinum 8280L @ 2.7 GHz, 2 sockets, 28 cores per socket.
>
> ### Baseline:
> Benchmark                         Mode  Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K   thrpt    2    1427741.069         ops/s
> ClearMemory.testClearMemory1K    thrpt    2   47628368.596         ops/s
> ClearMemory.testClearMemory1M    thrpt    2      27388.979         ops/s
> ClearMemory.testClearMemory24B   thrpt    2  167681010.419         ops/s
> ClearMemory.testClearMemory2K    thrpt    2   22043948.290         ops/s
> ClearMemory.testClearMemory32B   thrpt    2  168599498.817         ops/s
> ClearMemory.testClearMemory32K   thrpt    2     775985.067         ops/s
> ClearMemory.testClearMemory40B   thrpt    2  153375273.800         ops/s
> ClearMemory.testClearMemory48B   thrpt    2  145328531.804         ops/s
> ClearMemory.testClearMemory4K    thrpt    2    6492257.452         ops/s
> ClearMemory.testClearMemory56B   thrpt    2  122376321.652         ops/s
> ClearMemory.testClearMemory8K    thrpt    2    2857444.413         ops/s
> ClearMemory.testClearMemory8M    thrpt    2       3461.674         ops/s
> ### With Optimization:
> Benchmark                         Mode  Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K   thrpt    2    2529701.368         ops/s
> ClearMemory.testClearMemory1K    thrpt    2   50276682.550         ops/s
> ClearMemory.testClearMemory1M    thrpt    2      27458.588         ops/s
> ClearMemory.testClearMemory24B   thrpt    2  178751174.642         ops/s
> ClearMemory.testClearMemory2K    thrpt    2   22574802.694         ops/s
> ClearMemory.testClearMemory32B   thrpt    2  176630844.950         ops/s
> ClearMemory.testClearMemory32K   thrpt    2    1297627.181         ops/s
> ClearMemory.testClearMemory40B   thrpt    2  167469550.653         ops/s
> ClearMemory.testClearMemory48B   thrpt    2  159391163.006         ops/s
> ClearMemory.testClearMemory4K    thrpt    2    9045158.643         ops/s
> ClearMemory.testClearMemory56B   thrpt    2  134550172.421         ops/s
> ClearMemory.testClearMemory8K    thrpt    2    4581450.664         ops/s
> ClearMemory.testClearMemory8M    thrpt    2       3446.834         ops/s
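
For readers who want the relative change per size, the two score columns above can be divided directly. The following is a small sketch that does this arithmetic; the class name `ClearMemorySpeedup` and its layout are made up for this example, and the numbers are copied from the baseline and optimized results quoted above, reordered by size. With only two iterations per benchmark and no reported error, small ratios close to 1.0 should be read with caution.

```java
public class ClearMemorySpeedup {
    // Benchmark sizes, ordered from smallest to largest.
    static final String[] SIZES = {
        "24B", "32B", "40B", "48B", "56B", "1K", "2K", "4K", "8K", "16K", "32K", "1M", "8M"
    };

    // Baseline throughput in ops/s, copied from the quoted results.
    static final double[] BASELINE = {
        167681010.419, 168599498.817, 153375273.800, 145328531.804, 122376321.652,
        47628368.596, 22043948.290, 6492257.452, 2857444.413, 1427741.069,
        775985.067, 27388.979, 3461.674
    };

    // Throughput with the optimization in ops/s, copied from the quoted results.
    static final double[] OPTIMIZED = {
        178751174.642, 176630844.950, 167469550.653, 159391163.006, 134550172.421,
        50276682.550, 22574802.694, 9045158.643, 4581450.664, 2529701.368,
        1297627.181, 27458.588, 3446.834
    };

    // Ratio of optimized to baseline throughput; values above 1.0 mean an improvement.
    static double speedup(int index) {
        return OPTIMIZED[index] / BASELINE[index];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("%-4s %.3fx%n", SIZES[i], speedup(i));
        }
    }
}
```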
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
Review comments resolution.
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/1631/files
- new: https://git.openjdk.java.net/jdk/pull/1631/files/af1bf755..717e71c5
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1631&range=04
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1631&range=03-04
Stats: 168 lines in 6 files changed: 21 ins; 25 del; 122 mod
Patch: https://git.openjdk.java.net/jdk/pull/1631.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1631/head:pull/1631
PR: https://git.openjdk.java.net/jdk/pull/1631