RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v5]
Tobias Hartmann
thartmann at openjdk.java.net
Mon Dec 14 07:37:59 UTC 2020
On Mon, 14 Dec 2020 05:00:19 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Newly allocated memory is initialized either with user-provided initialization values for its fields or by setting the memory to zero, as required by Java semantics (system initialization).
>>
>> The C2 compiler creates a ClearArray node to perform system initialization. ClearArray takes the number of heap words to initialize, which may be a constant or a non-constant value. For a constant number of heap words below InitArrayShortSize (default value 64 bytes), the compiler currently generates StoreL nodes, which perform the initialization at 8-byte granularity.
>>
>> This patch vectorizes the initializing store operations for a constant number of heap words below InitArrayShortSize by emitting a special instruction sequence for each tail size.
>>
>> In addition, the existing initialization path under UseXMMForObjInit is extended to use masked operations to optimize the tail initialization sequence. If AVX3Threshold is set to 0, the new initialization sequence uses 64-byte ZMM registers.
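
The tail handling described above can be modeled in plain Java. This is only an illustrative sketch, not the assembler sequence the patch emits: the class and method names are hypothetical, the heap is stood in for by a `long[]`, and the 32-byte vector width used in `main` is an assumption (the description only states that 64-byte ZMM registers are used when AVX3Threshold is 0). The idea shown is that a clear of N bytes splits into N / W full vector stores plus one masked store whose mask has one bit per remaining 8-byte lane.

```java
final class ClearTailSketch {
    static final int QWORD_BYTES = 8;

    // Number of full-width vector stores needed for byteCount bytes.
    static int fullVectorStores(int byteCount, int vectorBytes) {
        return byteCount / vectorBytes;
    }

    // Lane mask for the tail: one bit per remaining 8-byte lane.
    static long tailMask(int byteCount, int vectorBytes) {
        int tailQwords = (byteCount % vectorBytes) / QWORD_BYTES;
        return (1L << tailQwords) - 1;
    }

    // Emulates the clear on a long[] standing in for heap words.
    // byteCount is assumed to be a multiple of 8 and to fit in words.
    static void clear(long[] words, int byteCount, int vectorBytes) {
        int lanes = vectorBytes / QWORD_BYTES;
        int idx = 0;
        // Full-width stores: every lane is written.
        for (int i = 0; i < fullVectorStores(byteCount, vectorBytes); i++) {
            for (int lane = 0; lane < lanes; lane++) {
                words[idx + lane] = 0L;
            }
            idx += lanes;
        }
        // Masked tail store: only lanes whose mask bit is set are written.
        long mask = tailMask(byteCount, vectorBytes);
        for (int lane = 0; lane < lanes; lane++) {
            if ((mask & (1L << lane)) != 0) {
                words[idx + lane] = 0L;
            }
        }
    }

    public static void main(String[] args) {
        // Sizes taken from the benchmark names; 32-byte width is assumed.
        for (int bytes : new int[] {24, 32, 40, 48, 56}) {
            System.out.printf("%2d bytes: %d full store(s), tail mask 0b%s%n",
                    bytes,
                    fullVectorStores(bytes, 32),
                    Long.toBinaryString(tailMask(bytes, 32)));
        }
    }
}
```

With a masked store the tail needs a single instruction regardless of how many 8-byte lanes remain, which is what replaces the per-word StoreL sequence in this model.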
>>
>> The following performance numbers were collected using the micro-benchmark included with the patch.
>>
>> Testing : Tier1-Tier3 level tests are clean.
>>
>> System Configuration : Cascadelake, Intel Xeon Platinum 8280L @ 2.7 GHz, 2 socket, 28 cores per socket.
>>
>> ### Baseline:
>> Benchmark                       Mode   Cnt          Score  Error  Units
>> ClearMemory.testClearMemory16K  thrpt    2    1427741.069         ops/s
>> ClearMemory.testClearMemory1K   thrpt    2   47628368.596         ops/s
>> ClearMemory.testClearMemory1M   thrpt    2      27388.979         ops/s
>> ClearMemory.testClearMemory24B  thrpt    2  167681010.419         ops/s
>> ClearMemory.testClearMemory2K   thrpt    2   22043948.290         ops/s
>> ClearMemory.testClearMemory32B  thrpt    2  168599498.817         ops/s
>> ClearMemory.testClearMemory32K  thrpt    2     775985.067         ops/s
>> ClearMemory.testClearMemory40B  thrpt    2  153375273.800         ops/s
>> ClearMemory.testClearMemory48B  thrpt    2  145328531.804         ops/s
>> ClearMemory.testClearMemory4K   thrpt    2    6492257.452         ops/s
>> ClearMemory.testClearMemory56B  thrpt    2  122376321.652         ops/s
>> ClearMemory.testClearMemory8K   thrpt    2    2857444.413         ops/s
>> ClearMemory.testClearMemory8M   thrpt    2       3461.674         ops/s
>> ### With Optimization:
>> Benchmark                       Mode   Cnt          Score  Error  Units
>> ClearMemory.testClearMemory16K  thrpt    2    2529701.368         ops/s
>> ClearMemory.testClearMemory1K   thrpt    2   50276682.550         ops/s
>> ClearMemory.testClearMemory1M   thrpt    2      27458.588         ops/s
>> ClearMemory.testClearMemory24B  thrpt    2  178751174.642         ops/s
>> ClearMemory.testClearMemory2K   thrpt    2   22574802.694         ops/s
>> ClearMemory.testClearMemory32B  thrpt    2  176630844.950         ops/s
>> ClearMemory.testClearMemory32K  thrpt    2    1297627.181         ops/s
>> ClearMemory.testClearMemory40B  thrpt    2  167469550.653         ops/s
>> ClearMemory.testClearMemory48B  thrpt    2  159391163.006         ops/s
>> ClearMemory.testClearMemory4K   thrpt    2    9045158.643         ops/s
>> ClearMemory.testClearMemory56B  thrpt    2  134550172.421         ops/s
>> ClearMemory.testClearMemory8K   thrpt    2    4581450.664         ops/s
>> ClearMemory.testClearMemory8M   thrpt    2       3446.834         ops/s
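
To compare the two tables, the throughput scores can be turned into speedup ratios (optimized score divided by baseline score). The sketch below does this for a subset of the rows; the class and method names are hypothetical, and the scores are copied from the tables above. Since each row has only two measurements (Cnt 2) and no error estimate, small differences such as those in the 1M and 8M rows should be read with caution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ClearMemorySpeedup {
    // Speedup is the ratio of optimized throughput to baseline throughput.
    static double speedup(double baselineOpsPerSec, double optimizedOpsPerSec) {
        return optimizedOpsPerSec / baselineOpsPerSec;
    }

    // {baseline, optimized} scores in ops/s, copied from the tables above.
    static Map<String, double[]> scores() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("24B", new double[] {167681010.419, 178751174.642});
        m.put("56B", new double[] {122376321.652, 134550172.421});
        m.put("4K",  new double[] {6492257.452, 9045158.643});
        m.put("8K",  new double[] {2857444.413, 4581450.664});
        m.put("16K", new double[] {1427741.069, 2529701.368});
        m.put("32K", new double[] {775985.067, 1297627.181});
        m.put("1M",  new double[] {27388.979, 27458.588});
        m.put("8M",  new double[] {3461.674, 3446.834});
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : scores().entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-4s speedup: %.3fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

On these numbers the largest gains are in the 4K to 32K range, the small constant sizes improve by a smaller margin, and the 1M and 8M rows are essentially unchanged.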
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments resolution.
Looks good to me but @vnkozlov should also finish his review before this is integrated.
-------------
Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1631