RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v5]

Tobias Hartmann thartmann at openjdk.java.net
Mon Dec 14 07:37:59 UTC 2020


On Mon, 14 Dec 2020 05:00:19 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Newly allocated memory is initialized either with user-provided values for its fields or by setting it to zero, as required by Java semantics (system initialization).
>> 
>> The C2 compiler creates a ClearArray node to perform system initialization. ClearArray takes the number of heap words to initialize, which can be a constant or a non-constant value. For a constant number of heap words below InitArrayShortSize (default value 64 bytes), the compiler currently generates StoreL nodes, which initialize memory at a granularity of 8 bytes.
>> 
>> This patch vectorizes the initializing stores for constant-sized clears below InitArrayShortSize by emitting a special instruction sequence for each tail size.
>> 
>> In addition, the existing initialization under UseXMMForObjInit is extended to use masked operations to optimize the tail initialization sequence. If AVX3Threshold is set to 0, the new initialization sequence uses 64-byte ZMM registers.
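
To illustrate the masked tail idea described above, here is a minimal scalar sketch in Java. It is not the patch's code: the class and method names are hypothetical, and it assumes one mask bit per 8-byte heap word in a 64-byte vector, with the low n bits set for a tail of n words.

```java
public class MaskedTailSketch {
    // Assumption: a 64-byte vector covers 8 heap words of 8 bytes each.
    static final int LANES = 8;

    // Build a mask with the low `words` bits set, one bit per 8-byte lane.
    static int tailMask(int words) {
        if (words < 0 || words > LANES) {
            throw new IllegalArgumentException("words must be in [0, " + LANES + "]");
        }
        return (1 << words) - 1;
    }

    // Scalar emulation of a single masked vector store of zeros:
    // only lanes whose mask bit is set are written (or even accessed).
    static void maskedClear(long[] mem, int base, int mask) {
        for (int lane = 0; lane < LANES; lane++) {
            if ((mask & (1 << lane)) != 0) {
                mem[base + lane] = 0L;
            }
        }
    }

    // Clear `words` heap words starting at index `from`:
    // full vectors first, then one masked store for the remaining tail.
    static void clear(long[] mem, int from, int words) {
        int i = from;
        int end = from + words;
        for (; end - i >= LANES; i += LANES) {
            maskedClear(mem, i, tailMask(LANES));
        }
        int tail = end - i;
        if (tail > 0) {
            maskedClear(mem, i, tailMask(tail));
        }
    }

    public static void main(String[] args) {
        long[] mem = new long[12];
        java.util.Arrays.fill(mem, -1L);
        clear(mem, 1, 10); // one full vector plus a 2-word tail
        System.out.println(java.util.Arrays.toString(mem));
    }
}
```

The point of the sketch is that the tail needs a single masked store instead of a sequence of 8-byte stores; how the real sequence picks vector widths is up to the patch.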
>> 
>> The following performance numbers were collected using the micro-benchmark included with the patch.
>> 
>> Testing: Tier1-Tier3 tests are clean.
>> 
>> System configuration: Cascade Lake, Intel Xeon Platinum 8280L @ 2.7 GHz, 2 sockets, 28 cores per socket.
>> 
>> ### Baseline:
>> Benchmark                        Mode  Cnt          Score   Error  Units
>> ClearMemory.testClearMemory16K  thrpt    2    1427741.069          ops/s
>> ClearMemory.testClearMemory1K   thrpt    2   47628368.596          ops/s
>> ClearMemory.testClearMemory1M   thrpt    2      27388.979          ops/s
>> ClearMemory.testClearMemory24B  thrpt    2  167681010.419          ops/s
>> ClearMemory.testClearMemory2K   thrpt    2   22043948.290          ops/s
>> ClearMemory.testClearMemory32B  thrpt    2  168599498.817          ops/s
>> ClearMemory.testClearMemory32K  thrpt    2     775985.067          ops/s
>> ClearMemory.testClearMemory40B  thrpt    2  153375273.800          ops/s
>> ClearMemory.testClearMemory48B  thrpt    2  145328531.804          ops/s
>> ClearMemory.testClearMemory4K   thrpt    2    6492257.452          ops/s
>> ClearMemory.testClearMemory56B  thrpt    2  122376321.652          ops/s
>> ClearMemory.testClearMemory8K   thrpt    2    2857444.413          ops/s
>> ClearMemory.testClearMemory8M   thrpt    2       3461.674          ops/s
>> ### With Optimization:
>> Benchmark                        Mode  Cnt          Score   Error  Units
>> ClearMemory.testClearMemory16K  thrpt    2    2529701.368          ops/s
>> ClearMemory.testClearMemory1K   thrpt    2   50276682.550          ops/s
>> ClearMemory.testClearMemory1M   thrpt    2      27458.588          ops/s
>> ClearMemory.testClearMemory24B  thrpt    2  178751174.642          ops/s
>> ClearMemory.testClearMemory2K   thrpt    2   22574802.694          ops/s
>> ClearMemory.testClearMemory32B  thrpt    2  176630844.950          ops/s
>> ClearMemory.testClearMemory32K  thrpt    2    1297627.181          ops/s
>> ClearMemory.testClearMemory40B  thrpt    2  167469550.653          ops/s
>> ClearMemory.testClearMemory48B  thrpt    2  159391163.006          ops/s
>> ClearMemory.testClearMemory4K   thrpt    2    9045158.643          ops/s
>> ClearMemory.testClearMemory56B  thrpt    2  134550172.421          ops/s
>> ClearMemory.testClearMemory8K   thrpt    2    4581450.664          ops/s
>> ClearMemory.testClearMemory8M   thrpt    2       3446.834          ops/s
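
For a quick comparison of the two tables above, the following small Java sketch divides each optimized score by its baseline score. The class name is hypothetical and the scores are copied from the quoted results. Since each run has Cnt 2 and no error column, the ratios should be read as indicative only.

```java
public class ClearMemorySpeedup {
    // Benchmark suffixes, ordered by cleared size.
    static final String[] NAMES = {
        "24B", "32B", "40B", "48B", "56B",
        "1K", "2K", "4K", "8K", "16K", "32K", "1M", "8M"
    };

    // Baseline throughput in ops/s, copied from the quoted table.
    static final double[] BASELINE = {
        167681010.419, 168599498.817, 153375273.800, 145328531.804, 122376321.652,
        47628368.596, 22043948.290, 6492257.452, 2857444.413, 1427741.069,
        775985.067, 27388.979, 3461.674
    };

    // Throughput with the optimization in ops/s, copied from the quoted table.
    static final double[] OPTIMIZED = {
        178751174.642, 176630844.950, 167469550.653, 159391163.006, 134550172.421,
        50276682.550, 22574802.694, 9045158.643, 4581450.664, 2529701.368,
        1297627.181, 27458.588, 3446.834
    };

    // Ratio above 1.0 means the optimized build had higher throughput.
    static double speedup(int i) {
        return OPTIMIZED[i] / BASELINE[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-4s %.3fx%n", NAMES[i], speedup(i));
        }
    }
}
```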
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resolution.

Looks good to me, but @vnkozlov should also finish his review before this is integrated.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1631

