RFR: 8349452: Fix performance regression for Arrays.fill() with AVX512 [v10]

Srinivas Vamsi Parasa sparasa at openjdk.org
Thu Dec 4 02:44:51 UTC 2025


> The goal of this PR is to fix the performance regression in Arrays.fill() x86 stubs caused by masked AVX stores. The fix is to replace the masked AVX stores with store instructions without masks (i.e. unmasked stores). `fill32_masked()` and `fill64_masked()` stubs are replaced with `fill32_unmasked()` and `fill64_unmasked()` respectively.
> 
> To speed up unmasked stores, array fills for sizes < 64 bytes are broken down into sequences of 32B, 16B, 8B, 4B, 2B and 1B stores, depending on the size.
> 
> 
> ### **Performance comparison: byte array fills repeated 1 million times in a loop**
> 
> 
> ByteArray size (UseAVX=3) | +OptimizeFill (masked store stub) [secs] | -OptimizeFill (no stub) [secs] | This PR: +OptimizeFill (unmasked store stub) [secs]
> -- | -- | -- | --
> 1 | 0.46 | 0.14 | 0.185
> 2 | 0.46 | 0.16 | 0.195
> 3 | 0.46 | 0.176 | 0.199
> 4 | 0.46 | 0.244 | 0.207
> 5 | 0.46 | 0.29 | 0.32
> 10 | 0.46 | 0.58 | 0.303
> 15 | 0.46 | 0.42 | 0.271
> 16 | 0.46 | 0.46 | 0.32
> 17 | 0.21 | 0.5 | 0.299
> 20 | 0.21 | 0.37 | 0.299
> 25 | 0.21 | 0.59 | 0.282
> 31 | 0.21 | 0.53 | 0.273
> 32 | 0.21 | 0.58 | 0.199
> 35 | 0.5 | 0.77 | 0.259
> 40 | 0.5 | 0.61 | 0.33
> 45 | 0.5 | 0.52 | 0.281
> 48 | 0.5 | 0.66 | 0.32
> 49 | 0.22 | 0.69 | 0.3
> 50 | 0.22 | 0.78 | 0.3
> 55 | 0.22 | 0.67 | 0.292
> 60 | 0.22 | 0.67 | 0.3293
> 64 | 0.22 | 0.82 | 0.23
> 70 | 0.51 | 1.1 | 0.34
> 80 | 0.49 | 0.89 | 0.365
> 90 | 0.225 | 0.68 | 0.33
> 100 | 0.54 | 1.09 | 0.347
> 110 | 0.6 | 0.98 | 0.36
> 120 | 0.26 | 0.75 | 0.386
> 128 | 0.266 | 1.1 | 0.289
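
The size decomposition described in the PR summary (fills under 64 bytes split into 32B, 16B, 8B, 4B, 2B and 1B stores) can be illustrated with a small Java sketch. This is not the stub code: the real stubs are emitted by HotSpot's x86 macro assembler and may order or overlap the stores differently. The class and method names (`FillDecomposition`, `storeWidths`, `fillSmall`) are hypothetical, and each `Arrays.fill` call on a sub-range merely stands in for one unmasked store of that width.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FillDecomposition {
    // Hypothetical helper: lists the store widths (in bytes) that cover
    // a fill of `size` bytes, one width per set bit of `size`.
    static List<Integer> storeWidths(int size) {
        if (size < 0 || size >= 64) {
            throw new IllegalArgumentException("size must be in [0, 64)");
        }
        List<Integer> widths = new ArrayList<>();
        for (int width = 32; width >= 1; width >>= 1) {
            if ((size & width) != 0) {
                widths.add(width);
            }
        }
        return widths;
    }

    // Fills dest[offset, offset + size) using the widths from storeWidths.
    // Each sub-range fill models a single unmasked store of `width` bytes.
    static void fillSmall(byte[] dest, int offset, int size, byte value) {
        int pos = offset;
        for (int width : storeWidths(size)) {
            Arrays.fill(dest, pos, pos + width, value);
            pos += width;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeWidths(45)); // prints [32, 8, 4, 1]

        byte[] buf = new byte[64];
        fillSmall(buf, 3, 45, (byte) 7);
        System.out.println(buf[2] + " " + buf[3] + " " + buf[47] + " " + buf[48]); // prints 0 7 7 0
    }
}
```

Because every width is a power of two and each appears at most once, a fill under 64 bytes needs at most six stores in this model and never writes outside the requested range.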

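The harness behind the table above is not shown in this message. A minimal sketch of such a measurement, assuming a plain timed loop around `Arrays.fill` rather than the author's actual benchmark, might look like the following. The class name `FillLoopTiming`, the chosen sizes, and the warm-up count are all assumptions; a JMH benchmark would give more reliable numbers.

```java
import java.util.Arrays;

public class FillLoopTiming {
    // Keeps fill results observable so the JIT cannot discard the loop.
    static long sink;

    // Times `iterations` fills of a byte array of `size` bytes; returns seconds.
    static double timeFills(int size, int iterations) {
        byte[] array = new byte[size];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Arrays.fill(array, (byte) i);
        }
        long elapsed = System.nanoTime() - start;
        sink += array[size - 1];
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        int[] sizes = {1, 16, 32, 64, 128}; // a subset of the sizes in the table
        for (int size : sizes) {
            timeFills(size, 100_000); // warm-up so the loop gets compiled
            System.out.printf("%d bytes: %.3f s%n", size, timeFills(size, 1_000_000));
        }
    }
}
```

Running it once with `-XX:UseAVX=3 -XX:+OptimizeFill` and once with `-XX:UseAVX=3 -XX:-OptimizeFill` would correspond to the stub and no-stub columns; absolute times depend on the machine and JDK build.
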
Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:

 - Fix for failing tests; keep dest pointer unchanged
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into fill_array
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into fill_array
 - fix missing array length updates for size=1
 - revert to jccb in one place
 - remove all masked stores altogether
 - fastpath for size <= 4 bytes
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into fill_array
 - undo jccb to jcc change as needed
 - refactor code to use fill32_tail at the end of the stub
 - ... and 2 more: https://git.openjdk.org/jdk/compare/eda35725...f54dfd78

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28442/files
  - new: https://git.openjdk.org/jdk/pull/28442/files/d3724b88..f54dfd78

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28442&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28442&range=08-09

  Stats: 20479 lines in 460 files changed: 11197 ins; 6643 del; 2639 mod
  Patch: https://git.openjdk.org/jdk/pull/28442.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28442/head:pull/28442

PR: https://git.openjdk.org/jdk/pull/28442

