RFR: 8352075: Perf regression accessing fields [v3]

Radim Vansa rvansa at openjdk.org
Mon Apr 28 07:44:04 UTC 2025


> This optimization is a follow-up to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression introduced in some scenarios by https://bugs.openjdk.org/browse/JDK-8292818 . In terms of both performance and memory consumption, it is a better alternative to https://github.com/openjdk/jdk/pull/24713 .
> 
> This PR optimizes local field lookup in classes with more than 16 fields: rather than sequentially iterating through all fields during lookup, we sort the fields by field name. The stream includes an extra table after the field information: for the fields at positions 16, 32, ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve the names of the recorded fields, and continue field-by-field iteration only after the last matching record (hence through at most 16 fields).
> 
> In classes with <= 16 fields, this PR reduces memory consumption by 1 byte: the byte that was left with value 0 at the end of the stream. In classes with > 16 fields, we add an extra 4 bytes holding the offset of the table, and the table contains one varint for every 16 fields. The terminal byte is not used in this case either.
> 
> My measurements on the attached reproducer:
> 
> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>   Time (mean ± σ):      51.3 ms ±   2.8 ms    [User: 44.7 ms, System: 13.7 ms]
>   Range (min … max):    45.1 ms …  53.9 ms    100 runs
> 
> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>   Time (mean ± σ):      78.2 ms ±   1.0 ms    [User: 74.6 ms, System: 17.3 ms]
>   Range (min … max):    73.8 ms …  79.7 ms    100 runs
> 
> (the jdk25-master above already contains JDK-8353175)
> 
> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC
>   Time (mean ± σ):      38.5 ms ±   0.5 ms    [User: 34.4 ms, System: 17.3 ms]
>   Range (min … max):    37.7 ms …  42.1 ms    100 runs
> 
> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows an even higher improvement:
> 
> JDK 17: 1.6 s
> JDK 21 (no patches): 22 s
> JDK25-master: 12.3 s
> JDK25-this-pr: 0.5 s
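
The 25% figure quoted above can be checked from the reported hyperfine means. The following is only a small arithmetic sketch. The helper name `reduction` is made up for illustration, and the inputs are the mean times copied from the measurements above.

```python
def reduction(before_ms: float, after_ms: float) -> float:
    """Fractional reduction in mean run time when going from before to after."""
    return (before_ms - after_ms) / before_ms


# Mean times (ms) as reported by hyperfine in the measurements above.
JDK17_MS = 51.3
JDK25_MASTER_MS = 78.2
JDK25_THIS_PR_MS = 38.5

if __name__ == "__main__":
    print(f"vs JDK 17:       {reduction(JDK17_MS, JDK25_THIS_PR_MS):.1%}")
    print(f"vs jdk25-master: {reduction(JDK25_MASTER_MS, JDK25_THIS_PR_MS):.1%}")
```

With these inputs the reduction relative to JDK 17 comes out at roughly a quarter, and relative to jdk25-master at roughly a half.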

Radim Vansa has updated the pull request incrementally with three additional commits since the last revision:

 - Fix VerifyRawIndexesTest
 - Fix reordering in layout and annotations
 - Use qsort_r for different platforms
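
For readers who want a concrete picture of the lookup scheme from the PR description, here is a minimal sketch. It is not the HotSpot code. It assumes fields are represented by plain strings sorted lexicographically, and that the table stores list indexes rather than variable-length-encoded stream offsets. The names `BLOCK`, `build_table` and `find_field` are hypothetical.

```python
from typing import List, Optional

BLOCK = 16  # one table record per 16 fields, as in the PR description


def build_table(sorted_names: List[str]) -> List[int]:
    """Record the positions of the fields at index 16, 32, ...

    Assumption: the real table stores encoded offsets into the field
    stream; plain list indexes stand in for them here.
    """
    return list(range(BLOCK, len(sorted_names), BLOCK))


def find_field(sorted_names: List[str], table: List[int], name: str) -> Optional[int]:
    """Return the index of `name` in `sorted_names`, or None if absent."""
    start = 0
    # Walk the table; stop at the first record whose name sorts after the target.
    for pos in table:
        if sorted_names[pos] > name:
            break
        start = pos
    # Continue field by field from the last suitable record: at most 16 fields.
    for i in range(start, min(start + BLOCK, len(sorted_names))):
        if sorted_names[i] == name:
            return i
    return None


if __name__ == "__main__":
    names = sorted(f"f{i:03d}" for i in range(40))
    table = build_table(names)
    print(table)
    print(find_field(names, table, "f035"))
    print(find_field(names, table, "missing"))
```

For a class with 16 or fewer fields the table is empty and the lookup degenerates to a plain linear scan, which matches the description that only larger classes carry the extra table.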

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24847/files
  - new: https://git.openjdk.org/jdk/pull/24847/files/fe798710..ef69ec06

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=01-02

  Stats: 89 lines in 7 files changed: 67 ins; 14 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/24847.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847

PR: https://git.openjdk.org/jdk/pull/24847