RFR: 8352075: Perf regression accessing fields [v21]

Radim Vansa rvansa at openjdk.org
Thu Jun 5 21:25:56 UTC 2025


On Tue, 3 Jun 2025 07:16:47 GMT, Radim Vansa <rvansa at openjdk.org> wrote:

>> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 .
>> 
>> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields).
>> 
>> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either.
>> 
>> My measurements on the attached reproducer
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      51.3 ms ±   2.8 ms    [User: 44.7 ms, System: 13.7 ms]
>>   Range (min … max):    45.1 ms …  53.9 ms    100 runs
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      78.2 ms ±   1.0 ms    [User: 74.6 ms, System: 17.3 ms]
>>   Range (min … max):    73.8 ms …  79.7 ms    100 runs
>> 
>> (the jdk25-master above already contains JDK-8353175)
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      38.5 ms ±   0.5 ms    [User: 34.4 ms, System: 17.3 ms]
>>   Range (min … max):    37.7 ms …  42.1 ms    100 runs
>> 
>> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement:
>> 
>> JDK 17: 1.6 s
>> JDK 21 (no patches): 22 s
>> JDK25-master: 12.3 s
>> JDK25-this-pr: 0.5 s
>
> Radim Vansa has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Moved jtreg test
>  - Improved documentation
>  - Fix coding style (asterisk placement)

> Reading further, I see what this mapping is and intentionally generalized. I guess a comment like, the key and value are sized to represent the maximum value for each and then compacted, or something like that. But maybe I haven't figured out the packing. Are they increments of u1, u2 or u4 or something in between?

>From the constructor that accepts the maximum number for each we figure out the number of bits required to store those numbers. Imagine that only as bits (not aligned to byte boundary... yet). Then we concatenate the bits for key and value, and then 'add' 1-7 padding zeroes (high-order bits) to align on bytes. So in the end we have each element in the table consuming 1-8 bytes (case with 0 bits for both key and value is ruled out). In case of the fields search table, the key will be at most the stream length (which in turn has at most 65536 fields and each occupies up to 8 varints which is <= 40 bytes, though normally far less) and value is at most 65535 (# fields in class) so this is 1-5 bytes.

I thought that the presence of `max_key` and `max_value` in constructors along with the comment on `PackedTableBase` saying

> Each element consists of **up to** 32-bit key, and **up to** 32-bit value; these are **packed into a bit-record** with 1-byte alignment.

pretty much says that - shall I place extra comments somewhere else?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2946329561


More information about the serviceability-dev mailing list