RFR: 8352075: Perf regression accessing fields [v8]

John R Rose jrose at openjdk.org
Sat May 24 20:50:59 UTC 2025


On Thu, 22 May 2025 08:21:38 GMT, Radim Vansa <rvansa at openjdk.org> wrote:

>> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 .
>> 
>> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields).
>> 
>> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either.
>> 
>> My measurements on the attached reproducer
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      51.3 ms ±   2.8 ms    [User: 44.7 ms, System: 13.7 ms]
>>   Range (min … max):    45.1 ms …  53.9 ms    100 runs
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      78.2 ms ±   1.0 ms    [User: 74.6 ms, System: 17.3 ms]
>>   Range (min … max):    73.8 ms …  79.7 ms    100 runs
>> 
>> (the jdk25-master above already contains JDK-8353175)
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      38.5 ms ±   0.5 ms    [User: 34.4 ms, System: 17.3 ms]
>>   Range (min … max):    37.7 ms …  42.1 ms    100 runs
>> 
>> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement:
>> 
>> JDK 17: 1.6 s
>> JDK 21 (no patches): 22 s
>> JDK25-master: 12.3 s
>> JDK25-this-pr: 0.5 s
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add search table validation

Changes requested by jrose (Reviewer).

src/hotspot/share/oops/fieldInfo.cpp line 220:

> 218: }
> 219: 
> 220: #ifdef ASSERT

Thank you for adding this.

Suggest an additional check to validate the binary search:

For each name/sig pair in the field stream, call the lookup function (or its subroutine) and ensure that it steers to the right position in the field stream.

src/hotspot/share/oops/fieldInfo.cpp line 252:

> 250: int FieldInfoReader::search_table_lookup(const Array<u1> *search_table, const Symbol *name, const Symbol *signature, ConstantPool *cp, int java_fields) {
> 251:   UNSIGNED5::Reader<const u1*, int> r2(_r.array());
> 252:   int low = 0, high = java_fields - 1;

This is probably correct, but I recommend a few changes:
A. add an assert `low <= high` at the beginning (defend against surprise `java_fields` number).
B. repeat the assert after each new calculation of high or low.
C. declare high and low and mid as `uint32_t` to prove overflow (UB) impossible without the need to reason about integral range

src/hotspot/share/oops/fieldInfo.cpp line 265:

> 263:     Symbol *mid_name = cp->symbol_at(checked_cast<u2>(r2.next_uint()));
> 264:     Symbol *mid_sig = cp->symbol_at(checked_cast<u2>(r2.next_uint()));
> 265: 

This is the place to use `read_name_and_signature` as requested elsewhere.

src/hotspot/share/oops/fieldInfo.hpp line 273:

> 271:  private:
> 272:   // Don't generate the table for small classes at all.
> 273:   static const int SEARCH_TABLE_THRESHOLD = 16;

One way to create a stress test is to move this to globals.hpp, as a debug-only variable, something like `BinarySearchThreshold` (default 16).  It's OK not to name it more specifically, if (as I believe) this algorithm is destined to be pulled out and used in several places.  Individual uses can scale the number if need be.

src/hotspot/share/oops/fieldInfo.hpp line 276:

> 274: 
> 275:   static inline int search_table_position_width(int stream_length) {
> 276:     assert(stream_length <= (1 << 24), "stream too long");

This assert is OK for here and now.  The binary search widget I have in mind would be 32-bit clean, and therefore would have no surprise limits..

src/hotspot/share/oops/fieldInfo.inline.hpp line 79:

> 77:   fir.read_field_counts(&java_fields_count, &injected_fields_count);
> 78:   return java_fields_count;
> 79: }

Just as these accessors encapsulate the specific order of U5 ints in the header of the stream, I think you should also encapsulate the order of the name and signature, so that your binary-search comparator can read them without assuming their position in the U5 order.  See below.

src/hotspot/share/oops/fieldInfo.inline.hpp line 116:

> 114: }
> 115: 
> 116: inline void FieldInfoReader::read_field_info(FieldInfo& fi) {

I suggest you refactor this guy to use a new API point for the first two fields, `read_name_and_signature`.
Then you can reuse `read_name_and_signature` in the search comparator rather than use dead reckoning.

-------------

PR Review: https://git.openjdk.org/jdk/pull/24847#pullrequestreview-2866421214
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105924042
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105924465
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105923610
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105922088
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105922285
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105922865
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2105923077


More information about the hotspot-dev mailing list