RFR: 8352075: Perf regression accessing fields [v8]

Sun May 25 07:21:53 UTC 2025

On Sat, 24 May 2025 20:44:06 GMT, John R Rose <jrose at openjdk.org> wrote:

>> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add search table validation
>
> src/hotspot/share/oops/fieldInfo.cpp line 252:
> 
>> 250: int FieldInfoReader::search_table_lookup(const Array<u1> *search_table, const Symbol *name, const Symbol *signature, ConstantPool *cp, int java_fields) {
>> 251:   UNSIGNED5::Reader<const u1*, int> r2(_r.array());
>> 252:   int low = 0, high = java_fields - 1;
> 
> This is probably correct, but I recommend a few changes:
> A. add an assert `low <= high` at the beginning (defend against surprise `java_fields` number).
> B. repeat the assert after each new calculation of high or low.
> C. declare high and low and mid as `uint32_t` to prove overflow (UB) impossible without the need to reason about integral range

I looked over the code base for other occurrences of binary search, and I see your code is not an outlier.

The tendency is to use int for high/mid/low, and there are cases where UB would be possible in the absence of extra knowledge about the possible ranges of the ints.  `InstanceKlass::quick_search` is especially scary, until one realizes that method lists are probably on the order of 2^16.  `vmSymbols::find_sid` is the same story; the number of symbols is much less than 2^30..

`GrowableArray::find_sorted` also uses int indexes, but uses a `uint` cast at just the right place when computing mid, to avoid UB, so can handle arrays sized near 2^31.

With those precedents, I'm not going to pick on this code here.  The recommended changes A/B/C are optional.

But I am going to think about what it would take to reduce the number of occurences of binary switch, and build a template that can re-implement at least some of them (probably including yours) with bullet-proof, unit-tested code.

Binary search over 3/4/5-byte packed records is easy, which is what you are doing here.  The use of shortened integers is clever.  I think that can be made a template also, and then used for more than one index, as I've observed before.

One good way to think about these variable-sized packed records is to treat them as physical `uint64_t` values, with high order zero bytes removed.  If you have a word or so of slop area at the bottom of the packed array, you can load 1..8 bytes in a single (misaligned) memory operation, loading garbage into unused bytes, and then using shift or mask to clear the garbage.  That may be faster than asking C++ to do a bunch of branchy logic and byte assembly on every access.  Loading an N-byte word is simply `w=*(uint64_t*)(p+N-8)>>(N*8)`.  Your packed structs have two fields x,y, but they can be reimagined as bitfields `x=w&((1<<xlen)-1)`, `y=w>>xlen`.  This would save half a byte per entry in your tables, on average, since you can often share a byte between the two fields.  And the decode logic would be simpler than the closure-based stuff you have now.

These thoughts are just another observation about the design space you are working in, not a request for changes, although I do think they would simplify your code (and make it more efficient).  This sort of careful design makes the most sense when packaged in a separate class or template,  that can be carefully reviewed and unit-tested in isolation.  If we were to refactor some old searchable data structures onto a more modern footing, I think we would go in such directions.

An interesting property of your jump tables is that they can several kinds of indexes.  You can sort them according to any key in the field-info.  The entries are uniformly index/position (x/y) pairs.  For example, if you sort numerically (either x or y), you get a jump table for random-access into the field-info, given the field id number.  Same data, different sort order, allows acces under different search keys.  Again, this observation is not very actionable now, but it suggeste (to me) that we have a viable mechanism here for several random access problems in the VM.

Kudos to you for suggesting the idea of truncated ints combined with binary search!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2106110163