RFR: 8352075: Perf regression accessing fields [v5]

Mon May 19 07:13:55 UTC 2025

On Fri, 16 May 2025 23:51:39 GMT, John R Rose <jrose at openjdk.org> wrote:

>> Radim Vansa has updated the pull request incrementally with five additional commits since the last revision:
>> 
>>  - Revert change in array.hpp
>>  - Revert changes in VerifyRawIndexesTest
>>  - Improve FieldInfo::print
>>  - Load constant for SORTED_FIELD_TABLE_THRESHOLD
>>  - Replace JumpTable with SortedFieldTable
>
> This change adds smaller "fast-forward" table to accelerate random-access queries within a larger stream.
> 
> I am not against this change, but I think it should be refactored (either now or later) into a library that can be used with other streamy data.  I'm thinking of debug-info, dependency lists, line number tables, reloc-info, or maybe PC descs as future clients of such a thing.
> 
> It could be a query accelerator (index), for any kind of "streamy data" that can get long, and needs occasional random access.
> 
> It is good to switch to binary search for big tables, while preserving the speed of linear search for small ones.  A similar technique can also be seen in `PcDescContainer::find_pc_desc_internal`.  Both binary-search algorithms are complicated and difficult to validate.   If we had a suitable library, we could tune it and test it carefully, and use it to simplify field streams, PC descs, and whatever else we need to work with.
> 
> BTW, it's sometimes useful to add a front-side cache for such searches.  This is a bookmark of the last query point.  This helps with queries which are correlated; you don't re-search from scratch on every step.  This is the function of `last_pc_desc` in the PC desc query; it can accelerate the binary search.
> 
> The fast-forward table here uses an ad hoc var-int scheme which is delicate and possibly buggy.
> 
> One way to do without the specialized table is to inject synchronization markers into the streamy data itself.  Note that the null byte is never used to encode UNSIGNED5 data.  Therefore, if you have a stream of UNSIGNED5 tokens, you can add boundaries encoded null bytes.   In the present application, you could perform binary search on the raw stream pointer, with each probe first re-synchronizing to the first null byte, and then decoding with a short sprint of linear access.  The placement of null bytes is a tuning tactic:  More null bytes add footprint but decrease the length of the short sprints.
> 
> I'm not asking for any changes at this point because I have not read the PR carefully enough.  And I don't intend to hold up reviews, but I do want us to put refactoring on our roadmap.

@rose00 While refactoring might be useful, this PR is trying to address a performance regression (in fact by improving the scenario significantly), and I'd prefer that to land in 25 rather than to delay just for the sake of better 'reusability'. And TBH I don't consider binary search algorithm too difficult to validate.

While a simple cache might be useful, I don't have a good example to validate its usefulness. And turning a read-only scenario into mutating scenario can turn out badly for multithreaded code, as in https://bugs.openjdk.org/browse/JDK-8180450 .

While you suggest 0 byte as synchronization markers, I've tried to synchronize the stream in #24713 and experiments have shown that while it improves things, it was not sufficient in my case. That's why I've abandoned that approach.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2889870320