RFR: 8352075: Perf regression accessing fields [v28]
    Radim Vansa 
    rvansa at openjdk.org
       
    Tue Jun 10 15:18:38 UTC 2025
    
    
  
On Tue, 10 Jun 2025 14:30:06 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:
>> But then you'd need create an array of `Pair` in `create_search_table` and copy the data into that. You wouldn't need a Supplier at all, just pass the array and its length to the `fill` method. If you are worried about the virtual calls, I could do that.
>> 
>> Alternatively, we could replace virtual calls with templated `fill` method. In that case the Comparator should be a template method as well, since that one is even more perf-critical?
>
> I think you misunderstand me, as what I'm proposing wouldn't requre creating an array in `create_search_table`.
> 
> I'm saying that you do this:
> 
> ```c++
> // Note: we require the supplier to provide the elements in the final order as we can't easily sort
> // within this method - qsort() accepts only pure function as comparator.
> void PackedTableBuilder::fill(u1* table, size_t table_length, Supplier &supplier) const {
>   Pair<uint32_t, uint32_t> kvs[4];
> 
>   size_t offset = 0;
>   size_t len_read = 0;
>   while (len = supplier.next(kvs, 4)) {
>     // len tells you how many of the full capacity of kvs was used.
>   }
> }
> 
> 
> Now each call of  `Supplier::next` gives you up to 4 elements, quartering the amount of virtual calls necessary.
> 
>>Alternatively, we could replace virtual calls with templated fill method. In that case the Comparator should be a template method as well, since that one is even more perf-critical?
> 
> Sure, you can try that out.
> 
> To be clear, I am only asking: Are virtual calls expensive enough here that not having them, or amortizing their cost, something that makes the performance better?
The earlier version of this PR was not abstracting out PackedTable* so there were no virtual calls. In both the CCC.java and undisclosed customer reproducer I don't see a significant difference in performance - these involve whole JVM startup, so the efficiency of code with linear complexity is probably under the radar. If we want to optimize the hack out of it - yes, there would be space for that, maybe at the cost of maintainability and/or reusability.
TLDR: I don't have a (micro)benchmark that would prove one thing is better than the other.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2138174164
    
    
More information about the serviceability-dev
mailing list