RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10]

Sat Oct 14 09:58:40 UTC 2023

On Fri, 13 Oct 2023 03:43:26 GMT, nahidasu <duke at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Touchup benchmark metadata
>
> Hello, I'm Nahida from **Derek’s** team.  We've conducted extensive testing on the patch using the specified test cases from both Shipilev and Mulugeta. We observed a similar trend across both benchmarks: as the value of SecondarySuperMissBackoff increases, the average time decreases. We ran both our Mulugeta test and the JMH test supplied in the patch on a larger machine with thread counts of 18, 60, and 240. For Mulugeta’s test case, we explored three different scenarios:
> 
> • Equal Distribution (50% each): This represents the worst-case scenario.
> • Exclusive Interface Calls (0%): Signifying the best-case scenario.
> • 95% Same Interface, 5% Other Interface: Where 95% of the time is spent calling the same interface and 5% of the time on the other one.
> 
> Here, sharing **Derek’s** perspective on this data:
> 
> “We see that it takes very little contention (5%) for the default behavior to perform poorly, and in the uncontended case there is no downside for using a large Backoff value. So backoff values of 1,000 or even 10,000 seem reasonable. This makes sense, because in the perfect world for the secondary supercache there is no update to the secondary supercache. Note in a previous version of the Mulugeta benchmark we tried increasing the length of the interface array being searched, and it made little performance impact until the interface depth got silly (100+). HW prefetch, OOO cores etc can chew through an array search pretty well once they get started.”
> [JDK-8180450_Secondary-super-cache_8316180-patch.xlsx](https://github.com/openjdk/jdk/files/12889239/JDK-8180450_Secondary-super-cache_8316180-patch.xlsx)
> 
> 
> I've attached the Excel file for your reference. If you require any additional information or specific details, please feel free to let me know.

@nahidasu 

Hi, I had a quick look at the data from Mulugeta's benchmark, and I see it stops at the C1 compilation level.
I believe we should also have a benchmark where C2 kicks in, because C2-compiled code likely won't have the amortization due to the stub call, which would increase contention (and, as a consequence, could lead to different conclusions or give another data point on the effectiveness of the backoff values).

The downside of using C2 comes instead from a nice improvement made by @rwestrel, i.e. https://github.com/openjdk/jdk/pull/14375, which can remove some of the checks thanks to a bimorphic guard check (I haven't checked the Mulugeta benchmarking code yet). This means we should pollute the type profile until we are sure C2 fully performs the type check, and then keep on using the last stable bet.
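
To make the profile-pollution idea concrete, here is a rough sketch in plain Java (not JMH, and not taken from the patch or from the Mulugeta benchmark). All names in it (`SubtypeCheckPollution`, `I1`/`I2`/`I3`, `A`..`D`, `check`, `run`) are hypothetical. It assumes the profile at a type-check site records at most two receiver classes, so feeding the same `instanceof` sites with four classes should leave C2 without a monomorphic or bimorphic guard to rely on. Whether C2 really emits the full secondary-supers check here would still have to be confirmed, e.g. by looking at the generated code.

```java
// Hypothetical sketch: pollute the type profile at instanceof sites so that
// a bimorphic guard is (presumably) not enough for C2 to elide the check.
final class SubtypeCheckPollution {
    interface I1 {}
    interface I2 {}
    interface I3 {}

    // Every class implements all three interfaces, so each instanceof below
    // is an interface (secondary super) check that is expected to succeed.
    static final class A implements I1, I2, I3 {}
    static final class B implements I1, I2, I3 {}
    static final class C implements I1, I2, I3 {}
    static final class D implements I1, I2, I3 {}

    // Two different interface checks on the same object: alternating between
    // I1 and I2 is the access pattern that keeps replacing a one-element
    // per-class cache.
    static int check(Object o) {
        int hits = 0;
        if (o instanceof I1) hits++;
        if (o instanceof I2) hits++;
        return hits;
    }

    // Cycles through the receivers so that every class reaches the same
    // instanceof sites inside check().
    static long run(Object[] receivers, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += check(receivers[i % receivers.length]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Four receiver classes: more than the two the profile is assumed to hold.
        Object[] polluted = { new A(), new B(), new C(), new D() };

        // Warm up long enough that the JIT should reach its top tier while
        // the profile is polluted. The iteration count is an arbitrary guess.
        long warm = run(polluted, 2_000_000);

        // "Measured" phase on a single class, i.e. the last stable bet.
        Object[] stable = { new A() };
        long start = System.nanoTime();
        long measured = run(stable, 2_000_000);
        long elapsed = System.nanoTime() - start;

        System.out.println("warm=" + warm + " measured=" + measured
                + " elapsedNs=" + elapsed);
    }
}
```

A real measurement would run `check` from many threads against the same class (as the JMH test in the patch does with its thread counts) and would need JMH to control warmup and dead-code elimination; the sketch only shows the warmup shape.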

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1762782206