RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10]

Fri Oct 13 03:49:25 UTC 2023

On Thu, 12 Oct 2023 14:48:35 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Work in progress, submitting for broader attention.
>> 
>> See more details in the bug and related issues.
>> 
>> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases.
>> 
>> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in.
>> 
>> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity.
>> 
>> Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong.
>> 
>> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3`
>>  - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touchup benchmark metadata

Hello, I'm Nahida from **Derek’s** team. We've conducted extensive testing on the patch using both the Shipilev and Mulugeta test cases. We observed a similar trend across both benchmarks: as the value of SecondarySuperMissBackoff increases, the average time decreases. We ran both our Mulugeta test and the JMH test supplied in the patch on a larger machine with thread counts of 18, 60, and 240. For Mulugeta’s test case, we explored three different scenarios:

• Equal Distribution (50% each): This represents the worst-case scenario.
• Exclusive Interface Calls (0%): Signifying the best-case scenario.
• 95% Same Interface, 5% Other Interface: Where 95% of the time is spent calling the same interface and 5% of the time on the other one.
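
To make the three mixes concrete, here is a rough single-threaded toy model, not the HotSpot implementation and not either benchmark. It assumes a one-element cache, two interfaces numbered 0 and 1, and a miss counter that starts at zero so the first miss fills the cache; the class and method names are made up for illustration. It only counts how often the cache would be overwritten, which is the store the backoff is meant to throttle.

```java
final class BackoffSketch {
    /**
     * Counts overwrites of a one-element cache in a toy model.
     *
     * @param lookups      number of type checks to simulate
     * @param otherPercent share of lookups (0..100) that ask for interface 1
     * @param backoff      misses to skip between cache updates; 0 updates on every miss
     */
    static long countCacheWrites(int lookups, int otherPercent, int backoff) {
        int cached = -1;      // empty cache
        int missesLeft = 0;   // assumption: the first miss is allowed to fill the cache
        long writes = 0;
        for (int i = 0; i < lookups; i++) {
            // Deterministic interleaving: spreads interface 1 evenly through the stream.
            int wanted = (((long) i * otherPercent) % 100 < otherPercent) ? 1 : 0;
            if (wanted == cached) {
                continue;     // cache hit, nothing to update
            }
            // Cache miss: the real code would fall back to scanning the secondary supers.
            if (missesLeft > 0) {
                missesLeft--; // still backing off, leave the cache alone
                continue;
            }
            cached = wanted;  // overwrite the cache and start a new backoff window
            writes++;
            missesLeft = backoff;
        }
        return writes;
    }

    public static void main(String[] args) {
        int[] mixes = {50, 0, 5};       // the three scenarios listed above
        int[] backoffs = {0, 1000};
        for (int mix : mixes) {
            for (int backoff : backoffs) {
                System.out.println("other=" + mix + "% backoff=" + backoff
                        + " writes=" + countCacheWrites(1000, mix, backoff));
            }
        }
    }
}
```

In this model the 0% mix writes the cache once regardless of the backoff value, while the 50% and 5% mixes write it repeatedly unless a backoff is set. The model says nothing about timing or cross-thread effects; those are what the attached measurements cover.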

Here is **Derek’s** perspective on this data:

“We see that it takes very little contention (5%) for the default behavior to perform poorly, and in the uncontended case there is no downside for using a large Backoff value. So backoff values of 1,000 or even 10,000 seem reasonable. This makes sense, because in the perfect world for the secondary supercache there is no update to the secondary supercache. Note in a previous version of the Mulugeta benchmark we tried increasing the length of the interface array being searched, and it made little performance impact until the interface depth got silly (100+). HW prefetch, OOO cores etc can chew through an array search pretty well once they get started.”
[JDK-8180450_Secondary-super-cache_8316180-patch.xlsx](https://github.com/openjdk/jdk/files/12889239/JDK-8180450_Secondary-super-cache_8316180-patch.xlsx)

I've attached the Excel file for your reference. If you require any additional information or specific details, please feel free to let me know.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1760714959