RFC: AArch64: Implementing spin pauses with ISB
Astigeevich, Evgeny
eastig at amazon.co.uk
Wed Aug 25 21:16:29 UTC 2021
Hi Andrew,
> How application dependent is this? Does it depend on how many threads are
> contending for a lock? Are there other places (intrinsic monitors, say)
> where we should do this?
IMHO, we've only scratched the surface of it. The problem is not well modelled by existing public benchmarks.
Yes, it is application dependent at some level. In the case of Thread.onSpinWait, it depends on how an application implements its spin loops.
Applications whose spin loops run for several iterations would benefit from a short onSpinWait (this is what we've seen in customers' benchmarks). Applications that call onSpinWait only a couple of times would benefit from a longer onSpinWait.
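To make the first kind of application concrete, here is a minimal sketch of a spin loop that calls Thread.onSpinWait several times before backing off. The class name BoundedSpin, the method awaitFlag and the spinLimit parameter are hypothetical names chosen for illustration. They do not come from HotSpot or from any customer code, and the 1 microsecond back-off is an arbitrary assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration only; not taken from HotSpot or any benchmark.
final class BoundedSpin {
    /**
     * Waits until the flag becomes true.
     * Spins with Thread.onSpinWait() up to spinLimit times in a row,
     * then parks briefly before spinning again.
     * Returns the total number of onSpinWait calls made.
     */
    static long awaitFlag(AtomicBoolean flag, int spinLimit) {
        long spins = 0;        // total onSpinWait calls
        int sinceBackoff = 0;  // calls since the last back-off
        while (!flag.get()) {
            if (sinceBackoff < spinLimit) {
                Thread.onSpinWait(); // spin-wait hint; its cost is platform dependent
                spins++;
                sinceBackoff++;
            } else {
                // Assumed back-off: give up the CPU for about 1 microsecond.
                LockSupport.parkNanos(1_000L);
                sinceBackoff = 0;
            }
        }
        return spins;
    }
}
```

With a loop like this, each onSpinWait call is one of many, so a short pause keeps the time between checks of the flag low. A caller that makes only one or two calls before taking a slower path gets almost no waiting from a short pause, which is why a longer onSpinWait would suit it better.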
"How heavy must thread contention be?" and "What other places?" are still open questions. To answer them we need to be able to detect the issues, which is itself the problem.
For now we use a trial-and-error approach.
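One cheap input to trial and error is a rough estimate of what a single onSpinWait costs on a given machine, which can be compared with the 15-30ns spin target quoted below. The following is only a naive sketch: SpinWaitTimer and averageNanosPerCall are hypothetical names, the warm-up count is an arbitrary assumption, and the result includes loop and timer overhead. A benchmark harness such as JMH would give more trustworthy numbers.

```java
// Hypothetical illustration only; a naive timing loop, not a rigorous benchmark.
final class SpinWaitTimer {
    /**
     * Returns the average wall-clock time of one Thread.onSpinWait() call
     * in nanoseconds, measured over the given number of calls.
     */
    static double averageNanosPerCall(int calls) {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        // Assumed warm-up so the loop is likely JIT-compiled before timing.
        for (int i = 0; i < 100_000; i++) {
            Thread.onSpinWait();
        }
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            Thread.onSpinWait();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / calls;
    }
}
```

For example, SpinWaitTimer.averageNanosPerCall(10_000_000) could be run on the same machine with different onSpinWait implementations to compare them. The absolute values depend on the CPU, the JVM version and its flags.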
Thanks,
Evgeny
On 21/08/2021, 11:08, "hotspot-dev on behalf of Andrew Haley" <hotspot-dev-retn at openjdk.java.net on behalf of aph-open at littlepinkcloud.com> wrote:
On 8/17/21 9:42 PM, Astigeevich, Evgeny wrote:
>
>> The ISB instruction wasn't intended to be used for that purpose...
>
> It might be time for YIELD to be a real instruction, especially on Neoverse. High thread contention is a typical situation in server workloads.
> It would be great if Neoverse architects considered this.
Maybe, but recent experience from Intel (where the delay was changed from 20-30
clocks to 200, causing regressions in some areas) suggests it's very problematic.
>> Your experiments were with one ISB - did you experiment at all with multiple ISBs? I'm curious as to what the overall effect would be.
>
> Yes, there were experiments with 2 ISBs. With 2 ISBs the performance improvements were smaller. The Graviton 2 performance engineers' explanation is that spins should target 15-30ns. One ISB stays within these limits. Two or more ISBs make the spins longer, which increases the chance of hitting an expensive code path and of the OS rescheduling threads.
How application dependent is this? Does it depend on how many threads are
contending for a lock? Are there other places (intrinsic monitors, say)
where we should do this?
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671