RFR: 8316180: Thread-local backoff for secondary_super_cache updates
Martin Doerr
mdoerr at openjdk.org
Wed Sep 20 14:44:51 UTC 2023
On Wed, 13 Sep 2023 14:02:19 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Work in progress, submitting for broader attention.
>
> See more details in the bug and related issues.
>
> This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple and reliable enough for backports to currently supported JDK releases.
>
> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in.
>
> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think that if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity.
>
> Setting the backoff to `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong.
>
> The current PR deliberately sets the backoff to `1000` to simplify testing. The actual value should be chosen by broader experiments.
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3`
> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`
PPC64 implementation:
diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index 8942199610e..0bef1b3760a 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2021,7 +2021,28 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   b(fallthru);
 
   bind(hit);
-  std(super_klass, target_offset, sub_klass); // save result to cache
+  // Success. Try to cache the super we found and proceed in triumph.
+  uint32_t super_cache_backoff = checked_cast<uint32_t>(SecondarySuperMissBackoff);
+  if (super_cache_backoff > 0) {
+    Label L_skip;
+
+    lwz(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+    addic_(temp, temp, -1);
+    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+    bgt(CCR0, L_skip);
+
+    load_const_optimized(temp, super_cache_backoff);
+    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+
+    std(super_klass, target_offset, sub_klass); // save result to cache
+
+    bind(L_skip);
+    if (L_success == nullptr && result_reg == noreg) {
+      crorc(CCR0, Assembler::equal, CCR0, Assembler::equal); // Restore CCR0 EQ
+    }
+  } else {
+    std(super_klass, target_offset, sub_klass); // save result to cache
+  }
   if (result_reg != noreg) { li(result_reg, 0); } // load zero result (indicates a hit)
   if (L_success != nullptr) { b(*L_success); }
   else if (result_reg == noreg) { blr(); } // return with CR0.eq if neither label nor result reg provided
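
For readers less familiar with PPC assembler, the sequence emitted at `hit` can be modeled in plain C++. This is only a sketch of the control flow in the hunk above, not HotSpot code: `ThreadModel`, `KlassModel`, `record_secondary_super_hit` and `count_cache_writes` are hypothetical names, the per-thread counter is assumed to start at zero, and the backoff value is assumed to fit in a signed 32-bit integer.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-thread field addressed via
// JavaThread::backoff_secondary_super_miss_offset() in the patch.
struct ThreadModel {
  int32_t backoff_counter = 0;  // assumption: starts at zero
};

// Hypothetical stand-in for a klass with its secondary_super_cache slot.
struct KlassModel {
  const KlassModel* secondary_super_cache = nullptr;
};

// Models the code emitted after a successful secondary supers scan.
// Returns true if the shared cache slot was written on this hit.
constexpr bool record_secondary_super_hit(ThreadModel& thread,
                                          KlassModel& sub_klass,
                                          const KlassModel* super_klass,
                                          uint32_t backoff) {
  if (backoff == 0) {
    // Mitigation disabled: always update the cache, as before the patch.
    sub_klass.secondary_super_cache = super_klass;
    return true;
  }
  int32_t remaining = thread.backoff_counter - 1;  // lwz + addic_
  thread.backoff_counter = remaining;              // stw
  if (remaining > 0) {
    return false;                                  // bgt CCR0, L_skip
  }
  // Counter ran out: re-arm it and perform the (possibly contended) store.
  thread.backoff_counter = static_cast<int32_t>(backoff);
  sub_klass.secondary_super_cache = super_klass;   // std ... save result to cache
  return true;
}

// Counts how many of `hits` consecutive slow-path hits on one thread
// actually write the cache for a given backoff value.
constexpr int count_cache_writes(int hits, uint32_t backoff) {
  ThreadModel thread{};
  KlassModel sub{};
  KlassModel super{};
  int writes = 0;
  for (int i = 0; i < hits; ++i) {
    if (record_secondary_super_hit(thread, sub, &super, backoff)) {
      ++writes;
    }
  }
  return writes;
}
```

Under these assumptions a thread writes the cache on roughly one out of every `backoff` slow-path hits, which is what reduces the traffic on the shared cache line in the contended case.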
Power10 results (2 cores, SMT8, 3.55 GHz):

-XX:SecondarySuperMissBackoff=0
Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15  1107.019 ± 16.206  ns/op
SecondarySuperCache.uncontended  avgt   15    17.984 ±  0.164  ns/op

-XX:SecondarySuperMissBackoff=10
Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15   431.557 ±  3.690  ns/op
SecondarySuperCache.uncontended  avgt   15    17.870 ±  0.088  ns/op

-XX:SecondarySuperMissBackoff=100
Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15    90.766 ±  0.196  ns/op
SecondarySuperCache.uncontended  avgt   15    17.925 ±  0.239  ns/op

-XX:SecondarySuperMissBackoff=1000
Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15    39.803 ±  0.369  ns/op
SecondarySuperCache.uncontended  avgt   15    18.070 ±  0.337  ns/op

-XX:SecondarySuperMissBackoff=10000
Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15    34.499 ±  0.451  ns/op
SecondarySuperCache.uncontended  avgt   15    17.933 ±  0.165  ns/op
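
To put the contended numbers in relation to each other, the speedup over the `SecondarySuperMissBackoff=0` baseline can be computed directly from the scores above. The snippet below is just that arithmetic; `Result`, `kResults`, `speedup_vs_baseline` and `print_speedups` are names made up for this illustration, and the only data used are the reported contended averages.

```cpp
#include <cstdio>

// One contended measurement from the Power10 runs above.
struct Result {
  int backoff;          // value passed to -XX:SecondarySuperMissBackoff
  double contended_ns;  // SecondarySuperCache.contended score, ns/op
};

// Scores copied from the results above; the first entry is the baseline.
constexpr Result kResults[] = {
    {0, 1107.019}, {10, 431.557}, {100, 90.766}, {1000, 39.803}, {10000, 34.499},
};

// Ratio of the baseline (backoff 0) time to the time for this setting.
constexpr double speedup_vs_baseline(const Result& r) {
  return kResults[0].contended_ns / r.contended_ns;
}

// Prints one line per setting with its speedup over the baseline.
void print_speedups() {
  for (const Result& r : kResults) {
    std::printf("backoff=%-6d contended=%9.3f ns/op  speedup=%.1fx\n",
                r.backoff, r.contended_ns, speedup_vs_baseline(r));
  }
}
```

This works out to roughly 2.6x at 10, 12x at 100, 28x at 1000 and 32x at 10000, while the uncontended score stays at about 18 ns/op for every setting.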
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1727871239