RFR: 8316180: Thread-local backoff for secondary_super_cache updates

Aleksey Shipilev shade at openjdk.org
Wed Sep 20 14:52:34 UTC 2023


On Wed, 20 Sep 2023 14:41:52 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Work in progress, submitting for broader attention.
>> 
>> See more details in the bug and related issues.
>> 
>> This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple and reliable enough for backports to currently supported JDK releases.
>> 
>> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in.
>> 
>> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity.
>> 
>> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong.
>> 
>> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3`
>>  - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`
>
> PPC64 implementation:
> 
> diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> index 8942199610e..0bef1b3760a 100644
> --- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> +++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> @@ -2021,7 +2021,28 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
>    b(fallthru);
>  
>    bind(hit);
> -  std(super_klass, target_offset, sub_klass); // save result to cache
> +  // Success. Try to cache the super we found and proceed in triumph.
> +  uint32_t super_cache_backoff = checked_cast<uint32_t>(SecondarySuperMissBackoff);
> +  if (super_cache_backoff > 0) {
> +    Label L_skip;
> +
> +    lwz(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
> +    addic_(temp, temp, -1);
> +    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
> +    bgt(CCR0, L_skip);
> +
> +    load_const_optimized(temp, super_cache_backoff);
> +    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
> +
> +    std(super_klass, target_offset, sub_klass); // save result to cache
> +
> +    bind(L_skip);
> +    if (L_success == nullptr && result_reg == noreg) {
> +      crorc(CCR0, Assembler::equal, CCR0, Assembler::equal); // Restore CCR0 EQ
> +    }
> +  } else {
> +    std(super_klass, target_offset, sub_klass); // save result to cache
> +  }
>    if (result_reg != noreg) { li(result_reg, 0); } // load zero result (indicates a hit)
>    if (L_success != nullptr) { b(*L_success); }
>    else if (result_reg == noreg) { blr(); } // return with CR0.eq if neither label nor result reg provided
> 
> 
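For readers who do not read PPC64 assembly, the countdown in the patch above can be modeled in plain C++. This is only a sketch of the logic, not HotSpot code: `ThreadModel`, `KlassModel`, `record_hit` and `count_cache_writes` are hypothetical names, and the initial counter value of 0 is an assumption that the diff does not confirm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-thread counter that the patch reads via
// JavaThread::backoff_secondary_super_miss_offset(). Starting at 0 is assumed.
struct ThreadModel {
  int32_t backoff = 0;
};

// Hypothetical stand-in for the sub-klass that owns the one-element cache.
struct KlassModel {
  const void* secondary_super_cache = nullptr;
};

// Models the "hit" path: returns true if the shared cache slot was written.
// backoff_limit plays the role of SecondarySuperMissBackoff.
bool record_hit(ThreadModel& thread, KlassModel& sub, const void* super_klass,
                uint32_t backoff_limit) {
  if (backoff_limit == 0) {
    // Mitigation disabled: always write the cache, as before the patch.
    sub.secondary_super_cache = super_klass;
    return true;
  }
  // lwz / addic_ / stw: decrement the thread-local counter and store it back.
  int32_t remaining = thread.backoff - 1;
  thread.backoff = remaining;
  if (remaining > 0) {
    // bgt L_skip: still backing off, leave the shared cache line untouched.
    return false;
  }
  // Counter expired: re-arm it, then write the cache.
  thread.backoff = static_cast<int32_t>(backoff_limit);
  sub.secondary_super_cache = super_klass;
  return true;
}

// Counts how many of `hits` consecutive slow-path hits write the cache,
// starting from a fresh thread and klass.
int count_cache_writes(int hits, uint32_t backoff_limit) {
  ThreadModel thread;
  KlassModel sub;
  int dummy_super = 0;  // any address works as the "super klass" here
  int writes = 0;
  for (int i = 0; i < hits; i++) {
    if (record_hit(thread, sub, &dummy_super, backoff_limit)) {
      writes++;
    }
  }
  return writes;
}
```

In this model, a backoff of N lets a thread write the cache on one out of every N slow-path hits, and a backoff of `0` writes on every hit. Fewer writes to the shared slot would be consistent with the drop in the contended numbers below, while the uncontended case stays flat.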
> Power10 results (2 cores, SMT8, 3.55 GHz):
> -XX:SecondarySuperMissBackoff=0
> 
> Benchmark                        Mode  Cnt     Score    Error  Units
> SecondarySuperCache.contended    avgt   15  1107.019 ± 16.206  ns/op
> SecondarySuperCache.uncontended  avgt   15    17.984 ±  0.164  ns/op
> 
> 
> -XX:SecondarySuperMissBackoff=10
> 
> Benchmark                        Mode  Cnt    Score   Error  Units
> SecondarySuperCache.contended    avgt   15  431.557 ± 3.690  ns/op
> SecondarySuperCache.uncontended  avgt   15   17.870 ± 0.088  ns/op
> 
> 
> -XX:SecondarySuperMissBackoff=100
> 
> Benchmark                        Mode  Cnt   Score   Error  Units
> SecondarySuperCache.contended    avgt   15  90.766 ± 0.196  ns/op
> SecondarySuperCache.uncontended  avgt   15  17.925 ± 0.239  ns/op
> 
> 
> -XX:SecondarySuperMissBackoff=1000
> 
> Benchmark                        Mode  Cnt   Score   Error  Units
> SecondarySuperCache.c...

@TheRealMDoerr: Folded in, thank you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1727876381

