RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v10]

Mon Sep 4 08:09:56 UTC 2023

On Wed, 23 Aug 2023 11:57:22 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> In this simple micro benchmark:
>> 
>> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
>> 
>> Performance drops sharply with polluted profile:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1            false  thrpt   10  1453.372 ± 24.919  ops/us
>> 
>> 
>> to:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt   Score   Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  28.579 ± 2.280  ops/us
>> 
>> 
>> The test has 2 type checks to 2 different interfaces so caching with
>> `secondary_super_cache` doesn't help.
>> 
>> The micro-benchmark only uses 2 different concrete classes
>> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
>> in profile data at the type checks. But c2 only take advantage of
>> profile data at type checks if they report a single class.
>> 
>> What I propose is that the full blown type check expanded in
>> `Phase::gen_subtype_check()` takes advantage of profile data. So in
>> the case of the micro benchmark, before checking the
>> `secondary_super_cache`, generated code checks whether the object
>> being type checked is a `DuplicatedContext` or a
>> `NonDuplicatedContext`.
>> 
>> This works fairly well on this micro benchmark:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt    Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  871.224 ± 20.750  ops/us
>> 
>> 
>> It also scales much better if there are multiple threads running the
>> same test (`secondary_super_cache` doesn't scale well: see
>> JDK-8180450).
>> 
>> Now if the micro-benchmark is changed according to the comment:
>> 
>> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
>> 
>> so the type check hits in the `secondary_super_cache`, the current
>> code performs much better:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt    Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  871.224 ± 20.750  ops/us
>> 
>> 
>> but leveraging profiling as explained above performs even better:
>> 
>> 
>> Benchmark                   ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision:
> 
>  - white spaces
>  - test case fix
>  - Merge branch 'master' into JDK-8308869
>  - riscv support
>  - improvements to test
>  - Merge branch 'master' into JDK-8308869
>  - never common SubTypeCheckNode nodes
>  - keep both ways of doing profile
>  - whitespace
>  - reworked change
>  - ... and 9 more: https://git.openjdk.org/jdk/compare/e5a66903...082efa2e

Great work, Roland. The changes look good to me. I just added some minor style suggestions of things I stumbled upon when reading the code.

What's the plan for PPC, S390, 32-bit ARM, Zero support?

src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 1300:

> 1298:   }
> 1299:   Label *success_target = success;
> 1300:   Label *failure_target = failure;

Suggestion:

  Label* success_target = success;
  Label* failure_target = failure;

src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 1429:

> 1427:     Label done;
> 1428:     Label *success_target = &done;
> 1429:     Label *failure_target = stub->entry();

Suggestion:

    Label* success_target = &done;
    Label* failure_target = stub->entry();

src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp line 1184:

> 1182:   }
> 1183:   Label *success_target = success;
> 1184:   Label *failure_target = failure;

Suggestion:

  Label* success_target = success;
  Label* failure_target = failure;

src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp line 2133:

> 2131:   Label  done;
> 2132:   Label *success_target = &done;
> 2133:   Label *failure_target = stub->entry();

Suggestion:

  Label* success_target = &done;
  Label* failure_target = stub->entry();

src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 1699:

> 1697:   }
> 1698:   Label *success_target = success;
> 1699:   Label *failure_target = failure;

Suggestion:

  Label* success_target = success;
  Label* failure_target = failure;

src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 1855:

> 1853:     Label done;
> 1854:     Label *success_target = &done;
> 1855:     Label *failure_target = stub->entry();

Suggestion:

    Label* success_target = &done;
    Label* failure_target = stub->entry();

src/hotspot/cpu/x86/interp_masm_x86.cpp line 1720:

> 1718:     increment_mdp_data_at(mdp, in_bytes(CounterData::count_offset()));
> 1719:   } else {
> 1720:     int non_profiled_offset = in_bytes(CounterData::count_offset());

Unused?

src/hotspot/cpu/x86/interp_masm_x86.cpp line 1720:

> 1718:     increment_mdp_data_at(mdp, in_bytes(CounterData::count_offset()));
> 1719:   } else {
> 1720:     int non_profiled_offset = in_bytes(CounterData::count_offset());

Unused?

src/hotspot/share/opto/c2_globals.hpp line 780:

> 778:   product(intx, TypeProfileSubTypeCheckCommonThreshold, 50,                 \
> 779:           "Use profile data at type check if together profiled types"       \
> 780:           "account for more than this threshold")                           \

I would remove "together".

src/hotspot/share/opto/graphKit.cpp line 2773:

> 2771: 
> 2772:   // Gather the various success & failures here
> 2773:   RegionNode *r_not_subtype = new RegionNode(3);

Suggestion:

  RegionNode* r_not_subtype = new RegionNode(3);

src/hotspot/share/opto/graphKit.cpp line 2775:

> 2773:   RegionNode *r_not_subtype = new RegionNode(3);
> 2774:   gvn.record_for_igvn(r_not_subtype);
> 2775:   RegionNode *r_ok_subtype = new RegionNode(4);

Suggestion:

  RegionNode* r_ok_subtype = new RegionNode(4);

src/hotspot/share/opto/graphKit.cpp line 2786:

> 2784:       if (!profile.has_receiver(i)) {
> 2785:         break;
> 2786:       }

Suggestion:

    for (int i = 0; profile.has_receiver(i); ++i) {

src/hotspot/share/opto/graphKit.cpp line 2795:

> 2793:         if (!profile.has_receiver(i)) {
> 2794:           break;
> 2795:         }

Suggestion:

      for (int i = 0; profile.has_receiver(i); ++i) {

src/hotspot/share/opto/ifnode.cpp line 1598:

> 1596:   }
> 1597:   // If the comparison is a subtype check, then SubTypeCheck nodes may have profile data attached to them and may be
> 1598:   // different nodes even-though they perform the same subtype check

This comment should go to the check in line 1606 below.

src/hotspot/share/opto/subtypenode.hpp line 65:

> 63: private:
> 64:   ciMethod* _method;
> 65:   int _bci;

Please add comments describing the purpose of these fields.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14375#pullrequestreview-1608854638
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314570689
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314570457
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314569724
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314569538
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314568947
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314567439
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314565568
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314567116
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314507710
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314531000
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314531157
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314509452
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314510038
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314553523
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1314497398