RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7]

Fri Jun 30 23:08:58 UTC 2023

On Thu, 29 Jun 2023 14:54:30 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> In this simple micro benchmark:
>> 
>> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
>> 
>> Performance drops sharply with polluted profile:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1            false  thrpt   10  1453.372 ± 24.919  ops/us
>> 
>> 
>> to:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt   Score   Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  28.579 ± 2.280  ops/us
>> 
>> 
>> The test has 2 type checks to 2 different interfaces so caching with
>> `secondary_super_cache` doesn't help.
>> 
>> The micro-benchmark only uses 2 different concrete classes
>> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
>> in profile data at the type checks. But c2 only take advantage of
>> profile data at type checks if they report a single class.
>> 
>> What I propose is that the full blown type check expanded in
>> `Phase::gen_subtype_check()` takes advantage of profile data. So in
>> the case of the micro benchmark, before checking the
>> `secondary_super_cache`, generated code checks whether the object
>> being type checked is a `DuplicatedContext` or a
>> `NonDuplicatedContext`.
>> 
>> This works fairly well on this micro benchmark:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt    Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  871.224 ± 20.750  ops/us
>> 
>> 
>> It also scales much better if there are multiple threads running the
>> same test (`secondary_super_cache` doesn't scale well: see
>> JDK-8180450).
>> 
>> Now if the micro-benchmark is changed according to the comment:
>> 
>> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
>> 
>> so the type check hits in the `secondary_super_cache`, the current
>> code performs much better:
>> 
>> 
>> Benchmark                                         (typePollution)   Mode  Cnt    Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  871.224 ± 20.750  ops/us
>> 
>> 
>> but leveraging profiling as explained above performs even better:
>> 
>> 
>> Benchmark                   ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - whitespace
>  - reworked change
>  - Merge branch 'master' into JDK-8308869
>  - more test failures
>  - Merge branch 'master' into JDK-8308869
>  - whitespaces
>  - test failures
>  - review
>  - 32 bit fix
>  - white spaces
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/03b7aff0...101399eb

Thanks, Roland. IR shape looks much better now.

> I also still think that preventing commoning is important so some path doesn't end up with profile data from some other path. 

I took a closer look at the relevant code (in particular, I forgot that`PhaseMacroExpand::expand_subtypecheck_node` creates a dedicated copy for each user) and now agree with you that commoning between unrelated paths is undesirable. Moreover, I'm in favor of completely disabling sharing for `SubTypeCheck` node. Considering `IfNode::search_identical()` handles `SubTypeCheck` case now, I don't see much value in special handling for nodes without associated bytecode location info.

> I still think it's important to change profile collection to have data that is as accurate as possible. If the change needs to be split, I think profile collection changes should go in first.

What's the plan if we agree on adjusting profile collection? Should all the platforms be updated all at once? If not, how is it intended to work during transition period?

test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 44:

> 42:         flags.add("-XX:TypeProfileSubTypeCheckCommonThreshold=90");
> 43:         if (!Platform.is32bit()) {
> 44:             flags.add("-XX:-UseCompressedClassPointers");

What's the purpose of `-XX:-UseCompressedClassPointers` on 64-bit platforms? Make it easier to match the IR?

-------------

PR Review: https://git.openjdk.org/jdk/pull/14375#pullrequestreview-1507603807
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1248116482