RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7]
Vladimir Ivanov
vlivanov at openjdk.org
Fri Jun 30 23:08:58 UTC 2023
On Thu, 29 Jun 2023 14:54:30 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> In this simple micro benchmark:
>>
>> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
>>
>> Performance drops sharply with polluted profile:
>>
>>
>> Benchmark (typePollution) Mode Cnt Score Error Units
>> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us
>>
>>
>> to:
>>
>>
>> Benchmark (typePollution) Mode Cnt Score Error Units
>> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us
>>
>>
>> The test has 2 type checks to 2 different interfaces so caching with
>> `secondary_super_cache` doesn't help.
>>
>> The micro-benchmark only uses 2 different concrete classes
>> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
>> in profile data at the type checks. But c2 only take advantage of
>> profile data at type checks if they report a single class.
>>
>> What I propose is that the full blown type check expanded in
>> `Phase::gen_subtype_check()` takes advantage of profile data. So in
>> the case of the micro benchmark, before checking the
>> `secondary_super_cache`, generated code checks whether the object
>> being type checked is a `DuplicatedContext` or a
>> `NonDuplicatedContext`.
>>
>> This works fairly well on this micro benchmark:
>>
>>
>> Benchmark (typePollution) Mode Cnt Score Error Units
>> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us
>>
>>
>> It also scales much better if there are multiple threads running the
>> same test (`secondary_super_cache` doesn't scale well: see
>> JDK-8180450).
>>
>> Now if the micro-benchmark is changed according to the comment:
>>
>> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
>>
>> so the type check hits in the `secondary_super_cache`, the current
>> code performs much better:
>>
>>
>> Benchmark (typePollution) Mode Cnt Score Error Units
>> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us
>>
>>
>> but leveraging profiling as explained above performs even better:
>>
>>
>> Benchmark ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
>
> - whitespace
> - reworked change
> - Merge branch 'master' into JDK-8308869
> - more test failures
> - Merge branch 'master' into JDK-8308869
> - whitespaces
> - test failures
> - review
> - 32 bit fix
> - white spaces
> - ... and 1 more: https://git.openjdk.org/jdk/compare/03b7aff0...101399eb
Thanks, Roland. IR shape looks much better now.
> I also still think that preventing commoning is important so some path doesn't end up with profile data from some other path.
I took a closer look at the relevant code (in particular, I forgot that`PhaseMacroExpand::expand_subtypecheck_node` creates a dedicated copy for each user) and now agree with you that commoning between unrelated paths is undesirable. Moreover, I'm in favor of completely disabling sharing for `SubTypeCheck` node. Considering `IfNode::search_identical()` handles `SubTypeCheck` case now, I don't see much value in special handling for nodes without associated bytecode location info.
> I still think it's important to change profile collection to have data that is as accurate as possible. If the change needs to be split, I think profile collection changes should go in first.
What's the plan if we agree on adjusting profile collection? Should all the platforms be updated all at once? If not, how is it intended to work during transition period?
test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 44:
> 42: flags.add("-XX:TypeProfileSubTypeCheckCommonThreshold=90");
> 43: if (!Platform.is32bit()) {
> 44: flags.add("-XX:-UseCompressedClassPointers");
What's the purpose of `-XX:-UseCompressedClassPointers` on 64-bit platforms? Make it easier to match the IR?
-------------
PR Review: https://git.openjdk.org/jdk/pull/14375#pullrequestreview-1507603807
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1248116482
More information about the hotspot-compiler-dev
mailing list