RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class
Roland Westrelin
roland at openjdk.org
Thu Jun 8 11:34:07 UTC 2023
In this simple micro benchmark:
https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
Performance drops sharply with polluted profile:
Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us
to:
Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us
The test has 2 type checks to 2 different interfaces so caching with
`secondary_super_cache` doesn't help.
The micro-benchmark only uses 2 different concrete classes
(`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
in profile data at the type checks. But c2 only take advantage of
profile data at type checks if they report a single class.
What I propose is that the full blown type check expanded in
`Phase::gen_subtype_check()` takes advantage of profile data. So in
the case of the micro benchmark, before checking the
`secondary_super_cache`, generated code checks whether the object
being type checked is a `DuplicatedContext` or a
`NonDuplicatedContext`.
This works fairly well on this micro benchmark:
Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us
It also scales much better if there are multiple threads running the
same test (`secondary_super_cache` doesn't scale well: see
JDK-8180450).
Now if the micro-benchmark is changed according to the comment:
https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
so the type check hits in the `secondary_super_cache`, the current
code performs much better:
Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us
but leveraging profiling as explained above performs even better:
Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 1165.474 ± 70.171 ops/us
I think it's actually likely that there's a performance advantage even
if profiling sees more than 2 types at a type check unless the profile
is heavily polluted. The problem is that, the way current profile data
is collected, we can't tell if the profile is heavily polluted
because, unlike profiling at virtual calls, there's no counter for non
recorded types. The `count` field is used to count failed type checks
instead. JVMCI added a `nonprofiled_count`. I thought about using that
one but it seems after looking at the way c2 uses the failed type
check count that it would be simpler to simply collect profile data at
type checks the way it's done at virtual calls. Indeed, C2 uses the
unique class reported by profile data only if there was no failed type
checks recorded in profile data but:
- at checkcasts, it also checks that it can prove the check would
statically fold. That last check seems to be the one that matters.
- at instanceof, AFAICT, a profiled type that causes the instanceof to
fail is as valuable as one that makes it succeed so it would be
better to ignore failures reported by profiling.
I also discussed this briefly with Tom and he said graal doesn't need
the failed type check count.
So, in the patch I propose, I changed the way profile data is
collected so it works the same it does at virtual call. If this patch
is accepted, I'll need help with platforms other than x86 and aarch64.
I also modified the JVMCI code. BTW, I also wonder if
`VirtualCallData.getMethodProfile()` is not obsolete.
Finally, I changed `Phase::gen_subtype_check()` so it emits the extra
checks. That method is now called at macro expansion when profile data
is not longer available. So I attached profile data to the
`SubTypeCheck` node. For each profile data entry, 2 edges are added:
one for the klass, one for the profile frequency.
Because `SubTypeCheck` now has extra edges, it can happen that 2
`SubTypeCheck` nodes that perform the same subtype check don't common
during `IGVN` which can get in the way of some optimizations. I had to
make some adjustments to the logic of split if and code that looks for
dominating identical checks because of that.
-------------
Commit messages:
- white spaces
- fix & test
Changes: https://git.openjdk.org/jdk/pull/14375/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308869
Stats: 849 lines in 24 files changed: 523 ins; 232 del; 94 mod
Patch: https://git.openjdk.org/jdk/pull/14375.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375
PR: https://git.openjdk.org/jdk/pull/14375
More information about the hotspot-compiler-dev
mailing list