RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]

Martin Doerr mdoerr at openjdk.org
Fri May 16 09:36:12 UTC 2025


On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
>> 
>> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands:
>> 
>> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753)
>> 
>> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>> 
>> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks.
>> 
>> #### Testing
>> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode).
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Replace control type with PhaseCFG::is_CFG test

Thanks for implementing it and thanks for the ping. It basically works on PPC64, but one IR rule is failing:

Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "static java.lang.Object compiler.gcbarriers.TestImplicitNullChecks.testLoadVolatile(compiler.gcbarriers.TestImplicitNullChecks$OuterWithVolatileField)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={FINAL_CODE}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#NULL_CHECK#_", "1"}, applyIfPlatformOr={}, applyIfPlatform={"aarch64", "false"}, failOn={}, applyIfOr={"UseZGC", "true", "UseG1GC", "true"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "Final Code":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\d+(\s){2}(NullCheck.*)+(\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!


This is probably because PPC64 uses a membar_volatile before volatile load, so the graph looks differently:

         33  Prolog  ===  [[ ]]   [2380000000033]
          9  MachProj  === 10  [[ 8 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85)
 R3      11  MachProj  === 10  [[ 8 26 ]] #5  Oop:compiler/gcbarriers/TestImplicitNullChecks$OuterWithVolatileField * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85)
         12  MachProj  === 10  [[ 4 17 ]] #1/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85)
         13  MachProj  === 10  [[ 4 21 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85)
 R1      14  MachProj  === 10  [[ 4 2 17 ]] #3 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85)
         15  MachProj  === 10  [[ 4 17 ]] #4 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85)
          0  Con  === 10  [[ ]]  #top
          8  zeroCheckP_reg_imm0  === 9 11  [[ 7 22 ]] P=0.000001, C=-1.000000 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)

BB#002:
         31  Region  === 31 22  [[ 31 21 26 ]] 
         21  membar_volatile  === 31 0 13 0 0  [[ 20 23 ]]  !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
         20  MachProj  === 21  [[ 19 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
         23  MachProj  === 21  [[ 19 26 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
 R15     26  loadN_ac  === 31 23 11  [[ 25 19 ]] #12/0x000000000000000c Volatile!narrowoop: java/lang/Object *
         19  unnecessary_membar_acquire  === 20 0 23 0 0  |26 0  [[ 18 24 ]]  !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
         18  MachProj  === 19  [[ 17 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
         24  MachProj  === 19  [[ 17 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
 R3      25  decodeN_unscaled  === _ 26  [[ 17 ]] java/lang/Object *  Oop:java/lang/Object * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
         34  Epilog  ===  [[ ]]   [2380000000034]
         17  Ret  === 18 12 24 14 15 25  [[ 1 ]] 

BB#003:
         30  Region  === 30 7  [[ 30 4 ]] 
 R3      16  loadConI16  === 1  [[ 4 ]] #-10/0xfffffff6
          6  ConP  === 10  [[ 4 ]]  #null
          4  CallStaticJavaDirect  === 30 12 13 14 15 16 0 6  [[ 5 3 32 ]] Static wrapper for: uncommon_trap(reason='null_check' action='maybe_recompile') # void ( int ) C=0.000100 TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) reexecute !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
          5  MachProj  === 4  [[ ]] #10005/fat
          3  MachProj  === 4  [[ 2 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)
 R14     32  MachProj  === 4  [[ ]] #6/fat
          2  ShouldNotReachHere  === 3 0 0 14 0  [[ 1 ]]  !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85)


I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2886188487


More information about the hotspot-gc-dev mailing list