RFR: 8283566: G1: Improve G1BarrierSet::enqueue performance [v3]

Aleksey Shipilev shade at openjdk.java.net
Wed Mar 30 10:11:36 UTC 2022


> While looking at startup/warmup benchmarks for VarHandles, I noticed that `G1BarrierSet::enqueue` is not inlined and is quite hot. Its call sites also load the oop first and only then check whether the queues are active. This could be improved a bit. It only matters for the barrier paths that the VM itself takes, which is the case for the `java.lang.invoke` VM infrastructure.
> 
> I am not sure about the `enqueue_oop` name; I chose it to avoid an accidental overload clash with `enqueue(T* dst)`. Open to suggestions.
> 
> On a simple test:
> 
> 
> import java.lang.invoke.MethodHandles;
> import java.lang.invoke.VarHandle;
> 
> public class VHWarmup {
> 
>     static final int SIZE = 1_000_000;
> 
>     private VarHandle[] vhs;
>     private int x;
> 
>     public static void main(String... args) throws Exception {
>         new VHWarmup().run();
>     }
> 
>     public void run() throws Exception {
>         vhs = new VarHandle[SIZE];
>         for (int c = 0; c < SIZE; c++) {
>             VarHandle vh = MethodHandles.lookup().findVarHandle(VHWarmup.class, "x", int.class);
>             vh.get(this);
>             vhs[c] = vh;
>         }
>     }
> }
> 
> 
> Baseline:
> 
> 
> $ perf stat -r 10 build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m VHWarmup > /dev/null
> 
>  Performance counter stats for 'build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m VHWarmup' (10 runs):
> 
>           1,392.13 msec task-clock                #    1.183 CPUs utilized            ( +-  0.45% )
>                887      context-switches          #    0.637 K/sec                    ( +-  1.05% )
>                  6      cpu-migrations            #    0.004 K/sec                    ( +- 14.72% )
>             31,755      page-faults               #    0.023 M/sec                    ( +-  0.17% )
>      4,708,582,477      cycles                    #    3.382 GHz                      ( +-  0.44% )  (50.46%)
>        168,230,638      stalled-cycles-frontend   #    3.57% frontend cycles idle     ( +-  2.75% )  (50.35%)
>        790,033,580      stalled-cycles-backend    #   16.78% backend cycles idle      ( +-  1.82% )  (50.03%)
>      9,094,235,238      instructions              #    1.93  insn per cycle         
>                                                   #    0.09  stalled cycles per insn  ( +-  0.22% )  (49.54%)
>      1,716,367,002      branches                  # 1232.909 M/sec                    ( +-  0.38% )  (49.65%)
>          3,908,772      branch-misses             #    0.23% of all branches          ( +-  2.89% )  (49.97%)
> 
>            1.17642 +- 0.00616 seconds time elapsed  ( +-  0.52% )
> 
> 
> Patched:
> 
> 
> $ perf stat -r 10 build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m VHWarmup > /dev/null
> 
>  Performance counter stats for 'build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m VHWarmup' (10 runs):
> 
>           1,288.69 msec task-clock                #    1.203 CPUs utilized            ( +-  0.73% )
>                860      context-switches          #    0.667 K/sec                    ( +-  1.38% )
>                  7      cpu-migrations            #    0.006 K/sec                    ( +- 11.96% )
>             31,830      page-faults               #    0.025 M/sec                    ( +-  0.13% )
>      4,331,508,315      cycles                    #    3.361 GHz                      ( +-  0.72% )  (50.01%)
>        151,638,797      stalled-cycles-frontend   #    3.50% frontend cycles idle     ( +-  2.94% )  (49.56%)
>        604,797,159      stalled-cycles-backend    #   13.96% backend cycles idle      ( +-  4.17% )  (49.70%)
>      8,517,580,297      instructions              #    1.97  insn per cycle         
>                                                   #    0.07  stalled cycles per insn  ( +-  0.50% )  (49.99%)
>      1,600,179,319      branches                  # 1241.714 M/sec                    ( +-  0.46% )  (50.44%)
>          4,129,679      branch-misses             #    0.26% of all branches          ( +-  4.39% )  (50.30%)
> 
>            1.07125 +- 0.00822 seconds time elapsed  ( +-  0.77% )
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`
>  - [x] Linux x86_64 fastdebug `tier2`
>  - [x] SPECjvm2008 shows no regressions
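
The reordering mentioned in the description (check whether the queue is active before loading the oop, instead of after) can be illustrated with a toy model. This is only a sketch in Java, not HotSpot code: `EnqueueSketch`, `MarkQueue`, `enqueueLoadFirst`, `enqueueCheckFirst` and `countedLoad` are hypothetical names invented for illustration, and the "load" is simulated with a counter so the difference is observable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy model only: all names are hypothetical and this is not the G1 implementation.
public class EnqueueSketch {

    // Stands in for a barrier queue that is active only some of the time.
    public static final class MarkQueue {
        public boolean active;                         // inactive by default
        public final Deque<Object> entries = new ArrayDeque<>();
    }

    // Counts how many times the previous value was actually loaded.
    public static int loads = 0;

    // Simulates loading a field value and records that the load happened.
    public static Object countedLoad(Object value) {
        loads++;
        return value;
    }

    // Baseline shape: load the value first, then check whether the queue is active.
    public static void enqueueLoadFirst(MarkQueue q, Supplier<Object> field) {
        Object prev = field.get();
        if (q.active && prev != null) {
            q.entries.add(prev);
        }
    }

    // Reordered shape: test the cheap flag first and load only when it is needed.
    public static void enqueueCheckFirst(MarkQueue q, Supplier<Object> field) {
        if (!q.active) {
            return;
        }
        Object prev = field.get();
        if (prev != null) {
            q.entries.add(prev);
        }
    }

    public static void main(String[] args) {
        MarkQueue q = new MarkQueue();
        enqueueLoadFirst(q, () -> countedLoad("x"));
        int loadFirst = loads;
        loads = 0;
        enqueueCheckFirst(q, () -> countedLoad("x"));
        System.out.println("inactive queue: load-first loads=" + loadFirst
                + ", check-first loads=" + loads);
    }
}
```

In this model both variants enqueue the same values when the queue is active; they differ only in that the check-first variant skips the load entirely while the queue is inactive. Whether this matches the exact shape of the patch is an assumption; the linked diff is the authoritative source.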

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' into JDK-8283566-g1-enqueue
 - enqueue_preloaded*
 - enqueue_loc
 - Merge branch 'master' into JDK-8283566-g1-enqueue
 - Cleanups
 - More complete patch
 - Inlining

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7921/files
  - new: https://git.openjdk.java.net/jdk/pull/7921/files/cdfd7891..b64f25a0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7921&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7921&range=01-02

  Stats: 4823 lines in 168 files changed: 3654 ins; 674 del; 495 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7921.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7921/head:pull/7921

PR: https://git.openjdk.java.net/jdk/pull/7921


