RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit

Aleksey Shipilev shade at openjdk.org
Wed Mar 27 15:56:22 UTC 2024


On Wed, 27 Mar 2024 05:58:34 GMT, Joshua Cao <duke at openjdk.org> wrote:

> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on Arm machines, as shown by the benchmarks in this issue and in https://bugs.openjdk.org/browse/JDK-8324186.
> 
> This change does not help constructors of objects with volatile fields, because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated by the test case `classWithVolatile`, where the patch leaves the IR unchanged.
> 
> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStores in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) assumes that the barrier's input is a `Proj` of an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). That assumption does not hold for the barrier at the end of a constructor, which takes the `Allocate` as input directly, without an intervening `Proj`. I opted instead to eliminate `StoreStore`s in GVN, the same way `MemBarRelease` is handled.
> 
> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636); without them, some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256) fail.
> 
> Passes hotspot tier1 locally on a Linux machine.
> 
> ### Benchmarks
> 
> Running Renaissance ParMnemonics on an Amazon Graviton (Arm) instance.
> 
> Baseline:
> 
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>   N = 25
>   mean =   3309.611 ±(99.9%) 86.699 ms/op
> 
>   Histogram, ms/op:
>     [3000.000, 3050.000) = 0
>     [3050.000, 3100.000) = 4
>     [3100.000, 3150.000) = 1
>     [3150.000, 3200.000) = 0
>     [3200.000, 3250.000) = 0
>     [3250.000, 3300.000) = 0
>     [3300.000, 3350.000) = 9
>     [3350.000, 3400.000) = 6
>     [3400.000, 3450.000) = 5
> 
>   Percentiles, ms/op:
>       p(0.0000) =   3069.910 ms/op
>      p(50.0000) =   3348.140 ms/op
>      ...
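The constructor-exit barrier quoted above only has to keep the constructor's field stores from being reordered with a later store that publishes the object. A minimal sketch of that ordering written by hand with `java.lang.invoke.VarHandle` fences, assuming a hypothetical `Box` class with plain (non-final) fields; it illustrates the writer-side ordering only and is not what C2 literally emits. Under the JMM, a racing reader of plain fields still has no formal guarantee, which is why the language ties this guarantee to `final` fields.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example: hand-written publication that mirrors the ordering a
// constructor-exit StoreStore barrier provides for final fields.
class FencedPublication {
    static final class Box {
        int a; // plain fields: no final-field guarantee from the JMM
        int b;

        Box(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Racy publication target: readers may observe it without synchronization.
    static Box shared;

    static void publish(int a, int b) {
        Box box = new Box(a, b);
        // StoreStore: the stores to box.a and box.b cannot be reordered with
        // the store of the reference below. VarHandle.releaseFence() would
        // additionally order earlier loads before that store, which this
        // publication pattern does not need.
        VarHandle.storeStoreFence();
        shared = box;
    }

    public static void main(String[] args) {
        publish(1, 2);
        Box seen = shared;
        System.out.println(seen.a + seen.b);
    }
}
```

The weaker fence is the point of the change: it orders only stores against stores, so the hardware is free to do more with loads around it.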

I propose we also add this benchmark, which verifies barrier costs and coalescing: [ConstructorBarriers.txt](https://github.com/openjdk/jdk/files/14775850/ConstructorBarriers.txt). Perhaps these should also become IR tests. The benchmarks show that most combinations involving `final` fields improve, and that scalar-replaced objects still work (and probably have all their barriers eliminated).
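The attached benchmark is not reproduced here. As a rough, hypothetical illustration of what the `escaping_*` and `non_escaping_*` names suggest, here is a sketch for two of the nine field combinations, plus an allocation-merge shape of the kind the AllocationMergesTests cases in the description exercise. All class and method names are assumptions, and plain static methods stand in for JMH `@Benchmark` methods.

```java
// Hypothetical shapes; the real ConstructorBarriers benchmark may differ.
class ConstructorShapes {
    static final class FinalFinal {
        final int a;
        final int b;
        FinalFinal(int a, int b) { this.a = a; this.b = b; }
    }

    static final class FinalVolatile {
        final int a;
        volatile int b; // this volatile store brings its own release barrier
        FinalVolatile(int a, int b) { this.a = a; this.b = b; }
    }

    // Escaping: the new object is returned and outlives the call, so the
    // constructor-exit barrier must stay.
    static FinalFinal escapingFinalFinal(int x) {
        return new FinalFinal(x, x + 1);
    }

    static FinalVolatile escapingFinalVolatile(int x) {
        return new FinalVolatile(x, x + 1);
    }

    // Non-escaping: only field values leave the method, so escape analysis
    // may scalar-replace the object and drop its barriers along with it.
    static int nonEscapingFinalFinal(int x) {
        FinalFinal o = new FinalFinal(x, x + 1);
        return o.a + o.b;
    }

    // Allocation merge: two allocations meet at a Phi; reducing the merge
    // may let both allocations remain candidates for scalar replacement.
    static int mergedFinalFinal(boolean cond, int x) {
        FinalFinal o = cond ? new FinalFinal(x, 0) : new FinalFinal(0, x);
        return o.a;
    }
}
```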

On my Graviton 3 instance:


Benchmark                                          Mode  Cnt   Score   Error  Units

# Before
ConstructorBarriers.escaping_finalFinal            avgt    9   9.097 ± 0.032  ns/op
ConstructorBarriers.escaping_finalPlain            avgt    9   9.120 ± 0.101  ns/op
ConstructorBarriers.escaping_finalVolatile         avgt    9  11.590 ± 0.088  ns/op
ConstructorBarriers.escaping_plainFinal            avgt    9   9.113 ± 0.037  ns/op
ConstructorBarriers.escaping_plainPlain            avgt    9   7.627 ± 0.155  ns/op
ConstructorBarriers.escaping_plainVolatile         avgt    9  13.055 ± 0.180  ns/op
ConstructorBarriers.escaping_volatileFinal         avgt    9  10.650 ± 0.112  ns/op
ConstructorBarriers.escaping_volatilePlain         avgt    9  13.074 ± 0.156  ns/op
ConstructorBarriers.escaping_volatileVolatile      avgt    9  13.546 ± 0.100  ns/op

ConstructorBarriers.non_escaping_finalFinal        avgt    9   2.220 ± 0.006  ns/op
ConstructorBarriers.non_escaping_finalPlain        avgt    9   2.214 ± 0.014  ns/op
ConstructorBarriers.non_escaping_finalVolatile     avgt    9   2.232 ± 0.035  ns/op
ConstructorBarriers.non_escaping_plainFinal        avgt    9   2.222 ± 0.004  ns/op
ConstructorBarriers.non_escaping_plainPlain        avgt    9   2.234 ± 0.036  ns/op
ConstructorBarriers.non_escaping_plainVolatile     avgt    9   2.230 ± 0.019  ns/op
ConstructorBarriers.non_escaping_volatileFinal     avgt    9   2.232 ± 0.018  ns/op
ConstructorBarriers.non_escaping_volatilePlain     avgt    9   2.220 ± 0.033  ns/op
ConstructorBarriers.non_escaping_volatileVolatile  avgt    9   2.232 ± 0.019  ns/op

# After
ConstructorBarriers.escaping_finalFinal            avgt    9   5.939 ± 0.035  ns/op  ; improves
ConstructorBarriers.escaping_finalPlain            avgt    9   5.945 ± 0.033  ns/op  ; improves
ConstructorBarriers.escaping_finalVolatile         avgt    9  10.997 ± 0.050  ns/op  ; improves
ConstructorBarriers.escaping_plainFinal            avgt    9   5.923 ± 0.061  ns/op  ; improves
ConstructorBarriers.escaping_plainPlain            avgt    9   7.687 ± 0.101  ns/op
ConstructorBarriers.escaping_plainVolatile         avgt    9  13.039 ± 0.206  ns/op
ConstructorBarriers.escaping_volatileFinal         avgt    9  10.568 ± 0.104  ns/op
ConstructorBarriers.escaping_volatilePlain         avgt    9  13.061 ± 0.158  ns/op
ConstructorBarriers.escaping_volatileVolatile      avgt    9  13.572 ± 0.174  ns/op

ConstructorBarriers.non_escaping_finalFinal        avgt    9   2.212 ± 0.019  ns/op
ConstructorBarriers.non_escaping_finalPlain        avgt    9   2.231 ± 0.041  ns/op
ConstructorBarriers.non_escaping_finalVolatile     avgt    9   2.239 ± 0.045  ns/op
ConstructorBarriers.non_escaping_plainFinal        avgt    9   2.224 ± 0.018  ns/op
ConstructorBarriers.non_escaping_plainPlain        avgt    9   2.214 ± 0.024  ns/op
ConstructorBarriers.non_escaping_plainVolatile     avgt    9   2.226 ± 0.029  ns/op
ConstructorBarriers.non_escaping_volatileFinal     avgt    9   2.239 ± 0.029  ns/op
ConstructorBarriers.non_escaping_volatilePlain     avgt    9   2.230 ± 0.039  ns/op
ConstructorBarriers.non_escaping_volatileVolatile  avgt    9   2.235 ± 0.030  ns/op
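To put the rows marked "improves" in relative terms, a small helper computing (after - before) / before from the Score columns above. Only numbers from this table are used; the class name is hypothetical.

```java
// Relative change of the four rows marked "improves" in the table above.
class BarrierDeltas {
    // Percentage change from before to after; negative means faster.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {
            "escaping_finalFinal", "escaping_finalPlain",
            "escaping_plainFinal", "escaping_finalVolatile"
        };
        double[] before = {9.097, 9.120, 9.113, 11.590}; // ns/op, "# Before"
        double[] after  = {5.939, 5.945, 5.923, 10.997}; // ns/op, "# After"
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-24s %6.1f%%%n", names[i], percentChange(before[i], after[i]));
        }
    }
}
```

This works out to roughly a 35% reduction in ns/op for the three combinations with a `final` and no `volatile` field, and about 5% for `escaping_finalVolatile`; the reported errors (about ±0.03 to ±0.1 ns/op) are small relative to these differences.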

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2023118942


More information about the hotspot-compiler-dev mailing list