RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit
Joshua Cao
duke at openjdk.org
Tue Apr 2 06:40:59 UTC 2024
On Wed, 27 Mar 2024 15:52:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on Arm machines, as shown by the benchmarks in this issue and in https://bugs.openjdk.org/browse/JDK-8324186.
>>
>> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR.
>>
>> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) assumes that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This differs from the barriers at the end of the constructor, where the barrier takes an `Allocate` directly, without an intervening `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled.
>>
>> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>>
>> Passes hotspot tier1 locally on a Linux machine.
>>
>> ### Benchmarks
>>
>> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>>
>> Baseline:
>>
>> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>> N = 25
>> mean = 3309.611 ±(99.9%) 86.699 ms/op
>>
>> Histogram, ms/op:
>> [3000.000, 3050.000) = 0
>> [3050.000, 3100.000) = 4
>> [3100.000, 3150.000) = 1
>> [3150.000, 3200.000) = 0
>> [3200.000, 3250.000) = 0
>> [3250.000, 3300.000) = 0
>> [3300.000, 3350.000) = 9
>> [3350.000, 3400.000) = 6
>> [3400.000, 3450.000) = 5
>>
>> Percentiles, ms/op:
>> p(0...
>
> I propose we also add this benchmark that verifies barrier costs and coalescing: [ConstructorBarriers.txt](https://github.com/openjdk/jdk/files/14775850/ConstructorBarriers.txt). Maybe these should also be IR tests. The benchmarks show that most combinations with `final`-s improve, and scalar replaced objects also still work (and probably eliminate all the barriers).
>
> On my Graviton 3 instance:
>
>
> Benchmark Mode Cnt Score Error Units
>
> # Before
> ConstructorBarriers.escaping_finalFinal avgt 9 9.097 ± 0.032 ns/op
> ConstructorBarriers.escaping_finalPlain avgt 9 9.120 ± 0.101 ns/op
> ConstructorBarriers.escaping_finalVolatile avgt 9 11.590 ± 0.088 ns/op
> ConstructorBarriers.escaping_plainFinal avgt 9 9.113 ± 0.037 ns/op
> ConstructorBarriers.escaping_plainPlain avgt 9 7.627 ± 0.155 ns/op
> ConstructorBarriers.escaping_plainVolatile avgt 9 13.055 ± 0.180 ns/op
> ConstructorBarriers.escaping_volatileFinal avgt 9 10.650 ± 0.112 ns/op
> ConstructorBarriers.escaping_volatilePlain avgt 9 13.074 ± 0.156 ns/op
> ConstructorBarriers.escaping_volatileVolatile avgt 9 13.546 ± 0.100 ns/op
>
> ConstructorBarriers.non_escaping_finalFinal avgt 9 2.220 ± 0.006 ns/op
> ConstructorBarriers.non_escaping_finalPlain avgt 9 2.214 ± 0.014 ns/op
> ConstructorBarriers.non_escaping_finalVolatile avgt 9 2.232 ± 0.035 ns/op
> ConstructorBarriers.non_escaping_plainFinal avgt 9 2.222 ± 0.004 ns/op
> ConstructorBarriers.non_escaping_plainPlain avgt 9 2.234 ± 0.036 ns/op
> ConstructorBarriers.non_escaping_plainVolatile avgt 9 2.230 ± 0.019 ns/op
> ConstructorBarriers.non_escaping_volatileFinal avgt 9 2.232 ± 0.018 ns/op
> ConstructorBarriers.non_escaping_volatilePlain avgt 9 2.220 ± 0.033 ns/op
> ConstructorBarriers.non_escaping_volatileVolatile avgt 9 2.232 ± 0.019 ns/op
>
> # After
> ConstructorBarriers.escaping_finalFinal avgt 9 5.939 ± 0.035 ns/op ; improves
> ConstructorBarriers.escaping_finalPlain avgt 9 5.945 ± 0.033 ns/op ; improves
> ConstructorBarriers.escaping_finalVolatile avgt 9 10.997 ± 0.050 ns/op ; improves
> ConstructorBarriers.escaping_plainFinal avgt 9 5.923 ± 0.061 ns/op ; improves
> ConstructorBarriers.escaping_plainPlain avgt 9 7.687 ± 0.101 ns/op
> ConstructorB...
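The benchmark names quoted above appear to combine two field kinds (`final`, plain, `volatile`) with an escaping or non-escaping allocation. The attached ConstructorBarriers.txt is not reproduced in this thread, so the following is only a guess at the shapes involved: the class, field, and method names are hypothetical and this is not the actual benchmark source.

```java
// Hypothetical shapes; names are illustrative and NOT taken from ConstructorBarriers.txt.
public class ConstructorBarrierShapes {

    // One final and one plain field: the constructor writes a final field,
    // so a barrier is placed at constructor exit (the case this PR changes).
    static final class FinalPlain {
        final int f1;
        int f2;

        FinalPlain(int a, int b) {
            f1 = a;
            f2 = b;
        }
    }

    // Two plain fields: no final-field write in the constructor.
    static final class PlainPlain {
        int f1;
        int f2;

        PlainPlain(int a, int b) {
            f1 = a;
            f2 = b;
        }
    }

    // "Escaping": the object is returned to the caller, so the allocation
    // and its stores have to be materialized.
    static FinalPlain escaping(int a, int b) {
        return new FinalPlain(a, b);
    }

    // "Non-escaping": only the field values leave the method, so the
    // compiler may be able to scalar-replace the allocation.
    static int nonEscaping(int a, int b) {
        FinalPlain o = new FinalPlain(a, b);
        return o.f1 + o.f2;
    }

    public static void main(String[] args) {
        System.out.println(escaping(1, 2).f1 + " " + nonEscaping(1, 2));
    }
}
```

A real measurement would wrap methods like these in a JMH harness; the sketch only shows which source-level patterns the benchmark names are assumed to describe.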
Benchmark results on my Graviton instances show improvements similar to @shipilev's.
Before:
Benchmark Mode Cnt Score Error Units
ConstructorBarriers.escaping_finalFinal avgt 3 9.229 ± 1.101 ns/op
ConstructorBarriers.escaping_finalPlain avgt 3 9.150 ± 0.191 ns/op
ConstructorBarriers.escaping_finalVolatile avgt 3 11.542 ± 1.259 ns/op
ConstructorBarriers.escaping_plainFinal avgt 3 9.132 ± 0.261 ns/op
ConstructorBarriers.escaping_plainPlain avgt 3 7.610 ± 0.575 ns/op
ConstructorBarriers.escaping_plainVolatile avgt 3 13.024 ± 0.460 ns/op
ConstructorBarriers.escaping_volatileFinal avgt 3 10.697 ± 1.567 ns/op
ConstructorBarriers.escaping_volatilePlain avgt 3 13.156 ± 0.593 ns/op
ConstructorBarriers.escaping_volatileVolatile avgt 3 13.707 ± 0.742 ns/op
ConstructorBarriers.non_escaping_finalFinal avgt 3 2.218 ± 0.299 ns/op
ConstructorBarriers.non_escaping_finalPlain avgt 3 2.243 ± 0.124 ns/op
ConstructorBarriers.non_escaping_finalVolatile avgt 3 2.227 ± 0.032 ns/op
ConstructorBarriers.non_escaping_plainFinal avgt 3 2.226 ± 0.208 ns/op
ConstructorBarriers.non_escaping_plainPlain avgt 3 2.229 ± 0.112 ns/op
ConstructorBarriers.non_escaping_plainVolatile avgt 3 2.239 ± 0.400 ns/op
ConstructorBarriers.non_escaping_volatileFinal avgt 3 2.255 ± 0.259 ns/op
ConstructorBarriers.non_escaping_volatilePlain avgt 3 2.206 ± 0.098 ns/op
ConstructorBarriers.non_escaping_volatileVolatile avgt 3 2.203 ± 0.099 ns/op
After:
Benchmark Mode Cnt Score Error Units
ConstructorBarriers.escaping_finalFinal avgt 3 5.919 ± 0.787 ns/op
ConstructorBarriers.escaping_finalPlain avgt 3 5.949 ± 0.117 ns/op
ConstructorBarriers.escaping_finalVolatile avgt 3 10.947 ± 1.353 ns/op
ConstructorBarriers.escaping_plainFinal avgt 3 5.897 ± 0.039 ns/op
ConstructorBarriers.escaping_plainPlain avgt 3 7.737 ± 3.529 ns/op
ConstructorBarriers.escaping_plainVolatile avgt 3 13.182 ± 0.289 ns/op
ConstructorBarriers.escaping_volatileFinal avgt 3 10.951 ± 0.535 ns/op
ConstructorBarriers.escaping_volatilePlain avgt 3 13.086 ± 0.258 ns/op
ConstructorBarriers.escaping_volatileVolatile avgt 3 13.765 ± 2.114 ns/op
ConstructorBarriers.non_escaping_finalFinal avgt 3 2.234 ± 0.064 ns/op
ConstructorBarriers.non_escaping_finalPlain avgt 3 2.226 ± 0.298 ns/op
ConstructorBarriers.non_escaping_finalVolatile avgt 3 2.212 ± 0.085 ns/op
ConstructorBarriers.non_escaping_plainFinal avgt 3 2.214 ± 0.033 ns/op
ConstructorBarriers.non_escaping_plainPlain avgt 3 2.226 ± 0.114 ns/op
ConstructorBarriers.non_escaping_plainVolatile avgt 3 2.220 ± 0.042 ns/op
ConstructorBarriers.non_escaping_volatileFinal avgt 3 2.244 ± 0.146 ns/op
ConstructorBarriers.non_escaping_volatilePlain avgt 3 2.235 ± 0.083 ns/op
ConstructorBarriers.non_escaping_volatileVolatile avgt 3 2.230 ± 0.056 ns/op
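
To put the two tables in relative terms, here is a small sketch that computes the per-benchmark reduction from a few of the Before and After scores listed above. The class and method names are hypothetical, and with Cnt = 3 the error bars are wide, so the percentages should be read as approximate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BarrierScoreDelta {

    // Percentage by which the score (ns/op, lower is better) dropped.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // {before, after} scores in ns/op, copied from the tables above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("escaping_finalFinal", new double[] {9.229, 5.919});
        scores.put("escaping_finalPlain", new double[] {9.150, 5.949});
        scores.put("escaping_plainFinal", new double[] {9.132, 5.897});
        scores.put("escaping_plainPlain", new double[] {7.610, 7.737});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-22s %6.1f%%%n", e.getKey(), reductionPercent(s[0], s[1]));
        }
    }
}
```

The three escaping cases with a `final` field come out roughly 35% lower, while `escaping_plainPlain` stays within its reported error.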
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2031186738