RFR: Load coops base shift from AOTRuntimeConstants in AOT code [v5]
Andrew Dinn
adinn at openjdk.org
Thu Sep 26 13:27:50 UTC 2024
On Thu, 26 Sep 2024 10:52:02 GMT, Andrew Dinn <adinn at openjdk.org> wrote:
>> This PR modifies AOT compiled method code to load compressed oops base and shift constants via the AOTRuntimeConstants area rather than encode them as immediates. It also unlatches the currently forced setting of UseCompatibleCompressedOops, allowing the heap to be allocated wherever it will fit.
>
> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision:
>
> Use the force to wrangle register sets
I tested for performance effects by running the javac new workflow benchmark on my Linux/M2 (aarch64) release build. Each timing is an average over 10 runs, with runs for the different cases interleaved to amortize variation due to external factors. I ran the benchmark 3 times and found that the timings were not very consistent, which leaves little room to attribute any difference to the patch.
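The measurement scheme described above (a fixed number of runs per configuration, with the configurations alternating so that slow drifts in machine state hit each one roughly equally, then a per-configuration mean) can be sketched as follows. The `run_once` callback and the configuration labels are hypothetical placeholders; the actual benchmark harness is not shown in this post.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Runs every configuration once per round, for `rounds` rounds, so runs of
// different configurations alternate instead of being batched together.
// Returns the mean time (ms) per configuration.
std::map<std::string, double> interleaved_means(
    const std::vector<std::string>& configs, int rounds,
    const std::function<double(const std::string&)>& run_once) {
  assert(rounds > 0);
  std::map<std::string, double> totals;
  for (int r = 0; r < rounds; ++r) {
    for (const auto& cfg : configs) {
      totals[cfg] += run_once(cfg);  // one timed run of this configuration
    }
  }
  for (auto& entry : totals) {
    entry.second /= rounds;  // turn the accumulated total into a mean
  }
  return totals;
}
```

Interleaving does not remove noise; it only spreads it evenly across configurations, which is why the run-to-run spread still has to be compared against any difference in means.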
jvm1 = without coops patch
jvm2 = with coops patch
Run 1:
============================== jvm1 ============================
[1_xoff] Premain JDK (CDS disabled) 261.08 ms
[1_xon ] Premain JDK (CDS enabled) 144.01 ms
[1_td ] Premain Prototype (CDS + Training Data) 127.42 ms
[1_aot ] Premain Prototype (CDS + Training Data + AOT) 78.80 ms
============================== jvm2 ============================
[2_xoff] Premain JDK (CDS disabled) 256.45 ms
[2_xon ] Premain JDK (CDS enabled) 145.14 ms
[2_td ] Premain Prototype (CDS + Training Data) 125.20 ms
[2_aot ] Premain Prototype (CDS + Training Data + AOT) 80.53 ms
================================================================
Run 2:
============================== jvm1 ============================
[1_xoff] Premain JDK (CDS disabled) 261.16 ms
[1_xon ] Premain JDK (CDS enabled) 149.71 ms
[1_td ] Premain Prototype (CDS + Training Data) 134.25 ms
[1_aot ] Premain Prototype (CDS + Training Data + AOT) 84.16 ms
============================== jvm2 ============================
[2_xoff] Premain JDK (CDS disabled) 262.24 ms
[2_xon ] Premain JDK (CDS enabled) 160.62 ms
[2_td ] Premain Prototype (CDS + Training Data) 134.09 ms
[2_aot ] Premain Prototype (CDS + Training Data + AOT) 85.02 ms
================================================================
Run 3:
============================== jvm1 ============================
[1_xoff] Premain JDK (CDS disabled) 266.63 ms
[1_xon ] Premain JDK (CDS enabled) 152.51 ms
[1_td ] Premain Prototype (CDS + Training Data) 131.13 ms
[1_aot ] Premain Prototype (CDS + Training Data + AOT) 89.70 ms
============================== jvm2 ============================
[2_xoff] Premain JDK (CDS disabled) 257.33 ms
[2_xon ] Premain JDK (CDS enabled) 147.52 ms
[2_td ] Premain Prototype (CDS + Training Data) 128.13 ms
[2_aot ] Premain Prototype (CDS + Training Data + AOT) 86.08 ms
================================================================
This is running on bare metal with 64GB of RAM, so the heap is allocated at 0x400000000 in both jvm1 and jvm2, i.e. non-AOT code uses a zero base with shift 3. Only the 2_aot case can be affected by the patch, but there is no noticeable difference relative to 1_aot modulo the existing variation in results.
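The address arithmetic behind this paragraph can be checked directly. 0x400000000 is 16 GiB, and with a zero base and shift 3 a 32-bit narrow oop reaches 2^32 × 8 bytes = 32 GiB. The last check assumes a heap of about 16 GiB (the usual default of a quarter of physical RAM on this machine); that size is an assumption, not something stated in the post. `zero_based_limit` is an illustrative helper name.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// Exclusive upper address reachable by a 32-bit narrow oop when the base is
// zero and the given shift is applied.
constexpr std::uint64_t zero_based_limit(unsigned shift) {
  return (std::uint64_t{1} << 32) << shift;
}

static_assert(0x400000000ULL == 16 * GiB, "heap start reported in the post");
static_assert(zero_based_limit(0) == 4 * GiB, "unscaled reach (shift 0)");
static_assert(zero_based_limit(3) == 32 * GiB, "zero-based reach (shift 3)");
// Assumption: a ~16 GiB heap starting at 16 GiB ends at 32 GiB, which is
// still inside the zero-based range, so no base register is needed.
static_assert(0x400000000ULL + 16 * GiB <= zero_based_limit(3), "heap fits");
```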
I reran with -Xms24M, allowing jvm2 to allocate the heap in low memory, i.e. with zero for both base and shift (jvm1 is still forced to allocate at an address > 4GB and use a shift). If anything, this could only improve performance for all jvm2 cases relative to jvm1 (but less so for the 2_aot case).
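The coops modes mentioned here differ by where the heap ends. The following is a rough sketch of that selection, assuming 8-byte object alignment (shift 3) and a heap no larger than 32 GiB. It ignores details the real VM handles (the disjoint-base mode, the null page, the exact choice of base, reservation retries), and `CoopsMode`, `CoopsChoice` and `choose_mode` are illustrative names, not HotSpot's.

```cpp
#include <cassert>
#include <cstdint>

enum class CoopsMode { Unscaled, ZeroBased, HeapBased };

struct CoopsChoice {
  CoopsMode mode;
  std::uint64_t base;
  unsigned shift;
};

// Simplified: pick the cheapest encoding that can reach the end of the heap.
// heap_end is exclusive.
CoopsChoice choose_mode(std::uint64_t heap_start, std::uint64_t heap_end) {
  const std::uint64_t four_gib = std::uint64_t{1} << 32;
  const unsigned align_shift = 3;  // assumes 8-byte object alignment
  assert(heap_start < heap_end);
  assert(heap_end - heap_start <= (four_gib << align_shift));
  if (heap_end <= four_gib) {
    return {CoopsMode::Unscaled, 0, 0};  // address == narrow oop
  }
  if (heap_end <= (four_gib << align_shift)) {
    return {CoopsMode::ZeroBased, 0, align_shift};  // address = narrow << 3
  }
  // Non-zero base: address = base + (narrow << 3). The real VM's base
  // choice differs in detail; heap_start is used here for simplicity.
  return {CoopsMode::HeapBased, heap_start, align_shift};
}
```

Under this model the first experiment's heap (starting at 0x400000000) lands in the zero-based branch, while a heap small enough to sit below 4 GiB can take the unscaled branch, which is the extra freedom the patch gives jvm2.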
Run 1:
============================== jvm1 ============================
[1_xoff] Premain JDK (CDS disabled) 273.05 ms
[1_xon ] Premain JDK (CDS enabled) 148.72 ms
[1_td ] Premain Prototype (CDS + Training Data) 131.19 ms
[1_aot ] Premain Prototype (CDS + Training Data + AOT) 80.10 ms
============================== jvm2 ============================
[2_xoff] Premain JDK (CDS disabled) 264.14 ms
[2_xon ] Premain JDK (CDS enabled) 159.66 ms
[2_td ] Premain Prototype (CDS + Training Data) 124.52 ms
[2_aot ] Premain Prototype (CDS + Training Data + AOT) 75.03 ms
================================================================
Run 2:
============================== jvm1 ============================
[1_xoff] Premain JDK (CDS disabled) 267.48 ms
[1_xon ] Premain JDK (CDS enabled) 163.18 ms
[1_td ] Premain Prototype (CDS + Training Data) 136.67 ms
[1_aot ] Premain Prototype (CDS + Training Data + AOT) 86.41 ms
============================== jvm2 ============================
[2_xoff] Premain JDK (CDS disabled) 268.73 ms
[2_xon ] Premain JDK (CDS enabled) 157.70 ms
[2_td ] Premain Prototype (CDS + Training Data) 136.69 ms
[2_aot ] Premain Prototype (CDS + Training Data + AOT) 90.39 ms
================================================================
Run 3:
============================== jvm1 ============================
[1_xoff] Premain JDK (CDS disabled) 268.36 ms
[1_xon ] Premain JDK (CDS enabled) 161.01 ms
[1_td ] Premain Prototype (CDS + Training Data) 134.43 ms
[1_aot ] Premain Prototype (CDS + Training Data + AOT) 91.79 ms
============================== jvm2 ============================
[2_xoff] Premain JDK (CDS disabled) 260.27 ms
[2_xon ] Premain JDK (CDS enabled) 165.10 ms
[2_td ] Premain Prototype (CDS + Training Data) 138.31 ms
[2_aot ] Premain Prototype (CDS + Training Data + AOT) 88.61 ms
================================================================
Once again there is no noticeable difference modulo the existing variation.
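To put "no noticeable difference modulo the existing variation" into numbers, this sketch computes the mean and the min-to-max spread of the `_aot` timings reported in this post for each experiment. The values are copied from the runs above (each already an average of 10 runs); `Summary` and `summarize` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Summary {
  double mean;
  double spread;  // max - min across the reported runs
};

Summary summarize(const std::vector<double>& ms) {
  assert(!ms.empty());
  const double sum = std::accumulate(ms.begin(), ms.end(), 0.0);
  const auto [lo, hi] = std::minmax_element(ms.begin(), ms.end());
  return {sum / ms.size(), *hi - *lo};
}

// _aot timings (ms) from runs 1-3 of each experiment in this post.
const std::vector<double> default_1_aot{78.80, 84.16, 89.70};
const std::vector<double> default_2_aot{80.53, 85.02, 86.08};
const std::vector<double> small_heap_1_aot{80.10, 86.41, 91.79};
const std::vector<double> small_heap_2_aot{75.03, 90.39, 88.61};
```

The mean `_aot` times differ by roughly 0.3 ms (default heap) and 1.4 ms (-Xms24M) between jvm1 and jvm2, while the spread across runs ranges from about 5.5 ms to 15.4 ms. With only three runs per case this is not a formal test, but the difference in means is well inside the noise.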
The results are not very convincing given the varying timings. However, as a tentative conclusion we can say:
1) The patch incurs no measurable loss in performance when the JVM is forced to use the same coops mode.
2) The patch does not enable any measurable gain in performance by allowing the JVM to use a more efficient coops mode.
So, I think we can drop this patch and stick with enforcing a compatible compressed oops mode.
-------------
PR Comment: https://git.openjdk.org/leyden/pull/20#issuecomment-2376969818
More information about the leyden-dev mailing list