RFR: 8342826: Improve performance of oopDesc::klass() after JDK-8305895 [v5]

Fri Nov 15 12:20:44 UTC 2024

On Fri, 15 Nov 2024 11:03:48 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Before integration of JEP 450, a number of minor regressions have been identified. The root cause of those is the additional flag check for UseCompactObjectHeaders in a couple of hot code paths. This change addresses those cases by introducing a new helper class ObjLayout, which initializes some global state depending on the flags and uses that state later, instead of loading and checking multiple flags in hot paths.
>> 
>> This solution is not great. The real fix will eventually be to get rid of UseCompressedClassPointers in a first step, later also get rid of UseCompactObjectHeaders, and settle on a single object layout. But we are not there yet, and it will take several (or many) releases to get there. In the meantime, the proposed change eliminates the remaining known regressions.
>> 
>> Relevant benchmarks:
>> 
>> DaCapo:pmd (less is better)
>> pre-jep450:  703.67ms
>> mainline:  729.38ms
>> alwaysinline: 730.45ms
>> jdk8342826:  704.25ms
>> 
>> CryptoRsa (more is better)
>> pre-jep450: 9315.719 ops/min
>> mainline: 10109.509 ops/min
>> alwaysinline: 10232.120 ops/min
>> jdk8342826: 10272.161 ops/min
>> 
>> 
>> Throw.throwWith64Frames microbenchmark
>> Before JEP 450 (605396280d5ea225828da4ed688068334a15e122)
>> Throw.throwWith64Frames  avgt   40  3943.690 ± 15.456  ns/op
>> Mainline
>> Throw.throwWith64Frames  avgt   40  4083.029 ± 12.044  ns/op
>> JDK-8342826
>> Throw.throwWith64Frames  avgt   40  3973.082 ± 12.956  ns/op
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert "Reorder 'Compact' vs 'Compressed'"
>   
>   This reverts commit 94275040e36982a36f9463e98189c0184fd5bbc8.

FWIW, I ran some experiments with CryptoRSA using:

java  -server -Xmx4g -Xms4g -XX:+AlwaysPreTouch -Xlog:gc -jar ./SPECjvm2008.jar --showversion crypto.rsa -it 60 -wt 30 -ikv


mainline:
8820.19 ops/m

PR (94275040e36 Reorder 'Compact' vs 'Compressed'):
9679.77 ops/m

PR but skip 94275040e36:
9921.99 ops/m

mainline inline klass():
9594.34 ops/m

mainline inline klass() + restructure klass()
10032.58 ops/m

The last experiment used the following odd-looking restructuring of oopDesc::klass():

@@ -95,12 +95,14 @@ void oopDesc::init_mark() {
 }

 Klass* oopDesc::klass() const {
-  if (UseCompactObjectHeaders) {
-    return mark().klass();
-  } else if (UseCompressedClassPointers) {
-     return CompressedKlassPointers::decode_not_null(_metadata._compressed_klass);
+  if (!UseCompactObjectHeaders) {
+    if (!UseCompressedClassPointers) {
+      return _metadata._klass;
+    } else {
+      return CompressedKlassPointers::decode_not_null(_metadata._compressed_klass);
+    }
   } else {
-    return _metadata._klass;
+    return mark().klass();
   }
 }

I wanted to test whether changing the branch order here would produce performance differences similar to those you saw when you changed the order in the PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22020#issuecomment-2478691445