RFR: 8262896: [macos_aarch64] Crash in jni_fast_GetLongField

Tue Apr 13 10:22:57 UTC 2021

On Sat, 10 Apr 2021 10:24:42 GMT, Andrew Haley <aph at openjdk.org> wrote:

> How much does this gain over turning off UseFastJNIAccessors?

The benchmark is https://github.com/AntonKozlov/macos-aarch64-transition-bench/commit/6de6c410e7884c290ec3e85c0d7fa9339e254192.

For loopCnt=1:

Before, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  201397788.701 ± 907059.494  ops/s

Before, -UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  20435101.708 ± 233303.518  ops/s

After, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  151830846.914 ± 654947.292  ops/s

After, -UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  20278690.117 ± 287142.720  ops/s

After with ThreadWXEnable commented out, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  165277289.798 ± 939646.095  ops/s

After with ThreadWXEnable and JavaThread::thread_from_jni_environment commented out, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  187846151.217 ± 1014919.707  ops/s

In summary, before the patch the fast accessor is ~10x faster than the slow path (-UseFastJNIAccessors); with the patch it is still ~7.5x faster.
The code in this patch is not optimal: it introduces a ~25% performance penalty, of which only ~7% is W^X overhead. The indirection costs ~7% and JavaThread::thread_from_jni_environment another ~11%. I was going to look at the performance side after fixing the annoying failure.
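
For anyone who wants to re-derive the percentages, here is a small sketch of the arithmetic. It assumes every cost is expressed relative to the pre-patch +UseFastJNIAccessors throughput, and that whatever is left after removing ThreadWXEnable and JavaThread::thread_from_jni_environment is the indirection. The constant and function names are made up for this illustration; only the numbers come from the results above.

```cpp
#include <cassert>

// Throughput (ops/s) copied from the JMH output above, loopCnt=1.
constexpr double kBeforeFast   = 201397788.701; // before, +UseFastJNIAccessors
constexpr double kBeforeSlow   = 20435101.708;  // before, -UseFastJNIAccessors
constexpr double kAfterFast    = 151830846.914; // after,  +UseFastJNIAccessors
constexpr double kAfterSlow    = 20278690.117;  // after,  -UseFastJNIAccessors
constexpr double kNoWX         = 165277289.798; // after, ThreadWXEnable commented out
constexpr double kNoWXNoThread = 187846151.217; // after, thread lookup also commented out

// Throughput lost going from `faster` to `slower`, as a percentage of `baseline`.
inline double drop_percent(double faster, double slower, double baseline) {
  assert(baseline > 0.0);
  return 100.0 * (faster - slower) / baseline;
}

// Overall penalty of the patch and its three assumed components.
inline double total_penalty()    { return drop_percent(kBeforeFast, kAfterFast, kBeforeFast); }
inline double wx_cost()          { return drop_percent(kNoWX, kAfterFast, kBeforeFast); }
inline double thread_cost()      { return drop_percent(kNoWXNoThread, kNoWX, kBeforeFast); }
inline double indirection_cost() { return drop_percent(kBeforeFast, kNoWXNoThread, kBeforeFast); }

// Gain of the fast accessor over -UseFastJNIAccessors.
inline double speedup_before() { return kBeforeFast / kBeforeSlow; }
inline double speedup_after()  { return kAfterFast / kAfterSlow; }
```

Under these assumptions the three components add up exactly to the total penalty, since each one is measured against the same baseline.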

> I guess it doesn't much matter because you keep track of the current mode and only flip it if you really need to, which is rare?

In most cases GetXXXField is called from WXExec, so the overhead should be only the check of the current thread's mode.

It's also sad that we pay this cost on every call, even though we rarely call these accessors from WXWrite.
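
To illustrate the "track the mode and flip only when needed" idea, here is a simplified sketch. It is not the HotSpot code: SketchThread, ScopedWXMode and sketch_get_long_field are hypothetical stand-ins, and a counter replaces the real OS-level toggle so that the number of flips can be observed.

```cpp
#include <cassert>

// Simplified stand-ins for the modes discussed above (not the HotSpot types).
enum class WXMode { Write, Exec };

struct SketchThread {
  WXMode mode = WXMode::Exec; // cached per-thread mode
  int os_flips = 0;           // counts simulated OS-level toggles

  // Returns the previous mode. The (simulated) expensive toggle happens
  // only when the requested mode differs from the cached one.
  WXMode enable_wx(WXMode new_mode) {
    WXMode old = mode;
    if (old != new_mode) {
      mode = new_mode;
      ++os_flips; // a real implementation would call into the OS here
    }
    return old;
  }
};

// RAII guard: enter a mode for the scope, restore the previous mode on exit.
class ScopedWXMode {
  SketchThread& thread_;
  WXMode old_;
public:
  ScopedWXMode(SketchThread& t, WXMode m) : thread_(t), old_(t.enable_wx(m)) {}
  ~ScopedWXMode() { thread_.enable_wx(old_); }
  ScopedWXMode(const ScopedWXMode&) = delete;
  ScopedWXMode& operator=(const ScopedWXMode&) = delete;
};

// Hypothetical accessor whose body must run in Exec mode.
inline long sketch_get_long_field(SketchThread& t, long value) {
  ScopedWXMode guard(t, WXMode::Exec);
  assert(t.mode == WXMode::Exec);
  return value;
}
```

In this sketch a call made while the thread is already in Exec mode pays only for the comparison, while a call made from Write mode pays for two flips (one in, one back out). That comparison is the constant cost mentioned above.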

-------------

PR: https://git.openjdk.java.net/jdk/pull/3422