RFR: 8262896: [macos_aarch64] Crash in jni_fast_GetLongField

Tue Apr 13 10:55:00 UTC 2021

On Fri, 9 Apr 2021 18:25:10 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

> Hi, please review a fix for a random crash on macos/aarch64.
> 
> By default, the GetXXXField JNI interface implementation is a generated function (-XX:+UseFastJNIAccessors). Usually the function is called by JNI code running in WXExec mode and everything is fine. But sometimes it is called in a WXWrite context, as in the stack trace attached to the bug:
> 
> 
> v  ~BufferBlob::jni_fast_GetLongField
> V  [libjvm.dylib+0x7a6538]  Perf_Detach+0x168
> j  jdk.internal.perf.Perf.detach(Ljava/nio/ByteBuffer;)V+0 java.base@17-internal
> j  jdk.internal.perf.Perf$CleanerAction.run()V+8 java.base@17-internal
> j  jdk.internal.ref.CleanerImpl$PhantomCleanableRef.performCleanup()V+4 java.base@17-internal
> j  jdk.internal.ref.PhantomCleanable.clean()V+12 java.base@17-internal
> j  jdk.internal.ref.CleanerImpl.run()V+57 java.base@17-internal
> j  java.lang.Thread.run()V+11 java.base@17-internal
> j  jdk.internal.misc.InnocuousThread.run()V+20 java.base@17-internal
> v  ~StubRoutines::call_stub
> 
> 
> One way to fix the bug is to ensure WXExec mode before calling GetXXXField, but that depends on finding and fixing every such call site.
> 
> This patch instead makes the GetXXXField implementation itself ensure the correct W^X mode, regardless of whether it is called from WXWrite or WXExec mode.
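
A simplified, single-threaded model of that approach might look like the following. It is not the actual patch: `WXModeGuard`, `generated_get_long_field` and `fast_get_long_field` are hypothetical names, and a thrown exception stands in for the fault you get when generated code runs in WXWrite mode.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Simplified model of per-thread W^X state. WXWrite/WXExec follow the
// terminology of this thread; everything else is hypothetical, not HotSpot code.
enum class WXMode { WXWrite, WXExec };

thread_local WXMode current_wx = WXMode::WXExec;

// Stand-in for the generated accessor: running code-cache code while the
// thread is in WXWrite mode would fault, modeled here as an exception.
std::int64_t generated_get_long_field(const std::int64_t* field) {
  if (current_wx != WXMode::WXExec) {
    throw std::runtime_error("executed generated code in WXWrite mode");
  }
  return *field;
}

// RAII guard: switch to the requested mode, restore the caller's mode on exit
// (including when an exception propagates).
class WXModeGuard {
 public:
  explicit WXModeGuard(WXMode mode) : saved_(current_wx) { current_wx = mode; }
  ~WXModeGuard() { current_wx = saved_; }
  WXModeGuard(const WXModeGuard&) = delete;
  WXModeGuard& operator=(const WXModeGuard&) = delete;

 private:
  WXMode saved_;
};

// The accessor itself ensures WXExec, so it works whichever mode the caller is in.
std::int64_t fast_get_long_field(const std::int64_t* field) {
  WXModeGuard guard(WXMode::WXExec);
  return generated_get_long_field(field);
}
```

The guard restores the caller's mode rather than forcing one on exit, so the same wrapper is safe to call from VM code (WXWrite) and from JNI code (WXExec).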

Hi David,

> > v  ~BufferBlob::jni_fast_GetLongField
> > V  [libjvm.dylib+0x7a6538]  Perf_Detach+0x168
> 
> I'm struggling to see how we get to the jni_fast_GetLongField from
> Perf_Detach ??

Perf_Detach calls, e.g., GetDirectBufferAddress https://github.com/openjdk/jdk/blob/1935655622226421797ea9109bebd4a00fe60402/src/hotspot/share/prims/perf.cpp#L114, which in turn calls GetLongField https://github.com/openjdk/jdk/blob/1935655622226421797ea9109bebd4a00fe60402/src/hotspot/share/prims/jni.cpp#L3058
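
To make the path concrete, here is a toy model of it, assuming that with -XX:+UseFastJNIAccessors the GetLongField slot of the JNI function table points at the generated accessor. `FakeJNIEnv`, `perf_detach` and the other names are hypothetical stand-ins, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical model of the call path in the stack trace: VM code running in
// WXWrite mode reaches the generated accessor through a function table.
enum class WXMode { WXWrite, WXExec };
thread_local WXMode current_wx = WXMode::WXExec;

struct FakeByteBuffer { std::int64_t address; };

struct FakeJNIEnv {
  // Plays the role of the GetLongField entry in the JNI function table.
  std::int64_t (*get_long_field)(const FakeByteBuffer*);
};

// Unguarded generated accessor: faults (modeled as an exception) unless in WXExec.
std::int64_t generated_get_long_field(const FakeByteBuffer* buf) {
  if (current_wx != WXMode::WXExec) {
    throw std::runtime_error("generated code run in WXWrite mode");
  }
  return buf->address;
}

// Like GetDirectBufferAddress: reads the buffer's address field via the table.
void* get_direct_buffer_address(FakeJNIEnv* env, const FakeByteBuffer* buf) {
  return reinterpret_cast<void*>(static_cast<std::intptr_t>(env->get_long_field(buf)));
}

// Like Perf_Detach: VM code, so the thread is already in WXWrite mode.
void* perf_detach(FakeJNIEnv* env, const FakeByteBuffer* buf) {
  current_wx = WXMode::WXWrite;  // entering the VM
  void* addr = get_direct_buffer_address(env, buf);
  current_wx = WXMode::WXExec;   // leaving the VM
  return addr;
}
```

The VM code never names jni_fast_GetLongField directly; it only goes through the function table, which is why such call sites are easy to miss.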

> Once again I'm finding it very hard to understand what the actual rules
> are for these W^X mode changes and what code requires what.

Sorry about that. There should probably be a document or a comment in the code, but I didn't find the right place for it. Your understanding below is correct, though.

> IIUC the normal(?) mode is WXExec and we use that when executing JITed or other native code. 

Right, since most of the time we execute generated code.

> But if we enter the VM we need to switch to WXWrite mode.

It's not strictly necessary; it's a design choice. It avoids switching back and forth between W^X modes while executing VM code, and it also avoids many more subtle issues similar to this one.
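
A toy count of mode flips illustrates the first point, assuming a VM operation that writes the code cache n times. `WXState` and the two strategy functions are hypothetical; real code flips the mode through the platform, not through a counter.

```cpp
#include <cassert>

// Hypothetical model: count how often the per-thread W^X mode actually flips.
enum class WXMode { WXWrite, WXExec };

struct WXState {
  WXMode mode = WXMode::WXExec;  // threads normally run generated code
  int switches = 0;
  void set(WXMode m) {
    if (m != mode) { mode = m; ++switches; }
  }
};

// Stand-in for a VM operation that writes the code cache n times.
void write_code_cache(WXState& wx, int n, bool switch_per_write) {
  for (int i = 0; i < n; ++i) {
    if (switch_per_write) wx.set(WXMode::WXWrite);
    assert(wx.mode == WXMode::WXWrite);  // every write needs WXWrite
    if (switch_per_write) wx.set(WXMode::WXExec);
  }
}

// Switch once on entering the VM and once on returning to generated code.
int switches_with_vm_entry_transition(int writes) {
  WXState wx;
  wx.set(WXMode::WXWrite);              // VM entry
  write_code_cache(wx, writes, false);
  wx.set(WXMode::WXExec);               // back to Java/native code
  return wx.switches;
}

// Alternative: stay in WXExec and flip around each individual write.
int switches_with_per_write_toggle(int writes) {
  WXState wx;
  write_code_cache(wx, writes, true);
  return wx.switches;
}
```

With the transition at VM entry the cost is two flips regardless of n, and individual write sites don't have to remember to switch, which is where bugs like this one would otherwise multiply.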

> for some reason these generated fast-accessors need to be run in WXExec
> mode. 

It's simply impossible to execute generated code without switching to WXExec: on macOS/AArch64 a thread sees JIT pages as either writable or executable, never both at once.

> So IIUC the transitions that are already in place for executing
> "VM code" are going to be wrong any time that VM code ends up executing
> something like a fast-accessor - is that the case? 

Partially, yes. It's not entirely wrong, since VM code may still need to write to the code cache, e.g. during deoptimization, which can happen at any time.

> And what other code
> is in the same category as these generated fast-accessors?

There is a rather small number of such cases (I found them by grepping for WXExec):
* SafeFetch32/N 
* StubRoutines::UnsafeArrayCopy_stub 
* ProgrammableInvoker::invoke_native
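
For these call sites the pattern is the same as for the fast accessors: switch to WXExec around the call and restore the caller's mode afterwards. A generic helper for that might look like this; `call_in_exec_mode` and `fake_stub_load` are hypothetical, and the HotSpot functions listed above are not modeled.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch, not HotSpot code.
enum class WXMode { WXWrite, WXExec };
thread_local WXMode current_wx = WXMode::WXExec;

// Run any call into generated code in WXExec, then restore the caller's mode.
// The restore happens in a destructor, so it also runs if the call throws.
template <typename F, typename... Args>
decltype(auto) call_in_exec_mode(F&& f, Args&&... args) {
  struct Restore {
    WXMode saved;
    ~Restore() { current_wx = saved; }
  } restore{current_wx};
  current_wx = WXMode::WXExec;
  return std::forward<F>(f)(std::forward<Args>(args)...);
}

// Hypothetical stand-in for a generated stub: records the mode it ran in.
WXMode observed_mode = WXMode::WXWrite;
int fake_stub_load(const int* addr) {
  observed_mode = current_wx;
  return *addr;
}
```

Usage from VM code would be e.g. `call_in_exec_mode(fake_stub_load, &v)` while the thread is in WXWrite; the stub observes WXExec and the caller is back in WXWrite afterwards.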

I hope this clarifies the details.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3422