RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224

Erik Österlund erik.osterlund at oracle.com
Mon Apr 30 13:32:03 UTC 2018


Hi,

It looks like I broke jniFastGetField on SPARC with my changes to 
modularize this feature (8200235). Sorry about that. And the backout of 
that (8202386) didn't go in due to maintenance. So here is a patch to 
fix the problem instead.

Webrev:
http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00/

Bug ID:
https://bugs.openjdk.java.net/browse/JDK-8202381

My (long version) analysis on the architectures (that I made changes to) 
of what is going on:

== SPARC ==

This is where the SIGBUS happened. We used to get the jobject/jweak in 
O1, clear the jweak tag in O1, and resolve the handle by loading from O1 
to O5. This was in fact already broken, since 8176100 "[REDO][REDO] G1 
Needs pre barrier on dereference of weak JNI handles". The registers 
used to pass the arguments into the C function (O0, O1 and O2) must not 
be touched throughout the call, despite being caller saved, as we have 
not saved the register window. By clearing the jweak in O1, the wrong 
access functions are used in the slow path if the speculative load is 
invalidated, either due to the second safepoint counter not being able 
to the first sampled safepoint counter, or the load of the int popping 
in to the signal handler. I made this problem worse in my 8200235 patch 
by not just clearing O1, but also resolving into O1, which clobbered the 
register even further. This caused intermittent failures in the 
following sequence of events: 1) the safepoint counter is read, and we 
are not in a safepoint, 2) the jobject/jweak is resolved, clobbering O1, 
3) a safepoint is triggered, 4) after resolving the jobject/jweak, the 
safepoint counter is sampled again with a new number, causing a call to 
the slow path where O1 no longer contains the jobject/jweak.

In my proposed changes, this was remedied by moving O1 to O5 first, then 
clearing the jweak value (in O5), then resolving (in O5) and then 
performing the speculative load with all passed C arguments untouched in 
their O1, O2 and O3 registers, as required for the fallback code.

== x86_64 ==

On x86_64, the passed in jobject/jweak is already moved to a separate 
scratch register. Therefore, the problem of clearing the jweak tag in 
the jobject does not apply here. However, rscratch1 was the register I 
picked as a temporary register for resolving. After reading this code 
more carefully, I realize this is the r10 register, which coincides with 
the register used to remember the old counter. So using that temporary 
register is not safe. I chose a new register instead: r8. This is, like 
the other registers used in this code, a caller saved register, that 
does not intersect with the input registers of the function arguments. 
That makes it safe to use in this context. I added DEBUG_ONLY code for 
clobbering this temporary register on x86_64, to make sure we more 
easily catch such problems.

== AArch64 ==

On AArch64, the jobject/jweak is already in r3 when cleared (which does 
not intersect with the registers used in the C calling convention, and 
is caller saved). Therefore, the clearing of the jweak tag there is 
correct, and so is the resolution. The temporary register used is r8, 
which is used as a temporary register by surrounding code in this 
context (and is caller saved).

Apart from this, I also added a jni_env parameter to 
try_resolve_jobject_in_native, which just passes c_rarg0 containing the 
jni environment, without doing anything to it. We need to read from this 
register in ZGC, and I would prefer not to hardcode in the backend that 
c_rarg0 always refers to the jni environment in this context.

There are still remaining problems in ARM that the jweak tag is cleared 
in r1, which will cause the slow path to resolve jweaks as jobjects. 
However, I would like to not address that issue in this fix, as my 
original changes did not touch that ARM code. I would like to instead 
file a bug for that to be solved by another brave soul who wants to 
change the jniFastGetField code in the ARM port.

Testing:
I ran the httpclient tests (that failed because of this) with 
--test-repeat 100, and ran that 3 times. No problems discovered. I am 
also finishing up hs-tier1-3 and jdk-tier1-3.

Thanks,
/Erik


More information about the hotspot-dev mailing list