RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224
Erik Österlund
erik.osterlund at oracle.com
Mon Apr 30 15:53:20 UTC 2018
Hi Dan,
On 2018-04-30 17:36, Daniel D. Daugherty wrote:
> On 4/30/18 9:32 AM, Erik Österlund wrote:
>> Hi,
>>
>> It looks like I broke jniFastGetField on SPARC with my changes to
>> modularize this feature (8200235). Sorry about that. And the backout
>> of that (8202386) didn't go in due to maintenance. So here is a patch
>> to fix the problem instead.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00/
>
> src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.cpp
> src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.hpp
> src/hotspot/cpu/aarch64/jniFastGetField_aarch64.cpp
> Addition of the placeholder 'jni_env'; you went to the trouble
> of changing 'robj' -> 'obj' on the other platforms, but not here
> which is inconsistent.
Fixed. I changed it here too.
Full:
http://cr.openjdk.java.net/~eosterlund/8202381/webrev.01/
Incremental:
http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00_01/
> src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.cpp
> src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.hpp
> Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>
> src/hotspot/cpu/sparc/jniFastGetField_sparc.cpp
> Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>
I changed the name because no other register in these files have the r
prefix for the register variables (as they do in the jniFastGetField
files), and I simply could not keep my fingers from fixing that
inconsistency.
On SPARC, unlike x86_64, the temporary register (G3) gets immediately
clobbered after calling the try_resolve_jobject_in_native function.
Therefore, I did not bother clobbering it afterwards as it will already
be clobbered. On x86_64 on the other hand, the temporary register (r8)
is not clobbered by the rest of the code, so I decided to explicitly
clobber it so that any mistakes here become more easily reproducible.
> L74: // Both O5 and G3 are clobbered by
> try_resolve_jobject_in_native.
> You are passing 'G3' as 'tmp'. I don't see where/how in
> try_resolve_jobject_in_native() that the 'tmp' parameter gets
> clobbered.
>
It is used by ZGC only to materialize a mask for catching stale pointers.
> src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp
> src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp
> Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>
> src/hotspot/cpu/x86/jniFastGetField_x86_64.cpp
> L50: static const Register rtmp = r8;
> Nit: These seem to be sorted by register number and not by
> variable name.
>
Fixed the sorting.
> Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
As mentioned, the robj -> obj change was for consistency with the other
code in these barrier set files.
>
> Thumbs up on the technical part of this change. My comments are
> just about style and about the comments. You should make sure
> that you get a review from either Kim or David H. since they
> were on the original review team for JDK-8200235.
Thanks for the Review Dan!
Thanks,
/Erik
> Dan
>
>
>>
>> Bug ID:
>> https://bugs.openjdk.java.net/browse/JDK-8202381
>>
>> My (long version) analysis on the architectures (that I made changes
>> to) of what is going on:
>>
>> == SPARC ==
>>
>> This is where the SIGBUS happened. We used to get the jobject/jweak
>> in O1, clear the jweak tag in O1, and resolve the handle by loading
>> from O1 to O5. This was in fact already broken, since 8176100
>> "[REDO][REDO] G1 Needs pre barrier on dereference of weak JNI
>> handles". The registers used to pass the arguments into the C
>> function (O0, O1 and O2) must not be touched throughout the call,
>> despite being caller saved, as we have not saved the register window.
>> By clearing the jweak in O1, the wrong access functions are used in
>> the slow path if the speculative load is invalidated, either due to
>> the second safepoint counter not being able to the first sampled
>> safepoint counter, or the load of the int popping in to the signal
>> handler. I made this problem worse in my 8200235 patch by not just
>> clearing O1, but also resolving into O1, which clobbered the register
>> even further. This caused intermittent failures in the following
>> sequence of events: 1) the safepoint counter is read, and we are not
>> in a safepoint, 2) the jobject/jweak is resolved, clobbering O1, 3) a
>> safepoint is triggered, 4) after resolving the jobject/jweak, the
>> safepoint counter is sampled again with a new number, causing a call
>> to the slow path where O1 no longer contains the jobject/jweak.
>>
>> In my proposed changes, this was remedied by moving O1 to O5 first,
>> then clearing the jweak value (in O5), then resolving (in O5) and
>> then performing the speculative load with all passed C arguments
>> untouched in their O1, O2 and O3 registers, as required for the
>> fallback code.
>>
>> == x86_64 ==
>>
>> On x86_64, the passed in jobject/jweak is already moved to a separate
>> scratch register. Therefore, the problem of clearing the jweak tag in
>> the jobject does not apply here. However, rscratch1 was the register
>> I picked as a temporary register for resolving. After reading this
>> code more carefully, I realize this is the r10 register, which
>> coincides with the register used to remember the old counter. So
>> using that temporary register is not safe. I chose a new register
>> instead: r8. This is, like the other registers used in this code, a
>> caller saved register, that does not intersect with the input
>> registers of the function arguments. That makes it safe to use in
>> this context. I added DEBUG_ONLY code for clobbering this temporary
>> register on x86_64, to make sure we more easily catch such problems.
>>
>> == AArch64 ==
>>
>> On AArch64, the jobject/jweak is already in r3 when cleared (which
>> does not intersect with the registers used in the C calling
>> convention, and is caller saved). Therefore, the clearing of the
>> jweak tag there is correct, and so is the resolution. The temporary
>> register used is r8, which is used as a temporary register by
>> surrounding code in this context (and is caller saved).
>>
>> Apart from this, I also added a jni_env parameter to
>> try_resolve_jobject_in_native, which just passes c_rarg0 containing
>> the jni environment, without doing anything to it. We need to read
>> from this register in ZGC, and I would prefer not to hardcode in the
>> backend that c_rarg0 always refers to the jni environment in this
>> context.
>>
>> There are still remaining problems in ARM that the jweak tag is
>> cleared in r1, which will cause the slow path to resolve jweaks as
>> jobjects. However, I would like to not address that issue in this
>> fix, as my original changes did not touch that ARM code. I would like
>> to instead file a bug for that to be solved by another brave soul who
>> wants to change the jniFastGetField code in the ARM port.
>>
>> Testing:
>> I ran the httpclient tests (that failed because of this) with
>> --test-repeat 100, and ran that 3 times. No problems discovered. I am
>> also finishing up hs-tier1-3 and jdk-tier1-3.
>>
>> Thanks,
>> /Erik
>
More information about the hotspot-dev
mailing list