RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224

Erik Österlund erik.osterlund at oracle.com
Mon Apr 30 15:53:20 UTC 2018


Hi Dan,

On 2018-04-30 17:36, Daniel D. Daugherty wrote:
> On 4/30/18 9:32 AM, Erik Österlund wrote:
>> Hi,
>>
>> It looks like I broke jniFastGetField on SPARC with my changes to 
>> modularize this feature (8200235). Sorry about that. And the backout 
>> of that (8202386) didn't go in due to maintenance. So here is a patch 
>> to fix the problem instead.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00/
>
> src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.cpp
> src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.hpp
> src/hotspot/cpu/aarch64/jniFastGetField_aarch64.cpp
>     Addition of the placeholder 'jni_env'; you went to the trouble
>     of changing 'robj' -> 'obj' on the other platforms, but not here
>     which is inconsistent.

Fixed. I changed it here too.

Full:
http://cr.openjdk.java.net/~eosterlund/8202381/webrev.01/

Incremental:
http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00_01/

> src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.cpp
> src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.hpp
>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>
> src/hotspot/cpu/sparc/jniFastGetField_sparc.cpp
>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>

I changed the name because no other register in these files have the r 
prefix for the register variables (as they do in the jniFastGetField 
files), and I simply could not keep my fingers from fixing that 
inconsistency.

On SPARC, unlike x86_64, the temporary register (G3) gets immediately 
clobbered after calling the try_resolve_jobject_in_native function. 
Therefore, I did not bother clobbering it afterwards as it will already 
be clobbered. On x86_64 on the other hand, the temporary register (r8) 
is not clobbered by the rest of the code, so I decided to explicitly 
clobber it so that any mistakes here become more easily reproducible.

>     L74:   // Both O5 and G3 are clobbered by 
> try_resolve_jobject_in_native.
>         You are passing 'G3' as 'tmp'. I don't see where/how in
>         try_resolve_jobject_in_native() that the 'tmp' parameter gets
>         clobbered.
>

It is used by ZGC only to materialize a mask for catching stale pointers.

> src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp
> src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp
>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>
> src/hotspot/cpu/x86/jniFastGetField_x86_64.cpp
>     L50: static const Register rtmp          = r8;
>         Nit: These seem to be sorted by register number and not by
>              variable name.
>

Fixed the sorting.

>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?

As mentioned, the robj -> obj change was for consistency with the other 
code in these barrier set files.

>
> Thumbs up on the technical part of this change. My comments are
> just about style and about the comments. You should make sure
> that you get a review from either Kim or David H. since they
> were on the original review team for JDK-8200235.

Thanks for the Review Dan!

Thanks,
/Erik

> Dan
>
>
>>
>> Bug ID:
>> https://bugs.openjdk.java.net/browse/JDK-8202381
>>
>> My (long version) analysis on the architectures (that I made changes 
>> to) of what is going on:
>>
>> == SPARC ==
>>
>> This is where the SIGBUS happened. We used to get the jobject/jweak 
>> in O1, clear the jweak tag in O1, and resolve the handle by loading 
>> from O1 to O5. This was in fact already broken, since 8176100 
>> "[REDO][REDO] G1 Needs pre barrier on dereference of weak JNI 
>> handles". The registers used to pass the arguments into the C 
>> function (O0, O1 and O2) must not be touched throughout the call, 
>> despite being caller saved, as we have not saved the register window. 
>> By clearing the jweak in O1, the wrong access functions are used in 
>> the slow path if the speculative load is invalidated, either due to 
>> the second safepoint counter not being able to the first sampled 
>> safepoint counter, or the load of the int popping in to the signal 
>> handler. I made this problem worse in my 8200235 patch by not just 
>> clearing O1, but also resolving into O1, which clobbered the register 
>> even further. This caused intermittent failures in the following 
>> sequence of events: 1) the safepoint counter is read, and we are not 
>> in a safepoint, 2) the jobject/jweak is resolved, clobbering O1, 3) a 
>> safepoint is triggered, 4) after resolving the jobject/jweak, the 
>> safepoint counter is sampled again with a new number, causing a call 
>> to the slow path where O1 no longer contains the jobject/jweak.
>>
>> In my proposed changes, this was remedied by moving O1 to O5 first, 
>> then clearing the jweak value (in O5), then resolving (in O5) and 
>> then performing the speculative load with all passed C arguments 
>> untouched in their O1, O2 and O3 registers, as required for the 
>> fallback code.
>>
>> == x86_64 ==
>>
>> On x86_64, the passed in jobject/jweak is already moved to a separate 
>> scratch register. Therefore, the problem of clearing the jweak tag in 
>> the jobject does not apply here. However, rscratch1 was the register 
>> I picked as a temporary register for resolving. After reading this 
>> code more carefully, I realize this is the r10 register, which 
>> coincides with the register used to remember the old counter. So 
>> using that temporary register is not safe. I chose a new register 
>> instead: r8. This is, like the other registers used in this code, a 
>> caller saved register, that does not intersect with the input 
>> registers of the function arguments. That makes it safe to use in 
>> this context. I added DEBUG_ONLY code for clobbering this temporary 
>> register on x86_64, to make sure we more easily catch such problems.
>>
>> == AArch64 ==
>>
>> On AArch64, the jobject/jweak is already in r3 when cleared (which 
>> does not intersect with the registers used in the C calling 
>> convention, and is caller saved). Therefore, the clearing of the 
>> jweak tag there is correct, and so is the resolution. The temporary 
>> register used is r8, which is used as a temporary register by 
>> surrounding code in this context (and is caller saved).
>>
>> Apart from this, I also added a jni_env parameter to 
>> try_resolve_jobject_in_native, which just passes c_rarg0 containing 
>> the jni environment, without doing anything to it. We need to read 
>> from this register in ZGC, and I would prefer not to hardcode in the 
>> backend that c_rarg0 always refers to the jni environment in this 
>> context.
>>
>> There are still remaining problems in ARM that the jweak tag is 
>> cleared in r1, which will cause the slow path to resolve jweaks as 
>> jobjects. However, I would like to not address that issue in this 
>> fix, as my original changes did not touch that ARM code. I would like 
>> to instead file a bug for that to be solved by another brave soul who 
>> wants to change the jniFastGetField code in the ARM port.
>>
>> Testing:
>> I ran the httpclient tests (that failed because of this) with 
>> --test-repeat 100, and ran that 3 times. No problems discovered. I am 
>> also finishing up hs-tier1-3 and jdk-tier1-3.
>>
>> Thanks,
>> /Erik
>



More information about the hotspot-dev mailing list