RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224

Daniel D. Daugherty daniel.daugherty at oracle.com
Mon Apr 30 15:57:45 UTC 2018


On 4/30/18 11:53 AM, Erik Österlund wrote:
> Hi Dan,
>
> On 2018-04-30 17:36, Daniel D. Daugherty wrote:
>> On 4/30/18 9:32 AM, Erik Österlund wrote:
>>> Hi,
>>>
>>> It looks like I broke jniFastGetField on SPARC with my changes to 
>>> modularize this feature (8200235). Sorry about that. And the backout 
>>> of that (8202386) didn't go in due to maintenance. So here is a 
>>> patch to fix the problem instead.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00/
>>
>> src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.cpp
>> src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.hpp
>> src/hotspot/cpu/aarch64/jniFastGetField_aarch64.cpp
>>     Addition of the placeholder 'jni_env'; you went to the trouble
>>     of changing 'robj' -> 'obj' on the other platforms, but not here
>>     which is inconsistent.
>
> Fixed. I changed it here too.
>
> Full:
> http://cr.openjdk.java.net/~eosterlund/8202381/webrev.01/
>
> Incremental:
> http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00_01/
>
>> src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.cpp
>> src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.hpp
>>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>>
>> src/hotspot/cpu/sparc/jniFastGetField_sparc.cpp
>>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>>
>
> I changed the name because no other register in these files have the r 
> prefix for the register variables (as they do in the jniFastGetField 
> files), and I simply could not keep my fingers from fixing that 
> inconsistency.

Sounds good. Thanks for explaining.


> On SPARC, unlike x86_64, the temporary register (G3) gets immediately 
> clobbered after calling the try_resolve_jobject_in_native function.

How/where? I just don't see it, but my SPARC assembly is rusty...


> Therefore, I did not bother clobbering it afterwards as it will 
> already be clobbered. On x86_64 on the other hand, the temporary 
> register (r8) is not clobbered by the rest of the code, so I decided 
> to explicitly clobber it so that any mistakes here become more easily 
> reproducible.

Yup. I liked that change.


>
>>     L74:   // Both O5 and G3 are clobbered by 
>> try_resolve_jobject_in_native.
>>         You are passing 'G3' as 'tmp'. I don't see where/how in
>>         try_resolve_jobject_in_native() that the 'tmp' parameter gets
>>         clobbered.
>>
>
> It is used by ZGC only to materialize a mask for catching stale pointers.

So G3 only gets clobbered with ZGC (which is not yet in this code base)?


>
>> src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp
>> src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp
>>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>>
>> src/hotspot/cpu/x86/jniFastGetField_x86_64.cpp
>>     L50: static const Register rtmp          = r8;
>>         Nit: These seem to be sorted by register number and not by
>>              variable name.
>>
>
> Fixed the sorting.
>
>>     Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
>
> As mentioned, the robj -> obj change was for consistency with the 
> other code in these barrier set files.
>
>>
>> Thumbs up on the technical part of this change. My comments are
>> just about style and about the comments. You should make sure
>> that you get a review from either Kim or David H. since they
>> were on the original review team for JDK-8200235.
>
> Thanks for the Review Dan!

You're welcome. It would be good to get this fix in today if at all
possible... Once your fix is in, we can close my BACKOUT bug as "will
not fix"...

Dan


>
> Thanks,
> /Erik
>
>> Dan
>>
>>
>>>
>>> Bug ID:
>>> https://bugs.openjdk.java.net/browse/JDK-8202381
>>>
>>> My (long version) analysis on the architectures (that I made changes 
>>> to) of what is going on:
>>>
>>> == SPARC ==
>>>
>>> This is where the SIGBUS happened. We used to get the jobject/jweak 
>>> in O1, clear the jweak tag in O1, and resolve the handle by loading 
>>> from O1 to O5. This was in fact already broken, since 8176100 
>>> "[REDO][REDO] G1 Needs pre barrier on dereference of weak JNI 
>>> handles". The registers used to pass the arguments into the C 
>>> function (O0, O1 and O2) must not be touched throughout the call, 
>>> despite being caller saved, as we have not saved the register 
>>> window. By clearing the jweak in O1, the wrong access functions are 
>>> used in the slow path if the speculative load is invalidated, either 
>>> due to the second safepoint counter not being able to the first 
>>> sampled safepoint counter, or the load of the int popping in to the 
>>> signal handler. I made this problem worse in my 8200235 patch by not 
>>> just clearing O1, but also resolving into O1, which clobbered the 
>>> register even further. This caused intermittent failures in the 
>>> following sequence of events: 1) the safepoint counter is read, and 
>>> we are not in a safepoint, 2) the jobject/jweak is resolved, 
>>> clobbering O1, 3) a safepoint is triggered, 4) after resolving the 
>>> jobject/jweak, the safepoint counter is sampled again with a new 
>>> number, causing a call to the slow path where O1 no longer contains 
>>> the jobject/jweak.
>>>
>>> In my proposed changes, this was remedied by moving O1 to O5 first, 
>>> then clearing the jweak value (in O5), then resolving (in O5) and 
>>> then performing the speculative load with all passed C arguments 
>>> untouched in their O1, O2 and O3 registers, as required for the 
>>> fallback code.
>>>
>>> == x86_64 ==
>>>
>>> On x86_64, the passed in jobject/jweak is already moved to a 
>>> separate scratch register. Therefore, the problem of clearing the 
>>> jweak tag in the jobject does not apply here. However, rscratch1 was 
>>> the register I picked as a temporary register for resolving. After 
>>> reading this code more carefully, I realize this is the r10 
>>> register, which coincides with the register used to remember the old 
>>> counter. So using that temporary register is not safe. I chose a new 
>>> register instead: r8. This is, like the other registers used in this 
>>> code, a caller saved register, that does not intersect with the 
>>> input registers of the function arguments. That makes it safe to use 
>>> in this context. I added DEBUG_ONLY code for clobbering this 
>>> temporary register on x86_64, to make sure we more easily catch such 
>>> problems.
>>>
>>> == AArch64 ==
>>>
>>> On AArch64, the jobject/jweak is already in r3 when cleared (which 
>>> does not intersect with the registers used in the C calling 
>>> convention, and is caller saved). Therefore, the clearing of the 
>>> jweak tag there is correct, and so is the resolution. The temporary 
>>> register used is r8, which is used as a temporary register by 
>>> surrounding code in this context (and is caller saved).
>>>
>>> Apart from this, I also added a jni_env parameter to 
>>> try_resolve_jobject_in_native, which just passes c_rarg0 containing 
>>> the jni environment, without doing anything to it. We need to read 
>>> from this register in ZGC, and I would prefer not to hardcode in the 
>>> backend that c_rarg0 always refers to the jni environment in this 
>>> context.
>>>
>>> There are still remaining problems in ARM that the jweak tag is 
>>> cleared in r1, which will cause the slow path to resolve jweaks as 
>>> jobjects. However, I would like to not address that issue in this 
>>> fix, as my original changes did not touch that ARM code. I would 
>>> like to instead file a bug for that to be solved by another brave 
>>> soul who wants to change the jniFastGetField code in the ARM port.
>>>
>>> Testing:
>>> I ran the httpclient tests (that failed because of this) with 
>>> --test-repeat 100, and ran that 3 times. No problems discovered. I 
>>> am also finishing up hs-tier1-3 and jdk-tier1-3.
>>>
>>> Thanks,
>>> /Erik
>>
>



More information about the hotspot-dev mailing list