RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224
Daniel D. Daugherty
daniel.daugherty at oracle.com
Mon Apr 30 15:36:14 UTC 2018
On 4/30/18 9:32 AM, Erik Österlund wrote:
> Hi,
>
> It looks like I broke jniFastGetField on SPARC with my changes to
> modularize this feature (8200235). Sorry about that. And the backout
> of that (8202386) didn't go in due to maintenance. So here is a patch
> to fix the problem instead.
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00/
src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.cpp
src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.hpp
src/hotspot/cpu/aarch64/jniFastGetField_aarch64.cpp
Addition of the placeholder 'jni_env'; you went to the trouble
of changing 'robj' -> 'obj' on the other platforms, but not here
which is inconsistent.
src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.cpp
src/hotspot/cpu/sparc/gc/shared/barrierSetAssembler_sparc.hpp
Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
src/hotspot/cpu/sparc/jniFastGetField_sparc.cpp
Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
L74: // Both O5 and G3 are clobbered by
try_resolve_jobject_in_native.
You are passing 'G3' as 'tmp'. I don't see where/how in
try_resolve_jobject_in_native() that the 'tmp' parameter gets
clobbered.
src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp
src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp
Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
src/hotspot/cpu/x86/jniFastGetField_x86_64.cpp
L50: static const Register rtmp = r8;
Nit: These seem to be sorted by register number and not by
variable name.
Addition of the placeholder 'jni_env'; why rename 'robj' -> 'obj'?
Thumbs up on the technical part of this change. My comments are
just about style and about the comments. You should make sure
that you get a review from either Kim or David H. since they
were on the original review team for JDK-8200235.
Dan
>
> Bug ID:
> https://bugs.openjdk.java.net/browse/JDK-8202381
>
> My (long version) analysis on the architectures (that I made changes
> to) of what is going on:
>
> == SPARC ==
>
> This is where the SIGBUS happened. We used to get the jobject/jweak in
> O1, clear the jweak tag in O1, and resolve the handle by loading from
> O1 to O5. This was in fact already broken, since 8176100 "[REDO][REDO]
> G1 Needs pre barrier on dereference of weak JNI handles". The
> registers used to pass the arguments into the C function (O0, O1 and
> O2) must not be touched throughout the call, despite being caller
> saved, as we have not saved the register window. By clearing the jweak
> in O1, the wrong access functions are used in the slow path if the
> speculative load is invalidated, either due to the second safepoint
> counter not being able to the first sampled safepoint counter, or the
> load of the int popping in to the signal handler. I made this problem
> worse in my 8200235 patch by not just clearing O1, but also resolving
> into O1, which clobbered the register even further. This caused
> intermittent failures in the following sequence of events: 1) the
> safepoint counter is read, and we are not in a safepoint, 2) the
> jobject/jweak is resolved, clobbering O1, 3) a safepoint is triggered,
> 4) after resolving the jobject/jweak, the safepoint counter is sampled
> again with a new number, causing a call to the slow path where O1 no
> longer contains the jobject/jweak.
>
> In my proposed changes, this was remedied by moving O1 to O5 first,
> then clearing the jweak value (in O5), then resolving (in O5) and then
> performing the speculative load with all passed C arguments untouched
> in their O1, O2 and O3 registers, as required for the fallback code.
>
> == x86_64 ==
>
> On x86_64, the passed in jobject/jweak is already moved to a separate
> scratch register. Therefore, the problem of clearing the jweak tag in
> the jobject does not apply here. However, rscratch1 was the register I
> picked as a temporary register for resolving. After reading this code
> more carefully, I realize this is the r10 register, which coincides
> with the register used to remember the old counter. So using that
> temporary register is not safe. I chose a new register instead: r8.
> This is, like the other registers used in this code, a caller saved
> register, that does not intersect with the input registers of the
> function arguments. That makes it safe to use in this context. I added
> DEBUG_ONLY code for clobbering this temporary register on x86_64, to
> make sure we more easily catch such problems.
>
> == AArch64 ==
>
> On AArch64, the jobject/jweak is already in r3 when cleared (which
> does not intersect with the registers used in the C calling
> convention, and is caller saved). Therefore, the clearing of the jweak
> tag there is correct, and so is the resolution. The temporary register
> used is r8, which is used as a temporary register by surrounding code
> in this context (and is caller saved).
>
> Apart from this, I also added a jni_env parameter to
> try_resolve_jobject_in_native, which just passes c_rarg0 containing
> the jni environment, without doing anything to it. We need to read
> from this register in ZGC, and I would prefer not to hardcode in the
> backend that c_rarg0 always refers to the jni environment in this
> context.
>
> There are still remaining problems in ARM that the jweak tag is
> cleared in r1, which will cause the slow path to resolve jweaks as
> jobjects. However, I would like to not address that issue in this fix,
> as my original changes did not touch that ARM code. I would like to
> instead file a bug for that to be solved by another brave soul who
> wants to change the jniFastGetField code in the ARM port.
>
> Testing:
> I ran the httpclient tests (that failed because of this) with
> --test-repeat 100, and ran that 3 times. No problems discovered. I am
> also finishing up hs-tier1-3 and jdk-tier1-3.
>
> Thanks,
> /Erik
More information about the hotspot-dev
mailing list