RFR(S) 8162369: PPC64: Wrong ucontext used after SIGTRAP while in HTM critical section

Doerr, Martin martin.doerr at sap.com
Wed Aug 3 13:17:39 UTC 2016


Hi Gustavo,

thank you very much for providing the webrev and doing experiments.
A couple of questions came into my mind when taking a look at it:

1. A future C++ compiler or runtime code may use HTM, too. Should your new code get invoked in such a case? Or is it only for transactions which were started in the code cache (UseRTMLocking)?

2. I was wondering why you only check for zombie_not_entrant. There are many cases in which a signal may occur during a transaction (e.g. implicit null check, range check, stack overflow check, safepoint poll, ... and possibly more in the future). What about them? I guess continuing execution in the context of the 'treclaim'ed transaction is incorrect.

3. I think reporting errors which occur during transactions is cool. But if it's too complex, I could live without it. Other platforms don't support this either (they just abort by hardware).

I was not involved in the earlier discussion. Maybe I missed something.

Thanks and best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
Sent: Dienstag, 2. August 2016 22:39
To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
Cc: Thomas Stüfe <thomas.stuefe at gmail.com>; Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Breno Leitao <brenohl at br.ibm.com>
Subject: RFR(S) 8162369: PPC64: Wrong ucontext used after SIGTRAP while in HTM critical section
Importance: High

Hi,

Could the following webrev be hosted and reviewed, please?

Webrev: http://81.de.7a9f.ip4.static.sl-reverse.com/8162369/v1
CR: https://bugs.openjdk.java.net/browse/JDK-8162369


Thomas, I've tested the proposed fix under two different (unexpected)
error conditions that can happen when in an HTM transaction:
a SIGSEGV and a SIGILL.

Since reporting pc at tbegin+4 is misleading, in both cases the correct
context for error reporting is the second (transactional). Then:

1) in a segmentation fault in the middle of a transaction it will be
reported correctly, like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00003fff4c2dffac, pid=31046, tid=31047

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000

Hence when we inspect the offending location indicated by the pc value in
the hs_err log we find the right root cause of the SIGSEGV:

(gdb) disas 0x00003fff4c2dffac-4,+8
Dump of assembler code from 0x3fff4c2dffa8 to 0x3fff4c2dffb0:
   0x00003fff4c2dffa8:	li      r15,0
   0x00003fff4c2dffac:	ld      r15,0(r15)
End of assembler dump.

Tested by the injection of a load from memory position 0x0, accordingly to
this patch: https://paste.fedoraproject.org/400032/
Full log hs_err log: http://paste.fedoraproject.org/400035/raw/

2) in an illegal instruction in the middle of a transaction it will be
reported correctly as well, like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00003fff7c2dd930, pid=21140, tid=21141

siginfo: si_signo: 4 (SIGILL), si_code: 1 (ILL_ILLOPC), si_addr: 0x00003fff7c2dd930

Hence when we inspect the offending location indicated by the pc value we
find the right offending instruction:

(gdb) disas 0x00003fff7c2dd930-4,+8
Dump of assembler code from 0x3fff7c2dd92c to 0x3fff7c2dd934:
   0x00003fff7c2dd92c:	cmpwi   cr6,r0,1
   0x00003fff7c2dd930:	.long 0xea2f0013
End of assembler dump.

Tested by the injection of a illegal instruction, accordingly to this
patch: https://paste.fedoraproject.org/400080/
Full log hs_err log: https://paste.fedoraproject.org/400073/raw/


Martin, although the detection using is_tbegin() is equivalent to checking
the uc_link pointer, I've decided to use the detection suggested by the
Linux kernel documentation, i.e checking the uc_link pointer plus the MSR
TS bits. As you said it's not an issue since uc_link isn't new and the msr
register isn't also new. So this kind of check will compile on both PPC64
LE and BE fine.

Thank you for sponsoring the change.

No regression was observed. One additional test now passes:
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java

Thank you!


Best regards,
Gustavo



More information about the ppc-aix-port-dev mailing list