RFR: 8240956: SEGV in DwarfParser::process_dwarf after JDK-8234624
Yasumasa Suenaga
suenaga at oss.nttdata.com
Tue Mar 24 00:08:09 UTC 2020
Hi Serguei,
Thanks for your comment!
I uploaded new webrev:
http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.04/
Also I pushed it to submit repo:
http://hg.openjdk.java.net/jdk/submit/rev/fade6a949bd1
On 2020/03/24 7:39, serguei.spitsyn at oracle.com wrote:
> Hi Yasumasa,
>
> The mach5 tier5 testing looks good.
> The serviceability/sa/ClhsdbPstack.java is failed without fix and is not failed with it.
>
> Thanks,
> Serguei
>
>
> On 3/23/20 10:18, serguei.spitsyn at oracle.com wrote:
>> Hi Yasumasa,
>>
>> I looked at you changes.
>> It is hard to understand if this fully solves the issue.
>>
>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/amd64/LinuxAMD64CFrame.java.frames.html
>>
>> @@ -34,10 +34,11 @@
>>
>> public static LinuxAMD64CFrame getTopFrame(LinuxDebugger dbg, Address rip, ThreadContext context) {
>> Address libptr = dbg.findLibPtrByAddress(rip);
>> Address cfa = context.getRegisterAsAddress(AMD64ThreadContext.RBP);
>> DwarfParser dwarf = null;
>> + boolean unsupportedDwarf = false;
>>
>> if (libptr != null) { // Native frame
>> try {
>> dwarf = new DwarfParser(libptr);
>> dwarf.processDwarf(rip);
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> @@ -45,24 +46,33 @@
>> !dwarf.isBPOffsetAvailable())
>> ? context.getRegisterAsAddress(AMD64ThreadContext.RBP)
>> : context.getRegisterAsAddress(dwarf.getCFARegister())
>> .addOffsetTo(dwarf.getCFAOffset());
>> } catch (DebuggerException e) {
>> - // Bail out to Java frame case
>> + if (dwarf != null) {
>> + // DWARF processing should succeed when the frame is native
>> + // but it might fail if CIE has language personality routine
>> + // and/or LSDA.
>> + dwarf = null;
>> + unsupportedDwarf = true;
>> + } else {
>> + throw e;
>> + }
>> }
>> }
>>
>> return (cfa == null) ? null
>> - : new LinuxAMD64CFrame(dbg, cfa, rip, dwarf);
>> + : new LinuxAMD64CFrame(dbg, cfa, rip, dwarf, !unsupportedDwarf);
>> }
>>
>> @@ -121,13 +131,25 @@
>> }
>>
>> return isValidFrame(nextCFA, context) ? nextCFA : null;
>> }
>>
>> - private DwarfParser getNextDwarf(Address nextPC) {
>> - DwarfParser nextDwarf = null;
>> + @Override
>> + public CFrame sender(ThreadProxy thread) {
>> + if (!possibleNext) {
>> + return null;
>> + }
>> +
>> + ThreadContext context = thread.getContext();
>> +
>> + Address nextPC = getNextPC(dwarf != null);
>> + if (nextPC == null) {
>> + return null;
>> + }
>>
>> + DwarfParser nextDwarf = null;
>> + boolean unsupportedDwarf = false;
>> if ((dwarf != null) && dwarf.isIn(nextPC)) {
>> nextDwarf = dwarf;
>> } else {
>> Address libptr = dbg.findLibPtrByAddress(nextPC);
>> if (libptr != null) {
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> @@ -138,33 +160,29 @@
>> }
>> }
>> }
>>
>> if (nextDwarf != null) {
>> + try {
>> nextDwarf.processDwarf(nextPC);
>> + } catch (DebuggerException e) {
>> + // DWARF processing should succeed when the frame is native
>> + // but it might fail if CIE has language personality routine
>> + // and/or LSDA.
>> + nextDwarf = null;
>> + unsupportedDwarf = true;
>> }
>>
>> This fix looks like a hack.
>> Should we just propagate the Debugging exception instead of trying to maintain unsupportedDwarf flag?
DwarfParser::processDwarf would throw DebuggerException if it cannot find DWARF which relates to PC.
PC at this point is for next frame. So current frame (`this` object) is valid, and it should be processed.
>> Also, I don't like that DWARF-specific abbreviations (like CIE, IDE,LSDA, etc.) are used without any comments explaining them.
>> The code has to be generally readable without looking into the DWARF spec each time.
I added comments for them in this webrev.
Thanks,
Yasumasa
>> I'm submitting mach5 jobs to make sure the issue has been resolved with your fix.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 3/20/20 17:55, Yasumasa Suenaga wrote:
>>> Thanks Chris!
>>> I'm waiting for reviewers for this change.
>>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/03/21 4:23, Chris Plummer wrote:
>>>> Hi Yasumasa,
>>>>
>>>> The failure is due to JDK-8231634, so not something you need to worry about.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> On 3/20/20 2:45 AM, Yasumasa Suenaga wrote:
>>>>> Hi Chris,
>>>>>
>>>>> I uploaded new webrev which includes reverting change for ProblemList:
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/
>>>>>
>>>>> I tested it on submit repo (mach5-one-ysuenaga-JDK-8240956-2-20200320-0814-9586301),
>>>>> but it has failed in ClhsdbJstackXcompStress.java.
>>>>> However I think it is not caused by this change because ClhsdbJstackXcompStress.java tests `jhsdb jstack`, not mixed mode, it would not parse DWARF.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/03/20 13:55, Chris Plummer wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> The test has been problem listed so please add undoing this to your webrev. Here's the diff that problem listed it:
>>>>>>
>>>>>> diff --git a/test/hotspot/jtreg/ProblemList.txt b/test/hotspot/jtreg/ProblemList.txt
>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt
>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt
>>>>>> @@ -115,7 +115,7 @@
>>>>>> serviceability/sa/ClhsdbPrintAll.java 8193639 solaris-all
>>>>>> serviceability/sa/ClhsdbPrintAs.java 8193639 solaris-all
>>>>>> serviceability/sa/ClhsdbPrintStatics.java 8193639 solaris-all
>>>>>> -serviceability/sa/ClhsdbPstack.java 8193639 solaris-all
>>>>>> +serviceability/sa/ClhsdbPstack.java 8193639,8240956 solaris-all,linux-all
>>>>>> serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java 8193639 solaris-all
>>>>>> serviceability/sa/ClhsdbScanOops.java 8193639,8235220,8230731 solaris-all,linux-x64,macosx-x64,windows-x64
>>>>>> serviceability/sa/ClhsdbSource.java 8193639 solaris-all
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 3/16/20 5:07 AM, Yasumasa Suenaga wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This webrev has passed submit repo (mach5-one-ysuenaga-JDK-8240956-20200316-0924-9487169) and additional tests.
>>>>>>> So please review it:
>>>>>>>
>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240956
>>>>>>> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.02/
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/03/16 21:03, Yasumasa Suenaga wrote:
>>>>>>>> Thank you so much, David!
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/03/16 21:01, David Holmes wrote:
>>>>>>>>> On 16/03/2020 9:46 pm, David Holmes wrote:
>>>>>>>>>> On 16/03/2020 7:20 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>> I missed loop condition, so I fixed it and pushed to submit repo.
>>>>>>>>>>> Could you try again?
>>>>>>>>>>>
>>>>>>>>>>> http://hg.openjdk.java.net/jdk/submit/rev/9c148df17f23
>>>>>>>>>>>
>>>>>>>>>>> webrev is here:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.02/
>>>>>>>>>>
>>>>>>>>>> Test job resubmitted. Will advise results if it completes before I go to bed :)
>>>>>>>>>
>>>>>>>>> Seems to have passed okay.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2020/03/16 16:17, David Holmes wrote:
>>>>>>>>>>>> Sorry it is still crashing.
>>>>>>>>>>>>
>>>>>>>>>>>> #
>>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>>> #
>>>>>>>>>>>> # SIGSEGV (0xb) at pc=0x00007f98ec01e94e, pid=13702, tid=13704
>>>>>>>>>>>> #
>>>>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (15.0) (fastdebug build 15-internal+0-2020-03-16-0640217.suenaga.source)
>>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-internal+0-2020-03-16-0640217.suenaga.source, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
>>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>>> # C [libsaproc.so+0x494e] DwarfParser::process_dwarf(unsigned long)+0x4e
>>>>>>>>>>>> #
>>>>>>>>>>>>
>>>>>>>>>>>> Same as before.
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>> On 16/03/2020 4:57 pm, David Holmes wrote:
>>>>>>>>>>>>> On 16/03/2020 4:51 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> On 2020/03/16 15:43, Chris Plummer wrote:
>>>>>>>>>>>>>>> BTW, if you submit it to the submit repo, we can then go and run additional internal tests (and even more builds) using that job.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for that tip Chris!
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've pushed the change to submit repo, but I've not yet received the result.
>>>>>>>>>>>>>> I will share you when I get job ID.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We can see the id. Just need to wait for the builds to complete before submitting the additional tests.
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 3/15/20 11:36 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you for testing it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I updated webrev to avoid bailout to Java frame when DWARF has language personality routine or LSDA.
>>>>>>>>>>>>>>>> Could you try it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.01/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It works well on my Fedora 31 and Oracle Linux 7.7 .
>>>>>>>>>>>>>>>> I've pushed it to submit repo.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Diff from webrev.00 is here:
>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/submit/rev/6f11cd275652
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2020/03/16 13:12, David Holmes wrote:
>>>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 16/03/2020 12:53 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>> On 16/03/2020 12:17 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can't review this as I know nothing about the code, but I'm putting the patch through our internal testing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sorry but the crashes still exist:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> #
>>>>>>>>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>>>>>>>>> #
>>>>>>>>>>>>>>>>>> # SIGSEGV (0xb) at pc=0x00007fcdd403894e, pid=16948, tid=16949
>>>>>>>>>>>>>>>>>> #
>>>>>>>>>>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (15.0) (fastdebug build 15-internal+0-2020-03-16-0214474.david.holmes.jdk-dev)
>>>>>>>>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-internal+0-2020-03-16-0214474.david.holmes.jdk-dev, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
>>>>>>>>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>>>>>>>>> # C [libsaproc.so+0x494e] DwarfParser::process_dwarf(unsigned long)+0x4e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> in fact they seem worse as the test seems to always crash now.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Not worse - sorry. I see 6 failures out of 119 runs of the test in linux-x64. I don't see a pattern as to where it fails versus passes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It doesn't fail for me locally.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 14/03/2020 11:35 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240956
>>>>>>>>>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.00/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> JDK-8234624 introduced DWARF parser in SA for unwinding native frames in jstack mixed mode.
>>>>>>>>>>>>>>>>>>>> However some error has seen intermittently after that.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I investigated the cause of this, I found two concerns:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> A: lack of buffer (.eh_frame section data) range check
>>>>>>>>>>>>>>>>>>>> B: Language personality routine and Language Specific Data Area (LSDA) are not considered
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I addd range check for .eh_frame processing, and ignore personality routine and LSDA in this webrev.
>>>>>>>>>>>>>>>>>>>> Also I added bailout code if DWARF processing is failed due to these concerns.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This change has passed all tests on submit repo (mach5-one-ysuenaga-JDK-8240956-20200313-1518-9434671),
>>>>>>>>>>>>>>>>>>>> also I tested it on my Fedora 31 box and Oracle Linux 7.7 container.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>
More information about the serviceability-dev
mailing list