PING: RFR: JDK-8151815: Could not parse core image with JSnap.
David Holmes
david.holmes at oracle.com
Wed Oct 18 12:28:00 UTC 2017
On 18/10/2017 8:26 PM, serguei.spitsyn at oracle.com wrote:
> Hi David,
>
> Thank you for jumping to this review and helping Yasumasa to sort it out!
> I've just discovered that this issue was already on the table for
> several months without a significant progress.
>
>
> On 10/18/17 02:48, David Holmes wrote:
>> Hi Serguei
>>
>> On 18/10/2017 7:25 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi Yasumasa,
>>>
>>> Sorry for a quite late participation.
>>>
>>> I looked at the previous webrevs and think that this one is much better.
>>>
>>> Some concern is if we need any kind of synchronization here, e.g. CAS.
>>> But it depends on the PerfMemory class usage.
>>>
>>> Should we make the static variables '_initialized' and '_destroyed'
>>> volatile?
>>
>> For good measure - yes.
>>
>>> Also, the '_initialized' is set to 1 with:
>>> 159 OrderAccess::release_store(&_initialized, 1);
>>>
>>> Should we do the same to set the '_destroyed'?:
>>> 200 _destroyed = true;
>>
>> There is a benign initialization race but we need the release_store to
>> ensure all the data fields can be read if _initialized is seen as
>> true. But what is missing is a load_acquire() in is_initialized() to
>> ensure we synchronize with that store!
>
> Yes, I noticed that the load_acquire() is missed. :|
>
>>
>> There is also a potential for a destruction race (if multiple aborts
>> happens concurrently in different threads) but that also seems benign.
>> In this case there is no data being set so the store to _destroyed
>> does not need to be a release_store.
>
> I'm not convinced yet this is benign as the PerfMemory::destroy() has
> this call:
> 197 delete_memory_region();
Yes though most of its work ends up being no-ops.
>
> Now, I started thinking about the asserts that call the is_useable().
> Should they be returns instead?
I think this is a somewhat confused chunk of code. It's only
fractionally thread-safe yet once in use could be in use concurrently
with an aborting thread that calls destroy(). I don't think there is any
simple fix for this. If we're in the process of crashing does it really
matter if we trigger a secondary crash due to this?
The problems with this code go way beyond what Yasumasa is trying to
address with the JSnap problem and I would not want to put it back on
him to try and come up with an overall solution.
> Then the is_destroyed() would better to have the load_acquire().
You could add a load_acquire and do the store_release. It certainly
would not hurt, but I don't think it would actually benefit anything either.
Cheers,
David
> Just interested to know what do you think on this.
>
> Thanks,
> Serguei
>
>>
>> Cheers,
>> David
>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 10/18/17 00:39, Yasumasa Suenaga wrote:
>>>> Hi David,
>>>>
>>>> Thank you for your comment.
>>>> I uploaded new webrev:
>>>>
>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.07/
>>>>
>>>> Serguei, please comment about this :-)
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>
>>>> 2017-10-18 16:09 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>> Hi Yasumasa,
>>>>>
>>>>> On 18/10/2017 4:34 PM, Yasumasa Suenaga wrote:
>>>>>> Hi David,
>>>>>>
>>>>>>> I don't think we need the extra fields, just ensure the existing
>>>>>>> ones
>>>>>>> can't
>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>
>>>>>> I've added PerfMemory::is_useable() to check whether we can access to
>>>>>> PerfMemory.
>>>>>> I think this webrev prevent to access to PerfMemory after
>>>>>> destroy() call.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.06/
>>>>>
>>>>> This:
>>>>>
>>>>> 90 void PerfMemory::initialize() {
>>>>> 91
>>>>> 92 if (_prologue != NULL)
>>>>> 93 // initialization already performed
>>>>> 94 return;
>>>>>
>>>>> shouldn't check _prologue, but is_initialized().
>>>>>
>>>>> 213 assert(is_useable(), "called before initialization");
>>>>>
>>>>> -> "called before init or after destroy"
>>>>>
>>>>> Could add a similar assert in PerfMemory::mark_updated().
>>>>>
>>>>> Let's see what Serguei thinks. :)
>>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> 2017-10-18 13:44 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>> On 18/10/2017 2:27 PM, Yasumasa Suenaga wrote:
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> 2017-10-18 12:55 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>>>
>>>>>>>>> On 18/10/2017 12:37 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>>> With your changes you no longer null out _prologue so the
>>>>>>>>>>> assertion
>>>>>>>>>>> would
>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory
>>>>>>>>>>> region!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Linux, PerfMemory::delete_memory_region() does not call
>>>>>>>>>> munmap()
>>>>>>>>>> for PerfMemory.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Perhaps not but there are still other actions that happen and
>>>>>>>>> the point
>>>>>>>>> is
>>>>>>>>> we should not be able to continue to use PerfMemory once it has
>>>>>>>>> been
>>>>>>>>> destroyed (even if the destruction is only logical).
>>>>>>>>
>>>>>>>>
>>>>>>>> I received same comment from Dmitry in the past, but we couldn't
>>>>>>>> decide how should we do.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-May/019728.html
>>>>>>>>
>>>>>>>>
>>>>>>>> In that discussion, I uploaded another webrev which adds other
>>>>>>>> fields
>>>>>>>> for
>>>>>>>> JSnap.
>>>>>>>> Is it suitable?
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.02/
>>>>>>>
>>>>>>>
>>>>>>> I don't think we need the extra fields, just ensure the existing
>>>>>>> ones
>>>>>>> can't
>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>
>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>> initialization?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> PerfMemory.java in jdk.hotspot.agent needs these field values.
>>>>>>>>>> `jhsdb jsnap --core` is failed if they are cleared.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not familiar with these tools. When do we produce a core
>>>>>>>>> file after
>>>>>>>>> calling PerfMemory::destroy ?
>>>>>>>>
>>>>>>>>
>>>>>>>> PerfMemory::destroy() is called before aborting.
>>>>>>>
>>>>>>>
>>>>>>> Ah - right. I assume we need to close off the perfdata file
>>>>>>> before we
>>>>>>> abort.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>> -----------------------
>>>>>>>> #0 perfMemory_exit ()
>>>>>>>> at
>>>>>>>>
>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/runtime/perfMemory.cpp:80
>>>>>>>>
>>>>>>>> #1 0x00007f99b091c949 in os::shutdown ()
>>>>>>>> at
>>>>>>>>
>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1483
>>>>>>>>
>>>>>>>> #2 0x00007f99b091c980 in os::abort (dump_core=<optimized out>)
>>>>>>>> at
>>>>>>>>
>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1503
>>>>>>>>
>>>>>>>> #3 0x00007f99b0b689c3 in VMError::report_and_die (
>>>>>>>> this=this at entry=0x7ffcacf40b50)
>>>>>>>> at
>>>>>>>>
>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
>>>>>>>>
>>>>>>>> #4 0x00007f99b0926f04 in JVM_handle_linux_signal
>>>>>>>> (sig=sig at entry=11,
>>>>>>>> info=info at entry=0x7ffcacf40df0,
>>>>>>>> ucVoid=ucVoid at entry=0x7ffcacf40cc0,
>>>>>>>> abort_if_unrecognized=abort_if_unrecognized at entry=1)
>>>>>>>> at
>>>>>>>>
>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
>>>>>>>>
>>>>>>>> -----------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> But it seems to me that there are various checks of
>>>>>>>>>>> _prologue that should really be checking is_initialized() and/or
>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Should I change all assertions for _prologue?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Assertions and direct guards. Checking _prologue is a
>>>>>>>>> placeholder for
>>>>>>>>> the
>>>>>>>>> real check.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2017-10-18 10:53 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>
>>>>>>>>>>> By chance we ran into this bug which I analysed yesterday:
>>>>>>>>>>>
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8189390
>>>>>>>>>>>
>>>>>>>>>>> We hit the assertion:
>>>>>>>>>>>
>>>>>>>>>>> # Internal Error
>>>>>>>>>>> (/open/src/hotspot/share/runtime/perfMemory.cpp:216),
>>>>>>>>>>> pid=17874, tid=17875
>>>>>>>>>>> # assert(_prologue != __null) failed: called before
>>>>>>>>>>> initialization
>>>>>>>>>>> #
>>>>>>>>>>>
>>>>>>>>>>> which is misleading because it can fail if called before
>>>>>>>>>>> initialization,
>>>>>>>>>>> or
>>>>>>>>>>> after PerfMemory::destroy has been called.
>>>>>>>>>>>
>>>>>>>>>>> With your changes you no longer null out _prologue so the
>>>>>>>>>>> assertion
>>>>>>>>>>> would
>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory
>>>>>>>>>>> region!
>>>>>>>>>>>
>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>> initialization? But it seems to me that there are various
>>>>>>>>>>> checks of
>>>>>>>>>>> _prologue that should really be checking is_initialized() and/or
>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 16/10/2017 11:25 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> PING:
>>>>>>>>>>>>
>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2017/10/03 13:18, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I added gtest unit test case for this change in new webrev:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2017-09-27 0:01 GMT+09:00 Yasumasa
>>>>>>>>>>>>> Suenaga<yasuenag at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I uploaded new webrev to be adapted to jdk10/hs:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.04/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2017/09/21 7:45, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2017/07/01 23:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2017/06/13 14:10, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I want to discuss about JDK-8151815: Could not parse
>>>>>>>>>>>>>>>>> core image
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> JSnap.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In last year, I found JSnap cannot parse coredump and
>>>>>>>>>>>>>>>>> I've sent
>>>>>>>>>>>>>>>>> review
>>>>>>>>>>>>>>>>> request for it as JDK-8151815. However it has not been
>>>>>>>>>>>>>>>>> reviewed
>>>>>>>>>>>>>>>>> yet
>>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We've discussed about safety implementation, but we
>>>>>>>>>>>>>>>>> could not
>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>> consensus.
>>>>>>>>>>>>>>>>> IMHO all SA tools should be handled java processes and
>>>>>>>>>>>>>>>>> core
>>>>>>>>>>>>>>>>> images,
>>>>>>>>>>>>>>>>> and PerfCounter value is useful. So I fix this issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I uploaded new webrev for this issue. I think this
>>>>>>>>>>>>>>>>> patch is
>>>>>>>>>>>>>>>>> safety
>>>>>>>>>>>>>>>>> because new flag PerfMemory::_destroyed guards double
>>>>>>>>>>>>>>>>> free, and
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>> members in PerfMemory is accessible (they are not
>>>>>>>>>>>>>>>>> munmap'ed)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you cooperate?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-April/019480.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>
>
More information about the serviceability-dev
mailing list