PING: RFR: JDK-8151815: Could not parse core image with JSnap.
David Holmes
david.holmes at oracle.com
Wed Oct 18 12:34:12 UTC 2017
Just to clarify ...
On 18/10/2017 10:28 PM, David Holmes wrote:
> On 18/10/2017 8:26 PM, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> Thank you for jumping to this review and helping Yasumasa to sort it out!
>> I've just discovered that this issue was already on the table for
>> several months without a significant progress.
>>
>>
>> On 10/18/17 02:48, David Holmes wrote:
>>> Hi Serguei
>>>
>>> On 18/10/2017 7:25 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi Yasumasa,
>>>>
>>>> Sorry for a quite late participation.
>>>>
>>>> I looked at the previous webrevs and think that this one is much
>>>> better.
>>>>
>>>> Some concern is if we need any kind of synchronization here, e.g. CAS.
>>>> But it depends on the PerfMemory class usage.
>>>>
>>>> Should we make the static variables '_initialized' and '_destroyed'
>>>> volatile?
>>>
>>> For good measure - yes.
>>>
>>>> Also, the '_initialized' is set to 1 with:
>>>> 159 OrderAccess::release_store(&_initialized, 1);
>>>>
>>>> Should we do the same to set the '_destroyed'?:
>>>> 200 _destroyed = true;
>>>
>>> There is a benign initialization race but we need the release_store
>>> to ensure all the data fields can be read if _initialized is seen as
>>> true. But what is missing is a load_acquire() in is_initialized() to
>>> ensure we synchronize with that store!
>>
>> Yes, I noticed that the load_acquire() is missed. :|
>>
>>>
>>> There is also a potential for a destruction race (if multiple aborts
>>> happens concurrently in different threads) but that also seems
>>> benign. In this case there is no data being set so the store to
>>> _destroyed does not need to be a release_store.
>>
>> I'm not convinced yet this is benign as the PerfMemory::destroy() has
>> this call:
>> 197 delete_memory_region();
>
> Yes though most of its work ends up being no-ops.
>
>>
>> Now, I started thinking about the asserts that call the is_useable().
>> Should they be returns instead?
>
> I think this is a somewhat confused chunk of code. It's only
> fractionally thread-safe yet once in use could be in use concurrently
> with an aborting thread that calls destroy(). I don't think there is any
> simple fix for this. If we're in the process of crashing does it really
> matter if we trigger a secondary crash due to this?
It doesn't matter if we do:
assert(is_usable(),...);
// continue
or
if (!is_usable()) return;
// continue
because as soon as we have checked is_usable() and abort happening in
another thread may have changed that by calling destroy.
This code is basically broken if we hit an abort path instead of a
normal VM shutdown.
David
-----
> The problems with this code go way beyond what Yasumasa is trying to
> address with the JSnap problem and I would not want to put it back on
> him to try and come up with an overall solution.
>
>> Then the is_destroyed() would better to have the load_acquire().
>
> You could add a load_acquire and do the store_release. It certainly
> would not hurt, but I don't think it would actually benefit anything
> either.
>
> Cheers,
> David
>
>> Just interested to know what do you think on this.
>>
>> Thanks,
>> Serguei
>>
>>>
>>> Cheers,
>>> David
>>>
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 10/18/17 00:39, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>> Thank you for your comment.
>>>>> I uploaded new webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.07/
>>>>>
>>>>> Serguei, please comment about this :-)
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>
>>>>> 2017-10-18 16:09 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> On 18/10/2017 4:34 PM, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> I don't think we need the extra fields, just ensure the existing
>>>>>>>> ones
>>>>>>>> can't
>>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>
>>>>>>> I've added PerfMemory::is_useable() to check whether we can
>>>>>>> access to
>>>>>>> PerfMemory.
>>>>>>> I think this webrev prevent to access to PerfMemory after
>>>>>>> destroy() call.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.06/
>>>>>>
>>>>>> This:
>>>>>>
>>>>>> 90 void PerfMemory::initialize() {
>>>>>> 91
>>>>>> 92 if (_prologue != NULL)
>>>>>> 93 // initialization already performed
>>>>>> 94 return;
>>>>>>
>>>>>> shouldn't check _prologue, but is_initialized().
>>>>>>
>>>>>> 213 assert(is_useable(), "called before initialization");
>>>>>>
>>>>>> -> "called before init or after destroy"
>>>>>>
>>>>>> Could add a similar assert in PerfMemory::mark_updated().
>>>>>>
>>>>>> Let's see what Serguei thinks. :)
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> 2017-10-18 13:44 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>> On 18/10/2017 2:27 PM, Yasumasa Suenaga wrote:
>>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> 2017-10-18 12:55 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>>>>
>>>>>>>>>> On 18/10/2017 12:37 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>>> With your changes you no longer null out _prologue so the
>>>>>>>>>>>> assertion
>>>>>>>>>>>> would
>>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory
>>>>>>>>>>>> region!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Linux, PerfMemory::delete_memory_region() does not call
>>>>>>>>>>> munmap()
>>>>>>>>>>> for PerfMemory.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Perhaps not but there are still other actions that happen and
>>>>>>>>>> the point
>>>>>>>>>> is
>>>>>>>>>> we should not be able to continue to use PerfMemory once it
>>>>>>>>>> has been
>>>>>>>>>> destroyed (even if the destruction is only logical).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I received same comment from Dmitry in the past, but we couldn't
>>>>>>>>> decide how should we do.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-May/019728.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In that discussion, I uploaded another webrev which adds other
>>>>>>>>> fields
>>>>>>>>> for
>>>>>>>>> JSnap.
>>>>>>>>> Is it suitable?
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.02/
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't think we need the extra fields, just ensure the existing
>>>>>>>> ones
>>>>>>>> can't
>>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>>
>>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>>> initialization?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> PerfMemory.java in jdk.hotspot.agent needs these field values.
>>>>>>>>>>> `jhsdb jsnap --core` is failed if they are cleared.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm not familiar with these tools. When do we produce a core
>>>>>>>>>> file after
>>>>>>>>>> calling PerfMemory::destroy ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> PerfMemory::destroy() is called before aborting.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ah - right. I assume we need to close off the perfdata file
>>>>>>>> before we
>>>>>>>> abort.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----------------------
>>>>>>>>> #0 perfMemory_exit ()
>>>>>>>>> at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/runtime/perfMemory.cpp:80
>>>>>>>>>
>>>>>>>>> #1 0x00007f99b091c949 in os::shutdown ()
>>>>>>>>> at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1483
>>>>>>>>>
>>>>>>>>> #2 0x00007f99b091c980 in os::abort (dump_core=<optimized out>)
>>>>>>>>> at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1503
>>>>>>>>>
>>>>>>>>> #3 0x00007f99b0b689c3 in VMError::report_and_die (
>>>>>>>>> this=this at entry=0x7ffcacf40b50)
>>>>>>>>> at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
>>>>>>>>>
>>>>>>>>> #4 0x00007f99b0926f04 in JVM_handle_linux_signal
>>>>>>>>> (sig=sig at entry=11,
>>>>>>>>> info=info at entry=0x7ffcacf40df0,
>>>>>>>>> ucVoid=ucVoid at entry=0x7ffcacf40cc0,
>>>>>>>>> abort_if_unrecognized=abort_if_unrecognized at entry=1)
>>>>>>>>> at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
>>>>>>>>>
>>>>>>>>> -----------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> But it seems to me that there are various checks of
>>>>>>>>>>>> _prologue that should really be checking is_initialized()
>>>>>>>>>>>> and/or
>>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Should I change all assertions for _prologue?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Assertions and direct guards. Checking _prologue is a
>>>>>>>>>> placeholder for
>>>>>>>>>> the
>>>>>>>>>> real check.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2017-10-18 10:53 GMT+09:00 David
>>>>>>>>>>> Holmes<david.holmes at oracle.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>> By chance we ran into this bug which I analysed yesterday:
>>>>>>>>>>>>
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8189390
>>>>>>>>>>>>
>>>>>>>>>>>> We hit the assertion:
>>>>>>>>>>>>
>>>>>>>>>>>> # Internal Error
>>>>>>>>>>>> (/open/src/hotspot/share/runtime/perfMemory.cpp:216),
>>>>>>>>>>>> pid=17874, tid=17875
>>>>>>>>>>>> # assert(_prologue != __null) failed: called before
>>>>>>>>>>>> initialization
>>>>>>>>>>>> #
>>>>>>>>>>>>
>>>>>>>>>>>> which is misleading because it can fail if called before
>>>>>>>>>>>> initialization,
>>>>>>>>>>>> or
>>>>>>>>>>>> after PerfMemory::destroy has been called.
>>>>>>>>>>>>
>>>>>>>>>>>> With your changes you no longer null out _prologue so the
>>>>>>>>>>>> assertion
>>>>>>>>>>>> would
>>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory
>>>>>>>>>>>> region!
>>>>>>>>>>>>
>>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>>> initialization? But it seems to me that there are various
>>>>>>>>>>>> checks of
>>>>>>>>>>>> _prologue that should really be checking is_initialized()
>>>>>>>>>>>> and/or
>>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 16/10/2017 11:25 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2017/10/03 13:18, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I added gtest unit test case for this change in new webrev:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-09-27 0:01 GMT+09:00 Yasumasa
>>>>>>>>>>>>>> Suenaga<yasuenag at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I uploaded new webrev to be adapted to jdk10/hs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.04/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2017/09/21 7:45, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2017/07/01 23:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2017/06/13 14:10, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I want to discuss about JDK-8151815: Could not parse
>>>>>>>>>>>>>>>>>> core image
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> JSnap.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In last year, I found JSnap cannot parse coredump and
>>>>>>>>>>>>>>>>>> I've sent
>>>>>>>>>>>>>>>>>> review
>>>>>>>>>>>>>>>>>> request for it as JDK-8151815. However it has not been
>>>>>>>>>>>>>>>>>> reviewed
>>>>>>>>>>>>>>>>>> yet
>>>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We've discussed about safety implementation, but we
>>>>>>>>>>>>>>>>>> could not
>>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>> consensus.
>>>>>>>>>>>>>>>>>> IMHO all SA tools should be handled java processes and
>>>>>>>>>>>>>>>>>> core
>>>>>>>>>>>>>>>>>> images,
>>>>>>>>>>>>>>>>>> and PerfCounter value is useful. So I fix this issue.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I uploaded new webrev for this issue. I think this
>>>>>>>>>>>>>>>>>> patch is
>>>>>>>>>>>>>>>>>> safety
>>>>>>>>>>>>>>>>>> because new flag PerfMemory::_destroyed guards double
>>>>>>>>>>>>>>>>>> free, and
>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>> members in PerfMemory is accessible (they are not
>>>>>>>>>>>>>>>>>> munmap'ed)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you cooperate?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-April/019480.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>
>>
More information about the serviceability-dev
mailing list