PING: RFR: JDK-8151815: Could not parse core image with JSnap.
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Wed Oct 18 18:17:30 UTC 2017
On 10/18/17 05:34, David Holmes wrote:
> Just to clarify ...
>
> On 18/10/2017 10:28 PM, David Holmes wrote:
>> On 18/10/2017 8:26 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi David,
>>>
>>> Thank you for jumping to this review and helping Yasumasa to sort it
>>> out!
>>> I've just discovered that this issue was already on the table for
>>> several months without a significant progress.
>>>
>>>
>>> On 10/18/17 02:48, David Holmes wrote:
>>>> Hi Serguei
>>>>
>>>> On 18/10/2017 7:25 PM, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> Sorry for a quite late participation.
>>>>>
>>>>> I looked at the previous webrevs and think that this one is much
>>>>> better.
>>>>>
>>>>> Some concern is if we need any kind of synchronization here, e.g.
>>>>> CAS.
>>>>> But it depends on the PerfMemory class usage.
>>>>>
>>>>> Should we make the static variables '_initialized' and
>>>>> '_destroyed' volatile?
>>>>
>>>> For good measure - yes.
>>>>
>>>>> Also, the '_initialized' is set to 1 with:
>>>>> 159 OrderAccess::release_store(&_initialized, 1);
>>>>>
>>>>> Should we do the same to set the '_destroyed'?:
>>>>> 200 _destroyed = true;
>>>>
>>>> There is a benign initialization race but we need the release_store
>>>> to ensure all the data fields can be read if _initialized is seen
>>>> as true. But what is missing is a load_acquire() in
>>>> is_initialized() to ensure we synchronize with that store!
>>>
>>> Yes, I noticed that the load_acquire() is missed. :|
>>>
>>>>
>>>> There is also a potential for a destruction race (if multiple
>>>> aborts happens concurrently in different threads) but that also
>>>> seems benign. In this case there is no data being set so the store
>>>> to _destroyed does not need to be a release_store.
>>>
>>> I'm not convinced yet this is benign as the PerfMemory::destroy()
>>> has this call:
>>> 197 delete_memory_region();
>>
>> Yes though most of its work ends up being no-ops.
>>
>>>
>>> Now, I started thinking about the asserts that call the is_useable().
>>> Should they be returns instead?
>>
>> I think this is a somewhat confused chunk of code. It's only
>> fractionally thread-safe yet once in use could be in use concurrently
>> with an aborting thread that calls destroy(). I don't think there is
>> any simple fix for this. If we're in the process of crashing does it
>> really matter if we trigger a secondary crash due to this?
>
> It doesn't matter if we do:
>
> assert(is_usable(),...);
> // continue
>
> or
>
> if (!is_usable()) return;
> // continue
>
> because as soon as we have checked is_usable() and abort happening in
> another thread may have changed that by calling destroy.
>
> This code is basically broken if we hit an abort path instead of a
> normal VM shutdown.
Agreed.
Thanks,
Serguei
>
> David
> -----
>
>> The problems with this code go way beyond what Yasumasa is trying to
>> address with the JSnap problem and I would not want to put it back on
>> him to try and come up with an overall solution.
>>
>>> Then the is_destroyed() would better to have the load_acquire().
>>
>> You could add a load_acquire and do the store_release. It certainly
>> would not hurt, but I don't think it would actually benefit anything
>> either.
>>
>> Cheers,
>> David
>>
>>> Just interested to know what do you think on this.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 10/18/17 00:39, Yasumasa Suenaga wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Thank you for your comment.
>>>>>> I uploaded new webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.07/
>>>>>>
>>>>>> Serguei, please comment about this :-)
>>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2017-10-18 16:09 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> On 18/10/2017 4:34 PM, Yasumasa Suenaga wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>>> I don't think we need the extra fields, just ensure the
>>>>>>>>> existing ones
>>>>>>>>> can't
>>>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>>
>>>>>>>> I've added PerfMemory::is_useable() to check whether we can
>>>>>>>> access to
>>>>>>>> PerfMemory.
>>>>>>>> I think this webrev prevent to access to PerfMemory after
>>>>>>>> destroy() call.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.06/
>>>>>>>
>>>>>>> This:
>>>>>>>
>>>>>>> 90 void PerfMemory::initialize() {
>>>>>>> 91
>>>>>>> 92 if (_prologue != NULL)
>>>>>>> 93 // initialization already performed
>>>>>>> 94 return;
>>>>>>>
>>>>>>> shouldn't check _prologue, but is_initialized().
>>>>>>>
>>>>>>> 213 assert(is_useable(), "called before initialization");
>>>>>>>
>>>>>>> -> "called before init or after destroy"
>>>>>>>
>>>>>>> Could add a similar assert in PerfMemory::mark_updated().
>>>>>>>
>>>>>>> Let's see what Serguei thinks. :)
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> 2017-10-18 13:44 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>>> On 18/10/2017 2:27 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> 2017-10-18 12:55 GMT+09:00 David
>>>>>>>>>> Holmes<david.holmes at oracle.com>:
>>>>>>>>>>>
>>>>>>>>>>> On 18/10/2017 12:37 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>>> With your changes you no longer null out _prologue so the
>>>>>>>>>>>>> assertion
>>>>>>>>>>>>> would
>>>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory
>>>>>>>>>>>>> region!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Linux, PerfMemory::delete_memory_region() does not call
>>>>>>>>>>>> munmap()
>>>>>>>>>>>> for PerfMemory.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Perhaps not but there are still other actions that happen
>>>>>>>>>>> and the point
>>>>>>>>>>> is
>>>>>>>>>>> we should not be able to continue to use PerfMemory once it
>>>>>>>>>>> has been
>>>>>>>>>>> destroyed (even if the destruction is only logical).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I received same comment from Dmitry in the past, but we couldn't
>>>>>>>>>> decide how should we do.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-May/019728.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In that discussion, I uploaded another webrev which adds
>>>>>>>>>> other fields
>>>>>>>>>> for
>>>>>>>>>> JSnap.
>>>>>>>>>> Is it suitable?
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.02/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't think we need the extra fields, just ensure the
>>>>>>>>> existing ones
>>>>>>>>> can't
>>>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>>>
>>>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>>>> initialization?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> PerfMemory.java in jdk.hotspot.agent needs these field values.
>>>>>>>>>>>> `jhsdb jsnap --core` is failed if they are cleared.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm not familiar with these tools. When do we produce a core
>>>>>>>>>>> file after
>>>>>>>>>>> calling PerfMemory::destroy ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> PerfMemory::destroy() is called before aborting.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ah - right. I assume we need to close off the perfdata file
>>>>>>>>> before we
>>>>>>>>> abort.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----------------------
>>>>>>>>>> #0 perfMemory_exit ()
>>>>>>>>>> at
>>>>>>>>>>
>>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/runtime/perfMemory.cpp:80
>>>>>>>>>>
>>>>>>>>>> #1 0x00007f99b091c949 in os::shutdown ()
>>>>>>>>>> at
>>>>>>>>>>
>>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1483
>>>>>>>>>>
>>>>>>>>>> #2 0x00007f99b091c980 in os::abort (dump_core=<optimized out>)
>>>>>>>>>> at
>>>>>>>>>>
>>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1503
>>>>>>>>>>
>>>>>>>>>> #3 0x00007f99b0b689c3 in VMError::report_and_die (
>>>>>>>>>> this=this at entry=0x7ffcacf40b50)
>>>>>>>>>> at
>>>>>>>>>>
>>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
>>>>>>>>>>
>>>>>>>>>> #4 0x00007f99b0926f04 in JVM_handle_linux_signal
>>>>>>>>>> (sig=sig at entry=11,
>>>>>>>>>> info=info at entry=0x7ffcacf40df0,
>>>>>>>>>> ucVoid=ucVoid at entry=0x7ffcacf40cc0,
>>>>>>>>>> abort_if_unrecognized=abort_if_unrecognized at entry=1)
>>>>>>>>>> at
>>>>>>>>>>
>>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
>>>>>>>>>>
>>>>>>>>>> -----------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> But it seems to me that there are various checks of
>>>>>>>>>>>>> _prologue that should really be checking is_initialized()
>>>>>>>>>>>>> and/or
>>>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Should I change all assertions for _prologue?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Assertions and direct guards. Checking _prologue is a
>>>>>>>>>>> placeholder for
>>>>>>>>>>> the
>>>>>>>>>>> real check.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2017-10-18 10:53 GMT+09:00 David
>>>>>>>>>>>> Holmes<david.holmes at oracle.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> By chance we ran into this bug which I analysed yesterday:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8189390
>>>>>>>>>>>>>
>>>>>>>>>>>>> We hit the assertion:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Internal Error
>>>>>>>>>>>>> (/open/src/hotspot/share/runtime/perfMemory.cpp:216),
>>>>>>>>>>>>> pid=17874, tid=17875
>>>>>>>>>>>>> # assert(_prologue != __null) failed: called before
>>>>>>>>>>>>> initialization
>>>>>>>>>>>>> #
>>>>>>>>>>>>>
>>>>>>>>>>>>> which is misleading because it can fail if called before
>>>>>>>>>>>>> initialization,
>>>>>>>>>>>>> or
>>>>>>>>>>>>> after PerfMemory::destroy has been called.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With your changes you no longer null out _prologue so the
>>>>>>>>>>>>> assertion
>>>>>>>>>>>>> would
>>>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory
>>>>>>>>>>>>> region!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>>>> initialization? But it seems to me that there are various
>>>>>>>>>>>>> checks of
>>>>>>>>>>>>> _prologue that should really be checking is_initialized()
>>>>>>>>>>>>> and/or
>>>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16/10/2017 11:25 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2017/10/03 13:18, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I added gtest unit test case for this change in new webrev:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-09-27 0:01 GMT+09:00 Yasumasa
>>>>>>>>>>>>>>> Suenaga<yasuenag at gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I uploaded new webrev to be adapted to jdk10/hs:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.04/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2017/09/21 7:45, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2017/07/01 23:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2017/06/13 14:10, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I want to discuss about JDK-8151815: Could not parse
>>>>>>>>>>>>>>>>>>> core image
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> JSnap.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In last year, I found JSnap cannot parse coredump
>>>>>>>>>>>>>>>>>>> and I've sent
>>>>>>>>>>>>>>>>>>> review
>>>>>>>>>>>>>>>>>>> request for it as JDK-8151815. However it has not
>>>>>>>>>>>>>>>>>>> been reviewed
>>>>>>>>>>>>>>>>>>> yet
>>>>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We've discussed about safety implementation, but we
>>>>>>>>>>>>>>>>>>> could not
>>>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>> consensus.
>>>>>>>>>>>>>>>>>>> IMHO all SA tools should be handled java processes
>>>>>>>>>>>>>>>>>>> and core
>>>>>>>>>>>>>>>>>>> images,
>>>>>>>>>>>>>>>>>>> and PerfCounter value is useful. So I fix this issue.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I uploaded new webrev for this issue. I think this
>>>>>>>>>>>>>>>>>>> patch is
>>>>>>>>>>>>>>>>>>> safety
>>>>>>>>>>>>>>>>>>> because new flag PerfMemory::_destroyed guards
>>>>>>>>>>>>>>>>>>> double free, and
>>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>> members in PerfMemory is accessible (they are not
>>>>>>>>>>>>>>>>>>> munmap'ed)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can you cooperate?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-April/019480.html
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>
>>>
More information about the serviceability-dev
mailing list