PING: RFR: JDK-8151815: Could not parse core image with JSnap.

David Holmes david.holmes at oracle.com
Wed Oct 18 12:34:12 UTC 2017


Just to clarify ...

On 18/10/2017 10:28 PM, David Holmes wrote:
> On 18/10/2017 8:26 PM, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> Thank you for jumping to this review and helping Yasumasa to sort it out!
>> I've just discovered that this issue was already on the table for 
>> several months without a significant progress.
>>
>>
>> On 10/18/17 02:48, David Holmes wrote:
>>> Hi Serguei
>>>
>>> On 18/10/2017 7:25 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi Yasumasa,
>>>>
>>>> Sorry for a quite late participation.
>>>>
>>>> I looked at the previous webrevs and think that this one is much 
>>>> better.
>>>>
>>>> Some concern is if we need any kind of synchronization here, e.g. CAS.
>>>> But it depends on the PerfMemory class usage.
>>>>
>>>> Should we make the static variables '_initialized' and '_destroyed' 
>>>> volatile?
>>>
>>> For good measure - yes.
>>>
>>>> Also, the '_initialized' is set to 1 with:
>>>>     159    OrderAccess::release_store(&_initialized, 1);
>>>>
>>>> Should we do the same to set the '_destroyed'?:
>>>> 200 _destroyed = true;
>>>
>>> There is a benign initialization race but we need the release_store 
>>> to ensure all the data fields can be read if _initialized is seen as 
>>> true. But what is missing is a load_acquire() in is_initialized() to 
>>> ensure we synchronize with that store!
>>
>> Yes, I noticed that the load_acquire() is missed. :|
>>
>>>
>>> There is also a potential for a destruction race (if multiple aborts 
>>> happens concurrently in different threads) but that also seems 
>>> benign. In this case there is no data being set so the store to 
>>> _destroyed does not need to be a release_store.
>>
>> I'm not convinced yet this is benign as the PerfMemory::destroy() has 
>> this call:
>>    197 delete_memory_region();
> 
> Yes though most of its work ends up being no-ops.
> 
>>
>> Now, I started thinking about the asserts that call the is_useable().
>> Should they be returns instead?
> 
> I think this is a somewhat confused chunk of code. It's only 
> fractionally thread-safe yet once in use could be in use concurrently 
> with an aborting thread that calls destroy(). I don't think there is any 
> simple fix for this. If we're in the process of crashing does it really 
> matter if we trigger a secondary crash due to this?

It doesn't matter if we do:

assert(is_usable(),...);
// continue

or

if (!is_usable()) return;
// continue

because as soon as we have checked is_usable() and abort happening in 
another thread may have changed that by calling destroy.

This code is basically broken if we hit an abort path instead of a 
normal VM shutdown.

David
-----

> The problems with this code go way beyond what Yasumasa is trying to 
> address with the JSnap problem and I would not want to put it back on 
> him to try and come up with an overall solution.
> 
>> Then the is_destroyed() would better to have the load_acquire().
> 
> You could add a load_acquire and do the store_release. It certainly 
> would not hurt, but I don't think it would actually benefit anything 
> either.
> 
> Cheers,
> David
> 
>> Just interested to know what do you think on this.
>>
>> Thanks,
>> Serguei
>>
>>>
>>> Cheers,
>>> David
>>>
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 10/18/17 00:39, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>> Thank you for your comment.
>>>>> I uploaded new webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.07/
>>>>>
>>>>> Serguei, please comment about this :-)
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>
>>>>> 2017-10-18 16:09 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> On 18/10/2017 4:34 PM, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> I don't think we need the extra fields, just ensure the existing 
>>>>>>>> ones
>>>>>>>> can't
>>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>
>>>>>>> I've added PerfMemory::is_useable() to check whether we can 
>>>>>>> access to
>>>>>>> PerfMemory.
>>>>>>> I think this webrev prevent to access to PerfMemory after 
>>>>>>> destroy() call.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.06/
>>>>>>
>>>>>> This:
>>>>>>
>>>>>>    90 void PerfMemory::initialize() {
>>>>>>    91
>>>>>>    92   if (_prologue != NULL)
>>>>>>    93     // initialization already performed
>>>>>>    94     return;
>>>>>>
>>>>>> shouldn't check _prologue, but is_initialized().
>>>>>>
>>>>>>   213   assert(is_useable(), "called before initialization");
>>>>>>
>>>>>> -> "called before init or after destroy"
>>>>>>
>>>>>> Could add a similar assert in PerfMemory::mark_updated().
>>>>>>
>>>>>> Let's see what Serguei thinks. :)
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> 2017-10-18 13:44 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>> On 18/10/2017 2:27 PM, Yasumasa Suenaga wrote:
>>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> 2017-10-18 12:55 GMT+09:00 David Holmes<david.holmes at oracle.com>:
>>>>>>>>>>
>>>>>>>>>> On 18/10/2017 12:37 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>>> With your changes you no longer null out _prologue so the 
>>>>>>>>>>>> assertion
>>>>>>>>>>>> would
>>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory 
>>>>>>>>>>>> region!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Linux, PerfMemory::delete_memory_region() does not call 
>>>>>>>>>>> munmap()
>>>>>>>>>>> for PerfMemory.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Perhaps not but there are still other actions that happen and 
>>>>>>>>>> the point
>>>>>>>>>> is
>>>>>>>>>> we should not be able to continue to use PerfMemory once it 
>>>>>>>>>> has been
>>>>>>>>>> destroyed (even if the destruction is only logical).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I received same comment from Dmitry in the past, but we couldn't
>>>>>>>>> decide how should we do.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-May/019728.html 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In that discussion, I uploaded another webrev which adds other 
>>>>>>>>> fields
>>>>>>>>> for
>>>>>>>>> JSnap.
>>>>>>>>> Is it suitable?
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.02/
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't think we need the extra fields, just ensure the existing 
>>>>>>>> ones
>>>>>>>> can't
>>>>>>>> be accessed (other than by the tools) after destroy is called.
>>>>>>>>
>>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>>> initialization?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> PerfMemory.java in jdk.hotspot.agent needs these field values.
>>>>>>>>>>> `jhsdb jsnap --core` is failed if they are cleared.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm not familiar with these tools. When do we produce a core 
>>>>>>>>>> file after
>>>>>>>>>> calling PerfMemory::destroy ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> PerfMemory::destroy() is called before aborting.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ah - right. I assume we need to close off the perfdata file 
>>>>>>>> before we
>>>>>>>> abort.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----------------------
>>>>>>>>> #0  perfMemory_exit ()
>>>>>>>>>        at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/runtime/perfMemory.cpp:80 
>>>>>>>>>
>>>>>>>>> #1  0x00007f99b091c949 in os::shutdown ()
>>>>>>>>>        at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1483 
>>>>>>>>>
>>>>>>>>> #2  0x00007f99b091c980 in os::abort (dump_core=<optimized out>)
>>>>>>>>>        at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1503 
>>>>>>>>>
>>>>>>>>> #3  0x00007f99b0b689c3 in VMError::report_and_die (
>>>>>>>>>        this=this at entry=0x7ffcacf40b50)
>>>>>>>>>        at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060 
>>>>>>>>>
>>>>>>>>> #4  0x00007f99b0926f04 in JVM_handle_linux_signal 
>>>>>>>>> (sig=sig at entry=11,
>>>>>>>>>        info=info at entry=0x7ffcacf40df0,
>>>>>>>>> ucVoid=ucVoid at entry=0x7ffcacf40cc0,
>>>>>>>>> abort_if_unrecognized=abort_if_unrecognized at entry=1)
>>>>>>>>>        at
>>>>>>>>>
>>>>>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.144-7.b01.fc26.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 
>>>>>>>>>
>>>>>>>>> -----------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> But it seems to me that there are various checks of
>>>>>>>>>>>> _prologue that should really be checking is_initialized() 
>>>>>>>>>>>> and/or
>>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Should I change all assertions for _prologue?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Assertions and direct guards. Checking _prologue is a 
>>>>>>>>>> placeholder for
>>>>>>>>>> the
>>>>>>>>>> real check.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2017-10-18 10:53 GMT+09:00 David 
>>>>>>>>>>> Holmes<david.holmes at oracle.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>> By chance we ran into this bug which I analysed yesterday:
>>>>>>>>>>>>
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8189390
>>>>>>>>>>>>
>>>>>>>>>>>> We hit the assertion:
>>>>>>>>>>>>
>>>>>>>>>>>> #  Internal Error
>>>>>>>>>>>> (/open/src/hotspot/share/runtime/perfMemory.cpp:216),
>>>>>>>>>>>> pid=17874, tid=17875
>>>>>>>>>>>> #  assert(_prologue != __null) failed: called before 
>>>>>>>>>>>> initialization
>>>>>>>>>>>> #
>>>>>>>>>>>>
>>>>>>>>>>>> which is misleading because it can fail if called before
>>>>>>>>>>>> initialization,
>>>>>>>>>>>> or
>>>>>>>>>>>> after PerfMemory::destroy has been called.
>>>>>>>>>>>>
>>>>>>>>>>>> With your changes you no longer null out _prologue so the 
>>>>>>>>>>>> assertion
>>>>>>>>>>>> would
>>>>>>>>>>>> now not fail and we'd proceed to access the deleted memory 
>>>>>>>>>>>> region!
>>>>>>>>>>>>
>>>>>>>>>>>> I'm unclear why you no longer clear all the fields set during
>>>>>>>>>>>> initialization? But it seems to me that there are various 
>>>>>>>>>>>> checks of
>>>>>>>>>>>> _prologue that should really be checking is_initialized() 
>>>>>>>>>>>> and/or
>>>>>>>>>>>> is_destroyed() as a guard.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 16/10/2017 11:25 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2017/10/03 13:18, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I added gtest unit test case for this change in new webrev:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.05/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-09-27 0:01 GMT+09:00 Yasumasa 
>>>>>>>>>>>>>> Suenaga<yasuenag at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I uploaded new webrev to be adapted to jdk10/hs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.04/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2017/09/21 7:45, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/ 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2017/07/01 23:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PING:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Have you checked this issue?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2017/06/13 14:10, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I want to discuss about JDK-8151815: Could not parse 
>>>>>>>>>>>>>>>>>> core image
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> JSnap.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In last year, I found JSnap cannot parse coredump and 
>>>>>>>>>>>>>>>>>> I've sent
>>>>>>>>>>>>>>>>>> review
>>>>>>>>>>>>>>>>>> request for it as JDK-8151815. However it has not been 
>>>>>>>>>>>>>>>>>> reviewed
>>>>>>>>>>>>>>>>>> yet
>>>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We've discussed about safety implementation, but we 
>>>>>>>>>>>>>>>>>> could not
>>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>> consensus.
>>>>>>>>>>>>>>>>>> IMHO all SA tools should be handled java processes and 
>>>>>>>>>>>>>>>>>> core
>>>>>>>>>>>>>>>>>> images,
>>>>>>>>>>>>>>>>>> and PerfCounter value is useful. So I fix this issue.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I uploaded new webrev for this issue. I think this 
>>>>>>>>>>>>>>>>>> patch is
>>>>>>>>>>>>>>>>>> safety
>>>>>>>>>>>>>>>>>> because new flag PerfMemory::_destroyed guards double 
>>>>>>>>>>>>>>>>>> free, and
>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>> members in PerfMemory is accessible (they are not 
>>>>>>>>>>>>>>>>>> munmap'ed)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8151815/webrev.03/ 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you cooperate?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-April/019480.html 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>
>>


More information about the serviceability-dev mailing list