[14] RFR 8235829: graal crashes with Zombie.java test
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Wed Dec 18 18:33:08 UTC 2019
Hi Coleen,
Just wanted to confirm the webrev V2 version looks okay to me.
Sorry for replying on the wrong mailing thread.
Thanks,
Serguei
On 12/18/19 08:42, coleen.phillimore at oracle.com wrote:
>
>
> On 12/18/19 8:45 AM, David Holmes wrote:
>> Thanks for the additional info Coleen!
>>
>> Just to add a bit more to the initialization history. The
>> ServiceThread is a generalization of the LowMemoryDetectorThread that
>> was part of the management API, and so it was initialized in
>> Management::initialize. When it turned into the ServiceThread - to
>> process JVMTI deferred events in addition to the low-memory-detector
>> events - the initialization placement remained the same. Then later
>> the INCLUDE_MANAGEMENT guards were added (JDK-7189254, October 2012).
>> Later still we started adding other items of work for the
>> ServiceThread. The earliest was the AllocationContextService
>> notification in September 2014 but as that no longer exists I can't
>> tell if that was the first non-management related use. Then the
>> StringTable use was added 18 months ago - which definitely was
>> outside the realm of the management API. So that is when the
>> MinimalVM was first "broken". So it is good that is fixed.
>>
>> With regard to the placement in the initialization order, my
>> remaining concern was with JVMTI event processing that might happen
>> via events generated very early in the init sequence. But you have
>> now modified things so that we will only process events in the LIVE
>> phase, which only activates after all the class library
>> initialization is complete.
>>
>> So overall I'm no longer significantly concerned about the change to
>> the initialization order as I think you have it all covered. Thanks
>> for bearing with me and all the off-list discussion.
>
> Thank you for having this discussion with me and provoking me to
> recheck the ServiceThread. I think we can do further work to
> future-proof initialization order but the design needs to be improved.
>
> Thanks for reviewing,
> Coleen
>>
>> Cheers,
>> David
>> -----
>>
>> On 18/12/2019 1:27 am, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 12/16/19 11:04 PM, David Holmes wrote:
>>>> Clarification ...
>>>>
>>>> On 17/12/2019 12:40 pm, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Short answer below.
>>>>>
>>>>> On 12/16/19 5:51 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> A quick initial response ...
>>>>>>
>>>>>> On 16/12/2019 11:26 pm, coleen.phillimore at oracle.com wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 12/16/19 8:04 AM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Start ServiceThread before compiler threads, and run
>>>>>>>>> nmethod barriers for zgc before adding to the service thread
>>>>>>>>> queue, or posting the events on the java thread queue.
>>>>>>>>
>>>>>>>> I can't comment on most of this but the earlier starting of the
>>>>>>>> service thread has some concerns:
>>>>>>>>
>>>>>>>> - there is a lot of JDK level initialization which now will not
>>>>>>>> have happened before the service thread is started and it is
>>>>>>>> far from obvious that all possible initialization dependencies
>>>>>>>> will be satisfied
>>>>>>>
>>>>>>> I agree that the order of initialization is very sensitive. From
>>>>>>> the actions that the service thread does, the one that I found
>>>>>>> was a problem was that events were posted before the LIVE phase
>>>>>>> (see comment in has_events()), which could have happened with
>>>>>>> the existing code, but the window for the race is a lot smaller.
>>>>>>> The other actions can be run if there's a GC before
>>>>>>> initialization but would be a bug in the initialization code,
>>>>>>> and I didn't find these bugs in all my testing. There are some
>>>>>>> ordering dependencies that do have odd side effects (between the
>>>>>>> compiler thread startup and initialization jsr292 classes) which
>>>>>>> have comments. This patch doesn't touch those.
>>>>>>>
>>>>>>>>
>>>>>>>> - current starting of the service thread in
>>>>>>>> Management::initialize is guarded by "#if INCLUDE_MANAGEMENT",
>>>>>>>> but now you are starting the service thread unconditionally for
>>>>>>>> all builds. Hmm just saw your latest comment to the bug report
>>>>>>>> - so the service thread is now (for quite some time?) being
>>>>>>>> used for other than management tasks and so should always be
>>>>>>>> present even if INCLUDE_MANAGEMENT is not enabled. Is that
>>>>>>>> sufficient or are there likely to be other changes needed to
>>>>>>>> actually ensure that all works correctly e.g. any code the
>>>>>>>> service thread executes that is only defined for
>>>>>>>> INCLUDE_MANAGEMENT will need to be compiled out explicitly.
>>>>>>>>
>>>>>>>
>>>>>>> I asked Jie offline to check the minimal build. I don't think
>>>>>>> there are other INCLUDE_MANAGEMENT actions in the service thread
>>>>>>> and I'm not sure why it was initialized there in the first
>>>>>>> place. The minimal vm would have been broken ie. hashtables
>>>>>>> would not have been cleaned up, etc, but I'm not sure how well
>>>>>>> that is tested or if one would notice.
>>>>>>>> - the service thread and the notification thread are (were?)
>>>>>>>> closely related but now started at completely different times
>>>>>>>
>>>>>>> The notification thread is limited to "services" so it makes
>>>>>>> sense where it is. The ServiceThread does lots of other
>>>>>>> things. Maybe it needs renaming in 15.
>>>>>>>>
>>>>>>>> The bug report states the problem as:
>>>>>>>>
>>>>>>>> "The graal crash is because compiled_method_load events are
>>>>>>>> added to the ServiceThread's deferred event queue before the
>>>>>>>> ServiceThread is created so are not walked to keep them from
>>>>>>>> being zombied."
>>>>>>>>
>>>>>>>> so why isn't the solution to ensure the deferred event queue is
>>>>>>>> walked? I'm not clear how starting the service thread relates
>>>>>>>> to walking the queue.
>>>>>>>>
>>>>>>>
>>>>>>> The service thread is responsible for walking the deferred event
>>>>>>> queue. See ServiceThread::oops_do/nmethods_do. The design
>>>>>>> could be changed to have some global walk somewhere of this
>>>>>>> queue, but essentially this queue is processed by the service
>>>>>>> thread.
>>>>>>
>>>>>> Sorry I don't follow. I thought "oops_do" and friends are for the
>>>>>> GC threads and/or VMThread to call to process oops when GC
>>>>>> updates them.
>>>>>
>>>>> The oops_do and nmethods_do() can be called by a thread walk in
>>>>> handshakes (by the sweeper thread) and by parallel GC thread
>>>>> walks. There isn't a single entry to do the thread-specific
>>>>> closures that we need to do for these deferred event queues. I
>>>>> tried a version that walked the queues with a static call but
>>>>> missed some places where it would be needed to make this call
>>>>> (didn't work). Keeping this associated with the ServiceThread
>>>>> simplifies a lot.
>>>>
>>>> Just to clarify that further, the thread walk requires the thread
>>>> appears in ALL_JAVA_THREADS but that only happens after the
>>>> ServiceThread has been started. So in essence we don't really need
>>>> the ServiceThread to have commenced execution earlier, but we need
>>>> it to have been created. Those two steps are combined in practice.
>>>
>>> Yes. Then the ServiceThread waits on the Service_lock until
>>> notified by these events:
>>>
>>> while (((sensors_changed = (!UseNotificationThread &&
>>> LowMemoryDetector::has_pending_requests())) |
>>> (has_jvmti_events = _jvmti_service_queue.has_events()) |
>>> (has_gc_notification_event = (!UseNotificationThread
>>> && GCNotifier::has_event())) |
>>> (has_dcmd_notification_event =
>>> (!UseNotificationThread &&
>>> DCmdFactory::has_pending_jmx_notification())) |
>>> (stringtable_work = StringTable::has_work()) |
>>> (symboltable_work = SymbolTable::has_work()) |
>>> (resolved_method_table_work =
>>> ResolvedMethodTable::has_work()) |
>>> (thread_id_table_work = ThreadIdTable::has_work()) |
>>> (protection_domain_table_work =
>>> SystemDictionary::pd_cache_table()->has_work()) |
>>> (oopstorage_work =
>>> OopStorage::has_cleanup_work_and_reset())
>>> ) == 0) {
>>>
>>> The first, third and fourth events are from management.cpp events
>>> that were initialized after the ServiceThread was started.
>>> The second event I have changed, to wait until LIVE phase to return
>>> true.
>>> The stringtable, symboltable, resolved_method_table, thread_id and
>>> pd table have static _has_work variables initialized to false.
>>> The oopstorage_work has similar, but may update a time-based counter
>>> a bit earlier with the service thread starting earlier. I think this
>>> is harmless.
>>>
>>> It is possible that after the service thread starts and before the
>>> compiler thread starts, there could be a GC that notifies the
>>> stringtable to clean up. This seems like a good thing that the GC
>>> would clean up these tables with this order. I ran the tier4 graal
>>> tests and there were no failures.
>>>
>>> Thanks,
>>> Coleen
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> thanks,
>>>>> Coleen
>>>>>
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> I had an additional change to make the queue non-static but want
>>>>>>> to limit the change at this point.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> See bug for description of the problems found with the new
>>>>>>>>> Zombie.java test.
>>>>>>>>>
>>>>>>>>> open webrev at
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>>>>>>>>
>>>>>>>>> Ran tier1 all platforms, and tier2-8 testing, as well as
>>>>>>>>> rerunning original test failure from bug
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>
>>>>>
>>>
>
More information about the serviceability-dev
mailing list