[14] RFR 8235829: graal crashes with Zombie.java test

Wed Dec 18 18:33:08 UTC 2019

Hi Coleen,

Just wanted to confirm the webrev V2 version looks okay to me.
Sorry for replying on the wrong mailing thread.

Thanks,
Serguei

On 12/18/19 08:42, coleen.phillimore at oracle.com wrote:
>
>
> On 12/18/19 8:45 AM, David Holmes wrote:
>> Thanks for the additional info Coleen!
>>
>> Just to add a bit more to the initialization history. The 
>> ServiceThread is a generalization of the LowMemoryDetectorThread that 
>> was part of the management API, and so it was initialized in 
>> Management::initialize. When it turned into the ServiceThread - to 
>> process JVMTI deferred events in addition to the low-memory-detector 
>> events - the initialization placement remained the same. Then later 
>> the INCLUDE_MANAGEMENT guards were added (JDK-7189254, October 2012). 
>> Later still we started adding other items of work for the 
>> ServiceThread. The earliest was the AllocationContextService 
>> notification in September 2014 but as that no longer exists I can't 
>> tell if that was the first non-management related use. Then the 
>> StringTable use was added 18 months ago - which definitely was 
>> outside the realm of the management API. So that is when the 
>> MinimalVM was first "broken". So it is good that is fixed.
>>
>> With regard to the placement in the initialization order, my 
>> remaining concern was with JVMTI event processing that might happen 
>> via events generated very early in the init sequence. But you have 
>> now modified things so that we will only process events in the LIVE 
>> phase, which only activates after all the class library 
>> initialization is complete.
>>
>> So overall I'm no longer significantly concerned about the change to 
>> the initialization order as I think you have it all covered. Thanks 
>> for bearing with me and all the off-list discussion.
>
> Thank you for having this discussion with me and provoking me to 
> recheck the ServiceThread.   I think we can do further work to 
> future-proof initialization order but the design needs to be improved.
>
> Thanks for reviewing,
> Coleen
>>
>> Cheers,
>> David
>> -----
>>
>> On 18/12/2019 1:27 am, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 12/16/19 11:04 PM, David Holmes wrote:
>>>> Clarification ...
>>>>
>>>> On 17/12/2019 12:40 pm, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Short answer below.
>>>>>
>>>>> On 12/16/19 5:51 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> A quick initial response ...
>>>>>>
>>>>>> On 16/12/2019 11:26 pm, coleen.phillimore at oracle.com wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 12/16/19 8:04 AM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Start ServiceThread before compiler threads, and run 
>>>>>>>>> nmethod barriers for zgc before adding to the service thread 
>>>>>>>>> queue, or posting the events on the java thread queue.
>>>>>>>>
>>>>>>>> I can't comment on most of this but the earlier starting of the 
>>>>>>>> service thread has some concerns:
>>>>>>>>
>>>>>>>> - there is a lot of JDK level initialization which now will not 
>>>>>>>> have happened before the service thread is started and it is 
>>>>>>>> far from obvious that all possible initialization dependencies 
>>>>>>>> will be satisfied
>>>>>>>
>>>>>>> I agree that the order of initialization is very sensitive. From 
>>>>>>> the actions that the service thread does, the one that I found 
>>>>>>> was a problem was that events were posted before the LIVE phase 
>>>>>>> (see comment in has_events()), which could have happened with 
>>>>>>> the existing code, but the window for the race is a lot smaller. 
>>>>>>>   The other actions can be run if there's a GC before 
>>>>>>> initialization but would be a bug in the initialization code, 
>>>>>>> and I didn't find these bugs in all my testing. There are some 
>>>>>>> ordering dependencies that do have odd side effects (between the 
>>>>>>> compiler thread startup and initialization jsr292 classes) which 
>>>>>>> have comments.  This patch doesn't touch those.
>>>>>>>
>>>>>>>>
>>>>>>>> - current starting of the service thread in 
>>>>>>>> Management::initialize is guarded by "#if INCLUDE_MANAGEMENT", 
>>>>>>>> but now you are starting the service thread unconditionally for 
>>>>>>>> all builds. Hmm just saw your latest comment to the bug report 
>>>>>>>> - so the service thread is now (for quite some time?) being 
>>>>>>>> used for other than management tasks and so should always be 
>>>>>>>> present even if INCLUDE_MANAGEMENT is not enabled. Is that 
>>>>>>>> sufficient or are there likely to be other changes needed to 
>>>>>>>> actually ensure that all works correctly  e.g. any code the 
>>>>>>>> service thread executes that is only defined for 
>>>>>>>> INCLUDE_MANAGEMENT will need to be compiled out explicitly.
>>>>>>>>
>>>>>>>
>>>>>>> I asked Jie offline to check the minimal build.  I don't think 
>>>>>>> there are other INCLUDE_MANAGEMENT actions in the service thread 
>>>>>>> and I'm not sure why it was initialized there in the first 
>>>>>>> place. The minimal vm would have been broken ie. hashtables 
>>>>>>> would not have been cleaned up, etc, but I'm not sure how well 
>>>>>>> that is tested or if one would notice.
>>>>>>>> - the service thread and the notification thread are (were?) 
>>>>>>>> closely related but now started at completely different times
>>>>>>>
>>>>>>> The notification thread is limited to "services" so it makes 
>>>>>>> sense where it is.  The ServiceThread does lots of other 
>>>>>>> things.  Maybe it needs renaming in 15.
>>>>>>>>
>>>>>>>> The bug report states the problem as:
>>>>>>>>
>>>>>>>> "The graal crash is because compiled_method_load events are 
>>>>>>>> added to the ServiceThread's deferred event queue before the 
>>>>>>>> ServiceThread is created so are not walked to keep them from 
>>>>>>>> being zombied."
>>>>>>>>
>>>>>>>> so why isn't the solution to ensure the deferred event queue is 
>>>>>>>> walked? I'm not clear how starting the service thread relates 
>>>>>>>> to walking the queue.
>>>>>>>>
>>>>>>>
>>>>>>> The service thread is responsible for walking the deferred event 
>>>>>>> queue.   See ServiceThread::oops_do/nmethods_do.   The design 
>>>>>>> could be changed to have some global walk somewhere of this 
>>>>>>> queue, but essentially this queue is processed by the service 
>>>>>>> thread.
>>>>>>
>>>>>> Sorry I don't follow. I thought "oops_do" and friends are for the 
>>>>>> GC threads and/or VMThread to call to process oops when GC 
>>>>>> updates them.
>>>>>
>>>>> The oops_do and nmethods_do() can be called by a thread walk in 
>>>>> handshakes (by the sweeper thread) and by parallel GC thread 
>>>>> walks. There isn't a single entry to do the thread-specific 
>>>>> closures that we need to do for these deferred event queues.   I 
>>>>> tried a version that walked the queues with a static call but 
>>>>> missed some places where it would be needed to make this call 
>>>>> (didn't work).  Keeping this associated with the ServiceThread 
>>>>> simplifies a lot.
>>>>
>>>> Just to clarify that further, the thread walk requires the thread 
>>>> appears in ALL_JAVA_THREADS but that only happens after the 
>>>> ServiceThread has been started. So in essence we don't really need 
>>>> the ServiceThread to have commenced execution earlier, but we need 
>>>> it to have been created. Those two steps are combined in practice.
>>>
>>> Yes.  Then the ServiceThread waits on the Service_lock until 
>>> notified by these events:
>>>
>>>        while (((sensors_changed = (!UseNotificationThread && 
>>> LowMemoryDetector::has_pending_requests())) |
>>>                (has_jvmti_events = _jvmti_service_queue.has_events()) |
>>>                (has_gc_notification_event = (!UseNotificationThread 
>>> && GCNotifier::has_event())) |
>>>                (has_dcmd_notification_event = 
>>> (!UseNotificationThread && 
>>> DCmdFactory::has_pending_jmx_notification())) |
>>>                (stringtable_work = StringTable::has_work()) |
>>>                (symboltable_work = SymbolTable::has_work()) |
>>>                (resolved_method_table_work = 
>>> ResolvedMethodTable::has_work()) |
>>>                (thread_id_table_work = ThreadIdTable::has_work()) |
>>>                (protection_domain_table_work = 
>>> SystemDictionary::pd_cache_table()->has_work()) |
>>>                (oopstorage_work = 
>>> OopStorage::has_cleanup_work_and_reset())
>>>               ) == 0) {
>>>
>>> The first, third and fourth events are from management.cpp events 
>>> that were initialized after the ServiceThread was started.
>>> The second event I have changed, to wait until LIVE phase to return 
>>> true.
>>> The stringtable, symboltable, resolved_method_table, thread_id and 
>>> pd table have static _has_work variables initialized to false.
>>> The oopstorage_work has similar, but may update a time-based counter 
>>> a bit earlier with the service thread starting earlier. I think this 
>>> is harmless.
>>>
>>> It is possible that after the service thread starts and before the 
>>> compiler thread starts, there could be a GC that notifies the 
>>> stringtable to clean up.  This seems like a good thing that the GC 
>>> would clean up these tables with this order.  I ran the tier4 graal 
>>> tests and there were no failures.
>>>
>>> Thanks,
>>> Coleen
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> thanks,
>>>>> Coleen
>>>>>
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> I had an additional change to make the queue non-static but want 
>>>>>>> to limit the change at this point.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> See bug for description of the problems found with the new 
>>>>>>>>> Zombie.java test.
>>>>>>>>>
>>>>>>>>> open webrev at 
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>>>>>>>>
>>>>>>>>> Ran tier1 all platforms, and tier2-8 testing, as well as 
>>>>>>>>> rerunning original test failure from bug 
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>
>>>>>
>>>
>