[14] RFR 8235829: graal crashes with Zombie.java test

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Tue Dec 17 02:40:57 UTC 2019


Short answer below.

On 12/16/19 5:51 PM, David Holmes wrote:
> Hi Coleen,
>
> A quick initial response ...
>
> On 16/12/2019 11:26 pm, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 12/16/19 8:04 AM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>>>> Summary: Start ServiceThread before compiler threads, and run 
>>>> nmethod barriers for zgc before adding to the service thread queue, 
>>>> or posting the events on the java thread queue.
>>>
>>> I can't comment on most of this but the earlier starting of the 
>>> service thread has some concerns:
>>>
>>> - there is a lot of JDK level initialization which now will not have 
>>> happened before the service thread is started and it is far from 
>>> obvious that all possible initialization dependencies will be satisfied
>>
>> I agree that the order of initialization is very sensitive. From the 
>> actions that the service thread does, the one that I found was a 
>> problem was that events were posted before the LIVE phase (see 
>> comment in has_events()), which could have happened with the existing 
>> code, but the window for the race is a lot smaller.   The other 
>> actions can be run if there's a GC before initialization but would be 
>> a bug in the initialization code, and I didn't find these bugs in all 
>> my testing. There are some ordering dependencies that do have odd 
>> side effects (between the compiler thread startup and initialization 
>> jsr292 classes) which have comments.  This patch doesn't touch those.
>>
>>>
>>> - current starting of the service thread in Management::initialize 
>>> is guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the 
>>> service thread unconditionally for all builds. Hmm just saw your 
>>> latest comment to the bug report - so the service thread is now (for 
>>> quite some time?) being used for other than management tasks and so 
>>> should always be present even if INCLUDE_MANAGEMENT is not enabled. 
>>> Is that sufficient or are there likely to be other changes needed to 
>>> actually ensure that all works correctly  e.g. any code the service 
>>> thread executes that is only defined for INCLUDE_MANAGEMENT will 
>>> need to be compiled out explicitly.
>>>
>>
>> I asked Jie offline to check the minimal build.  I don't think there 
>> are other INCLUDE_MANAGEMENT actions in the service thread and I'm 
>> not sure why it was initialized there in the first place.  The 
>> minimal vm would have been broken ie. hashtables would not have been 
>> cleaned up, etc, but I'm not sure how well that is tested or if one 
>> would notice.
>>> - the service thread and the notification thread are (were?) closely 
>>> related but now started at completely different times
>>
>> The notification thread is limited to "services" so it makes sense 
>> where it is.  The ServiceThread does lots of other things.  Maybe it 
>> needs renaming in 15.
>>>
>>> The bug report states the problem as:
>>>
>>> "The graal crash is because compiled_method_load events are added to 
>>> the ServiceThread's deferred event queue before the ServiceThread is 
>>> created so are not walked to keep them from being zombied."
>>>
>>> so why isn't the solution to ensure the deferred event queue is 
>>> walked? I'm not clear how starting the service thread relates to 
>>> walking the queue.
>>>
>>
>> The service thread is responsible for walking the deferred event 
>> queue.   See ServiceThread::oops_do/nmethods_do.   The design could 
>> be changed to have some global walk somewhere of this queue, but 
>> essentially this queue is processed by the service thread.
>
> Sorry I don't follow. I thought "oops_do" and friends are for the GC 
> threads and/or VMThread to call to process oops when GC updates them.

The oops_do and nmethods_do() can be called by a thread walk in 
handshakes (by the sweeper thread) and by parallel GC thread walks. 
There isn't a single entry to do the thread-specific closures that we 
need to do for these deferred event queues.   I tried a version that 
walked the queues with a static call but missed some places where it 
would be needed to make this call (didn't work).  Keeping this 
associated with the ServiceThread simplifies a lot.

thanks,
Coleen

>
> David
> -----
>
>> I had an additional change to make the queue non-static but want to 
>> limit the change at this point.
>>
>> Thanks,
>> Coleen
>>> Thanks,
>>> David
>>>
>>>> See bug for description of the problems found with the new 
>>>> Zombie.java test.
>>>>
>>>> open webrev at 
>>>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>>>
>>>> Ran tier1 all platforms, and tier2-8 testing, as well as rerunning 
>>>> original test failure from bug 
>>>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>>>
>>>> Thanks,
>>>> Coleen
>>



More information about the serviceability-dev mailing list