[14] RFR 8235829: graal crashes with Zombie.java test

David Holmes david.holmes at oracle.com
Mon Dec 16 22:51:00 UTC 2019


Hi Coleen,

A quick initial response ...

On 16/12/2019 11:26 pm, coleen.phillimore at oracle.com wrote:
> 
> 
> On 12/16/19 8:04 AM, David Holmes wrote:
>> Hi Coleen,
>>
>> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>>> Summary: Start ServiceThread before compiler threads, and run nmethod 
>>> barriers for zgc before adding to the service thread queue, or 
>>> posting the events on the java thread queue.
>>
>> I can't comment on most of this but the earlier starting of the 
>> service thread has some concerns:
>>
>> - there is a lot of JDK level initialization which now will not have 
>> happened before the service thread is started and it is far from 
>> obvious that all possible initialization dependencies will be satisfied
> 
> I agree that the order of initialization is very sensitive.  From the 
> actions that the service thread does, the one that I found was a problem 
> was that events were posted before the LIVE phase (see comment in 
> has_events()), which could have happened with the existing code, but the 
> window for the race is a lot smaller.   The other actions can be run if 
> there's a GC before initialization but would be a bug in the 
> initialization code, and I didn't find these bugs in all my testing. 
> There are some ordering dependencies that do have odd side effects 
> (between the compiler thread startup and initialization jsr292 classes) 
> which have comments.  This patch doesn't touch those.
> 
>>
>> - current starting of the service thread in Management::initialize is 
>> guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the 
>> service thread unconditionally for all builds. Hmm just saw your 
>> latest comment to the bug report - so the service thread is now (for 
>> quite some time?) being used for other than management tasks and so 
>> should always be present even if INCLUDE_MANAGEMENT is not enabled. Is 
>> that sufficient or are there likely to be other changes needed to 
>> actually ensure that all works correctly  e.g. any code the service 
>> thread executes that is only defined for INCLUDE_MANAGEMENT will need 
>> to be compiled out explicitly.
>>
> 
> I asked Jie offline to check the minimal build.  I don't think there are 
> other INCLUDE_MANAGEMENT actions in the service thread and I'm not sure 
> why it was initialized there in the first place.  The minimal vm would 
> have been broken ie. hashtables would not have been cleaned up, etc, but 
> I'm not sure how well that is tested or if one would notice.
>> - the service thread and the notification thread are (were?) closely 
>> related but now started at completely different times
> 
> The notification thread is limited to "services" so it makes sense where 
> it is.  The ServiceThread does lots of other things.  Maybe it needs 
> renaming in 15.
>>
>> The bug report states the problem as:
>>
>> "The graal crash is because compiled_method_load events are added to 
>> the ServiceThread's deferred event queue before the ServiceThread is 
>> created so are not walked to keep them from being zombied."
>>
>> so why isn't the solution to ensure the deferred event queue is 
>> walked? I'm not clear how starting the service thread relates to 
>> walking the queue.
>>
> 
> The service thread is responsible for walking the deferred event 
> queue.   See ServiceThread::oops_do/nmethods_do.   The design could be 
> changed to have some global walk somewhere of this queue, but 
> essentially this queue is processed by the service thread.

Sorry I don't follow. I thought "oops_do" and friends are for the GC 
threads and/or VMThread to call to process oops when GC updates them.

David
-----

> I had an additional change to make the queue non-static but want to 
> limit the change at this point.
> 
> Thanks,
> Coleen
>> Thanks,
>> David
>>
>>> See bug for description of the problems found with the new 
>>> Zombie.java test.
>>>
>>> open webrev at 
>>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>>
>>> Ran tier1 all platforms, and tier2-8 testing, as well as rerunning 
>>> original test failure from bug 
>>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>>
>>> Thanks,
>>> Coleen
> 


More information about the serviceability-dev mailing list