[14] RFR 8235829: graal crashes with Zombie.java test

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Dec 16 13:26:58 UTC 2019



On 12/16/19 8:04 AM, David Holmes wrote:
> Hi Coleen,
>
> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>> Summary: Start ServiceThread before compiler threads, and run nmethod 
>> barriers for zgc before adding to the service thread queue, or 
>> posting the events on the java thread queue.
>
> I can't comment on most of this but the earlier starting of the 
> service thread has some concerns:
>
> - there is a lot of JDK level initialization which now will not have 
> happened before the service thread is started and it is far from 
> obvious that all possible initialization dependencies will be satisfied

I agree that the order of initialization is very sensitive.  From the 
actions that the service thread does, the one that I found was a problem 
was that events were posted before the LIVE phase (see comment in 
has_events()), which could have happened with the existing code, but the 
window for the race is a lot smaller.   The other actions can be run if 
there's a GC before initialization but would be a bug in the 
initialization code, and I didn't find these bugs in all my testing.  
There are some ordering dependencies that do have odd side effects 
(between the compiler thread startup and initialization jsr292 classes) 
which have comments.  This patch doesn't touch those.

>
> - current starting of the service thread in Management::initialize is 
> guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the 
> service thread unconditionally for all builds. Hmm just saw your 
> latest comment to the bug report - so the service thread is now (for 
> quite some time?) being used for other than management tasks and so 
> should always be present even if INCLUDE_MANAGEMENT is not enabled. Is 
> that sufficient or are there likely to be other changes needed to 
> actually ensure that all works correctly  e.g. any code the service 
> thread executes that is only defined for INCLUDE_MANAGEMENT will need 
> to be compiled out explicitly.
>

I asked Jie offline to check the minimal build.  I don't think there are 
other INCLUDE_MANAGEMENT actions in the service thread and I'm not sure 
why it was initialized there in the first place.  The minimal vm would 
have been broken ie. hashtables would not have been cleaned up, etc, but 
I'm not sure how well that is tested or if one would notice.
> - the service thread and the notification thread are (were?) closely 
> related but now started at completely different times

The notification thread is limited to "services" so it makes sense where 
it is.  The ServiceThread does lots of other things.  Maybe it needs 
renaming in 15.
>
> The bug report states the problem as:
>
> "The graal crash is because compiled_method_load events are added to 
> the ServiceThread's deferred event queue before the ServiceThread is 
> created so are not walked to keep them from being zombied."
>
> so why isn't the solution to ensure the deferred event queue is 
> walked? I'm not clear how starting the service thread relates to 
> walking the queue.
>

The service thread is responsible for walking the deferred event 
queue.   See ServiceThread::oops_do/nmethods_do.   The design could be 
changed to have some global walk somewhere of this queue, but 
essentially this queue is processed by the service thread.

I had an additional change to make the queue non-static but want to 
limit the change at this point.

Thanks,
Coleen
> Thanks,
> David
>
>> See bug for description of the problems found with the new 
>> Zombie.java test.
>>
>> open webrev at 
>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>
>> Ran tier1 all platforms, and tier2-8 testing, as well as rerunning 
>> original test failure from bug 
>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>
>> Thanks,
>> Coleen



More information about the serviceability-dev mailing list