[14] RFR 8235829: graal crashes with Zombie.java test
David Holmes
david.holmes at oracle.com
Mon Dec 16 22:51:00 UTC 2019
Hi Coleen,
A quick initial response ...
On 16/12/2019 11:26 pm, coleen.phillimore at oracle.com wrote:
>
>
> On 12/16/19 8:04 AM, David Holmes wrote:
>> Hi Coleen,
>>
>> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>>> Summary: Start ServiceThread before compiler threads, and run nmethod
>>> barriers for zgc before adding to the service thread queue, or
>>> posting the events on the java thread queue.
>>
>> I can't comment on most of this but the earlier starting of the
>> service thread has some concerns:
>>
>> - there is a lot of JDK level initialization which now will not have
>> happened before the service thread is started and it is far from
>> obvious that all possible initialization dependencies will be satisfied
>
> I agree that the order of initialization is very sensitive. From the
> actions that the service thread does, the one that I found was a problem
> was that events were posted before the LIVE phase (see comment in
> has_events()), which could have happened with the existing code, but the
> window for the race is a lot smaller. The other actions can be run if
> there's a GC before initialization but would be a bug in the
> initialization code, and I didn't find these bugs in all my testing.
> There are some ordering dependencies that do have odd side effects
> (between the compiler thread startup and initialization jsr292 classes)
> which have comments. This patch doesn't touch those.
>
>>
>> - current starting of the service thread in Management::initialize is
>> guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the
>> service thread unconditionally for all builds. Hmm just saw your
>> latest comment to the bug report - so the service thread is now (for
>> quite some time?) being used for other than management tasks and so
>> should always be present even if INCLUDE_MANAGEMENT is not enabled. Is
>> that sufficient or are there likely to be other changes needed to
>> actually ensure that all works correctly e.g. any code the service
>> thread executes that is only defined for INCLUDE_MANAGEMENT will need
>> to be compiled out explicitly.
>>
>
> I asked Jie offline to check the minimal build. I don't think there are
> other INCLUDE_MANAGEMENT actions in the service thread and I'm not sure
> why it was initialized there in the first place. The minimal vm would
> have been broken ie. hashtables would not have been cleaned up, etc, but
> I'm not sure how well that is tested or if one would notice.
>> - the service thread and the notification thread are (were?) closely
>> related but now started at completely different times
>
> The notification thread is limited to "services" so it makes sense where
> it is. The ServiceThread does lots of other things. Maybe it needs
> renaming in 15.
>>
>> The bug report states the problem as:
>>
>> "The graal crash is because compiled_method_load events are added to
>> the ServiceThread's deferred event queue before the ServiceThread is
>> created so are not walked to keep them from being zombied."
>>
>> so why isn't the solution to ensure the deferred event queue is
>> walked? I'm not clear how starting the service thread relates to
>> walking the queue.
>>
>
> The service thread is responsible for walking the deferred event
> queue. See ServiceThread::oops_do/nmethods_do. The design could be
> changed to have some global walk somewhere of this queue, but
> essentially this queue is processed by the service thread.
Sorry I don't follow. I thought "oops_do" and friends are for the GC
threads and/or VMThread to call to process oops when GC updates them.
David
-----
> I had an additional change to make the queue non-static but want to
> limit the change at this point.
>
> Thanks,
> Coleen
>> Thanks,
>> David
>>
>>> See bug for description of the problems found with the new
>>> Zombie.java test.
>>>
>>> open webrev at
>>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>>
>>> Ran tier1 all platforms, and tier2-8 testing, as well as rerunning
>>> original test failure from bug
>>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>>
>>> Thanks,
>>> Coleen
>
More information about the serviceability-dev
mailing list