[14] RFR 8235829: graal crashes with Zombie.java test
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Tue Dec 17 02:40:57 UTC 2019
Short answer below.
On 12/16/19 5:51 PM, David Holmes wrote:
> Hi Coleen,
>
> A quick initial response ...
>
> On 16/12/2019 11:26 pm, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 12/16/19 8:04 AM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 16/12/2019 9:41 pm, coleen.phillimore at oracle.com wrote:
>>>> Summary: Start ServiceThread before compiler threads, and run
>>>> nmethod barriers for zgc before adding to the service thread queue,
>>>> or posting the events on the java thread queue.
>>>
>>> I can't comment on most of this but the earlier starting of the
>>> service thread has some concerns:
>>>
>>> - there is a lot of JDK level initialization which now will not have
>>> happened before the service thread is started and it is far from
>>> obvious that all possible initialization dependencies will be satisfied
>>
>> I agree that the order of initialization is very sensitive. From the
>> actions that the service thread does, the one that I found was a
>> problem was that events were posted before the LIVE phase (see
>> comment in has_events()), which could have happened with the existing
>> code, but the window for the race is a lot smaller. The other
>> actions can be run if there's a GC before initialization but would be
>> a bug in the initialization code, and I didn't find these bugs in all
>> my testing. There are some ordering dependencies that do have odd
>> side effects (between the compiler thread startup and initialization
>> jsr292 classes) which have comments. This patch doesn't touch those.
>>
>>>
>>> - current starting of the service thread in Management::initialize
>>> is guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the
>>> service thread unconditionally for all builds. Hmm just saw your
>>> latest comment to the bug report - so the service thread is now (for
>>> quite some time?) being used for other than management tasks and so
>>> should always be present even if INCLUDE_MANAGEMENT is not enabled.
>>> Is that sufficient or are there likely to be other changes needed to
>>> actually ensure that all works correctly e.g. any code the service
>>> thread executes that is only defined for INCLUDE_MANAGEMENT will
>>> need to be compiled out explicitly.
>>>
>>
>> I asked Jie offline to check the minimal build. I don't think there
>> are other INCLUDE_MANAGEMENT actions in the service thread and I'm
>> not sure why it was initialized there in the first place. The
>> minimal vm would have been broken ie. hashtables would not have been
>> cleaned up, etc, but I'm not sure how well that is tested or if one
>> would notice.
>>> - the service thread and the notification thread are (were?) closely
>>> related but now started at completely different times
>>
>> The notification thread is limited to "services" so it makes sense
>> where it is. The ServiceThread does lots of other things. Maybe it
>> needs renaming in 15.
>>>
>>> The bug report states the problem as:
>>>
>>> "The graal crash is because compiled_method_load events are added to
>>> the ServiceThread's deferred event queue before the ServiceThread is
>>> created so are not walked to keep them from being zombied."
>>>
>>> so why isn't the solution to ensure the deferred event queue is
>>> walked? I'm not clear how starting the service thread relates to
>>> walking the queue.
>>>
>>
>> The service thread is responsible for walking the deferred event
>> queue. See ServiceThread::oops_do/nmethods_do. The design could
>> be changed to have some global walk somewhere of this queue, but
>> essentially this queue is processed by the service thread.
>
> Sorry I don't follow. I thought "oops_do" and friends are for the GC
> threads and/or VMThread to call to process oops when GC updates them.
The oops_do and nmethods_do() can be called by a thread walk in
handshakes (by the sweeper thread) and by parallel GC thread walks.
There isn't a single entry to do the thread-specific closures that we
need to do for these deferred event queues. I tried a version that
walked the queues with a static call but missed some places where it
would be needed to make this call (didn't work). Keeping this
associated with the ServiceThread simplifies a lot.
thanks,
Coleen
>
> David
> -----
>
>> I had an additional change to make the queue non-static but want to
>> limit the change at this point.
>>
>> Thanks,
>> Coleen
>>> Thanks,
>>> David
>>>
>>>> See bug for description of the problems found with the new
>>>> Zombie.java test.
>>>>
>>>> open webrev at
>>>> http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8235829
>>>>
>>>> Ran tier1 all platforms, and tier2-8 testing, as well as rerunning
>>>> original test failure from bug
>>>> https://bugs.openjdk.java.net/browse/JDK-8173361.
>>>>
>>>> Thanks,
>>>> Coleen
>>
More information about the serviceability-dev
mailing list