RFR: 8072147: Preloading libjsig.dylib causes deadlock when signal() is called

david buck david.buck at oracle.com
Tue Jun 30 08:34:57 UTC 2015


Hi Dmitry!

Another good catch. Thank you (again)!

I'll rebuild/retest and post a revised webrev shortly.

Cheers,
-Buck

On 2015/06/30 17:09, Dmitry Dmitriev wrote:
> Hello David,
>
> Looks good. Just one comment/question: call_os_signal function return
> type is 'sa_handler_t', thus it is possible to define 'res'
> variable(line 80) with 'sa_handler_t' type?
>
> Thank you,
> Dmitry
>
> On 29.06.2015 19:41, david buck wrote:
>> Hi Dmitry!
>>
>> I have fixed the issue and retested both manually and with the
>> "hotspot" test suit. Thank you again for the catch.
>>
>> http://cr.openjdk.java.net/~dbuck/8072147/webrev.01/
>>
>> Hi Karen!
>>
>> The "reenter" boolean is thread local, so it should already work
>> exactly as you describe below.
>>
>> > p.s. let me see if I can find any existing signal handler chaining
>> tests - I suspect you will need to write some
>> > additional ones
>>
>> I took a look, but was not able to find any tests that run actually
>> run on all *nix platforms and install signal handlers. Ultimately we
>> should have some sort of comprehensive functional testing for this. If
>> there already is such testing, I would like to add this particular
>> case. But I suspect that there is nothing comprehensive at the moment.
>> If so, I believe adding new functional testing for all of libjsig is
>> probably outside the scope of this point fix.
>>
>> Cheers,
>> -Buck
>>
>> On 2015/06/30 1:27, Karen Kinnear wrote:
>>> David,
>>>
>>> Thank you for tackling this bug.
>>>
>>> I don't understand how the proposed solution handles multiple threads
>>>    - e.g. 2 jni threads installing their signal handlers and a jvm
>>> installing its signal handlers
>>>
>>> The jsig code is written to handle multi-threaded timing:
>>>     user installs first, jvm installs after
>>>        user install in progress, jvm tries to install
>>>     jvm installs first, user installs after
>>>        jvm install in progress, user tries to install
>>>
>>> I think having a global reentry, that only gets set to true on the
>>> first call_os_signal, means
>>> that the chaining won't be installed properly.
>>>
>>> I wonder if what you are looking for is a way to track that a given
>>> thread is currently
>>> inside this logic
>>>     - not a global state
>>>     - not whether a given thread has ever called this (a thread may
>>> call this code multiple times, one for each signal)
>>>     - but whether we have a recursive call.
>>>
>>> So - perhaps there is a way to track a list of threads that are in
>>> the os_signal call and the thread
>>> that owns the signal_lock?  So if you are in the sigaction call you
>>> can check if your thread already
>>> owns the signal_lock and if it is in the os_signal call - and skip
>>> both the lock and unlock (basically making
>>> a reentrant lock). And then when you are done the os_signal call,
>>> remove your thread from the list
>>> of threads in the os_signal call before returning from call_os_signal.
>>>
>>> That way the lock will still work between threads and between a
>>> requests by the same thread.
>>>
>>> Does that sounds worth exploring?
>>>
>>> thanks,
>>> Karen
>>>
>>> p.s. let me see if I can find any existing signal handler chaining
>>> tests - I suspect you will need to write some
>>> additional ones
>>>
>>>
>>> On Jun 29, 2015, at 10:35 AM, david buck wrote:
>>>
>>>> Hi Dmitry!
>>>>
>>>> Ouch. Yes, that is *very* wrong. I will fix and retest.
>>>>
>>>> Sorry for the trivial error and thank you for the catch!
>>>>
>>>> Cheers,
>>>> -Buck
>>>>
>>>> On 2015/06/29 23:31, Dmitry Dmitriev wrote:
>>>>> Hello David,
>>>>>
>>>>> In call_os_signal function you add "reentry = false;"(line 93) after
>>>>> return statement and it seems as unreachable code.
>>>>>
>>>>> Thanks,
>>>>> Dmitry
>>>>>
>>>>> On 29.06.2015 17:18, david buck wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Please review my fix below:
>>>>>>
>>>>>> bug report: https://bugs.openjdk.java.net/browse/jdk-8072147
>>>>>>
>>>>>> webrev: http://cr.openjdk.java.net/~dbuck/8072147/webrev.00/
>>>>>>
>>>>>> The fix is intentionally limited only to the platform (OSX/BSD) and
>>>>>> scenario (signal() implemented by means of a direct call to
>>>>>> sigaction()) where we can reasonably expect to see (and therefor
>>>>>> test)
>>>>>> this issue.
>>>>>>
>>>>>> Fix has been manually tested for effectiveness by myself. "hotspot"
>>>>>> test suite run and passed on all platforms.
>>>>>>
>>>>>> Cheers,
>>>>>> -Buck
>>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list