RFR: 8072147: Preloading libjsig.dylib causes deadlock when signal() is called

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Jun 30 13:21:06 UTC 2015


Buck,

Don't think you need a revised webrev for a minor type change.

Dan


On 6/30/15 2:34 AM, david buck wrote:
> Hi Dmitry!
>
> Another good catch. Thank you (again)!
>
> I'll rebuild/retest and post a revised webrev shortly.
>
> Cheers,
> -Buck
>
> On 2015/06/30 17:09, Dmitry Dmitriev wrote:
>> Hello David,
>>
>> Looks good. Just one comment/question: call_os_signal function return
>> type is 'sa_handler_t', thus it is possible to define 'res'
>> variable(line 80) with 'sa_handler_t' type?
>>
>> Thank you,
>> Dmitry
>>
>> On 29.06.2015 19:41, david buck wrote:
>>> Hi Dmitry!
>>>
>>> I have fixed the issue and retested both manually and with the
>>> "hotspot" test suit. Thank you again for the catch.
>>>
>>> http://cr.openjdk.java.net/~dbuck/8072147/webrev.01/
>>>
>>> Hi Karen!
>>>
>>> The "reenter" boolean is thread local, so it should already work
>>> exactly as you describe below.
>>>
>>> > p.s. let me see if I can find any existing signal handler chaining
>>> tests - I suspect you will need to write some
>>> > additional ones
>>>
>>> I took a look, but was not able to find any tests that run actually
>>> run on all *nix platforms and install signal handlers. Ultimately we
>>> should have some sort of comprehensive functional testing for this. If
>>> there already is such testing, I would like to add this particular
>>> case. But I suspect that there is nothing comprehensive at the moment.
>>> If so, I believe adding new functional testing for all of libjsig is
>>> probably outside the scope of this point fix.
>>>
>>> Cheers,
>>> -Buck
>>>
>>> On 2015/06/30 1:27, Karen Kinnear wrote:
>>>> David,
>>>>
>>>> Thank you for tackling this bug.
>>>>
>>>> I don't understand how the proposed solution handles multiple threads
>>>>    - e.g. 2 jni threads installing their signal handlers and a jvm
>>>> installing its signal handlers
>>>>
>>>> The jsig code is written to handle multi-threaded timing:
>>>>     user installs first, jvm installs after
>>>>        user install in progress, jvm tries to install
>>>>     jvm installs first, user installs after
>>>>        jvm install in progress, user tries to install
>>>>
>>>> I think having a global reentry, that only gets set to true on the
>>>> first call_os_signal, means
>>>> that the chaining won't be installed properly.
>>>>
>>>> I wonder if what you are looking for is a way to track that a given
>>>> thread is currently
>>>> inside this logic
>>>>     - not a global state
>>>>     - not whether a given thread has ever called this (a thread may
>>>> call this code multiple times, one for each signal)
>>>>     - but whether we have a recursive call.
>>>>
>>>> So - perhaps there is a way to track a list of threads that are in
>>>> the os_signal call and the thread
>>>> that owns the signal_lock?  So if you are in the sigaction call you
>>>> can check if your thread already
>>>> owns the signal_lock and if it is in the os_signal call - and skip
>>>> both the lock and unlock (basically making
>>>> a reentrant lock). And then when you are done the os_signal call,
>>>> remove your thread from the list
>>>> of threads in the os_signal call before returning from call_os_signal.
>>>>
>>>> That way the lock will still work between threads and between a
>>>> requests by the same thread.
>>>>
>>>> Does that sounds worth exploring?
>>>>
>>>> thanks,
>>>> Karen
>>>>
>>>> p.s. let me see if I can find any existing signal handler chaining
>>>> tests - I suspect you will need to write some
>>>> additional ones
>>>>
>>>>
>>>> On Jun 29, 2015, at 10:35 AM, david buck wrote:
>>>>
>>>>> Hi Dmitry!
>>>>>
>>>>> Ouch. Yes, that is *very* wrong. I will fix and retest.
>>>>>
>>>>> Sorry for the trivial error and thank you for the catch!
>>>>>
>>>>> Cheers,
>>>>> -Buck
>>>>>
>>>>> On 2015/06/29 23:31, Dmitry Dmitriev wrote:
>>>>>> Hello David,
>>>>>>
>>>>>> In call_os_signal function you add "reentry = false;"(line 93) after
>>>>>> return statement and it seems as unreachable code.
>>>>>>
>>>>>> Thanks,
>>>>>> Dmitry
>>>>>>
>>>>>> On 29.06.2015 17:18, david buck wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> Please review my fix below:
>>>>>>>
>>>>>>> bug report: https://bugs.openjdk.java.net/browse/jdk-8072147
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net/~dbuck/8072147/webrev.00/
>>>>>>>
>>>>>>> The fix is intentionally limited only to the platform (OSX/BSD) and
>>>>>>> scenario (signal() implemented by means of a direct call to
>>>>>>> sigaction()) where we can reasonably expect to see (and therefor
>>>>>>> test)
>>>>>>> this issue.
>>>>>>>
>>>>>>> Fix has been manually tested for effectiveness by myself. "hotspot"
>>>>>>> test suite run and passed on all platforms.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -Buck
>>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list