RFR: 8072147: Preloading libjsig.dylib causes deadlock when signal() is called

Mon Jun 29 16:41:27 UTC 2015

Hi Dmitry!

I have fixed the issue and retested both manually and with the "hotspot" 
test suit. Thank you again for the catch.

http://cr.openjdk.java.net/~dbuck/8072147/webrev.01/

Hi Karen!

The "reenter" boolean is thread local, so it should already work exactly 
as you describe below.

 > p.s. let me see if I can find any existing signal handler chaining 
tests - I suspect you will need to write some
 > additional ones

I took a look, but was not able to find any tests that run actually run 
on all *nix platforms and install signal handlers. Ultimately we should 
have some sort of comprehensive functional testing for this. If there 
already is such testing, I would like to add this particular case. But I 
suspect that there is nothing comprehensive at the moment. If so, I 
believe adding new functional testing for all of libjsig is probably 
outside the scope of this point fix.

Cheers,
-Buck

On 2015/06/30 1:27, Karen Kinnear wrote:
> David,
>
> Thank you for tackling this bug.
>
> I don't understand how the proposed solution handles multiple threads
>    - e.g. 2 jni threads installing their signal handlers and a jvm installing its signal handlers
>
> The jsig code is written to handle multi-threaded timing:
>     user installs first, jvm installs after
>        user install in progress, jvm tries to install
>     jvm installs first, user installs after
>        jvm install in progress, user tries to install
>
> I think having a global reentry, that only gets set to true on the first call_os_signal, means
> that the chaining won't be installed properly.
>
> I wonder if what you are looking for is a way to track that a given thread is currently
> inside this logic
>     - not a global state
>     - not whether a given thread has ever called this (a thread may call this code multiple times, one for each signal)
>     - but whether we have a recursive call.
>
> So - perhaps there is a way to track a list of threads that are in the os_signal call and the thread
> that owns the signal_lock?  So if you are in the sigaction call you can check if your thread already
> owns the signal_lock and if it is in the os_signal call - and skip both the lock and unlock (basically making
> a reentrant lock). And then when you are done the os_signal call, remove your thread from the list
> of threads in the os_signal call before returning from call_os_signal.
>
> That way the lock will still work between threads and between a requests by the same thread.
>
> Does that sounds worth exploring?
>
> thanks,
> Karen
>
> p.s. let me see if I can find any existing signal handler chaining tests - I suspect you will need to write some
> additional ones
>
>
> On Jun 29, 2015, at 10:35 AM, david buck wrote:
>
>> Hi Dmitry!
>>
>> Ouch. Yes, that is *very* wrong. I will fix and retest.
>>
>> Sorry for the trivial error and thank you for the catch!
>>
>> Cheers,
>> -Buck
>>
>> On 2015/06/29 23:31, Dmitry Dmitriev wrote:
>>> Hello David,
>>>
>>> In call_os_signal function you add "reentry = false;"(line 93) after
>>> return statement and it seems as unreachable code.
>>>
>>> Thanks,
>>> Dmitry
>>>
>>> On 29.06.2015 17:18, david buck wrote:
>>>> Hi!
>>>>
>>>> Please review my fix below:
>>>>
>>>> bug report: https://bugs.openjdk.java.net/browse/jdk-8072147
>>>>
>>>> webrev: http://cr.openjdk.java.net/~dbuck/8072147/webrev.00/
>>>>
>>>> The fix is intentionally limited only to the platform (OSX/BSD) and
>>>> scenario (signal() implemented by means of a direct call to
>>>> sigaction()) where we can reasonably expect to see (and therefor test)
>>>> this issue.
>>>>
>>>> Fix has been manually tested for effectiveness by myself. "hotspot"
>>>> test suite run and passed on all platforms.
>>>>
>>>> Cheers,
>>>> -Buck
>>>
>