RFR: 8072147: Preloading libjsig.dylib causes deadlock when signal() is called

Karen Kinnear karen.kinnear at oracle.com
Mon Jun 29 17:29:42 UTC 2015


David,

Many thanks - I had missed that the reentry was thread local - so you are right - this latest webrev should work.
Code looks good.

In addition to the tests I sent - it might make sense to modify one for a single thread to call signal() or sigaction()
multiple times - if none of the existing tests do that already.

thanks,
Karen

On Jun 29, 2015, at 12:41 PM, david buck wrote:

> Hi Dmitry!
> 
> I have fixed the issue and retested both manually and with the "hotspot" test suit. Thank you again for the catch.
> 
> http://cr.openjdk.java.net/~dbuck/8072147/webrev.01/
> 
> Hi Karen!
> 
> The "reenter" boolean is thread local, so it should already work exactly as you describe below.
> 
> > p.s. let me see if I can find any existing signal handler chaining tests - I suspect you will need to write some
> > additional ones
> 
> I took a look, but was not able to find any tests that run actually run on all *nix platforms and install signal handlers. Ultimately we should have some sort of comprehensive functional testing for this. If there already is such testing, I would like to add this particular case. But I suspect that there is nothing comprehensive at the moment. If so, I believe adding new functional testing for all of libjsig is probably outside the scope of this point fix.
> 
> Cheers,
> -Buck
> 
> On 2015/06/30 1:27, Karen Kinnear wrote:
>> David,
>> 
>> Thank you for tackling this bug.
>> 
>> I don't understand how the proposed solution handles multiple threads
>>   - e.g. 2 jni threads installing their signal handlers and a jvm installing its signal handlers
>> 
>> The jsig code is written to handle multi-threaded timing:
>>    user installs first, jvm installs after
>>       user install in progress, jvm tries to install
>>    jvm installs first, user installs after
>>       jvm install in progress, user tries to install
>> 
>> I think having a global reentry, that only gets set to true on the first call_os_signal, means
>> that the chaining won't be installed properly.
>> 
>> I wonder if what you are looking for is a way to track that a given thread is currently
>> inside this logic
>>    - not a global state
>>    - not whether a given thread has ever called this (a thread may call this code multiple times, one for each signal)
>>    - but whether we have a recursive call.
>> 
>> So - perhaps there is a way to track a list of threads that are in the os_signal call and the thread
>> that owns the signal_lock?  So if you are in the sigaction call you can check if your thread already
>> owns the signal_lock and if it is in the os_signal call - and skip both the lock and unlock (basically making
>> a reentrant lock). And then when you are done the os_signal call, remove your thread from the list
>> of threads in the os_signal call before returning from call_os_signal.
>> 
>> That way the lock will still work between threads and between a requests by the same thread.
>> 
>> Does that sounds worth exploring?
>> 
>> thanks,
>> Karen
>> 
>> p.s. let me see if I can find any existing signal handler chaining tests - I suspect you will need to write some
>> additional ones
>> 
>> 
>> On Jun 29, 2015, at 10:35 AM, david buck wrote:
>> 
>>> Hi Dmitry!
>>> 
>>> Ouch. Yes, that is *very* wrong. I will fix and retest.
>>> 
>>> Sorry for the trivial error and thank you for the catch!
>>> 
>>> Cheers,
>>> -Buck
>>> 
>>> On 2015/06/29 23:31, Dmitry Dmitriev wrote:
>>>> Hello David,
>>>> 
>>>> In call_os_signal function you add "reentry = false;"(line 93) after
>>>> return statement and it seems as unreachable code.
>>>> 
>>>> Thanks,
>>>> Dmitry
>>>> 
>>>> On 29.06.2015 17:18, david buck wrote:
>>>>> Hi!
>>>>> 
>>>>> Please review my fix below:
>>>>> 
>>>>> bug report: https://bugs.openjdk.java.net/browse/jdk-8072147
>>>>> 
>>>>> webrev: http://cr.openjdk.java.net/~dbuck/8072147/webrev.00/
>>>>> 
>>>>> The fix is intentionally limited only to the platform (OSX/BSD) and
>>>>> scenario (signal() implemented by means of a direct call to
>>>>> sigaction()) where we can reasonably expect to see (and therefor test)
>>>>> this issue.
>>>>> 
>>>>> Fix has been manually tested for effectiveness by myself. "hotspot"
>>>>> test suite run and passed on all platforms.
>>>>> 
>>>>> Cheers,
>>>>> -Buck
>>>> 
>> 



More information about the hotspot-runtime-dev mailing list