RFR: 8072147: Preloading libjsig.dylib causes deadlock when signal() is called

Karen Kinnear karen.kinnear at oracle.com
Mon Jun 29 16:27:16 UTC 2015


David,

Thank you for tackling this bug.

I don't understand how the proposed solution handles multiple threads 
  - e.g. 2 jni threads installing their signal handlers and a jvm installing its signal handlers

The jsig code is written to handle multi-threaded timing:
   user installs first, jvm installs after
      user install in progress, jvm tries to install
   jvm installs first, user installs after
      jvm install in progress, user tries to install

I think having a global reentry, that only gets set to true on the first call_os_signal, means
that the chaining won't be installed properly.

I wonder if what you are looking for is a way to track that a given thread is currently
inside this logic 
   - not a global state
   - not whether a given thread has ever called this (a thread may call this code multiple times, one for each signal)
   - but whether we have a recursive call.

So - perhaps there is a way to track a list of threads that are in the os_signal call and the thread
that owns the signal_lock?  So if you are in the sigaction call you can check if your thread already
owns the signal_lock and if it is in the os_signal call - and skip both the lock and unlock (basically making
a reentrant lock). And then when you are done the os_signal call, remove your thread from the list
of threads in the os_signal call before returning from call_os_signal.

That way the lock will still work between threads and between a requests by the same thread.

Does that sounds worth exploring?

thanks,
Karen

p.s. let me see if I can find any existing signal handler chaining tests - I suspect you will need to write some
additional ones
   

On Jun 29, 2015, at 10:35 AM, david buck wrote:

> Hi Dmitry!
> 
> Ouch. Yes, that is *very* wrong. I will fix and retest.
> 
> Sorry for the trivial error and thank you for the catch!
> 
> Cheers,
> -Buck
> 
> On 2015/06/29 23:31, Dmitry Dmitriev wrote:
>> Hello David,
>> 
>> In call_os_signal function you add "reentry = false;"(line 93) after
>> return statement and it seems as unreachable code.
>> 
>> Thanks,
>> Dmitry
>> 
>> On 29.06.2015 17:18, david buck wrote:
>>> Hi!
>>> 
>>> Please review my fix below:
>>> 
>>> bug report: https://bugs.openjdk.java.net/browse/jdk-8072147
>>> 
>>> webrev: http://cr.openjdk.java.net/~dbuck/8072147/webrev.00/
>>> 
>>> The fix is intentionally limited only to the platform (OSX/BSD) and
>>> scenario (signal() implemented by means of a direct call to
>>> sigaction()) where we can reasonably expect to see (and therefor test)
>>> this issue.
>>> 
>>> Fix has been manually tested for effectiveness by myself. "hotspot"
>>> test suite run and passed on all platforms.
>>> 
>>> Cheers,
>>> -Buck
>> 



More information about the hotspot-runtime-dev mailing list