Improving the speed of Thread interrupt checking

Fri May 10 23:46:37 PDT 2013

Charles,

Thanks for the explanation. For the last 6 months I have been involved in
some very performance-centric multi-threaded work profiling the JVM, using
JVMTI as the profiling interface with C++ underneath. The code uses JVM
locks where locks are required, but as profilers need to be as invisible
as possible I have been removing locks where they can be avoided.

My experience here has indicated that on modern machines CAS operations are
always worth a try compared to locks. The cost of losing the current
quantum (even on *NIX) is so high that it is not worth paying unless a
thread is truly blocked - e.g. for IO.

In the next stage of this work I am seriously considering moving to
CAS-based read/write spin locks, having had good results from simple spin
locks. The general feeling in the community against spin locking seems
somewhat dated. On more recent hardware the cost of spinning on a failed
CAS is low - it is a mutating CAS which is costly. Thus, the locked
operation needs to take hundreds or even thousands of cycles before it is
worth losing a quantum over.
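
To make the idea concrete, here is a minimal sketch of the kind of simple
spin lock I mean, written in Java rather than the C++ I actually use. The
class name CasSpinLock is made up for illustration; it spins on a plain
read and only attempts the mutating CAS once the lock looks free. Treat it
as an assumption-laden example, not the profiler's code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class: a test-and-test-and-set spin lock.
final class CasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        for (;;) {
            // Spin on a volatile read first; this does not mutate the flag.
            while (locked.get()) {
                // busy-wait
            }
            // Attempt the mutating CAS only once the lock looks free.
            if (locked.compareAndSet(false, true)) {
                return;
            }
        }
    }

    // Single CAS attempt; returns true if the lock was acquired.
    boolean tryLock() {
        return locked.compareAndSet(false, true);
    }

    void unlock() {
        locked.set(false);
    }
}
```

This only makes sense when the guarded section is short; anything that can
block (IO, for instance) should still use a real lock.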

In your case, inter-thread signalling is definitely not worth losing a
quantum over.
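
For reference, the mitigation you describe in your message (poll with
Thread#isInterrupted() every N instructions, and only call
Thread.interrupted() once the flag is seen set) might look roughly like
the sketch below. The class name, the run method and the interval of 1024
are my own placeholders, not anything taken from Joni.

```java
// Hypothetical stand-in for an interpreter main loop.
final class InterruptPollingLoop {
    // Assumed value for "every N instructions"; must be a power of two here.
    static final int CHECK_INTERVAL = 1024;

    // Runs up to maxSteps steps and returns how many completed
    // before an interrupt was observed.
    static long run(long maxSteps) {
        Thread self = Thread.currentThread();
        for (long step = 0; step < maxSteps; step++) {
            // Non-clearing check, done only on every CHECK_INTERVAL-th step.
            if ((step & (CHECK_INTERVAL - 1)) == 0 && self.isInterrupted()) {
                // Clearing call, made only once the flag is known to be set.
                Thread.interrupted();
                return step;
            }
            // ... execute one instruction here ...
        }
        return maxSteps;
    }
}
```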

If I get a chance over the next couple of days I'll put together a cut-down
example of CAS over Thread.interrupted and run the profiler (DevpartnerJ)
over it - it could be a great unit test.
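
As a starting point, here is a minimal sketch of what such an example could
look like in plain Java. It only models the interrupt flag with an
AtomicBoolean; the class and method names are invented for illustration,
and this is not how HotSpot implements Thread.interrupted().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of an interrupt flag with a lock-free test-and-clear.
final class CasInterruptFlag {
    private final AtomicBoolean interrupted = new AtomicBoolean(false);

    // Called by the monitoring thread to request interruption.
    void interrupt() {
        interrupted.set(true);
    }

    // Non-clearing check: a plain volatile read.
    boolean isInterrupted() {
        return interrupted.get();
    }

    // Clearing check: read first, then CAS true -> false.
    // Returns true for exactly one caller per interrupt request.
    boolean testAndClear() {
        return interrupted.get() && interrupted.compareAndSet(true, false);
    }
}
```

The read before the CAS keeps the common not-interrupted case down to a
single volatile read, and the CAS means two threads checking at the same
time cannot both see and clear the same interrupt.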

Best wishes - AJ

On 10 May 2013 22:24, Charles Oliver Nutter <headius at headius.com> wrote:

> You need CAS because one form of the interrupt check clears it and another
> does not. So the get + check + set of interrupt status needs to be atomic,
> or another thread could jump in and change it during that process.
>
> If it were just being read, then sure...it could simply be volatile. But
> since there's a non-atomic operation in there, a race might be possible.
>
> I just took a deeper look at the intrinsic, to see if it avoids the
> lock...but unfortunately it does not. It adds fast paths for when the
> thread is not interrupted *and* clearing is not requested
> (Thread.interrupted clears, Thread#isInterrupted does not). So the typical
> use case of calling Thread.interrupted() to get and clear interrupt status
> still follows the slow, locking path all the time.
>
> We are mitigating this in our code by using Thread#isInterrupted()
> (th.isInterrupted on a Thread object) to do the frequent checks, and then
> using Thread.interrupted to clear it only when it has been set. I think
> this will be ok, but the slow path still seems like it could benefit from a
> CAS impl instead of a lock.
>
> - Charlie
>
>
> On Fri, May 10, 2013 at 11:17 AM, Alexander Turner <nerdscentral at gmail.com
> > wrote:
>
>> Charles,
>>
>> Why bother even using CAS?
>>
>> Thread A is monitoring Thread B. Thread B cooperatively checks to see if
>> it should die.
>>
>> Therefore, you only need B to know when A has told it to shut down.
>>
>> Therefore, all you need is a volatile boolean. A volatile boolean is very
>> much faster than a full CAS operation.
>> http://nerds-central.blogspot.co.uk/2011/11/atomicinteger-volatile-synchronized-and.html
>>
>> Best wishes - AJ
>>
>>
>> On 10 May 2013 17:03, Charles Oliver Nutter <headius at headius.com> wrote:
>>
>>> This isn't strictly language-related, but I thought I'd post here before
>>> I start pinging hotspot folks directly...
>>>
>>> We are looking at adding interrupt checking to our regex engine, Joni,
>>> so that long-running (or never-terminating) expressions could be terminated
>>> early. To do this we're using Thread.interrupt.
>>>
>>> Unfortunately our first experiments with it have shown that interrupt
>>> checking is rather expensive; having it in the main instruction loop slowed
>>> down a 16s benchmark to 68s. We're reducing that checking by only doing it
>>> every N instructions now, but I figured I'd look into why it's so slow.
>>>
>>> Thread.interrupted does currentThread().isInterrupted(true), both of
>>> which are native calls. They end up as intrinsics and/or calling
>>> JVM_CurrentThread and JVM_IsInterrupted. The former is not a
>>> problem...it accesses threadObj off the current thread (presumably from
>>> env) and twiddles handle lifetime a bit. The latter, however, has to
>>> acquire a lock to ensure retrieval and clearing are atomic.
>>>
>>> So then it occurred to me...why does it have to acquire a lock at all?
>>> It seems like a get + CAS to clear would prevent accidentally clearing
>>> another thread's re-interrupt. Some combination of CAS operations could
>>> avoid the case where two threads both check interrupt status at the same
>>> time.
>>>
>>> I would expect the CAS version would have lower overhead than the hard
>>> mutex acquisition.
>>>
>>> Does this seem reasonable?
>>>
>>> - Charlie
>>>
>>> _______________________________________________
>>> mlvm-dev mailing list
>>> mlvm-dev at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev