Template interpreter::notice_safepoints()

Mon Feb 23 07:03:28 PST 2009

Sure, that makes sense.  When I'm writing concurrent code I have in
mind "Hell's Computer", which is the worst possible machine that
honours the guarantees of POSIX threads and the WG21 work on the
semantics of multithreaded programs in C++.  (As an example of WG21's
thinking, "Our current approach largely follows Pthreads and leaves
the semantics undefined if there is a data race, i.e. if a program
modifies a location while another thread is accessing it.")  But as
you say, no real machine is quite as bad as Hell's Computer, for which
we perhaps should be grateful.

I'm interested in this issue because Ed Nevill of ARM suggested a
modification to the C++ interpreter that implements
notice_safepoints() in the same way as the template interpreter.  It
seems that, as long as we use the polling page, we might be able to do
this.

However, Tom Rodriguez's suggestion requires fewer assumptions.
load_ptr_acquire() does nothing except the load on x86/GNU/Linux and
SPARC/Solaris:

inline intptr_t OrderAccess::load_ptr_acquire(volatile intptr_t* p) { return *p; }
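
For comparison, here is a minimal sketch of the same pair of operations
written with C++11 std::atomic rather than HotSpot's OrderAccess.  The
names load_ptr_acquire_sketch, release_store_ptr_sketch and
self_check_sketch are hypothetical, chosen only for this illustration.
On x86 and SPARC (TSO) compilers typically emit an ordinary load for
the acquire case, which matches the one-line definition above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OrderAccess::load_ptr_acquire(): a load that
// later memory accesses in this thread cannot be reordered before.
inline std::intptr_t load_ptr_acquire_sketch(const std::atomic<std::intptr_t>* p) {
  return p->load(std::memory_order_acquire);
}

// Hypothetical stand-in for OrderAccess::release_store_ptr(): a store that
// earlier memory accesses in this thread cannot be reordered after.
inline void release_store_ptr_sketch(std::atomic<std::intptr_t>* p, std::intptr_t v) {
  p->store(v, std::memory_order_release);
}

// Single-threaded round trip, only to show how the two helpers are called.
inline std::intptr_t self_check_sketch() {
  std::atomic<std::intptr_t> slot(0);
  release_store_ptr_sketch(&slot, 42);
  std::intptr_t seen = load_ptr_acquire_sketch(&slot);
  assert(seen == 42);
  return seen;
}
```
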

So, if we implemented instruction dispatch as

#define DISPATCH(opcode) goto *(void*)OrderAccess::load_ptr_acquire((volatile intptr_t*)&dispatch_table[opcode])

instead of

#define DISPATCH(opcode) goto *dispatch_table[opcode]

there would be no performance difference on most processors.  The
same argument applies to using release_store_ptr() in copy_table().
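
To make the idea concrete, here is a toy sketch of a dispatch table
that is rewritten with release stores and read with acquire loads.  It
is not HotSpot code: it uses C++11 std::atomic and plain function
pointers instead of OrderAccess and computed goto, and every name in
it (Handler, normal_table, safept_table, active_table, dispatch,
run_sketch) is made up for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// A "bytecode handler" in this toy model just records which table it came from.
typedef void (*Handler)(std::vector<int>& trace);

enum { OP_A = 0, OP_B = 1, NUM_OPS = 2 };

static void normal_a(std::vector<int>& t) { t.push_back(1); }
static void normal_b(std::vector<int>& t) { t.push_back(2); }
static void safept_a(std::vector<int>& t) { t.push_back(-1); }
static void safept_b(std::vector<int>& t) { t.push_back(-2); }

static Handler normal_table[NUM_OPS] = { normal_a, normal_b };
static Handler safept_table[NUM_OPS] = { safept_a, safept_b };

// The table the interpreter loop actually dispatches through.
static std::atomic<Handler> active_table[NUM_OPS];

// Analogue of copy_table() using a release store for each entry.
static void copy_table(const Handler* from, std::atomic<Handler>* to, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    to[i].store(from[i], std::memory_order_release);
  }
}

// Analogue of DISPATCH(opcode) using an acquire load of the table entry.
static void dispatch(int opcode, std::vector<int>& trace) {
  Handler h = active_table[opcode].load(std::memory_order_acquire);
  h(trace);
}

// Runs two opcodes, switches to the safepoint table, and runs them again.
static std::vector<int> run_sketch() {
  std::vector<int> trace;
  copy_table(normal_table, active_table, NUM_OPS);
  dispatch(OP_A, trace);
  dispatch(OP_B, trace);
  copy_table(safept_table, active_table, NUM_OPS);  // like notice_safepoints()
  dispatch(OP_A, trace);
  dispatch(OP_B, trace);
  return trace;
}
```

The sketch is single-threaded, so it only shows where the acquire and
release operations would sit; it does not by itself demonstrate
anything about visibility between processors.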

Andrew.

[1] Hans Boehm. Threads cannot be implemented as a library. In
Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language
Design and Implementation, pages 26–37, 2005.

Paul Hohensee wrote:

> I believe the safepoint code (safepoint.cpp) switches the
> interpreter dispatch table, then protects the polling page via
> mprotect().  mprotect() has the effect of a system-wide memory
> barrier (because it requires the OS to issue a global TLB
> shootdown), so the new dispatch table address will be flushed to
> memory in a timely fashion.  I think this is a case of "the sum of
> the bugs is zero", because I don't think we ever really thought
> about it.
>
> All that aside, I don't know of any SMP that requires programmatic
> intervention to synchronize a write across all caches.  As long as
> the write happens "eventually", which means in a short enough time
> to make safepoints "short", I don't think there's a problem.
>
> Tom Rodriguez wrote:
>> All the ia64 ports currently use the C++ interpreter so I wouldn't be
>> surprised if there are weak cache consistency problems in the template
>> interpreter.  In practice though you can't run forever in the
>> interpreter without going into the runtime at least for a backward
>> branch overflow.  copy_table probably needs to use release_store_ptr
>> and the reads from the table may need to use a load acquire but I'd
>> suspect there would be other issues as well.
>>
>> tom
>>
>> On Feb 20, 2009, at 9:35 AM, Andrew Haley wrote:
>>
>>> I'm having a little difficulty understanding how the template
>>> interpreter is safe on SMP machines.
>>>
>>> notice_safepoints() looks like this:
>>>
>>> void TemplateInterpreter::notice_safepoints() {
>>>  if (!_notice_safepoints) {
>>>    // switch to safepoint dispatch table
>>>    _notice_safepoints = true;
>>>    copy_table((address*)&_safept_table, (address*)&_active_table,
>>>               sizeof(_active_table) / sizeof(address));
>>>  }
>>> }
>>>
>>> So, the dispatch table is rewritten.  But on an SMP machine with
>>> weak cache coherency, how does this work?  A thread could be
>>> executing bytecodes in a loop but never see the change to _active_table
>>> because there's nothing to cause its cache to be updated.  Is it
>>> simply that this code doesn't work on such architectures?
>>>
>>> Andrew.
>>