RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Jul 5 17:07:54 UTC 2019


On 7/4/19 3:10 AM, Erik Österlund wrote:
> Hi Dan,
>
> Thanks for picking this up. The change looks good.

Thanks!  Of course, just the size of the comment below makes me wonder
what I got myself into... :-) And I was so happy that the non-logging
part of the fix was an else-statement with _one_ line...


> However, when reviewing this, I looked at the code for actually 
> restoring the table (ignore/notice safepoints). It copies the dispatch 
> table for the interpreter. There is a comment stating it is important 
> the copying is atomic for MT-safety, and I can definitely see why. 
> However, the copying the line after that comment is in fact not atomic.

Actually, the comment doesn't mention 'atomic', but that's probably
because the code and the comment are very, very old. It mentions
'word wise for MT safety' and I agree that 'atomic' is what the
person likely meant...

The history:

$ sgv src/share/vm/interpreter/templateInterpreter.cpp | grep 'The copy 
has to occur word wise for MT safety'
1.1       // Copy non-overlapping tables. The copy has to occur word 
wise for MT safety.

$ sp -r1.1 src/share/vm/interpreter/templateInterpreter.cpp
src/share/vm/interpreter/SCCS/s.templateInterpreter.cpp:

D 1.1 07/08/29 13:42:26 sgoldman 1 0    00600/00000/00000
MRs:
COMMENTS:
6571248 - continuation_for is specialized for template interpreter

Hmmm... I expected that comment to be even older... ahhhh... a little
more poking around and I found:

$ sgv -r1.147 src/share/vm/interpreter/interpreter.cpp | grep 'The copy 
has to occur word wise for MT safety'
1.147     // Copy non-overlapping tables. The copy has to occur word 
wise for MT safety.

$ sp -r1.147 src/share/vm/interpreter/interpreter.cpp
src/share/vm/interpreter/SCCS/s.interpreter.cpp:

D 1.147 99/02/17 10:14:36 steffen 235 233       00008/00002/00762
MRs:
COMMENTS:

This makes more sense (timeline wise) and dates back to when all
of the interpreter was in vm/interpreter/interpreter.cpp.


> Here is the copying code in templateInterpreter.cpp:
>
> static inline void copy_table(address* from, address* to, int size) {
>   // Copy non-overlapping tables. The copy has to occur word wise for 
> MT safety.
>   while (size-- > 0) *to++ = *from++;
> }
>
> Copying using a loop of non-volatile loads and stores can and 
> definitely will on some compilers turn into memcpy calls instead as 
> the compiler (correctly) considers that an equivalent transformation.

Yet another C++ compiler optimization land mine... sigh...


> And memcpy does not guarantee atomicity. Indeed on some platforms it 
> is not atomic. On some platforms it will even enjoy out-of-thin-air 
> values.

That last bit is scary...


> Perhaps Copy::disjoint_words_atomic() would be a better choice for 
> atomic word copying. If not, at the very least we should use 
> Atomic::load/store here.

Copy::disjoint_words_atomic() sounds appealing...

For those folks that aren't familiar with this part of safepointing...

SafepointSynchronize::arm_safepoint() calls Interpreter::notice_safepoints()
which calls calls copy_table(). So we're not at a safepoint yet, and, in 
fact,
we're trying to bring those pesky JavaThreads to a safepoint...

SafepointSynchronize::disarm_safepoint() calls 
Interpreter::ignore_safepoints()
which also calls copy_table(). However, we did that before we have woken the
JavaThreads that are blocked for the safepoint so that use of copy_table 
is safe:


   // Release threads lock, so threads can be created/destroyed again.
   Threads_lock->unlock();

   // Wake threads after local state is correctly set.
   _wait_barrier->disarm();
}

The 'Threads_lock->unlock()' should synchronize memory so that the restored
table should be properly synced out to memory...


> Having said that, the fix for that issue seems like a separate RFE, 
> because it has been sitting there for a lot longer than TLH has been 
> around.

Yes I would like to keep the copy_table() issue for a separate bug (not 
RFE).
I'll file a follow up bug after the dust settles for 8227117.

Thanks again for the review!

Dan

>
> Thanks,
> /Erik
>
> On 2019-07-04 04:04, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Robbin recently discovered this issue with Thread Local Handshakes. 
>> Since
>> he's not available at the moment, I'm handling the issue:
>>
>>      JDK-8227117 normal interpreter table is not restored after 
>> single stepping with TLH
>>      https://bugs.openjdk.java.net/browse/JDK-8227117
>>
>> When using Thread Local Handshakes, the normal interpreter table is
>> not restored after single stepping. This issue is caused by the
>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to
>> restore the normal interpreter table for the "off" case.
>>
>> Prior to Thread Local Handshakes, this was a valid assumption to make.
>> SafepointSynchronize::end() has been refactored into
>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints()
>> on the global safepoint branch. That matches up with the call to
>> Interpreter::notice_safepoints() that is also on the global safepoint
>> branch.
>>
>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case
>> to call Interpreter::ignore_safepoints() directly.
>>
>> Here's the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/
>>
>> The fix is just a small addition to VM_ChangeSingleStep::doit():
>>
>>     if (_on) {
>>       Interpreter::notice_safepoints();
>> +  } else {
>> +    Interpreter::ignore_safepoints();
>>     }
>>
>> Everything else is just new logging support for future debugging of
>> interpreter table management and single stepping.
>>
>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms.
>> Mach5 Tier[4-6] on standard Oracle platforms is running now.
>>
>> Thanks, in advance, for questions, comments or suggestions.
>>
>> Dan
>>



More information about the serviceability-dev mailing list