RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH

Erik Osterlund erik.osterlund at oracle.com
Fri Jul 5 20:41:45 UTC 2019


Thanks Dan!

/Erik

On 5 Jul 2019, at 21:53, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:

>> I'll file a follow up bug after the dust settles for 8227117. 
> 
> I filed the following:
> 
>     JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer
>     https://bugs.openjdk.java.net/browse/JDK-8227338
> 
> Dan
> 
> 
>> On 7/5/19 1:07 PM, Daniel D. Daugherty wrote:
>>> On 7/4/19 3:10 AM, Erik Österlund wrote:
>>> Hi Dan,
>>> 
>>> Thanks for picking this up. The change looks good.
>> 
>> Thanks!  Of course, just the size of the comment below makes me wonder
>> what I got myself into... :-) And I was so happy that the non-logging
>> part of the fix was an else-statement with _one_ line...
>> 
>> 
>>> However, when reviewing this, I looked at the code for actually restoring the table (ignore/notice safepoints). It copies the dispatch table for the interpreter. There is a comment stating it is important the copying is atomic for MT-safety, and I can definitely see why. However, the copying the line after that comment is in fact not atomic.
>> 
>> Actually, the comment doesn't mention 'atomic', but that's probably
>> because the code and the comment are very, very old. It mentions
>> 'word wise for MT safety' and I agree that 'atomic' is what the
>> person likely meant...
>> 
>> The history:
>> 
>> $ sgv src/share/vm/interpreter/templateInterpreter.cpp | grep 'The copy has to occur word wise for MT safety'
>> 1.1       // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
>> 
>> $ sp -r1.1 src/share/vm/interpreter/templateInterpreter.cpp
>> src/share/vm/interpreter/SCCS/s.templateInterpreter.cpp:
>> 
>> D 1.1 07/08/29 13:42:26 sgoldman 1 0    00600/00000/00000
>> MRs:
>> COMMENTS:
>> 6571248 - continuation_for is specialized for template interpreter
>> 
>> Hmmm... I expected that comment to be even older... ahhhh... a little
>> more poking around and I found:
>> 
>> $ sgv -r1.147 src/share/vm/interpreter/interpreter.cpp | grep 'The copy has to occur word wise for MT safety'
>> 1.147     // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
>> 
>> $ sp -r1.147 src/share/vm/interpreter/interpreter.cpp
>> src/share/vm/interpreter/SCCS/s.interpreter.cpp:
>> 
>> D 1.147 99/02/17 10:14:36 steffen 235 233       00008/00002/00762
>> MRs:
>> COMMENTS:
>> 
>> This makes more sense (timeline wise) and dates back to when all
>> of the interpreter was in vm/interpreter/interpreter.cpp.
>> 
>> 
>>> Here is the copying code in templateInterpreter.cpp:
>>> 
>>> static inline void copy_table(address* from, address* to, int size) {
>>>   // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
>>>   while (size-- > 0) *to++ = *from++;
>>> }
>>> 
>>> Copying using a loop of non-volatile loads and stores can and definitely will on some compilers turn into memcpy calls instead as the compiler (correctly) considers that an equivalent transformation.
>> 
>> Yet another C++ compiler optimization land mine... sigh...
>> 
>> 
>>> And memcpy does not guarantee atomicity. Indeed on some platforms it is not atomic. On some platforms it will even enjoy out-of-thin-air values.
>> 
>> That last bit is scary...
>> 
>> 
>>> Perhaps Copy::disjoint_words_atomic() would be a better choice for atomic word copying. If not, at the very least we should use Atomic::load/store here.
>> 
>> Copy::disjoint_words_atomic() sounds appealing...
>> 
>> For those folks that aren't familiar with this part of safepointing...
>> 
>> SafepointSynchronize::arm_safepoint() calls Interpreter::notice_safepoints()
>> which calls calls copy_table(). So we're not at a safepoint yet, and, in fact,
>> we're trying to bring those pesky JavaThreads to a safepoint...
>> 
>> SafepointSynchronize::disarm_safepoint() calls Interpreter::ignore_safepoints()
>> which also calls copy_table(). However, we did that before we have woken the
>> JavaThreads that are blocked for the safepoint so that use of copy_table is safe:
>> 
>> 
>>   // Release threads lock, so threads can be created/destroyed again.
>>   Threads_lock->unlock();
>> 
>>   // Wake threads after local state is correctly set.
>>   _wait_barrier->disarm();
>> }
>> 
>> The 'Threads_lock->unlock()' should synchronize memory so that the restored
>> table should be properly synced out to memory...
>> 
>> 
>>> Having said that, the fix for that issue seems like a separate RFE, because it has been sitting there for a lot longer than TLH has been around.
>> 
>> Yes I would like to keep the copy_table() issue for a separate bug (not RFE).
>> I'll file a follow up bug after the dust settles for 8227117.
>> 
>> Thanks again for the review!
>> 
>> Dan
>> 
>>> 
>>> Thanks,
>>> /Erik
>>> 
>>>> On 2019-07-04 04:04, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>> 
>>>> Robbin recently discovered this issue with Thread Local Handshakes. Since
>>>> he's not available at the moment, I'm handling the issue:
>>>> 
>>>>      JDK-8227117 normal interpreter table is not restored after single stepping with TLH
>>>>      https://bugs.openjdk.java.net/browse/JDK-8227117
>>>> 
>>>> When using Thread Local Handshakes, the normal interpreter table is
>>>> not restored after single stepping. This issue is caused by the
>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to
>>>> restore the normal interpreter table for the "off" case.
>>>> 
>>>> Prior to Thread Local Handshakes, this was a valid assumption to make.
>>>> SafepointSynchronize::end() has been refactored into
>>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints()
>>>> on the global safepoint branch. That matches up with the call to
>>>> Interpreter::notice_safepoints() that is also on the global safepoint
>>>> branch.
>>>> 
>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case
>>>> to call Interpreter::ignore_safepoints() directly.
>>>> 
>>>> Here's the webrev URL:
>>>> 
>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/
>>>> 
>>>> The fix is just a small addition to VM_ChangeSingleStep::doit():
>>>> 
>>>>     if (_on) {
>>>>       Interpreter::notice_safepoints();
>>>> +  } else {
>>>> +    Interpreter::ignore_safepoints();
>>>>     }
>>>> 
>>>> Everything else is just new logging support for future debugging of
>>>> interpreter table management and single stepping.
>>>> 
>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms.
>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now.
>>>> 
>>>> Thanks, in advance, for questions, comments or suggestions.
>>>> 
>>>> Dan
>>>> 
>> 
>> 
> 



More information about the serviceability-dev mailing list