RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH

David Holmes david.holmes at oracle.com
Thu Jul 4 09:38:27 UTC 2019


Hi Erik,

On 4/07/2019 6:08 pm, Erik Österlund wrote:
> Hi David,
> 
> When you run without TLH, this copying mechanism is used to synchronize 
> the safepoint while JavaThreads are running. The interpreter doesn't 
> emit any polls then. Instead it clobbers the dispatch table. JavaThreads 
> will be reading from the dispatch table while it is being 
> (non-atomically) modified. That could crash. For example with the 
> Solaris + studio + SPARC - TLH configuration, the compiler will almost 
> certainly emit a memcpy (this transformation has been observed in 
> practice), the memcpy will use BIS instructions (observed in practice) 
> for performance, with out-of-thin-air values (observed in practice), and 
> the JavaThreads will occasionally crash during safepoint synchronization 
> due to said out-of-thin-air values.
> 
> So I guess the problem might be larger back when TLH was not default. 
> But this seems conceptually wrong.

I always thought there were two dispatch tables and we simply switched 
between them - not copied anything!

David

> /Erik
> 
> On 2019-07-04 09:17, David Holmes wrote:
>> Hi Erik,
>>
>> On 4/07/2019 5:10 pm, Erik Österlund wrote:
>>> Hi Dan,
>>>
>>> Thanks for picking this up. The change looks good.
>>>
>>> However, when reviewing this, I looked at the code for actually 
>>> restoring the table (ignore/notice safepoints). It copies the 
>>> dispatch table for the interpreter. There is a comment stating it is 
>>> important the copying is atomic for MT-safety, and I can definitely 
>>> see why. However, the copying the line after that comment is in fact 
>>> not atomic.
>>
>> Is it assuming "atomicity" by virtue of executing at a safepoint?
>>
>> David
>> -----
>>
>>> Here is the copying code in templateInterpreter.cpp:
>>>
>>> static inline void copy_table(address* from, address* to, int size) {
>>>    // Copy non-overlapping tables. The copy has to occur word wise 
>>> for MT safety.
>>>    while (size-- > 0) *to++ = *from++;
>>> }
>>>
>>> Copying using a loop of non-volatile loads and stores can and 
>>> definitely will on some compilers turn into memcpy calls instead as 
>>> the compiler (correctly) considers that an equivalent transformation. 
>>> And memcpy does not guarantee atomicity. Indeed on some platforms it 
>>> is not atomic. On some platforms it will even enjoy out-of-thin-air 
>>> values. Perhaps Copy::disjoint_words_atomic() would be a better 
>>> choice for atomic word copying. If not, at the very least we should 
>>> use Atomic::load/store here.
>>>
>>> Having said that, the fix for that issue seems like a separate RFE, 
>>> because it has been sitting there for a lot longer than TLH has been 
>>> around.
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 2019-07-04 04:04, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Robbin recently discovered this issue with Thread Local Handshakes. 
>>>> Since
>>>> he's not available at the moment, I'm handling the issue:
>>>>
>>>>      JDK-8227117 normal interpreter table is not restored after 
>>>> single stepping with TLH
>>>>      https://bugs.openjdk.java.net/browse/JDK-8227117
>>>>
>>>> When using Thread Local Handshakes, the normal interpreter table is
>>>> not restored after single stepping. This issue is caused by the
>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to
>>>> restore the normal interpreter table for the "off" case.
>>>>
>>>> Prior to Thread Local Handshakes, this was a valid assumption to make.
>>>> SafepointSynchronize::end() has been refactored into
>>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints()
>>>> on the global safepoint branch. That matches up with the call to
>>>> Interpreter::notice_safepoints() that is also on the global safepoint
>>>> branch.
>>>>
>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case
>>>> to call Interpreter::ignore_safepoints() directly.
>>>>
>>>> Here's the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/
>>>>
>>>> The fix is just a small addition to VM_ChangeSingleStep::doit():
>>>>
>>>>     if (_on) {
>>>>       Interpreter::notice_safepoints();
>>>> +  } else {
>>>> +    Interpreter::ignore_safepoints();
>>>>     }
>>>>
>>>> Everything else is just new logging support for future debugging of
>>>> interpreter table management and single stepping.
>>>>
>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms.
>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now.
>>>>
>>>> Thanks, in advance, for questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
> 


More information about the serviceability-dev mailing list