RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH
David Holmes
david.holmes at oracle.com
Thu Jul 4 09:38:27 UTC 2019
Hi Erik,
On 4/07/2019 6:08 pm, Erik Österlund wrote:
> Hi David,
>
> When you run without TLH, this copying mechanism is used to synchronize
> the safepoint while JavaThreads are running. The interpreter doesn't
> emit any polls then. Instead it clobbers the dispatch table. JavaThreads
> will be reading from the dispatch table while it is being
> (non-atomically) modified. That could crash. For example with the
> Solaris + studio + SPARC - TLH configuration, the compiler will almost
> certainly emit a memcpy (this transformation has been observed in
> practice), the memcpy will use BIS instructions (observed in practice)
> for performance, with out-of-thin-air values (observed in practice), and
> the JavaThreads will occasionally crash during safepoint synchronization
> due to said out-of-thin-air values.
>
> So I guess the problem might be larger back when TLH was not default.
> But this seems conceptually wrong.
I always thought there were two dispatch tables and we simply switched
between them - not copied anything!
David
> /Erik
>
> On 2019-07-04 09:17, David Holmes wrote:
>> Hi Erik,
>>
>> On 4/07/2019 5:10 pm, Erik Österlund wrote:
>>> Hi Dan,
>>>
>>> Thanks for picking this up. The change looks good.
>>>
>>> However, when reviewing this, I looked at the code for actually
>>> restoring the table (ignore/notice safepoints). It copies the
>>> dispatch table for the interpreter. There is a comment stating it is
>>> important the copying is atomic for MT-safety, and I can definitely
>>> see why. However, the copying the line after that comment is in fact
>>> not atomic.
>>
>> Is it assuming "atomicity" by virtue of executing at a safepoint?
>>
>> David
>> -----
>>
>>> Here is the copying code in templateInterpreter.cpp:
>>>
>>> static inline void copy_table(address* from, address* to, int size) {
>>> // Copy non-overlapping tables. The copy has to occur word wise
>>> for MT safety.
>>> while (size-- > 0) *to++ = *from++;
>>> }
>>>
>>> Copying using a loop of non-volatile loads and stores can and
>>> definitely will on some compilers turn into memcpy calls instead as
>>> the compiler (correctly) considers that an equivalent transformation.
>>> And memcpy does not guarantee atomicity. Indeed on some platforms it
>>> is not atomic. On some platforms it will even enjoy out-of-thin-air
>>> values. Perhaps Copy::disjoint_words_atomic() would be a better
>>> choice for atomic word copying. If not, at the very least we should
>>> use Atomic::load/store here.
>>>
>>> Having said that, the fix for that issue seems like a separate RFE,
>>> because it has been sitting there for a lot longer than TLH has been
>>> around.
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 2019-07-04 04:04, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Robbin recently discovered this issue with Thread Local Handshakes.
>>>> Since
>>>> he's not available at the moment, I'm handling the issue:
>>>>
>>>> JDK-8227117 normal interpreter table is not restored after
>>>> single stepping with TLH
>>>> https://bugs.openjdk.java.net/browse/JDK-8227117
>>>>
>>>> When using Thread Local Handshakes, the normal interpreter table is
>>>> not restored after single stepping. This issue is caused by the
>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to
>>>> restore the normal interpreter table for the "off" case.
>>>>
>>>> Prior to Thread Local Handshakes, this was a valid assumption to make.
>>>> SafepointSynchronize::end() has been refactored into
>>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints()
>>>> on the global safepoint branch. That matches up with the call to
>>>> Interpreter::notice_safepoints() that is also on the global safepoint
>>>> branch.
>>>>
>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case
>>>> to call Interpreter::ignore_safepoints() directly.
>>>>
>>>> Here's the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/
>>>>
>>>> The fix is just a small addition to VM_ChangeSingleStep::doit():
>>>>
>>>> if (_on) {
>>>> Interpreter::notice_safepoints();
>>>> + } else {
>>>> + Interpreter::ignore_safepoints();
>>>> }
>>>>
>>>> Everything else is just new logging support for future debugging of
>>>> interpreter table management and single stepping.
>>>>
>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms.
>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now.
>>>>
>>>> Thanks, in advance, for questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>
More information about the serviceability-dev
mailing list