RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jul 5 17:38:37 UTC 2019
On 7/4/19 5:38 AM, David Holmes wrote:
> Hi Erik,
>
> On 4/07/2019 6:08 pm, Erik Österlund wrote:
>> Hi David,
>>
>> When you run without TLH, this copying mechanism is used to
>> synchronize the safepoint while JavaThreads are running. The
>> interpreter doesn't emit any polls then. Instead it clobbers the
>> dispatch table. JavaThreads will be reading from the dispatch table
>> while it is being (non-atomically) modified. That could crash. For
>> example with the Solaris + studio + SPARC - TLH configuration, the
>> compiler will almost certainly emit a memcpy (this transformation has
>> been observed in practice), the memcpy will use BIS instructions
>> (observed in practice) for performance, with out-of-thin-air values
>> (observed in practice), and the JavaThreads will occasionally crash
>> during safepoint synchronization due to said out-of-thin-air values.
>>
>> So I guess the problem might be larger back when TLH was not default.
>> But this seems conceptually wrong.
>
> I always thought there were two dispatch tables and we simply switched
> between them - not copied anything!
Based on my search back into the TeamWare repos, it looks like we have
always copied the table...
Dan
>
> David
>
>> /Erik
>>
>> On 2019-07-04 09:17, David Holmes wrote:
>>> Hi Erik,
>>>
>>> On 4/07/2019 5:10 pm, Erik Österlund wrote:
>>>> Hi Dan,
>>>>
>>>> Thanks for picking this up. The change looks good.
>>>>
>>>> However, when reviewing this, I looked at the code for actually
>>>> restoring the table (ignore/notice safepoints). It copies the
>>>> dispatch table for the interpreter. There is a comment stating it
>>>> is important the copying is atomic for MT-safety, and I can
>>>> definitely see why. However, the copying the line after that
>>>> comment is in fact not atomic.
>>>
>>> Is it assuming "atomicity" by virtue of executing at a safepoint?
>>>
>>> David
>>> -----
>>>
>>>> Here is the copying code in templateInterpreter.cpp:
>>>>
>>>> static inline void copy_table(address* from, address* to, int size) {
>>>> // Copy non-overlapping tables. The copy has to occur word wise
>>>> for MT safety.
>>>> while (size-- > 0) *to++ = *from++;
>>>> }
>>>>
>>>> Copying using a loop of non-volatile loads and stores can and
>>>> definitely will on some compilers turn into memcpy calls instead as
>>>> the compiler (correctly) considers that an equivalent
>>>> transformation. And memcpy does not guarantee atomicity. Indeed on
>>>> some platforms it is not atomic. On some platforms it will even
>>>> enjoy out-of-thin-air values. Perhaps Copy::disjoint_words_atomic()
>>>> would be a better choice for atomic word copying. If not, at the
>>>> very least we should use Atomic::load/store here.
>>>>
>>>> Having said that, the fix for that issue seems like a separate RFE,
>>>> because it has been sitting there for a lot longer than TLH has
>>>> been around.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2019-07-04 04:04, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Robbin recently discovered this issue with Thread Local
>>>>> Handshakes. Since
>>>>> he's not available at the moment, I'm handling the issue:
>>>>>
>>>>> JDK-8227117 normal interpreter table is not restored after
>>>>> single stepping with TLH
>>>>> https://bugs.openjdk.java.net/browse/JDK-8227117
>>>>>
>>>>> When using Thread Local Handshakes, the normal interpreter table is
>>>>> not restored after single stepping. This issue is caused by the
>>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to
>>>>> restore the normal interpreter table for the "off" case.
>>>>>
>>>>> Prior to Thread Local Handshakes, this was a valid assumption to
>>>>> make.
>>>>> SafepointSynchronize::end() has been refactored into
>>>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints()
>>>>> on the global safepoint branch. That matches up with the call to
>>>>> Interpreter::notice_safepoints() that is also on the global safepoint
>>>>> branch.
>>>>>
>>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case
>>>>> to call Interpreter::ignore_safepoints() directly.
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/
>>>>>
>>>>> The fix is just a small addition to VM_ChangeSingleStep::doit():
>>>>>
>>>>> if (_on) {
>>>>> Interpreter::notice_safepoints();
>>>>> + } else {
>>>>> + Interpreter::ignore_safepoints();
>>>>> }
>>>>>
>>>>> Everything else is just new logging support for future debugging of
>>>>> interpreter table management and single stepping.
>>>>>
>>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle
>>>>> platforms.
>>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now.
>>>>>
>>>>> Thanks, in advance, for questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>
More information about the serviceability-dev
mailing list