RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer
Daniel D. Daugherty
daniel.daugherty at oracle.com
Mon Jul 8 12:35:34 UTC 2019
Hi David,
On 7/7/19 8:08 PM, David Holmes wrote:
> On 7/07/2019 6:48 pm, Erik Osterlund wrote:
>> Yeah that switch statement code and yet another plain non-volatile
>> load/store loop looks like complete nonsense unfortunately. It should
>> at least use Atomic::load/store.
>>
>> Fortunately, on x86_64, I believe it will in practice yield word
>> atomic copying anyway by chance. But it should be fixed anyway. *sigh*
>
> The requirement is for atomic word accesses, which properly aligned
> addresses will always yield - we rely on that guarantee throughout to
> avoid word-tearing. The issue is that the loop may be converted to a
> 'bulk' copying operation that may not provide atomic word accesses. So
> unless the Atomic::load/store will prevent the loop from being
> converted they are not in and off themselves required for correctness
> (else we need to use them nearly everywhere).
>
>> The real danger is SPARC though and its BIS instructions. I don’t
>> have the code in front of me, but I really hope not to see that
>> switch statement and non-volatile loop in that
>> pd_disjoint_words_atomic() function.
>
> sparc uses the same loop.
>
> Let's face it, almost no body expects the compiler to do these kinds
> of transformations. :(
>
>> And I agree that the atomic copying API should be used when we need
>> atomic copying. And if it turns out the implementation of that API is
>> not atomic, it should be fixed in that atomic copying API.
>
> I agree to some extent, but we assume atomic load/stores of words all
> over the place - and rightly so. The issue here is that we need to
> hide the loop inside an API that we can somehow prevent the C++
> compiler from screwing up. It's hardly intuitive or obvious when this
> is needed e.g if I simply copy three adjacent words without a loop
> could the compiler convert that to a block move that is non-atomic?
>
>> So I think this change looks good. But it looks like we are not done
>> yet. :c
>
> I agree that changing the current code to use the atomic copy API to
> convey intent is fine.
Thanks.
Dan
>
> Cheers,
> David
> -----
>
>> Thanks,
>> /Erik
>>
>>> On 7 Jul 2019, at 02:46, Daniel D. Daugherty
>>> <daniel.daugherty at oracle.com> wrote:
>>>
>>> Added Erik O. to the "To:" list...
>>>
>>>
>>>> On 7/6/19 8:05 PM, Daniel D. Daugherty wrote:
>>>>> On 7/6/19 6:06 PM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>>> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> During the code review for the following fix:
>>>>>>
>>>>>> JDK-8227117 normal interpreter table is not restored after
>>>>>> single stepping with TLH
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227117
>>>>>>
>>>>>> Erik O. noticed a potential race with templateInterpreter.cpp:
>>>>>> copy_table()
>>>>>> depending on C++ compiler optimizations. The following bug is
>>>>>> being used
>>>>>> to fix this issue:
>>>>>>
>>>>>> JDK-8227338 templateInterpreter.cpp: copy_table() needs to
>>>>>> be safer
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227338
>>>>>>
>>>>>> Here's the webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/
>>>>>
>>>>> So the original code uses a loop to copy, while the new code calls
>>>>> Copy::disjoint_words_atomic, but the implementation of that on x64
>>>>> is just a loop same as the original AFAICS:
>>>>
>>>> Yup. I figure Erik O. will jump in here with his reasoning... :-)
>>>
>>> Thinking about it more... I think the answer is that we are
>>> switching to
>>> calling code that specifies the type of behavior that we need:
>>>
>>> Copy::disjoint_words_atomic()
>>>
>>> is what we need when we're not at a safepoint. If, down the road, we
>>> find
>>> that the compiler does something with pd_disjoint_words_atomic() that
>>> breaks our expectation for pd_disjoint_words_atomic(), then we fix that
>>> version of pd_disjoint_words_atomic() and all the callers will be good
>>> again... Or something like that...
>>>
>>> The version in
>>> src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp
>>> happens to be exactly our loop with no switch statement... which is
>>> particularly funny given Erik's observations about what at least one
>>> Solaris X64 compiler did to loops...
>>>
>>> Still, I'm just guessing here on a Saturday night... hopefully Erik
>>> will chime in here...
>>>
>>> Dan
>>>
>>>
>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> static void pd_disjoint_words_atomic(const HeapWord* from,
>>>>> HeapWord* to, size_t count) {
>>>>> #ifdef AMD64
>>>>> switch (count) {
>>>>> case 8: to[7] = from[7];
>>>>> case 7: to[6] = from[6];
>>>>> case 6: to[5] = from[5];
>>>>> case 5: to[4] = from[4];
>>>>> case 4: to[3] = from[3];
>>>>> case 3: to[2] = from[2];
>>>>> case 2: to[1] = from[1];
>>>>> case 1: to[0] = from[0];
>>>>> case 0: break;
>>>>> default:
>>>>> while (count-- > 0) {
>>>>> *to++ = *from++;
>>>>> }
>>>>> break;
>>>>> }
>>>>> #else
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual
>>>>>> platforms.
>>>>>> Mach5 tier[4-6] is running now. It has also been tested with the
>>>>>> manual
>>>>>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits.
>>>>>>
>>>>>> Thanks, in advance, for questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>
>>>
>>
More information about the serviceability-dev
mailing list