RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer

Mon Jul 8 12:35:34 UTC 2019

Hi David,

On 7/7/19 8:08 PM, David Holmes wrote:
> On 7/07/2019 6:48 pm, Erik Osterlund wrote:
>> Yeah that switch statement code and yet another plain non-volatile 
>> load/store loop looks like complete nonsense unfortunately. It should 
>> at least use Atomic::load/store.
>>
>> Fortunately, on x86_64, I believe it will in practice yield word 
>> atomic copying anyway by chance. But it should be fixed anyway. *sigh*
>
> The requirement is for atomic word accesses, which properly aligned 
> addresses will always yield - we rely on that guarantee throughout to 
> avoid word-tearing. The issue is that the loop may be converted to a 
> 'bulk' copying operation that may not provide atomic word accesses. So 
> unless the Atomic::load/store will prevent the loop from being 
> converted they are not in and off themselves required for correctness 
> (else we need to use them nearly everywhere).
>
>> The real danger is SPARC though and its BIS instructions. I don’t 
>> have the code in front of me, but I really hope not to see that 
>> switch statement and non-volatile loop in that 
>> pd_disjoint_words_atomic() function.
>
> sparc uses the same loop.
>
> Let's face it, almost no body expects the compiler to do these kinds 
> of transformations. :(
>
>> And I agree that the atomic copying API should be used when we need 
>> atomic copying. And if it turns out the implementation of that API is 
>> not atomic, it should be fixed in that atomic copying API.
>
> I agree to some extent, but we assume atomic load/stores of words all 
> over the place - and rightly so. The issue here is that we need to 
> hide the loop inside an API that we can somehow prevent the C++ 
> compiler from screwing up. It's hardly intuitive or obvious when this 
> is needed e.g if I simply copy three adjacent words without a loop 
> could the compiler convert that to a block move that is non-atomic?
>
>> So I think this change looks good. But it looks like we are not done 
>> yet. :c
>
> I agree that changing the current code to use the atomic copy API to 
> convey intent is fine.

Thanks.

Dan

>
> Cheers,
> David
> -----
>
>> Thanks,
>> /Erik
>>
>>> On 7 Jul 2019, at 02:46, Daniel D. Daugherty 
>>> <daniel.daugherty at oracle.com> wrote:
>>>
>>> Added Erik O. to the "To:" list...
>>>
>>>
>>>> On 7/6/19 8:05 PM, Daniel D. Daugherty wrote:
>>>>> On 7/6/19 6:06 PM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>>> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> During the code review for the following fix:
>>>>>>
>>>>>>       JDK-8227117 normal interpreter table is not restored after 
>>>>>> single stepping with TLH
>>>>>>       https://bugs.openjdk.java.net/browse/JDK-8227117
>>>>>>
>>>>>> Erik O. noticed a potential race with templateInterpreter.cpp: 
>>>>>> copy_table()
>>>>>> depending on C++ compiler optimizations. The following bug is 
>>>>>> being used
>>>>>> to fix this issue:
>>>>>>
>>>>>>       JDK-8227338 templateInterpreter.cpp: copy_table() needs to 
>>>>>> be safer
>>>>>>       https://bugs.openjdk.java.net/browse/JDK-8227338
>>>>>>
>>>>>> Here's the webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/
>>>>>
>>>>> So the original code uses a loop to copy, while the new code calls 
>>>>> Copy::disjoint_words_atomic, but the implementation of that on x64 
>>>>> is just a loop same as the original AFAICS:
>>>>
>>>> Yup. I figure Erik O. will jump in here with his reasoning... :-)
>>>
>>> Thinking about it more... I think the answer is that we are 
>>> switching to
>>> calling code that specifies the type of behavior that we need:
>>>
>>>      Copy::disjoint_words_atomic()
>>>
>>> is what we need when we're not at a safepoint. If, down the road, we 
>>> find
>>> that the compiler does something with pd_disjoint_words_atomic() that
>>> breaks our expectation for pd_disjoint_words_atomic(), then we fix that
>>> version of pd_disjoint_words_atomic() and all the callers will be good
>>> again...  Or something like that...
>>>
>>> The version in 
>>> src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp
>>> happens to be exactly our loop with no switch statement... which is
>>> particularly funny given Erik's observations about what at least one
>>> Solaris X64 compiler did to loops...
>>>
>>> Still, I'm just guessing here on a Saturday night... hopefully Erik
>>> will chime in here...
>>>
>>> Dan
>>>
>>>
>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> static void pd_disjoint_words_atomic(const HeapWord* from, 
>>>>> HeapWord* to, size_t count) {
>>>>> #ifdef AMD64
>>>>>    switch (count) {
>>>>>    case 8:  to[7] = from[7];
>>>>>    case 7:  to[6] = from[6];
>>>>>    case 6:  to[5] = from[5];
>>>>>    case 5:  to[4] = from[4];
>>>>>    case 4:  to[3] = from[3];
>>>>>    case 3:  to[2] = from[2];
>>>>>    case 2:  to[1] = from[1];
>>>>>    case 1:  to[0] = from[0];
>>>>>    case 0:  break;
>>>>>    default:
>>>>>      while (count-- > 0) {
>>>>>        *to++ = *from++;
>>>>>      }
>>>>>      break;
>>>>>    }
>>>>> #else
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual 
>>>>>> platforms.
>>>>>> Mach5 tier[4-6] is running now. It has also been tested with the 
>>>>>> manual
>>>>>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits.
>>>>>>
>>>>>> Thanks, in advance, for questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>
>>>
>>