RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Jun 19 19:07:44 UTC 2020


Looks good!

Thanks
Maurizio

On 19/06/2020 11:56, Chris Hegarty wrote:
> Paul, Maurizio,
>
> This version incorporates all feedback so far.
>
> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/
> Results on my machine:
>
> Benchmark                          Mode  Cnt      Score      Error  Units
> BulkOps.mismatch_large_bytebuffer  avgt   30  88266.728 ± 4083.476  ns/op
> BulkOps.mismatch_large_segment     avgt   30  86141.343 ± 2450.450  ns/op
> BulkOps.mismatch_small_bytebuffer  avgt   30      6.360 ±    0.425  ns/op
> BulkOps.mismatch_small_segment     avgt   30      4.582 ±    1.040  ns/op
>
> -Chris.
>
>> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>
>> Thanks Chris.
>>
>>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore 
>>> <maurizio.cimadamore at oracle.com> wrote:
>>>
>>> Thanks for looking at this Chris
>>>
>>> On 17/06/2020 21:56, Paul Sandoz wrote:
>>>> Hi Chris,
>>>>
>>>> AbstractMemorySegmentImpl
>>>>>>>>
>>>> In vectorizedMismatchLarge:
>>>>
>>>> 163             if (remaining > 7)
>>>> 164                 throw new InternalError("remaining greater than 7: " + remaining);
>>>> 165             i = length - remaining;
>>>> 166         }
>>>>
>>>> Should this check be an assert?
>>> I suggested that to Chris, since asserts are sometimes enabled when 
>>> running tests and sometimes not - in langtools we moved away from 
>>> using asserts after we realized that in certain cases they were 
>>> silently failing. I'm ok with whatever standard you feel comfortable 
>>> with, though.
>>
>> Automated test runs enable assertions (-ea -esa); perhaps their use 
>> is more commonly accepted in library code than in compiler code. I 
>> would favor an assert in this case if necessary (sometimes they are 
>> avoided to reduce inline thresholds).
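
[Editorial note: the trade-off discussed above can be sketched as follows. This is an illustrative stand-alone snippet, not the actual JDK code, and the class and method names are hypothetical. The point: an explicit check throws regardless of JVM flags, whereas an `assert` is silently skipped unless the JVM runs with `-ea`/`-esa`.]

```java
// Illustrative sketch, not JDK code: an explicit check fires even when
// assertions are disabled (the JVM default), unlike an assert statement.
final class TailCheck {
    // Hypothetical helper mirroring the check quoted in the review above.
    static long tailStart(long length, long remaining) {
        if (remaining > 7)
            throw new InternalError("remaining greater than 7: " + remaining);
        // An "assert remaining <= 7;" here would be a no-op unless the
        // JVM was started with -ea/-esa.
        return length - remaining;
    }
}
```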
>>
>>
>>>>
>>>>>>>>
>>>> This fix prompted me to think more deeply about the implementation 
>>>> of vectorizedMismatchLarge and its use of 
>>>> vectorizedMismatch. Sorry, I should have thought about this more 
>>>> thoroughly earlier on.
>>>>
>>>> We need to refine the approach, not for this patch, but something 
>>>> to follow up after. I think there are two issues.
>>>>
>>>> 1) The intrinsic method vectorizedMismatch could potentially bomb 
>>>> out at any point and return the bitwise complement of the 
>>>> remaining number of elements to check.
>>>>
>>>> Obviously, there is no point doing that arbitrarily, but a stub 
>>>> implementation for, say, x86 AVX-512 might decide to bomb out for 
>>>> < 64 remaining elements, rather than apply vector operations on 
>>>> smaller vector sizes or use a scalar loop. It does not today, but I 
>>>> think we should guard against that potentially happening, otherwise 
>>>> bad things can happen.
>>>
>>> So your worry here is that we'll end up with an infinite loop, right?
>>>
>>
>> Or more likely that an incorrect result is returned, since tail 
>> elements will be skipped over as the offset and size are updated.
>>
>>
>>> If so, we could check remaining against the previous remaining and 
>>> bomb out too if no further progress seems to be made?
>>>
>>
>> I think it's better to always terminate the loop after the last 
>> sub-range is checked, rather than unnecessarily calling it twice.
>>
>> I am not confident the vectorizedMismatch intrinsic stub has been 
>> tested properly on very small lengths, since it's never directly 
>> called in such cases, so also keeping "remaining > 7" is good too.
>>
>> Paul.
>>
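
[Editorial note: Paul's suggestion — terminate the loop once the last sub-range has been checked, rather than relying on the intrinsic's return value for progress — could look roughly like the sketch below. A scalar stand-in replaces the vectorizedMismatch intrinsic; the class name, method names, and chunk bound are illustrative, not the JDK implementation.]

```java
// Sketch only: looping over int-sized sub-ranges of a long region, in the
// spirit of vectorizedMismatchLarge. The loop exits after the final
// sub-range, so a stub that "bombs out" early cannot cause it to spin.
final class LargeMismatch {
    static final int CHUNK = Integer.MAX_VALUE - 8; // illustrative chunk bound

    // Returns the index of the first mismatching byte, or -1 if none.
    static long mismatch(byte[] a, byte[] b, long length) {
        long off = 0;
        while (off < length) {
            int size = (int) Math.min(length - off, CHUNK);
            int i = mismatchChunk(a, b, (int) off, size);
            if (i >= 0)
                return off + i;      // mismatch found within this sub-range
            off += size;             // advance past the fully-checked sub-range
        }
        return -1;                   // loop ends after the last sub-range
    }

    // Scalar stand-in for the vectorizedMismatch intrinsic.
    private static int mismatchChunk(byte[] a, byte[] b, int off, int size) {
        for (int j = 0; j < size; j++)
            if (a[off + j] != b[off + j]) return j;
        return -1;
    }
}
```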
>>>>
>>>> I think the loop should exit when the last sub-range has been 
>>>> checked. We should rely on other tests to ensure the intrinsic 
>>>> method is operating efficiently.
>>>>
>>>>
>>>> 2) This method only works when operating on byte arrays. It will 
>>>> not work correctly if operating on short or long arrays, since we 
>>>> are not adjusting the length and offsets accordingly by the scale. 
>>>> It's probably easier to just rename this as 
>>>> vectorizedMismatchLargeForBytes and drop the log2ArrayIndexScale 
>>>> argument. Then expand later if need be. I still think the method 
>>>> rightly belongs in ArraysSupport.
>>>
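
[Editorial note: the scale issue in point 2 comes down to element counts versus byte counts: a length measured in elements must be shifted left by log2 of the element size before it can serve as a byte length or offset. A minimal illustration, not JDK code:]

```java
// Minimal illustration of log2ArrayIndexScale-style scaling: converting
// an element count into a byte length. Not the JDK implementation.
final class Scale {
    static long byteLength(long elementCount, int log2Scale) {
        // byte[]: scale 0, short[]: 1, int[]: 2, long[]: 3
        return elementCount << log2Scale;
    }
}
```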
>>> Yep - probably good idea to restrict on bytes, for now.
>>>
>>> Maurizio
>>>
>>>>
>>>> Paul.
>>>>
>>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty 
>>>>> <chris.hegarty at oracle.com> wrote:
>>>>>
>>>>> The MemorySegment::mismatch implementation added vectorized 
>>>>> mismatch of long sizes. The implementation is trivial, but is 
>>>>> the starting point for a more optimized implementation, if needed. 
>>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the 
>>>>> bitwise complement of the offset of the first mismatch, where it 
>>>>> should return the bitwise complement of the number of remaining 
>>>>> pairs of elements to be checked in the tail of the two arrays. 
>>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since 
>>>>> it seamlessly compared the remaining tail, which is larger than it 
>>>>> should be.
>>>>>
>>>>> Webrev:
>>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/
>>>>>
>>>>> I updated the existing BulkOps micro-benchmark to cover mismatch. 
>>>>> Here are the results, compared to ByteBuffer::mismatch, on my machine:
>>>>>
>>>>> Benchmark                          Mode  Cnt       Score       Error  Units
>>>>> BulkOps.mismatch_large_bytebuffer  avgt   30  740186.973 ± 119314.207  ns/op
>>>>> BulkOps.mismatch_large_segment     avgt   30  683475.305 ±  76355.043  ns/op
>>>>> BulkOps.mismatch_small_bytebuffer  avgt   30       7.367 ±      0.523  ns/op
>>>>> BulkOps.mismatch_small_segment     avgt   30       4.140 ±      0.602  ns/op
>>>>>
>>>>>
>>>>> -Chris.
>


More information about the core-libs-dev mailing list