RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch

Mon Jun 22 17:36:31 UTC 2020

Paul pointed out an issue, off list, where subsequent subranges
could still result in a call to the intrinsic. That is now fixed in-place
and the test amended to cover the scenario.

https://cr.openjdk.java.net/~chegar/8247696/webrev.01/

-Chris.

> On 19 Jun 2020, at 11:56, Chris Hegarty <chris.hegarty at oracle.com> wrote:
> 
> Paul, Maurizio,
> 
> This version incorporates all feedback so far.
> 
> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/
> Results on my machine:
> 
> Benchmark                          Mode  Cnt       Score        Error  Units
> BulkOps.mismatch_large_bytebuffer  avgt   30   88266.728 ?   4083.476  ns/op
> BulkOps.mismatch_large_segment     avgt   30   86141.343 ?   2450.450  ns/op
> BulkOps.mismatch_small_bytebuffer  avgt   30       6.360 ?      0.425  ns/op
> BulkOps.mismatch_small_segment     avgt   30       4.582 ?      1.040  ns/op
> 
> -Chris.
> 
>> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> Thanks Chris.
>> 
>>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>> 
>>> Thanks for looking at this Chris
>>> 
>>> On 17/06/2020 21:56, Paul Sandoz wrote:
>>>> Hi Chris,
>>>> 
>>>> AbstractMemorySegmentImpl
>>>> —
>>>> 
>>>> In vectorizedMismatchLarge:
>>>> 
>>>> 163             if (remaining > 7)
>>>> 164                 throw new InternalError("remaining greater than 7: " + remaining);
>>>> 165             i = length - remaining;
>>>> 166         }
>>>> 
>>>> Should this check be an assert?
>>> I suggested that to Chris, since sometimes asserts are enabled when running tests, sometimes are not - in langtools we moved away from using asserts as we realized that in certain cases they were silently failing. I'm ok with whatever standard you feel comfortable with though.
>> 
>> Automated running of tests enable assertions (-ea -esa), perhaps the use is more commonly accepted in library code than compiler code. I would favor so in this case if necessary (sometimes they are avoided to reduce inline thresholds).
>> 
>> 
>>>> 
>>>> —
>>>> 
>>>> This fix prompted me to think more deeply about the implementation of vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I should have thought about this more throughly earlier on.
>>>> 
>>>> We need to refine the approach, not for this patch, but something to follow up after. I think there are two issues.
>>>> 
>>>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at any point and return the bitwise compliment of the remaining number of elements to check.
>>>> 
>>>> Obviously, there is no point doing that arbitrarily but a stub implementation for, say, x86 AVX-512 might decide bomb out for  < 64 remaining elements, rather than apply vector operations on smaller vector sizes or use a scalar loop. It does not today, but I think we should guard against that potentially happening, otherwise bad things can happen.
>>> 
>>> So your worry here is that we'll end up with an infinite loop, right?
>>> 
>> 
>> Or more likely that an incorrect result is returned since tail elements will be skipped over as the offset and size is updated.
>> 
>> 
>>> If so, we could check remaining against previous remaining and bomb out too if no further progress seem to be made?
>>> 
>> 
>> I think it's better to always terminate the loop after the last sub-range is checked, rather than unnecessarily calling twice.
>> 
>> I am not confident the vectorizedMismatch intrinsic stub has been tested properly on very small lengths, since it's never directly called in such cases, so also keeping "remaining > 7" is good too. 
>> 
>> Paul.
>> 
>>>> 
>>>> I think the loop should exit when the last sub-range has been checked. We should rely on other tests to ensure the intrinsic method is operating efficiently.
>>>> 
>>>> 
>>>> 2) This method only works when operating on byte arrays. It will not work correctly if operating on short or long arrays, since we are not adjusting the length and offsets accordingly by the scale. It's probably easier to just rename this as vectorizedMismatchLargeForBytes and drop the log2ArrayIndexScale argument. Then expand later if need be. I still think the method rightly belongs in ArraysSupport.
>>> 
>>> Yep - probably good idea to restrict on bytes, for now.
>>> 
>>> Maurizio
>>> 
>>>> 
>>>> Paul.
>>>> 
>>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>>>>> 
>>>>> The MemorySegment::mismatch implementation added vectorized mismatch of long sizes. The implementation is trivial, but the starting point for a more optimized implementation, if needed. ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise complement of the offset of the first mismatch, where it should return the bitwise complement of the number of remaining pairs of elements to be checked in the tail of the two arrays. The AbstractMemorySegmentImpl::mismatch masked this problem, since it seamlessly compared the remaining tail, which is larger than it should be.
>>>>> 
>>>>> Webrev:
>>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/
>>>>> 
>>>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here are the results, compared to ByteBuffer::mismatch, on my machine:
>>>>> 
>>>>> Benchmark                          Mode  Cnt        Score        Error  Units
>>>>> BulkOps.mismatch_large_bytebuffer  avgt   30   740186.973 ? 119314.207  ns/op
>>>>> BulkOps.mismatch_large_segment     avgt   30   683475.305 ?  76355.043  ns/op
>>>>> BulkOps.mismatch_small_bytebuffer  avgt   30        7.367 ?      0.523  ns/op
>>>>> BulkOps.mismatch_small_segment     avgt   30        4.140 ?      0.602  ns/op
>>>>> 
>>>>> 
>>>>> -Chris.
>