RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Jun 19 19:07:44 UTC 2020
Looks good!
Thanks
Maurizio
On 19/06/2020 11:56, Chris Hegarty wrote:
> Paul, Maurizio,
>
> This version incorporates all feedback so far.
>
> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/
> Results on my machine:
>
> Benchmark                           Mode  Cnt      Score      Error  Units
> BulkOps.mismatch_large_bytebuffer   avgt   30  88266.728 ± 4083.476  ns/op
> BulkOps.mismatch_large_segment      avgt   30  86141.343 ± 2450.450  ns/op
> BulkOps.mismatch_small_bytebuffer   avgt   30      6.360 ±    0.425  ns/op
> BulkOps.mismatch_small_segment      avgt   30      4.582 ±    1.040  ns/op
>
> -Chris.
>
>> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>
>> Thanks Chris.
>>
>>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>>
>>> Thanks for looking at this Chris
>>>
>>> On 17/06/2020 21:56, Paul Sandoz wrote:
>>>> Hi Chris,
>>>>
>>>> AbstractMemorySegmentImpl
>>>> —
>>>>
>>>> In vectorizedMismatchLarge:
>>>>
>>>> 163         if (remaining > 7)
>>>> 164             throw new InternalError("remaining greater than 7: " + remaining);
>>>> 165         i = length - remaining;
>>>> 166     }
>>>>
>>>> Should this check be an assert?
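>>>>
>>>> Something like the following one-liner, in place of the explicit check at
>>>> 163-164 (just a sketch, not the webrev code):
>>>>
>>>>     assert remaining <= 7 : "remaining greater than 7: " + remaining;
>>>>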
>>> I suggested that to Chris, since asserts are sometimes enabled when
>>> running tests and sometimes not - in langtools we moved away from
>>> using asserts after realizing that in certain cases they were
>>> silently failing. I'm ok with whatever standard you feel comfortable
>>> with, though.
>>
>> Automated test runs enable assertions (-ea -esa); perhaps their use is
>> more commonly accepted in library code than in compiler code. I would
>> favor an assert in this case, if the check is needed at all (sometimes
>> asserts are avoided to keep methods under inlining thresholds).
>>
>>
>>>>
>>>> —
>>>>
>>>> This fix prompted me to think more deeply about the implementation
>>>> of vectorizedMismatchLarge and its use of
>>>> vectorizedMismatch. Sorry, I should have thought about this more
>>>> thoroughly earlier on.
>>>>
>>>> We need to refine the approach, not for this patch, but something
>>>> to follow up after. I think there are two issues.
>>>>
>>>> 1) The intrinsic method vectorizedMismatch could potentially bomb
>>>> out at any point and return the bitwise complement of the
>>>> remaining number of elements to check.
>>>>
>>>> Obviously, there is no point doing that arbitrarily, but a stub
>>>> implementation for, say, x86 AVX-512 might decide to bomb out for <
>>>> 64 remaining elements, rather than apply vector operations on
>>>> smaller vector sizes or use a scalar loop. It does not do so today,
>>>> but I think we should guard against that potentially happening,
>>>> otherwise bad things can happen.
>>>
>>> So your worry here is that we'll end up with an infinite loop, right?
>>>
>>
>> Or more likely that an incorrect result is returned, since tail
>> elements will be skipped over as the offset and size are updated.
>>
>>
>>> If so, we could check remaining against the previous remaining and
>>> bomb out too if no further progress seems to be made?
>>>
>>
>> I think it's better to always terminate the loop after the last
>> sub-range is checked, rather than unnecessarily calling the intrinsic
>> twice.
>>
>> I am not confident the vectorizedMismatch intrinsic stub has been
>> tested properly on very small lengths, since it's never directly
>> called in such cases, so keeping the "remaining > 7" check is good too.
>>
>> Paul.
>>
>>>>
>>>> I think the loop should exit when the last sub-range has been
>>>> checked. We should rely on other tests to ensure the intrinsic
>>>> method is operating efficiently.
>>>>
>>>>
>>>> 2) This method only works when operating on byte arrays. It will
>>>> not work correctly when operating on short or long arrays, since we
>>>> are not adjusting the length and offsets by the scale. It's probably
>>>> easier to just rename this to vectorizedMismatchLargeForBytes and
>>>> drop the log2ArrayIndexScale argument, then expand later if need be.
>>>> I still think the method rightly belongs in ArraysSupport.
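>>>>
>>>> Roughly the shape I have in mind - an untested sketch only, assuming the
>>>> method sits next to vectorizedMismatch in ArraysSupport and reuses its
>>>> LOG2_ARRAY_BYTE_INDEX_SCALE constant; sub-range size and local names are
>>>> illustrative. It is restricted to bytes and exits after the last
>>>> sub-range, so a stub that bails out early cannot cause the loop to skip
>>>> the tail or fail to make progress:
>>>>
>>>>     public static long vectorizedMismatchLargeForBytes(Object a, long aOffset,
>>>>                                                        Object b, long bOffset,
>>>>                                                        long length) {
>>>>         long off = 0;
>>>>         long remaining = length;
>>>>         boolean lastSubRange = false;
>>>>         while (remaining > 7 && !lastSubRange) {
>>>>             int size;
>>>>             if (remaining > Integer.MAX_VALUE) {
>>>>                 size = Integer.MAX_VALUE;
>>>>             } else {
>>>>                 size = (int) remaining;
>>>>                 lastSubRange = true;   // terminate after this sub-range
>>>>             }
>>>>             int i = vectorizedMismatch(a, aOffset + off,
>>>>                                        b, bOffset + off,
>>>>                                        size, LOG2_ARRAY_BYTE_INDEX_SCALE);
>>>>             if (i >= 0)
>>>>                 return off + i;        // mismatch found in this sub-range
>>>>             int checked = size - ~i;   // ~i = elements the stub left unchecked
>>>>             off += checked;
>>>>             remaining -= checked;
>>>>         }
>>>>         return ~remaining;             // no mismatch; ~remaining tells the
>>>>                                        // caller how much tail is left to scan
>>>>     }
>>>>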
>>>
>>> Yep - probably a good idea to restrict to bytes, for now.
>>>
>>> Maurizio
>>>
>>>>
>>>> Paul.
>>>>
>>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>>>>>
>>>>> The MemorySegment::mismatch implementation added vectorized
>>>>> mismatch for long sizes. The implementation is trivial, but it is
>>>>> a starting point for a more optimized implementation, if needed.
>>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the
>>>>> bitwise complement of the offset of the first mismatch, where it
>>>>> should return the bitwise complement of the number of remaining
>>>>> pairs of elements to be checked in the tail of the two arrays.
>>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it
>>>>> seamlessly compared the remaining tail, which is larger than it
>>>>> should be.
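>>>>>
>>>>> To illustrate the intended contract, here is a rough sketch of the
>>>>> caller-side handling (signature abbreviated; the local variables are
>>>>> made up for illustration only):
>>>>>
>>>>>     long r = vectorizedMismatchLarge(a, aOffset, b, bOffset, length,
>>>>>                                      log2ArrayIndexScale);
>>>>>     if (r >= 0)
>>>>>         return r;                        // offset of the first mismatch
>>>>>     long remaining = ~r;                 // trailing elements left unchecked
>>>>>     long tailStart = length - remaining;
>>>>>     // a scalar loop compares the last `remaining` element pairs from
>>>>>     // tailStart; returning ~offset instead of ~remaining makes this
>>>>>     // tail larger than it needs to be
>>>>>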
>>>>>
>>>>> Webrev:
>>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/
>>>>>
>>>>> I updated the existing BulkOps micro-benchmark to cover mismatch.
>>>>> Here are the results, compared to ByteBuffer::mismatch, on my machine:
>>>>>
>>>>> Benchmark                           Mode  Cnt       Score        Error  Units
>>>>> BulkOps.mismatch_large_bytebuffer   avgt   30  740186.973 ± 119314.207  ns/op
>>>>> BulkOps.mismatch_large_segment      avgt   30  683475.305 ±  76355.043  ns/op
>>>>> BulkOps.mismatch_small_bytebuffer   avgt   30       7.367 ±      0.523  ns/op
>>>>> BulkOps.mismatch_small_segment      avgt   30       4.140 ±      0.602  ns/op
>>>>>
>>>>>
>>>>> -Chris.
>