RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v2]
Christian Hagedorn
chagedorn at openjdk.org
Thu Mar 28 15:19:33 UTC 2024
On Thu, 28 Mar 2024 13:57:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> **Problem**
>> In [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) / https://git.openjdk.org/jdk/pull/14785 I fixed the alignment with `AlignVector`. For that, I had to compute `abs(scale)` and `abs(stride)`, as well as `scale * stride`. The issue is that all of these values can overflow the int range (e.g. `abs(min_int) = min_int`).
>>
>> We hit asserts like:
>>
>> `# assert(is_power_of_2(value)) failed: value must be a power of 2: 0xffffffff80000000`
>> Happens because we take `abs(min_int)`, which is `min_int = 0x80000000`, and assuming this was a positive (unsigned) number is a power of 2 `2^31`. We then expand it to `long`, get `0xffffffff80000000`, which is not a power of 2 anymore. This violates the implicit assumptions, and we hit the assert.
>>
>> `# assert(q >= 1) failed: modulo value must be large enough`
>> We have `scale = 2^30` and `stride = 4 = 2^2`. For the alignment calculation we compute `scale * stride = 2^32`, which overflows the int range and becomes zero.
>>
>> Before [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) we could get similar issues with the (old) code in `SuperWord::ref_is_alignable`, if `AlignVector` is enabled:
>>
>>
>> int span = preloop_stride * p.scale_in_bytes();
>> ...
>> if (vw % span == 0) {
>>
>>
>> if `span == 0` because of overflow, then the `idiv` from the modulo gets a division by zero -> `SIGFPE`.
>>
>> But it seems the bug is possibly a regression from JDK20 b2 [JDK-8286197](https://bugs.openjdk.org/browse/JDK-8286197). Here we enabled certaint Unsafe memory access address patterns, and it is such patterns that the reproducer requires.
>>
>> **Solution**
>> I could either patch up all the code that works with `scale` and `stride`, and make sure no overflows ever happen. But that is quite involved and error prone.
>>
>> I now just disable vectorization for large `scale` and `stride`. This should not have any performance impact, because such large `scale` and `stride` would lead to highly inefficient memory accesses, since they are spaced very far apart.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> improve comments
I agree with that. That's a reasonable, safe and easy solution.
src/hotspot/share/opto/vectorization.cpp line 396:
> 394: NOT_PRODUCT(_tracer.ctor_6(mem);)
> 395:
> 396: // In the pointer analysis, and especially the AlignVector analysis we assume that
Suggestion:
// In the pointer analysis, and especially the AlignVector, analysis we assume that
src/hotspot/share/opto/vectorization.cpp line 402:
> 400: // to at least allow small and moderately large stride and scale. Therefore, we
> 401: // allow values up to 2^30, which is only a factor 2 smaller than the max/min int
> 402: // values. Normal performance relevant code will have much lower. And the restriction
Suggestion:
// allow values up to 2^30, which is only a factor 2 smaller than the max/min int.
// Normal performance relevant code will have much lower values. And the restriction
-------------
Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18485#pullrequestreview-1966505218
PR Review Comment: https://git.openjdk.org/jdk/pull/18485#discussion_r1543149553
PR Review Comment: https://git.openjdk.org/jdk/pull/18485#discussion_r1543154067
More information about the hotspot-compiler-dev
mailing list