RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v4]
Srinivas Vamsi Parasa
duke at openjdk.org
Fri Jun 2 04:18:08 UTC 2023
On Thu, 1 Jun 2023 17:55:19 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
>> I notice that
>>
>> ```
>> zmm_t ymm_vector<float>::max(zmm_t x, zmm_t y) {
>> return _mm256_max_ps(x, y);
>> }
>> ```
>>
>> This is not quite right, `Arrays.sort` uses the total order imposed by `Double.compare` to sort the array, while `_mm256_max_ps(x, y)` does `x > y ? x : y` which is different.
>
> Hi @merykitty
> The algorithm is working for double as expected (i.e. implementing the total order). For example, for the input below:
> `double[] arrayUnsorted = {-0.0, Double.NaN, 15.75, Double.POSITIVE_INFINITY, -234.4869, Double.NEGATIVE_INFINITY, +0.0, 100.045};`
> It's showing the correct output after sorting as expected:
> `[-Infinity, -234.4869, -0.0, 0.0, 15.75, 100.045, Infinity, NaN]`
>
> Hi @vamsi-parasa! Given https://bugs.openjdk.org/browse/JDK-8295496, I have noticed how important it is to add benchmark cases where the offset and length parameters vary and differ from the usual 0 and whole-array length. It is equally important to warm up with different combinations of them in order to "pollute" the JIT's existing decisions, so that the compiled method (and stubs) behave more like what users would observe in a real-world scenario. Playing with the benchmark parameters like this, together with @theRealAph's advice to try small inputs (which matters a lot), would unveil any performance difference from the current implementation. In addition, I understand from https://github.com/openjdk/jdk/pull/14227/files#diff-1929ace9ae6df116e2fa2a718ed3924d9dae9a2daea454ca9a78177c21477aa3R5237 that this is not yet the case at this stage of the implementation, hence mine is a wish for the final round of this PR. 🙏
Hi @franz1981, thank you for the suggestions! The algorithm has also been tested on sorting only part of an array, i.e. with a non-zero offset and a length shorter than the array. I will upstream those benchmarks/tests as well.
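
To make the two points in this thread concrete, here is a minimal sketch in plain Java. It is not taken from the PR or its tests; the class `TotalOrderDemo` and its method names are hypothetical. It contrasts a scalar model of the `x > y ? x : y` comparison with the total order of `Double.compare`, reproduces the sample input quoted above, and sorts a sub-range with a non-zero offset using the standard `Arrays.sort(double[], int, int)` overload.

```java
import java.util.Arrays;

// Hypothetical demo class for illustration only; not part of the PR.
public class TotalOrderDemo {

    // Scalar model of the `x > y ? x : y` behaviour described in the quoted comment.
    // Returns y whenever the comparison is false, including for NaN and for -0.0 vs 0.0.
    static double naiveMax(double x, double y) {
        return x > y ? x : y;
    }

    // Max under the total order of Double.compare: -0.0 < 0.0, and NaN is the largest value.
    static double totalOrderMax(double x, double y) {
        return Double.compare(x, y) > 0 ? x : y;
    }

    // Sorts a copy of the whole array with the library sort.
    static double[] sortedCopy(double[] a) {
        double[] copy = a.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts only the half-open range [from, to) of a copy; other elements stay in place.
    static double[] sortedRangeCopy(double[] a, int from, int to) {
        double[] copy = a.clone();
        Arrays.sort(copy, from, to);
        return copy;
    }

    public static void main(String[] args) {
        // The naive max depends on operand order for signed zeros and drops a NaN first operand.
        System.out.println(naiveMax(0.0, -0.0));        // -0.0
        System.out.println(totalOrderMax(0.0, -0.0));   // 0.0
        System.out.println(naiveMax(Double.NaN, 1.0));      // 1.0
        System.out.println(totalOrderMax(Double.NaN, 1.0)); // NaN

        // Sample input from the thread.
        double[] arrayUnsorted = {-0.0, Double.NaN, 15.75, Double.POSITIVE_INFINITY,
                                  -234.4869, Double.NEGATIVE_INFINITY, +0.0, 100.045};
        System.out.println(Arrays.toString(sortedCopy(arrayUnsorted)));

        // Sub-range sort with a non-zero offset: only indices 1..4 are reordered.
        double[] partial = {5.0, 3.0, Double.NaN, -0.0, 0.0, 1.0, 9.0};
        System.out.println(Arrays.toString(sortedRangeCopy(partial, 1, 5)));
    }
}
```

A benchmark or test along the lines suggested above would vary `from` and `to` across runs rather than fixing them as this sketch does.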
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1573121022
More information about the hotspot-compiler-dev mailing list