RFR: 8325991: Accelerate Poly1305 on x86_64 using AVX2 instructions [v3]

Srinivas Vamsi Parasa duke at openjdk.org
Wed Feb 21 05:40:14 UTC 2024


On Sun, 18 Feb 2024 13:37:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   change overloaded C to use COEFF
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 5150:
> 
>> 5148: 
>> 5149: void Assembler::vpmadd52luq(XMMRegister dst, XMMRegister src1, Address src2, bool merge, int vector_len) {
>> 5150:   assert(VM_Version::supports_avxifma(), "");
> 
> Please add an assertion for vector_len to be either 128 or 256 bit.

Please see the assertions added.

> src/hotspot/cpu/x86/assembler_x86.cpp line 5152:
> 
>> 5150:   assert(VM_Version::supports_avxifma(), "");
>> 5151:   InstructionMark im(this);
>> 5152:   InstructionAttr attributes(vector_len, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true);
> 
> uses_vl should be false, its a VEX encoded instruction.

Updated uses_vl to false.

> src/hotspot/cpu/x86/assembler_x86.cpp line 5155:
> 
>> 5153:   if (merge) {
>> 5154:     attributes.reset_is_clear_context();
>> 5155:   }
> 
> As of now, merge semantics are only applicable to AVX512 instructions accepting an opmask register.

Pls see the merge semantics removed.

> src/hotspot/cpu/x86/assembler_x86.cpp line 5167:
> 
>> 5165: 
>> 5166: void Assembler::vpmadd52luq(XMMRegister dst, XMMRegister src1, XMMRegister src2, bool merge, int vector_len) {
>> 5167:   assert(VM_Version::supports_avxifma(), "");
> 
> assertion for vector lengths, same as above.

Added the assertions.

> src/hotspot/cpu/x86/assembler_x86.cpp line 5221:
> 
>> 5219:     attributes.reset_is_clear_context();
>> 5220:   }
>> 5221: 
> 
> Above comments applicable to this routine also.

Done.

> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 1367:
> 
>> 1365: 
>> 1366:   // VECTOR LOOP: process 4 * 16-byte message blocks at a time
>> 1367:   __ bind(L_process256Loop);
> 
> Add appropriate alignment at the beginning of vector loop.

Please see the stack setup and alignment is being done in lines 1170-1176.

> src/hotspot/cpu/x86/vm_version_x86.hpp line 302:
> 
>> 300:     uint32_t value;
>> 301:   };
>> 302: 
> 
> We can avoid creating additional structures for SefCpuid7Ecx1Ebx, SefCpuid7Ecx1Ecx and SefCpuid7Ecx1Edx. They occupy space over stack and none of their bits are being used.

Please see the unused unions removed in the latest commit pushed.

> src/hotspot/cpu/x86/vm_version_x86.hpp line 482:
> 
>> 480:     SefCpuid7Ecx1Ebx sef_cpuid7_ecx1_ebx;
>> 481:     SefCpuid7Ecx1Ecx sef_cpuid7_ecx1_ecx;
>> 482:     SefCpuid7Ecx1Edx sef_cpuid7_ecx1_edx;
> 
> Please remove SefCpuid7Ecx1Ebx, SefCpuid7Ecx1Ecx and SefCpuid7Ecx1Edx field definitions.

Please see the field declarations removed in the latest commit pushed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496934558
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496934387
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496934855
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496934999
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496935129
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496936195
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496933154
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1496933829


More information about the hotspot-compiler-dev mailing list