VectorAPI VectorInsert Intrinsic

Rukmannagari, Shravya shravya.rukmannagari at intel.com
Wed Jun 13 19:29:28 UTC 2018


Hi Vladimir,
There is a fix for 512 code, the vmovdqu has to be changed to evmovdquq. Everything else looks good and thanks a lot once again for making these changes. I'll update the patch once your changes are checked in.

Thanks,
Shravya.

-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] 
Sent: Friday, June 8, 2018 5:33 AM
To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; panama-dev at openjdk.java.net
Subject: Re: VectorAPI VectorInsert Intrinsic



On 08/06/2018 02:50, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> Thanks for reviewing the patch. I updated the patch to reflect the changes. Regarding the variable indexed element, we currently haven't given a thought about it, but that seems to be the ultimate design goal for inserting, I can work on it in the next few days.

I'd like to hear other opinions, but IMO it's quite important case to cover w.r.t. predictable performance goal.

Regarding implementation considerations, I'd try to extract variable index variants into vector stubs and share them among all use sites.

Best regards,
Vladimir Ivanov

> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Wednesday, June 6, 2018 12:52 AM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; 
> panama-dev at openjdk.java.net
> Subject: Re: VectorAPI VectorInsert Intrinsic
> 
> 
> Additionally:
> 
> +instruct rvinsert64B(vecZ dst, vecZ src, vecZ tmp, rRegI val, immI 
> +idx) %{
> +  predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
> 
> +instruct rvinsert32S(vecZ dst, vecZ src, vecZ tmp, rRegI val, immI 
> +idx) %{
> +  predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_SHORT);
> 
> +instruct rvinsert16I(vecZ dst, vecZ src, vecZ tmp, rRegI val, immI 
> +idx) %{
> +  predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_INT);
> 
> +instruct rvinsert8L(vecZ dst, vecZ src, vecZ tmp, rRegL val, immI 
> +idx) %{
> +  predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_LONG);
> 
> +  instruct rvinsert16F(vecZ dst, vecZ src, vecZ tmp, regF val, immI idx) %{
> +    predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_FLOAT);
> 
> +  instruct rvinsert8D(vecZ dst, vecZ src, vecZ tmp, regD val, rRegL
> tmp1, immI idx) %{
> +    predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_DOUBLE);
> 
> Shouldn't UseAVX > 2 be used instead?
> 
> 
> Also, I suspect *_avx rules are never used.
> 
> +instruct rvinsert16B(vecX dst, vecX src, rRegI val, immI idx) %{
> +  predicate(UseSSE > 3 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
> 
> +instruct rvinsert16B_avx(vecX dst, vecX src, rRegI val, immI idx) %{
> +  predicate(UseAVX > 0 &&
> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
> 
> Both predicates are true when UseAVX > 0 and the first matching one is 
> used always. Don't you need "UseSSE > 3 && UseAVX == 0" in the first case?
> 
> Best regards,
> Vladimir Ivanov
> 
> On 06/06/2018 02:30, Vladimir Ivanov wrote:
>>
>>> http://cr.openjdk.java.net/~srukmannagar/VectorAPI_vectorInsert/webr
>>> ev.00/
>>>
>>
>> Overall, looks good!
>>
>> +instruct rvinsert32B(vecY dst, vecY src, vecY tmp, rRegI val, immI 
>> +idx) %{
>> ...
>> +    if(id <= 15) {
>> ...
>> +      __ vpinsrb($tmp$$XMMRegister, $tmp$$XMMRegister, 
>> +$val$$Register,
>> id - 16);
>> ...
>>
>> +  instruct rvinsert8D(vecZ dst, vecZ src, vecZ tmp, regD val, rRegL
>> tmp1, immI idx) %{
>> +      } else if(id <= 5) {
>> ...
>> +        __ vpinsrq($tmp$$XMMRegister, $tmp$$XMMRegister,
>> $tmp1$$Register, id - 6);
>>
>> "id - 16", "id - 6". Looks like copy-paste errors.
>>
>>
>>
>> +instruct rvinsert8B(vecD dst, vecD src, rRegI val, immI idx) %{
>> ...
>> +    int id = 0x7 & $idx$$constant;
>>
>> I'd prefer to see "immU3 idx" instead. idx should always be in-bound 
>> and it's validated on JDK side before calling intrinsic.
>>
>>
>> On a more general topic: what are the plans for optimizing variable 
>> indexed element accesses? It would be unfortunate if a non-constant 
>> index usage caused vector boxing.
>>
>> I remember I played with dispatch tables [1] and it worked ok. Are 
>> there better alternatives available?
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1]
>> http://hg.openjdk.java.net/panama/dev/file/dd9ee9434abe/test/jdk/pana
>> ma/snippets/VectorUtils.java#l300
>>


More information about the panama-dev mailing list