VectorAPI VectorInsert Intrinsic
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Thu Jun 14 13:04:17 UTC 2018
On 14/06/2018 02:00, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> Yes, I was referring to this patch.
Thanks for testing & review! Pushed:
http://hg.openjdk.java.net/panama/dev/rev/e00750ed585a
Best regards,
Vladimir Ivanov
>
> Thanks,
> Shravya.
>
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Wednesday, June 13, 2018 3:55 PM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; panama-dev at openjdk.java.net
> Subject: Re: VectorAPI VectorInsert Intrinsic
>
> Shravya,
>
>> There is a fix for 512 code, the vmovdqu has to be changed to evmovdquq. Everything else looks good and thanks a lot once again for making these changes. I'll update the patch once your changes are checked in.
>
> Are you talking about the following patch?
> http://cr.openjdk.java.net/~vlivanov/panama/vector/insert/webrev.00/
>
> Best regards,
> Vladimir Ivanov
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Friday, June 8, 2018 5:33 AM
>> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>;
>> panama-dev at openjdk.java.net
>> Subject: Re: VectorAPI VectorInsert Intrinsic
>>
>>
>>
>> On 08/06/2018 02:50, Rukmannagari, Shravya wrote:
>>> Hi Vladimir,
>>> Thanks for reviewing the patch. I updated the patch to reflect the changes. Regarding the variable indexed element, we currently haven't given a thought about it, but that seems to be the ultimate design goal for inserting, I can work on it in the next few days.
>>
>> I'd like to hear other opinions, but IMO it's quite important case to cover w.r.t. predictable performance goal.
>>
>> Regarding implementation considerations, I'd try to extract variable index variants into vector stubs and share them among all use sites.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>>> Sent: Wednesday, June 6, 2018 12:52 AM
>>> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>;
>>> panama-dev at openjdk.java.net
>>> Subject: Re: VectorAPI VectorInsert Intrinsic
>>>
>>>
>>> Additionally:
>>>
>>> +instruct rvinsert64B(vecZ dst, vecZ src, vecZ tmp, rRegI val, immI
>>> +idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>>>
>>> +instruct rvinsert32S(vecZ dst, vecZ src, vecZ tmp, rRegI val, immI
>>> +idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_SHORT);
>>>
>>> +instruct rvinsert16I(vecZ dst, vecZ src, vecZ tmp, rRegI val, immI
>>> +idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_INT);
>>>
>>> +instruct rvinsert8L(vecZ dst, vecZ src, vecZ tmp, rRegL val, immI
>>> +idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_LONG);
>>>
>>> + instruct rvinsert16F(vecZ dst, vecZ src, vecZ tmp, regF val, immI idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_FLOAT);
>>>
>>> + instruct rvinsert8D(vecZ dst, vecZ src, vecZ tmp, regD val, rRegL
>>> tmp1, immI idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_DOUBLE);
>>>
>>> Shouldn't UseAVX > 2 be used instead?
>>>
>>>
>>> Also, I suspect *_avx rules are never used.
>>>
>>> +instruct rvinsert16B(vecX dst, vecX src, rRegI val, immI idx) %{
>>> + predicate(UseSSE > 3 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>>>
>>> +instruct rvinsert16B_avx(vecX dst, vecX src, rRegI val, immI idx) %{
>>> + predicate(UseAVX > 0 &&
>>> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>>>
>>> Both predicates are true when UseAVX > 0 and the first matching one
>>> is used always. Don't you need "UseSSE > 3 && UseAVX == 0" in the first case?
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 06/06/2018 02:30, Vladimir Ivanov wrote:
>>>>
>>>>> http://cr.openjdk.java.net/~srukmannagar/VectorAPI_vectorInsert/web
>>>>> r
>>>>> ev.00/
>>>>>
>>>>
>>>> Overall, looks good!
>>>>
>>>> +instruct rvinsert32B(vecY dst, vecY src, vecY tmp, rRegI val, immI
>>>> +idx) %{
>>>> ...
>>>> + if(id <= 15) {
>>>> ...
>>>> + __ vpinsrb($tmp$$XMMRegister, $tmp$$XMMRegister,
>>>> +$val$$Register,
>>>> id - 16);
>>>> ...
>>>>
>>>> + instruct rvinsert8D(vecZ dst, vecZ src, vecZ tmp, regD val, rRegL
>>>> tmp1, immI idx) %{
>>>> + } else if(id <= 5) {
>>>> ...
>>>> + __ vpinsrq($tmp$$XMMRegister, $tmp$$XMMRegister,
>>>> $tmp1$$Register, id - 6);
>>>>
>>>> "id - 16", "id - 6". Looks like copy-paste errors.
>>>>
>>>>
>>>>
>>>> +instruct rvinsert8B(vecD dst, vecD src, rRegI val, immI idx) %{
>>>> ...
>>>> + int id = 0x7 & $idx$$constant;
>>>>
>>>> I'd prefer to see "immU3 idx" instead. idx should always be in-bound
>>>> and it's validated on JDK side before calling intrinsic.
>>>>
>>>>
>>>> On a more general topic: what are the plans for optimizing variable
>>>> indexed element accesses? It would be unfortunate if a non-constant
>>>> index usage caused vector boxing.
>>>>
>>>> I remember I played with dispatch tables [1] and it worked ok. Are
>>>> there better alternatives available?
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> [1]
>>>> http://hg.openjdk.java.net/panama/dev/file/dd9ee9434abe/test/jdk/pan
>>>> a
>>>> ma/snippets/VectorUtils.java#l300
>>>>
More information about the panama-dev
mailing list