Request for reviews (L): 6761600: Use SSE 4.2 In STTNI Intrinsics

Changpeng Fang Changpeng.Fang at Sun.COM
Tue Feb 10 11:46:01 PST 2009


The updated version of webrev has been posted at:
http://cr.openjdk.java.net/~cfang/6761600/webrev.01

Tom Rodriguez wrote:
> Why is UseSSE42Intrinsics conditional on supports_ht()?  Isn't 
> supports_sse4_2() sufficient?  Also the setting for UseSSE42Intrinsics 
> should take into account whether UseSSE >= 4 instead of having to test 
> for it everywhere with (UseSSE >= 4 && UseSSE42Intrinsics).
Use supports_ht() and UseSSE >=4 to set UseSSE42Intrinsics.

> In the ad file changes you don't need to force regXD6 and regXD7 for 
> temps.  Declare them as regXD and use the TEMP effect instead of the 
> KILL effect.  A TEMP is a synthetic input that interferes with the 
> other inputs but not the output.  As long as the TEMP and the result 
> are different register classes then there's no problem.  You should 
> also have different variants for UseSSE42Intrinsics and 
> !UseSSE42Intrinsics.  The !UseSSE42Intrinsics don't use or need the 
> regXDs right?
>
  Use TEMP effect instead of KILL effect for the two temporary xmm 
registers.

> The string_indexOf triggering criteria in library_call.cpp seems wrong 
> since it skips the optimization for constant strings in preference for 
> the SSE4.2 version.  The constant string seems like it should be 
> preferred shouldn't it?
>
 Now String.indexOf SSE42 intrinsic could accept non-constant argument.

> As an aside, when writing enc_class functions it's better to pass in 
> the operands instead of assuming the register usage in the enc_class 
> function, in the style of the enc_Array_Equals versions.  It's also 
> nicer to have names for registers instead of just using raw register 
> names everywhere.  This all makes it easier to read and easier to 
> change register assignments.  These intrinsics end up using a lot of 
> bound registers because we don't have a way to express that the inputs 
> interfere with the output.  Otherwise we could use unbound masks for 
> all these inputs and likely would generate better code.  Someday I'll 
> add this feature.
>
> Note that the types mentioned in the enc_class declaration are 
> completely meaningless.  They are ignored by the adlc since 
> enc_classes are handled like C macros, as as pure textual substitution.
>
I noticed that for the following case, tmp4Reg and resultReg turn out to 
be the same raw register (thus interfere each other):

**rdx_RegI tmp4, rbx_RegI result**

*+     Register tmp4Reg     = as_Register($tmp4$$reg);*
*+     Register resultReg   = as_Register($result$$reg);*

I am not sure whether result register can also be used as an temporary 
register in the intrinsic body, or not.

> Otherwise this looks good to me.
>
Thanks,

Changpeng

>
> tom
>
> On Feb 6, 2009, at 2:42 PM, Changpeng Fang wrote:
>
>> Thanks. I put the updated webrev in:
>> http://cr.openjdk.java.net/~cfang/6761600/webrev/
>>
>> and I am testing to adjust the masm.jccb stuff.
>>
>> Changpeng
>>
>>
>> Vladimir Kozlov wrote:
>>> Changpeng,
>>>
>>> Good work.
>>>
>>> src/cpu/x86/vm/assembler_x86.cpp:
>>> Remove Macro NOT_LP64() around asserts in new instructions.
>>> VM_Version::supports_sse4_2() should be checked on LP64 also
>>> since not all 64-bits CPUs support it.
>>>
>>>
>>> src/cpu/x86/vm/vm_version_x86_32.cpp:
>>> src/cpu/x86/vm/vm_version_x86_64.cpp:
>>> Remove UseUnalignedLoadStores check in the next code,
>>> it is not needed and it is wrong to have it since it depends on
>>> UseXMMForArrayCopy flag. The top check supports_sse4_2() && 
>>> supports_ht()
>>> is enough for movdqu verification.
>>>
>>> +         if(FLAG_IS_DEFAULT(UseSSE42Intrinsics) &&
>>> +            UseUnalignedLoadStores){ //sse42 intrinsics depend on 
>>> movdqu
>>> +             UseSSE42Intrinsics = true;
>>>          }
>>>
>>>
>>> In both x86_32.ad and x86_64.ad try to use short forward branches 
>>> masm.jccb()
>>> starting from the bottom of the assembler code. Then try to use all
>>> intrinsics with all flags variations and revert back to long jump 
>>> those branches
>>> VM will complain about.
>>>
>>>
>>> cpu/x86/vm/x86_32.ad:
>>> align comments:
>>>
>>> !       masm.andl(rsi, 0xfffffff8); // rsi holds the vector count
>>> !       masm.andl(rdi, 0x00000007);       // rdi holds the tail count
>>>
>>> In enc_String_Equals() and enc_String_IndexOf()
>>> use movptr() for value_offset.
>>>
>>>
>>> src/share/vm/adlc/formssel.cpp:
>>> Missing checks for AryEq.
>>>
>>> src/share/vm/classfile/vmSymbols.hpp:
>>> Remove empty after indexOf_name since equals() also j.l.String method:
>>>     do_name(     indexOf_name,       "indexOf")
>>>
>>>
>>> +   do_intrinsic(_equals, java_lang_String, equals_name, 
>>> object_boolean_signature,  F_R)
>>>
>>>
>>> src/share/vm/opto/library_call.cpp:
>>> Add that it because spec allow to specify NULL as argument:
>>> +   //should not do null check for argument for string_equals.
>>>
>>> Remove next line
>>> +   //argument = do_null_check(argument, T_OBJECT);
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> Changpeng Fang wrote:
>>>> http://webrev.invokedynamic.info/cfang/6761600/index.html
>>>>
>>>> Works on 6761600 : Use SSE 4.2 in intrinsics for String.compareTo(),
>>>> String.equals(), String.indexOf() and Arrays.equals(), among which,
>>>> String.equals() and String.indexOf() are intrinsified for the first 
>>>> time.
>>>>
>>>> Problem: Use SSE 4.2 instructions for STTNI intrinsics for for Nehalem
>>>>              improvements
>>>>
>>>> Solution:
>>>>   Implemented both 32 and 64-bit versions for X86.  Introduced  a new
>>>> flag  UseSSE42Intrinsics, with the default TRUE on systems that 
>>>> support
>>>> SSE4.2 (false otherwise).
>>>>
>>>> Reviewed by:
>>>>
>>>> Fix verified (y/n): y
>>>>
>>>> Other testing:  JPRT,
>>>>
>>>>
>>
>




More information about the hotspot-compiler-dev mailing list