Request for reviews (L): 6761600: Use SSE 4.2 In STTNI Intrinsics
Changpeng Fang
Changpeng.Fang at Sun.COM
Tue Feb 10 11:46:01 PST 2009
The updated version of webrev has been posted at:
http://cr.openjdk.java.net/~cfang/6761600/webrev.01
Tom Rodriguez wrote:
> Why is UseSSE42Intrinsics conditional on supports_ht()? Isn't
> supports_sse4_2() sufficient? Also the setting for UseSSE42Intrinsics
> should take into account whether UseSSE >= 4 instead of having to test
> for it everywhere with (UseSSE >= 4 && UseSSE42Intrinsics).
Use supports_ht() and UseSSE >=4 to set UseSSE42Intrinsics.
> In the ad file changes you don't need to force regXD6 and regXD7 for
> temps. Declare them as regXD and use the TEMP effect instead of the
> KILL effect. A TEMP is a synthetic input that interferes with the
> other inputs but not the output. As long as the TEMP and the result
> are different register classes then there's no problem. You should
> also have different variants for UseSSE42Intrinsics and
> !UseSSE42Intrinsics. The !UseSSE42Intrinsics don't use or need the
> regXDs right?
>
Use TEMP effect instead of KILL effect for the two temporary xmm
registers.
> The string_indexOf triggering criteria in library_call.cpp seems wrong
> since it skips the optimization for constant strings in preference for
> the SSE4.2 version. The constant string seems like it should be
> preferred shouldn't it?
>
Now String.indexOf SSE42 intrinsic could accept non-constant argument.
> As an aside, when writing enc_class functions it's better to pass in
> the operands instead of assuming the register usage in the enc_class
> function, in the style of the enc_Array_Equals versions. It's also
> nicer to have names for registers instead of just using raw register
> names everywhere. This all makes it easier to read and easier to
> change register assignments. These intrinsics end up using a lot of
> bound registers because we don't have a way to express that the inputs
> interfere with the output. Otherwise we could use unbound masks for
> all these inputs and likely would generate better code. Someday I'll
> add this feature.
>
> Note that the types mentioned in the enc_class declaration are
> completely meaningless. They are ignored by the adlc since
> enc_classes are handled like C macros, as as pure textual substitution.
>
I noticed that for the following case, tmp4Reg and resultReg turn out to
be the same raw register (thus interfere each other):
**rdx_RegI tmp4, rbx_RegI result**
*+ Register tmp4Reg = as_Register($tmp4$$reg);*
*+ Register resultReg = as_Register($result$$reg);*
I am not sure whether result register can also be used as an temporary
register in the intrinsic body, or not.
> Otherwise this looks good to me.
>
Thanks,
Changpeng
>
> tom
>
> On Feb 6, 2009, at 2:42 PM, Changpeng Fang wrote:
>
>> Thanks. I put the updated webrev in:
>> http://cr.openjdk.java.net/~cfang/6761600/webrev/
>>
>> and I am testing to adjust the masm.jccb stuff.
>>
>> Changpeng
>>
>>
>> Vladimir Kozlov wrote:
>>> Changpeng,
>>>
>>> Good work.
>>>
>>> src/cpu/x86/vm/assembler_x86.cpp:
>>> Remove Macro NOT_LP64() around asserts in new instructions.
>>> VM_Version::supports_sse4_2() should be checked on LP64 also
>>> since not all 64-bits CPUs support it.
>>>
>>>
>>> src/cpu/x86/vm/vm_version_x86_32.cpp:
>>> src/cpu/x86/vm/vm_version_x86_64.cpp:
>>> Remove UseUnalignedLoadStores check in the next code,
>>> it is not needed and it is wrong to have it since it depends on
>>> UseXMMForArrayCopy flag. The top check supports_sse4_2() &&
>>> supports_ht()
>>> is enough for movdqu verification.
>>>
>>> + if(FLAG_IS_DEFAULT(UseSSE42Intrinsics) &&
>>> + UseUnalignedLoadStores){ //sse42 intrinsics depend on
>>> movdqu
>>> + UseSSE42Intrinsics = true;
>>> }
>>>
>>>
>>> In both x86_32.ad and x86_64.ad try to use short forward branches
>>> masm.jccb()
>>> starting from the bottom of the assembler code. Then try to use all
>>> intrinsics with all flags variations and revert back to long jump
>>> those branches
>>> VM will complain about.
>>>
>>>
>>> cpu/x86/vm/x86_32.ad:
>>> align comments:
>>>
>>> ! masm.andl(rsi, 0xfffffff8); // rsi holds the vector count
>>> ! masm.andl(rdi, 0x00000007); // rdi holds the tail count
>>>
>>> In enc_String_Equals() and enc_String_IndexOf()
>>> use movptr() for value_offset.
>>>
>>>
>>> src/share/vm/adlc/formssel.cpp:
>>> Missing checks for AryEq.
>>>
>>> src/share/vm/classfile/vmSymbols.hpp:
>>> Remove empty after indexOf_name since equals() also j.l.String method:
>>> do_name( indexOf_name, "indexOf")
>>>
>>>
>>> + do_intrinsic(_equals, java_lang_String, equals_name,
>>> object_boolean_signature, F_R)
>>>
>>>
>>> src/share/vm/opto/library_call.cpp:
>>> Add that it because spec allow to specify NULL as argument:
>>> + //should not do null check for argument for string_equals.
>>>
>>> Remove next line
>>> + //argument = do_null_check(argument, T_OBJECT);
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> Changpeng Fang wrote:
>>>> http://webrev.invokedynamic.info/cfang/6761600/index.html
>>>>
>>>> Works on 6761600 : Use SSE 4.2 in intrinsics for String.compareTo(),
>>>> String.equals(), String.indexOf() and Arrays.equals(), among which,
>>>> String.equals() and String.indexOf() are intrinsified for the first
>>>> time.
>>>>
>>>> Problem: Use SSE 4.2 instructions for STTNI intrinsics for for Nehalem
>>>> improvements
>>>>
>>>> Solution:
>>>> Implemented both 32 and 64-bit versions for X86. Introduced a new
>>>> flag UseSSE42Intrinsics, with the default TRUE on systems that
>>>> support
>>>> SSE4.2 (false otherwise).
>>>>
>>>> Reviewed by:
>>>>
>>>> Fix verified (y/n): y
>>>>
>>>> Other testing: JPRT,
>>>>
>>>>
>>
>
More information about the hotspot-compiler-dev
mailing list