RFR (M): 6443505: Ideal() function for CmpLTMask
Christian Thalinger
christian.thalinger at oracle.com
Mon Apr 8 10:52:06 PDT 2013
On Apr 5, 2013, at 11:08 AM, David Chase <david.r.chase at oracle.com> wrote:
> http://cr.openjdk.java.net/~drchase/6443505/webrev.01/
>
> Suggestions incorporated, revised and whacked on the sparc and-cmpltmask pattern until it was (1) observed to fire and (2) did not interfere with add-and-cmpltmask.
>
> I kept the branch-free variants in comments of x86, since these were also tested and seen to work.
>
> Revised code seen to work correctly on Sparc and x86, and also passed JPRT targeted to the new unit test.
src/cpu/x86/vm/x86_32.ad:
! format %{ "XOR ECX,ECX\t# cadd_LTMaskA\n\t"
+ format %{ "XORL $tmp, $tmp\t# and_cmpLTMaskA\n\t"
Sometimes you have cmpLT and sometimes not. Always use the instruct name.
+ instruct and_cmpLTMaskB(rRegI p, rRegI q, rRegI y, eFlagsReg cr)
+ %{
Could you move the %{ to the same line to be like the existing ones (I know that 32 and 64 are different; it's annoying):
+ instruct cadd_cmpLTMaskB(rRegI p, rRegI q, rRegI y, eFlagsReg cr) %{
src/cpu/x86/vm/x86_64.ad:
+ /* Better to save a register than avoid a branch
Do we really want to keep the unused code?
-- Chris
>
>
> On 2013-04-01, at 2:01 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>
>>
>> On Mar 28, 2013, at 4:37 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>>> "pipeline" info is not used in code generation, it is outdated anyway. So use what other similar instructions use.
>>>
>>> I think we should use only branch variant on all x86 (32 and 64bit). We save register (tmp) and it is MUCH MUCH more important for performance (less spills on stack). And you need only 3 instructions for cadd_cmpLTMask.
>>
>> I agree on this one. -- Chris
>>
>>>
>>> And, please, remove unneeded spaces near parenthesis, at least in new code.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/27/13 6:09 PM, David Chase wrote:
>>>> Problem:
>>>> (as stated)
>>>> C2 needs a special case like CmpLTMask (turn result of a comparison into a -1/0 mask) for the comparison of a difference with zero.
>>>>
>>>> (as actually observed)
>>>> 1) The improvement in that case is minimal and (very) difficult to get to trigger, because CSE interferes and extracts the p-q from the pattern.
>>>> 2) The original code generation for caddCmpLTMask on some platforms was actually wrong (used carry bit for a signed comparison)
>>>> 3) The original Ideal pattern matching, because of a typo/thinko, accidentally failed to apply in the symmetric case
>>>> 4) Code generation for CmpLTMask on all platforms omitted the somewhat relevant case of and-CmpLTMask; if the very specific pattern failed to apply, then it would fall back to the laborious calculation of an actual mask, when much more compact code could apply.
>>>>
>>>> Fix:
>>>> Repair wrong code generation.
>>>> Write additional pattern for (and (cmpltmask p q) y)
>>>> Fixed the typo so the pattern fires more often.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~drchase/6443505/webrev.00/
>>>>
>>>> Testing:
>>>> Wrote a new test to definitely exercise the two patterns in question.
>>>> Verified that "new test" would fail running with unfixed compiler.
>>>>
>>>> Verified (observing assembly output) that the new patterns matched on x86_32-cmov, x86_32+cmov, x86_64, and Sparc
>>>> (except that I could not get the and-cmpltmask pattern to fire on Sparc.
>>>> Bit of a shame we lack a cumulative coverage tool wired into jtreg so we could easily know if it ever fired at all).
>>>>
>>>> Benchmarked change on x86_64 (saw little or no performance difference on the whole benchmark)
>>>>
>>>> JPRT on compiler tests (clean runs thwarted by irrelevant failures, but it was always the same 2 or 3 borked tests.)
>>>>
>>>> JPRT on just the new test (clean run)
>>>>
>>>> Questions:
>>>> not 100% sure on the pattern costs.
>>>> not 100% sure on the choice of "pipeline".
>>>>
>>
>
More information about the hotspot-compiler-dev
mailing list