RFR (M): 6443505: Ideal() function for CmpLTMask

Mon Apr 8 12:04:57 PDT 2013

David,

We already have several thousands lines in .ad files so I would prefer 
to remove unused code. You can leave the comment explaining why we use 
branches instead of cmoves.

Please, remove next lines since we will have comment about that:
// always // predicate(!VM_Version::supports_cmov());

I would like to ask you to convert old LTMask instructions to use 
macroassembler as you did with new one. It is direction in which we move 
.ad files.

sparc.ad:
Why do you need to specify default cost in cmpLTMask0()?
+   ins_cost(DEFAULT_COST);

You don't need -XX:+UnlockDiagnosticVMOptions in the test command since 
you are not using diagnostic flags. CompileOnly is product flag:

src/share/vm/runtime/globals.hpp:  product(ccstrlist, CompileOnly, "",

How different LTMask code in x86_32.ad and x86_64.ad? We have third 
x86.ad file which contains instructions which are the same for 32- and 
64-bit. Note both files are use rRegI now.
As separate changes (rfe) you can look if you can move LTMask (and 
other) instructions into x86.ad. One thing I notice is flag type name 
rFlagsReg vs eFlagsReg. You need to rename it to use rFlagsReg and move 
operand rFlagsReg() into x86.ad.

Thanks,
Vladimir

On 4/5/13 11:08 AM, David Chase wrote:
> http://cr.openjdk.java.net/~drchase/6443505/webrev.01/
>
> Suggestions incorporated, revised and whacked on the sparc and-cmpltmask pattern until it was (1) observed to fire and (2) did not interfere with add-and-cmpltmask.
>
> I kept the branch-free variants in comments of x86, since these were also tested and seen to work.
>
> Revised code seen to work correctly on Sparc and x86, and also passed JPRT targeted to the new unit test.
>
>
> On 2013-04-01, at 2:01 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>
>>
>> On Mar 28, 2013, at 4:37 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>>> "pipeline" info is not used in code generation, it is outdated anyway. So use what other similar instructions use.
>>>
>>> I think we should use only branch variant on all x86 (32 and 64bit). We save register (tmp) and it is MUCH MUCH more important for performance (less spills on stack). And you need only 3 instructions for cadd_cmpLTMask.
>>
>> I agree on this one.  -- Chris
>>
>>>
>>> And, please, remove unneeded spaces near parenthesis, at least in new code.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/27/13 6:09 PM, David Chase wrote:
>>>> Problem:
>>>> (as stated)
>>>> C2 needs a special case like CmpLTMask (turn result of a comparison into a -1/0 mask) for the comparison of a difference with zero.
>>>>
>>>> (as actually observed)
>>>> 1) The improvement in that case is minimal and (very) difficult to get to trigger, because CSE interferes and extracts the p-q from the pattern.
>>>> 2) The original code generation for caddCmpLTMask on some platforms was actually wrong (used carry bit for a signed comparison)
>>>> 3) The original Ideal pattern matching, because of a typo/thinko, accidentally failed to apply in the symmetric case
>>>> 4) Code generation for CmpLTMask on all platforms omitted the somewhat relevant case of and-CmpLTMask; if the very specific pattern failed to apply, then it would fall back to the laborious calculation of an actual mask, when much more compact code could apply.
>>>>
>>>> Fix:
>>>> Repair wrong code generation.
>>>> Write additional pattern for (and (cmpltmask p q) y)
>>>> Fixed the typo so the pattern fires more often.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~drchase/6443505/webrev.00/
>>>>
>>>> Testing:
>>>> Wrote a new test to definitely exercise the two patterns in question.
>>>> Verified that "new test" would fail running with unfixed compiler.
>>>>
>>>> Verified (observing assembly output) that the new patterns matched on x86_32-cmov, x86_32+cmov, x86_64, and Sparc
>>>> (except that I could not get the and-cmpltmask pattern to fire on Sparc.
>>>> Bit of a shame we lack a cumulative coverage tool wired into jtreg so we could easily know if it ever fired at all).
>>>>
>>>> Benchmarked change on x86_64 (saw little or no performance difference on the whole benchmark)
>>>>
>>>> JPRT on compiler tests (clean runs thwarted by irrelevant failures, but it was always the same 2 or 3 borked tests.)
>>>>
>>>> JPRT on just the new test (clean run)
>>>>
>>>> Questions:
>>>> not 100% sure on the pattern costs.
>>>> not 100% sure on the choice of "pipeline".
>>>>
>>
>