RFR (M): 6443505: Ideal() function for CmpLTMask

Tue Apr 9 08:50:22 PDT 2013

Leave only first lines commented to see matching and remove the rest:

   /* If I enable this, I encourage spilling in the inner loop of compress.
   instruct cadd_cmpLTMask_mem( ncxRegI p, ncxRegI q, memory y, eCXRegI 
tmp, eFlagsReg cr ) %{
   match(Set p (AddI (AndI (CmpLTMask p q) (LoadI y)) (SubI p q)));
   */

Vladimir

On 4/9/13 5:10 AM, David Chase wrote:
> Any reason I should not delete:
>
>    enc_class enc_cmpLTP_mem(rRegI p, rRegI q, memory mem, eCXRegI tmp) %{    // cadd_cmpLT
>
> It's only called from a comment:
>
> /* If I enable this, I encourage spilling in the inner loop of compress.
> instruct cadd_cmpLTMask_mem(ncxRegI p, ncxRegI q, memory y, eCXRegI tmp, eFlagsReg cr) %{
>
> And the code that it generates is wrong.
> And it uses the pre-macro-assembler.
>
> David
>
>
> On 2013-04-08, at 3:04 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> David,
>>
>> We already have several thousands lines in .ad files so I would prefer to remove unused code. You can leave the comment explaining why we use branches instead of cmoves.
>>
>> Please, remove next lines since we will have comment about that:
>> // always // predicate(!VM_Version::supports_cmov());
>>
>> I would like to ask you to convert old LTMask instructions to use macroassembler as you did with new one. It is direction in which we move .ad files.
>>
>> sparc.ad:
>> Why do you need to specify default cost in cmpLTMask0()?
>> +   ins_cost(DEFAULT_COST);
>>
>> You don't need -XX:+UnlockDiagnosticVMOptions in the test command since you are not using diagnostic flags. CompileOnly is product flag:
>>
>> src/share/vm/runtime/globals.hpp:  product(ccstrlist, CompileOnly, "",
>>
>>
>> How different LTMask code in x86_32.ad and x86_64.ad? We have third x86.ad file which contains instructions which are the same for 32- and 64-bit. Note both files are use rRegI now.
>> As separate changes (rfe) you can look if you can move LTMask (and other) instructions into x86.ad. One thing I notice is flag type name rFlagsReg vs eFlagsReg. You need to rename it to use rFlagsReg and move operand rFlagsReg() into x86.ad.
>>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/5/13 11:08 AM, David Chase wrote:
>>> http://cr.openjdk.java.net/~drchase/6443505/webrev.01/
>>>
>>> Suggestions incorporated, revised and whacked on the sparc and-cmpltmask pattern until it was (1) observed to fire and (2) did not interfere with add-and-cmpltmask.
>>>
>>> I kept the branch-free variants in comments of x86, since these were also tested and seen to work.
>>>
>>> Revised code seen to work correctly on Sparc and x86, and also passed JPRT targeted to the new unit test.
>>>
>>>
>>> On 2013-04-01, at 2:01 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>>
>>>>
>>>> On Mar 28, 2013, at 4:37 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>>> "pipeline" info is not used in code generation, it is outdated anyway. So use what other similar instructions use.
>>>>>
>>>>> I think we should use only branch variant on all x86 (32 and 64bit). We save register (tmp) and it is MUCH MUCH more important for performance (less spills on stack). And you need only 3 instructions for cadd_cmpLTMask.
>>>>
>>>> I agree on this one.  -- Chris
>>>>
>>>>>
>>>>> And, please, remove unneeded spaces near parenthesis, at least in new code.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/27/13 6:09 PM, David Chase wrote:
>>>>>> Problem:
>>>>>> (as stated)
>>>>>> C2 needs a special case like CmpLTMask (turn result of a comparison into a -1/0 mask) for the comparison of a difference with zero.
>>>>>>
>>>>>> (as actually observed)
>>>>>> 1) The improvement in that case is minimal and (very) difficult to get to trigger, because CSE interferes and extracts the p-q from the pattern.
>>>>>> 2) The original code generation for caddCmpLTMask on some platforms was actually wrong (used carry bit for a signed comparison)
>>>>>> 3) The original Ideal pattern matching, because of a typo/thinko, accidentally failed to apply in the symmetric case
>>>>>> 4) Code generation for CmpLTMask on all platforms omitted the somewhat relevant case of and-CmpLTMask; if the very specific pattern failed to apply, then it would fall back to the laborious calculation of an actual mask, when much more compact code could apply.
>>>>>>
>>>>>> Fix:
>>>>>> Repair wrong code generation.
>>>>>> Write additional pattern for (and (cmpltmask p q) y)
>>>>>> Fixed the typo so the pattern fires more often.
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~drchase/6443505/webrev.00/
>>>>>>
>>>>>> Testing:
>>>>>> Wrote a new test to definitely exercise the two patterns in question.
>>>>>> Verified that "new test" would fail running with unfixed compiler.
>>>>>>
>>>>>> Verified (observing assembly output) that the new patterns matched on x86_32-cmov, x86_32+cmov, x86_64, and Sparc
>>>>>> (except that I could not get the and-cmpltmask pattern to fire on Sparc.
>>>>>> Bit of a shame we lack a cumulative coverage tool wired into jtreg so we could easily know if it ever fired at all).
>>>>>>
>>>>>> Benchmarked change on x86_64 (saw little or no performance difference on the whole benchmark)
>>>>>>
>>>>>> JPRT on compiler tests (clean runs thwarted by irrelevant failures, but it was always the same 2 or 3 borked tests.)
>>>>>>
>>>>>> JPRT on just the new test (clean run)
>>>>>>
>>>>>> Questions:
>>>>>> not 100% sure on the pattern costs.
>>>>>> not 100% sure on the choice of "pipeline".
>>>>>>
>>>>
>>>
>