Request for reviews (XL): 7119644: Increase superword's vector size up to 256 bits

Mon Apr 9 14:56:01 PDT 2012

VecX and VecY aren't great names.  Why not do Op_Vec{4,8,16,32} for these instead?  Maybe it needs to a trailing identifier like B to indicate the sizing?

sparc.ad:

apart from moving the code around and the new predicates, did the instruction definitions change in any other way?

x86.ad:

Please make sure all that  %{ are on the preceding lines instead of by themselves.

Is is possible to share the new spill copy code in here instead of duplicating it?

chaitin.cpp:

Why do we need the machine dependent ShouldNotReachHeres?  Aren't the asserts enough?

Can't this:

        if( lrg.num_regs() == 1 ) {
          reg = tempmask.find_first_elem();
+       } else if (lrg._is_vector) {
+         tempmask.clear_to_sets(lrg.num_regs());
+         reg = tempmask.find_first_set(lrg.num_regs());
        } else {
!         tempmask.ClearToPairs();
!         tempmask.clear_to_pairs();
          reg = tempmask.find_first_pair();
        }

be simplified to this:

        tempmask.clear_to_sets(lrg.num_regs());
        reg = tempmask.find_first_set(lrg.num_regs());

same with the other uses and the Clear/Insert/set_mask_size sequence.

matcher.cpp:

this is kind of gross.

+ #if defined(IA32) || defined(AMD64)
+   Op_VecS, Op_VecD, Op_VecX, Op_VecY, /* Vectors */
+ #else
+   0, Op_RegD, 0, 0, /* Vectors */
+ #endif

can't the other platforms be required to use Op_VecS instead?

postaloc.cpp:

Don't you need to generalize this for n_reg > 2:

    // See if it happens to already be in the correct register!
    // (either Phi's direct register, or the common case of the name
    // never-clobbered original-def register)
!   if( value[val_reg] == val &&
!   if (value[val_reg] == val &&
        // Doubles check both halves
!       ( single || value[val_reg-1] == val ) ) {
!       ((n_regs == 1) || value[val_reg-1] == val)) {
      blk_adjust += use_prior_register(n,k,regnd[val_reg],current_block,value,regnd);
      if( n->in(k) == regnd[val_reg] ) // Success!  Quit trying
        return blk_adjust;
    }

Why aren't you doing copy elimination on vectors?

regmask.cpp:

the logic in num_registers is too clever.  Please make it return the appropriate values directly.

superword.cpp:

You have an extra copy of the s2 line.

    set_alignment(s2, align + data_size(s1));
+   if (align == top_align || align == bottom_align )
+     set_alignment(s2, align);
+   else {
+     set_alignment(s2, align + data_size(s1));
+   }

I'm not sure I understand/trust the new logic in find_adjacent_refs.  It's iterating over all the candidate memory ops, collecting sets at potentially different alignments and then setting the overall alignment?  Somehow I don't think this logic will work in the general case or at least it has some hidden limitations.  Can you explain this part more?

find_align_to_ref shouldn't be setting _align_to_ref directly.  It should be returning the values it finds and the caller should be responsible for saving off the proper value.

I like the simplification of the loads and stores but I end up wondering why the rest of the operations aren't similarly simplified.

Overall the code looks good.

tom

On Apr 3, 2012, at 10:03 AM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7119644/webrev
> 
> 7119644: Increase superword's vector size up to 256 bits
> 
> Increase superword's vector size up to 256-bits for YMM AVX registers on x86. Added generation of different vector sizes for different types of arrays in the same loop. Allow to generate small (4 bytes) vectors for loops which were unrolled small number of iterations.
> Add new C2 types for vectors and rework VectorNode implementation. Used MachTypeNode as base node for vector mach nodes to keep vector type.
> Moved XMM registers definition and vector instructions into one file x86.ad (have to rename eRegI to rRegI in x86_32.ad).
> 
> Tested with full CTW, NSK, C2 regression tests, JPRT and added new test.
> 
> Thanks,
> Vladimir