[14] RFR (L): 8234391: C2: Generic vector operands

Sat Nov 30 01:28:28 UTC 2019

Wow, very impressive work, Jatin and Vladimir.

Here’s an extra review in case it helps.

In machnode.cpp I think the code would be easier to read if a common
subexpression were assigned a name ’num_edges’ as in similar places:

   uint num_edges = _opnds[opcnt]->num_edges();

(Alternatively, you could just increment “skipped” on the fly, which
would be OK too.)  I’m being picky because it cost me a second to verify
that the set of raw edges was processed exhaustively.  This is a task
every reader of the code will have to do.

Another nit:  There is some dissonance between the seemingly general
name postselect_cleanup and supports_generic_vector_operands, which
is a very specific name.  But they refer to the same condition.  Pick the
shorter name, maybe, and convert the longer one into a nice comment.
It’s also not clear that helpers like process_mach_node and get_vector_operand
are is part of postselect_cleanup.  Should they be cleanup_mach_node
cleanup_vector_operand?  It’s a nit-pick, but might help future readers.

The name get_vector_operand is particularly bad, since it suggests that
it accesses something previously computed, where in fact it transforms
the graph.  Also, IMO, “get_” is one of those noise words which offers little
help to the reader and just takes up valuable screen space.

The name clone_generic_vector_operand is confusing; I would expect it
to be called something like [specialize,cleanup,…]_generic_vector_operand.

Thank you for the get_vector_regmask refactor.  Relatively few readers
prefer walls of repetitive code.

There’s a funny condition reported in this comment:
  // RShiftCntV/RShiftCntV report wide vector type, but VecS as ideal register.

It seems to comes out of the blue sky.  Maybe add a cross-referencing comment
between that and the vshiftcnt instruction in the AD file?  (Are there asserts
that would catch similar oddities if they were to arise?  Was this one caught
via an assert?  I certainly hope it was, rather than by debugging bad code!)

On a similar note (about asserts), I’m very glad to see verify_mach_nodes.
The name is a little non-specific.  Maybe verify_after_postselect_cleanup.

I skimmed the AD files and they look good.  For a nicer experience I filtered
out the trivial changes (to vec[SDXYZ]) and used this reduced patch file:
  http://cr.openjdk.java.net/~jrose/vectors/8234391-x86.ad.reduced.patch

The diff noise from s/vec/vec1/ is unfortunate.  I suppose that’s the price
of adding a new type name to the AD file.  Glad to see the problem doesn’t
show up in the big file x86.ad.

Why do vector[xy]_reg_legacy and vectorz_reg_legacy get different treatments
in the change set?  I’m mainly curious here.  The vectorz_reg_vl thing is a
dynamic set (?) which is fine, but why is it needed for z and not xy?  A comment
might help.  Also, this gripe is not part of this review, but I’ll make it anyway:
The very very short acronym “vl” which appears here starts for “AVX512VL”
referring to “variable length” but it bumps into the phrase “vector legacy”
with an unfortunate occasion for confusion.  Suggest “_vl” be renamed to
“_512vl” or some other more specific thing.

For the string intrinsics, there’s a regular replacement of legRegS by legRegD.
That strikes me as potentially a semantic change.  I suppose the register
allocator will do the same things in both cases, and there’s no spill code
generated by the JIT for such a temp.  I wonder, if the change was necessary,
how do we know that all the occurrences were correctly found and changed?
(Also, the string intrinsic macros use AVX512 instructions when available,
and in theory those would require heftier register masks.)  Can someone
comment on this, for the record…maybe even in the source code?

I’m very glad to see the duplicate operand definitions hoisted up to x86.ad.
(Cut-and-paste coding makes my skin crawl.)  Ignoring those insertions into
x86.a, and ignoring the trivial changes of vec[X…] to vec, it turns out that
there are 140 new lines added and 160 old lines removed.  (Mostly the old
lines are vector move instructions of particular sizes.)  That’s a win!

I suggest that the warning comments “(leg)Vec should be used instead” could
be a little less cryptic.  An unconditional warning like this makes the reader
wonder, “so why is it here at all?”  Maybe use a cross-reference as in:
  // Replaces (leg)vec during post-selection cleanup.  See above.

So, reviewed.  I’m relieved to see lots of combinatorial complexity disappear
from the AD files.

— John

On Nov 19, 2019, at 6:30 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234391
> 
> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones.
> 
> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.)
> 
> On a high-level it is organized as follows:
> 
>  (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;
> 
>  (2) at runtime, right after matching is over, a special pass is performed which does:
> 
>      * replaces vecOper with vec[SDXYZ] depending on mach node type
>         - vector mach nodes capute bottom_type() of their ideal prototype;
> 
>      * eliminates redundant reg-to-reg vector moves (MoveVec2Leg /MoveLeg2Vec)
>         - matcher needs them, but they are useless for register allocator (moreover, may cause additional spills);
> 
> 
>   (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands.
> 
> 
> Some details:
> 
>   (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works
> 
> 
>   (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is enabled only on x86
> 
> 
>   (3) post-selection analysis is implemented as a single pass over the graph and processing individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() (which is an instance of TypeVect)
> 
> 
>   (4) most of the analysis is cross-platform and interface with platform-specific code through 3 methods:
> 
>     static bool is_generic_reg2reg_move(MachNode* m);
>     // distinguishes MoveVec2Leg/MoveLeg2Vec nodes
> 
>     static bool is_generic_vector(MachOper* opnd);
>     // distinguishes vec/legVec operands
> 
>     static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
>     // constructs fixed-sized vector operand based on ideal reg
>     //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
>     //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]
> 
> 
>   (5) TEMP operands are handled specially:
>     - TEMP uses max_vector_size() to determine what fixed-sized operand to use
>         * it is needed to cover reductions which don't produce vectors but scalars
>     - TEMP_DEF inherits fixed-sized operand type from DEF;
> 
> 
>   (6) there is limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:
> 
>       - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not ideal_reg().
> 
>       - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type.
> 
> 
>   (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask)
> 
> 
>   (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands
> 
> 
>   (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
>      - it aligns the code between x86_64.ad and x86_32.ad
>      - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
> 
> 
> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
> Reviewed-by: vlivanov, sviswanathan, ?
> 
> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
>         performance testing (SPEC* + Octane + micros / G1 + ParGC).
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
> 
> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf