RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v6]

Emanuel Peter epeter at openjdk.org
Thu Jan 16 06:45:44 UTC 2025


On Wed, 15 Jan 2025 12:51:40 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   More fixes for vnkozlov
>
> src/hotspot/share/opto/vectorization.hpp line 1212:
> 
>> 1210:     tty->print("m * q(%d) + r(%d)", _q, _r);
>> 1211:     if (_vpointer.count_invar_summands() > 0) {
>> 1212:       tty->print(" - invar / (iv_scale(%d) * pre_stride)", _vpointer.iv_scale());
> 
> Would it make sense to print all invariant summands here as well?

I would rather not, because it would be too verbose. With the `TraceAlignVector` `ALIGN_VECTOR` flag, this is where we print the solution, and it should fit on one line:

`solution for pack:         m * q(2) + r(0) - invar / (iv_scale(4) * pre_stride) [- init / pre_stride], mem_ref[1047]`

The `invar` itself is already printed earlier, so it is known from context:

  invar = SUM(invar_summands), invar_summands:
   4 * [101 LoadI] ->   101  LoadI  === _ 7 100  [[ 1083 226 235 280 372 458 ]]  @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+144 *, name=zero, idx=5; #int !orig=[225] !jvms: Test::test00001 @ bci:10 (line 94)
  invar_factor = 4
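As a rough illustration of `invar = SUM(invar_summands)`, here is a minimal C++ sketch that evaluates the sum of scaled summands for given runtime values. The `InvarSummand` struct and the helper functions are hypothetical and are not HotSpot code. In particular, computing `invar_factor` as the gcd of the summand scales is an assumption of this sketch. It agrees with the single-summand case shown here (`4 * [101 LoadI]` gives `invar_factor = 4`), but the real computation may differ.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical model of one invariant summand: scale * value, where value
// stands in for the runtime value of a node such as [101 LoadI].
struct InvarSummand {
  long long scale;
  long long value;
};

// Greatest common divisor of |a| and |b|; gcd_abs(0, x) == |x|.
constexpr long long gcd_abs(long long a, long long b) {
  a = a < 0 ? -a : a;
  b = b < 0 ? -b : b;
  while (b != 0) {
    const long long t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// invar = SUM(invar_summands), i.e. the sum of scale * value.
template <std::size_t N>
constexpr long long invar_value(const InvarSummand (&summands)[N]) {
  long long sum = 0;
  for (std::size_t i = 0; i < N; i++) {
    sum += summands[i].scale * summands[i].value;
  }
  return sum;
}

// Assumption of this sketch: invar_factor is the gcd of all summand scales,
// so invar is always a multiple of invar_factor.
template <std::size_t N>
constexpr long long invar_factor(const InvarSummand (&summands)[N]) {
  long long factor = 0;
  for (std::size_t i = 0; i < N; i++) {
    factor = gcd_abs(factor, summands[i].scale);
  }
  return factor;
}
```

With a hypothetical runtime value of 3 for `[101 LoadI]`, the single summand gives `invar = 4 * 3 = 12`. Under the gcd assumption, `invar` is a multiple of `invar_factor`, so once `invar_factor` is divisible by `iv_scale * pre_stride`, the term `invar / (iv_scale * pre_stride)` in the solution line is an exact division.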


This is the fuller context:

 vector mem_ref: 1047  StoreI  === 1119 1117 1056 1048  [[ 1021 1023 1029 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6;  Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=902,764,625,185,650 !jvms: Test::test00001 @ bci:24 (line 94)
  VPointer: VPointer[size:  4, object, base(37 CastPP) + con( 16) + iv_scale(  4) * iv + invar(4 * [101 LoadI])]
  vector_width = 64
  aw = alignment_width = min(vector_width(64), ObjectAlignmentInBytes(8)) = 8
  invar = SUM(invar_summands), invar_summands:
   4 * [101 LoadI] ->   101  LoadI  === _ 7 100  [[ 1083 226 235 280 372 458 ]]  @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+144 *, name=zero, idx=5; #int !orig=[225] !jvms: Test::test00001 @ bci:10 (line 94)
  invar_factor = 4
  iv = init(   0) + pre_iter * pre_stride(1) + main_iter * main_stride(16)
  adr = base[37] + con(16) + invar + iv_scale(4) * iv      = base[37] + C_const(16) + C_invar(4) * var_invar + C_init(0) * var_init + C_pre(4) * pre_iter + C_main(64) * main_iter
  init is constant:
    C_const_init = 0
    C_init = 0
  invariant present:
    C_invar = invar_factor = 4
  C_const = con(16) + iv_scale(4) * C_const_init(0) = 16
  C_pre   = iv_scale(4) * pre_stride(1) = 4
  C_main  = iv_scale(4) * main_stride(16) = 64
  EQ(1 ): (C_const(16) + C_invar(4) * var_invar + C_init(0) * var_init + C_pre(4) * pre_iter + C_main(64) * main_iter) % aw(8) = 0 (given base aligned -> align rest)
  EQ(2 ): C_main(64) % aw(8) = 0 = 0 (alignment across iterations)
  EQ(4a): (C_const( 16)             + C_pre(4) * pre_iter_C_const) % aw(8) = 0  (align const term individually)
          -> constrained
  EQ(4b): (C_invar(  4) * var_invar + C_pre(4) * pre_iter_C_invar) % aw(8) = 0  (align invar term individually)
          -> constrained
  EQ(4c): (C_init(   0) * var_init  + C_pre(4) * pre_iter_C_init ) % aw(8) = 0  (align init term individually)
          -> constrained
  EQ(4a, b, c) all constrained, hence:
  EQ(5a): C_const( 16) % abs(C_pre(4)) = 0
  EQ(5b): C_invar(  4) % abs(C_pre(4)) = 0
  EQ(5c): C_init(   0) % abs(C_pre(4)) = 0
  All terms in EQ(4a, b, c) are divisible by abs(C_pre(4)).
  X = C_const( 16) / abs(C_pre(4)) = 4       (6a)
  Y = C_invar(  4) / abs(C_pre(4)) = 1       (6b)
  Z = C_init(   0) / abs(C_pre(4)) = 0       (6c)
  q = aw(       8) / abs(C_pre(4)) = 2       (8)
  sign(C_pre) = (C_pre(4) > 0) ? 1 : -1 = 1  (7)
  EQ(9a): (X(  4)             + pre_iter_C_const * sign(C_pre)) % q(2) = 0
  EQ(9b): (Y(  1) * var_invar + pre_iter_C_invar * sign(C_pre)) % q(2) = 0
  EQ(9c): (Z(  0) * var_init  + pre_iter_C_init  * sign(C_pre)) % q(2) = 0
  EQ(10a): pre_iter_C_const = mx2 * q(2) - sign(C_pre) * X(4)
  EQ(10b): pre_iter_C_invar = my2 * q(2) - sign(C_pre) * Y(1) * var_invar
  EQ(10c): pre_iter_C_init  = mz2 * q(2) - sign(C_pre) * Z(0) * var_init 
  r = (-C_const(16) / (iv_scale(4) * pre_stride(1)) % q(2) = 0
  EQ(14):  pre_iter = m * q(  2) - r(0)
                                 - invar / (iv_scale(4) * pre_stride(1))
  solution for pack:         m * q(2) + r(0) - invar / (iv_scale(4) * pre_stride) [- init / pre_stride], mem_ref[1047]
  intersection with current: m * q(2) + r(0) - invar / (iv_scale(4) * pre_stride) [- init / pre_stride], mem_ref[1047]
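To make the arithmetic in the trace easier to follow, here is a minimal C++ sketch that recomputes `aw`, `C_const`, `C_pre`, `C_main`, `q` and `r` from the printed inputs. It then checks that `pre_iter = m * q + r - invar / (iv_scale * pre_stride)` aligns the address. The names `AlignInput`, `solve`, `pre_iter` and `aligned` are hypothetical and this is not HotSpot code. Like the trace, the sketch assumes a constant `init` and an aligned base. It adds its own guard that `aw % abs(C_pre) == 0`, and normalizing `r` into `[0, q)` is also the sketch's own choice.

```cpp
#include <algorithm>  // std::min (constexpr since C++14)

// Inputs as printed in the ALIGN_VECTOR trace. Hypothetical names that only
// mirror the trace; they are not HotSpot's API.
struct AlignInput {
  int con;               // con(16)
  int iv_scale;          // iv_scale(4)
  int init;              // init(0), assumed to be a constant
  int pre_stride;        // pre_stride(1)
  int main_stride;       // main_stride(16)
  int invar_factor;      // invar_factor = C_invar (4)
  int vector_width;      // vector_width = 64
  int object_alignment;  // ObjectAlignmentInBytes = 8
};

struct AlignSolution {
  bool ok;  // false if a precondition of the derivation fails
  int q;
  int r;
};

constexpr int iabs(int x) { return x < 0 ? -x : x; }

// aw = min(vector_width, ObjectAlignmentInBytes)
constexpr int alignment_width(const AlignInput& in) {
  return std::min(in.vector_width, in.object_alignment);
}

constexpr AlignSolution solve(const AlignInput& in) {
  const int aw      = alignment_width(in);
  const int C_const = in.con + in.iv_scale * in.init;  // init is constant
  const int C_invar = in.invar_factor;
  const int C_init  = 0;                               // folded into C_const
  const int C_pre   = in.iv_scale * in.pre_stride;
  const int C_main  = in.iv_scale * in.main_stride;
  // Guard added by this sketch: q = aw / abs(C_pre) must be a positive integer.
  if (C_pre == 0 || aw % iabs(C_pre) != 0) return {false, 0, 0};
  // EQ(2): alignment across main-loop iterations.
  if (C_main % aw != 0) return {false, 0, 0};
  // EQ(5a-c): every term must be divisible by abs(C_pre).
  if (C_const % iabs(C_pre) != 0 || C_invar % iabs(C_pre) != 0 ||
      C_init % iabs(C_pre) != 0) {
    return {false, 0, 0};
  }
  const int q = aw / iabs(C_pre);  // (8)
  // As printed: r = (-C_const / (iv_scale * pre_stride)) % q
  int r = (-C_const / C_pre) % q;
  if (r < 0) r += q;               // normalize into [0, q) (sketch's choice)
  return {true, q, r};
}

// Solution form: pre_iter = m * q + r - invar / (iv_scale * pre_stride),
// with invar = invar_factor * var_invar.
constexpr int pre_iter(const AlignInput& in, const AlignSolution& s, int m,
                       int var_invar) {
  const int invar = in.invar_factor * var_invar;
  return m * s.q + s.r - invar / (in.iv_scale * in.pre_stride);
}

// EQ(1), given an aligned base: the offset from base must be a multiple of aw.
constexpr bool aligned(const AlignInput& in, int pre_iter_value, int main_iter,
                       int var_invar) {
  const int iv = in.init + pre_iter_value * in.pre_stride +
                 main_iter * in.main_stride;
  const int offset = in.con + in.invar_factor * var_invar + in.iv_scale * iv;
  return offset % alignment_width(in) == 0;
}

// The values from the trace above.
constexpr AlignInput kTrace{16, 4, 0, 1, 16, 4, 64, 8};
```

For example, with a hypothetical runtime value of 3 for `[101 LoadI]` (so `invar = 12`), `m = 2` gives `pre_iter = 2 * 2 + 0 - 12 / 4 = 1`. The offset `16 + 12 + 4 * 1 = 32` is then a multiple of `aw = 8`, and it stays aligned for every `main_iter` because `C_main = 64`.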


Let me know what you think.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1917827612


More information about the hotspot-compiler-dev mailing list