RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v6]
Emanuel Peter
epeter at openjdk.org
Thu Jan 16 06:45:44 UTC 2025
On Wed, 15 Jan 2025 12:51:40 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> More fixes for vnkozlov
>
> src/hotspot/share/opto/vectorization.hpp line 1212:
>
>> 1210: tty->print("m * q(%d) + r(%d)", _q, _r);
>> 1211: if (_vpointer.count_invar_summands() > 0) {
>> 1212: tty->print(" - invar / (iv_scale(%d) * pre_stride)", _vpointer.iv_scale());
>
> Would it make sense to print all invariant summands here as well?
I would rather not, because it would be too verbose. With the `TraceAlignVector` `ALIGN_VECTOR` flag, this is where we print the solution, and it should fit on one line:
`solution for pack: m * q(2) + r(0) - invar / (iv_scale(4) * pre_stride) [- init / pre_stride], mem_ref[1047]`
The `invar` is already printed earlier, so it is known from context:
invar = SUM(invar_summands), invar_summands:
4 * [101 LoadI] -> 101 LoadI === _ 7 100 [[ 1083 226 235 280 372 458 ]] @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+144 *, name=zero, idx=5; #int !orig=[225] !jvms: Test::test00001 @ bci:10 (line 94)
invar_factor = 4
This is the fuller context:
vector mem_ref: 1047 StoreI === 1119 1117 1056 1048 [[ 1021 1023 1029 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=902,764,625,185,650 !jvms: Test::test00001 @ bci:24 (line 94)
VPointer: VPointer[size: 4, object, base(37 CastPP) + con( 16) + iv_scale( 4) * iv + invar(4 * [101 LoadI])]
vector_width = 64
aw = alignment_width = min(vector_width(64), ObjectAlignmentInBytes(8)) = 8
invar = SUM(invar_summands), invar_summands:
4 * [101 LoadI] -> 101 LoadI === _ 7 100 [[ 1083 226 235 280 372 458 ]] @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+144 *, name=zero, idx=5; #int !orig=[225] !jvms: Test::test00001 @ bci:10 (line 94)
invar_factor = 4
iv = init( 0) + pre_iter * pre_stride(1) + main_iter * main_stride(16)
adr = base[37] + con(16) + invar + iv_scale(4) * iv = base[37] + C_const(16) + C_invar(4) * var_invar + C_init(0) * var_init + C_pre(4) * pre_iter + C_main(64) * main_iter
init is constant:
C_const_init = 0
C_init = 0
invariant present:
C_invar = invar_factor = 4
C_const = con(16) + iv_scale(4) * C_const_init(0) = 16
C_pre = iv_scale(4) * pre_stride(1) = 4
C_main = iv_scale(4) * main_stride(16) = 64
EQ(1 ): (C_const(16) + C_invar(4) * var_invar + C_init(0) * var_init + C_pre(4) * pre_iter + C_main(64) * main_iter) % aw(8) = 0 (given base aligned -> align rest)
EQ(2 ): C_main(64) % aw(8) = 0 = 0 (alignment across iterations)
EQ(4a): (C_const( 16) + C_pre(4) * pre_iter_C_const) % aw(8) = 0 (align const term individually)
-> constrained
EQ(4b): (C_invar( 4) * var_invar + C_pre(4) * pre_iter_C_invar) % aw(8) = 0 (align invar term individually)
-> constrained
EQ(4c): (C_init( 0) * var_init + C_pre(4) * pre_iter_C_init ) % aw(8) = 0 (align init term individually)
-> constrained
EQ(4a, b, c) all constrained, hence:
EQ(5a): C_const( 16) % abs(C_pre(4)) = 0
EQ(5b): C_invar( 4) % abs(C_pre(4)) = 0
EQ(5c): C_init( 0) % abs(C_pre(4)) = 0
All terms in EQ(4a, b, c) are divisible by abs(C_pre(4)).
X = C_const( 16) / abs(C_pre(4)) = 4 (6a)
Y = C_invar( 4) / abs(C_pre(4)) = 1 (6b)
Z = C_init( 0) / abs(C_pre(4)) = 0 (6c)
q = aw( 8) / abs(C_pre(4)) = 2 (8)
sign(C_pre) = (C_pre(4) > 0) ? 1 : -1 = 1 (7)
EQ(9a): (X( 4) + pre_iter_C_const * sign(C_pre)) % q(2) = 0
EQ(9b): (Y( 1) * var_invar + pre_iter_C_invar * sign(C_pre)) % q(2) = 0
EQ(9c): (Z( 0) * var_init + pre_iter_C_init * sign(C_pre)) % q(2) = 0
EQ(10a): pre_iter_C_const = mx2 * q(2) - sign(C_pre) * X(4)
EQ(10b): pre_iter_C_invar = my2 * q(2) - sign(C_pre) * Y(1) * var_invar
EQ(10c): pre_iter_C_init = mz2 * q(2) - sign(C_pre) * Z(0) * var_init
r = (-C_const(16) / (iv_scale(4) * pre_stride(1)) % q(2) = 0
EQ(14): pre_iter = m * q( 2) - r(0)
- invar / (iv_scale(4) * pre_stride(1))
solution for pack: m * q(2) + r(0) - invar / (iv_scale(4) * pre_stride) [- init / pre_stride], mem_ref[1047]
intersection with current: m * q(2) + r(0) - invar / (iv_scale(4) * pre_stride) [- init / pre_stride], mem_ref[1047]
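
To make the arithmetic in the trace easier to follow, here is a minimal standalone sketch that recomputes `aw`, `C_pre`, `C_main`, `q` and `r` from the printed inputs. It then checks that every `pre_iter` of the form `m * q + r - invar / (iv_scale * pre_stride)` gives an offset that is a multiple of `aw`. This is not the HotSpot code: `PreIterSolution`, `floor_mod`, `solve`, `offset` and `all_aligned` are hypothetical names made up for illustration. It covers only the case in the trace, where `init` is the constant 0 and all terms are divisible by `abs(C_pre)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical illustration only; none of these names exist in HotSpot.
struct PreIterSolution {
  int aw;      // alignment width in bytes
  int c_pre;   // iv_scale * pre_stride
  int c_main;  // iv_scale * main_stride
  int q;       // period of the valid pre_iter values
  int r;       // residue contributed by the constant term
};

// Non-negative remainder, so the result always lies in [0, m).
static int floor_mod(int a, int m) {
  return ((a % m) + m) % m;
}

// Assumes init is the constant 0 and that con and aw are both
// divisible by abs(C_pre), as in the trace.
static PreIterSolution solve(int con, int iv_scale, int pre_stride,
                             int main_stride, int vector_width,
                             int object_alignment) {
  PreIterSolution s;
  s.aw = std::min(vector_width, object_alignment);
  s.c_pre = iv_scale * pre_stride;
  s.c_main = iv_scale * main_stride;
  assert(s.c_pre != 0);
  assert(s.c_main % s.aw == 0);           // EQ(2)
  assert(s.aw % std::abs(s.c_pre) == 0);  // so that q is an integer
  assert(con % std::abs(s.c_pre) == 0);   // EQ(5a)
  s.q = s.aw / std::abs(s.c_pre);         // (8)
  s.r = floor_mod(-con / s.c_pre, s.q);
  return s;
}

// Byte offset relative to the base, which is assumed to be aw-aligned.
static int offset(const PreIterSolution& s, int con, int invar,
                  int pre_iter, int main_iter) {
  return con + invar + s.c_pre * pre_iter + s.c_main * main_iter;
}

// Brute-force check over a small range of m, var_invar and main_iter.
// pre_iter may be negative here; this only checks the modular arithmetic.
static bool all_aligned(const PreIterSolution& s, int con, int invar_factor) {
  for (int var_invar = -8; var_invar <= 8; var_invar++) {
    int invar = invar_factor * var_invar;
    for (int m = -4; m <= 4; m++) {
      int pre_iter = m * s.q + s.r - invar / s.c_pre;
      for (int main_iter = 0; main_iter < 4; main_iter++) {
        if (floor_mod(offset(s, con, invar, pre_iter, main_iter), s.aw) != 0) {
          return false;
        }
      }
    }
  }
  return true;
}
```

Calling `solve(16, 4, 1, 16, 64, 8)` with the values from the trace gives `aw = 8`, `C_pre = 4`, `C_main = 64`, `q = 2` and `r = 0`, and `all_aligned(s, 16, 4)` holds for the sampled range.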
Let me know what you think.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1917827612