RFR: 8273322: Enhance macro logic optimization for masked logic operations.

Jatin Bhateja jbhateja at openjdk.java.net
Mon Dec 20 13:39:45 UTC 2021


This patch extends the existing macro logic inference algorithm to handle masked logic operations.

Existing algorithm:

1. Identify logic cone roots.
2. Pack the parent and its logic child nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met,
i.e. the maximum number of inputs which a macro logic node can have.
3. Perform symbolic evaluation of the logic expression tree by assigning to each input the value corresponding to a truth table
column.
4. The inputs, together with the encoded function, represent a macro logic node which mimics a truth table.
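
Steps 3 and 4 can be illustrated with a small stand-alone sketch. It is not the HotSpot code: the class and method names are hypothetical, and the column constants 0xF0, 0xCC and 0xAA follow the usual three-input ternary logic convention (whether the compiler assigns them to inputs in this order is an assumption). Evaluating the expression once on the column constants yields the 8-bit function encoding, and `apply` then uses that encoding as a lookup table for each bit position.

```java
final class TruthTableSketch {
    // Truth table columns for inputs a, b, c. Bit i of each constant is the
    // value that input takes in row i, where i = (a << 2) | (b << 1) | c.
    static final int COL_A = 0xF0;
    static final int COL_B = 0xCC;
    static final int COL_C = 0xAA;

    /** Hypothetical three-input bitwise logic expression. */
    interface Logic3 {
        int apply(int a, int b, int c);
    }

    /** Symbolic evaluation: run the expression on the column constants. */
    static int encode(Logic3 f) {
        return f.apply(COL_A, COL_B, COL_C) & 0xFF;
    }

    /** Apply an encoded function bit by bit, using it as a lookup table. */
    static int apply(int imm8, int a, int b, int c) {
        int result = 0;
        for (int bit = 0; bit < 32; bit++) {
            int row = (((a >>> bit) & 1) << 2)
                    | (((b >>> bit) & 1) << 1)
                    | ((c >>> bit) & 1);
            result |= ((imm8 >>> row) & 1) << bit;
        }
        return result;
    }

    public static void main(String[] args) {
        // (a & b) ^ c encodes to 0x6A; the bitwise blend encodes to 0xCA.
        System.out.printf("0x%02X%n", encode((a, b, c) -> (a & b) ^ c));
        System.out.printf("0x%02X%n", encode((a, b, c) -> (a & b) | (~a & c)));
    }
}
```

The inputs plus the single encoded byte are enough to reproduce any three-input logic tree, which is why the packing step limits the number of inputs per macro logic node.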

Modification:
The packing algorithm is extended to operate on both predicated and non-predicated logic nodes. The following
rules define the criteria under which nodes get packed into a macro logic node:

1. The parent and both child nodes are all unmasked, or all masked with the same predicate.
2. A masked parent can be packed with its left child if the child is predicated and both have the same predicate.
3. A masked parent can be packed with its right child if the child is un-predicated or has a matching predicate.
4. An unmasked parent can be packed with an unmasked child.
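
The four rules above can be read as simple predicates over the masks of a parent and its children. The sketch below is one such reading, not the code in the patch: `LogicNode`, the string-valued mask, and all method names are hypothetical, and a `null` mask stands for an unmasked node.

```java
import java.util.Objects;

final class PackingRulesSketch {
    /** Hypothetical stand-in for a logic node; mask == null means unmasked. */
    static final class LogicNode {
        final String mask;
        LogicNode(String mask) { this.mask = mask; }
        boolean isMasked() { return mask != null; }
    }

    /** True when both nodes are masked and use the same predicate. */
    static boolean sameMask(LogicNode p, LogicNode c) {
        return p.isMasked() && c.isMasked() && Objects.equals(p.mask, c.mask);
    }

    /** Rule 1: all three nodes unmasked, or all masked with one predicate. */
    static boolean canPackBoth(LogicNode parent, LogicNode left, LogicNode right) {
        boolean allUnmasked = !parent.isMasked() && !left.isMasked() && !right.isMasked();
        boolean allSameMask = sameMask(parent, left) && sameMask(parent, right);
        return allUnmasked || allSameMask;
    }

    /** Rules 2 and 4: the left child must match the parent's masking exactly. */
    static boolean canPackLeft(LogicNode parent, LogicNode left) {
        if (!parent.isMasked()) {
            return !left.isMasked();          // rule 4
        }
        return sameMask(parent, left);        // rule 2
    }

    /** Rules 3 and 4: the right child of a masked parent may also be unmasked. */
    static boolean canPackRight(LogicNode parent, LogicNode right) {
        if (!parent.isMasked()) {
            return !right.isMasked();         // rule 4
        }
        return !right.isMasked() || sameMask(parent, right);  // rule 3
    }

    public static void main(String[] args) {
        LogicNode maskedK1 = new LogicNode("k1");
        LogicNode unmasked = new LogicNode(null);
        System.out.println(canPackLeft(maskedK1, unmasked));
        System.out.println(canPackRight(maskedK1, unmasked));
    }
}
```

Under this reading the only asymmetry between the children is that a masked parent accepts an unmasked right child but not an unmasked left child.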

The new jtreg test case added with the patch exhaustively covers the different combinations of predication of parent and
child nodes.

Following are the performance numbers for the JMH benchmarks included with the patch.

Machine Configuration:  Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)

Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
-- | -- | -- | -- | --
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 256 | 3636.141 | 4446.505 | 1.222863745
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 512 | 1433.145 | 1681.261 | 1.173126934
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 1024 | 1000.107 | 1172.866 | 1.172740517
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 256 | 5568.313 | 7670.259 | 1.37748345
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 512 | 3350.108 | 3927.803 | 1.172440709
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 1024 | 1495.966 | 1541.56 | 1.030477965
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 256 | 4230.379 | 4282.154 | 1.012238856
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 512 | 2029.801 | 2049.638 | 1.009772879
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 1024 | 1108.738 | 1118.897 | 1.00916267
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 256 | 3802.801 | 3783.537 | 0.99493426
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 512 | 1546.244 | 1552.691 | 1.004169458
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 1024 | 1017.512 | 1020.075 | 1.002518889
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 256 | 4159.835 | 4527.676 | 1.088426825
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 512 | 1665.335 | 1733.04 | 1.040655484
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 1024 | 1150.319 | 1181.935 | 1.02748455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 256 | 6989.791 | 7382.883 | 1.056238019
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 512 | 3711.362 | 3911.921 | 1.054039191
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 1024 | 1540.341 | 1554.175 | 1.008981128
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 256 | 4164.559 | 4213.546 | 1.01176283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 512 | 2072.91 | 2079.105 | 1.002988552
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 1024 | 1112.678 | 1116.675 | 1.003592234
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 256 | 3702.998 | 3906.093 | 1.0548461
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 512 | 1536.571 | 1546.043 | 1.006164375
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 1024 | 996.906 | 1013.649 | 1.016794964
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 256 | 2045.594 | 2048.966 | 1.001648421
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 512 | 1111.933 | 1117.689 | 1.005176571
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 1024 | 559.971 | 561.144 | 1.002094751


Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8273322: Enhance macro logic optimization for masked logic operations.

Changes: https://git.openjdk.java.net/jdk/pull/6893/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273322
  Stats: 1413 lines in 12 files changed: 1370 ins; 6 del; 37 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893

PR: https://git.openjdk.java.net/jdk/pull/6893

