RFR: 8273322: Enhance macro logic optimization for masked logic operations.
Commit messages:
 - 8273322: Enhance macro logic optimization for masked logic operations.
Changes: https://git.openjdk.java.net/jdk/pull/6893/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273322
Stats: 1413 lines in 12 files changed: 1370 ins; 6 del; 37 mod
Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893
PR: https://git.openjdk.java.net/jdk/pull/6893
This patch extends the existing macro logic inference algorithm to handle masked logic operations.
Existing algorithm:
1. Identify logic cone roots.
2. Pack the parent and its logic child nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met, i.e. the maximum number of inputs a macro logic node can have.
3. Perform symbolic evaluation of the logic expression tree by assigning to each input the value corresponding to a truth table column.
4. The inputs, together with the encoded function, represent a macro logic node that mimics a truth table.
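Steps 3 and 4 above can be illustrated with a minimal Java sketch. This is not the HotSpot implementation. It assumes the usual three-input ternary-logic encoding, in which the inputs are assigned the truth table columns 0xF0, 0xCC and 0xAA; the class and method names (`TruthTableSketch`, `encode`, `evalEncoded`) are hypothetical.

```java
final class TruthTableSketch {
    // Assumed encoding: each constant lists one input's value across the
    // 8 rows of a three-variable truth table (row index = a*4 + b*2 + c).
    static final int A = 0xF0, B = 0xCC, C = 0xAA;

    /** A three-input bitwise function, e.g. (a, b, c) -> (a & b) ^ c. */
    interface Logic3 {
        int apply(int a, int b, int c);
    }

    /** Symbolic evaluation: run the columns through the expression and keep the low 8 bits. */
    static int encode(Logic3 f) {
        return f.apply(A, B, C) & 0xFF;
    }

    /** Applies an encoded function to real operands, one bit position at a time. */
    static int evalEncoded(int func, int a, int b, int c) {
        int result = 0;
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            // Select the truth table row from the three input bits.
            int row = (((a >>> bit) & 1) << 2) | (((b >>> bit) & 1) << 1) | ((c >>> bit) & 1);
            result |= ((func >>> row) & 1) << bit;
        }
        return result;
    }

    public static void main(String[] args) {
        Logic3 andXor = (a, b, c) -> (a & b) ^ c;
        int func = encode(andXor);
        int a = 0x12345678, b = 0x0F0F0F0F, c = 0xDEADBEEF;
        boolean match = evalEncoded(func, a, b, c) == andXor.apply(a, b, c);
        System.out.printf("func=0x%02X match=%b%n", func, match);
    }
}
```

For `(a & b) ^ c` the encoded function is `0x6A`, and a bitwise blend `(a & c) | (b & ~c)` encodes to `0xE4`. The inputs plus this one byte are enough to reproduce the whole expression tree.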
Modification: The packing algorithm is extended to operate on both predicated and non-predicated logic nodes. The following rules define the criteria under which nodes get packed into a macro logic node:
1. The parent and both child nodes are all unmasked, or all masked with the same predicate.
2. A masked parent can be packed with its left child if the child is predicated and both have the same predicate.
3. A masked parent can be packed with its right child if the child is unpredicated or has a matching predication condition.
4. An unmasked parent can be packed with an unmasked child.
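The four packing rules can be written down as predicates. The following is a sketch under stated assumptions, not code from the patch: a node's predicate is modeled as a `String` that is `null` for an unmasked node, combinations the rules do not mention are treated as not packable, and all names (`PackingRulesSketch`, `canPackLeftChild`, and so on) are hypothetical.

```java
final class PackingRulesSketch {
    /** True when both nodes are masked and share the same predicate. */
    static boolean sameMask(String p, String q) {
        return p != null && p.equals(q);
    }

    /** Rule 1: parent and both children are all unmasked, or all masked with the same predicate. */
    static boolean canPackBothChildren(String parent, String left, String right) {
        boolean allUnmasked = parent == null && left == null && right == null;
        boolean allSameMask = sameMask(parent, left) && sameMask(parent, right);
        return allUnmasked || allSameMask;
    }

    /** Rules 2 and 4, applied to the left child. */
    static boolean canPackLeftChild(String parent, String left) {
        if (parent == null) {
            return left == null;           // rule 4: unmasked parent, unmasked child
        }
        return sameMask(parent, left);     // rule 2: child must carry the same predicate
    }

    /** Rules 3 and 4, applied to the right child. */
    static boolean canPackRightChild(String parent, String right) {
        if (parent == null) {
            return right == null;          // rule 4: unmasked parent, unmasked child
        }
        return right == null || sameMask(parent, right); // rule 3
    }
}
```

Note the asymmetry between rules 2 and 3: under a masked parent, an unmasked right child is acceptable, while an unmasked left child is not.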
A new jtreg test case added with the patch exhaustively covers all combinations of predication of parent and child nodes.
Following are the performance numbers for the JMH benchmarks included with the patch.
Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
-- | -- | -- | -- | --
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 256 | 3636.141 | 4446.505 | 1.222863745
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 512 | 1433.145 | 1681.261 | 1.173126934
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 1024 | 1000.107 | 1172.866 | 1.172740517
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 256 | 5568.313 | 7670.259 | 1.37748345
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 512 | 3350.108 | 3927.803 | 1.172440709
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 1024 | 1495.966 | 1541.56 | 1.030477965
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 256 | 4230.379 | 4282.154 | 1.012238856
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 512 | 2029.801 | 2049.638 | 1.009772879
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 1024 | 1108.738 | 1118.897 | 1.00916267
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 256 | 3802.801 | 3783.537 | 0.99493426
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 512 | 1546.244 | 1552.691 | 1.004169458
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 1024 | 1017.512 | 1020.075 | 1.002518889
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 256 | 4159.835 | 4527.676 | 1.088426825
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 512 | 1665.335 | 1733.04 | 1.040655484
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 1024 | 1150.319 | 1181.935 | 1.02748455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 256 | 6989.791 | 7382.883 | 1.056238019
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 512 | 3711.362 | 3911.921 | 1.054039191
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 1024 | 1540.341 | 1554.175 | 1.008981128
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 256 | 4164.559 | 4213.546 | 1.01176283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 512 | 2072.91 | 2079.105 | 1.002988552
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 1024 | 1112.678 | 1116.675 | 1.003592234
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 256 | 3702.998 | 3906.093 | 1.0548461
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 512 | 1536.571 | 1546.043 | 1.006164375
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 1024 | 996.906 | 1013.649 | 1.016794964
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 256 | 2045.594 | 2048.966 | 1.001648421
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 512 | 1111.933 | 1117.689 | 1.005176571
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 1024 | 559.971 | 561.144 | 1.002094751
Kindly review and share your feedback.
Best Regards, Jatin
Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8273322
 - 8273322: Enhance macro logic optimization for masked logic operations.
-------------
Changes:
 - all: https://git.openjdk.java.net/jdk/pull/6893/files
 - new: https://git.openjdk.java.net/jdk/pull/6893/files/b14079e9..f8120acb
Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=00-01
Stats: 6814 lines in 274 files changed: 5024 ins; 944 del; 846 mod
Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893
PR: https://git.openjdk.java.net/jdk/pull/6893
On Mon, 3 Jan 2022 12:31:50 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8273322
 - 8273322: Enhance macro logic optimization for masked logic operations.
I think the whole "Bitwise operation packing optimization" code should be moved out of `compile.cpp`, maybe to `vectornode.cpp`, where the `MacroLogicVNode` code is located.
Copyright year should be updated to 2022 in all changed files.
src/hotspot/cpu/x86/x86.ad line 1900:
1898:
1899: case Op_MacroLogicV:
1900: if(bt != T_INT && bt != T_LONG) {
Missing `VM_Version::supports_evex()` check?
-------------
Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6893
On Tue, 4 Jan 2022 02:21:35 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
src/hotspot/cpu/x86/x86.ad line 1900:
1898:
1899: case Op_MacroLogicV:
1900: if(bt != T_INT && bt != T_LONG) {
Missing `VM_Version::supports_evex()` check?
Hi @vnkozlov, we already have that check (UseAVX < 3) in the `match_rule_supported` routine, which gets called from this function.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
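The layering described in the reply above can be pictured with a small Java model. It is only an illustration, not the code in `x86.ad`: the method names mirror the routines mentioned in the thread, but `useAVX`, the `ElemType` enum, the string opcode, and the assumption that the quoted `if` branch rejects the operation are all hypothetical stand-ins.

```java
final class SupportCheckSketch {
    enum ElemType { BYTE, SHORT, INT, LONG }

    /** Hypothetical stand-in for the UseAVX flag. */
    static int useAVX = 3;

    /** Models the general, opcode-level check: MacroLogicV is rejected when UseAVX < 3. */
    static boolean matchRuleSupported(String opcode) {
        if (opcode.equals("MacroLogicV") && useAVX < 3) {
            return false;
        }
        return true;
    }

    /** Models the vector-specific check, which runs the general check first. */
    static boolean matchRuleSupportedVector(String opcode, ElemType bt) {
        if (!matchRuleSupported(opcode)) {
            return false; // the AVX level was already checked here
        }
        // Assumption: the branch quoted in the review rejects other element types.
        if (opcode.equals("MacroLogicV") && bt != ElemType.INT && bt != ElemType.LONG) {
            return false;
        }
        return true;
    }
}
```

Because the general check runs first, the element-type branch is only reached when the AVX level is already sufficient, which is why a second check at that point would be redundant in this model.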
On Tue, 4 Jan 2022 15:01:22 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
Hi @vnkozlov, we already have that check (UseAVX < 3) in match_rule_supported routine which gets called from this function.
Good.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
On Tue, 4 Jan 2022 02:25:36 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
I think the whole "Bitwise operation packing optimization" code should be moved out of `compile.cpp`, maybe to `vectornode.cpp`, where the `MacroLogicVNode` code is located.
Hi @vnkozlov, yes, we could also extend the AndV/OrV/XorV/AndVMask/OrVMask/XorVMask idealizations to perform macro logic folding, but the current change keeps the implementation clean and limited to one optimization stage.
Copyright year should be updated to 2022 in all changed files.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
996.906 | 1013.649 | 1.016794964 o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 256 | 2045.594 | 2048.966 | 1.001648421 o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 512 | 1111.933 | 1117.689 | 1.005176571 o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 1024 | 559.971 | 561.144 | 1.002094751
Kindly review and share your feedback.
Best Regards, Jatin
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
  8273322: Updating copywrite header.
-------------
Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6893/files
  - new: https://git.openjdk.java.net/jdk/pull/6893/files/f8120acb..d18f504f
Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=02
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=01-02
Stats: 12 lines in 12 files changed: 0 ins; 0 del; 12 mod
Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893
PR: https://git.openjdk.java.net/jdk/pull/6893
On Tue, 4 Jan 2022 15:11:47 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
[...]
Let me test it before approval. You need a second review. Also, please file an RFE to move the vector code out of `compile.cpp`; let's do that separately from these changes.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
`compiler/vectorapi/TestMaskedMacroLogicVector.java` test failed on Aarch64 machines:
Unrecognized VM option 'UseAVX=3'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
java.lang.RuntimeException: TestFramework flag VM exited with 1
    at compiler.lib.ir_framework.driver.FlagVMProcess.checkFlagVMExitCode(FlagVMProcess.java:135)
    at compiler.lib.ir_framework.driver.FlagVMProcess.start(FlagVMProcess.java:121)
    at compiler.lib.ir_framework.driver.FlagVMProcess.<init>(FlagVMProcess.java:63)
    at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:658)
    at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:322)
    at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:230)
    at compiler.vectorapi.TestMaskedMacroLogicVector.main(TestMaskedMacroLogicVector.java:837)
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
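The failure reported above comes from passing an x86-only option to a VM that does not know it. As a rough illustration only (this is not the fix adopted in the PR, and the class and method names are hypothetical), a test driver could add `-XX:UseAVX=3` only when the `os.arch` system property reports an x86-64 machine:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the patch under review.
final class ArchFlags {
    // Returns the extra VM flags to pass for the given os.arch value.
    // Assumption: x86-64 JDK builds report "amd64" or "x86_64".
    static List<String> extraFlagsFor(String osArch) {
        List<String> flags = new ArrayList<>();
        boolean isX86_64 = osArch.equals("amd64") || osArch.equals("x86_64");
        if (isX86_64) {
            // UseAVX is only recognized by x86 VMs.
            flags.add("-XX:UseAVX=3");
        }
        return flags;
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> " + extraFlagsFor(arch));
    }
}
```

In a jtreg test the same effect is often achieved declaratively with an `@requires` tag, so that the test is not selected at all on other architectures.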
src/hotspot/cpu/x86/assembler_x86.cpp line 9740:
    9738: emit_int8(0x25);
    9739: emit_int8((unsigned char)(0xC0 | encode));
    9740: emit_int8(imm8);
Please use emit_int24() here.
src/hotspot/cpu/x86/assembler_x86.cpp line 9773:
    9771: emit_int8(0x25);
    9772: emit_int8((unsigned char)(0xC0 | encode));
    9773: emit_int8(imm8);
Please use emit_int24() here.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4164:
    4162: }
    4163: }
    4164:
"merge" argument not used in method body.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4174:
    4172: }
    4173: }
    4174:
"merge" argument is not used in the method body.
src/hotspot/cpu/x86/x86.ad line 9590:
    9588: format %{ "vternlog_masked $dst,$src2,$src3,$func,$mask\t! vternlog masked operation" %}
    9589: ins_encode %{
    9590: int vector_len = vector_length_encoding(this);
It would be good to name this as vlen_enc instead of vector_len.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
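For readers less familiar with the instruction behind the `vternlog_masked` rule quoted above, the following is a scalar Java model of a masked ternary-logic operation. It is a sketch under stated assumptions, not code from the patch: the class and method names are hypothetical, `$dst` is assumed to be the first (most significant) input to the truth-table lookup, and masking is assumed to be merge-masking, so lanes whose mask bit is clear keep their old `$dst` value.

```java
// Hypothetical scalar model; names are illustrative and not taken from HotSpot.
final class TernlogModel {
    // For every bit position i, the three input bits form a 3-bit index
    // (a is the most significant), and bit `index` of the 8-bit function
    // immediate `func` becomes bit i of the result.
    static int ternlog(int a, int b, int c, int func) {
        int result = 0;
        for (int i = 0; i < 32; i++) {
            int idx = (((a >>> i) & 1) << 2) | (((b >>> i) & 1) << 1) | ((c >>> i) & 1);
            result |= ((func >>> idx) & 1) << i;
        }
        return result;
    }

    // Merge-masked variant over int lanes: a lane is recomputed only if its
    // mask bit is set; otherwise it keeps the previous dst value.
    static int[] ternlogMasked(int[] dst, int[] src2, int[] src3, int func, boolean[] mask) {
        int[] out = dst.clone();
        for (int lane = 0; lane < dst.length; lane++) {
            if (mask[lane]) {
                out[lane] = ternlog(dst[lane], src2[lane], src3[lane], func);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Evaluating (a & b) | c on the truth-table columns 0xF0, 0xCC, 0xAA
        // yields the function immediate for that expression.
        int func = (0xF0 & 0xCC) | 0xAA;
        int a = 0x12345678, b = 0x0F0F0F0F, c = 0xFF00FF00;
        System.out.println(ternlog(a, b, c, func) == ((a & b) | c));
    }
}
```

The `main` method mirrors the symbolic-evaluation step in the PR description: each input is assigned its truth-table column, and evaluating the logic expression on those columns gives the encoded function.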
996.906 | 1013.649 | 1.016794964 o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 256 | 2045.594 | 2048.966 | 1.001648421 o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 512 | 1111.933 | 1117.689 | 1.005176571 o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 1024 | 559.971 | 561.144 | 1.002094751
Kindly review and share your feedback.
Best Regards, Jatin
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
8273322: Updating copywrite header.
Hi @sviswa7, @vnkozlov, your comments have been addressed.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
8273322: Review comments resolution.
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/6893/files
- new: https://git.openjdk.java.net/jdk/pull/6893/files/d18f504f..f101fff7
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=02-03
Stats: 15 lines in 4 files changed: 1 ins; 4 del; 10 mod
Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893
PR: https://git.openjdk.java.net/jdk/pull/6893
On Wed, 5 Jan 2022 08:59:00 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
8273322: Review comments resolution.
Looks good.
-------------
Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6893
On Wed, 5 Jan 2022 08:59:00 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
8273322: Review comments resolution.
test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java line 26:
24: /**
25:  * @test
26:  * @bug 8273322
This needs @key randomness, since the test uses random numbers without a fixed seed. Please see: https://openjdk.java.net/jtreg/faq.html#when-should-i-use-the-intermittent-o...
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
On Thu, 6 Jan 2022 17:39:20 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
8273322: Review comments resolution.
test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java line 26:
24: /**
25:  * @test
26:  * @bug 8273322
This needs @key randomness, since the test uses random numbers without a fixed seed. Please see: https://openjdk.java.net/jtreg/faq.html#when-should-i-use-the-intermittent-o...
DONE
-------------
PR: https://git.openjdk.java.net/jdk/pull/6893
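For readers following the exchange above, here is a minimal sketch of what the randomness comment is about. It is not the test from the patch. The class name SeededRandomSketch, the fill helper, and the "seed" system property are hypothetical and chosen only for illustration. The sketch shows a jtreg-style header that declares @key randomness, and a test body that prints its seed so a failing run can be replayed with the same inputs.

```java
import java.util.Arrays;
import java.util.Random;

/*
 * @test
 * @key randomness
 * @summary Hypothetical sketch: print the seed so a random failure can be replayed.
 */
public class SeededRandomSketch {
    // Fill an array from a generator created with an explicit seed,
    // so the same seed always yields the same test inputs.
    public static int[] fill(long seed, int len) {
        Random r = new Random(seed);
        int[] a = new int[len];
        for (int i = 0; i < len; i++) {
            a[i] = r.nextInt();
        }
        return a;
    }

    public static void main(String[] args) {
        // "seed" is an assumed property name; without it a fresh seed is drawn.
        long seed = Long.getLong("seed", new Random().nextLong());
        System.out.println("seed = " + seed);

        int[] first = fill(seed, 8);
        int[] second = fill(seed, 8);
        // Replaying with the printed seed must reproduce the inputs exactly.
        if (!Arrays.equals(first, second)) {
            throw new AssertionError("inputs differ for seed " + seed);
        }
    }
}
```

Running it with -Dseed=<value>, using the value printed by an earlier run, regenerates the same arrays. Tests in the JDK tree usually obtain their generator from the shared test library rather than a hand-rolled property like this one.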
Patch extends existing macrologic inferencing algorithm to handle masked logic operations.
Existing algorithm:
1. Identify logic cone roots. 2. Packs parent and logic child nodes into a MacroLogic node in bottom up traversal if input constraint are met. i.e. maximum number of inputs which a macro logic node can have. 3. Perform symbolic evaluation of logic expression tree by assigning value corresponding to a truth table column to each input. 4. Inputs along with encoded function together represents a macro logic node which mimics a truth table.
Modification: Extended the packing algorithm to operate on both predicated or non-predicated logic nodes. Following rules define the criteria under which nodes gets packed into a macro logic node:-
1. Parent and both child nodes are all unmasked or masked with same predicates. 2. Masked parent can be packed with left child if it is predicated and both have same prediates. 3. Masked parent can be packed with right child if its un-predicated or has matching predication condition. 4. An unmasked parent can be packed with an unmasked child.
New jtreg test case added with the patch exhaustively covers all the different combinations of predications of parent and child nodes.
Following are the performance number for JMH benchmark included with the patch.
Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
-- | -- | -- | -- | --
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 256 | 3636.141 | 4446.505 | 1.222863745
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 512 | 1433.145 | 1681.261 | 1.173126934
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 1024 | 1000.107 | 1172.866 | 1.172740517
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 256 | 5568.313 | 7670.259 | 1.37748345
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 512 | 3350.108 | 3927.803 | 1.172440709
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 1024 | 1495.966 | 1541.56 | 1.030477965
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 256 | 4230.379 | 4282.154 | 1.012238856
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 512 | 2029.801 | 2049.638 | 1.009772879
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 1024 | 1108.738 | 1118.897 | 1.00916267
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 256 | 3802.801 | 3783.537 | 0.99493426
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 512 | 1546.244 | 1552.691 | 1.004169458
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 1024 | 1017.512 | 1020.075 | 1.002518889
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 256 | 4159.835 | 4527.676 | 1.088426825
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 512 | 1665.335 | 1733.04 | 1.040655484
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 1024 | 1150.319 | 1181.935 | 1.02748455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 256 | 6989.791 | 7382.883 | 1.056238019
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 512 | 3711.362 | 3911.921 | 1.054039191
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 1024 | 1540.341 | 1554.175 | 1.008981128
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 256 | 4164.559 | 4213.546 | 1.01176283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 512 | 2072.91 | 2079.105 | 1.002988552
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 1024 | 1112.678 | 1116.675 | 1.003592234
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 256 | 3702.998 | 3906.093 | 1.0548461
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 512 | 1536.571 | 1546.043 | 1.006164375
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 1024 | 996.906 | 1013.649 | 1.016794964
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 256 | 2045.594 | 2048.966 | 1.001648421
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 512 | 1111.933 | 1117.689 | 1.005176571
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 1024 | 559.971 | 561.144 | 1.002094751
Kindly review and share your feedback.
Best Regards, Jatin
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8273322: Adding missing randomness key.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6893/files
  - new: https://git.openjdk.java.net/jdk/pull/6893/files/f101fff7..2d196f71

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=04
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=03-04

Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893

PR: https://git.openjdk.java.net/jdk/pull/6893
On Thu, 6 Jan 2022 18:26:32 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
This patch extends the existing macro logic inference algorithm to handle masked logic operations.
Existing algorithm:
1. Identify logic cone roots.
2. Pack the parent and its logic child nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met, i.e. the maximum number of inputs a macro logic node can have is not exceeded.
3. Perform symbolic evaluation of the logic expression tree by assigning each input the value corresponding to a truth table column.
4. The inputs, together with the encoded function, represent a macro logic node that mimics a truth table.
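Step 3 can be illustrated with a small standalone sketch. It is not the HotSpot code: the class, interface, and method names are hypothetical, and the column constants 0xF0, 0xCC, 0xAA are the usual convention for a three-input truth table, assumed here for illustration. Evaluating the expression once on the three column constants yields the 8-bit encoded function.

```java
// Illustrative sketch only; all names here are hypothetical, not from the patch.
final class TruthTableSketch {
    // Each input is assigned one column of the 3-input truth table.
    // Row index is (a << 2) | (b << 1) | c, so bit i of a column holds
    // the value of that input in row i.
    static final int COL_A = 0xF0;
    static final int COL_B = 0xCC;
    static final int COL_C = 0xAA;

    @FunctionalInterface
    interface Logic3 {
        int apply(int a, int b, int c);
    }

    // Symbolic evaluation: run the bitwise expression once on the column
    // constants; the low 8 bits of the result are the encoded function.
    static int encode(Logic3 f) {
        return f.apply(COL_A, COL_B, COL_C) & 0xFF;
    }

    // Read the function value for one combination of single-bit inputs.
    static int lookup(int table, int a, int b, int c) {
        int row = (a << 2) | (b << 1) | c;
        return (table >> row) & 1;
    }

    public static void main(String[] args) {
        Logic3 f = (a, b, c) -> (a & b) ^ ~c;
        int table = encode(f);
        // Cross-check the encoded table against direct evaluation of f.
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                for (int c = 0; c <= 1; c++) {
                    int expected = f.apply(a, b, c) & 1;
                    if (lookup(table, a, b, c) != expected) {
                        throw new AssertionError("mismatch at " + a + b + c);
                    }
                }
            }
        }
        System.out.printf("encoded function = 0x%02X%n", table);
    }
}
```

The encoded value plus the (at most three) inputs is all that a macro logic node needs to carry, which is why the packing step limits the number of inputs.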
Modification: The packing algorithm is extended to operate on both predicated and non-predicated logic nodes. The following rules define the criteria under which nodes get packed into a macro logic node:
1. Parent and both child nodes are all unmasked, or all masked with the same predicate.
2. A masked parent can be packed with its left child if the child is predicated and both have the same predicate.
3. A masked parent can be packed with its right child if the child is un-predicated or has a matching predication condition.
4. An unmasked parent can be packed with an unmasked child.
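The four rules can be read as simple predicates over the masks of a parent and its children. The sketch below is one interpretation of the rules as stated, not the code from the patch: the class and method names are hypothetical, a mask is modelled as an opaque object, and `null` stands for an unmasked node.

```java
import java.util.Objects;

// Illustrative sketch only; names and the null-means-unmasked model are assumptions.
final class PackingRulesSketch {
    // Rule 1: parent and both children are all unmasked or share one predicate.
    static boolean canPackBoth(Object parentMask, Object leftMask, Object rightMask) {
        return Objects.equals(parentMask, leftMask)
            && Objects.equals(parentMask, rightMask);
    }

    // Rule 2 (masked parent): the left child must be masked with the same predicate.
    // Rule 4 (unmasked parent): the child must be unmasked too.
    static boolean canPackLeft(Object parentMask, Object leftMask) {
        if (parentMask == null) {
            return leftMask == null;
        }
        return parentMask.equals(leftMask); // false when leftMask is null
    }

    // Rule 3 (masked parent): the right child may be unmasked or share the predicate.
    // Rule 4 (unmasked parent): the child must be unmasked too.
    static boolean canPackRight(Object parentMask, Object rightMask) {
        if (parentMask == null) {
            return rightMask == null;
        }
        return rightMask == null || parentMask.equals(rightMask);
    }

    public static void main(String[] args) {
        String k1 = "k1";
        String k2 = "k2";
        System.out.println(canPackBoth(k1, k1, k1));  // all share one predicate
        System.out.println(canPackLeft(k1, null));    // masked parent, unmasked left child
        System.out.println(canPackRight(k1, null));   // masked parent, unmasked right child
        System.out.println(canPackRight(k1, k2));     // predicates differ
    }
}
```

Note the asymmetry between rules 2 and 3: under this reading, an unmasked child is acceptable on the right of a masked parent but not on the left.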
A new jtreg test case added with the patch exhaustively covers all combinations of predication of parent and child nodes.
Kindly review and share your feedback.
Best Regards, Jatin
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
8273322: Adding missing randomness key.
Marked as reviewed by sviswanathan (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6893
On Mon, 20 Dec 2021 13:33:01 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
Patch extends existing macrologic inferencing algorithm to handle masked logic operations.
This pull request has now been integrated.

Changeset: 8703f148
Author:    Jatin Bhateja <jbhateja@openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/8703f14808d7256d4b07e7ea8a232889bbca...
Stats:     1419 lines in 12 files changed: 1368 ins; 6 del; 45 mod

8273322: Enhance macro logic optimization for masked logic operations.

Reviewed-by: kvn, sviswanathan

-------------

PR: https://git.openjdk.java.net/jdk/pull/6893
participants (3)
- Jatin Bhateja
- Sandhya Viswanathan
- Vladimir Kozlov