[vectorIntrinsics+mask] RFR: 8273322: Enhance macro logic optimization for masked logic operations.

Jatin Bhateja jbhateja at openjdk.java.net
Wed Oct 13 04:35:02 UTC 2021


On Tue, 14 Sep 2021 16:06:14 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> This patch extends the existing macro logic inferencing algorithm to handle masked logic operations.
> 
> Existing algorithm:
>  1) Identify logic cone roots.
>  2) Pack parent and child logic nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met,
>     i.e. the maximum number of inputs a macro logic node can have is not exceeded.
>  3) Perform symbolic evaluation of the logic expression tree by assigning to each input the value corresponding
>     to a truth table column.
>  4) The inputs together with the encoded function represent a macro logic node, which mimics a truth table.
> 
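
Steps 3 and 4 can be illustrated with a small standalone sketch. It is not the HotSpot code; it assumes the conventional three-input encoding in which the inputs are assigned the column constants 0xF0, 0xCC and 0xAA, so that evaluating the expression on those constants yields an 8-bit truth table. The class and method names (TruthTableSketch, encode, lookup) are hypothetical.

```java
// Illustrative sketch only: names and structure are assumptions, not compiler code.
final class TruthTableSketch {
    // Each constant is one truth table column: bit r is the input's value in row r,
    // where the row index is (a << 2) | (b << 1) | c.
    static final int COL_A = 0xF0;
    static final int COL_B = 0xCC;
    static final int COL_C = 0xAA;

    interface Logic3 {
        int apply(int a, int b, int c);
    }

    // Symbolic evaluation: run the expression once on the column constants.
    static int encode(Logic3 expr) {
        return expr.apply(COL_A, COL_B, COL_C) & 0xFF;
    }

    // Read the encoded function back for single-bit inputs a, b, c.
    static int lookup(int func, int a, int b, int c) {
        int row = (a << 2) | (b << 1) | c;
        return (func >>> row) & 1;
    }

    public static void main(String[] args) {
        Logic3 expr = (a, b, c) -> (a & b) | c;
        int func = encode(expr);
        System.out.printf("func = 0x%02X%n", func);
        // Check the encoded function against direct evaluation for all 8 rows.
        for (int row = 0; row < 8; row++) {
            int a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
            if (lookup(func, a, b, c) != (expr.apply(a, b, c) & 1)) {
                throw new AssertionError("mismatch in row " + row);
            }
        }
    }
}
```

Under these assumptions, the three inputs plus the single byte returned by encode are enough to reproduce the whole expression tree, which is why a cone of logic nodes can collapse into one macro logic node.
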
> Modification:
>  Extended the packing algorithm to operate on both predicated and non-predicated logic nodes. The following
>  rules define the criteria under which nodes get packed into a macro logic node:
>  1) Parent and both child nodes are all unmasked, or all masked with the same predicate.
>  2) A masked parent can be packed with its left child if the child is predicated and both have the same predicate.
>  3) A masked parent can be packed with its right child if the child is un-predicated or has a matching predication condition.
>  4) An unmasked parent can be packed with an unmasked child.
> 
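
The four packing rules can be read as simple predicates over a parent and its children. The sketch below is one possible reading, not the patch itself: the LogicNode class, the use of a string label to stand for a mask, and the method names are all assumptions made for illustration, whereas the compiler compares mask nodes in its IR.

```java
import java.util.Objects;

// Illustrative sketch only: a hypothetical model of the packing criteria.
final class PackingRulesSketch {
    static final class LogicNode {
        final String mask; // null means the node is unmasked (assumption: label identifies the predicate)

        LogicNode(String mask) {
            this.mask = mask;
        }

        boolean isMasked() {
            return mask != null;
        }
    }

    static boolean sameMask(LogicNode x, LogicNode y) {
        return Objects.equals(x.mask, y.mask);
    }

    // Rule 1: parent and both children all unmasked, or all masked with the same predicate.
    static boolean canPackBoth(LogicNode parent, LogicNode left, LogicNode right) {
        return sameMask(parent, left) && sameMask(parent, right);
    }

    // Rule 2 (masked parent) and rule 4 (unmasked parent) for the left child.
    static boolean canPackLeft(LogicNode parent, LogicNode left) {
        if (!parent.isMasked()) {
            return !left.isMasked();
        }
        return left.isMasked() && sameMask(parent, left);
    }

    // Rule 3 (masked parent) and rule 4 (unmasked parent) for the right child.
    static boolean canPackRight(LogicNode parent, LogicNode right) {
        if (!parent.isMasked()) {
            return !right.isMasked();
        }
        return !right.isMasked() || sameMask(parent, right);
    }

    public static void main(String[] args) {
        LogicNode maskedParent = new LogicNode("m1");
        System.out.println(canPackLeft(maskedParent, new LogicNode(null)));  // left child must be predicated
        System.out.println(canPackRight(maskedParent, new LogicNode(null))); // right child may be un-predicated
    }
}
```

Note the asymmetry in this reading: a masked parent accepts an un-predicated right child but not an un-predicated left child.
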
> The new jtreg test case added with the patch exhaustively covers all combinations of predication for parent and
> child nodes.
> 
> The following are the performance numbers for the JMH benchmark included with the patch.
> 
> Machine Configuration:  Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascadelake Server 28C 2S)
> 
> 
> Benchmark | SPECIES | VECLEN | Baseline Score (ops/ms) | With Opt Score (ops/ms) | Gain Ratio (With Opt / Baseline)
> -- | -- | -- | -- | -- | --
> MaskedLogicOpts.bitwiseBlendOperationInt | 128 | 256 | 594.425 | 616.74 | 1.03754048
> MaskedLogicOpts.bitwiseBlendOperationInt | 128 | 512 | 596.433 | 616.405 | 1.033485739
> MaskedLogicOpts.bitwiseBlendOperationInt | 128 | 1024 | 586.716 | 618.718 | 1.054544277
> MaskedLogicOpts.bitwiseBlendOperationInt | 128 | 2048 | 594.68 | 618.235 | 1.039609538
> MaskedLogicOpts.bitwiseBlendOperationInt | 128 | 4096 | 595.357 | 617.803 | 1.037701749
> MaskedLogicOpts.bitwiseBlendOperationInt | 256 | 256 | 503.396 | 602.252 | 1.196378199
> MaskedLogicOpts.bitwiseBlendOperationInt | 256 | 512 | 529.454 | 572.485 | 1.081274294
> MaskedLogicOpts.bitwiseBlendOperationInt | 256 | 1024 | 560.688 | 587.143 | 1.047183104
> MaskedLogicOpts.bitwiseBlendOperationInt | 256 | 2048 | 539.919 | 586.473 | 1.086224045
> MaskedLogicOpts.bitwiseBlendOperationInt | 256 | 4096 | 542.102 | 586.694 | 1.082257583
> MaskedLogicOpts.bitwiseBlendOperationInt | 512 | 256 | 401.552 | 474.281 | 1.181119755
> MaskedLogicOpts.bitwiseBlendOperationInt | 512 | 512 | 371.352 | 520.497 | 1.401627028
> MaskedLogicOpts.bitwiseBlendOperationInt | 512 | 1024 | 403.174 | 514.51 | 1.27614876
> MaskedLogicOpts.bitwiseBlendOperationInt | 512 | 2048 | 386.124 | 511.22 | 1.323978825
> MaskedLogicOpts.maskedLogicOperationsInt | 512 | 256 | 316.054 | 654.797 | 2.071788365
> MaskedLogicOpts.maskedLogicOperationsInt | 512 | 512 | 312.912 | 600.227 | 1.918197448
> MaskedLogicOpts.maskedLogicOperationsInt | 512 | 1024 | 305.86 | 614.129 | 2.007876152
> MaskedLogicOpts.maskedLogicOperationsInt | 512 | 2048 | 306.589 | 617.645 | 2.014569994
> MaskedLogicOpts.maskedLogicOperationsInt | 512 | 4096 | 314.896 | 619.618 | 1.96769092
> MaskedLogicOpts.maskedLogicOperationsLong | 128 | 256 | 12.32 | 17.629 | 1.430925325
> MaskedLogicOpts.maskedLogicOperationsLong | 128 | 512 | 12.296 | 17.632 | 1.433962264
> MaskedLogicOpts.maskedLogicOperationsLong | 128 | 1024 | 12.027 | 17.663 | 1.468612289
> MaskedLogicOpts.maskedLogicOperationsLong | 128 | 2048 | 12.33 | 17.601 | 1.427493917
> MaskedLogicOpts.maskedLogicOperationsLong | 128 | 4096 | 12.329 | 17.65 | 1.43158407
> MaskedLogicOpts.maskedLogicOperationsLong | 256 | 256 | 413.078 | 1184.616 | 2.867777998
> MaskedLogicOpts.maskedLogicOperationsLong | 256 | 512 | 431.578 | 1069.109 | 2.477209218
> MaskedLogicOpts.maskedLogicOperationsLong | 256 | 1024 | 430.099 | 1089.835 | 2.53391661
> MaskedLogicOpts.maskedLogicOperationsLong | 256 | 2048 | 420.341 | 1204.934 | 2.8665631
> MaskedLogicOpts.maskedLogicOperationsLong | 256 | 4096 | 431.571 | 1069.704 | 2.478628082
> MaskedLogicOpts.maskedLogicOperationsLong | 512 | 256 | 311.43 | 599.982 | 1.926538869
> MaskedLogicOpts.maskedLogicOperationsLong | 512 | 512 | 305.459 | 620.418 | 2.031100737
> MaskedLogicOpts.maskedLogicOperationsLong | 512 | 1024 | 304.885 | 611.37 | 2.00524788
> MaskedLogicOpts.maskedLogicOperationsLong | 512 | 2048 | 305.198 | 619.347 | 2.029328501
> MaskedLogicOpts.maskedLogicOperationsLong | 512 | 4096 | 305.317 | 615.882 | 2.017188692
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 128 | 256 | 781.922 | 856.605 | 1.095512084
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 128 | 512 | 752.428 | 856.559 | 1.138393308
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 128 | 1024 | 764.4 | 837.68 | 1.095866039
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 128 | 2048 | 780.311 | 857.797 | 1.099301432
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 128 | 4096 | 780.489 | 837.536 | 1.073091357
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 256 | 256 | 703.881 | 820.539 | 1.165735401
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 256 | 512 | 698.958 | 822.174 | 1.17628527
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 256 | 1024 | 715.533 | 806.71 | 1.12742529
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 256 | 2048 | 678.087 | 797.53 | 1.176147014
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 256 | 4096 | 714.427 | 824.008 | 1.153383061
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 512 | 256 | 400.801 | 547.414 | 1.365799986
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 512 | 512 | 453.713 | 602.492 | 1.327914342
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 512 | 1024 | 467.685 | 612.172 | 1.308940847
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 512 | 2048 | 467.286 | 612.659 | 1.311100696
> MaskedLogicOpts.partiallyMaskedLogicOperationsInt | 512 | 4096 | 465.71 | 671.911 | 1.442766958
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 128 | 256 | 18.25 | 24.524 | 1.343780822
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 128 | 512 | 18.634 | 24.408 | 1.30986369
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 128 | 1024 | 18.566 | 24.839 | 1.337875687
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 128 | 2048 | 18.568 | 24.65 | 1.327552779
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 128 | 4096 | 18.685 | 24.448 | 1.308429221
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 256 | 256 | 658.381 | 788.086 | 1.197005989
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 256 | 512 | 679.09 | 780.808 | 1.149785743
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 256 | 1024 | 675.793 | 783.304 | 1.159088656
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 256 | 2048 | 679.09 | 823.756 | 1.213029201
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 256 | 4096 | 677.724 | 782.655 | 1.154828514
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 512 | 256 | 456.231 | 620.547 | 1.360159656
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 512 | 512 | 468.07 | 604.75 | 1.292007606
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 512 | 1024 | 467.188 | 605.256 | 1.295529851
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 512 | 2048 | 468.52 | 605.854 | 1.293123026
> MaskedLogicOpts.partiallyMaskedLogicOperationsLong | 512 | 4096 | 467.954 | 605.996 | 1.294990533

This patch will be posted to JDK mainline after integration of the following pull request:
8271515: Integration of JEP 417: Vector API (Third Incubator) #5873

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/125

