RFR(L): 8161211: better inlining support for loop bytecode intrinsics
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Sep 23 16:41:53 UTC 2016
Looks even better :-) Reviewed.
Best regards,
Vladimir Ivanov
On 9/22/16 10:23 AM, Michael Haupt wrote:
> Hi John,
>
> thanks for your review, and thanks Vladimir! I've had another go at the implementation to use a dedicated loop clause holder class with a stable array; performance is roughly on par with that of the BMHs-as-arrays approach (see below).
>
> The new webrev is at http://cr.openjdk.java.net/~mhaupt/8161211/webrev.01/; please review.
>
> Thanks,
>
> Michael
>
>
>
>
> Benchmark (iterations) unpatched [ns/op] patched [ns/op]
> CntL.Cr.cr3 N/A 16039.108 15821.583
> CntL.Cr.cr4 N/A 15621.959 15869.730
> CntL.Inv.bl3 0 2.858 2.835
> CntL.Inv.bl3 1 5.125 5.179
> CntL.Inv.bl3 10 11.887 12.005
> CntL.Inv.bl3 100 67.441 67.279
> CntL.Inv.bl4 0 2.855 2.858
> CntL.Inv.bl4 1 5.120 5.210
> CntL.Inv.bl4 10 11.875 12.012
> CntL.Inv.bl4 100 67.607 67.296
> CntL.Inv.blMH3 0 9.734 9.722
> CntL.Inv.blMH3 1 15.689 15.865
> CntL.Inv.blMH3 10 68.912 69.098
> CntL.Inv.blMH3 100 605.666 605.526
> CntL.Inv.blMH4 0 14.561 13.274
> CntL.Inv.blMH4 1 19.543 19.709
> CntL.Inv.blMH4 10 71.977 72.446
> CntL.Inv.blMH4 100 596.842 598.271
> CntL.Inv.cntL3 0 49.339 6.311
> CntL.Inv.cntL3 1 95.444 7.333
> CntL.Inv.cntL3 10 508.746 20.930
> CntL.Inv.cntL3 100 4701.808 147.383
> CntL.Inv.cntL4 0 49.443 5.780
> CntL.Inv.cntL4 1 98.721 7.465
> CntL.Inv.cntL4 10 503.825 20.932
> CntL.Inv.cntL4 100 4681.803 147.278
> DoWhL.Cr.cr N/A 7628.312 7803.187
> DoWhL.Inv.bl 1 3.868 3.869
> DoWhL.Inv.bl 10 16.480 16.528
> DoWhL.Inv.bl 100 144.260 144.290
> DoWhL.Inv.blMH 1 14.434 14.430
> DoWhL.Inv.blMH 10 92.542 92.733
> DoWhL.Inv.blMH 100 877.480 876.735
> DoWhL.Inv.doWhL 1 26.791 7.134
> DoWhL.Inv.doWhL 10 158.985 17.004
> DoWhL.Inv.doWhL 100 1391.746 133.253
> ItrL.Cr.cr N/A 13547.499 13248.913
> ItrL.Inv.bl 0 2.973 2.983
> ItrL.Inv.bl 1 6.771 6.705
> ItrL.Inv.bl 10 14.955 14.952
> ItrL.Inv.bl 100 81.842 82.152
> ItrL.Inv.blMH 0 14.893 15.014
> ItrL.Inv.blMH 1 20.998 21.459
> ItrL.Inv.blMH 10 73.677 73.888
> ItrL.Inv.blMH 100 613.913 615.208
> ItrL.Inv.itrL 0 33.583 10.842
> ItrL.Inv.itrL 1 82.239 13.573
> ItrL.Inv.itrL 10 448.356 38.773
> ItrL.Inv.itrL 100 4189.034 279.918
> L.Cr.cr N/A 15505.970 15640.994
> L.Inv0.bl 1 3.179 3.186
> L.Inv0.bl 10 5.952 5.912
> L.Inv0.bl 100 50.942 50.964
> L.Inv0.lo 1 46.454 5.290
> L.Inv0.lo 10 514.230 8.492
> L.Inv0.lo 100 5166.251 52.187
> L.Inv1.lo 1 34.321 5.291
> L.Inv1.lo 10 430.839 8.474
> L.Inv1.lo 100 4095.302 52.173
> TF.blEx N/A 3.005 2.986
> TF.blMHEx N/A 166.316 165.856
> TF.blMHNor N/A 9.337 9.290
> TF.blNor N/A 2.696 2.682
> TF.cr N/A 406.255 415.090
> TF.invTFEx N/A 154.121 154.826
> TF.invTFNor N/A 5.350 5.328
> WhL.Cr.cr N/A 12214.383 12112.535
> WhL.Inv.bl 0 3.886 3.931
> WhL.Inv.bl 1 5.379 5.411
> WhL.Inv.bl 10 16.000 16.203
> WhL.Inv.bl 100 142.066 142.127
> WhL.Inv.blMH 0 11.028 10.915
> WhL.Inv.blMH 1 21.269 21.419
> WhL.Inv.blMH 10 97.493 98.373
> WhL.Inv.blMH 100 887.579 892.955
> WhL.Inv.whL 0 24.829 7.082
> WhL.Inv.whL 1 46.039 8.598
> WhL.Inv.whL 10 240.963 21.108
> WhL.Inv.whL 100 2092.671 167.619
>
>
>
>
>
>> On 20.09.2016 at 21:54, John Rose <john.r.rose at oracle.com> wrote:
>>
>> There should also be an assert in the new LF constructor, which ensures that the two
>> arguments are congruent. Better yet, just supply one argument (the speciesData),
>> and derive the MT. These new LFs are pretty confusing, and it's best to nail down
>> unused degrees of freedom.
>>
>> — John
>>
>> P.S. I would have expected this problem to be solved by having the MHI.toArray function
>> return a box object with a single @Stable array field. Did that approach fail?
>>
>> I.e., this wrapper emulates a frozen array (until that happy day when we have real
>> frozen arrays):
>>
>> class ArrayConstant<T> {
>>     private final @Stable T[] values;
>>     public ArrayConstant(T[] values) {
>>         for (T v : values)  Objects.requireNonNull(v);
>>         this.values = values.clone();
>>     }
>>     public T get(int i) { return values[i]; }
>>     //public int length() { return values.length; }
>> }
>>
>> The JIT should be able to constant fold through ac.get(i) whenever ac and i are constants.
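As a rough illustration of the wrapper quoted above, here is a standalone sketch that compiles outside the JDK. It is a stand-in under stated assumptions, not the code under review: @Stable lives in a JDK-internal package, so it appears only in a comment, and without it no constant folding through get(i) is promised. The names ArrayConstantDemo and demo, and the sample strings, are made up for the example.

```java
import java.util.Objects;

final class ArrayConstantDemo {
    /** Frozen-array emulation; mirrors the quoted wrapper minus the JDK-internal annotation. */
    static final class ArrayConstant<T> {
        // Inside java.base this field would carry @Stable (an assumption here;
        // the annotation is not usable from ordinary application code).
        private final T[] values;

        ArrayConstant(T[] values) {
            // Copy first, then validate the copy, so a caller mutating the
            // original array cannot slip a null in after the check.
            T[] copy = values.clone();
            for (T v : copy) Objects.requireNonNull(v);
            this.values = copy;
        }

        T get(int i) { return values[i]; }
        int length() { return values.length; }
    }

    /** Returns the element at index 1 after the source array has been overwritten. */
    static String demo() {
        String[] source = {"init", "pred", "body"};
        ArrayConstant<String> ac = new ArrayConstant<>(source);
        source[1] = "changed";   // does not affect the frozen copy
        return ac.get(1);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "pred"
    }
}
```

The null check matters for the @Stable variant because a stable array element that still holds its default value (null) is treated as not yet initialized and is therefore not folded.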
>>
>> On Sep 20, 2016, at 8:17 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>> Looks good.
>>>
>>> src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java:
>>> +    LambdaForm bmhArrayForm(MethodType type, BoundMethodHandle.SpeciesData speciesData) {
>>> +        int size = type.parameterCount();
>>> +        Transform key = Transform.of(Transform.BMH_AS_ARRAY, size);
>>> +        LambdaForm form = getInCache(key);
>>> +        if (form != null) {
>>> +            return form;
>>> +        }
>>>
>>> Please add an assert to ensure that the cached LF has the same constraint as requested (speciesData).
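To make the concern concrete, here is a small sketch with hypothetical types: FormCache, Form, and a String standing in for SpeciesData. None of these exist in java.lang.invoke, and the real LambdaFormEditor cache works differently in detail. The point it tries to show is only that the cache key covers the size alone, so a hit has to be checked against the requested constraint.

```java
import java.util.HashMap;
import java.util.Map;

final class FormCache {
    /** Hypothetical cached form that remembers the constraint it was built for. */
    static final class Form {
        final int size;
        final String species;   // stand-in for BoundMethodHandle.SpeciesData

        Form(int size, String species) {
            this.size = size;
            this.species = species;
        }
    }

    // Keyed by size only, like the (BMH_AS_ARRAY, size) transform key in the diff.
    private final Map<Integer, Form> cache = new HashMap<>();

    Form formFor(int size, String species) {
        Form cached = cache.get(size);
        if (cached != null) {
            // The key does not include the species, so verify that the cached
            // form was built for the same one (checked only when run with -ea).
            assert cached.species.equals(species)
                    : "cached form built for " + cached.species + ", requested " + species;
            return cached;
        }
        Form created = new Form(size, species);
        cache.put(size, created);
        return created;
    }
}
```

Deriving the key inputs from a single argument, as suggested elsewhere in this thread, would remove the possibility of a mismatch rather than merely detecting it.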
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 9/20/16 3:53 PM, Michael Haupt wrote:
>>>> Dear all,
>>>>
>>>> please review this change.
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8161211
>>>> Webrev: http://cr.openjdk.java.net/~mhaupt/8161211/webrev.00/
>>>>
>>>> The method handle loop combinators introduced with JEP 274 were originally not intrinsified, leading to poor performance compared not only to a pure-Java baseline but also to handwired method handle combinations. The intrinsics introduced with 8143211 [1] improved the situation somewhat, but still did not provide good inlining opportunities for the JIT compiler. This change uses BoundMethodHandles as arrays to carry the various handles involved in loop execution.
>>>>
>>>> Extra credit to Vladimir Ivanov, who suggested the BMH-as-arrays approach in the first place, and Claes Redestad, who suggested using LambdaForm editing to neatly enable caching. Thanks!
>>>>
>>>> Performance improves considerably. The table below reports scores in ns/op. The "unpatched" column contains results from before applying the patch for 8161211; the "patched" column contains results from after applying it.
>>>>
>>>> The create benchmarks measure the cost of loop handle creation. The baseline and baselineMH benchmarks measure the cost of running a pure-Java construct and a handwired method handle construct, respectively.
>>>>
>>>> Relevant comparisons include loop combinator results versus baselines, and versus unpatched loop combinator results. For the latter, there are significant improvements, except for the creation benchmarks (creation has a more complex workflow now). For the former, the BMH-array intrinsics generally perform better than handwired handle constructs, and have moved much closer to the pure-Java baselines.
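For readers unfamiliar with the combinators being measured, the following sketch shows a counted loop built from method handles next to its plain-Java counterpart. It is not the benchmark code: the names CountedLoopSketch, step, baseline, and viaLoop are invented for illustration, and only MethodHandles.countedLoop, identity, constant, and Lookup.findStatic are real APIs (JDK 9 or later is assumed).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CountedLoopSketch {
    // Loop body in the (V, I, A...)V shape expected by countedLoop:
    // running sum, loop counter, and the external argument n (unused here).
    static int step(int sum, int i, int n) { return sum + i; }

    // Plain-Java counterpart of the loop handle below.
    static int baseline(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) sum = step(sum, i, n);
        return sum;
    }

    // Kept in a static final field so the JIT can treat the handle as a constant.
    static final MethodHandle LOOP = buildLoop();

    static MethodHandle buildLoop() {
        try {
            MethodHandle body = MethodHandles.lookup().findStatic(
                    CountedLoopSketch.class, "step",
                    MethodType.methodType(int.class, int.class, int.class, int.class));
            MethodHandle iterations = MethodHandles.identity(int.class); // n -> n
            MethodHandle init = MethodHandles.constant(int.class, 0);    // sum starts at 0
            return MethodHandles.countedLoop(iterations, init, body);    // type (int)int
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Runs the loop handle; wraps the Throwable declared by invokeExact.
    static int viaLoop(int n) {
        try {
            return (int) LOOP.invokeExact(n);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaLoop(5) + " " + baseline(5));   // prints "10 10"
    }
}
```

In the terms used above, baseline corresponds roughly to the pure-Java case and viaLoop to the loop combinator case; a handwired variant would compose the same behaviour from lower-level handle combinators instead of calling countedLoop.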
>>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8143211
>>>>
>>>>
>>>>
>>>> Benchmark (iterations) unpatched [ns/op] patched [ns/op]
>>>> MethodHandlesCountedLoop.Create.create3 N/A 16039.108 18400.405
>>>> MethodHandlesCountedLoop.Create.create4 N/A 15621.959 17924.696
>>>> MethodHandlesCountedLoop.Invoke.baseline3 0 2.858 2.839
>>>> MethodHandlesCountedLoop.Invoke.baseline3 1 5.125 5.164
>>>> MethodHandlesCountedLoop.Invoke.baseline3 10 11.887 11.924
>>>> MethodHandlesCountedLoop.Invoke.baseline3 100 67.441 67.281
>>>> MethodHandlesCountedLoop.Invoke.baseline4 0 2.855 2.838
>>>> MethodHandlesCountedLoop.Invoke.baseline4 1 5.120 5.179
>>>> MethodHandlesCountedLoop.Invoke.baseline4 10 11.875 11.906
>>>> MethodHandlesCountedLoop.Invoke.baseline4 100 67.607 67.374
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3 0 9.734 9.606
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3 1 15.689 15.674
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3 10 68.912 69.303
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3 100 605.666 606.432
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4 0 14.561 13.234
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4 1 19.543 19.773
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4 10 71.977 72.466
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4 100 596.842 602.469
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3 0 49.339 5.810
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3 1 95.444 7.441
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3 10 508.746 21.002
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3 100 4701.808 145.996
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4 0 49.443 5.798
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4 1 98.721 7.438
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4 10 503.825 21.049
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4 100 4681.803 147.020
>>>> MethodHandlesDoWhileLoop.Create.create N/A 7628.312 9100.332
>>>> MethodHandlesDoWhileLoop.Invoke.baseline 1 3.868 3.909
>>>> MethodHandlesDoWhileLoop.Invoke.baseline 10 16.480 16.461
>>>> MethodHandlesDoWhileLoop.Invoke.baseline 100 144.260 144.232
>>>> MethodHandlesDoWhileLoop.Invoke.baselineMH 1 14.434 14.494
>>>> MethodHandlesDoWhileLoop.Invoke.baselineMH 10 92.542 93.454
>>>> MethodHandlesDoWhileLoop.Invoke.baselineMH 100 877.480 880.496
>>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop 1 26.791 7.153
>>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop 10 158.985 16.990
>>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop 100 1391.746 130.946
>>>> MethodHandlesIteratedLoop.Create.create N/A 13547.499 15478.542
>>>> MethodHandlesIteratedLoop.Invoke.baseline 0 2.973 2.980
>>>> MethodHandlesIteratedLoop.Invoke.baseline 1 6.771 6.658
>>>> MethodHandlesIteratedLoop.Invoke.baseline 10 14.955 14.955
>>>> MethodHandlesIteratedLoop.Invoke.baseline 100 81.842 82.582
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH 0 14.893 14.668
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH 1 20.998 21.304
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH 10 73.677 72.703
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH 100 613.913 614.475
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 0 33.583 9.603
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 1 82.239 14.433
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 10 448.356 38.650
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 100 4189.034 279.779
>>>> MethodHandlesLoop.Create.create N/A 15505.970 17559.399
>>>> MethodHandlesLoop.Invoke0.baseline 1 3.179 3.181
>>>> MethodHandlesLoop.Invoke0.baseline 10 5.952 6.115
>>>> MethodHandlesLoop.Invoke0.baseline 100 50.942 50.943
>>>> MethodHandlesLoop.Invoke0.loop 1 46.454 5.353
>>>> MethodHandlesLoop.Invoke0.loop 10 514.230 8.487
>>>> MethodHandlesLoop.Invoke0.loop 100 5166.251 52.188
>>>> MethodHandlesLoop.Invoke1.loop 1 34.321 5.277
>>>> MethodHandlesLoop.Invoke1.loop 10 430.839 8.481
>>>> MethodHandlesLoop.Invoke1.loop 100 4095.302 52.206
>>>> MethodHandlesTryFinally.baselineExceptional N/A 3.005 3.002
>>>> MethodHandlesTryFinally.baselineMHExceptional N/A 166.316 166.087
>>>> MethodHandlesTryFinally.baselineMHNormal N/A 9.337 9.276
>>>> MethodHandlesTryFinally.baselineNormal N/A 2.696 2.683
>>>> MethodHandlesTryFinally.create N/A 406.255 406.594
>>>> MethodHandlesTryFinally.invokeTryFinallyExceptional N/A 154.121 154.692
>>>> MethodHandlesTryFinally.invokeTryFinallyNormal N/A 5.350 5.334
>>>> MethodHandlesWhileLoop.Create.create N/A 12214.383 14503.515
>>>> MethodHandlesWhileLoop.Invoke.baseline 0 3.886 3.888
>>>> MethodHandlesWhileLoop.Invoke.baseline 1 5.379 5.377
>>>> MethodHandlesWhileLoop.Invoke.baseline 10 16.000 16.201
>>>> MethodHandlesWhileLoop.Invoke.baseline 100 142.066 143.338
>>>> MethodHandlesWhileLoop.Invoke.baselineMH 0 11.028 11.012
>>>> MethodHandlesWhileLoop.Invoke.baselineMH 1 21.269 21.159
>>>> MethodHandlesWhileLoop.Invoke.baselineMH 10 97.493 97.656
>>>> MethodHandlesWhileLoop.Invoke.baselineMH 100 887.579 886.532
>>>> MethodHandlesWhileLoop.Invoke.whileLoop 0 24.829 7.108
>>>> MethodHandlesWhileLoop.Invoke.whileLoop 1 46.039 8.573
>>>> MethodHandlesWhileLoop.Invoke.whileLoop 10 240.963 21.088
>>>> MethodHandlesWhileLoop.Invoke.whileLoop 100 2092.671 159.016
>>>>
>>>>
>>>>
>>
>