RFR(L): 8161211: better inlining support for loop bytecode intrinsics
Michael Haupt
michael.haupt at oracle.com
Thu Sep 22 07:23:39 UTC 2016
Hi John,
thanks for your review, and thanks Vladimir! I've had another go at the implementation to use a dedicated loop clause holder class with a stable array; performance is roughly on par with that of the BMHs-as-arrays approach (see below).
The new webrev is at http://cr.openjdk.java.net/~mhaupt/8161211/webrev.01/; please review.
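For readers without the webrev at hand, here is a minimal sketch of what a loop clause holder with a stable array could look like. It is not the code from the webrev: the class names `LoopClauseHolder` and `LoopClauseDemo`, the accessor names, and the four-row layout (init, step, pred, fini, following the clause terminology of `MethodHandles.loop`) are assumptions made for illustration. The JDK-internal `@Stable` annotation is left out because application code cannot use it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Objects;

// Hypothetical holder for the four clause arrays of a loop (init, step, pred, fini).
final class LoopClauseHolder {
    // Inside java.lang.invoke this field would carry the JDK-internal @Stable
    // annotation; it is omitted here because application code cannot use it.
    private final MethodHandle[][] clauses;

    LoopClauseHolder(MethodHandle[] init, MethodHandle[] step,
                     MethodHandle[] pred, MethodHandle[] fini) {
        clauses = new MethodHandle[][] {
            checkedCopy(init), checkedCopy(step), checkedCopy(pred), checkedCopy(fini)
        };
    }

    // Defensive copy that rejects null handles.
    private static MethodHandle[] checkedCopy(MethodHandle[] handles) {
        MethodHandle[] copy = handles.clone();
        for (MethodHandle h : copy) Objects.requireNonNull(h);
        return copy;
    }

    MethodHandle init(int i) { return clauses[0][i]; }
    MethodHandle step(int i) { return clauses[1][i]; }
    MethodHandle pred(int i) { return clauses[2][i]; }
    MethodHandle fini(int i) { return clauses[3][i]; }
}

final class LoopClauseDemo {
    private static int inc(int v) { return v + 1; }
    private static boolean below10(int v) { return v < 10; }

    // Runs a single-clause loop by hand:
    // v = init(); repeat { v = step(v); if (!pred(v)) return fini(v); }
    static int demo() {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle init = MethodHandles.constant(int.class, 0);
            MethodHandle step = l.findStatic(LoopClauseDemo.class, "inc",
                    MethodType.methodType(int.class, int.class));
            MethodHandle pred = l.findStatic(LoopClauseDemo.class, "below10",
                    MethodType.methodType(boolean.class, int.class));
            MethodHandle fini = MethodHandles.identity(int.class);

            LoopClauseHolder holder = new LoopClauseHolder(
                    new MethodHandle[] { init }, new MethodHandle[] { step },
                    new MethodHandle[] { pred }, new MethodHandle[] { fini });

            int v = (int) holder.init(0).invokeExact();
            while (true) {
                v = (int) holder.step(0).invokeExact(v);
                boolean goOn = (boolean) holder.pred(0).invokeExact(v);
                if (!goOn) return (int) holder.fini(0).invokeExact(v);
            }
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

In this sketch `LoopClauseDemo.demo()` counts from 0 up to 10 and returns 10. The point of the holder is that all handles are reachable from one immutable object, which is the shape a JIT needs in order to treat them as constants once the array is marked stable.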
Thanks,
Michael
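Scores are in ns/op. Benchmark names are abbreviated: CntL = MethodHandlesCountedLoop, DoWhL = MethodHandlesDoWhileLoop, ItrL = MethodHandlesIteratedLoop, L = MethodHandlesLoop, TF = MethodHandlesTryFinally, WhL = MethodHandlesWhileLoop; Cr = Create, Inv = Invoke, bl = baseline.
Benchmark (iterations) unpatched patched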
CntL.Cr.cr3 N/A 16039.108 15821.583
CntL.Cr.cr4 N/A 15621.959 15869.730
CntL.Inv.bl3 0 2.858 2.835
CntL.Inv.bl3 1 5.125 5.179
CntL.Inv.bl3 10 11.887 12.005
CntL.Inv.bl3 100 67.441 67.279
CntL.Inv.bl4 0 2.855 2.858
CntL.Inv.bl4 1 5.120 5.210
CntL.Inv.bl4 10 11.875 12.012
CntL.Inv.bl4 100 67.607 67.296
CntL.Inv.blMH3 0 9.734 9.722
CntL.Inv.blMH3 1 15.689 15.865
CntL.Inv.blMH3 10 68.912 69.098
CntL.Inv.blMH3 100 605.666 605.526
CntL.Inv.blMH4 0 14.561 13.274
CntL.Inv.blMH4 1 19.543 19.709
CntL.Inv.blMH4 10 71.977 72.446
CntL.Inv.blMH4 100 596.842 598.271
CntL.Inv.cntL3 0 49.339 6.311
CntL.Inv.cntL3 1 95.444 7.333
CntL.Inv.cntL3 10 508.746 20.930
CntL.Inv.cntL3 100 4701.808 147.383
CntL.Inv.cntL4 0 49.443 5.780
CntL.Inv.cntL4 1 98.721 7.465
CntL.Inv.cntL4 10 503.825 20.932
CntL.Inv.cntL4 100 4681.803 147.278
DoWhL.Cr.cr N/A 7628.312 7803.187
DoWhL.Inv.bl 1 3.868 3.869
DoWhL.Inv.bl 10 16.480 16.528
DoWhL.Inv.bl 100 144.260 144.290
DoWhL.Inv.blMH 1 14.434 14.430
DoWhL.Inv.blMH 10 92.542 92.733
DoWhL.Inv.blMH 100 877.480 876.735
DoWhL.Inv.doWhL 1 26.791 7.134
DoWhL.Inv.doWhL 10 158.985 17.004
DoWhL.Inv.doWhL 100 1391.746 133.253
ItrL.Cr.cr N/A 13547.499 13248.913
ItrL.Inv.bl 0 2.973 2.983
ItrL.Inv.bl 1 6.771 6.705
ItrL.Inv.bl 10 14.955 14.952
ItrL.Inv.bl 100 81.842 82.152
ItrL.Inv.blMH 0 14.893 15.014
ItrL.Inv.blMH 1 20.998 21.459
ItrL.Inv.blMH 10 73.677 73.888
ItrL.Inv.blMH 100 613.913 615.208
ItrL.Inv.itrL 0 33.583 10.842
ItrL.Inv.itrL 1 82.239 13.573
ItrL.Inv.itrL 10 448.356 38.773
ItrL.Inv.itrL 100 4189.034 279.918
L.Cr.cr N/A 15505.970 15640.994
L.Inv0.bl 1 3.179 3.186
L.Inv0.bl 10 5.952 5.912
L.Inv0.bl 100 50.942 50.964
L.Inv0.lo 1 46.454 5.290
L.Inv0.lo 10 514.230 8.492
L.Inv0.lo 100 5166.251 52.187
L.Inv1.lo 1 34.321 5.291
L.Inv1.lo 10 430.839 8.474
L.Inv1.lo 100 4095.302 52.173
TF.blEx N/A 3.005 2.986
TF.blMHEx N/A 166.316 165.856
TF.blMHNor N/A 9.337 9.290
TF.blNor N/A 2.696 2.682
TF.cr N/A 406.255 415.090
TF.invTFEx N/A 154.121 154.826
TF.invTFNor N/A 5.350 5.328
WhL.Cr.cr N/A 12214.383 12112.535
WhL.Inv.bl 0 3.886 3.931
WhL.Inv.bl 1 5.379 5.411
WhL.Inv.bl 10 16.000 16.203
WhL.Inv.bl 100 142.066 142.127
WhL.Inv.blMH 0 11.028 10.915
WhL.Inv.blMH 1 21.269 21.419
WhL.Inv.blMH 10 97.493 98.373
WhL.Inv.blMH 100 887.579 892.955
WhL.Inv.whL 0 24.829 7.082
WhL.Inv.whL 1 46.039 8.598
WhL.Inv.whL 10 240.963 21.108
WhL.Inv.whL 100 2092.671 167.619
> On 20.09.2016 at 21:54, John Rose <john.r.rose at oracle.com> wrote:
>
> There should also be an assert in the new LF constructor, which ensures that the two
> arguments are congruent. Better yet, just supply one argument (the speciesData),
> and derive the MT. These new LFs are pretty confusing, and it's best to nail down
> unused degrees of freedom.
>
> — John
>
> P.S. I would have expected this problem to be solved by having the MHI.toArray function
> return a box object with a single @Stable array field. Did that approach fail?
>
> I.e., this wrapper emulates a frozen array (until that happy day when we have real
> frozen arrays):
>
> class ArrayConstant<T> {
>     private final @Stable T[] values;
>     public ArrayConstant(T[] values) {
>         for (T v : values) Objects.requireNonNull(v);
>         this.values = values.clone();
>     }
>     public T get(int i) { return values[i]; }
>     //public int length() { return values.length; }
> }
>
> The JIT should be able to constant fold through ac.get(i) whenever ac and i are constants.
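A runnable variant of the wrapper above, as a sketch only: the JDK-internal `@Stable` annotation is omitted because it is not available to application code, so this version shows the defensive copy and the null check but not the constant folding itself. The `ArrayConstantDemo` class and its method names are made up for illustration.

```java
import java.util.Objects;

// Emulates a frozen array: contents are copied at construction and never exposed.
final class ArrayConstant<T> {
    // Would be annotated with the JDK-internal @Stable inside the JDK.
    private final T[] values;

    ArrayConstant(T[] values) {
        // Reject nulls before copying, as in the quoted class.
        for (T v : values) Objects.requireNonNull(v);
        this.values = values.clone();
    }

    T get(int i) { return values[i]; }
}

final class ArrayConstantDemo {
    // Later writes to the source array must not show through the wrapper.
    static String firstAfterMutation() {
        String[] source = { "init", "step" };
        ArrayConstant<String> ac = new ArrayConstant<>(source);
        source[0] = "changed";
        return ac.get(0);
    }

    // A null element is rejected at construction time.
    static boolean rejectsNull() {
        try {
            new ArrayConstant<>(new String[] { "ok", null });
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

Here `firstAfterMutation()` still returns "init" after the source array has been modified, and `rejectsNull()` returns true.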
>
> On Sep 20, 2016, at 8:17 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>> Looks good.
>>
>> src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java:
>> +    LambdaForm bmhArrayForm(MethodType type, BoundMethodHandle.SpeciesData speciesData) {
>> +        int size = type.parameterCount();
>> +        Transform key = Transform.of(Transform.BMH_AS_ARRAY, size);
>> +        LambdaForm form = getInCache(key);
>> +        if (form != null) {
>> +            return form;
>> +        }
>>
>> Please, add an assert to ensure the cached LF has the same constraint as requested (speciesData).
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 9/20/16 3:53 PM, Michael Haupt wrote:
>>> Dear all,
>>>
>>> please review this change.
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8161211
>>> Webrev: http://cr.openjdk.java.net/~mhaupt/8161211/webrev.00/
>>>
>>> The method handle loop combinators introduced with JEP 274 were originally not intrinsified, leading to poor performance compared to both a pure-Java baseline and handwired method handle combinations. The intrinsics introduced with 8143211 [1] improved the situation somewhat, but still did not provide good inlining opportunities for the JIT compiler. This change introduces the use of BoundMethodHandles as arrays to carry the various handles involved in loop execution.
>>>
>>> Extra credit goes to Vladimir Ivanov, who suggested the BMH-as-arrays approach in the first place, and to Claes Redestad, who suggested using LambdaForm editing to neatly enable caching. Thanks!
>>>
>>> Performance improves considerably. The table below reports scores in ns/op. The "unpatched" column contains results from before applying the patch for 8161211; the "patched" column contains results from after applying it.
>>>
>>> The create benchmarks measure the cost of loop handle creation. The baseline and baselineMH benchmarks measure the cost of running a pure-Java construct and a handwired method handle construct, respectively.
>>>
>>> Relevant comparisons include loop combinator results versus baselines, and versus unpatched loop combinator results. For the latter, there are significant improvements, except for the creation benchmarks (creation has a more complex workflow now). For the former, it can be seen that the BMH-array intrinsics generally perform better than handwired handle constructs, and have moved much closer to the pure-Java baselines.
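To make the comparison concrete, the following sketch shows the two shapes being compared for a counted loop: a pure-Java loop and the same loop built with `MethodHandles.countedLoop` (Java 9 or later). It is not the benchmark source; the class name `CountedLoopSketch`, the `step` body, and the helper names are assumptions for illustration, and no timing is attempted here.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CountedLoopSketch {
    // Loop body with shape (V acc, int counter, A... args) -> V; here A... is (int n).
    private static int step(int acc, int i, int n) { return acc + i; }

    // Pure-Java baseline: sums the counter values 0 .. n-1.
    static int baseline(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) acc = step(acc, i, n);
        return acc;
    }

    // The same loop expressed with the JEP 274 combinator; resulting type is (int)int.
    private static final MethodHandle LOOP = buildLoop();

    private static MethodHandle buildLoop() {
        try {
            MethodHandle body = MethodHandles.lookup().findStatic(
                    CountedLoopSketch.class, "step",
                    MethodType.methodType(int.class, int.class, int.class, int.class));
            // (int n) -> n: the loop argument is the iteration count.
            MethodHandle iterations = MethodHandles.identity(int.class);
            // (int n) -> 0: initial accumulator value, ignoring the argument.
            MethodHandle init = MethodHandles.dropArguments(
                    MethodHandles.constant(int.class, 0), 0, int.class);
            return MethodHandles.countedLoop(iterations, init, body);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    static int viaCombinator(int n) {
        try {
            return (int) LOOP.invokeExact(n);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Both `baseline(10)` and `viaCombinator(10)` compute 45. Keeping the loop handle in a `static final` field is a common way to let the JIT treat it as a constant, which is the precondition for the inlining this change is about.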
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>>
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8143211
>>>
>>>
>>>
>>> Benchmark (iterations) unpatched patched
>>> MethodHandlesCountedLoop.Create.create3 N/A 16039.108 18400.405
>>> MethodHandlesCountedLoop.Create.create4 N/A 15621.959 17924.696
>>> MethodHandlesCountedLoop.Invoke.baseline3 0 2.858 2.839
>>> MethodHandlesCountedLoop.Invoke.baseline3 1 5.125 5.164
>>> MethodHandlesCountedLoop.Invoke.baseline3 10 11.887 11.924
>>> MethodHandlesCountedLoop.Invoke.baseline3 100 67.441 67.281
>>> MethodHandlesCountedLoop.Invoke.baseline4 0 2.855 2.838
>>> MethodHandlesCountedLoop.Invoke.baseline4 1 5.120 5.179
>>> MethodHandlesCountedLoop.Invoke.baseline4 10 11.875 11.906
>>> MethodHandlesCountedLoop.Invoke.baseline4 100 67.607 67.374
>>> MethodHandlesCountedLoop.Invoke.baselineMH3 0 9.734 9.606
>>> MethodHandlesCountedLoop.Invoke.baselineMH3 1 15.689 15.674
>>> MethodHandlesCountedLoop.Invoke.baselineMH3 10 68.912 69.303
>>> MethodHandlesCountedLoop.Invoke.baselineMH3 100 605.666 606.432
>>> MethodHandlesCountedLoop.Invoke.baselineMH4 0 14.561 13.234
>>> MethodHandlesCountedLoop.Invoke.baselineMH4 1 19.543 19.773
>>> MethodHandlesCountedLoop.Invoke.baselineMH4 10 71.977 72.466
>>> MethodHandlesCountedLoop.Invoke.baselineMH4 100 596.842 602.469
>>> MethodHandlesCountedLoop.Invoke.countedLoop3 0 49.339 5.810
>>> MethodHandlesCountedLoop.Invoke.countedLoop3 1 95.444 7.441
>>> MethodHandlesCountedLoop.Invoke.countedLoop3 10 508.746 21.002
>>> MethodHandlesCountedLoop.Invoke.countedLoop3 100 4701.808 145.996
>>> MethodHandlesCountedLoop.Invoke.countedLoop4 0 49.443 5.798
>>> MethodHandlesCountedLoop.Invoke.countedLoop4 1 98.721 7.438
>>> MethodHandlesCountedLoop.Invoke.countedLoop4 10 503.825 21.049
>>> MethodHandlesCountedLoop.Invoke.countedLoop4 100 4681.803 147.020
>>> MethodHandlesDoWhileLoop.Create.create N/A 7628.312 9100.332
>>> MethodHandlesDoWhileLoop.Invoke.baseline 1 3.868 3.909
>>> MethodHandlesDoWhileLoop.Invoke.baseline 10 16.480 16.461
>>> MethodHandlesDoWhileLoop.Invoke.baseline 100 144.260 144.232
>>> MethodHandlesDoWhileLoop.Invoke.baselineMH 1 14.434 14.494
>>> MethodHandlesDoWhileLoop.Invoke.baselineMH 10 92.542 93.454
>>> MethodHandlesDoWhileLoop.Invoke.baselineMH 100 877.480 880.496
>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop 1 26.791 7.153
>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop 10 158.985 16.990
>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop 100 1391.746 130.946
>>> MethodHandlesIteratedLoop.Create.create N/A 13547.499 15478.542
>>> MethodHandlesIteratedLoop.Invoke.baseline 0 2.973 2.980
>>> MethodHandlesIteratedLoop.Invoke.baseline 1 6.771 6.658
>>> MethodHandlesIteratedLoop.Invoke.baseline 10 14.955 14.955
>>> MethodHandlesIteratedLoop.Invoke.baseline 100 81.842 82.582
>>> MethodHandlesIteratedLoop.Invoke.baselineMH 0 14.893 14.668
>>> MethodHandlesIteratedLoop.Invoke.baselineMH 1 20.998 21.304
>>> MethodHandlesIteratedLoop.Invoke.baselineMH 10 73.677 72.703
>>> MethodHandlesIteratedLoop.Invoke.baselineMH 100 613.913 614.475
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 0 33.583 9.603
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 1 82.239 14.433
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 10 448.356 38.650
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop 100 4189.034 279.779
>>> MethodHandlesLoop.Create.create N/A 15505.970 17559.399
>>> MethodHandlesLoop.Invoke0.baseline 1 3.179 3.181
>>> MethodHandlesLoop.Invoke0.baseline 10 5.952 6.115
>>> MethodHandlesLoop.Invoke0.baseline 100 50.942 50.943
>>> MethodHandlesLoop.Invoke0.loop 1 46.454 5.353
>>> MethodHandlesLoop.Invoke0.loop 10 514.230 8.487
>>> MethodHandlesLoop.Invoke0.loop 100 5166.251 52.188
>>> MethodHandlesLoop.Invoke1.loop 1 34.321 5.277
>>> MethodHandlesLoop.Invoke1.loop 10 430.839 8.481
>>> MethodHandlesLoop.Invoke1.loop 100 4095.302 52.206
>>> MethodHandlesTryFinally.baselineExceptional N/A 3.005 3.002
>>> MethodHandlesTryFinally.baselineMHExceptional N/A 166.316 166.087
>>> MethodHandlesTryFinally.baselineMHNormal N/A 9.337 9.276
>>> MethodHandlesTryFinally.baselineNormal N/A 2.696 2.683
>>> MethodHandlesTryFinally.create N/A 406.255 406.594
>>> MethodHandlesTryFinally.invokeTryFinallyExceptional N/A 154.121 154.692
>>> MethodHandlesTryFinally.invokeTryFinallyNormal N/A 5.350 5.334
>>> MethodHandlesWhileLoop.Create.create N/A 12214.383 14503.515
>>> MethodHandlesWhileLoop.Invoke.baseline 0 3.886 3.888
>>> MethodHandlesWhileLoop.Invoke.baseline 1 5.379 5.377
>>> MethodHandlesWhileLoop.Invoke.baseline 10 16.000 16.201
>>> MethodHandlesWhileLoop.Invoke.baseline 100 142.066 143.338
>>> MethodHandlesWhileLoop.Invoke.baselineMH 0 11.028 11.012
>>> MethodHandlesWhileLoop.Invoke.baselineMH 1 21.269 21.159
>>> MethodHandlesWhileLoop.Invoke.baselineMH 10 97.493 97.656
>>> MethodHandlesWhileLoop.Invoke.baselineMH 100 887.579 886.532
>>> MethodHandlesWhileLoop.Invoke.whileLoop 0 24.829 7.108
>>> MethodHandlesWhileLoop.Invoke.whileLoop 1 46.039 8.573
>>> MethodHandlesWhileLoop.Invoke.whileLoop 10 240.963 21.088
>>> MethodHandlesWhileLoop.Invoke.whileLoop 100 2092.671 159.016
>>>
>>>
>>>
>
--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn