RFR(L): 8161211: better inlining support for loop bytecode intrinsics

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Sep 23 16:41:53 UTC 2016


Looks even better :-) Reviewed.

Best regards,
Vladimir Ivanov

On 9/22/16 10:23 AM, Michael Haupt wrote:
> Hi John,
>
> thanks for your review, and thanks Vladimir! I've had another go at the implementation to use a dedicated loop clause holder class with a stable array; performance is roughly on par with that of the BMHs-as-arrays approach (see below).
>
> The new webrev is at http://cr.openjdk.java.net/~mhaupt/8161211/webrev.01/; please review.
>
> Thanks,
>
> Michael
>
>
>
>
> Benchmark        (iterations)     unpatched        patched
> CntL.Cr.cr3      N/A              16039.108        15821.583
> CntL.Cr.cr4      N/A              15621.959        15869.730
> CntL.Inv.bl3     0                2.858            2.835
> CntL.Inv.bl3     1                5.125            5.179
> CntL.Inv.bl3     10               11.887           12.005
> CntL.Inv.bl3     100              67.441           67.279
> CntL.Inv.bl4     0                2.855            2.858
> CntL.Inv.bl4     1                5.120            5.210
> CntL.Inv.bl4     10               11.875           12.012
> CntL.Inv.bl4     100              67.607           67.296
> CntL.Inv.blMH3   0                9.734            9.722
> CntL.Inv.blMH3   1                15.689           15.865
> CntL.Inv.blMH3   10               68.912           69.098
> CntL.Inv.blMH3   100              605.666          605.526
> CntL.Inv.blMH4   0                14.561           13.274
> CntL.Inv.blMH4   1                19.543           19.709
> CntL.Inv.blMH4   10               71.977           72.446
> CntL.Inv.blMH4   100              596.842          598.271
> CntL.Inv.cntL3   0                49.339           6.311
> CntL.Inv.cntL3   1                95.444           7.333
> CntL.Inv.cntL3   10               508.746          20.930
> CntL.Inv.cntL3   100              4701.808         147.383
> CntL.Inv.cntL4   0                49.443           5.780
> CntL.Inv.cntL4   1                98.721           7.465
> CntL.Inv.cntL4   10               503.825          20.932
> CntL.Inv.cntL4   100              4681.803         147.278
> DoWhL.Cr.cr      N/A              7628.312         7803.187
> DoWhL.Inv.bl     1                3.868            3.869
> DoWhL.Inv.bl     10               16.480           16.528
> DoWhL.Inv.bl     100              144.260          144.290
> DoWhL.Inv.blMH   1                14.434           14.430
> DoWhL.Inv.blMH   10               92.542           92.733
> DoWhL.Inv.blMH   100              877.480          876.735
> DoWhL.Inv.doWhL  1                26.791           7.134
> DoWhL.Inv.doWhL  10               158.985          17.004
> DoWhL.Inv.doWhL  100              1391.746         133.253
> ItrL.Cr.cr       N/A              13547.499        13248.913
> ItrL.Inv.bl      0                2.973            2.983
> ItrL.Inv.bl      1                6.771            6.705
> ItrL.Inv.bl      10               14.955           14.952
> ItrL.Inv.bl      100              81.842           82.152
> ItrL.Inv.blMH    0                14.893           15.014
> ItrL.Inv.blMH    1                20.998           21.459
> ItrL.Inv.blMH    10               73.677           73.888
> ItrL.Inv.blMH    100              613.913          615.208
> ItrL.Inv.itrL    0                33.583           10.842
> ItrL.Inv.itrL    1                82.239           13.573
> ItrL.Inv.itrL    10               448.356          38.773
> ItrL.Inv.itrL    100              4189.034         279.918
> L.Cr.cr          N/A              15505.970        15640.994
> L.Inv0.bl        1                3.179            3.186
> L.Inv0.bl        10               5.952            5.912
> L.Inv0.bl        100              50.942           50.964
> L.Inv0.lo        1                46.454           5.290
> L.Inv0.lo        10               514.230          8.492
> L.Inv0.lo        100              5166.251         52.187
> L.Inv1.lo        1                34.321           5.291
> L.Inv1.lo        10               430.839          8.474
> L.Inv1.lo        100              4095.302         52.173
> TF.blEx          N/A              3.005            2.986
> TF.blMHEx        N/A              166.316          165.856
> TF.blMHNor       N/A              9.337            9.290
> TF.blNor         N/A              2.696            2.682
> TF.cr            N/A              406.255          415.090
> TF.invTFEx       N/A              154.121          154.826
> TF.invTFNor      N/A              5.350            5.328
> WhL.Cr.cr        N/A              12214.383        12112.535
> WhL.Inv.bl       0                3.886            3.931
> WhL.Inv.bl       1                5.379            5.411
> WhL.Inv.bl       10               16.000           16.203
> WhL.Inv.bl       100              142.066          142.127
> WhL.Inv.blMH     0                11.028           10.915
> WhL.Inv.blMH     1                21.269           21.419
> WhL.Inv.blMH     10               97.493           98.373
> WhL.Inv.blMH     100              887.579          892.955
> WhL.Inv.whL      0                24.829           7.082
> WhL.Inv.whL      1                46.039           8.598
> WhL.Inv.whL      10               240.963          21.108
> WhL.Inv.whL      100              2092.671         167.619
>
>
>
>
>
>> On 20.09.2016 at 21:54, John Rose <john.r.rose at oracle.com> wrote:
>>
>> There should also be an assert in the new LF constructor, which ensures that the two
>> arguments are congruent.  Better yet, just supply one argument (the speciesData),
>> and derive the MT.  These new LFs are pretty confusing, and it's best to nail down
>> unused degrees of freedom.
>>
>> — John
>>
>> P.S.  I would have expected this problem to be solved by having the MHI.toArray function
>> return a box object with a single @Stable array field.  Did that approach fail?
>>
>> I.e., this wrapper emulates a frozen array (until that happy day when we have real
>> frozen arrays):
>>
>> class ArrayConstant<T> {
>>  private final @Stable T[] values;
>>  public ArrayConstant(T[] values) {
>>    for (T v : values)  Objects.requireNonNull(v);
>>    this.values = values.clone();
>>  }
>>  public T get(int i) { return values[i]; }
>>  //public int length() { return values.length; }
>> }
>>
>> The JIT should be able to constant fold through ac.get(i) whenever ac and i are constants.
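
For readers outside java.base, the wrapper idea quoted above can be tried roughly as follows. This is only a sketch under assumptions: the names FrozenArray and FrozenArrayDemo are made up for illustration, and the @Stable annotation is JDK-internal, so it is omitted here. Without it the sketch shows the defensive-copy and null-check shape only, and makes no claim about what the JIT will fold.

```java
import java.util.Objects;
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for the quoted ArrayConstant; not JDK code.
final class FrozenArray<T> {
    // Inside java.base this field would carry @Stable; that annotation
    // is not accessible to ordinary application code.
    private final T[] values;

    FrozenArray(T[] values) {
        for (T v : values) {
            Objects.requireNonNull(v); // reject null elements up front
        }
        this.values = values.clone(); // defensive copy: later writes to the caller's array are not seen
    }

    T get(int i) {
        return values[i];
    }
}

class FrozenArrayDemo {
    // Holder kept in a static final field, so the holder reference itself is a constant.
    static final FrozenArray<IntUnaryOperator> CLAUSES =
            new FrozenArray<>(new IntUnaryOperator[] { x -> x + 1, x -> x * 2 });

    // Applies clause 0, then clause 1, each fetched with a constant index.
    static int run(int x) {
        return CLAUSES.get(1).applyAsInt(CLAUSES.get(0).applyAsInt(x));
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

The constructor copies the array so that nothing outside the holder can change an element after construction, which is the property the frozen-array emulation relies on.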
>>
>> On Sep 20, 2016, at 8:17 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>> Looks good.
>>>
>>> src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java:
>>> +    LambdaForm bmhArrayForm(MethodType type, BoundMethodHandle.SpeciesData speciesData) {
>>> +        int size = type.parameterCount();
>>> +        Transform key = Transform.of(Transform.BMH_AS_ARRAY, size);
>>> +        LambdaForm form = getInCache(key);
>>> +        if (form != null) {
>>> +            return form;
>>> +        }
>>>
>>> Please, add an assert to ensure the cached LF has the same constraint as requested (speciesData).
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 9/20/16 3:53 PM, Michael Haupt wrote:
>>>> Dear all,
>>>>
>>>> please review this change.
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8161211
>>>> Webrev: http://cr.openjdk.java.net/~mhaupt/8161211/webrev.00/
>>>>
>>>> The method handle loop combinators introduced with JEP 274 were originally not intrinsified, leading to poor performance compared not only to a pure-Java baseline but also to handwired method handle combinations. The intrinsics introduced with 8143211 [1] improved the situation somewhat, but still did not provide good inlining opportunities for the JIT compiler. This change introduces the use of BoundMethodHandles as arrays to carry the various handles involved in loop execution.
>>>>
>>>> Extra credit goes to Vladimir Ivanov, who suggested the BMH-as-arrays approach in the first place, and Claes Redestad, who suggested using LambdaForm editing to neatly enable caching. Thanks!
>>>>
>>>> Performance improves considerably. The table below reports scores in ns/op. The "unpatched" column contains results from before applying the patch for 8161211; the "patched" column contains results from after applying it.
>>>>
>>>> The create benchmarks measure the cost of loop handle creation. The baseline and baselineMH benchmarks measure the cost of running a pure-Java construct and a handwired method handle construct, respectively.
>>>>
>>>> Relevant comparisons include loop combinator results versus baselines, and versus unpatched loop combinator results. For the latter, there are significant improvements, except for the creation benchmarks (creation has a more complex workflow now). For the former, it can be seen that the BMH-array intrinsics generally perform better than handwired handle constructs, and have moved much closer to the pure-Java baselines.
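
To make the "loop combinator versus pure-Java baseline" comparison above concrete, here is a minimal sketch of the two shapes being compared. It is not taken from the benchmark sources: the class and method names are invented for illustration, and it uses MethodHandles.countedLoop as specified in released JDK 9 and later, which may differ in detail from the builds measured in this thread.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class; not part of the JDK benchmarks.
class CountedLoopSketch {
    // Loop body shared by both variants: accumulate the counter.
    static int step(int acc, int i) {
        return acc + i;
    }

    // Pure-Java baseline: a plain for loop.
    static int baseline(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc = step(acc, i);
        }
        return acc;
    }

    // The loop handle lives in a static final field so the JIT can treat it as a constant.
    static final MethodHandle LOOP = makeLoop();

    static MethodHandle makeLoop() {
        try {
            MethodHandle step = MethodHandles.lookup().findStatic(
                    CountedLoopSketch.class, "step",
                    MethodType.methodType(int.class, int.class, int.class));
            // body: (int acc, int i, int n) -> acc + i; the trailing loop argument n is ignored
            MethodHandle body = MethodHandles.dropArguments(step, 2, int.class);
            // iterations: (int n) -> n
            MethodHandle iterations = MethodHandles.identity(int.class);
            // init: (int n) -> 0
            MethodHandle init = MethodHandles.dropArguments(
                    MethodHandles.constant(int.class, 0), 0, int.class);
            // Resulting loop handle has type (int)int.
            return MethodHandles.countedLoop(iterations, init, body);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Same computation as baseline(n), expressed through the loop combinator.
    static int viaLoopCombinator(int n) {
        try {
            return (int) LOOP.invokeExact(n);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(baseline(10) + " " + viaLoopCombinator(10));
    }
}
```

Both methods compute the same sum; the benchmark rows named after the combinator time something like the second form, while the baseline rows time something like the first.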
>>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8143211
>>>>
>>>>
>>>>
>>>> Benchmark                                           (iterations)     unpatched        patched
>>>> MethodHandlesCountedLoop.Create.create3             N/A              16039.108        18400.405
>>>> MethodHandlesCountedLoop.Create.create4             N/A              15621.959        17924.696
>>>> MethodHandlesCountedLoop.Invoke.baseline3           0                2.858            2.839
>>>> MethodHandlesCountedLoop.Invoke.baseline3           1                5.125            5.164
>>>> MethodHandlesCountedLoop.Invoke.baseline3           10               11.887           11.924
>>>> MethodHandlesCountedLoop.Invoke.baseline3           100              67.441           67.281
>>>> MethodHandlesCountedLoop.Invoke.baseline4           0                2.855            2.838
>>>> MethodHandlesCountedLoop.Invoke.baseline4           1                5.120            5.179
>>>> MethodHandlesCountedLoop.Invoke.baseline4           10               11.875           11.906
>>>> MethodHandlesCountedLoop.Invoke.baseline4           100              67.607           67.374
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3         0                9.734            9.606
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3         1                15.689           15.674
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3         10               68.912           69.303
>>>> MethodHandlesCountedLoop.Invoke.baselineMH3         100              605.666          606.432
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4         0                14.561           13.234
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4         1                19.543           19.773
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4         10               71.977           72.466
>>>> MethodHandlesCountedLoop.Invoke.baselineMH4         100              596.842          602.469
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3        0                49.339           5.810
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3        1                95.444           7.441
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3        10               508.746          21.002
>>>> MethodHandlesCountedLoop.Invoke.countedLoop3        100              4701.808         145.996
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4        0                49.443           5.798
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4        1                98.721           7.438
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4        10               503.825          21.049
>>>> MethodHandlesCountedLoop.Invoke.countedLoop4        100              4681.803         147.020
>>>> MethodHandlesDoWhileLoop.Create.create              N/A              7628.312         9100.332
>>>> MethodHandlesDoWhileLoop.Invoke.baseline            1                3.868            3.909
>>>> MethodHandlesDoWhileLoop.Invoke.baseline            10               16.480           16.461
>>>> MethodHandlesDoWhileLoop.Invoke.baseline            100              144.260          144.232
>>>> MethodHandlesDoWhileLoop.Invoke.baselineMH          1                14.434           14.494
>>>> MethodHandlesDoWhileLoop.Invoke.baselineMH          10               92.542           93.454
>>>> MethodHandlesDoWhileLoop.Invoke.baselineMH          100              877.480          880.496
>>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop         1                26.791           7.153
>>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop         10               158.985          16.990
>>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop         100              1391.746         130.946
>>>> MethodHandlesIteratedLoop.Create.create             N/A              13547.499        15478.542
>>>> MethodHandlesIteratedLoop.Invoke.baseline           0                2.973            2.980
>>>> MethodHandlesIteratedLoop.Invoke.baseline           1                6.771            6.658
>>>> MethodHandlesIteratedLoop.Invoke.baseline           10               14.955           14.955
>>>> MethodHandlesIteratedLoop.Invoke.baseline           100              81.842           82.582
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH         0                14.893           14.668
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH         1                20.998           21.304
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH         10               73.677           72.703
>>>> MethodHandlesIteratedLoop.Invoke.baselineMH         100              613.913          614.475
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       0                33.583           9.603
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       1                82.239           14.433
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       10               448.356          38.650
>>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       100              4189.034         279.779
>>>> MethodHandlesLoop.Create.create                     N/A              15505.970        17559.399
>>>> MethodHandlesLoop.Invoke0.baseline                  1                3.179            3.181
>>>> MethodHandlesLoop.Invoke0.baseline                  10               5.952            6.115
>>>> MethodHandlesLoop.Invoke0.baseline                  100              50.942           50.943
>>>> MethodHandlesLoop.Invoke0.loop                      1                46.454           5.353
>>>> MethodHandlesLoop.Invoke0.loop                      10               514.230          8.487
>>>> MethodHandlesLoop.Invoke0.loop                      100              5166.251         52.188
>>>> MethodHandlesLoop.Invoke1.loop                      1                34.321           5.277
>>>> MethodHandlesLoop.Invoke1.loop                      10               430.839          8.481
>>>> MethodHandlesLoop.Invoke1.loop                      100              4095.302         52.206
>>>> MethodHandlesTryFinally.baselineExceptional         N/A              3.005            3.002
>>>> MethodHandlesTryFinally.baselineMHExceptional       N/A              166.316          166.087
>>>> MethodHandlesTryFinally.baselineMHNormal            N/A              9.337            9.276
>>>> MethodHandlesTryFinally.baselineNormal              N/A              2.696            2.683
>>>> MethodHandlesTryFinally.create                      N/A              406.255          406.594
>>>> MethodHandlesTryFinally.invokeTryFinallyExceptional N/A              154.121          154.692
>>>> MethodHandlesTryFinally.invokeTryFinallyNormal      N/A              5.350            5.334
>>>> MethodHandlesWhileLoop.Create.create                N/A              12214.383        14503.515
>>>> MethodHandlesWhileLoop.Invoke.baseline              0                3.886            3.888
>>>> MethodHandlesWhileLoop.Invoke.baseline              1                5.379            5.377
>>>> MethodHandlesWhileLoop.Invoke.baseline              10               16.000           16.201
>>>> MethodHandlesWhileLoop.Invoke.baseline              100              142.066          143.338
>>>> MethodHandlesWhileLoop.Invoke.baselineMH            0                11.028           11.012
>>>> MethodHandlesWhileLoop.Invoke.baselineMH            1                21.269           21.159
>>>> MethodHandlesWhileLoop.Invoke.baselineMH            10               97.493           97.656
>>>> MethodHandlesWhileLoop.Invoke.baselineMH            100              887.579          886.532
>>>> MethodHandlesWhileLoop.Invoke.whileLoop             0                24.829           7.108
>>>> MethodHandlesWhileLoop.Invoke.whileLoop             1                46.039           8.573
>>>> MethodHandlesWhileLoop.Invoke.whileLoop             10               240.963          21.088
>>>> MethodHandlesWhileLoop.Invoke.whileLoop             100              2092.671         159.016
>>>>
>>>>
>>>>
>>
>

