RFR(L): 8161211: better inlining support for loop bytecode intrinsics

Michael Haupt michael.haupt at oracle.com
Thu Sep 22 07:23:39 UTC 2016


Hi John,

thanks for your review, and thanks Vladimir! I've had another go at the implementation to use a dedicated loop clause holder class with a stable array; performance is roughly on par with that of the BMHs-as-arrays approach (see below).

The new webrev is at http://cr.openjdk.java.net/~mhaupt/8161211/webrev.01/; please review.
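
As a rough illustration of the holder idea (a minimal sketch, not the code in the webrev): the class below wraps the clause arrays in a single object, along the lines of the frozen-array wrapper John suggests in the quoted mail. The names LoopClausesSketch, LoopClauses and get are made up for this example. Inside java.base the array field would be annotated @Stable; that annotation is JDK-internal, so it is omitted here and the constant-folding effect is not reproduced outside the JDK.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Objects;

// Hypothetical sketch; class and method names are illustrative, not those in the webrev.
class LoopClausesSketch {
    static final class LoopClauses {
        // Within java.base this field would be annotated @Stable. The annotation is
        // JDK-internal, so it is left out here to keep the sketch compilable.
        private final MethodHandle[][] clauses;

        LoopClauses(MethodHandle[]... clauses) {
            // Defensive copy, so that later writes to the caller's arrays are not visible.
            MethodHandle[][] copy = new MethodHandle[clauses.length][];
            for (int kind = 0; kind < clauses.length; kind++) {
                copy[kind] = clauses[kind].clone();
                for (MethodHandle mh : copy[kind]) {
                    Objects.requireNonNull(mh);
                }
            }
            this.clauses = copy;
        }

        // Returns the handle stored for the given clause kind and clause index.
        MethodHandle get(int kind, int index) {
            return clauses[kind][index];
        }
    }

    public static void main(String[] args) {
        MethodHandle id = MethodHandles.identity(int.class);
        MethodHandle zero = MethodHandles.constant(int.class, 0);
        LoopClauses lc = new LoopClauses(new MethodHandle[] { zero }, new MethodHandle[] { id });
        System.out.println(lc.get(0, 0) == zero && lc.get(1, 0) == id);
    }
}
```

Entries are checked for null because @Stable only allows the JIT to treat non-default values as constants.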

Thanks,

Michael




Scores are in ns/op; benchmark names are abbreviated versions of those in the quoted table below.

Benchmark        (iterations)     unpatched        patched
CntL.Cr.cr3      N/A              16039.108        15821.583
CntL.Cr.cr4      N/A              15621.959        15869.730
CntL.Inv.bl3     0                2.858            2.835
CntL.Inv.bl3     1                5.125            5.179
CntL.Inv.bl3     10               11.887           12.005
CntL.Inv.bl3     100              67.441           67.279
CntL.Inv.bl4     0                2.855            2.858
CntL.Inv.bl4     1                5.120            5.210
CntL.Inv.bl4     10               11.875           12.012
CntL.Inv.bl4     100              67.607           67.296
CntL.Inv.blMH3   0                9.734            9.722
CntL.Inv.blMH3   1                15.689           15.865
CntL.Inv.blMH3   10               68.912           69.098
CntL.Inv.blMH3   100              605.666          605.526
CntL.Inv.blMH4   0                14.561           13.274
CntL.Inv.blMH4   1                19.543           19.709
CntL.Inv.blMH4   10               71.977           72.446
CntL.Inv.blMH4   100              596.842          598.271
CntL.Inv.cntL3   0                49.339           6.311
CntL.Inv.cntL3   1                95.444           7.333
CntL.Inv.cntL3   10               508.746          20.930
CntL.Inv.cntL3   100              4701.808         147.383
CntL.Inv.cntL4   0                49.443           5.780
CntL.Inv.cntL4   1                98.721           7.465
CntL.Inv.cntL4   10               503.825          20.932
CntL.Inv.cntL4   100              4681.803         147.278
DoWhL.Cr.cr      N/A              7628.312         7803.187
DoWhL.Inv.bl     1                3.868            3.869
DoWhL.Inv.bl     10               16.480           16.528
DoWhL.Inv.bl     100              144.260          144.290
DoWhL.Inv.blMH   1                14.434           14.430
DoWhL.Inv.blMH   10               92.542           92.733
DoWhL.Inv.blMH   100              877.480          876.735
DoWhL.Inv.doWhL  1                26.791           7.134
DoWhL.Inv.doWhL  10               158.985          17.004
DoWhL.Inv.doWhL  100              1391.746         133.253
ItrL.Cr.cr       N/A              13547.499        13248.913
ItrL.Inv.bl      0                2.973            2.983
ItrL.Inv.bl      1                6.771            6.705
ItrL.Inv.bl      10               14.955           14.952
ItrL.Inv.bl      100              81.842           82.152
ItrL.Inv.blMH    0                14.893           15.014
ItrL.Inv.blMH    1                20.998           21.459
ItrL.Inv.blMH    10               73.677           73.888
ItrL.Inv.blMH    100              613.913          615.208
ItrL.Inv.itrL    0                33.583           10.842
ItrL.Inv.itrL    1                82.239           13.573
ItrL.Inv.itrL    10               448.356          38.773
ItrL.Inv.itrL    100              4189.034         279.918
L.Cr.cr          N/A              15505.970        15640.994
L.Inv0.bl        1                3.179            3.186
L.Inv0.bl        10               5.952            5.912
L.Inv0.bl        100              50.942           50.964
L.Inv0.lo        1                46.454           5.290
L.Inv0.lo        10               514.230          8.492
L.Inv0.lo        100              5166.251         52.187
L.Inv1.lo        1                34.321           5.291
L.Inv1.lo        10               430.839          8.474
L.Inv1.lo        100              4095.302         52.173
TF.blEx          N/A              3.005            2.986
TF.blMHEx        N/A              166.316          165.856
TF.blMHNor       N/A              9.337            9.290
TF.blNor         N/A              2.696            2.682
TF.cr            N/A              406.255          415.090
TF.invTFEx       N/A              154.121          154.826
TF.invTFNor      N/A              5.350            5.328
WhL.Cr.cr        N/A              12214.383        12112.535
WhL.Inv.bl       0                3.886            3.931
WhL.Inv.bl       1                5.379            5.411
WhL.Inv.bl       10               16.000           16.203
WhL.Inv.bl       100              142.066          142.127
WhL.Inv.blMH     0                11.028           10.915
WhL.Inv.blMH     1                21.269           21.419
WhL.Inv.blMH     10               97.493           98.373
WhL.Inv.blMH     100              887.579          892.955
WhL.Inv.whL      0                24.829           7.082
WhL.Inv.whL      1                46.039           8.598
WhL.Inv.whL      10               240.963          21.108
WhL.Inv.whL      100              2092.671         167.619
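
To put the numbers in perspective, the speedup for a row is simply unpatched divided by patched. A small sketch using three of the 100-iteration rows copied from the table above; the class and method names are made up for illustration:

```java
// Illustrative helper; the input values are copied from the table above.
class SpeedupSketch {
    // Ratio of the unpatched score to the patched score (both in ns/op).
    static double speedup(double unpatchedNsPerOp, double patchedNsPerOp) {
        return unpatchedNsPerOp / patchedNsPerOp;
    }

    public static void main(String[] args) {
        System.out.printf("CntL.Inv.cntL3 (100):  %.1fx%n", speedup(4701.808, 147.383));
        System.out.printf("DoWhL.Inv.doWhL (100): %.1fx%n", speedup(1391.746, 133.253));
        System.out.printf("ItrL.Inv.itrL (100):   %.1fx%n", speedup(4189.034, 279.918));
    }
}
```

For the counted loop row this comes out at roughly 32x.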





> On 20.09.2016 at 21:54, John Rose <john.r.rose at oracle.com> wrote:
> 
> There should also be an assert in the new LF constructor, which ensures that the two
> arguments are congruent.  Better yet, just supply one argument (the speciesData),
> and derive the MT.  These new LFs are pretty confusing, and it's best to nail down
> unused degrees of freedom.
> 
> — John
> 
> P.S.  I would have expected this problem to be solved by having the MHI.toArray function
> return a box object with a single @Stable array field.  Did that approach fail?
> 
> I.e., this wrapper emulates a frozen array (until that happy day when we have real
> frozen arrays):
> 
> class ArrayConstant<T> {
>  private final @Stable T[] values;
>  public ArrayConstant(T[] values) {
>    for (T v : values)  Objects.requireNonNull(v);
>    this.values = values.clone();
>  }
>  public T get(int i) { return values[i]; }
>  //public int length() { return values.length; }
> }
> 
> The JIT should be able to constant fold through ac.get(i) whenever ac and i are constants.
> 
> On Sep 20, 2016, at 8:17 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>> 
>> Looks good.
>> 
>> src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java:
>> +    LambdaForm bmhArrayForm(MethodType type, BoundMethodHandle.SpeciesData speciesData) {
>> +        int size = type.parameterCount();
>> +        Transform key = Transform.of(Transform.BMH_AS_ARRAY, size);
>> +        LambdaForm form = getInCache(key);
>> +        if (form != null) {
>> +            return form;
>> +        }
>> 
>> Please, add an assert to ensure the cached LF has the same constraint as requested (speciesData).
>> 
>> Best regards,
>> Vladimir Ivanov
>> 
>> On 9/20/16 3:53 PM, Michael Haupt wrote:
>>> Dear all,
>>> 
>>> please review this change.
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8161211
>>> Webrev: http://cr.openjdk.java.net/~mhaupt/8161211/webrev.00/
>>> 
>>> The method handle loop combinators introduced with JEP 274 were originally not intrinsified, leading to poor performance compared both to a pure-Java baseline and to handwired method handle combinations. The intrinsics introduced with 8143211 [1] improved the situation somewhat, but still did not provide good inlining opportunities for the JIT compiler. This change uses BoundMethodHandles as arrays to carry the various handles involved in loop execution.
>>> 
>>> Extra credit goes to Vladimir Ivanov, who suggested the BMH-as-arrays approach in the first place, and to Claes Redestad, who suggested using LambdaForm editing to neatly enable caching. Thanks!
>>> 
>>> Performance improves considerably. The table below reports scores in ns/op. The "unpatched" column contains results from before applying the patch for 8161211; the "patched" column contains results from after applying it.
>>> 
>>> The create benchmarks measure the cost of loop handle creation. The baseline and baselineMH benchmarks measure the cost of running a pure Java construct and a handwired method handle construct, respectively.
>>> 
>>> Relevant comparisons are loop combinator results versus baselines, and versus unpatched loop combinator results. For the latter, there are significant improvements, except for the creation benchmarks (creation has a more complex workflow now). For the former, it can be seen that the BMH-array intrinsics generally perform better than handwired handle constructs, and have moved much closer to the pure-Java baselines.
>>> 
>>> Thanks,
>>> 
>>> Michael
>>> 
>>> 
>>> 
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8143211
>>> 
>>> 
>>> 
>>> Benchmark                                           (iterations)     unpatched        patched
>>> MethodHandlesCountedLoop.Create.create3             N/A              16039.108        18400.405
>>> MethodHandlesCountedLoop.Create.create4             N/A              15621.959        17924.696
>>> MethodHandlesCountedLoop.Invoke.baseline3           0                2.858            2.839
>>> MethodHandlesCountedLoop.Invoke.baseline3           1                5.125            5.164
>>> MethodHandlesCountedLoop.Invoke.baseline3           10               11.887           11.924
>>> MethodHandlesCountedLoop.Invoke.baseline3           100              67.441           67.281
>>> MethodHandlesCountedLoop.Invoke.baseline4           0                2.855            2.838
>>> MethodHandlesCountedLoop.Invoke.baseline4           1                5.120            5.179
>>> MethodHandlesCountedLoop.Invoke.baseline4           10               11.875           11.906
>>> MethodHandlesCountedLoop.Invoke.baseline4           100              67.607           67.374
>>> MethodHandlesCountedLoop.Invoke.baselineMH3         0                9.734            9.606
>>> MethodHandlesCountedLoop.Invoke.baselineMH3         1                15.689           15.674
>>> MethodHandlesCountedLoop.Invoke.baselineMH3         10               68.912           69.303
>>> MethodHandlesCountedLoop.Invoke.baselineMH3         100              605.666          606.432
>>> MethodHandlesCountedLoop.Invoke.baselineMH4         0                14.561           13.234
>>> MethodHandlesCountedLoop.Invoke.baselineMH4         1                19.543           19.773
>>> MethodHandlesCountedLoop.Invoke.baselineMH4         10               71.977           72.466
>>> MethodHandlesCountedLoop.Invoke.baselineMH4         100              596.842          602.469
>>> MethodHandlesCountedLoop.Invoke.countedLoop3        0                49.339           5.810
>>> MethodHandlesCountedLoop.Invoke.countedLoop3        1                95.444           7.441
>>> MethodHandlesCountedLoop.Invoke.countedLoop3        10               508.746          21.002
>>> MethodHandlesCountedLoop.Invoke.countedLoop3        100              4701.808         145.996
>>> MethodHandlesCountedLoop.Invoke.countedLoop4        0                49.443           5.798
>>> MethodHandlesCountedLoop.Invoke.countedLoop4        1                98.721           7.438
>>> MethodHandlesCountedLoop.Invoke.countedLoop4        10               503.825          21.049
>>> MethodHandlesCountedLoop.Invoke.countedLoop4        100              4681.803         147.020
>>> MethodHandlesDoWhileLoop.Create.create              N/A              7628.312         9100.332
>>> MethodHandlesDoWhileLoop.Invoke.baseline            1                3.868            3.909
>>> MethodHandlesDoWhileLoop.Invoke.baseline            10               16.480           16.461
>>> MethodHandlesDoWhileLoop.Invoke.baseline            100              144.260          144.232
>>> MethodHandlesDoWhileLoop.Invoke.baselineMH          1                14.434           14.494
>>> MethodHandlesDoWhileLoop.Invoke.baselineMH          10               92.542           93.454
>>> MethodHandlesDoWhileLoop.Invoke.baselineMH          100              877.480          880.496
>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop         1                26.791           7.153
>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop         10               158.985          16.990
>>> MethodHandlesDoWhileLoop.Invoke.doWhileLoop         100              1391.746         130.946
>>> MethodHandlesIteratedLoop.Create.create             N/A              13547.499        15478.542
>>> MethodHandlesIteratedLoop.Invoke.baseline           0                2.973            2.980
>>> MethodHandlesIteratedLoop.Invoke.baseline           1                6.771            6.658
>>> MethodHandlesIteratedLoop.Invoke.baseline           10               14.955           14.955
>>> MethodHandlesIteratedLoop.Invoke.baseline           100              81.842           82.582
>>> MethodHandlesIteratedLoop.Invoke.baselineMH         0                14.893           14.668
>>> MethodHandlesIteratedLoop.Invoke.baselineMH         1                20.998           21.304
>>> MethodHandlesIteratedLoop.Invoke.baselineMH         10               73.677           72.703
>>> MethodHandlesIteratedLoop.Invoke.baselineMH         100              613.913          614.475
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       0                33.583           9.603
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       1                82.239           14.433
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       10               448.356          38.650
>>> MethodHandlesIteratedLoop.Invoke.iteratedLoop       100              4189.034         279.779
>>> MethodHandlesLoop.Create.create                     N/A              15505.970        17559.399
>>> MethodHandlesLoop.Invoke0.baseline                  1                3.179            3.181
>>> MethodHandlesLoop.Invoke0.baseline                  10               5.952            6.115
>>> MethodHandlesLoop.Invoke0.baseline                  100              50.942           50.943
>>> MethodHandlesLoop.Invoke0.loop                      1                46.454           5.353
>>> MethodHandlesLoop.Invoke0.loop                      10               514.230          8.487
>>> MethodHandlesLoop.Invoke0.loop                      100              5166.251         52.188
>>> MethodHandlesLoop.Invoke1.loop                      1                34.321           5.277
>>> MethodHandlesLoop.Invoke1.loop                      10               430.839          8.481
>>> MethodHandlesLoop.Invoke1.loop                      100              4095.302         52.206
>>> MethodHandlesTryFinally.baselineExceptional         N/A              3.005            3.002
>>> MethodHandlesTryFinally.baselineMHExceptional       N/A              166.316          166.087
>>> MethodHandlesTryFinally.baselineMHNormal            N/A              9.337            9.276
>>> MethodHandlesTryFinally.baselineNormal              N/A              2.696            2.683
>>> MethodHandlesTryFinally.create                      N/A              406.255          406.594
>>> MethodHandlesTryFinally.invokeTryFinallyExceptional N/A              154.121          154.692
>>> MethodHandlesTryFinally.invokeTryFinallyNormal      N/A              5.350            5.334
>>> MethodHandlesWhileLoop.Create.create                N/A              12214.383        14503.515
>>> MethodHandlesWhileLoop.Invoke.baseline              0                3.886            3.888
>>> MethodHandlesWhileLoop.Invoke.baseline              1                5.379            5.377
>>> MethodHandlesWhileLoop.Invoke.baseline              10               16.000           16.201
>>> MethodHandlesWhileLoop.Invoke.baseline              100              142.066          143.338
>>> MethodHandlesWhileLoop.Invoke.baselineMH            0                11.028           11.012
>>> MethodHandlesWhileLoop.Invoke.baselineMH            1                21.269           21.159
>>> MethodHandlesWhileLoop.Invoke.baselineMH            10               97.493           97.656
>>> MethodHandlesWhileLoop.Invoke.baselineMH            100              887.579          886.532
>>> MethodHandlesWhileLoop.Invoke.whileLoop             0                24.829           7.108
>>> MethodHandlesWhileLoop.Invoke.whileLoop             1                46.039           8.573
>>> MethodHandlesWhileLoop.Invoke.whileLoop             10               240.963          21.088
>>> MethodHandlesWhileLoop.Invoke.whileLoop             100              2092.671         159.016
>>> 
>>> 
>>> 
> 
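
For readers who have not used the JEP 274 loop combinators discussed in the quoted thread, here is a minimal sketch of a counted loop next to its pure-Java counterpart. This is not the benchmark source; the class CountedLoopSketch and its methods are made up for illustration, and the summing body is an arbitrary choice. It assumes a JDK 9 or later runtime, where MethodHandles.countedLoop is available.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative only: this is not the benchmark source, and all names are made up.
class CountedLoopSketch {
    // Pure-Java counterpart: sums the counter values 0 .. n-1.
    static int baseline(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
        }
        return acc;
    }

    // Loop body: (current value, counter, loop argument) -> next value.
    static int step(int acc, int counter, int n) {
        return acc + counter;
    }

    // Builds a handle of type (int)int that performs the same computation.
    static MethodHandle buildLoop() throws ReflectiveOperationException {
        MethodHandle body = MethodHandles.lookup().findStatic(
                CountedLoopSketch.class, "step",
                MethodType.methodType(int.class, int.class, int.class, int.class));
        // Iteration count: the loop argument n itself.
        MethodHandle iterations = MethodHandles.identity(int.class);
        // Initial value of the loop variable: 0, ignoring the loop argument.
        MethodHandle init = MethodHandles.dropArguments(
                MethodHandles.constant(int.class, 0), 0, int.class);
        return MethodHandles.countedLoop(iterations, init, body);
    }

    // Convenience wrapper that hides the checked Throwable of invokeExact.
    static int loopSum(int n) {
        try {
            return (int) buildLoop().invokeExact(n);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(baseline(5) + " " + loopSum(5));
    }
}
```

In performance-sensitive code the handle would normally be built once and kept in a static final field rather than rebuilt per call as loopSum does here, since the JIT can only inline through handles it sees as constants.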

-- 

Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG | Schiffbauergasse 14 | 14467 Potsdam, Germany
