RFR: 8071571: Move substring of same string to slow path

Wed May 13 23:26:48 UTC 2015

Need the JIT-generated assembly, not bytecode :).  That will tell you at least
which optimizations the JIT applied, how it allocated registers, etc.  If
nothing obvious shows up there, see my other reply regarding CPU event-based
profiling.  I'm sure Aleksey Shipilev could help out if you're really
inclined to figure this out.
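
A minimal sketch of a driver that could be used to get at the compiled code,
assuming a hypothetical class AsmProbe and a check along the lines of the
||-combined version (neither name comes from the patch).  It only warms the
method up so the JIT compiles it.  The assembly itself would come from running
it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which needs the
hsdis disassembler library installed), or from JMH's perfasm profiler on Linux.

```java
// Hypothetical warm-up driver; class and method names are assumptions.
final class AsmProbe {

    // The ||-combined bounds check, returning true when the range is invalid.
    static boolean outOfBounds(int beginIndex, int endIndex, char[] value) {
        return beginIndex < 0 || endIndex > value.length || beginIndex > endIndex;
    }

    // Calls the check often enough for the JIT to compile it.
    // All generated ranges are valid, so the branches are perfectly predictable.
    static int run(int iterations) {
        char[] value = new char[16];
        int rejected = 0;
        for (int i = 0; i < iterations; i++) {
            int begin = i & 7;      // 0..7
            int end = begin + 8;    // 8..15, never past value.length
            if (outOfBounds(begin, end, value)) {
                rejected++;
            }
        }
        return rejected;            // returned so the loop result is observable
    }

    public static void main(String[] args) {
        System.out.println(run(10_000_000));
    }
}
```

Note that the check is small enough that it will likely be inlined into run,
so the interesting instructions may show up in the caller's listing.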

On May 13, 2015 7:23 PM, "Ivan Gerasimov" <ivan.gerasimov at oracle.com> wrote:

>
>
> On 14.05.2015 2:06, Vitaly Davidovich wrote:
>
> Why not look at the generated asm instead of guessing? :) The branch-avoiding
> versions may cause data dependence hazards, whereas the branchy one just has
> branches which, assuming they are perfectly predicted (and in microbenchmarks
> they typically are), can pipeline through.  Ivan, could you please post the
> asm here, assuming you guys are interested in investigating this further?
>
> Sure, here they are:
>
>   void substring_1(int, int, char[]);
>     Code:
>        0: iload_1
>        1: iflt          15
>        4: iload_2
>        5: aload_3
>        6: arraylength
>        7: if_icmpgt     15
>       10: iload_1
>       11: iload_2
>       12: if_icmple     23
>       15: new           #4                  // class java/lang/Error
>       18: dup
>       19: invokespecial #5                  // Method java/lang/Error."<init>":()V
>       22: athrow
>       23: return
>
>   void substring_2(int, int, char[]);
>     Code:
>        0: iload_1
>        1: aload_3
>        2: arraylength
>        3: iload_2
>        4: isub
>        5: ior
>        6: iload_2
>        7: iload_1
>        8: isub
>        9: ior
>       10: ifge          21
>       13: new           #4                  // class java/lang/Error
>       16: dup
>       17: invokespecial #5                  // Method java/lang/Error."<init>":()V
>       20: athrow
>       21: return
>
>   void substring_3(int, int, char[]);
>     Code:
>        0: iload_1
>        1: aload_3
>        2: arraylength
>        3: iload_2
>        4: isub
>        5: ior
>        6: iflt          18
>        9: iload_2
>       10: iload_1
>       11: isub
>       12: dup
>       13: istore        4
>       15: ifge          26
>       18: new           #4                  // class java/lang/Error
>       21: dup
>       22: invokespecial #5                  // Method java/lang/Error."<init>":()V
>       25: athrow
>       26: return
>
> Sincerely yours,
> Ivan
>
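
For readers following the listings, the three bytecode sequences appear to
correspond to source along the following lines.  This is a reconstruction from
the javap output, not the posted source: the class name, the parameter names,
and the boolean return (the originals are void and throw java.lang.Error) are
assumptions made to keep the sketch testable.

```java
// Hypothetical reconstruction of the three checks; true means "would throw".
final class SubstringChecks {

    // substring_1: three comparisons joined with short-circuit ||.
    static boolean check1(int beginIndex, int endIndex, char[] value) {
        return beginIndex < 0 || endIndex > value.length || beginIndex > endIndex;
    }

    // substring_2: OR the three quantities together and test the sign once.
    static boolean check2(int beginIndex, int endIndex, char[] value) {
        return (beginIndex | (value.length - endIndex) | (endIndex - beginIndex)) < 0;
    }

    // substring_3: one combined sign test, then the length computed and
    // stored in a local (the dup / istore 4 pair in the listing).
    static boolean check3(int beginIndex, int endIndex, char[] value) {
        int subLen;
        return (beginIndex | (value.length - endIndex)) < 0
                || (subLen = endIndex - beginIndex) < 0;
    }

    // Compares the three variants over small index ranges only;
    // int overflow at extreme index values is not covered here.
    static boolean agreeOnSmallRanges(int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            char[] value = new char[len];
            for (int b = -2; b <= len + 2; b++) {
                for (int e = -2; e <= len + 2; e++) {
                    boolean r1 = check1(b, e, value);
                    if (r1 != check2(b, e, value) || r1 != check3(b, e, value)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnSmallRanges(8));
    }
}
```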
> On May 13, 2015 6:51 PM, "Martin Buchholz" <martinrb at google.com> wrote:
>
>> On Wed, May 13, 2015 at 2:25 PM, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
>>
>> >
>> > Benchmark                  Mode  Cnt           Score          Error  Units
>> > MyBenchmark.testMethod_1  thrpt   60  1132911599.680 ± 42375177.640  ops/s
>> > MyBenchmark.testMethod_2  thrpt   60   813737659.576 ± 14226427.823  ops/s
>> > MyBenchmark.testMethod_3  thrpt   60   810406621.145 ± 12316864.045  ops/s
>> >
>> > The plain old ||-combined check was faster in this round.
>> > Some other tests showed different results.
>> > The speed seems to depend on the scope of the checked variables and
>> > complexity of the expressions to calculate.
>> > However, I still don't have a clear understanding of all the aspects we
>> > need to pay attention to when doing such optimizations.
>> >
>>
>> I'm not sure, but the only thing that could explain such a huge performance
>> gap is that hotspot was able to determine at jit time that some of the
>> comparisons did not need to be performed at all.  If true, is this cheating
>> or not?  (You could retry with -Xint.)  One of the ideas is to separate hot
>> and cold code (hotspot does not yet split code inside a single method) so
>> that hotspot is more likely to inline, and therefore more likely to
>> optimize, and optimizing beginIndex < 0 away entirely is much easier than
>> optimizing my more complex expression.  So yeah, I could be persuaded that
>> keeping beginIndex < 0 as an independent expression makes it likely to be
>> eliminated.  Micro-optimizing is hard, but for the very core of the
>> platform it is important (more so than readability).
>>
>> One of these days I have to learn how to write a jmh benchmark.
>>
>
>