RFR: 8071571: Move substring of same string to slow path
Vitaly Davidovich
vitalyd at gmail.com
Wed May 13 23:26:48 UTC 2015
We need the JIT-generated assembly, not the bytecode :). That will tell you
at least which optimizations the JIT applied, how it allocated registers,
etc. If nothing obvious shows up there, see my other reply regarding CPU
event-based profiling. I'm sure Aleksey Shipilev could help out if you're
really inclined to figure this out.
On May 13, 2015 7:23 PM, "Ivan Gerasimov" <ivan.gerasimov at oracle.com> wrote:
>
>
> On 14.05.2015 2:06, Vitaly Davidovich wrote:
>
> Why not look at the generated asm instead of guessing? :) The
> branch-avoiding versions may introduce data-dependence hazards, whereas the
> branchy one just has branches which, assuming they are perfectly predicted
> (and in microbenchmarks they typically are), can pipeline through. Ivan,
> could you please post the asm here? Assuming you guys are interested in
> investigating this further.
>
> Sure, here they are:
>
> void substring_1(int, int, char[]);
> Code:
> 0: iload_1
> 1: iflt 15
> 4: iload_2
> 5: aload_3
> 6: arraylength
> 7: if_icmpgt 15
> 10: iload_1
> 11: iload_2
> 12: if_icmple 23
> 15: new #4 // class java/lang/Error
> 18: dup
> 19: invokespecial #5 // Method java/lang/Error."<init>":()V
> 22: athrow
> 23: return
>
> void substring_2(int, int, char[]);
> Code:
> 0: iload_1
> 1: aload_3
> 2: arraylength
> 3: iload_2
> 4: isub
> 5: ior
> 6: iload_2
> 7: iload_1
> 8: isub
> 9: ior
> 10: ifge 21
> 13: new #4 // class java/lang/Error
> 16: dup
> 17: invokespecial #5 // Method java/lang/Error."<init>":()V
> 20: athrow
> 21: return
>
> void substring_3(int, int, char[]);
> Code:
> 0: iload_1
> 1: aload_3
> 2: arraylength
> 3: iload_2
> 4: isub
> 5: ior
> 6: iflt 18
> 9: iload_2
> 10: iload_1
> 11: isub
> 12: dup
> 13: istore 4
> 15: ifge 26
> 18: new #4 // class java/lang/Error
> 21: dup
> 22: invokespecial #5 // Method java/lang/Error."<init>":()V
> 25: athrow
> 26: return
>
> Sincerely yours,
> Ivan
>
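For readers following the listings above, here is a rough Java reconstruction of the three checks, derived only from the posted bytecode. The class name, method names and parameter names are hypothetical, and the methods return a boolean instead of throwing java.lang.Error so that they are easy to compare. It is a sketch, not Ivan's actual benchmark source. The comparison assumes ordinary index values where none of the subtractions overflow.

```java
// Hypothetical reconstruction from the posted bytecode; all names are assumptions.
final class SubstringChecks {

    // substring_1: three separate comparisons joined with ||.
    static boolean invalid1(int beginIndex, int endIndex, char[] value) {
        return beginIndex < 0 || endIndex > value.length || beginIndex > endIndex;
    }

    // substring_2: OR the three quantities together and test the sign once.
    // The result is negative if any operand has its sign bit set.
    static boolean invalid2(int beginIndex, int endIndex, char[] value) {
        return (beginIndex | (value.length - endIndex) | (endIndex - beginIndex)) < 0;
    }

    // substring_3: two of the quantities are OR-ed, the length check stays separate.
    // The bytecode stores endIndex - beginIndex in a local (slot 4), mirrored here.
    static boolean invalid3(int beginIndex, int endIndex, char[] value) {
        int subLen;
        return (beginIndex | (value.length - endIndex)) < 0
                || (subLen = endIndex - beginIndex) < 0;
    }

    public static void main(String[] args) {
        char[] value = new char[4];
        // Print the verdict of each variant for one valid and one invalid range.
        System.out.println(invalid1(1, 3, value) + " " + invalid2(1, 3, value)
                + " " + invalid3(1, 3, value));
        System.out.println(invalid1(3, 1, value) + " " + invalid2(3, 1, value)
                + " " + invalid3(3, 1, value));
    }
}
```

Only the first variant contains a standalone `beginIndex < 0` comparison; in the other two it is folded into the OR-ed expression.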
> On May 13, 2015 6:51 PM, "Martin Buchholz" <martinrb at google.com> wrote:
>
>> On Wed, May 13, 2015 at 2:25 PM, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
>>
>> >
>> > Benchmark                  Mode  Cnt           Score           Error  Units
>> > MyBenchmark.testMethod_1  thrpt   60  1132911599.680 ± 42375177.640  ops/s
>> > MyBenchmark.testMethod_2  thrpt   60   813737659.576 ± 14226427.823  ops/s
>> > MyBenchmark.testMethod_3  thrpt   60   810406621.145 ± 12316864.045  ops/s
>> >
>> > The plain old ||-combined check was faster in this round.
>> > Some other tests showed different results.
>> > The speed seems to depend on the scope of the checked variables and
>> > complexity of the expressions to calculate.
>> > However, I still don't have a clear understanding of all the aspects we
>> > need to pay attention to when doing such optimizations.
>> >
>>
>> I'm not sure, but the only thing that could explain such a huge performance
>> gap is that hotspot was able to determine at JIT time that some of the
>> comparisons did not need to be performed at all. If true, is this cheating
>> or not? (You could retry with -Xint.) One of the ideas is to separate hot
>> and cold code (hotspot does not yet split code inside a single method) so
>> that hotspot is more likely to inline, and therefore more likely to
>> optimize; optimizing beginIndex < 0 away entirely is much easier than
>> optimizing my more complex expression. So yeah, I could be persuaded to
>> keep beginIndex < 0 as an independent expression, since it is likely to be
>> eliminated. Micro-optimizing is hard, but for the very core of the platform
>> it is important (more so than readability).
>>
>> One of these days I have to learn how to write a jmh benchmark.
>>
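The hot/cold separation Martin describes can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, it is not the code under review, and whether HotSpot actually inlines the hot method depends on JVM version and flags. The idea is that the frequently executed method stays small, while the exception construction lives in a separate, rarely called helper.

```java
// Hypothetical sketch of keeping the cold (failure) path out of the hot method.
final class HotColdSplit {

    // Hot path: only the bounds comparisons and the normal return live here,
    // which keeps the method body small.
    static String sub(String s, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > s.length() || beginIndex > endIndex) {
            throw outOfBounds(s, beginIndex, endIndex);
        }
        return s.substring(beginIndex, endIndex);
    }

    // Cold path: message formatting and exception allocation are moved out,
    // so they do not add to the size of the hot method.
    private static StringIndexOutOfBoundsException outOfBounds(
            String s, int beginIndex, int endIndex) {
        return new StringIndexOutOfBoundsException(
                "begin " + beginIndex + ", end " + endIndex + ", length " + s.length());
    }

    public static void main(String[] args) {
        System.out.println(sub("hello", 1, 3));
    }
}
```

The `beginIndex < 0` comparison is kept as an independent expression here, following the conclusion in the message above.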
>
>