RFR: 8071571: Move substring of same string to slow path

Wed May 13 23:24:43 UTC 2015

By the way, this is pure speculation on my part from just looking at the
code.  To truly find out, at least for Intel, say, you'd have to run these
benchmarks under a CPU event profiler and see what it tells you about IPC,
branch mispredictions, the various stalls, etc.  JMH has a perfasm profiler
(-prof perfasm), I believe, which you can use to run this under Linux perf.

On May 13, 2015 7:19 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> Yes, that should be the general rule of thumb for code targeting
> out-of-order chips.  The caveat is that microbenchmarks have the advantage
> of being the only (typically very small) code running on the CPU, so they
> get full use of the execution resources; specifically in this case, it's
> very likely that the branch history of this code will stay in the buffer
> and not be evicted by other code, as you'd find in a more complex
> (non-microbench) scenario.  Microbenching is hard :).
>
> On May 13, 2015 7:12 PM, "Martin Buchholz" <martinrb at google.com> wrote:
>
>>
>>
>> On Wed, May 13, 2015 at 4:06 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>
>>> :) The branch-avoiding versions may cause data dependence hazards,
>>> whereas the branchy one just has branches which, assuming they are
>>> perfectly predicted (and in microbenchmarks they typically are), can
>>> pipeline through.  Ivan, could you please post the asm here, assuming you
>>> guys are interested in investigating this further?
>>>
>> There might be a new rule of thumb here: only eliminate branches if they
>> are unlikely to be predicted correctly.
>>
>
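
For anyone following along, here is a rough sketch of the two code shapes
being compared in the quoted discussion.  It is not the code under review:
the class name, the method names and the exact checks are hypothetical, and
it assumes length >= 0.  The branchy form tests each condition separately,
so well-predicted branches let the CPU speculate past them.  The
branch-avoiding form folds everything into one sign test, so the final
compare has to wait on all three inputs, which is the kind of data
dependence mentioned above.  Whether the JIT actually emits code of this
shape is an assumption; you'd have to look at the generated asm.

```java
// Hypothetical sketch only: illustrative names, not the JDK's String code.
final class RangeCheckSketch {

    // Branchy form: up to three independent compare-and-branch tests.
    static boolean validBranchy(int begin, int end, int length) {
        if (begin < 0) return false;
        if (end > length) return false;
        if (begin > end) return false;
        return true;
    }

    // Branch-avoiding form: OR the three quantities that must be
    // non-negative; the sign bit of the result is set if any of them is
    // negative.  Assumes length >= 0.
    static boolean validBranchFree(int begin, int end, int length) {
        return (begin | (length - end) | (end - begin)) >= 0;
    }

    public static void main(String[] args) {
        // Compare the two forms over a small grid of inputs.
        int mismatches = 0;
        for (int len = 0; len <= 4; len++) {
            for (int b = -2; b <= 6; b++) {
                for (int e = -2; e <= 6; e++) {
                    if (validBranchy(b, e, len) != validBranchFree(b, e, len)) {
                        mismatches++;
                    }
                }
            }
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```

Both forms return the same answer; only the shape of the generated code can
differ, which is why measuring them needs the profiler data discussed above
rather than just wall-clock numbers.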