RFR: 8071571: Move substring of same string to slow path

Wed May 13 21:25:19 UTC 2015

Hi Martin!

On 13.05.2015 0:22, Martin Buchholz wrote:
> All your changes look good.  But:
>
> ---
> Not your bug, but it looks like the below should instead be:
> throw new StringIndexOutOfBoundsException(beginIndex);
> Perhaps fix in a follow-on change.
>
> 1935         if (subLen < 0) {
> 1936             throw new StringIndexOutOfBoundsException(subLen);
>
I guess, it was done so in order to make the error messages consistent 
for  str.substring(tooBigIndex)  and str.substring(tooBigIndex, str.length).
There are also other places, where StringIndexOutOfBoundsException() is 
passed a (wrong) distance, not an index.

It might be a good mini-project to make all the IndexOutOfBounds error 
messages uniform and fully informative, I'll file a RFE for this.

> ---
> We seem to have clean progress, but for substring fast-path we can do 
> a little better, I think, thus:
>
>     public String substring(int beginIndex, int endIndex) {
>         int subLen;
>         return ((beginIndex | (value.length - endIndex)) > 0
>                 && (subLen = endIndex - beginIndex) > 0)
>             ? new String(value, beginIndex, subLen)
>             : substringSlowPath(beginIndex, endIndex);
>     }
>
>     private String substringSlowPath(int beginIndex, int endIndex) {
>         if (beginIndex <= 0) {
>             if (beginIndex < 0) {
>                 throw new StringIndexOutOfBoundsException(beginIndex);
>             }
>             if (endIndex == value.length) {
>                 return this;
>             }
>         }
>         if (endIndex > value.length) {
>             throw new StringIndexOutOfBoundsException(endIndex);
>         }
>         if (endIndex == beginIndex) {
>             return "";
>         }
>         throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
>     }
>
> additional white-box tests would be required if we adopt that.
>

I think this is great, and it's lot of fun to rewrite the checks this 
way, but I have concerns.
First, I think this style would reduce readability of the code.
Second, it's not obvious to me that such combined check will always be 
faster.
I did some experimenting with jmh, and the results are unclear.
For example, please take a look at the following three tests:

@State(Scope.Benchmark)
public class MyBenchmark {
     static char[] buff = new char[12];

     @Benchmark public void testMethod_1() { substring_1(1, 10, buff); }
     void substring_1(int from, int to, char[] buff) {
         if ((from < 0) || (to > buff.length) || (from > to))
             throw new Error();
     }

     @Benchmark public void testMethod_2() { substring_2(1, 10, buff); }
     void substring_2(int from, int to, char[] buff) {
         if ((from | (buff.length - to) | (to - from)) < 0)
             throw new Error();
     }

     @Benchmark public void testMethod_3() { substring_3(1, 10, buff); }
     void substring_3(int from, int to, char[] buff) {
         int len;
         if ((from | (buff.length - to)) < 0 || (len = (to - from)) < 0)
             throw new Error();
     }
}
Benchmark                  Mode  Cnt           Score          Error Units
MyBenchmark.testMethod_1  thrpt   60  1132911599.680 ± 42375177.640 ops/s
MyBenchmark.testMethod_2  thrpt   60   813737659.576 ± 14226427.823 ops/s
MyBenchmark.testMethod_3  thrpt   60   810406621.145 ± 12316864.045 ops/s

The plain old ||-combined check was faster in this round.
Some other tests showed different results.
The speed seems to depend on the scope of the checked variables and 
complexity of the expressions to calculate.
However, I still don't have a clear understanding of all the aspects we 
need to pay attention to when doing such optimizations.

My third concern is this:  Wouldn't it be possible to implement this 
type of optimization at jvm level?
At least some conditions can automatically be combined into one.
Given the information about which execution path is expected to be fast, 
hotspot should be able to quickly perform that one combined condition 
check and move to the fast path in most situations.

Sincerely yours,
Ivan