8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Ivan Gerasimov ivan.gerasimov at oracle.com
Wed May 27 17:59:05 UTC 2015



On 27.05.2015 20:40, Xueming Shen wrote:
>
> The .append(srepl) -> append(srep.value)
>
> And I would simply remove the "fastpath" for the target.length() = 0 
> special case ...
> And maybe pick an appropriate initial size (maybe this.length + 16 * 
> diff) for the sb
> to further reduce its internal expanding ...
>
Sorry, I forgot to comment this.  This is not really a fastpath.
This is a special case, which isn't handled by the general algorithm.
We can modify the general algorithm, of course to handle empty target, 
but I thought it would be more clear to have a separate branch.

I think your variant below would loop forever in the loop with an empty 
target:
i = j + targLen; // targLen == 0
j = indexOf(starget, i)     // j == i

Sincerely yours,
Ivan

> Personally this code is straightforward and I would be happy with the 
> 800+ vs 1000+
> score diff.
>
>     public String replace(CharSequence target, CharSequence 
> replacement) {
>         String starget = target.toString();
>         int targLen = starget.length();
>         String srepl = replacement.toString();
>
>         int j = indexOf(starget);
>         // special case: nothing to replace
>         if (j<  0) {
>             return this;
>         }
>
>         final char[] value = this.value;
>         StringBuilder sb = new StringBuilder();
>         int i = 0;
>         do {
>             sb.append(value, i, j - i)
>                 .append(srepl.value);
>             i = j + targLen;
>         } while ((j = indexOf(starget, i))>  0);
>
>         return sb.append(this, i, length()).toString();
>     }
>
> -Sherman
>
>
> On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:
>>
>> On 27.05.2015 18:44, Xueming Shen wrote:
>>> You might want to use directly sb.append.(char[], off, len), instead 
>>> of append(str, off, len)/append(str).
>>
>> Ah, yes sure!  It would improve things.
>> I updated the webrev in place:
>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>
>> However the general picture is almost the same:
>> performance:
>> Baseline:      MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  
>> ops/s
>> StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  
>> ops/s
>> Arrays:        MyBenchmark.test  thrpt   40  1'049'235.602 ± 
>> 15501.803  ops/s
>>
>> Memory efficiency is still less then for array-version:
>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>> java version "1.9.0-internal"
>> Java(TM) SE Runtime Environment (build 
>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>> Java HotSpot(TM) 64-Bit Server VM (build 
>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>
>> 25) 100663296
>>
>> real    0m2.429s
>> user    0m2.112s
>> sys    0m0.792s
>>
>> Sincerely yours,
>> Ivan
>>
>>> Btw, is the target.lenght==0 the popular use case/scenario? Just 
>>> wonder why do we need a special case
>>> here for it.
>>>
>>> thanks,
>>> -Sherman
>>>
>>> On 5/27/15 8:29 AM, Ivan Gerasimov wrote:
>>>> For completeness, here's another webrev, which uses StringBuilder:
>>>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>>>
>>>> Its performance is somewhere in between the current implementation 
>>>> and the array-based implementation:
>>>> MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s
>>>>
>>>> Memory efficiency is less then the one of array-based implementation:
>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-internal"
>>>> Java(TM) SE Runtime Environment (build 
>>>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 
>>>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>>>
>>>> 25) 100663296
>>>>
>>>> real    0m2.585s
>>>> user    0m1.875s
>>>> sys    0m0.887s
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>> On 27.05.2015 2:38, Xueming Shen wrote:
>>>>> Ivan,
>>>>>
>>>>> It might be worth trying String.index + StringBuilder, instead of 
>>>>> writing/handling everything yourself.
>>>>> Yes, it inevitably adds an arraycopy at the end to convert the 
>>>>> StrinbBuilder to String, but it might have
>>>>> better balance between performance and code complexity. The regex 
>>>>> is probably a little heavy for
>>>>> literal string replacement, but StringBuilder should not be that 
>>>>> bad ...
>>>>>
>>>>> -Sherman
>>>>>
>>>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>>>> I updated the webrev:
>>>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>>>
>>>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
>>>>>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>>>
>>>>>> And  I want to add that this proposed implementation is not only 
>>>>>> faster, but also more memory efficient.
>>>>>> The following simple stress-test shows that the proposed version 
>>>>>> is able to handle twice larger strings, comparing to the current 
>>>>>> implementation.
>>>>>>
>>>>>> ----------------------------------------
>>>>>> public class C {
>>>>>>     public static void main(String[] args) throws Throwable {
>>>>>>         String s = "string";
>>>>>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>>>>             try {
>>>>>>                 s = s.replace("string", "stringstring");
>>>>>>             } catch (OutOfMemoryError o) {
>>>>>>                 System.out.println(i + ") " + s.length());
>>>>>>                 break;
>>>>>>             }
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>> ----------------------------------------
>>>>>>
>>>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>>>> java version "1.9.0-ea"
>>>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>>>
>>>>>> 25) 100663296
>>>>>>
>>>>>> real    0m4.525s
>>>>>> user    0m4.402s
>>>>>> sys    0m1.189s
>>>>>>
>>>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>>>> java version "1.9.0-internal"
>>>>>> Java(TM) SE Runtime Environment (build 
>>>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 
>>>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>>>
>>>>>> 26) 201326592
>>>>>>
>>>>>> real    0m2.139s
>>>>>> user    0m1.960s
>>>>>> sys    0m0.461s
>>>>>>
>>>>>> Sincerely yours,
>>>>>> Ivan
>>>>>>
>>>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>>>> Hello everybody!
>>>>>>>
>>>>>>> I know many people here like it when the performance is getting 
>>>>>>> better.
>>>>>>> It was suggested to make the literal variant of String.replace() 
>>>>>>> faster.
>>>>>>> Currently, this method is implemented as a few calls to regexp 
>>>>>>> API, so that the whole implementation takes only two lines of code.
>>>>>>>
>>>>>>> I've created two versions of the fix.
>>>>>>> In the first one, we scan the string and store indices of the 
>>>>>>> found substrings in an array.
>>>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>>>> The case with the empty target has to be handled separately.
>>>>>>>
>>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>>>
>>>>>>> The second variant is much less verbose, however it's less 
>>>>>>> efficient too.
>>>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>>>
>>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>>>
>>>>>>>
>>>>>>> Here are the micro-benchmark results (in a string of ~300 chars 
>>>>>>> do ~15 replacements).
>>>>>>> 0) Baseline
>>>>>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484 ops/s
>>>>>>>
>>>>>>> 1) Heavy-duty +308%
>>>>>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s
>>>>>>>
>>>>>>> 2) StringJoiner +190%
>>>>>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036 ops/s
>>>>>>>
>>>>>>>
>>>>>>> Personally, I like my first variant better, even though it adds 
>>>>>>> almost 300 lines of code.
>>>>>>> But I'd like to hear what people think of it.
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Ivan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>




More information about the core-libs-dev mailing list