8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Ivan Gerasimov ivan.gerasimov at oracle.com
Wed May 27 17:06:16 UTC 2015


On 27.05.2015 18:44, Xueming Shen wrote:
> You might want to use directly sb.append.(char[], off, len), instead 
> of append(str, off, len)/append(str).

Ah, yes sure!  It would improve things.
I updated the webrev in place:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

However the general picture is almost the same:
performance:
Baseline:      MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  ops/s
Arrays:        MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  
ops/s

Memory efficiency is still less then for array-version:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real    0m2.429s
user    0m2.112s
sys    0m0.792s

Sincerely yours,
Ivan

> Btw, is the target.lenght==0 the popular use case/scenario? Just 
> wonder why do we need a special case
> here for it.
>
> thanks,
> -Sherman
>
> On 5/27/15 8:29 AM, Ivan Gerasimov wrote:
>> For completeness, here's another webrev, which uses StringBuilder:
>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>
>> Its performance is somewhere in between the current implementation 
>> and the array-based implementation:
>> MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s
>>
>> Memory efficiency is less then the one of array-based implementation:
>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>> java version "1.9.0-internal"
>> Java(TM) SE Runtime Environment (build 
>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>> Java HotSpot(TM) 64-Bit Server VM (build 
>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>
>> 25) 100663296
>>
>> real    0m2.585s
>> user    0m1.875s
>> sys    0m0.887s
>>
>> Sincerely yours,
>> Ivan
>>
>> On 27.05.2015 2:38, Xueming Shen wrote:
>>> Ivan,
>>>
>>> It might be worth trying String.index + StringBuilder, instead of 
>>> writing/handling everything yourself.
>>> Yes, it inevitably adds an arraycopy at the end to convert the 
>>> StrinbBuilder to String, but it might have
>>> better balance between performance and code complexity. The regex is 
>>> probably a little heavy for
>>> literal string replacement, but StringBuilder should not be that bad 
>>> ...
>>>
>>> -Sherman
>>>
>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>> I updated the webrev:
>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>
>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
>>>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>
>>>> And  I want to add that this proposed implementation is not only 
>>>> faster, but also more memory efficient.
>>>> The following simple stress-test shows that the proposed version is 
>>>> able to handle twice larger strings, comparing to the current 
>>>> implementation.
>>>>
>>>> ----------------------------------------
>>>> public class C {
>>>>     public static void main(String[] args) throws Throwable {
>>>>         String s = "string";
>>>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>>             try {
>>>>                 s = s.replace("string", "stringstring");
>>>>             } catch (OutOfMemoryError o) {
>>>>                 System.out.println(i + ") " + s.length());
>>>>                 break;
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>> ----------------------------------------
>>>>
>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-ea"
>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>
>>>> 25) 100663296
>>>>
>>>> real    0m4.525s
>>>> user    0m4.402s
>>>> sys    0m1.189s
>>>>
>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-internal"
>>>> Java(TM) SE Runtime Environment (build 
>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 
>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>
>>>> 26) 201326592
>>>>
>>>> real    0m2.139s
>>>> user    0m1.960s
>>>> sys    0m0.461s
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>> Hello everybody!
>>>>>
>>>>> I know many people here like it when the performance is getting 
>>>>> better.
>>>>> It was suggested to make the literal variant of String.replace() 
>>>>> faster.
>>>>> Currently, this method is implemented as a few calls to regexp 
>>>>> API, so that the whole implementation takes only two lines of code.
>>>>>
>>>>> I've created two versions of the fix.
>>>>> In the first one, we scan the string and store indices of the 
>>>>> found substrings in an array.
>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>> The case with the empty target has to be handled separately.
>>>>>
>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>
>>>>> The second variant is much less verbose, however it's less 
>>>>> efficient too.
>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>
>>>>>
>>>>> Here are the micro-benchmark results (in a string of ~300 chars do 
>>>>> ~15 replacements).
>>>>> 0) Baseline
>>>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484 ops/s
>>>>>
>>>>> 1) Heavy-duty +308%
>>>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s
>>>>>
>>>>> 2) StringJoiner +190%
>>>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036 ops/s
>>>>>
>>>>>
>>>>> Personally, I like my first variant better, even though it adds 
>>>>> almost 300 lines of code.
>>>>> But I'd like to hear what people think of it.
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>




More information about the core-libs-dev mailing list