8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Ivan Gerasimov ivan.gerasimov at oracle.com
Wed May 27 15:29:26 UTC 2015


For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation and 
the array-based implementation:
MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real    0m2.585s
user    0m1.875s
sys    0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:
> Ivan,
>
> It might be worth trying String.index + StringBuilder, instead of 
> writing/handling everything yourself.
> Yes, it inevitably adds an arraycopy at the end to convert the 
> StrinbBuilder to String, but it might have
> better balance between performance and code complexity. The regex is 
> probably a little heavy for
> literal string replacement, but StringBuilder should not be that bad ...
>
> -Sherman
>
> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>> I updated the webrev:
>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>
>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>
>> And  I want to add that this proposed implementation is not only 
>> faster, but also more memory efficient.
>> The following simple stress-test shows that the proposed version is 
>> able to handle twice larger strings, comparing to the current 
>> implementation.
>>
>> ----------------------------------------
>> public class C {
>>     public static void main(String[] args) throws Throwable {
>>         String s = "string";
>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>             try {
>>                 s = s.replace("string", "stringstring");
>>             } catch (OutOfMemoryError o) {
>>                 System.out.println(i + ") " + s.length());
>>                 break;
>>             }
>>         }
>>     }
>> }
>> ----------------------------------------
>>
>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>> java version "1.9.0-ea"
>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>
>> 25) 100663296
>>
>> real    0m4.525s
>> user    0m4.402s
>> sys    0m1.189s
>>
>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>> java version "1.9.0-internal"
>> Java(TM) SE Runtime Environment (build 
>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>> Java HotSpot(TM) 64-Bit Server VM (build 
>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>
>> 26) 201326592
>>
>> real    0m2.139s
>> user    0m1.960s
>> sys    0m0.461s
>>
>> Sincerely yours,
>> Ivan
>>
>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>> Hello everybody!
>>>
>>> I know many people here like it when the performance is getting better.
>>> It was suggested to make the literal variant of String.replace() 
>>> faster.
>>> Currently, this method is implemented as a few calls to regexp API, 
>>> so that the whole implementation takes only two lines of code.
>>>
>>> I've created two versions of the fix.
>>> In the first one, we scan the string and store indices of the found 
>>> substrings in an array.
>>> Then, we allocate the precisely sized char array and fill it it.
>>> The case with the empty target has to be handled separately.
>>>
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>
>>> The second variant is much less verbose, however it's less efficient 
>>> too.
>>> Here the StringJoiner is used as an intermediate storage.
>>>
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>
>>>
>>> Here are the micro-benchmark results (in a string of ~300 chars do 
>>> ~15 replacements).
>>> 0) Baseline
>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
>>>
>>> 1) Heavy-duty +308%
>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s
>>>
>>> 2) StringJoiner +190%
>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s
>>>
>>>
>>> Personally, I like my first variant better, even though it adds 
>>> almost 300 lines of code.
>>> But I'd like to hear what people think of it.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>>
>>
>
>
>




More information about the core-libs-dev mailing list