8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Xueming Shen xueming.shen at oracle.com
Wed May 27 15:44:39 UTC 2015


You might want to use directly sb.append.(char[], off, len), instead of 
append(str, off, len)/append(str).
Btw, is the target.lenght==0 the popular use case/scenario? Just wonder 
why do we need a special case
here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:
> For completeness, here's another webrev, which uses StringBuilder:
> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>
> Its performance is somewhere in between the current implementation and 
> the array-based implementation:
> MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s
>
> Memory efficiency is less then the one of array-based implementation:
> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
> java version "1.9.0-internal"
> Java(TM) SE Runtime Environment (build 
> 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
> Java HotSpot(TM) 64-Bit Server VM (build 
> 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>
> 25) 100663296
>
> real    0m2.585s
> user    0m1.875s
> sys    0m0.887s
>
> Sincerely yours,
> Ivan
>
> On 27.05.2015 2:38, Xueming Shen wrote:
>> Ivan,
>>
>> It might be worth trying String.index + StringBuilder, instead of 
>> writing/handling everything yourself.
>> Yes, it inevitably adds an arraycopy at the end to convert the 
>> StrinbBuilder to String, but it might have
>> better balance between performance and code complexity. The regex is 
>> probably a little heavy for
>> literal string replacement, but StringBuilder should not be that bad ...
>>
>> -Sherman
>>
>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>> I updated the webrev:
>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>
>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
>>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>>
>>> And  I want to add that this proposed implementation is not only 
>>> faster, but also more memory efficient.
>>> The following simple stress-test shows that the proposed version is 
>>> able to handle twice larger strings, comparing to the current 
>>> implementation.
>>>
>>> ----------------------------------------
>>> public class C {
>>>     public static void main(String[] args) throws Throwable {
>>>         String s = "string";
>>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>             try {
>>>                 s = s.replace("string", "stringstring");
>>>             } catch (OutOfMemoryError o) {
>>>                 System.out.println(i + ") " + s.length());
>>>                 break;
>>>             }
>>>         }
>>>     }
>>> }
>>> ----------------------------------------
>>>
>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>> java version "1.9.0-ea"
>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>
>>> 25) 100663296
>>>
>>> real    0m4.525s
>>> user    0m4.402s
>>> sys    0m1.189s
>>>
>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>> java version "1.9.0-internal"
>>> Java(TM) SE Runtime Environment (build 
>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>> Java HotSpot(TM) 64-Bit Server VM (build 
>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>
>>> 26) 201326592
>>>
>>> real    0m2.139s
>>> user    0m1.960s
>>> sys    0m0.461s
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>> Hello everybody!
>>>>
>>>> I know many people here like it when the performance is getting 
>>>> better.
>>>> It was suggested to make the literal variant of String.replace() 
>>>> faster.
>>>> Currently, this method is implemented as a few calls to regexp API, 
>>>> so that the whole implementation takes only two lines of code.
>>>>
>>>> I've created two versions of the fix.
>>>> In the first one, we scan the string and store indices of the found 
>>>> substrings in an array.
>>>> Then, we allocate the precisely sized char array and fill it it.
>>>> The case with the empty target has to be handled separately.
>>>>
>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>
>>>> The second variant is much less verbose, however it's less 
>>>> efficient too.
>>>> Here the StringJoiner is used as an intermediate storage.
>>>>
>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>
>>>>
>>>> Here are the micro-benchmark results (in a string of ~300 chars do 
>>>> ~15 replacements).
>>>> 0) Baseline
>>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
>>>>
>>>> 1) Heavy-duty +308%
>>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s
>>>>
>>>> 2) StringJoiner +190%
>>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s
>>>>
>>>>
>>>> Personally, I like my first variant better, even though it adds 
>>>> almost 300 lines of code.
>>>> But I'd like to hear what people think of it.
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>>
>>>
>>
>>
>>
>




More information about the core-libs-dev mailing list