8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Ivan Gerasimov ivan.gerasimov at oracle.com
Wed May 27 14:27:28 UTC 2015


Thank you Peter for looking at this!

On 27.05.2015 16:15, Peter Levart wrote:
> Hi Ivan,
>
> How does this implementation compare to your's two regarding speed:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/
>

You need to check for possible overflow here:
2287             resLen += repLen + ss.length();

> It is not much shorter than your's first, but more object oriented ;-)
>
Well, the main difference is that you store the positions of substrings 
in a linked-list instead of an array.
But arrays are more efficient.

On small data the performance is almost the same as for the 'array' version:
MyBenchmark.test  thrpt   40  1'046'137.437 ± 13961.121  ops/s

But on huge data the performance is much worse, comparing to even 
current implementation.
Here are the results of running the stress test, I had posted earlier:
-------------------------
time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real    0m12.945s
user    0m15.670s
sys    0m7.189s
-------------------------

It took 6 times more time to complete, comparing to the current 
implementation, which uses regexp engine.

Sincerely yours,
Ivan

> Regards, Peter
>
> On 05/27/2015 11:36 AM, Ivan Gerasimov wrote:
>> Hi Sherman!
>>
>> Please take a look at my other webrev, that I provided in the first 
>> message.
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>
>> I used StringJoiner there, which in some aspects seems to fit better 
>> here, comparing to StringBuilder.
>> It does reduce the code, but of course runs slower.
>> In that version every part of the source string had to be converted 
>> to a separate String, which add another redundant copying.
>>
>> I still prefer the version, which deals with arrays.  I don't think 
>> it's too complex.
>> It surely adds some complexity to the original code, but it's only 70 
>> lines of code in total.
>> Everything else are comments and empty lines.
>>
>> Sincerely yours,
>> Ivan
>>
>> On 27.05.2015 2:38, Xueming Shen wrote:
>>> Ivan,
>>>
>>> It might be worth trying String.index + StringBuilder, instead of 
>>> writing/handling everything yourself.
>>> Yes, it inevitably adds an arraycopy at the end to convert the 
>>> StrinbBuilder to String, but it might have
>>> better balance between performance and code complexity. The regex is 
>>> probably a little heavy for
>>> literal string replacement, but StringBuilder should not be that bad 
>>> ...
>>>
>>> -Sherman
>>>
>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>> I updated the webrev:
>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>
>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
>>>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>
>>>> And  I want to add that this proposed implementation is not only 
>>>> faster, but also more memory efficient.
>>>> The following simple stress-test shows that the proposed version is 
>>>> able to handle twice larger strings, comparing to the current 
>>>> implementation.
>>>>
>>>> ----------------------------------------
>>>> public class C {
>>>>     public static void main(String[] args) throws Throwable {
>>>>         String s = "string";
>>>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>>             try {
>>>>                 s = s.replace("string", "stringstring");
>>>>             } catch (OutOfMemoryError o) {
>>>>                 System.out.println(i + ") " + s.length());
>>>>                 break;
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>> ----------------------------------------
>>>>
>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-ea"
>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>
>>>> 25) 100663296
>>>>
>>>> real    0m4.525s
>>>> user    0m4.402s
>>>> sys    0m1.189s
>>>>
>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-internal"
>>>> Java(TM) SE Runtime Environment (build 
>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 
>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>
>>>> 26) 201326592
>>>>
>>>> real    0m2.139s
>>>> user    0m1.960s
>>>> sys    0m0.461s
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>> Hello everybody!
>>>>>
>>>>> I know many people here like it when the performance is getting 
>>>>> better.
>>>>> It was suggested to make the literal variant of String.replace() 
>>>>> faster.
>>>>> Currently, this method is implemented as a few calls to regexp 
>>>>> API, so that the whole implementation takes only two lines of code.
>>>>>
>>>>> I've created two versions of the fix.
>>>>> In the first one, we scan the string and store indices of the 
>>>>> found substrings in an array.
>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>> The case with the empty target has to be handled separately.
>>>>>
>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>
>>>>> The second variant is much less verbose, however it's less 
>>>>> efficient too.
>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>
>>>>>
>>>>> Here are the micro-benchmark results (in a string of ~300 chars do 
>>>>> ~15 replacements).
>>>>> 0) Baseline
>>>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484 ops/s
>>>>>
>>>>> 1) Heavy-duty +308%
>>>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s
>>>>>
>>>>> 2) StringJoiner +190%
>>>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036 ops/s
>>>>>
>>>>>
>>>>> Personally, I like my first variant better, even though it adds 
>>>>> almost 300 lines of code.
>>>>> But I'd like to hear what people think of it.
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>




More information about the core-libs-dev mailing list