8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Ivan Gerasimov
ivan.gerasimov at oracle.com
Wed May 27 14:27:28 UTC 2015
Thank you Peter for looking at this!
On 27.05.2015 16:15, Peter Levart wrote:
> Hi Ivan,
>
> How does this implementation compare to your's two regarding speed:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/
>
You need to check for possible overflow here:
2287 resLen += repLen + ss.length();
> It is not much shorter than your's first, but more object oriented ;-)
>
Well, the main difference is that you store the positions of substrings
in a linked-list instead of an array.
But arrays are more efficient.
On small data the performance is almost the same as for the 'array' version:
MyBenchmark.test thrpt 40 1'046'137.437 ± 13961.121 ops/s
But on huge data the performance is much worse, comparing to even
current implementation.
Here are the results of running the stress test, I had posted earlier:
-------------------------
time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
25) 100663296
real 0m12.945s
user 0m15.670s
sys 0m7.189s
-------------------------
It took 6 times more time to complete, comparing to the current
implementation, which uses regexp engine.
Sincerely yours,
Ivan
> Regards, Peter
>
> On 05/27/2015 11:36 AM, Ivan Gerasimov wrote:
>> Hi Sherman!
>>
>> Please take a look at my other webrev, that I provided in the first
>> message.
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>
>> I used StringJoiner there, which in some aspects seems to fit better
>> here, comparing to StringBuilder.
>> It does reduce the code, but of course runs slower.
>> In that version every part of the source string had to be converted
>> to a separate String, which add another redundant copying.
>>
>> I still prefer the version, which deals with arrays. I don't think
>> it's too complex.
>> It surely adds some complexity to the original code, but it's only 70
>> lines of code in total.
>> Everything else are comments and empty lines.
>>
>> Sincerely yours,
>> Ivan
>>
>> On 27.05.2015 2:38, Xueming Shen wrote:
>>> Ivan,
>>>
>>> It might be worth trying String.index + StringBuilder, instead of
>>> writing/handling everything yourself.
>>> Yes, it inevitably adds an arraycopy at the end to convert the
>>> StrinbBuilder to String, but it might have
>>> better balance between performance and code complexity. The regex is
>>> probably a little heavy for
>>> literal string replacement, but StringBuilder should not be that bad
>>> ...
>>>
>>> -Sherman
>>>
>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>> I updated the webrev:
>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>
>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE
>>>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>
>>>> And I want to add that this proposed implementation is not only
>>>> faster, but also more memory efficient.
>>>> The following simple stress-test shows that the proposed version is
>>>> able to handle twice larger strings, comparing to the current
>>>> implementation.
>>>>
>>>> ----------------------------------------
>>>> public class C {
>>>> public static void main(String[] args) throws Throwable {
>>>> String s = "string";
>>>> for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>> try {
>>>> s = s.replace("string", "stringstring");
>>>> } catch (OutOfMemoryError o) {
>>>> System.out.println(i + ") " + s.length());
>>>> break;
>>>> }
>>>> }
>>>> }
>>>> }
>>>> ----------------------------------------
>>>>
>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-ea"
>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>
>>>> 25) 100663296
>>>>
>>>> real 0m4.525s
>>>> user 0m4.402s
>>>> sys 0m1.189s
>>>>
>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-internal"
>>>> Java(TM) SE Runtime Environment (build
>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>> Java HotSpot(TM) 64-Bit Server VM (build
>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>
>>>> 26) 201326592
>>>>
>>>> real 0m2.139s
>>>> user 0m1.960s
>>>> sys 0m0.461s
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>> Hello everybody!
>>>>>
>>>>> I know many people here like it when the performance is getting
>>>>> better.
>>>>> It was suggested to make the literal variant of String.replace()
>>>>> faster.
>>>>> Currently, this method is implemented as a few calls to regexp
>>>>> API, so that the whole implementation takes only two lines of code.
>>>>>
>>>>> I've created two versions of the fix.
>>>>> In the first one, we scan the string and store indices of the
>>>>> found substrings in an array.
>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>> The case with the empty target has to be handled separately.
>>>>>
>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>
>>>>> The second variant is much less verbose, however it's less
>>>>> efficient too.
>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>
>>>>>
>>>>> Here are the micro-benchmark results (in a string of ~300 chars do
>>>>> ~15 replacements).
>>>>> 0) Baseline
>>>>> MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s
>>>>>
>>>>> 1) Heavy-duty +308%
>>>>> MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s
>>>>>
>>>>> 2) StringJoiner +190%
>>>>> MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s
>>>>>
>>>>>
>>>>> Personally, I like my first variant better, even though it adds
>>>>> almost 300 lines of code.
>>>>> But I'd like to hear what people think of it.
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
More information about the core-libs-dev
mailing list