8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Ivan Gerasimov
ivan.gerasimov at oracle.com
Wed May 27 17:59:05 UTC 2015
On 27.05.2015 20:40, Xueming Shen wrote:
>
> The .append(srepl) -> append(srep.value)
>
> And I would simply remove the "fastpath" for the target.length() = 0
> special case ...
> And maybe pick an appropriate initial size (maybe this.length + 16 *
> diff) for the sb
> to further reduce its internal expanding ...
>
Sorry, I forgot to comment this. This is not really a fastpath.
This is a special case, which isn't handled by the general algorithm.
We can modify the general algorithm, of course to handle empty target,
but I thought it would be more clear to have a separate branch.
I think your variant below would loop forever in the loop with an empty
target:
i = j + targLen; // targLen == 0
j = indexOf(starget, i) // j == i
Sincerely yours,
Ivan
> Personally this code is straightforward and I would be happy with the
> 800+ vs 1000+
> score diff.
>
> public String replace(CharSequence target, CharSequence
> replacement) {
> String starget = target.toString();
> int targLen = starget.length();
> String srepl = replacement.toString();
>
> int j = indexOf(starget);
> // special case: nothing to replace
> if (j< 0) {
> return this;
> }
>
> final char[] value = this.value;
> StringBuilder sb = new StringBuilder();
> int i = 0;
> do {
> sb.append(value, i, j - i)
> .append(srepl.value);
> i = j + targLen;
> } while ((j = indexOf(starget, i))> 0);
>
> return sb.append(this, i, length()).toString();
> }
>
> -Sherman
>
>
> On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:
>>
>> On 27.05.2015 18:44, Xueming Shen wrote:
>>> You might want to use directly sb.append.(char[], off, len), instead
>>> of append(str, off, len)/append(str).
>>
>> Ah, yes sure! It would improve things.
>> I updated the webrev in place:
>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>
>> However the general picture is almost the same:
>> performance:
>> Baseline: MyBenchmark.test thrpt 40 257'051.948 ± 4537.484
>> ops/s
>> StringBuilder: MyBenchmark.test thrpt 40 808'033.808 ± 22152.166
>> ops/s
>> Arrays: MyBenchmark.test thrpt 40 1'049'235.602 ±
>> 15501.803 ops/s
>>
>> Memory efficiency is still less then for array-version:
>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>> java version "1.9.0-internal"
>> Java(TM) SE Runtime Environment (build
>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>> Java HotSpot(TM) 64-Bit Server VM (build
>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>
>> 25) 100663296
>>
>> real 0m2.429s
>> user 0m2.112s
>> sys 0m0.792s
>>
>> Sincerely yours,
>> Ivan
>>
>>> Btw, is the target.lenght==0 the popular use case/scenario? Just
>>> wonder why do we need a special case
>>> here for it.
>>>
>>> thanks,
>>> -Sherman
>>>
>>> On 5/27/15 8:29 AM, Ivan Gerasimov wrote:
>>>> For completeness, here's another webrev, which uses StringBuilder:
>>>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>>>
>>>> Its performance is somewhere in between the current implementation
>>>> and the array-based implementation:
>>>> MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s
>>>>
>>>> Memory efficiency is less then the one of array-based implementation:
>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>> java version "1.9.0-internal"
>>>> Java(TM) SE Runtime Environment (build
>>>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>>>> Java HotSpot(TM) 64-Bit Server VM (build
>>>> 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>>>
>>>> 25) 100663296
>>>>
>>>> real 0m2.585s
>>>> user 0m1.875s
>>>> sys 0m0.887s
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>> On 27.05.2015 2:38, Xueming Shen wrote:
>>>>> Ivan,
>>>>>
>>>>> It might be worth trying String.index + StringBuilder, instead of
>>>>> writing/handling everything yourself.
>>>>> Yes, it inevitably adds an arraycopy at the end to convert the
>>>>> StrinbBuilder to String, but it might have
>>>>> better balance between performance and code complexity. The regex
>>>>> is probably a little heavy for
>>>>> literal string replacement, but StringBuilder should not be that
>>>>> bad ...
>>>>>
>>>>> -Sherman
>>>>>
>>>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>>>> I updated the webrev:
>>>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>>>
>>>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE
>>>>>> with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>>>
>>>>>> And I want to add that this proposed implementation is not only
>>>>>> faster, but also more memory efficient.
>>>>>> The following simple stress-test shows that the proposed version
>>>>>> is able to handle twice larger strings, comparing to the current
>>>>>> implementation.
>>>>>>
>>>>>> ----------------------------------------
>>>>>> public class C {
>>>>>> public static void main(String[] args) throws Throwable {
>>>>>> String s = "string";
>>>>>> for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>>>> try {
>>>>>> s = s.replace("string", "stringstring");
>>>>>> } catch (OutOfMemoryError o) {
>>>>>> System.out.println(i + ") " + s.length());
>>>>>> break;
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> ----------------------------------------
>>>>>>
>>>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>>>> java version "1.9.0-ea"
>>>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>>>
>>>>>> 25) 100663296
>>>>>>
>>>>>> real 0m4.525s
>>>>>> user 0m4.402s
>>>>>> sys 0m1.189s
>>>>>>
>>>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>>>> java version "1.9.0-internal"
>>>>>> Java(TM) SE Runtime Environment (build
>>>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>>>> Java HotSpot(TM) 64-Bit Server VM (build
>>>>>> 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>>>
>>>>>> 26) 201326592
>>>>>>
>>>>>> real 0m2.139s
>>>>>> user 0m1.960s
>>>>>> sys 0m0.461s
>>>>>>
>>>>>> Sincerely yours,
>>>>>> Ivan
>>>>>>
>>>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>>>> Hello everybody!
>>>>>>>
>>>>>>> I know many people here like it when the performance is getting
>>>>>>> better.
>>>>>>> It was suggested to make the literal variant of String.replace()
>>>>>>> faster.
>>>>>>> Currently, this method is implemented as a few calls to regexp
>>>>>>> API, so that the whole implementation takes only two lines of code.
>>>>>>>
>>>>>>> I've created two versions of the fix.
>>>>>>> In the first one, we scan the string and store indices of the
>>>>>>> found substrings in an array.
>>>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>>>> The case with the empty target has to be handled separately.
>>>>>>>
>>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>>>
>>>>>>> The second variant is much less verbose, however it's less
>>>>>>> efficient too.
>>>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>>>
>>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>>>
>>>>>>>
>>>>>>> Here are the micro-benchmark results (in a string of ~300 chars
>>>>>>> do ~15 replacements).
>>>>>>> 0) Baseline
>>>>>>> MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s
>>>>>>>
>>>>>>> 1) Heavy-duty +308%
>>>>>>> MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s
>>>>>>>
>>>>>>> 2) StringJoiner +190%
>>>>>>> MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s
>>>>>>>
>>>>>>>
>>>>>>> Personally, I like my first variant better, even though it adds
>>>>>>> almost 300 lines of code.
>>>>>>> But I'd like to hear what people think of it.
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Ivan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
More information about the core-libs-dev
mailing list