8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Xueming Shen xueming.shen at oracle.com
Wed May 27 18:08:05 UTC 2015


targLen = max(1, tagLen);   ?

On 05/27/2015 10:59 AM, Ivan Gerasimov wrote:
>
>
> On 27.05.2015 20:40, Xueming Shen wrote:
>>
>> The .append(srepl) -> append(srep.value)
>>
>> And I would simply remove the "fastpath" for the target.length() = 0 special case ...
>> And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for the sb
>> to further reduce its internal expanding ...
>>
> Sorry, I forgot to comment this.  This is not really a fastpath.
> This is a special case, which isn't handled by the general algorithm.
> We can modify the general algorithm, of course to handle empty target, but I thought it would be more clear to have a separate branch.
>
> I think your variant below would loop forever in the loop with an empty target:
> i = j + targLen; // targLen == 0
> j = indexOf(starget, i)     // j == i
>
> Sincerely yours,
> Ivan
>
>> Personally this code is straightforward and I would be happy with the 800+ vs 1000+
>> score diff.
>>
>>     public String replace(CharSequence target, CharSequence replacement) {
>>         String starget = target.toString();
>>         int targLen = starget.length();
>>         String srepl = replacement.toString();
>>
>>         int j = indexOf(starget);
>>         // special case: nothing to replace
>>         if (j<  0) {
>>             return this;
>>         }
>>
>>         final char[] value = this.value;
>>         StringBuilder sb = new StringBuilder();
>>         int i = 0;
>>         do {
>>             sb.append(value, i, j - i)
>>                 .append(srepl.value);
>>             i = j + targLen;
>>         } while ((j = indexOf(starget, i))>  0);
>>
>>         return sb.append(this, i, length()).toString();
>>     }
>>
>> -Sherman
>>
>>
>> On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:
>>>
>>> On 27.05.2015 18:44, Xueming Shen wrote:
>>>> You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str).
>>>
>>> Ah, yes sure!  It would improve things.
>>> I updated the webrev in place:
>>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>>
>>> However the general picture is almost the same:
>>> performance:
>>> Baseline:      MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
>>> StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  ops/s
>>> Arrays:        MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s
>>>
>>> Memory efficiency is still less then for array-version:
>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>> java version "1.9.0-internal"
>>> Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>>
>>> 25) 100663296
>>>
>>> real    0m2.429s
>>> user    0m2.112s
>>> sys    0m0.792s
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>>> Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case
>>>> here for it.
>>>>
>>>> thanks,
>>>> -Sherman
>>>>
>>>> On 5/27/15 8:29 AM, Ivan Gerasimov wrote:
>>>>> For completeness, here's another webrev, which uses StringBuilder:
>>>>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>>>>
>>>>> Its performance is somewhere in between the current implementation and the array-based implementation:
>>>>> MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s
>>>>>
>>>>> Memory efficiency is less then the one of array-based implementation:
>>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>>> java version "1.9.0-internal"
>>>>> Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>>>>
>>>>> 25) 100663296
>>>>>
>>>>> real    0m2.585s
>>>>> user    0m1.875s
>>>>> sys    0m0.887s
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>> On 27.05.2015 2:38, Xueming Shen wrote:
>>>>>> Ivan,
>>>>>>
>>>>>> It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself.
>>>>>> Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have
>>>>>> better balance between performance and code complexity. The regex is probably a little heavy for
>>>>>> literal string replacement, but StringBuilder should not be that bad ...
>>>>>>
>>>>>> -Sherman
>>>>>>
>>>>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>>>>> I updated the webrev:
>>>>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>>>>
>>>>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>>>>
>>>>>>> And  I want to add that this proposed implementation is not only faster, but also more memory efficient.
>>>>>>> The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation.
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>> public class C {
>>>>>>>     public static void main(String[] args) throws Throwable {
>>>>>>>         String s = "string";
>>>>>>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>>>>>             try {
>>>>>>>                 s = s.replace("string", "stringstring");
>>>>>>>             } catch (OutOfMemoryError o) {
>>>>>>>                 System.out.println(i + ") " + s.length());
>>>>>>>                 break;
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>> ----------------------------------------
>>>>>>>
>>>>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>>>>> java version "1.9.0-ea"
>>>>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>>>>
>>>>>>> 25) 100663296
>>>>>>>
>>>>>>> real    0m4.525s
>>>>>>> user    0m4.402s
>>>>>>> sys    0m1.189s
>>>>>>>
>>>>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>>>>> java version "1.9.0-internal"
>>>>>>> Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>>>>
>>>>>>> 26) 201326592
>>>>>>>
>>>>>>> real    0m2.139s
>>>>>>> user    0m1.960s
>>>>>>> sys    0m0.461s
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Ivan
>>>>>>>
>>>>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>>>>> Hello everybody!
>>>>>>>>
>>>>>>>> I know many people here like it when the performance is getting better.
>>>>>>>> It was suggested to make the literal variant of String.replace() faster.
>>>>>>>> Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code.
>>>>>>>>
>>>>>>>> I've created two versions of the fix.
>>>>>>>> In the first one, we scan the string and store indices of the found substrings in an array.
>>>>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>>>>> The case with the empty target has to be handled separately.
>>>>>>>>
>>>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>>>>
>>>>>>>> The second variant is much less verbose, however it's less efficient too.
>>>>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>>>>
>>>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>>>>
>>>>>>>>
>>>>>>>> Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements).
>>>>>>>> 0) Baseline
>>>>>>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484 ops/s
>>>>>>>>
>>>>>>>> 1) Heavy-duty +308%
>>>>>>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s
>>>>>>>>
>>>>>>>> 2) StringJoiner +190%
>>>>>>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036 ops/s
>>>>>>>>
>>>>>>>>
>>>>>>>> Personally, I like my first variant better, even though it adds almost 300 lines of code.
>>>>>>>> But I'd like to hear what people think of it.
>>>>>>>>
>>>>>>>> Sincerely yours,
>>>>>>>> Ivan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>




More information about the core-libs-dev mailing list