8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Xueming Shen xueming.shen at oracle.com
Wed May 27 17:40:16 UTC 2015


The .append(srepl) -> append(srep.value)

And I would simply remove the "fastpath" for the target.length() = 0 special case ...
And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for the sb
to further reduce its internal expanding ...

Personally this code is straightforward and I would be happy with the 800+ vs 1000+
score diff.

     public String replace(CharSequence target, CharSequence replacement) {
         String starget = target.toString();
         int targLen = starget.length();
         String srepl = replacement.toString();

         int j = indexOf(starget);
         // special case: nothing to replace
         if (j<  0) {
             return this;
         }

         final char[] value = this.value;
         StringBuilder sb = new StringBuilder();
         int i = 0;
         do {
             sb.append(value, i, j - i)
                 .append(srepl.value);
             i = j + targLen;
         } while ((j = indexOf(starget, i))>  0);

         return sb.append(this, i, length()).toString();
     }

-Sherman


On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:
>
> On 27.05.2015 18:44, Xueming Shen wrote:
>> You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str).
>
> Ah, yes sure!  It would improve things.
> I updated the webrev in place:
> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>
> However the general picture is almost the same:
> performance:
> Baseline:      MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
> StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  ops/s
> Arrays:        MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s
>
> Memory efficiency is still less then for array-version:
> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
> java version "1.9.0-internal"
> Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>
> 25) 100663296
>
> real    0m2.429s
> user    0m2.112s
> sys    0m0.792s
>
> Sincerely yours,
> Ivan
>
>> Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case
>> here for it.
>>
>> thanks,
>> -Sherman
>>
>> On 5/27/15 8:29 AM, Ivan Gerasimov wrote:
>>> For completeness, here's another webrev, which uses StringBuilder:
>>> http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/
>>>
>>> Its performance is somewhere in between the current implementation and the array-based implementation:
>>> MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s
>>>
>>> Memory efficiency is less then the one of array-based implementation:
>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>> java version "1.9.0-internal"
>>> Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00)
>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
>>>
>>> 25) 100663296
>>>
>>> real    0m2.585s
>>> user    0m1.875s
>>> sys    0m0.887s
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 27.05.2015 2:38, Xueming Shen wrote:
>>>> Ivan,
>>>>
>>>> It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself.
>>>> Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have
>>>> better balance between performance and code complexity. The regex is probably a little heavy for
>>>> literal string replacement, but StringBuilder should not be that bad ...
>>>>
>>>> -Sherman
>>>>
>>>> On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
>>>>> I updated the webrev:
>>>>> http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
>>>>>
>>>>> In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here.
>>>>>
>>>>> And  I want to add that this proposed implementation is not only faster, but also more memory efficient.
>>>>> The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation.
>>>>>
>>>>> ----------------------------------------
>>>>> public class C {
>>>>>     public static void main(String[] args) throws Throwable {
>>>>>         String s = "string";
>>>>>         for (int i = 1; i < Integer.MAX_VALUE; ++i) {
>>>>>             try {
>>>>>                 s = s.replace("string", "stringstring");
>>>>>             } catch (OutOfMemoryError o) {
>>>>>                 System.out.println(i + ") " + s.length());
>>>>>                 break;
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }
>>>>> ----------------------------------------
>>>>>
>>>>> $ time ~/java9/jdk/bin/java -showversion -Xmx1g C
>>>>> java version "1.9.0-ea"
>>>>> Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)
>>>>>
>>>>> 25) 100663296
>>>>>
>>>>> real    0m4.525s
>>>>> user    0m4.402s
>>>>> sys    0m1.189s
>>>>>
>>>>> $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
>>>>> java version "1.9.0-internal"
>>>>> Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
>>>>>
>>>>> 26) 201326592
>>>>>
>>>>> real    0m2.139s
>>>>> user    0m1.960s
>>>>> sys    0m0.461s
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>> On 24.05.2015 23:17, Ivan Gerasimov wrote:
>>>>>> Hello everybody!
>>>>>>
>>>>>> I know many people here like it when the performance is getting better.
>>>>>> It was suggested to make the literal variant of String.replace() faster.
>>>>>> Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code.
>>>>>>
>>>>>> I've created two versions of the fix.
>>>>>> In the first one, we scan the string and store indices of the found substrings in an array.
>>>>>> Then, we allocate the precisely sized char array and fill it it.
>>>>>> The case with the empty target has to be handled separately.
>>>>>>
>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
>>>>>>
>>>>>> The second variant is much less verbose, however it's less efficient too.
>>>>>> Here the StringJoiner is used as an intermediate storage.
>>>>>>
>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
>>>>>>
>>>>>>
>>>>>> Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements).
>>>>>> 0) Baseline
>>>>>> MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484 ops/s
>>>>>>
>>>>>> 1) Heavy-duty +308%
>>>>>> MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s
>>>>>>
>>>>>> 2) StringJoiner +190%
>>>>>> MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036 ops/s
>>>>>>
>>>>>>
>>>>>> Personally, I like my first variant better, even though it adds almost 300 lines of code.
>>>>>> But I'd like to hear what people think of it.
>>>>>>
>>>>>> Sincerely yours,
>>>>>> Ivan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>




More information about the core-libs-dev mailing list