RFR 8170348: Appendable.appendN(char, int) method to append multiple copies of char

Roger Riggs Roger.Riggs at Oracle.com
Wed Dec 7 22:28:04 UTC 2016


Hi Ivan,

A few comments...

Appendable.appendN default method:
    I'd expect that n is often small, so allocating a new StringBuilder
    every time seems suboptimal.  A simple loop with append(c) would have
    fewer (memory) side effects.
    And in particular, for n = 0, it could avoid doing anything at all.
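    For illustration, a minimal sketch of such a default (assuming the
    proposed appendN(char, int) signature; this is not the webrev's code):

        // In java.lang.Appendable (sketch only):
        default Appendable appendN(char c, int n) throws IOException {
            // plain loop: no temporary StringBuilder, and a no-op for n == 0
            for (int i = 0; i < n; i++) {
                append(c);
            }
            return this;
        }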

AbstractStringBuilder:
    I agree with Claes' comment that throwing IAE for negative lengths
    is a pain; treating a negative n as appending zero characters would
    be natural in many use cases.
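    The caller-side difference, roughly (pad is only an illustrative
    variable, assuming the proposed appendN API):

        // with IAE on a negative count, callers need a guard:
        if (pad > 0) {
            sb.appendN(' ', pad);
        }

        // with a lenient definition (negative treated as zero), the call stands alone:
        sb.appendN(' ', pad);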

MethodHandleImpl.toString():
    While you are there, you could beneficially replace StringBuffer with
    StringBuilder (and maybe pre-size the builder).
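    The kind of change meant here, as a sketch (the initial capacity of 64
    is only a guess, not taken from MethodHandleImpl):

        // before: StringBuffer buf = new StringBuffer();
        StringBuilder buf = new StringBuilder(64);  // unsynchronized, pre-sized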

$.02, Roger



On 12/5/2016 2:17 PM, Paul Sandoz wrote:
>> On 4 Dec 2016, at 08:42, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
>>
>> Thank you Claes for looking into it!
>>
>>
>> On 04.12.2016 16:48, Claes Redestad wrote:
>>> Hi Ivan,
>>>
>>> as this adds a new public API I guess it's too late for 9 at this point, but here's a few
>>> comments anyhow:
>>>
>> Yes, of course.
>> If people find it useful, I would expect it to go to jdk 10.
> Add a fix version of 10; then the intent is clear. Once tests are added :-) we can review for that release.
>
> Personally, I would use Arrays.fill; I gather the pattern is already recognised:
>
> https://bugs.openjdk.java.net/browse/JDK-4809552
> http://hg.openjdk.java.net/hsx/hsx19/baseline/rev/d64a8c7aa9d5
>
> The advantage of using Arrays.fill is that we know the pattern will be recognized (if not, it’s a bug, and I suppose it could be made intrinsic to wire it up faster); see -XX:+TraceOptimizeFill. I suspect we should review the filling optimization to see if it can be enhanced with newer SIMD instructions, as I gather the current implementation writes a max of 8 bytes at a time (with an explicit unrolled loop for 32 bytes at a time, containing 4 separate stores).
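A rough sketch of the shape Paul has in mind (a toy builder with invented
field names, not the webrev's code; the real JDK 9 AbstractStringBuilder
stores a byte[] plus a coder, so its LATIN-1 path would use
Arrays.fill(byte[], int, int, byte) as Claes suggests further down):

    import java.util.Arrays;

    final class ToyBuilder {
        private char[] value = new char[16];
        private int count;

        ToyBuilder appendN(char c, int n) {
            if (n < 0)
                throw new IllegalArgumentException("n is negative: " + n);
            if (count + n > value.length)
                value = Arrays.copyOf(value, Math.max(value.length * 2, count + n));
            // a fill the JIT's fill-loop optimization can recognize (-XX:+TraceOptimizeFill)
            Arrays.fill(value, count, count + n, c);
            count += n;
            return this;
        }

        @Override public String toString() {
            return new String(value, 0, count);
        }
    }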
>
>
> It’s good that you found places in the JDK; that adds justification for such methods, especially for BigInteger and that static array of strings full of ‘0’ characters.
>
> The updates to MethodHandleImpl are not performance-sensitive; I question the usage here, but it does reduce entries in the constant pool, I suppose.
>
> Paul.
>
>
>>> - you could use Arrays.fill(byte[], int, int, byte) for the LATIN-1 case in AbstractStringBuilder.
>>> It might not make it much faster (unless there are intrinsics at play), but it is perhaps a bit
>>> less verbose.
>>>
>> The do-while loop saves us one comparison compared to the loop in Arrays.fill().
>> I'd prefer to keep the explicit loop, as it's only three lines of code long.
>>
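For context, the two shapes being compared, as a sketch (val, count and n
are illustrative names, not copied from the webrev):

    // Claes' suggestion: delegate to Arrays.fill on the LATIN-1 byte[]
    static void fillVariant(byte[] val, int count, int n, char c) {
        java.util.Arrays.fill(val, count, count + n, (byte) c);
    }

    // the explicit do-while: it assumes n > 0 has already been established,
    // which is where the saved comparison comes from
    static void loopVariant(byte[] val, int count, int n, char c) {
        int i = count;
        do {
            val[i++] = (byte) c;
        } while (i < count + n);
    }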
>>> - for a convenience API like this, I think it's slightly awkward that a negative n throws IAE
>>> since users have to think carefully about whether they need to guard the call to appendN
>>> with a range check or not. I'd find this utility more useful if it was more forgiving and
>>> allowed simplifying the caller further.
>>>
>> Yes, I understand your point.  There are different approaches to handling arguments.
>> E.g. for indices it might be allowed to have from > to (treated as from == to), or from < 0.
>> Or, as in Perl, a negative index might be treated as an offset from the end of a list or array.
>> However, in Java the tradition seems to be strict argument checking, leaving little room for interpretation.
>> For example, the similar-looking Collections.nCopies(int n, T) also throws IAE for a negative n.
>>
>>> Benchmark comments:
>>>
>>> - since you're reusing a StringBuilder you're effectively removing the impact of resizing
>>> the underlying buffer, which is typically a significant part of appending, so while this
>>> zooms in on the cost of actually appending to a prepared builder, it might overstate the
>>> effect.
>>>
>> It was intentional, to be honest.
>> If appending several chars causes reallocation, then appending chars in a loop can only be slower compared to appendN() or append(String).
>> I didn't want to pin down the exact speedup factor, but just wanted to be sure that the increase in performance is observable.
>>
>>> Creating new StringBuilders (of varying sizes) in a setup method outside the
>>> @Benchmark method might be more in line with typical use, in addition to what you have
>>> now (which is zooming in on the cost of appending without allocation overhead).
>>> (setLength(0) could also be moved to an invocation-level @Setup method; a rough sketch follows below.)
>>>
>>> - seeing that appending a String, which uses System.arraycopy, can be slower for small
>>> strings is a bit surprising as I'd assume it'd be completely intrinsified. Is the compiled code
>>> making a JNI transition or are things not being inlined properly?
>>>
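A hedged JMH sketch of that setup (class and method names invented for
illustration; not the benchmark from the webrev):

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class AppendSetupBench {
        @Param({"0", "1", "5", "10", "20"})
        int size;

        StringBuilder sb;

        // a fresh builder per invocation keeps allocation/resizing in the measurement;
        // note that JMH warns Level.Invocation adds overhead for nanosecond-scale methods
        @Setup(Level.Invocation)
        public void newBuilder() {
            sb = new StringBuilder();
        }

        @Benchmark
        public StringBuilder appendLoop() {
            for (int i = 0; i < size; i++) {
                sb.append(' ');
            }
            return sb;
        }
    }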
>> I'm not sure exactly why appending a short String is slower than filling.
>> Might it be because the former means both reading from and writing to memory, while the latter only means writing?
>> Anyway, I only wanted to make sure that replacing the code in BigInteger and FDBigInteger won't make things slower.
>>
>>> - please use -tu us -bm avgt or annotate benchmarks to output scores with a reasonable
>>> number of digits.
>>>
>> Sure.  Here you go:
>>
>> Benchmark                 (size)  Mode  Cnt  Score   Error Units
>> MyBenchmark.test_0_New         0  avgt    4  0.003 ± 0.004 us/op
>> MyBenchmark.test_0_New         1  avgt    4  0.005 ± 0.008 us/op
>> MyBenchmark.test_0_New         5  avgt    4  0.014 ± 0.015 us/op
>> MyBenchmark.test_0_New        10  avgt    4  0.016 ± 0.019 us/op
>> MyBenchmark.test_0_New        20  avgt    4  0.018 ± 0.010 us/op
>> MyBenchmark.test_1_Old         0  avgt    4  0.003 ± 0.001 us/op
>> MyBenchmark.test_1_Old         1  avgt    4  0.006 ± 0.004 us/op
>> MyBenchmark.test_1_Old         5  avgt    4  0.023 ± 0.021 us/op
>> MyBenchmark.test_1_Old        10  avgt    4  0.049 ± 0.071 us/op
>> MyBenchmark.test_1_Old        20  avgt    4  0.089 ± 0.110 us/op
>> MyBenchmark.test_2_Solid       0  avgt    4  0.007 ± 0.003 us/op
>> MyBenchmark.test_2_Solid       1  avgt    4  0.018 ± 0.024 us/op
>> MyBenchmark.test_2_Solid       5  avgt    4  0.016 ± 0.011 us/op
>> MyBenchmark.test_2_Solid      10  avgt    4  0.017 ± 0.016 us/op
>> MyBenchmark.test_2_Solid      20  avgt    4  0.016 ± 0.007 us/op
>>
>>
>> With kind regards,
>> Ivan
>>
>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 12/04/2016 04:07 AM, Ivan Gerasimov wrote:
>>>> Hello!
>>>>
>>>> There are several places in the JDK where the same character is appended to a StringBuilder object multiple times (usually padding).
>>>> With each append, a few routine checks are performed.
>>>> They could be done only once if we had a method for appending multiple copies at a time.
>>>> A simple benchmark shows that such a method may save us a few machine cycles (see the results below).
>>>>
>>>> In the benchmark, three approaches were compared:
>>>> 0) Using the new appendN(char, int) method to append several chars at once,
>>>> 1) Calling append(char) in a loop,
>>>> 2) Appending a prepared-in-advance string
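Roughly, the three variants look like this (a sketch with invented names;
see the linked MyBenchmark.java for the real code; ZEROS stands for a
hypothetical precomputed table of "0", "00", ... strings):

    sb.appendN('0', n);              // 0) proposed method: one call appends n copies

    for (int i = 0; i < n; i++) {    // 1) current pattern: one char per call
        sb.append('0');
    }

    sb.append(ZEROS[n]);             // 2) append a string prepared in advance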
>>>>
>>>> On my machine, the new method demonstrates better or comparable performance for all sizes up to 20.
>>>>
>>>> In the webrev, there are two changesets included:
>>>> - the new default Appendable.appendN(char, int) method, its overrides in StringBuilder/Buffer and a basic test,
>>>> - several applications of the new method across JDK.
>>>>
>>>> Would you please help review?
>>>> Comments, suggestions are welcome.
>>>>
>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8170348
>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8170348/00/webrev/
>>>> Benchmark: http://cr.openjdk.java.net/~igerasim/8170348/00/MyBenchmark.java
>>>>
>>>>
>>>> Benchmark                 (size)   Mode  Cnt Score Error Units
>>>> MyBenchmark.test_0_New         0  thrpt   70  331922128.215 ± 16399254.452  ops/s
>>>> MyBenchmark.test_0_New         1  thrpt   70  209207932.893 ± 14955800.231  ops/s
>>>> MyBenchmark.test_0_New         5  thrpt   70   72926671.621 ± 4841791.555  ops/s
>>>> MyBenchmark.test_0_New        10  thrpt   70   67779575.053 ± 3234366.239  ops/s
>>>> MyBenchmark.test_0_New        20  thrpt   70   59731629.661 ± 2769497.288  ops/s
>>>> MyBenchmark.test_1_Old         0  thrpt   70  333467628.860 ± 15981678.430  ops/s
>>>> MyBenchmark.test_1_Old         1  thrpt   70  156126381.967 ± 9619653.294  ops/s
>>>> MyBenchmark.test_1_Old         5  thrpt   70   46550204.382 ± 2009987.637  ops/s
>>>> MyBenchmark.test_1_Old        10  thrpt   70   23309297.849 ± 1268874.282  ops/s
>>>> MyBenchmark.test_1_Old        20  thrpt   70   13143637.821 ± 662265.103  ops/s
>>>> MyBenchmark.test_2_Solid       0  thrpt   70  138548108.540 ± 6408775.462  ops/s
>>>> MyBenchmark.test_2_Solid       1  thrpt   70   63890936.132 ± 3918274.970  ops/s
>>>> MyBenchmark.test_2_Solid       5  thrpt   70   65838879.075 ± 2701493.698  ops/s
>>>> MyBenchmark.test_2_Solid      10  thrpt   70   65387238.993 ± 3131562.548  ops/s
>>>> MyBenchmark.test_2_Solid      20  thrpt   70   57528150.828 ± 3171453.716  ops/s
>>>>
>>>>
>>>> With kind regards,
>>>> Ivan
>>>>
>>>


