RFR 8170348: Appendable.appendN(char, int) method to append multiple copies of char

Mon Dec 5 19:17:38 UTC 2016

> On 4 Dec 2016, at 08:42, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
> 
> Thank you Claes for looking into it!
> 
> 
> On 04.12.2016 16:48, Claes Redestad wrote:
>> Hi Ivan,
>> 
>> as this adds a new public API I guess it's too late for 9 at this point, but here's a few
>> comments anyhow:
>> 
> Yes, of course.
> If people find it useful, I would expect it to go to jdk 10.

Add a fix version of 10, then it’s clear on the intent. Once tests are added :-) we can review for that release.

Personally i would use Arrays.fill, i gather the pattern is already recognised:

https://bugs.openjdk.java.net/browse/JDK-4809552
http://hg.openjdk.java.net/hsx/hsx19/baseline/rev/d64a8c7aa9d5

The advantage of using Arrays.fill is we know that the pattern will be recognized (if not it’s a bug, and i suppose it could be made intrinsic to wire up faster). (see -XX:+TraceOptimizeFill). I suspect we should review the filling optimization to see if it can be enhanced with newer SIMD instructions, as i gather the current implementation writes a max of 8 bytes at a time (with an explicit unrolled loop for 32 bytes at a time, containing 4 separate stores).

It’s good that you found places in the JDK, that adds justification for such methods, especially for BigInteger and that static array of strings full of ‘0’ characters.

The updates to MethodHandleImpl are not perf sensitive, i question the usage here but it does reduce stuff in the constant pool i suppose.

Paul.

> 
>> - you could use Arrays.fill(byte[], int, int, byte) for LATIN-1 case in AbstractStringBuilder.
>> Might not make it much faster (unless there are intrinsics at play, but perhaps a bit less
>> verbose.
>> 
> The do-while loop saves us one comparison, comparing to the loop in Arrays.fill().
> I'd prefer to keep the explicit loop, as it's only three lines of code long.
> 
>> - for a convenience API like this, I think it's slightly awkward that a negative n throws IAE
>> since users have to think carefully about whether they need to guard the call to appendN
>> with a range check or not. I'd find this utility more useful if it was more forgiving and
>> allowed simplifying the caller further.
>> 
> Yes, I understand your point.  There are different approaches to handling arguments.
> E.g. for indices it might be allowed to have from > to (treat it as from == to), and from < 0.
> Or, like in Perl, negative index might be treated as offset from the end of a list or array.
> However, in Java, the tradition seems to have formed to have strong argument checks, not allowing much interpretation.
> For example, similarly looking Collections.nCopies(int n, T) also throws IAE for negative n.
> 
>> Benchmark comments:
>> 
>> - since you're reusing a StringBuilder you're effectively removing the impact of resizing
>> the underlying buffer, which is typically a significant part of appending, so while this
>> zooms in on the cost of actually appending to a prepared builder, it might overstate the
>> effect.
>> 
> It was intentional, to be honest.
> If appending several chars causes reallocation, then appending chars in a loop can only be slower, comparing to appendN() or append(String).
> I didn't want to find the sharp constant of the speedup factor, but just wanted to be sure that the increase in performance is observable.
> 
>> Creating new StringBuilders (of varying sizes) in a setup method outside the
>> @Benchmark method might be more in line with typical use, in addition to what you have
>> now (which is zooming in on the cost of appending without allocation overhead).
>> setLength(0) could also be moved to an invocation level @Setup method)
>> 
>> - seeing that appending a String, which uses System.arraycopy, can be slower for small
>> strings is a bit surprising as I'd assume it'd be completely intrinsified. Is the compiled code
>> making a JNI transition or are things not being inlined properly?
>> 
> I'm not sure exactly why appending short String is slower then filling.
> Might it be because the former means both reading from and writing to the memory, and the later only means writing?
> Anyways, I only wanted to make sure that replacing the code in BigInteger and FDBigInteger won't make things slower.
> 
>> - please use -tu us -bm avgt or annotate benchmarks to output scores with a reasonable
>> number of digits.
>> 
> Sure.  Here you go:
> 
> Benchmark                 (size)  Mode  Cnt  Score   Error Units
> MyBenchmark.test_0_New         0  avgt    4  0.003 ± 0.004 us/op
> MyBenchmark.test_0_New         1  avgt    4  0.005 ± 0.008 us/op
> MyBenchmark.test_0_New         5  avgt    4  0.014 ± 0.015 us/op
> MyBenchmark.test_0_New        10  avgt    4  0.016 ± 0.019 us/op
> MyBenchmark.test_0_New        20  avgt    4  0.018 ± 0.010 us/op
> MyBenchmark.test_1_Old         0  avgt    4  0.003 ± 0.001 us/op
> MyBenchmark.test_1_Old         1  avgt    4  0.006 ± 0.004 us/op
> MyBenchmark.test_1_Old         5  avgt    4  0.023 ± 0.021 us/op
> MyBenchmark.test_1_Old        10  avgt    4  0.049 ± 0.071 us/op
> MyBenchmark.test_1_Old        20  avgt    4  0.089 ± 0.110 us/op
> MyBenchmark.test_2_Solid       0  avgt    4  0.007 ± 0.003 us/op
> MyBenchmark.test_2_Solid       1  avgt    4  0.018 ± 0.024 us/op
> MyBenchmark.test_2_Solid       5  avgt    4  0.016 ± 0.011 us/op
> MyBenchmark.test_2_Solid      10  avgt    4  0.017 ± 0.016 us/op
> MyBenchmark.test_2_Solid      20  avgt    4  0.016 ± 0.007 us/op
> 
> 
> With kind regards,
> Ivan
> 
> 
>> Thanks!
>> 
>> /Claes
>> 
>> On 12/04/2016 04:07 AM, Ivan Gerasimov wrote:
>>> Hello!
>>> 
>>> There are several places in JDK where the same character is appended to a StringBuilder object multiple times (usually padding).
>>> With each append there are a few routine checks performed.
>>> They could have been done only once, if we had a method for appending multiple copies at a time.
>>> A simple benchmark shows that such method may save us a few machine cycles (see the results below).
>>> 
>>> In the benchmark, three approaches were compared:
>>> 0) Using the new appendN(char, int) method to append several chars at once,
>>> 1) Calling append(char) in a loop,
>>> 2) Appending a prepared-in-advance string
>>> 
>>> On my machine, the new method demonstrates better or comparable performance for all sizes up to 20.
>>> 
>>> In the webrev, there are two changesets included:
>>> - the new default Appendable.appendN(char, int) method, its overrides in StringBuilder/Buffer and a basic test,
>>> - several applications of the new method across JDK.
>>> 
>>> Would you please help review?
>>> Comments, suggestions are welcome.
>>> 
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8170348
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8170348/00/webrev/
>>> Benchmark: http://cr.openjdk.java.net/~igerasim/8170348/00/MyBenchmark.java
>>> 
>>> 
>>> Benchmark                 (size)   Mode  Cnt Score Error Units
>>> MyBenchmark.test_0_New         0  thrpt   70  331922128.215 ± 16399254.452  ops/s
>>> MyBenchmark.test_0_New         1  thrpt   70  209207932.893 ± 14955800.231  ops/s
>>> MyBenchmark.test_0_New         5  thrpt   70   72926671.621 ± 4841791.555  ops/s
>>> MyBenchmark.test_0_New        10  thrpt   70   67779575.053 ± 3234366.239  ops/s
>>> MyBenchmark.test_0_New        20  thrpt   70   59731629.661 ± 2769497.288  ops/s
>>> MyBenchmark.test_1_Old         0  thrpt   70  333467628.860 ± 15981678.430  ops/s
>>> MyBenchmark.test_1_Old         1  thrpt   70  156126381.967 ± 9619653.294  ops/s
>>> MyBenchmark.test_1_Old         5  thrpt   70   46550204.382 ± 2009987.637  ops/s
>>> MyBenchmark.test_1_Old        10  thrpt   70   23309297.849 ± 1268874.282  ops/s
>>> MyBenchmark.test_1_Old        20  thrpt   70   13143637.821 ± 662265.103  ops/s
>>> MyBenchmark.test_2_Solid       0  thrpt   70  138548108.540 ± 6408775.462  ops/s
>>> MyBenchmark.test_2_Solid       1  thrpt   70   63890936.132 ± 3918274.970  ops/s
>>> MyBenchmark.test_2_Solid       5  thrpt   70   65838879.075 ± 2701493.698  ops/s
>>> MyBenchmark.test_2_Solid      10  thrpt   70   65387238.993 ± 3131562.548  ops/s
>>> MyBenchmark.test_2_Solid      20  thrpt   70   57528150.828 ± 3171453.716  ops/s
>>> 
>>> 
>>> With kind regards,
>>> Ivan
>>> 
>> 
>> 
>