RFR(S): 8067471: Use private static final char[0] for empty Strings

Fri Dec 19 19:13:36 UTC 2014

No, as mentioned, I didn't check for mixed inputs, which is probably 
necessary to ensure we don't slow down the typical case due
to adding an extra branch, so I went about and made an attempt to write 
a microbenchmark to capture this[1] by making the count
parameter unpredictable without inducing much overhead (no randomness in 
the benchmark code itself), which should make
it sufficiently hard for the compiler to prune the branch checks. I 
don't think generating different char[]s would change things,
but feel free to improve upon this and generate random char[]s to verify 
that.

Doing this, I added Ivan Gerasimov's idea to fold the rare count == 0 
check into the count < 0 clause, which seemed like a
reasonable way to avoid adding the extra branch to the normal path[2], 
although it makes the code a little weirder. I'll denote
this version as experiment2, and my original code as experiment.

Results grouped on maxCount parameter, then ordered internally from 
baseline, experiment, experiment2. Probability of getting
count == 0 is roughly 1/maxCount, and the cost per operation (average) 
increases as maxCount goes up as we're copying more
data on average, the interesting metric is whether or not this 
optimization breaks the normal flow when we're only rarely
hitting the non-allocating branch, which would show up as the experiment 
underperforming versus the baseline as we increase
maxCount:

Benchmark                                  (maxCount) (numCount)   Mode  
Samples   Score   Error   Units
o.s.SimpleBench.mixedNewStringFromCount             2 1000000  
thrpt       50  46.114 ± 0.391  ops/us
o.s.SimpleBench.mixedNewStringFromCount             2 1000000  
thrpt       50  64.320 ± 0.569  ops/us
o.s.SimpleBench.mixedNewStringFromCount             2 1000000  
thrpt       50  64.874 ± 1.095  ops/us

o.s.SimpleBench.mixedNewStringFromCount            10 1000000  
thrpt       50  48.609 ± 0.921  ops/us
o.s.SimpleBench.mixedNewStringFromCount            10     1000000 
thrpt       50  50.562 ± 0.598  ops/us
o.s.SimpleBench.mixedNewStringFromCount            10     1000000 
thrpt       50  50.724 ± 0.535  ops/us

o.s.SimpleBench.mixedNewStringFromCount           100     1000000 
thrpt       50  31.422 ± 0.553  ops/us
o.s.SimpleBench.mixedNewStringFromCount           100     1000000 
thrpt       50  31.416 ± 0.512  ops/us
o.s.SimpleBench.mixedNewStringFromCount           100     1000000 
thrpt       50  31.852 ± 0.441  ops/us

o.s.SimpleBench.mixedNewStringFromCount          1000     1000000 
thrpt       50  5.528  ± 0.053  ops/us
o.s.SimpleBench.mixedNewStringFromCount          1000     1000000 
thrpt       50  5.531  ± 0.039  ops/us
o.s.SimpleBench.mixedNewStringFromCount          1000     1000000 
thrpt       50  5.550  ± 0.033  ops/us

To me it looks like the micro-optimization can be applied either way 
without much risk that it'll slow down the normal flow.
There's some indication that Ivan's suggestion is a slight improvement 
across the board.

/Claes

[1]
package org.sample;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.ThreadLocalRandom;

@State(Scope.Thread)
public class SimpleBench {

     @Param("1000000")
     public int numCount;

     @Param("10")
     public int maxCount;

     public int[] counts;

     public char[] data;

     public int index = 0;

     @Setup
     public void buildCounts() {
         counts = new int[numCount];
         data = new char[maxCount];
         ThreadLocalRandom r = ThreadLocalRandom.current();
         for (int i = 0; i < numCount; i++) {
             counts[i] = r.nextInt(0, maxCount);
         }
         for (int i = 0; i < maxCount; i++) {
             data[i] = 'a';
         }
     }

     @Benchmark
     public String mixedNewStringFromCount() {
         if (index >= numCount) {
             index = 0;
         }
         return new String(data, 0, counts[index++]);
     }
}

[2] http://cr.openjdk.java.net/~redestad/8067471/webrev.03/

On 2014-12-21 17:08, Lev Priima wrote:
> Claes,
>
> Is the source of arrays in these benchmarks unpredictable enough to 
> avoid branch checks elimination in compiled code?
>
> On 12/19/2014 05:25 PM, Claes Redestad wrote:
>> Hi again,
>>
>> I just had a nagging thought. For applications reporting some or a 
>> lot of different empty String objects,
>> this might be due not to use of this constructor, but due to things 
>> like an empty StringBuilder.toString() ending
>> up calling java.lang.String(char[] value, int offset, int count) with 
>> a count of 0.
>>
>> My feeling is that this would be a more probable/reasonable source of 
>> distinct empty strings in an application,
>> and would not be helped much by the current patch.
>>
>> For completeness the java.lang.String(char/int[], int, int) 
>> constructors could have a check for count == 0 in
>> which case it assigns value to "".value, see 
>> http://cr.openjdk.java.net/~redestad/8067471/webrev.01/
>>
>> Running a few simple microbenchmarks on this (sorry, Aleksey!)[1] 
>> yields a bit of speedup, at no discernible
>> cost for a quick sanity check of an actually copying path:
>>
>> baseline:
>> Benchmark                            Mode  Samples    Score Error   
>> Units
>> o.s.SimpleBench.emptySBBench        thrpt        5   48.213 ± 3.509  
>> ops/us
>> o.s.SimpleBench.emptyString         thrpt        5  103.900 ± 4.869  
>> ops/us
>> o.s.SimpleBench.emptyStringCount    thrpt        5  105.802 ± 3.355  
>> ops/us
>> o.s.SimpleBench.emptyStringCountCP  thrpt        5  104.464 ± 8.251  
>> ops/us
>> o.s.SimpleBench.singleStringCount   thrpt        5   76.379 ± 4.139  
>> ops/us
>> o.s.SimpleBench.singleStringCountCP thrpt        5   88.202 ± 4.945  
>> ops/us
>>
>> experiment:
>> Benchmark                            Mode  Samples    Score Error   
>> Units
>> o.s.SimpleBench.emptySBBench        thrpt        5  153.822 ± 7.900  
>> ops/us 3x
>> o.s.SimpleBench.emptyString         thrpt        5  183.588 ± 4.429  
>> ops/us 1.75x
>> o.s.SimpleBench.emptyStringCount    thrpt        5  166.457 ± 9.202  
>> ops/us 1.6x
>> o.s.SimpleBench.emptyStringCountCP  thrpt        5  168.089 ± 4.676  
>> ops/us 1.6x
>> o.s.SimpleBench.singleStringCount   thrpt        5   75.780 ± 8.490  
>> ops/us 1x
>> o.s.SimpleBench.singleStringCountCP thrpt        5   89.202 ± 3.714  
>> ops/us 1x
>>
>> I guess we'd need to check that compiled code don't get messed up for 
>> a mix of inputs where count can be both
>> zero and non-zero with some probability.
> Do we have such benchmarks examples?
>>
>> /Claes
>>
>> [1] public char[] emptyCharArray = new char[0];
>>     public int[] emptyIntArray = new int[0];
>>
>>     public char[] singleCharArray = new char[] { 'x' };
>>     public int[] singleIntArray = new int[] { 102 };
>>     public StringBuilder emptySB = new StringBuilder();
>>
>>     @Benchmark
>>     public String emptySBBench() {
>>         return emptySB.toString();
>>     }
>>
>>     @Benchmark
>>     public String emptyString() {
>>         return new String();
>>     }
>>
>>     @Benchmark
>>     public String emptyStringCount() {
>>         return new String(emptyCharArray, 0, 0);
>>     }
>>
>>     @Benchmark
>>     public String emptyStringCountCP() {
>>         return new String(emptyIntArray, 0, 0);
>>     }
>>
>>     @Benchmark
>>     public String singleStringCount() {
>>         return new String(singleCharArray, 0, 1);
>>     }
>>
>>     @Benchmark
>>     public String singleStringCountCP() {
>>         return new String(singleIntArray, 0, 1);
>>     }
>>
>>
>> On 2014-12-19 01:41, Lev Priima wrote:
>>> Claes,
>>> Thanks for cool suggestion: 
>>> http://cr.openjdk.java.net/~lpriima/8067471/webrev.01/:
>>>   public java.lang.String();
>>>     Signature: ()V
>>>     flags: ACC_PUBLIC
>>>     LineNumberTable:
>>>       line 137: 0
>>>       line 138: 4
>>>       line 139: 13
>>>     Code:
>>>       stack=2, locals=1, args_size=1
>>>          0: aload_0
>>>          1: invokespecial #1                  // Method 
>>> java/lang/Object."<init>":()V
>>>          4: aload_0
>>>          5: ldc           #2                  // String
>>>          7: getfield      #3                  // Field value:[C
>>>         10: putfield      #3                  // Field value:[C
>>>         13: return
>>>       LineNumberTable:
>>>         line 137: 0
>>>         line 138: 4
>>>
>>>
>>> Aleksey ,
>>> By naive glance, new constructor thrpt is  bigger(jdk9/dev vs 9b42 
>>> in attach) on:
>>>     @Benchmark
>>>     public String getNewString() {
>>>         return new String();
>>>     }
>>> Please suggest things to compare . Do we have bigapps-like 
>>> benchmarks which may show regression on this?
>>>
>>> On 17.12.2014 21:10, Aleksey Shipilev wrote:
>>>> On 17.12.2014 18:58, Claes Redestad wrote:
>>>>> On 2014-12-17 11:22, Lev Priima wrote:
>>>>>> Please review space optimization in no args String constructor.
>>>>>> Originally, it was already rejected once by suggestion in
>>>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010300.html 
>>>>>>
>>>>>> w/o formal justification of pros and contras.  And enhancement was
>>>>>> requested again by Nathan Reynolds.
>>>>>>
>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8067471
>>>>>> Patch: http://cr.openjdk.java.net/~lpriima/8067471/webrev.00/
>>>> I am OK with this change pending we understand the performance 
>>>> impact a
>>>> little better. Please do benchmarks.
>>>>
>>>>
>>>>> Could this have some obscure security implications, such as the 
>>>>> ability
>>>>> to lock up some subsystem via retrieving and synchronizing on the 
>>>>> shared
>>>>> new String().value? I guess not, for practical purposes.
>>>> This would be an issue if we published String.value directly, but 
>>>> we don't.
>>>>
>>>>> I guess it could also be written:
>>>>>
>>>>>      public String() {
>>>>>          this.value = "".value;
>>>>>      }
>>>>>
>>>>> ... which would saves some byte code and should avoid having to 
>>>>> allocate
>>>>> a distinct object for this.
>>>> Oh, I like this trick, given "" is probably already in constant 
>>>> pool for
>>>> some other reason. However, there is an additional dereference. We 
>>>> need
>>>> a targeted nanobenchmark to properly quantify the costs for either 
>>>> variant.
>>>>
>>>> -Aleksey.
>>>>
>>>>
>>>
>>
>