[PATCH] Allocate uninitialized byte[] to improve performance of String-related operations

Сергей Цыпанов sergei.tsypanov at yandex.ru
Thu Sep 10 20:33:23 UTC 2020


Hi Paul,

thanks for looking into this!

> The update to AbstractStringBuilder is problematic because the uninitialized array could potentially be accessed through reflection.
>
> In other cases it seems you have slotted in the allocation at a low-level e.g. StringUTF16.newBytesFor. That makes it hard for me to evaluate the implications, and it's not clear to me it offers advantages in all cases, especially given the risks. The toCharArray/Utf allocation costs appear to be lost in the noise, at least from your benchmark results.

I've reverted suspicious changes you've mentioned along with those that gave no improvement.

> However, in the repeat of 1 case I would like to know why C2 does not elide the zeroing of the array, since it should know that the fill covers the whole array.

Should I report this case to hotspot-compiler-dev mailing list?

> Separately, renaming StringConcatHelper.new{Char}Array to StringConcatHelper.newUninitialized{Char}Array would be clearer, and perhaps moving to somewhere else.

After reverting StringLatin1/StringUTF16.toChars() we don't need this any more, so I've reverted it too.
Initially I decided to keep it within StringConcatHelper as the latter already contained Unsafe field.

I've reworked the patch, benchmark is in [1] and the results are below. The patch is attached.
I think these are useful changes bringing good improvement at low cost and no risk
as we are sure that we are dealing only with local variables.

Also pay attention, that StringLatin1.replace() has already been using StringConcatHelper.newArray()
and improvement is achieved with usage of System.arraycopy() instead of manual copying.
Is it ok to keep this along with other ca
 
Benchmark            (length)               before               after    Units

newString                   8     12.805 ±   0.024    11.945 ±   0.124    ns/op
newString                  64     48.165 ±   0.105    38.525 ±   0.041    ns/op
newString                 128     79.746 ±   0.120    69.061 ±   0.081    ns/op
newString                1024    631.384 ±   1.068   521.811 ±   0.592    ns/op

repeatOneByteString         8     10.278 ±   0.017     9.751 ±   0.027    ns/op
repeatOneByteString        64     18.034 ±   0.017    15.173 ±   0.060    ns/op
repeatOneByteString       128     30.749 ±   0.491    22.568 ±   0.033    ns/op
repeatOneByteString      1024    109.009 ±   2.476    82.269 ±   0.101    ns/op

repeatOneCharString         8     22.141 ±   0.703    20.376 ±   0.171    ns/op
repeatOneCharString        64     44.796 ±   0.764    36.344 ±   0.059    ns/op
repeatOneCharString       128     57.736 ±   0.134    43.992 ±   0.287    ns/op
repeatOneCharString      1024    229.131 ±   1.211   176.311 ±   0.497    ns/op

replace                     8     14.545 ±   0.019    13.426 ±   0.089    ns/op
replace                    64     56.453 ±   0.096    36.115 ±   0.107    ns/op
replace                   128    108.701 ±   1.201    65.387 ±   1.159    ns/op
replace                  1024    820.524 ±   0.852   427.047 ±   0.510    ns/op

replaceUtf                  8     17.480 ±   0.089    15.908 ±   0.175    ns/op
replaceUtf                 64     44.982 ±   0.500    34.091 ±   0.209    ns/op
replaceUtf                128     66.433 ±   0.427    50.828 ±   0.645    ns/op
replaceUtf               1024    407.634 ±  11.850   302.786 ±   1.152    ns/op

With best regards,
Sergey Tsypanov

1. Benchmark

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class MiscStringBenchmark {

  @Benchmark
  public String newString(Data data){
    return new String(data.codepoints, 0, data.codepoints.length);
  }

  @Benchmark
  public String repeatOneByteString(Data data) {
    return data.oneCharString.repeat(data.length);
  }

  @Benchmark
  public String repeatOneCharString(Data data) {
    return data.oneUtfCharString.repeat(data.length);
  }

  @Benchmark
  public String replace(Data data){
    return data.stringToReplace.replace('z', 'b');
  }

  @Benchmark
  public String replaceUtf(Data data){
    return data.utfStringToReplace.replace('я', 'ю');
  }

  @State(Scope.Thread)
  public static class Data {
    @Param({"8", "64", "128", "1024"})
    private int length;
    private final String oneCharString = "a";
    private final String oneUtfCharString = "ё";

    private String stringToReplace;
    private String utfStringToReplace;
    private int[] codepoints;

    @Setup
    public void setup() {
      String string = oneCharString.repeat(length);
      String utfString = oneUtfCharString.repeat(length);

      stringToReplace = string + 'z';
      codepoints = stringToReplace.codePoints().toArray();
      utfStringToReplace = utfString + 'я';
    }
  }
}


10.09.2020, 01:24, "Paul Sandoz" <paul.sandoz at oracle.com>:
> Hi Sergey,
>
> Unsafe.allocateUninitializedArray should be used with great care, so as we are confident that no array, covering uninitialized memory, leaks out. Or later we unintentionally introduce a problem through refactoring.
>
> The update to AbstractStringBuilder is problematic because the uninitialized array could potentially be accessed through reflection.
>
> In other cases it seems you have slotted in the allocation at a low-level e.g. StringUTF16.newBytesFor. That makes it hard for me to evaluate the implications, and it's not clear to me it offers advantages in all cases, especially given the risks. The toCharArray/Utf allocation costs appear to be lost in the noise, at least from your benchmark results.
>
> The method repeat seems a good candidate to focus on given the allocation and looping is obviously localized.
> However, in the repeat of 1 case I would like to know why C2 does not elide the zeroing of the array, since it should know that the fill covers the whole array.
> For the repeat > 2 we can excuse C2 for not seeing through to the ranges passed to System.arraycopy.
>
> Separately, renaming StringConcatHelper.new{Char}Array to StringConcatHelper.newUninitialized{Char}Array would be clearer, and perhaps moving to somewhere else.
>
> Paul.
>
>>  On Sep 4, 2020, at 5:52 AM, Сергей Цыпанов <sergei.tsypanov at yandex.ru> wrote:
>>
>>  Hello,
>>
>>  currently jdk.internal.misc.Unsafe declares method allocateUninitializedArray(Class, int) returning uninitialized array of given type and length.
>>
>>  Allocation of uninitialized arrays is faster especially for larger ones, so we could use them for cases
>>  when we are sure that created array will be completely overwritten or is guarded by count field (e.g. in AbstractStringBuilder).
>>
>>  I've exposed jdk.internal.misc.Unsafe.allocateUninitializedArray(Class, int)
>>  via delegating method of sun.misc.Unsafe to measure creation of byte[] with benchmark [1]
>>  and got those results:
>>
>>                     (length) Mode Cnt Score Error Units
>>  constructor 0 avgt 50 7.639 ± 0.629 ns/op
>>  constructor 10 avgt 50 9.448 ± 0.725 ns/op
>>  constructor 100 avgt 50 21.158 ± 1.865 ns/op
>>  constructor 1000 avgt 50 146.158 ± 9.836 ns/op
>>  constructor 10000 avgt 50 916.321 ± 50.618 ns/op
>>
>>  unsafe 0 avgt 50 8.057 ± 0.599 ns/op
>>  unsafe 10 avgt 50 8.308 ± 0.907 ns/op
>>  unsafe 100 avgt 50 12.232 ± 1.813 ns/op
>>  unsafe 1000 avgt 50 37.679 ± 1.382 ns/op
>>  unsafe 10000 avgt 50 78.896 ± 6.758 ns/op
>>
>>  As a result I propose to add the following methods into StringConcatHelper
>>
>>  @ForceInline
>>  static byte[] newArray(int length) {
>>     return (byte[]) UNSAFE.allocateUninitializedArray(byte.class, length);
>>  }
>>
>>  @ForceInline
>>  static char[] newCharArray(int length) {
>>     return (char[]) UNSAFE.allocateUninitializedArray(char.class, length);
>>  }
>>
>>  along with existing StringConcatHelper.newArray(long indexCoder) and utilize them in String-related operations
>>  instead of conventional array creation with new-keyword.
>>
>>  I've used benchmark [2] to measure impact on affected String-methods and found quite a good improvement:
>>
>>                                                  before after
>>
>>  Benchmark (length) Score Error Score Error Units
>>
>>  newStringBuilderWithLength 8 8.288 ± 0.411 5.656 ± 0.019 ns/op
>>  newStringBuilderWithLength 64 12.954 ± 0.687 7.588 ± 0.009 ns/op
>>  newStringBuilderWithLength 128 20.603 ± 0.412 10.446 ± 0.005 ns/op
>>  newStringBuilderWithLength 1024 119.935 ± 2.383 35.452 ± 0.029 ns/op
>>
>>  newStringBuilderWithString 8 19.721 ± 0.375 14.642 ± 0.052 ns/op
>>  newStringBuilderWithString 64 34.006 ± 1.523 15.479 ± 0.031 ns/op
>>  newStringBuilderWithString 128 36.697 ± 0.972 17.052 ± 0.133 ns/op
>>  newStringBuilderWithString 1024 140.486 ± 6.396 85.156 ± 0.175 ns/op
>>
>>  repeatOneByteString 8 11.340 ± 0.197 9.736 ± 0.051 ns/op
>>  repeatOneByteString 64 20.859 ± 0.257 15.073 ± 0.024 ns/op
>>  repeatOneByteString 128 36.311 ± 1.162 22.808 ± 0.198 ns/op
>>  repeatOneByteString 1024 149.243 ± 3.083 82.839 ± 0.193 ns/op
>>
>>  repeatOneChar 8 28.033 ± 0.615 20.377 ± 0.137 ns/op
>>  repeatOneChar 64 56.399 ± 1.094 36.230 ± 0.051 ns/op
>>  repeatOneChar 128 68.423 ± 5.647 44.157 ± 0.239 ns/op
>>  repeatOneChar 1024 230.115 ± 0.312 179.126 ± 0.437 ns/op
>>
>>  replace 8 14.684 ± 0.088 14.434 ± 0.057 ns/op
>>  replace 64 56.811 ± 0.612 56.420 ± 0.050 ns/op
>>  replace 128 112.694 ± 0.404 109.799 ± 1.202 ns/op
>>  replace 1024 837.939 ± 0.855 818.802 ± 0.408 ns/op
>>
>>  replaceUtf 8 17.802 ± 0.074 15.744 ± 0.094 ns/op
>>  replaceUtf 64 45.754 ± 0.139 39.228 ± 0.864 ns/op
>>  replaceUtf 128 67.170 ± 0.353 50.497 ± 1.218 ns/op
>>  replaceUtf 1024 415.767 ± 6.829 297.831 ± 22.510 ns/op
>>
>>  toCharArray 8 6.164 ± 0.033 6.128 ± 0.064 ns/op
>>  toCharArray 64 10.960 ± 0.032 13.566 ± 0.802 ns/op
>>  toCharArray 128 20.373 ± 0.013 20.823 ± 0.376 ns/op
>>  toCharArray 1024 165.923 ± 0.098 164.362 ± 0.065 ns/op
>>
>>  toCharArrayUTF 8 8.009 ± 0.067 7.778 ± 0.026 ns/op
>>  toCharArrayUTF 64 11.112 ± 0.014 10.880 ± 0.010 ns/op
>>  toCharArrayUTF 128 20.390 ± 0.014 20.103 ± 0.017 ns/op
>>  toCharArrayUTF 1024 166.233 ± 0.076 163.827 ± 0.099 ns/op
>>
>>  So the question is could we include the changes of attached patch into JDK?
>>
>>  With best regards,
>>  Sergey Tsypanov
>>
>>  1. Benchmark for array-allocation
>>
>>  @BenchmarkMode(Mode.AverageTime)
>>  @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>  @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>>  public class ArrayAllocationBenchmark {
>>   private static Unsafe U;
>>
>>   static {
>>     try {
>>       Field f = Unsafe.class.getDeclaredField("theUnsafe");
>>       f.setAccessible(true);
>>       U = (Unsafe) f.get(null);
>>     } catch (Exception e) {
>>       new RuntimeException(e);
>>     }
>>   }
>>
>>   @Benchmark
>>   public byte[] constructor(Data data) {
>>     return new byte[data.length];
>>   }
>>
>>   @Benchmark
>>   public byte[] unsafe(Data data) {
>>     return (byte[]) U.allocateUninitializedArray(byte.class, data.length);
>>   }
>>
>>   @State(Scope.Thread)
>>   public static class Data {
>>     @Param({"0", "10", "100", "1000", "10000"})
>>     private int length;
>>   }
>>  }
>>
>>  2. Benchmark for String-method measurements
>>
>>  @BenchmarkMode(Mode.AverageTime)
>>  @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>  @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>>  public class MiscStringBenchmark {
>>
>>   @Benchmark
>>   public char[] toCharArrayLatin1(Data data) {
>>     return data.string.toCharArray();
>>   }
>>
>>   @Benchmark
>>   public char[] toCharArrayUTF(Data data) {
>>     return data.utfString.toCharArray();
>>   }
>>
>>   @Benchmark
>>   public String repeatOneByteString(Data data) {
>>     return data.oneCharString.repeat(data.length);
>>   }
>>
>>   @Benchmark
>>   public String repeatOneChar(Data data) {
>>     return data.oneUtfCharString.repeat(data.length);
>>   }
>>
>>   @Benchmark
>>   public String replace(Data data){
>>     return data.stringToReplace.replace('z', 'b');
>>   }
>>
>>   @Benchmark
>>   public String replaceUtf(Data data){
>>     return data.utfStringToReplace.replace('я', 'ю');
>>   }
>>
>>   @Benchmark
>>   public StringBuilder newStringBuilderWithLength(Data data) {
>>     return new StringBuilder(data.length);
>>   }
>>
>>   @Benchmark
>>   public StringBuilder newStringBuilderWithString(Data data) {
>>     return new StringBuilder(data.string);
>>   }
>>
>>   @State(Scope.Thread)
>>   public static class Data {
>>
>>     @Param({"8", "64", "128", "1024"})
>>     private int length;
>>
>>     private String string;
>>     public String utfString;
>>     private final String oneCharString = "a";
>>     private final String oneUtfCharString = "ё";
>>     private String stringToReplace;
>>     private String utfStringToReplace;
>>
>>     @Setup
>>     public void setup() {
>>       string = oneCharString.repeat(length);
>>       utfString = oneUtfCharString.repeat(length);
>>
>>       stringToReplace = string + 'z';
>>       utfStringToReplace = utfString + 'я';
>>     }
>>   }
>>  }
>>  <uninit_array.patch>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: uninit_array.patch
Type: text/x-diff
Size: 4731 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20200910/2b137530/uninit_array-0001.patch>


More information about the core-libs-dev mailing list