[PATCH] Allocate uninitialized byte[] to improve performance of String-related operations
Сергей Цыпанов
sergei.tsypanov at yandex.ru
Thu Sep 10 20:33:23 UTC 2020
Hi Paul,
thanks for looking into this!
> The update to AbstractStringBuilder is problematic because the uninitialized array could potentially be accessed through reflection.
>
> In other cases it seems you have slotted in the allocation at a low-level e.g. StringUTF16.newBytesFor. That makes it hard for me to evaluate the implications, and it's not clear to me it offers advantages in all cases, especially given the risks. The toCharArray/Utf allocation costs appear to be lost in the noise, at least from your benchmark results.
I've reverted suspicious changes you've mentioned along with those that gave no improvement.
> However, in the repeat of 1 case I would like to know why C2 does not elide the zeroing of the array, since it should know that the fill covers the whole array.
Should I report this case to hotspot-compiler-dev mailing list?
> Separately, renaming StringConcatHelper.new{Char}Array to StringConcatHelper.newUninitialized{Char}Array would be clearer, and perhaps moving to somewhere else.
After reverting StringLatin1/StringUTF16.toChars() we don't need this any more, so I've reverted it too.
Initially I decided to keep it within StringConcatHelper as the latter already contained Unsafe field.
I've reworked the patch, benchmark is in [1] and the results are below. The patch is attached.
I think these are useful changes bringing good improvement at low cost and no risk
as we are sure that we are dealing only with local variables.
Also pay attention, that StringLatin1.replace() has already been using StringConcatHelper.newArray()
and improvement is achieved with usage of System.arraycopy() instead of manual copying.
Is it ok to keep this along with other ca
Benchmark (length) before after Units
newString 8 12.805 ± 0.024 11.945 ± 0.124 ns/op
newString 64 48.165 ± 0.105 38.525 ± 0.041 ns/op
newString 128 79.746 ± 0.120 69.061 ± 0.081 ns/op
newString 1024 631.384 ± 1.068 521.811 ± 0.592 ns/op
repeatOneByteString 8 10.278 ± 0.017 9.751 ± 0.027 ns/op
repeatOneByteString 64 18.034 ± 0.017 15.173 ± 0.060 ns/op
repeatOneByteString 128 30.749 ± 0.491 22.568 ± 0.033 ns/op
repeatOneByteString 1024 109.009 ± 2.476 82.269 ± 0.101 ns/op
repeatOneCharString 8 22.141 ± 0.703 20.376 ± 0.171 ns/op
repeatOneCharString 64 44.796 ± 0.764 36.344 ± 0.059 ns/op
repeatOneCharString 128 57.736 ± 0.134 43.992 ± 0.287 ns/op
repeatOneCharString 1024 229.131 ± 1.211 176.311 ± 0.497 ns/op
replace 8 14.545 ± 0.019 13.426 ± 0.089 ns/op
replace 64 56.453 ± 0.096 36.115 ± 0.107 ns/op
replace 128 108.701 ± 1.201 65.387 ± 1.159 ns/op
replace 1024 820.524 ± 0.852 427.047 ± 0.510 ns/op
replaceUtf 8 17.480 ± 0.089 15.908 ± 0.175 ns/op
replaceUtf 64 44.982 ± 0.500 34.091 ± 0.209 ns/op
replaceUtf 128 66.433 ± 0.427 50.828 ± 0.645 ns/op
replaceUtf 1024 407.634 ± 11.850 302.786 ± 1.152 ns/op
With best regards,
Sergey Tsypanov
1. Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class MiscStringBenchmark {
@Benchmark
public String newString(Data data){
return new String(data.codepoints, 0, data.codepoints.length);
}
@Benchmark
public String repeatOneByteString(Data data) {
return data.oneCharString.repeat(data.length);
}
@Benchmark
public String repeatOneCharString(Data data) {
return data.oneUtfCharString.repeat(data.length);
}
@Benchmark
public String replace(Data data){
return data.stringToReplace.replace('z', 'b');
}
@Benchmark
public String replaceUtf(Data data){
return data.utfStringToReplace.replace('я', 'ю');
}
@State(Scope.Thread)
public static class Data {
@Param({"8", "64", "128", "1024"})
private int length;
private final String oneCharString = "a";
private final String oneUtfCharString = "ё";
private String stringToReplace;
private String utfStringToReplace;
private int[] codepoints;
@Setup
public void setup() {
String string = oneCharString.repeat(length);
String utfString = oneUtfCharString.repeat(length);
stringToReplace = string + 'z';
codepoints = stringToReplace.codePoints().toArray();
utfStringToReplace = utfString + 'я';
}
}
}
10.09.2020, 01:24, "Paul Sandoz" <paul.sandoz at oracle.com>:
> Hi Sergey,
>
> Unsafe.allocateUninitializedArray should be used with great care, so as we are confident that no array, covering uninitialized memory, leaks out. Or later we unintentionally introduce a problem through refactoring.
>
> The update to AbstractStringBuilder is problematic because the uninitialized array could potentially be accessed through reflection.
>
> In other cases it seems you have slotted in the allocation at a low-level e.g. StringUTF16.newBytesFor. That makes it hard for me to evaluate the implications, and it's not clear to me it offers advantages in all cases, especially given the risks. The toCharArray/Utf allocation costs appear to be lost in the noise, at least from your benchmark results.
>
> The method repeat seems a good candidate to focus on given the allocation and looping is obviously localized.
> However, in the repeat of 1 case I would like to know why C2 does not elide the zeroing of the array, since it should know that the fill covers the whole array.
> For the repeat > 2 we can excuse C2 for not seeing through to the ranges passed to System.arraycopy.
>
> Separately, renaming StringConcatHelper.new{Char}Array to StringConcatHelper.newUninitialized{Char}Array would be clearer, and perhaps moving to somewhere else.
>
> Paul.
>
>> On Sep 4, 2020, at 5:52 AM, Сергей Цыпанов <sergei.tsypanov at yandex.ru> wrote:
>>
>> Hello,
>>
>> currently jdk.internal.misc.Unsafe declares method allocateUninitializedArray(Class, int) returning uninitialized array of given type and length.
>>
>> Allocation of uninitialized arrays is faster especially for larger ones, so we could use them for cases
>> when we are sure that created array will be completely overwritten or is guarded by count field (e.g. in AbstractStringBuilder).
>>
>> I've exposed jdk.internal.misc.Unsafe.allocateUninitializedArray(Class, int)
>> via delegating method of sun.misc.Unsafe to measure creation of byte[] with benchmark [1]
>> and got those results:
>>
>> (length) Mode Cnt Score Error Units
>> constructor 0 avgt 50 7.639 ± 0.629 ns/op
>> constructor 10 avgt 50 9.448 ± 0.725 ns/op
>> constructor 100 avgt 50 21.158 ± 1.865 ns/op
>> constructor 1000 avgt 50 146.158 ± 9.836 ns/op
>> constructor 10000 avgt 50 916.321 ± 50.618 ns/op
>>
>> unsafe 0 avgt 50 8.057 ± 0.599 ns/op
>> unsafe 10 avgt 50 8.308 ± 0.907 ns/op
>> unsafe 100 avgt 50 12.232 ± 1.813 ns/op
>> unsafe 1000 avgt 50 37.679 ± 1.382 ns/op
>> unsafe 10000 avgt 50 78.896 ± 6.758 ns/op
>>
>> As a result I propose to add the following methods into StringConcatHelper
>>
>> @ForceInline
>> static byte[] newArray(int length) {
>> return (byte[]) UNSAFE.allocateUninitializedArray(byte.class, length);
>> }
>>
>> @ForceInline
>> static char[] newCharArray(int length) {
>> return (char[]) UNSAFE.allocateUninitializedArray(char.class, length);
>> }
>>
>> along with existing StringConcatHelper.newArray(long indexCoder) and utilize them in String-related operations
>> instead of conventional array creation with new-keyword.
>>
>> I've used benchmark [2] to measure impact on affected String-methods and found quite a good improvement:
>>
>> before after
>>
>> Benchmark (length) Score Error Score Error Units
>>
>> newStringBuilderWithLength 8 8.288 ± 0.411 5.656 ± 0.019 ns/op
>> newStringBuilderWithLength 64 12.954 ± 0.687 7.588 ± 0.009 ns/op
>> newStringBuilderWithLength 128 20.603 ± 0.412 10.446 ± 0.005 ns/op
>> newStringBuilderWithLength 1024 119.935 ± 2.383 35.452 ± 0.029 ns/op
>>
>> newStringBuilderWithString 8 19.721 ± 0.375 14.642 ± 0.052 ns/op
>> newStringBuilderWithString 64 34.006 ± 1.523 15.479 ± 0.031 ns/op
>> newStringBuilderWithString 128 36.697 ± 0.972 17.052 ± 0.133 ns/op
>> newStringBuilderWithString 1024 140.486 ± 6.396 85.156 ± 0.175 ns/op
>>
>> repeatOneByteString 8 11.340 ± 0.197 9.736 ± 0.051 ns/op
>> repeatOneByteString 64 20.859 ± 0.257 15.073 ± 0.024 ns/op
>> repeatOneByteString 128 36.311 ± 1.162 22.808 ± 0.198 ns/op
>> repeatOneByteString 1024 149.243 ± 3.083 82.839 ± 0.193 ns/op
>>
>> repeatOneChar 8 28.033 ± 0.615 20.377 ± 0.137 ns/op
>> repeatOneChar 64 56.399 ± 1.094 36.230 ± 0.051 ns/op
>> repeatOneChar 128 68.423 ± 5.647 44.157 ± 0.239 ns/op
>> repeatOneChar 1024 230.115 ± 0.312 179.126 ± 0.437 ns/op
>>
>> replace 8 14.684 ± 0.088 14.434 ± 0.057 ns/op
>> replace 64 56.811 ± 0.612 56.420 ± 0.050 ns/op
>> replace 128 112.694 ± 0.404 109.799 ± 1.202 ns/op
>> replace 1024 837.939 ± 0.855 818.802 ± 0.408 ns/op
>>
>> replaceUtf 8 17.802 ± 0.074 15.744 ± 0.094 ns/op
>> replaceUtf 64 45.754 ± 0.139 39.228 ± 0.864 ns/op
>> replaceUtf 128 67.170 ± 0.353 50.497 ± 1.218 ns/op
>> replaceUtf 1024 415.767 ± 6.829 297.831 ± 22.510 ns/op
>>
>> toCharArray 8 6.164 ± 0.033 6.128 ± 0.064 ns/op
>> toCharArray 64 10.960 ± 0.032 13.566 ± 0.802 ns/op
>> toCharArray 128 20.373 ± 0.013 20.823 ± 0.376 ns/op
>> toCharArray 1024 165.923 ± 0.098 164.362 ± 0.065 ns/op
>>
>> toCharArrayUTF 8 8.009 ± 0.067 7.778 ± 0.026 ns/op
>> toCharArrayUTF 64 11.112 ± 0.014 10.880 ± 0.010 ns/op
>> toCharArrayUTF 128 20.390 ± 0.014 20.103 ± 0.017 ns/op
>> toCharArrayUTF 1024 166.233 ± 0.076 163.827 ± 0.099 ns/op
>>
>> So the question is could we include the changes of attached patch into JDK?
>>
>> With best regards,
>> Sergey Tsypanov
>>
>> 1. Benchmark for array-allocation
>>
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>> public class ArrayAllocationBenchmark {
>> private static Unsafe U;
>>
>> static {
>> try {
>> Field f = Unsafe.class.getDeclaredField("theUnsafe");
>> f.setAccessible(true);
>> U = (Unsafe) f.get(null);
>> } catch (Exception e) {
>> new RuntimeException(e);
>> }
>> }
>>
>> @Benchmark
>> public byte[] constructor(Data data) {
>> return new byte[data.length];
>> }
>>
>> @Benchmark
>> public byte[] unsafe(Data data) {
>> return (byte[]) U.allocateUninitializedArray(byte.class, data.length);
>> }
>>
>> @State(Scope.Thread)
>> public static class Data {
>> @Param({"0", "10", "100", "1000", "10000"})
>> private int length;
>> }
>> }
>>
>> 2. Benchmark for String-method measurements
>>
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>> public class MiscStringBenchmark {
>>
>> @Benchmark
>> public char[] toCharArrayLatin1(Data data) {
>> return data.string.toCharArray();
>> }
>>
>> @Benchmark
>> public char[] toCharArrayUTF(Data data) {
>> return data.utfString.toCharArray();
>> }
>>
>> @Benchmark
>> public String repeatOneByteString(Data data) {
>> return data.oneCharString.repeat(data.length);
>> }
>>
>> @Benchmark
>> public String repeatOneChar(Data data) {
>> return data.oneUtfCharString.repeat(data.length);
>> }
>>
>> @Benchmark
>> public String replace(Data data){
>> return data.stringToReplace.replace('z', 'b');
>> }
>>
>> @Benchmark
>> public String replaceUtf(Data data){
>> return data.utfStringToReplace.replace('я', 'ю');
>> }
>>
>> @Benchmark
>> public StringBuilder newStringBuilderWithLength(Data data) {
>> return new StringBuilder(data.length);
>> }
>>
>> @Benchmark
>> public StringBuilder newStringBuilderWithString(Data data) {
>> return new StringBuilder(data.string);
>> }
>>
>> @State(Scope.Thread)
>> public static class Data {
>>
>> @Param({"8", "64", "128", "1024"})
>> private int length;
>>
>> private String string;
>> public String utfString;
>> private final String oneCharString = "a";
>> private final String oneUtfCharString = "ё";
>> private String stringToReplace;
>> private String utfStringToReplace;
>>
>> @Setup
>> public void setup() {
>> string = oneCharString.repeat(length);
>> utfString = oneUtfCharString.repeat(length);
>>
>> stringToReplace = string + 'z';
>> utfStringToReplace = utfString + 'я';
>> }
>> }
>> }
>> <uninit_array.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uninit_array.patch
Type: text/x-diff
Size: 4731 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20200910/2b137530/uninit_array-0001.patch>
More information about the core-libs-dev
mailing list