<div dir="ltr">Hello Wenshao and the core libraries mailing list,<div>First, I want to talk about the roles of Unsafe and BALE.</div><div>Unsafe itself is a collection of JVM-specific APIs that must be shielded from direct use by dependent Java code. Its get/setXxx methods are one such set of APIs; they directly use unaligned reads and writes on platforms that support them.</div><div>This functionality is already exposed to regular Java users through two public APIs: ByteBuffer and VarHandle (MethodHandles.byteArrayViewVarHandle, which BALE uses). Both call into Unsafe, and the JIT can eliminate their overhead.</div><div><br></div><div>Currently, we have a Vector API in incubation, which ensures vectorization of certain operations; our use of BALE is similar, in that we want SLP to happen reliably.<br></div><div><br></div><div>I looked through the places where you use BALE to speed up writes. I believe it would be better to perform this optimization at the JIT level where possible, because the JIT knows the best way to group bytes into a single write at a given offset on any platform (including big-endian ones). Similar to the Vector API, we might add new internal APIs such as: </div><div>public static void write(byte[] arr, int offset, int b0, int b1, ....)</div><div>which declare explicitly that multiple bytes are written at once, so that the JIT reliably optimizes those writes (in case the JIT has trouble performing SLP automatically, as it does with auto-vectorization).</div><div><br></div><div>Another reason the JIT approach is better than reusing BALE/ByteBuffer is that those APIs produce "meaningful" values: their results are used directly, and their reads and writes go both ways. 
In our case, however, we only care about writing faster, and there are multiple ways to group the writes, so I don't think Java-level APIs would be useful.</div><div><br></div><div>For JVM startup, I recommend running a simple Hello World with the -Xlog:class+init flag to see which classes are initialized before java/lang/invoke/MethodHandleImpl. In general, those early classes, such as the wrapper classes, String, collections, and reflection, must not initialize java.lang.invoke classes (lambdas, VarHandle, MethodHandle), for example by holding them in fields. They can use lambdas in their methods, but those lambdas cannot be called before java.lang.invoke is ready.</div><div><br></div><div>Best,</div><div>Chen Liang</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Oct 8, 2023 at 5:14 PM 温绍锦(高铁) <<a href="mailto:shaojin.wensj@alibaba-inc.com" target="_blank">shaojin.wensj@alibaba-inc.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="line-height:1.7;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;color:rgb(0,112,192)"><div style="clear:both"><span>Should we allow the use of Unsafe or ByteArrayLittleEndian for trivial byte[] writes in core-libs?</span></div><div style="clear:both"><br><div style="clear:both">There is already code that uses ByteArrayLittleEndian to improve performance, such as:</div><div style="clear:both">```java</div><div style="clear:both">package java.util;</div><div style="clear:both"><br></div><div style="clear:both">class UUID {</div><div style="clear:both"> <span 
style="color:rgb(0,112,192);font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;white-space:normal;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">   <span> </span></span>public String toString() {</div><div style="clear:both">     <span style="color:rgb(0,112,192);font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;white-space:normal;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">   <span> </span></span>// ...</div><div style="clear:both">        ByteArrayLittleEndian.setInt(</div><div style="clear:both">                buf,</div><div style="clear:both">                9,</div><div style="clear:both">                HexDigits.packDigits(((int) msb) >> 24, ((int) msb) >> 16));</div><div style="clear:both">        // ...</div><div style="clear:both"> <span style="color:rgb(0,112,192);font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;white-space:normal;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">   <span> </span></span>}</div><div style="clear:both">}</div><div style="clear:both">```</div><div style="clear:both"><br></div><div style="clear:both">There are examples of using ByteArrayLittleEndian and then removing it because it caused the JVM to start slowly (we can use Unsafe.putShortUnaligned to solve 
the problem of slow JVM startup)</div><div style="clear:both">```java</div><div style="clear:both">package java.lang;</div><div style="clear:both">class StringLatin1 {</div><div style="clear:both">    private static void writeDigitPair(byte[] buf, int charPos, int value) {</div><div style="clear:both">        short pair = DecimalDigits.digitPair(value);</div><div style="clear:both">        // UNSAFE.putShortUnaligned(buf, ARRAY_BYTE_BASE_OFFSET + charPos, pair);</div><div style="clear:both">        buf[charPos] = (byte)(pair);</div><div style="clear:both">        buf[charPos + 1] = (byte)(pair >> 8);</div><div style="clear:both">    } </div><div style="clear:both">}</div><div style="clear:both">```</div><div style="clear:both"><br></div><div style="clear:both">Here is an example where the PR review disagreed with the use of ByteArrayLittleEndian:</div><div style="clear:both"><a href="https://github.com/openjdk/jdk/pull/15768" target="_blank">https://github.com/openjdk/jdk/pull/15768</a></div><div style="clear:both">```java</div><div style="clear:both">package java.util;</div><div style="clear:both">class HexFormat {</div><div style="clear:both"> <span style="color:rgb(0,112,192);font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;white-space:normal;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">   <span> </span></span>String formatOptDelimiter(byte[] bytes, int fromIndex, int toIndex) {</div><div style="clear:both">     // ...</div><div style="clear:both">        short pair = HexDigits.digitPair(bytes[fromIndex + i], ucase);</div><div style="clear:both">        int pos = i * 2;</div><div style="clear:both">        rep[pos] = (byte)pair;</div><div style="clear:both">        rep[pos + 1] = (byte)(pair >>> 
8);</div><div style="clear:both">        // ByteArrayLittleEndian.setShort(rep, pos, pair);</div><div style="clear:both"> <span style="color:rgb(0,112,192);font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;white-space:normal;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">   <span> </span></span>}</div><div style="clear:both">}</div><div style="clear:both">```</div><div style="clear:both"><br></div><div style="clear:both">Here is another example where the PR review disagreed with the use of ByteArrayLittleEndian:</div><div style="clear:both"><a href="https://github.com/openjdk/jdk/pull/15990" target="_blank">https://github.com/openjdk/jdk/pull/15990</a></div><div style="clear:both">```java</div><div style="clear:both">package java.lang;</div><div style="clear:both">class AbstractStringBuilder {</div><div style="clear:both"> <span style="color:rgb(0,112,192);font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;white-space:normal;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">   <span> </span></span>static final class Constants {</div><div style="clear:both">        static final int NULL_LATIN1;</div><div style="clear:both">        static final long NULL_UTF16;</div><div style="clear:both">        static {</div><div style="clear:both">            byte[] bytes4 = new byte[] {'n', 'u', 'l', 'l'};</div><div style="clear:both">            byte[] bytes8 = new byte[8];</div><div style="clear:both">            NULL_LATIN1 = ByteArrayLittleEndian.getInt(bytes4, 0);</div><div 
style="clear:both">            StringLatin1.inflate(bytes4, 0, bytes8, 0, 4);</div><div style="clear:both">            NULL_UTF16 = ByteArrayLittleEndian.getLong(bytes8, 0);</div><div style="clear:both">        }</div><div style="clear:both">    }</div><div style="clear:both"><br></div><div style="clear:both">    private AbstractStringBuilder appendNull() {</div><div style="clear:both">        ensureCapacityInternal(count + 4);</div><div style="clear:both">        int count = this.count;</div><div style="clear:both">        byte[] val = this.value;</div><div style="clear:both">        if (isLatin1()) {</div><div style="clear:both">            ByteArrayLittleEndian.setInt(val, count, Constants.NULL_LATIN1);</div><div style="clear:both">        } else {</div><div style="clear:both">            ByteArrayLittleEndian.setLong(val, count << 1, Constants.NULL_UTF16);</div><div style="clear:both">        }</div><div style="clear:both">        this.count = count + 4;</div><div style="clear:both">        return this;</div><div style="clear:both">    }</div><div style="clear:both">}</div><div style="clear:both">```</div><div style="clear:both"><br></div><span>In these examples, using Unsafe/ByteArrayLittleEndian significantly improves performance. If automatic JIT optimization is the best solution, but SuperWord Level Parallelism (SLP) does not currently support this optimization, what do we recommend? <span>In which scenarios should Unsafe not be used, and in which scenarios should ByteArrayLittleEndian not be used?</span></span></div></div></div></blockquote></div>
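<div>For comparison, the two public APIs Chen mentions (VarHandle via MethodHandles.byteArrayViewVarHandle, and ByteBuffer) can express the same little-endian two-byte write as the plain byte stores in StringLatin1.writeDigitPair and HexFormat. This is a minimal sketch for ordinary user code, not java.base: the class and method names (LittleEndianWrites, writePairPlain, writePairVarHandle, writePairByteBuffer) are hypothetical, and it says nothing about performance. Note that the VarHandle variant pulls in java.lang.invoke, which is exactly the startup concern discussed in the reply.</div>

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: three ways to store a short into a byte[] in little-endian order.
public class LittleEndianWrites {
    // Views a byte[] as little-endian shorts. Offsets are byte offsets, and
    // plain get/set on this view also work at unaligned offsets.
    private static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // Plain byte stores, low byte first (the pattern kept in StringLatin1 and HexFormat).
    static void writePairPlain(byte[] buf, int pos, short pair) {
        buf[pos] = (byte) pair;
        buf[pos + 1] = (byte) (pair >> 8);
    }

    // VarHandle byte-array view (the mechanism ByteArrayLittleEndian is built on).
    static void writePairVarHandle(byte[] buf, int pos, short pair) {
        SHORT_LE.set(buf, pos, pair);
    }

    // Absolute put on a little-endian ByteBuffer that wraps the same array.
    static void writePairByteBuffer(byte[] buf, int pos, short pair) {
        ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).putShort(pos, pair);
    }

    public static void main(String[] args) {
        short pair = (short) ('4' << 8 | '2'); // low byte '2', high byte '4'
        byte[] a = new byte[5];
        byte[] b = new byte[5];
        byte[] c = new byte[5];
        writePairPlain(a, 1, pair);      // offset 1 is deliberately unaligned
        writePairVarHandle(b, 1, pair);
        writePairByteBuffer(c, 1, pair);
        System.out.println(Arrays.equals(a, b) && Arrays.equals(b, c)); // true
        System.out.println(new String(a, 1, 2, StandardCharsets.ISO_8859_1)); // 24
    }
}
```

<div>All three variants produce the same bytes. ByteBuffer.wrap creates a new buffer object on every call, so a caller writing many pairs would more likely keep one buffer around.</div>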
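<div>Chen's proposed internal API, write(byte[] arr, int offset, int b0, int b1, ....), might have reference semantics along the lines of the sketch below. Everything in it is hypothetical: MultiByteWrites and its overloads are not an existing JDK API, and the body is just one up-front range check followed by plain byte stores. Whether the JIT actually turns such a run of stores into a single wider store is the open question in this thread. The main method uses it the way appendNull writes the Latin-1 bytes of "null".</div>

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical helper modelled on the proposed write(byte[] arr, int offset, int b0, int b1, ...).
public final class MultiByteWrites {
    private MultiByteWrites() {}

    // Stores the low 8 bits of b0 and b1 at arr[offset] and arr[offset + 1].
    // Checking the whole range first makes the per-store checks provably redundant,
    // which may help a JIT drop them and merge the adjacent stores.
    static void write(byte[] arr, int offset, int b0, int b1) {
        Objects.checkFromIndexSize(offset, 2, arr.length);
        arr[offset] = (byte) b0;
        arr[offset + 1] = (byte) b1;
    }

    // Four-byte variant; b0 goes to the lowest index.
    static void write(byte[] arr, int offset, int b0, int b1, int b2, int b3) {
        Objects.checkFromIndexSize(offset, 4, arr.length);
        arr[offset] = (byte) b0;
        arr[offset + 1] = (byte) b1;
        arr[offset + 2] = (byte) b2;
        arr[offset + 3] = (byte) b3;
    }

    public static void main(String[] args) {
        // appendNull-style use: write "null" at an arbitrary position.
        byte[] val = new byte[6];
        write(val, 1, 'n', 'u', 'l', 'l');
        System.out.println(new String(val, 1, 4, StandardCharsets.ISO_8859_1)); // null
    }
}
```

<div>The sketch uses fixed-arity overloads rather than varargs because a varargs call would typically allocate an int[] per call, which works against the goal of cheap writes. Because the range check runs before any store, an out-of-range call throws IndexOutOfBoundsException without leaving a partial write.</div>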