use of Unsafe for ASCII detection

Paul Sandoz paul.sandoz at oracle.com
Thu Jan 7 09:32:27 UTC 2016


Hi Martin,

It’s possible to do that with a VarHandle, see the proposed MethodHandles.byteArrrayViewHandle factory method:

http://cr.openjdk.java.net/~psandoz/jdk9/varhandles/specdiff/java/lang/invoke/MethodHandles-report.html#method:byteArrayViewVarHandle(java.lang.Class, boolean, boolean) <http://cr.openjdk.java.net/~psandoz/jdk9/varhandles/specdiff/java/lang/invoke/MethodHandles-report.html#method:byteArrayViewVarHandle(java.lang.Class,%20boolean,%20boolean)>

I have not tested in your context but i would expect using a VarHandle should produce almost identical generated code as direct use of Unsafe, any bounds checks will be dominated by those of the loop bounds. If not it’s a bug!

Some recent HotSpot compiler work by Roland should ensure that unrolled loops using Unsafe will be on par with direct array access. Once that gets rolled into jdk9/dev (i don’t think it is there yet) you might see another incremental improvement in performance.

—

Longer term i think we can do better. A low-level, and safe, vectorization API would provide support for widths wider that 64 bits on supported platforms. A stream implementation utilising that API would be even better, so one could simply express it as Stream.of(bytes).allMatch(b -> b & 0x80 != 0), but that is much harder.

Paul.


> On 7 Jan 2016, at 00:21, Martin Buchholz <martinrb at google.com> wrote:
> 
> Hi guys,
> 
> It turns out that much of the world's computers is busy doing UTF-8
> decoding, so lots of people have expended much effort to make it
> faster.  Much of that text is ASCII, so it's definitely worth having
> special ASCII-only loops.  Like this:
> 
>  static boolean isAscii1(byte[] bytes) {
>    for (byte b : bytes)
>      if (b < 0) return false;
>    return true;
>  }
> 
> Hotspot does a pretty good job on that, but we can go faster if we
> reach for Unsafe, processing data a long at a time.
> 
>  static boolean isAscii2(byte[] bytes) {
>    long i, limit;
>    for (i = ABASE, limit = ABASE + bytes.length - 7; i < limit; i += 8)
>      if ((U.getLong(bytes, i) & 0x8080808080808080L) != 0)
>        return false;
>    for (limit = ABASE + bytes.length; i < limit; i += 1)
>      if (U.getByte(bytes, i) < 0)
>        return false;
>    return true;
>  }
> 
> We can also try the same strategy with ByteBuffer.getLong, hoping it
> will do atomic reads for us:
> 
>  static boolean isAscii3(byte[] bytes) {
>    ByteBuffer b = ByteBuffer.wrap(bytes);
>    int i, limit;
>    for (i = 0, limit = bytes.length - 7; i < limit; i += 8)
>      if ((b.getLong(i) & 0x8080808080808080L) != 0)
>        return false;
>    for (limit = bytes.length; i < limit; i += 1)
>      if (b.get(i) < 0)
>        return false;
>    return true;
>  }
> 
> A stupid benchmark (sorry Aleksey - maybe you already have a jmh
> version of this?)
> http://cr.openjdk.java.net/~martin/Utf8Bench.java
> gives me:
> (jdk9)
> Unsafe stride 8: 0.35626781581093897
> Buffer stride 8: 0.4825880453758097
> (jdk8)
> Unsafe stride 8: 0.33818706070673143
> Buffer stride 8: 1.9958109701624323
> 
> That is, the Unsafe code is 3x faster than the simple code.  The
> ByteBuffer code used to be 2x slower and is now 2x faster - well done
> - crowd goes wild!  I see it uses new and well-hidden
> Unsafe.getLongUnaligned ... all the performance fanatics want to use
> that, but y'all don't want to give it to us.  And we don't want to
> allocate a temp ByteBuffer object.  What to do?  If you provided a
> static variant of ByteBuffer.getLong for byte[], it would probably
> have enough performance for us to live without Unsafe.  It's hard to
> resist Unsafe when it gives 3x improvement in a performance hotspot.
> Maybe C2 can optimize the simple loop even more than it is today?
> Maybe we can add a static method on j.u.Arrays?  Maybe
> 
> static int asciiCount(byte[] bytes, int start, int end)
> 
> Still a performance hack, but far less of one.




More information about the core-libs-dev mailing list