use of Unsafe for ASCII detection

Wed Jan 6 23:21:16 UTC 2016

Hi guys,

It turns out that much of the world's computers is busy doing UTF-8
decoding, so lots of people have expended much effort to make it
faster.  Much of that text is ASCII, so it's definitely worth having
special ASCII-only loops.  Like this:

  static boolean isAscii1(byte[] bytes) {
    for (byte b : bytes)
      if (b < 0) return false;
    return true;
  }

Hotspot does a pretty good job on that, but we can go faster if we
reach for Unsafe, processing data a long at a time.

  static boolean isAscii2(byte[] bytes) {
    long i, limit;
    for (i = ABASE, limit = ABASE + bytes.length - 7; i < limit; i += 8)
      if ((U.getLong(bytes, i) & 0x8080808080808080L) != 0)
        return false;
    for (limit = ABASE + bytes.length; i < limit; i += 1)
      if (U.getByte(bytes, i) < 0)
        return false;
    return true;
  }

We can also try the same strategy with ByteBuffer.getLong, hoping it
will do atomic reads for us:

  static boolean isAscii3(byte[] bytes) {
    ByteBuffer b = ByteBuffer.wrap(bytes);
    int i, limit;
    for (i = 0, limit = bytes.length - 7; i < limit; i += 8)
      if ((b.getLong(i) & 0x8080808080808080L) != 0)
        return false;
    for (limit = bytes.length; i < limit; i += 1)
      if (b.get(i) < 0)
        return false;
    return true;
  }

A stupid benchmark (sorry Aleksey - maybe you already have a jmh
version of this?)
http://cr.openjdk.java.net/~martin/Utf8Bench.java
gives me:
(jdk9)
Unsafe stride 8: 0.35626781581093897
Buffer stride 8: 0.4825880453758097
(jdk8)
Unsafe stride 8: 0.33818706070673143
Buffer stride 8: 1.9958109701624323

That is, the Unsafe code is 3x faster than the simple code.  The
ByteBuffer code used to be 2x slower and is now 2x faster - well done
- crowd goes wild!  I see it uses new and well-hidden
Unsafe.getLongUnaligned ... all the performance fanatics want to use
that, but y'all don't want to give it to us.  And we don't want to
allocate a temp ByteBuffer object.  What to do?  If you provided a
static variant of ByteBuffer.getLong for byte[], it would probably
have enough performance for us to live without Unsafe.  It's hard to
resist Unsafe when it gives 3x improvement in a performance hotspot.
Maybe C2 can optimize the simple loop even more than it is today?
Maybe we can add a static method on j.u.Arrays?  Maybe

static int asciiCount(byte[] bytes, int start, int end)

Still a performance hack, but far less of one.