use of Unsafe for ASCII detection
Martin Buchholz
martinrb at google.com
Wed Jan 6 23:21:16 UTC 2016
Hi guys,
It turns out that much of the world's computers is busy doing UTF-8
decoding, so lots of people have expended much effort to make it
faster. Much of that text is ASCII, so it's definitely worth having
special ASCII-only loops. Like this:
static boolean isAscii1(byte[] bytes) {
for (byte b : bytes)
if (b < 0) return false;
return true;
}
Hotspot does a pretty good job on that, but we can go faster if we
reach for Unsafe, processing data a long at a time.
static boolean isAscii2(byte[] bytes) {
long i, limit;
for (i = ABASE, limit = ABASE + bytes.length - 7; i < limit; i += 8)
if ((U.getLong(bytes, i) & 0x8080808080808080L) != 0)
return false;
for (limit = ABASE + bytes.length; i < limit; i += 1)
if (U.getByte(bytes, i) < 0)
return false;
return true;
}
We can also try the same strategy with ByteBuffer.getLong, hoping it
will do atomic reads for us:
static boolean isAscii3(byte[] bytes) {
ByteBuffer b = ByteBuffer.wrap(bytes);
int i, limit;
for (i = 0, limit = bytes.length - 7; i < limit; i += 8)
if ((b.getLong(i) & 0x8080808080808080L) != 0)
return false;
for (limit = bytes.length; i < limit; i += 1)
if (b.get(i) < 0)
return false;
return true;
}
A stupid benchmark (sorry Aleksey - maybe you already have a jmh
version of this?)
http://cr.openjdk.java.net/~martin/Utf8Bench.java
gives me:
(jdk9)
Unsafe stride 8: 0.35626781581093897
Buffer stride 8: 0.4825880453758097
(jdk8)
Unsafe stride 8: 0.33818706070673143
Buffer stride 8: 1.9958109701624323
That is, the Unsafe code is 3x faster than the simple code. The
ByteBuffer code used to be 2x slower and is now 2x faster - well done
- crowd goes wild! I see it uses new and well-hidden
Unsafe.getLongUnaligned ... all the performance fanatics want to use
that, but y'all don't want to give it to us. And we don't want to
allocate a temp ByteBuffer object. What to do? If you provided a
static variant of ByteBuffer.getLong for byte[], it would probably
have enough performance for us to live without Unsafe. It's hard to
resist Unsafe when it gives 3x improvement in a performance hotspot.
Maybe C2 can optimize the simple loop even more than it is today?
Maybe we can add a static method on j.u.Arrays? Maybe
static int asciiCount(byte[] bytes, int start, int end)
Still a performance hack, but far less of one.
More information about the core-libs-dev
mailing list