RFR: 8353741: Improve UUID.toString performance by using SIMD within a register instead of table lookup
Johannes Graham
duke at openjdk.org
Fri Apr 4 16:47:19 UTC 2025
On Mon, 6 Jan 2025 13:18:50 GMT, Shaojin Wen <swen at openjdk.org> wrote:
> Improve the performance of UUID::toString by using Long.expand and SWAR (SIMD within a register) instead of table lookup. Eliminating the table lookup can also avoid the performance degradation problem when the cache misses.
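For readers unfamiliar with SWAR, here is a minimal sketch of the table-free idea: once each byte of a long holds one nibble (0..15), all eight bytes can be turned into ASCII hex digits with a few arithmetic operations. This is an illustration only and not necessarily the exact arithmetic used in the PR; the class and method names (`NibbleHexSketch`, `nibblesToHex`, `asAscii`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from the PR.
public class NibbleHexSketch {
    // Input: eight bytes, each holding a value in 0..15.
    // Output: eight bytes, each holding the ASCII hex digit for that value.
    static long nibblesToHex(long x) {
        // Adding 6 pushes values 10..15 up to 16..21, which sets bit 4 of that byte.
        // After the shift, each byte is 1 where the nibble was >= 10, else 0.
        long ge10 = ((x + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L) >>> 4;
        // '0' is 0x30; letters need an extra 'a' - 10 - '0' = 39.
        // No byte exceeds 15 + 48 + 39 = 102, so nothing carries between bytes.
        return x + 0x3030_3030_3030_3030L + ge10 * 39;
    }

    // Helper for display: interprets the long as eight big-endian ASCII bytes.
    static String asAscii(long packed) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(packed).array();
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(asAscii(nibblesToHex(0x0809_0A0B_0C0D_0E0FL))); // prints 89abcdef
    }
}
```

The multiplication by 39 is chosen here for readability; a shift-and-add form would compute the same per-byte offset.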
By stepping through the code of `Long.expand`, and substituting in the constants, I come up with this:
    static long expandNibbles(long i) {
        // Inlined version of Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL)
        long t = i << 16;
        i = (i & ~0xFFFF00000000L) | (t & 0xFFFF00000000L);
        t = i << 8;
        i = (i & ~0xFF000000FF0000L) | (t & 0xFF000000FF0000L);
        t = i << 4;
        i = (i & ~0xF000F000F000F00L) | (t & 0xF000F000F000F00L);
        return i & 0x0F0F_0F0F_0F0F_0F0FL;
    }
This looks like it might actually do better than *Method 2*. If inlining and constant folding are happening in the non-intrinsic `Long.expand`, I would expect it to perform comparably to this.
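A quick way to gain confidence in the hand-inlined version is to compare it against `Long.expand` on random inputs. The sketch below assumes JDK 19 or later (where `Long.expand` is available); the class name `ExpandNibblesCheck` and the helper `agrees` are hypothetical. It checks functional equivalence only and says nothing about performance.

```java
import java.util.SplittableRandom;

// Hypothetical harness; only expandNibbles comes from the message above.
public class ExpandNibblesCheck {
    static final long NIBBLE_MASK = 0x0F0F_0F0F_0F0F_0F0FL;

    // Hand-inlined Long.expand(i, NIBBLE_MASK): spreads the low 8 nibbles of i
    // so that each lands in the low half of its own byte.
    static long expandNibbles(long i) {
        long t = i << 16;
        i = (i & ~0xFFFF00000000L) | (t & 0xFFFF00000000L);
        t = i << 8;
        i = (i & ~0xFF000000FF0000L) | (t & 0xFF000000FF0000L);
        t = i << 4;
        i = (i & ~0xF000F000F000F00L) | (t & 0xF000F000F000F00L);
        return i & NIBBLE_MASK;
    }

    // True when the inlined version matches the library method for this input.
    static boolean agrees(long value) {
        return expandNibbles(value) == Long.expand(value, NIBBLE_MASK);
    }

    public static void main(String[] args) {
        SplittableRandom rnd = new SplittableRandom(42);
        for (int n = 0; n < 1_000_000; n++) {
            long v = rnd.nextLong();
            if (!agrees(v)) {
                throw new AssertionError("mismatch for 0x" + Long.toHexString(v));
            }
        }
        // Each nibble of 0x89ABCDEF ends up in its own byte.
        System.out.println(String.format("%016x", expandNibbles(0x89ABCDEFL))); // prints 08090a0b0c0d0e0f
    }
}
```

Because the mask has 32 set bits, only the low 32 bits of the input contribute to the result, so the random upper halves also exercise the masking of stale bits.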
The non-intrinsified Java code should be able to run as quickly as the hand-inlined version.
I think I've found an issue that prevents the code from being constant-folded as expected: C2 does not seem to constant-fold XOR nodes.
See https://github.com/openjdk/jdk/pull/23089 for an attempt at addressing this.
There are no XOR nodes in `expandNibbles`.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/22928#issuecomment-2584577398
PR Comment: https://git.openjdk.org/jdk/pull/22928#issuecomment-2588342173
PR Comment: https://git.openjdk.org/jdk/pull/22928#issuecomment-2590840422