[foreign-memaccess+abi] RFR: Add a benchmark for strlen using Foreign Linker API
Maurizio Cimadamore
mcimadamore at openjdk.java.net
Wed Feb 17 15:32:14 UTC 2021
On Wed, 17 Feb 2021 15:17:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
> I've been spending some time looking into this issue:
>
> https://bugs.openjdk.java.net/browse/JDK-8261828
>
> And, to better understand the problem, I put together a hopefully comprehensive benchmark of the strlen function; it turns out that the strlen call itself is fast, and it's the conversion from Java to native string where the benchmark spends most of its time.
>
> While playing with the benchmark, I came up with alternative ways to do this conversion which greatly speed up the benchmark results, even surpassing (at least on my machine) what's possible with JNI. Jorn and I think that, for future reference, it would be a good idea to include this benchmark in our suite.
>
> For the curious, there are many factors which make the default `CLinker::toCString` go slower than expected:
>
> * allocating a fresh segment on each iteration is expensive, because it takes two native calls (malloc, memset), plus a bunch of CAS to reserve memory in the Java runtime
> * freeing the segment on each iteration is equally expensive - one native call (free), plus, again, some CAS to unreserve memory
> * bulk copy is fast, but again requires a native call
> * all in all, we need 4 native calls per iteration (malloc, memset, copy, free), each adding cost when it comes to state transitions
>
> In other words, JNI has two advantages here: (i) the level of safety provided by JNI is lower (e.g. the runtime doesn't need to track allocated memory, which segments do); and (ii) when we call the JNI-ified strlen function, the malloc, free, and copy happen while we're already in native code, which means fewer state transitions are required.
>
> Note that we can completely eliminate (i), basically by creating restricted segments using `CLinker::allocateMemoryRestricted` (which does a plain malloc). We can also eliminate (ii) by creating *trivial* function descriptors for the calls to malloc/free/strlen, thereby removing the cost associated with state transitions there. Both routes are tested in the benchmark (note that both require some willingness to embrace restricted methods). I have also put together a variant which shows how NativeScope can be used to speed allocation up (which works really well for small strings, and is _not_ restricted).
>
> What are the lessons learned for plain `CLinker::toCString`?
>
> * While the logic is generally fast, all the state transitions and unsafe calls are killing performance in such a tight scenario; it is perhaps worth considering intrinsifying Unsafe::allocateMemory/copyMemory/setMemory/freeMemory.
> * The way to go, performance-wise, is not to rely on the default malloc-based allocation. This is where Panama has a big edge over JNI, whose allocation logic is _fixed_. Proposals such as the one described in [1] will make passing custom allocators to `CLinker::toCString` easier, so that clients can decide which allocation strategy best fits their use case.
>
> [1] - https://inside.java/2021/01/25/memory-access-pulling-all-the-threads/
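
To make the allocation point above concrete, here is a minimal sketch that contrasts a fresh off-heap allocation per string with a bump allocator over one preallocated block. It deliberately uses plain `java.nio` direct buffers as a stand-in and does not use the Foreign Linker API, `CLinker::toCString` or NativeScope; the names `CStringSketch`, `Arena`, `toCStringFresh` and the Java-side `strlen` are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CStringSketch {
    // Strategy 1: a fresh off-heap buffer for every string.
    // Each call pays for allocation, zeroing and a copy.
    static ByteBuffer toCStringFresh(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length + 1);
        buf.put(bytes);
        buf.put((byte) 0);          // NUL terminator
        buf.flip();
        return buf;
    }

    // Strategy 2: one block allocated up front; each string only bumps
    // an offset. Illustrative stand-in for scope-style allocation.
    static final class Arena {
        private final ByteBuffer block;

        Arena(int capacity) {
            block = ByteBuffer.allocateDirect(capacity);
        }

        ByteBuffer toCString(String s) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            int start = block.position();
            block.put(bytes);       // throws BufferOverflowException if the block is full
            block.put((byte) 0);    // NUL terminator
            ByteBuffer view = block.duplicate();
            view.position(start);
            view.limit(block.position());
            return view.slice();    // view covering just this string
        }

        // Releases everything at once by rewinding the offset.
        void reset() {
            block.clear();
        }
    }

    // Java-side stand-in for strlen: counts bytes up to the first NUL.
    static int strlen(ByteBuffer cString) {
        int n = 0;
        while (cString.get(cString.position() + n) != 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Arena arena = new Arena(1024);
        System.out.println(strlen(toCStringFresh("hello")));
        System.out.println(strlen(arena.toCString("hello")));
        arena.reset();
    }
}
```

The second strategy trades per-string allocation and deallocation for a single up-front allocation and a cheap reset, which is the general idea behind letting clients choose the allocator.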
For the record, benchmark results on my machine look as below:
Benchmark                                (size)  Mode  Cnt    Score   Error  Units
StrLenTest.jni_strlen                         5  avgt   30   47.952 ± 0.899  ns/op
StrLenTest.jni_strlen                        20  avgt   30   61.918 ± 2.668  ns/op
StrLenTest.jni_strlen                       100  avgt   30  135.449 ± 1.454  ns/op
StrLenTest.panama_strlen                      5  avgt   30  115.883 ± 2.194  ns/op
StrLenTest.panama_strlen                     20  avgt   30  114.238 ± 1.896  ns/op
StrLenTest.panama_strlen                    100  avgt   30  133.208 ± 2.056  ns/op
StrLenTest.panama_strlen_scope                5  avgt   30   34.467 ± 0.596  ns/op
StrLenTest.panama_strlen_scope               20  avgt   30   50.872 ± 2.357  ns/op
StrLenTest.panama_strlen_scope              100  avgt   29   89.604 ± 3.675  ns/op
StrLenTest.panama_strlen_unsafe               5  avgt   30   52.222 ± 3.626  ns/op
StrLenTest.panama_strlen_unsafe              20  avgt   30   55.937 ± 2.125  ns/op
StrLenTest.panama_strlen_unsafe             100  avgt   30   67.084 ± 0.671  ns/op
StrLenTest.panama_strlen_unsafe_trivial       5  avgt   30   29.762 ± 0.443  ns/op
StrLenTest.panama_strlen_unsafe_trivial      20  avgt   30   36.156 ± 0.369  ns/op
StrLenTest.panama_strlen_unsafe_trivial     100  avgt   30   59.830 ± 4.222  ns/op
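
To read the results above relative to JNI, the small sketch below divides the JNI score by each variant's score for the same string size, so a value above 1 means the variant is faster than JNI. The scores are copied from the table; the class and method names are hypothetical, and the error margins are ignored, so ratios close to 1 should not be over-interpreted.

```java
public class StrLenRatios {
    // Average times in ns/op, copied from the results above, for sizes 5, 20, 100.
    static final int[] SIZES = {5, 20, 100};
    static final double[] JNI = {47.952, 61.918, 135.449};
    static final double[] PANAMA = {115.883, 114.238, 133.208};
    static final double[] SCOPE = {34.467, 50.872, 89.604};
    static final double[] UNSAFE = {52.222, 55.937, 67.084};
    static final double[] UNSAFE_TRIVIAL = {29.762, 36.156, 59.830};

    // Ratio of JNI time to variant time; above 1 means the variant beats JNI.
    static double speedupOverJni(double[] variant, int sizeIndex) {
        return JNI[sizeIndex] / variant[sizeIndex];
    }

    static void report(String name, double[] variant) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("%-28s size=%3d  speedup over JNI: %.2fx%n",
                    name, SIZES[i], speedupOverJni(variant, i));
        }
    }

    public static void main(String[] args) {
        report("panama_strlen", PANAMA);
        report("panama_strlen_scope", SCOPE);
        report("panama_strlen_unsafe", UNSAFE);
        report("panama_strlen_unsafe_trivial", UNSAFE_TRIVIAL);
    }
}
```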
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/454