[foreign-memaccess+abi] RFR: Add a benchmark for strlen using Foreign Linker API

Jorn Vernee jvernee at openjdk.java.net
Wed Feb 17 16:47:50 UTC 2021


On Wed, 17 Feb 2021 15:17:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> I've been spending some time looking into this issue:
> 
> https://bugs.openjdk.java.net/browse/JDK-8261828
> 
> And, to better understand the problem, I put together a hopefully comprehensive benchmark of the strlen function; it turns out that the strlen call itself is fast, and it's the conversion from Java to native string where the benchmark spends most of its time.
> 
> While playing with the benchmark, I came up with alternative ways to do this conversion which greatly speed up the benchmark results, even surpassing (at least on my machine) what's possible with JNI. Jorn and I think that, for future reference, it would be a good idea to include this benchmark in our suite.
> 
> For the curious, there are many factors which make the default `CLinker::toCString` go slower than expected: 
> 
> * allocating a fresh segment on each iteration is expensive, because it takes two native calls (malloc, memset), plus a bunch of CAS operations to reserve memory in the Java runtime
> * freeing the segment on each iteration is equally expensive - one native call (free), plus, again, some CAS operations to unreserve memory
> * bulk copy is fast, but again requires a native call
> * all in all, we need 4 native calls per iteration (malloc, memset, copy, free) each adding cost when it comes to state transitions
> 
> In other words, the advantage of JNI here is that (i) the level of safety provided by JNI is lower (for example, the runtime doesn't need to track allocated memory, which segments do), and (ii) when we call the JNI-ified strlen function, the malloc, free, and copy happen while we're already in native code, which means fewer state transitions are required.
> 
> Note that we can completely eliminate (i) by creating restricted segments using CLinker::allocateMemoryRestricted (which does a plain malloc). We can also eliminate (ii) by creating *trivial* function descriptors for the calls to malloc/free/strlen, thereby removing the cost associated with state transitions there. Both routes are tested in the benchmark (note that both require some willingness to embrace restricted methods). I have also put together a variant which shows how NativeScope can be used to speed allocation up (which works really well for small strings, and is _not_ restricted).
> 
> What are the lessons learned for plain `CLinker::toCString`?
> 
> * While the logic is generally fast, the state transitions and unsafe calls are killing performance in such a tight scenario; it is perhaps worth considering intrinsifying Unsafe::allocateMemory/copyMemory/setMemory/freeMemory.
> * The way to go, performance-wise, is not to rely on the default malloc-based allocation. This is where Panama has a big edge over JNI, whose allocation logic is _fixed_. Proposals such as the one described in [1] will make passing custom allocators to `CLinker::toCString` easier, so that clients can decide which allocation strategy best fits their use case.
> 
> [1] - https://inside.java/2021/01/25/memory-access-pulling-all-the-threads/
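
To make the four native calls listed above concrete, here is a minimal C sketch of the same work done in one place. The helper name `strlen_via_fresh_copy` is hypothetical and this is not how `CLinker::toCString` is implemented; it only mirrors the malloc, memset, copy, free sequence from the description. When all of this runs inside a single native function (as in the JNI case), no extra Java-to-native transitions are paid for the individual steps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a fresh zeroed buffer, copy n bytes of
 * src into it, measure it with strlen, then free it again.
 * Returns (size_t)-1 if the allocation fails. */
static size_t strlen_via_fresh_copy(const char *src, size_t n) {
    assert(src != NULL);
    char *buf = malloc(n + 1);   /* native call 1: allocate */
    if (buf == NULL) {
        return (size_t) -1;
    }
    memset(buf, 0, n + 1);       /* native call 2: zero (also writes the terminator) */
    memcpy(buf, src, n);         /* native call 3: bulk copy */
    size_t len = strlen(buf);    /* the call the benchmark is actually about */
    free(buf);                   /* native call 4: release */
    return len;
}
```

Calling `strlen_via_fresh_copy("hello", 5)` walks through all four steps once per invocation, which is the per-iteration overhead the benchmark measures on top of strlen itself.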

Looks good, besides a warning in the native code (see inline comment).

Numbers on my Windows machine look like this:

Benchmark                                (size)  Mode  Cnt    Score     Error  Units
StrLenTest.jni_strlen                         5  avgt   30  158.912 ±  11.886  ns/op
StrLenTest.jni_strlen                        20  avgt   30  206.928 ±   3.980  ns/op
StrLenTest.jni_strlen                       100  avgt   30  405.025 ±  10.837  ns/op
StrLenTest.panama_strlen                      5  avgt   30  211.031 ±   3.851  ns/op
StrLenTest.panama_strlen                     20  avgt   30  234.639 ±  27.133  ns/op
StrLenTest.panama_strlen                    100  avgt   30  235.827 ±   4.062  ns/op
StrLenTest.panama_strlen_scope                5  avgt   30   65.311 ±   4.032  ns/op
StrLenTest.panama_strlen_scope               20  avgt   30   96.457 ±   6.595  ns/op
StrLenTest.panama_strlen_scope              100  avgt   24  157.490 ±  24.683  ns/op
StrLenTest.panama_strlen_unsafe               5  avgt   30  135.766 ±   1.252  ns/op
StrLenTest.panama_strlen_unsafe              20  avgt   30  154.713 ±   4.510  ns/op
StrLenTest.panama_strlen_unsafe             100  avgt   30  152.058 ±   2.003  ns/op
StrLenTest.panama_strlen_unsafe_trivial       5  avgt   30  119.331 ±   6.429  ns/op
StrLenTest.panama_strlen_unsafe_trivial      20  avgt   30  127.749 ±   1.932  ns/op
StrLenTest.panama_strlen_unsafe_trivial     100  avgt   30  129.092 ±   0.967  ns/op

(timings seem a bit jittery)

test/micro/org/openjdk/bench/jdk/incubator/foreign/libStrLen.c line 32:

> 30: JNIEXPORT jint JNICALL Java_org_openjdk_bench_jdk_incubator_foreign_StrLenTest_strlen(JNIEnv *const env, const jclass cls, const jstring text) {
> 31:     const char *str = (*env)->GetStringUTFChars(env, text, NULL);
> 32:     int len = strlen(str);

I'm seeing a warning here: `conversion from 'size_t' to 'int', possible loss of data` which makes the compilation fail. An explicit cast to `int` fixes that.
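
As a standalone illustration of that fix (a sketch only, not the actual patch to libStrLen.c), the narrowing can be made explicit like this. The helper name `strlen_as_int` is hypothetical, and the `INT_MAX` clamp is an extra precaution I'm assuming here; a plain `(int)` cast on the `strlen` result is enough to silence the warning.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: returns the length of str as an int.
 * strlen returns size_t, so the conversion to int is written out
 * explicitly instead of being left implicit. */
static int strlen_as_int(const char *str) {
    assert(str != NULL);
    size_t len = strlen(str);
    /* Assumed precaution: clamp instead of silently truncating
     * for strings longer than INT_MAX bytes. */
    if (len > (size_t) INT_MAX) {
        return INT_MAX;
    }
    return (int) len;   /* explicit cast: no implicit size_t -> int conversion */
}
```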

-------------

Marked as reviewed by jvernee (Committer).

PR: https://git.openjdk.java.net/panama-foreign/pull/454

