[foreign] some JMH benchmarks
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Sep 18 09:43:43 UTC 2018
No discernible difference with static (in fact getpid was already using
static, but I had forgotten to do the same for exp). I would expect the
JIT to optimize that anyway, given that the receiver was always the same
value, created once at the beginning of the class' lifecycle.
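(By "static" here I mean declaring the native method as
"native static double exp(double base);" rather than
"native double exp(double base);", as in the GetPid class quoted below.)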
Maurizio
On 18/09/18 01:51, Samuel Audet wrote:
> Thanks! Also, the native declaration for exp() isn't "static". That
> might impose a large overhead...
>
> Tue, Sep 18, 2018, 0:59 Maurizio Cimadamore <maurizio.cimadamore at oracle.com>:
>
> For the record, here's what I get for all three benchmarks if I
> compile the JNI code with -O3:
>
> Benchmark                          Mode  Cnt         Score         Error  Units
> PanamaBenchmark.testJNIExp        thrpt    5  28575269.294 ± 1907726.710  ops/s
> PanamaBenchmark.testJNIJavaQsort  thrpt    5    372148.433 ±   27178.529  ops/s
> PanamaBenchmark.testJNIPid        thrpt    5  59240069.011 ±  403881.697  ops/s
>
> The first and second benchmarks get faster and very close to the
> 'direct' optimization numbers in [1]. Surprisingly, the last benchmark
> (getpid) is quite a bit slower. I've been able to reproduce this across
> multiple runs; for that benchmark, omitting -O3 seems to achieve the best
> results, not sure why. It starts off faster in the first couple of warmup
> iterations, but then it goes slower in all the other iterations -
> presumably it interacts badly with the C2 generated code. For instance,
> this is a run with -O3 enabled:
>
> # Run progress: 66.67% complete, ETA 00:01:40
> # Fork: 1 of 1
> # Warmup Iteration 1: 65182202.653 ops/s
> # Warmup Iteration 2: 64900639.094 ops/s
> # Warmup Iteration 3: 59314945.437 ops/s   <---------------------------------
> # Warmup Iteration 4: 59269007.877 ops/s
> # Warmup Iteration 5: 59239905.163 ops/s
> Iteration 1: 59300748.074 ops/s
> Iteration 2: 59249666.044 ops/s
> Iteration 3: 59268597.051 ops/s
> Iteration 4: 59322074.572 ops/s
> Iteration 5: 59059259.317 ops/s
>
> And this is a run with -O3 disabled:
>
> # Run progress: 0.00% complete, ETA 00:01:40
> # Fork: 1 of 1
> # Warmup Iteration 1: 55882128.787 ops/s
> # Warmup Iteration 2: 53102361.751 ops/s
> # Warmup Iteration 3: 66964755.699 ops/s   <---------------------------------
> # Warmup Iteration 4: 66414428.355 ops/s
> # Warmup Iteration 5: 65328475.276 ops/s
> Iteration 1: 64229192.993 ops/s
> Iteration 2: 65191719.319 ops/s
> Iteration 3: 65352022.471 ops/s
> Iteration 4: 65152090.426 ops/s
> Iteration 5: 65320545.712 ops/s
>
>
> In both cases, the 3rd warmup iteration sees a performance jump - with
> -O3 the jump is backwards, without -O3 the jump is forward, which is
> quite typical for a JMH benchmark as the C2 optimizations start to kick in.
>
> For these reasons, I'm reluctant to update my benchmark numbers to
> reflect the -O3 behavior (although I agree that, since the HotSpot code
> is compiled with that optimization, it would make more sense to use that
> as a reference).
>
> Maurizio
>
> [1] -
> http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
>
>
>
> On 17/09/18 16:18, Maurizio Cimadamore wrote:
> >
> >
> > On 17/09/18 15:08, Samuel Audet wrote:
> >> Yes, neither the blackhole nor the random number makes any
> >> difference,
> >> but not calling gcc with -O3 does. Running the compiler with
> >> optimizations on is pretty common, but they are not enabled by
> default.
> > A bit better:
> >
> > PanamaBenchmark.testMethod  thrpt    5  28018170.076 ± 8491668.248  ops/s
> >
> > But not much of a difference (I did not expect much, as the body of
> > the native method is extremely simple).
> >
> > Maurizio
> >>
> >> Samuel
> >>
> >>
> >> Mon, Sep 17, 2018, 21:37 Maurizio Cimadamore
> >> <maurizio.cimadamore at oracle.com>:
> >>
> >> Hi Samuel,
> >> I was planning to upload the benchmark IDE project in the near
> >> future (I need to clean it up a bit, so that it can be opened with ease).
> >>
> >> My getpid example is like this; this is the Java decl:
> >>
> >> public class GetPid {
> >>
> >>     static {
> >>         System.loadLibrary("getpid");
> >>     }
> >>
> >>     native static long getpid();
> >>
> >>     native double exp(double base);
> >> }
> >>
> >> This is the JNI code:
> >>
> >> #include <jni.h>
> >> #include <unistd.h>   /* getpid */
> >> #include <math.h>     /* exp */
> >>
> >> JNIEXPORT jlong JNICALL Java_org_panama_GetPid_getpid
> >>   (JNIEnv *env, jobject recv) {
> >>     return getpid();
> >> }
> >>
> >> JNIEXPORT jdouble JNICALL Java_org_panama_GetPid_exp
> >>   (JNIEnv *env, jobject recv, jdouble arg) {
> >>     return exp(arg);
> >> }
> >>
> >> And this is the benchmark:
> >>
> >> class PanamaBenchmark {
> >>     static GetPid pidlib = new GetPid();
> >>
> >>     @Benchmark
> >>     public long testJNIPid() {
> >>         return pidlib.getpid();
> >>     }
> >>
> >>     @Benchmark
> >>     public double testJNIExp() {
> >>         return pidlib.exp(10d);
> >>     }
> >> }
> >>
> >>
> >> I think this should be rather standard?
> >>
> >> I'm on Ubuntu 16.04.1, and using GCC 5.4.0. The command I use to
> >> compile the C lib is this:
> >>
> >> gcc -I<path to jni.h> -l<path to jni lib> -shared -o libgetpid.so
> >> -fPIC GetPid.c
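> >>
> >> (The -O3 runs discussed earlier in this thread would presumably just
> >> use the same command line with that flag added, i.e.:
> >> gcc -O3 -I<path to jni.h> -l<path to jni lib> -shared -o libgetpid.so
> >> -fPIC GetPid.c)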
> >>
> >> One difference I see between our two examples is the use of Blackhole.
> >> In my bench, I'm just returning the result of the call to 'exp' - which
> >> should be equivalent and, actually, preferred, as described here:
> >>
> >>
> http://hg.openjdk.java.net/code-tools/jmh/file/3769055ad883/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java#l51
> >>
> >> Another minor difference I see is that I pass a constant
> argument,
> >> while
> >> you generate a random number on each iteration.
> >>
> >> I tried to cut and paste your benchmark and I got this:
> >>
> >> Benchmark                   Mode  Cnt         Score         Error  Units
> >> PanamaBenchmark.testMethod  thrpt    5  26362701.827 ± 1357012.981  ops/s
> >>
> >>
> >> Which looks exactly the same as what I've got. So, for whatever
> >> reason, my machine seems to be slower than the one you are using. For
> >> what it's worth, this website [1] seems to confirm the difference.
> >> While clock speeds are similar, your machine has more GHz in Turbo
> >> Boost mode and it's 3-4 years newer than mine, so I'd expect that to
> >> make a difference in terms of internal optimizations etc. Note that I'm
> >> able to beat the numbers of my workstation using my laptop, which
> >> sports a slightly higher frequency and only has 2 cores and 8G of RAM.
> >>
> >> [1] -
> >>
> https://www.cpubenchmark.net/compare/Intel-Xeon-E5-2673-v4-vs-Intel-Xeon-E5-2665/2888vs1439
> >>
> >> Maurizio
> >>
> >>
> >>
> >> On 17/09/18 11:00, Samuel Audet wrote:
> >> > Thanks for the figures Maurizio! It's finally good to be
> >> speaking in
> >> > numbers. :)
> >> >
> >> > However, you're not providing a lot of details about how you
> >> actually
> >> > ran the experiments. So I've decided to run a JMH
> benchmark on
> >> what we
> >> > get by default with JavaCPP and this declaration:
> >> >
> >> > @Platform(include = "math.h")
> >> > public class MyBenchmark {
> >> >     static { Loader.load(); }
> >> >
> >> >     @NoException
> >> >     public static native double exp(double x);
> >> >
> >> >     @State(Scope.Thread)
> >> >     public static class MyState {
> >> >         double x;
> >> >
> >> >         @Setup(Level.Iteration)
> >> >         public void setupMethod() {
> >> >             x = Math.random();
> >> >         }
> >> >     }
> >> >
> >> >     @Benchmark
> >> >     public void testMethod(MyState s, Blackhole bh) {
> >> >         bh.consume(exp(s.x));
> >> >     }
> >> > }
> >> >
> >> > The relevant portion of generated JNI looks like this:
> >> >
> >> > JNIEXPORT jdouble JNICALL Java_org_sample_MyBenchmark_exp(JNIEnv*
> >> >         env, jclass cls, jdouble arg0) {
> >> >     jdouble rarg = 0;
> >> >     double rval = exp(arg0);
> >> >     rarg = (jdouble)rval;
> >> >     return rarg;
> >> > }
> >> >
> >> > And with access to just 2 virtual cores of an Intel(R)
> Xeon(R) CPU
> >> > E5-2673 v4 @ 2.30GHz and 8 GB of RAM on the cloud (so
> probably
> >> slower
> >> > than your E5-2665 @ 2.40GHz) running Ubuntu 14.04 with
> GCC 4.9 and
> >> > OpenJDK 8, I get these numbers:
> >> > Benchmark                Mode  Cnt         Score       Error  Units
> >> > MyBenchmark.testMethod  thrpt   25  37183556.094 ± 460795.746  ops/s
> >> >
> >> > I'm not sure how that compares with your numbers exactly,
> but it
> >> does
> >> > seem to me that what you get for JNI is a bit low. If you
> could
> >> > provide more details about how to reproduce your results,
> that
> >> would
> >> > be great.
> >> >
> >> > Samuel
> >> >
> >> >
> >> > On 09/14/2018 10:19 PM, Maurizio Cimadamore wrote:
> >> >> Hi,
> >> >> over the last few weeks I've been busy playing with
> Panama and
> >> >> assessing performances with JMH. For those just
> interested in raw
> >> >> numbers, the results of my explorations can be found
> here [1].
> >> >> But as with all benchmarks, I think it's better to spend a few
> >> >> words to understand
> >> >> what these numbers actually _mean_.
> >> >>
> >> >> To evaluate the performance of Panama I have first created a
> >> >> baseline using JNI - more specifically I wanted to assess the
> >> >> performance of three calls (all part of the C std library),
> >> >> namely `getpid`, `exp` and `qsort`.
> >> >>
> >> >> The first example is the de facto benchmark for FFIs - since it
> >> >> does relatively little computation, it is a good test to measure
> >> >> the 'latency' of the FFI approach (e.g. how long does it take to
> >> >> go to native). The second example is also relatively simple, but
> >> >> this time the function takes a double argument. The third test is
> >> >> akin to an FFI torture test, since not only does it pass
> >> >> substantially more arguments (4), but one of these arguments is
> >> >> also a callback - a pointer to a function that is used to sort the
> >> >> contents of the input array.
> >> >>
> >> >> As expected, the first batch of JNI results confirms our
> >> >> expectations: `getpid` is the fastest, followed by
> `exp`, and
> >> then
> >> >> followed by `qsort`. Note that qsort is not even close in
> >> terms of
> >> >> raw numbers to the other two tests - that's because, to
> sort the
> >> >> array we need to do (N * log N) upcalls into Java. In the
> >> benchmark,
> >> >> N = 8 and we do the upcalls using the JNI function
> >> >> JNIEnv::CallIntMethod.
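> >> >>
> >> >> For illustration, the JNI side of such a qsort binding might look
> >> >> roughly like this (names and signatures are made up for this
> >> >> sketch, not the actual benchmark code):
> >> >>
> >> >> #include <jni.h>
> >> >> #include <stdlib.h>
> >> >>
> >> >> /* qsort's comparator takes no user data, so stash env/callback in
> >> >>    statics for the duration of the call (single-threaded sketch). */
> >> >> static JNIEnv   *cb_env;
> >> >> static jobject   cb_cmp;
> >> >> static jmethodID cb_mid;   /* int compare(int, int) on the Java side */
> >> >>
> >> >> static int call_java_compare(const void *a, const void *b) {
> >> >>     jint x = *(const jint *) a;
> >> >>     jint y = *(const jint *) b;
> >> >>     /* one upcall into Java per comparison */
> >> >>     return (*cb_env)->CallIntMethod(cb_env, cb_cmp, cb_mid, x, y);
> >> >> }
> >> >>
> >> >> JNIEXPORT void JNICALL Java_org_panama_Qsort_sort
> >> >>   (JNIEnv *env, jobject recv, jintArray arr, jobject comparator) {
> >> >>     cb_env = env;
> >> >>     cb_cmp = comparator;
> >> >>     cb_mid = (*env)->GetMethodID(env,
> >> >>                  (*env)->GetObjectClass(env, comparator),
> >> >>                  "compare", "(II)I");
> >> >>
> >> >>     jint *elems = (*env)->GetIntArrayElements(env, arr, NULL);
> >> >>     jsize len   = (*env)->GetArrayLength(env, arr);
> >> >>     qsort(elems, len, sizeof(jint), call_java_compare);
> >> >>     (*env)->ReleaseIntArrayElements(env, arr, elems, 0);
> >> >> }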
> >> >>
> >> >> Now let's examine the second batch of results; these call
> >> `getpid`,
> >> >> `exp` and `qsort` using Panama. The numbers here are
> considerably
> >> >> lower than the JNI ones for all three benchmarks - although
> >> the
> >> >> first two seem to be the most problematic. To explain these
> >> results
> >> >> we need to peek under the hood. Panama implements
> foreign calls
> >> >> through a so-called 'universal adapter' which, given a calling
> >> >> scheme and a bunch of arguments (machine words), shuffles these
> >> >> arguments into the right registers/stack slots and then jumps to the target
> >> native
> >> >> function - after which another round of adaptation must be
> >> performed
> >> >> (e.g. to recover the return value from the right
> register/memory
> >> >> location).
> >> >>
> >> >> Needless to say, all this generality comes at a cost -
> some of
> >> the
> >> >> cost is in Java - e.g. all arguments have to be packaged up
> >> into a
> >> >> long array (although this component doesn't seem to show up
> >> much in
> >> >> the generated JVM compiled code). A lot of the cost is
> in the
> >> adapter
> >> >> logic itself - which has to look at the 'call recipe'
> and move
> >> >> arguments around accordingly - more specifically, in
> order to
> >> call
> >> >> the native function, the adapter creates a bunch of
> helper C++
> >> >> objects and structs which model the CPU state (e.g. in the
> >> >> ShuffleDowncallContext struct, we find a field for each
> >> register to
> >> >> be modeled in the target architecture). The adapter has to
> >> first move
> >> >> the values coming from the Java world (stored in the
> >> aforementioned
> >> >> long array) into the right context fields (and it needs
> to do
> >> so by
> >> >> looking at the recipe, which involves iteration over the
> recipe
> >> >> elements). After that's done, we can jump into the assembly
> >> stub that
> >> >> does the native call - this stub will take as input one of those
> >> >> ShuffleDowncallContext structures and will load the
> corresponding
> >> >> registers/create necessary stack slots ahead of the call.
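> >> >>
> >> >> To give a rough idea of what such a context looks like, here is a
> >> >> purely illustrative sketch (this is NOT the actual hotspot struct;
> >> >> field names and layout are invented), assuming x86-64 System V:
> >> >>
> >> >> struct ShuffleDowncallContext_sketch {
> >> >>     long   int_args[6];   /* rdi, rsi, rdx, rcx, r8, r9             */
> >> >>     double fp_args[8];    /* xmm0 .. xmm7                           */
> >> >>     long  *stack_args;    /* arguments spilled to the stack         */
> >> >>     long   ret_int;       /* integer/pointer return value           */
> >> >>     double ret_fp;        /* floating-point return value            */
> >> >>     void  *target;        /* address of the native function to call */
> >> >> };
> >> >>
> >> >> The recipe tells the adapter which slot of the incoming long array
> >> >> goes into which field; the assembly stub then loads the registers
> >> >> from the context and jumps to 'target'.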
> >> >>
> >> >> As you can see, there's quite a lot of action going on here,
> >> and this
> >> >> explains the benchmark numbers; of course, if you are
> calling a
> >> >> native function that does a lot of computation, this
> adaptation
> >> cost
> >> >> will wash out - but for relatively quick calls such as
> 'getpid'
> >> and
> >> >> 'exp' the latency dominates the picture.
> >> >>
> >> >> Digression: the callback logic suffers pretty much from
> the same
> >> >> issues, albeit in a reversed order - this time it's the
> Java code
> >> >> which receives a 'snapshot' of the register values from a
> >> generated
> >> >> assembly adapter; the Java code can then read such values
> >> (using the
> >> >> Pointer API), turn them into Java objects, call the
> target Java
> >> >> method and store the results (after another conversion)
> in the
> >> right
> >> >> location of the snapshot. The assembly adapter will then
> pick
> >> up the
> >> >> values set onto the snapshot by the Java code, store
> them into
> >> the
> >> >> corresponding registers and return control to the native
> >> callee. In
> >> >> the remainder of this email we will not discuss callbacks in
> >> details
> >> >> - we will just posit that for any optimization technique
> that
> >> can be
> >> >> defined, there exists a _dual_ strategy that works with
> >> callbacks.
> >> >>
> >> >> How can we make sensible native calls go faster? Well, one
> >> obvious
> >> >> way would be to optimize the universal adapter so that
> we get a
> >> >> specialized assembly stub for each code shape. If we do
> that,
> >> we can
> >> >> move pretty much all of the computation described above from
> >> >> execution time to the stub generation time, so that, by the
> >> time we
> >> >> have to call the native function, we just have to
> populate the
> >> right
> >> >> registers (the specialized stub knows where to find
> them) and
> >> jump.
> >> >> While this sounds like a good approach, it feels like there's
> also a
> >> move
> >> >> for the JIT somewhere in there - after all, the JVM
> knows which
> >> calls
> >> >> are hot and in need for optimization, so perhaps this
> >> specialization
> >> >> process (some or all of it) could happen dynamically.
> And this is
> >> >> indeed an approach we'd like to aim for in the long run.
> >> >>
> >> >> Now, a few years ago, Vlad put together a patch which now
> lives
> >> in the
> >> >> 'linkToNative' branch [6, 7] - the goal of this patch is to
> >> implement
> >> >> the approach described above: generate a specialized
> assembly
> >> adapter
> >> >> for a given native signature, and then leverage the JIT to
> >> optimize
> >> >> it away, turning the adapter into a bare, direct, native
> method
> >> call.
> >> >> As you can see from the third batch of benchmarks, if we
> tweak
> >> Panama
> >> >> to use the linkToNative machinery, the speed up is really
> >> impressive,
> >> >> and we end up being much faster than JNI (up to 4x for
> getPid).
> >> >>
> >> >> Unfortunately, the technology in the linkToNative branch
> is not
> >> ready
> >> >> for prime time (yet) - first, it doesn't cover some
> useful cases
> >> >> (e.g. varargs, multiple returns via registers, arguments
> >> passed in
> >> >> memory). That is, the logic assumes there's a 1-1 mapping
> >> between a
> >> >> Java signature and the native function to be called -
> and that
> >> the
> >> >> arguments passed from Java will either be longs or doubles.
> >> While we
> >> >> can workaround this limitation and define the necessary
> >> marshalling
> >> >> logic in Java (as I have done to run this benchmark),
> some of the
> >> >> limitations (multiple returns, structs passed by value which
> >> are too
> >> >> big) cannot simply be worked around. But that's fine, we
> can
> >> still
> >> >> have a fast path for those calls which have certain
> >> characteristics
> >> >> and a slow path (through the universal adapter) for all the
> >> other calls.
> >> >>
> >> >> But there's a second and more serious issue lurking: as
> you can
> >> see
> >> >> in the benchmark, I was not able to get the qsort benchmark
> >> running
> >> >> when using the linkToNative backend. The reason is that the
> >> >> linkToNative code is still pretty raw, and it doesn't fully
> >> adhere to
> >> >> the JVM internal conventions - e.g. there are missing
> thread
> >> state
> >> >> transitions which, in the case of upcalls into Java, create
> >> issues
> >> >> when it comes to garbage collection, as the GC cannot
> parse the
> >> >> native stack in the correct way.
> >> >>
> >> >> This means that, while there's a clear shining path ahead of
> >> us, it
> >> >> is simply too early to just use the linkToNative backend
> from
> >> Panama.
> >> >> For this reason, I've been looking into some kind of stopgap
> >> solution
> >> >> - another way of optimizing native calls (and upcalls into
> >> Java) that
> >> >> doesn't require too much VM magic. Now, a crucial
> observation is
> >> >> that, in many native calls, there is indeed a 1-1
> mapping between
> >> >> Java arguments and native arguments (and back, for return
> >> values).
> >> >> That is, we can think of calling a native function as a process
> >> >> that takes a bunch of Java arguments, turns them into native
> >> >> arguments (either doubles or longs), calls the native method and
> >> >> then turns the result back into Java.
> >> >>
> >> >> The mapping between Java arguments and C values is quite
> simple:
> >> >>
> >> >> * primitives: either long or double, depending on
> whether they
> >> >> describe an integral value or a floating point one.
> >> >> * pointers: they convert to a long
> >> >> * callbacks: they also convert to a long
> >> >> * structs: they are recursively decomposed into fields and each
> >> >> field is marshalled separately (assuming the struct is not too
> >> >> big, in which case it is passed in memory)
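> >> >>
> >> >> As a purely illustrative example of the struct rule (types and
> >> >> names here are made up):
> >> >>
> >> >> struct Point {
> >> >>     long   id;      /* integral field -> marshalled as a long   */
> >> >>     double weight;  /* FP field       -> marshalled as a double */
> >> >> };
> >> >>
> >> >> /* A native function taking the struct by value ...              */
> >> >> double scale(struct Point p, double factor);
> >> >>
> >> >> /* ... would be called from Java by passing three values:
> >> >>    p.id as a long, p.weight as a double, factor as a double.     */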
> >> >>
> >> >> So, in principle, we could define a bunch of native entry
> >> points in
> >> >> the VM, one per shape, which take a bunch of longs and
> doubles and
> >> >> call an underlying function with those arguments. For
> instance,
> >> let's
> >> >> consider the case of a native function which is modelled in
> >> Java as:
> >> >>
> >> >> int m(Pointer<Foo>, double)
> >> >>
> >> >> To call this native function we have to first turn the Java
> >> arguments
> >> >> into a (long, double) pair. Then we need to call a
> native adapter
> >> >> that looks like the following:
> >> >>
> >> >> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
> >> >>                            jlong arg0, jdouble arg1) {
> >> >>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
> >> >> }
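> >> >>
> >> >> (In the `int m(Pointer<Foo>, double)` example above, the Java
> >> >> marshalling code would pass the target function's address as `addr`,
> >> >> the raw address of the Pointer<Foo> as `arg0` and the double as
> >> >> `arg1`, and then narrow the returned jlong back to an int.)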
> >> >>
> >> >> And this will take care of calling the native function and
> >> returning
> >> >> the value back. This is, admittedly, a very simple
> solution; of
> >> >> course there are limitations: we have to define a bunch of
> >> >> specialized native entry points (and Java entry points, for
> >> >> callbacks). But here we can play a trick: most modern ABIs pass
> >> >> arguments in registers; for instance the System V ABI [5] uses up
> >> >> to 6 (!!) integer registers and 8 (!!) XMM registers for FP values
> >> >> - this gives us a total of 14 registers available for argument passing.
> >> >> Which covers quite a lot of cases. Now, if we have a
> call where
> >> _all_
> >> >> arguments are passed in registers, then the order in
> which these
> >> >> arguments are declared in the adapter doesn't matter!
> That is,
> >> since
> >> >> FP-values will always be passed in different registers from
> >> integral
> >> >> values, we can just define entry points which look like
> these:
> >> >>
> >> >> invokeNative_V_DDDDD
> >> >> invokeNative_V_JDDDD
> >> >> invokeNative_V_JJDDD
> >> >> invokeNative_V_JJJDD
> >> >> invokeNative_V_JJJJD
> >> >> invokeNative_V_JJJJJ
> >> >>
> >> >> That is, for a given arity (5 in this case), we can just put
> >> all long
> >> >> arguments in front, and the double arguments after that.
> That
> >> is, we
> >> >> don't need to generate all possible permutations of J/D
> in all
> >> >> positions - as the adapter will always do the same thing
> (read:
> >> load
> >> >> from same registers) for all equivalent combinations. This
> >> keeps the
> >> >> number of entry points in check - and it also poses some
> >> challenges
> >> >> to the Java logic in charge of marshalling/unmarshalling, as
> >> there's
> >> >> an extra permutation step involved (although that is not
> >> something
> >> >> super-hard to address).
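> >> >>
> >> >> For instance, a hypothetical three-argument entry point in the same
> >> >> style as the adapter above could serve both a (long, double, long)
> >> >> and a (long, long, double) native signature, as long as the Java
> >> >> side permutes the arguments so that the longs come first:
> >> >>
> >> >> jlong NI_invokeNative_J_JJD(JNIEnv *env, jobject _unused, jlong addr,
> >> >>                             jlong arg0, jlong arg1, jdouble arg2) {
> >> >>     return ((jlong (*)(jlong, jlong, jdouble))addr)(arg0, arg1, arg2);
> >> >> }
> >> >>
> >> >> (On System V the integer arguments land in rdi/rsi/... and the FP
> >> >> argument in xmm0 regardless of how they are interleaved in the
> >> >> original signature, so the same stub loads the right registers for
> >> >> both shapes.)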
> >> >>
> >> >> You can see the performance numbers associated with this
> >> invocation
> >> >> scheme (which I've dubbed 'direct') in the 4th batch of the
> >> benchmark
> >> >> results. These numbers are on par with (and slightly better than)
> >> >> JNI in all three cases considered, which is, I think, a very
> positive
> >> >> result, given that to write these benchmarks I did not
> have to
> >> write
> >> >> a single line of JNI code. In other words, this
> optimization
> >> gives
> >> >> you the same speed as JNI, with improved ease of use (**).
> >> >>
> >> >> Now, since the 'direct' optimization builds on top of the VM
> >> native
> >> >> call adapters, this approach is significantly more
> robust than
> >> >> linkToNative and I have not run into any weird VM
> crashes when
> >> >> playing with it. The downside is that, for obvious
> >> reasons,
> >> >> this approach cannot get much faster than JNI - that is, it
> >> cannot
> >> >> get close to the numbers obtained with the linkToNative
> backend,
> >> >> which features much deeper optimizations. But I think that,
> >> despite
> >> >> its limitations, it's still a good opportunistic improvement
> >> that is
> >> >> worth pursuing in the short term (while we sort out the
> >> linkToNative
> >> >> story). For this reason, I will soon be submitting a
> review which
> >> >> incorporates the changes for the 'direct' invocation
> schemes.
> >> >>
> >> >> Cheers
> >> >> Maurizio
> >> >>
> >> >> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
> >> >> [2] - https://github.com/jnr/jnr-ffi
> >> >> [3] - https://github.com/jnr/jffi
> >> >> [4] - https://sourceware.org/libffi/
> >> >> [5] - https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
> >> >>
> >> >> [6] - http://cr.openjdk.java.net/~jrose/panama/native-call-primitive.html
> >> >> [7] - http://hg.openjdk.java.net/panama/dev/shortlog/b9ebb1bb8354
> >> >>
> >> >> (**) the benchmark also contains a 5th row in which I repeated the
> >> >> same tests, this time using JNR [2]. JNR is built on top of
> libjffi
> >> [3], a
> >> >> JNI library in turn built on top of the popular libffi
> [4]. I
> >> wanted
> >> >> to have some numbers about JNR because that's another
> solution
> >> that
> >> >> allows for better ease of use, taking care of
> marshalling Java
> >> values
> >> >> into C and back; since the goals of JNR are similar in
> spirit
> >> with
> >> >> some of the goals of the Panama/foreign work, I thought it
> >> would be
> >> >> worth having a comparison of these approaches. For the
> record, I
> >> >> think the JNR numbers are very respectable given that
> JNR had
> >> to do
> >> >> all the hard work outside of the JDK!
> >> >>
> >> >>
> >> >
> >>
> >
>