[foreign] some JMH benchmarks

Samuel Audet samuel.audet at gmail.com
Tue Sep 18 00:51:33 UTC 2018


Thanks! Also, the native declaration for exp() isn't "static". That might
impose a large overhead...
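
For reference, a static declaration (just a sketch based on the GetPid class
quoted below) would look like this; with a static native method, the JNI
function receives a jclass instead of a jobject receiver:

    // sketch: make exp static, like getpid, so no object receiver is involved
    native static double exp(double base);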

Tue, Sep 18, 2018, 0:59 Maurizio Cimadamore <maurizio.cimadamore at oracle.com>:

> For the records, here's what I get for all the three benchmarks if I
> compile the JNI code with -O3:
>
> Benchmark                          Mode  Cnt         Score         Error  Units
> PanamaBenchmark.testJNIExp        thrpt    5  28575269.294 ± 1907726.710  ops/s
> PanamaBenchmark.testJNIJavaQsort  thrpt    5    372148.433 ±   27178.529  ops/s
> PanamaBenchmark.testJNIPid        thrpt    5  59240069.011 ±  403881.697  ops/s
>
> The first and second benchmarks get faster and come very close to the
> 'direct' optimization numbers in [1]. Surprisingly, the last benchmark
> (getpid) is quite a bit slower. I've been able to reproduce this across
> multiple runs; for that benchmark, omitting O3 seems to achieve the best
> results, and I'm not sure why. It starts off faster in the first couple of
> warmup iterations, but then slows down in all the other runs - presumably
> it interacts badly with the C2-generated code. For instance, this is a run
> with O3 enabled:
>
> # Run progress: 66.67% complete, ETA 00:01:40
> # Fork: 1 of 1
> # Warmup Iteration   1: 65182202.653 ops/s
> # Warmup Iteration   2: 64900639.094 ops/s
> # Warmup Iteration   3: 59314945.437 ops/s
> <---------------------------------
> # Warmup Iteration   4: 59269007.877 ops/s
> # Warmup Iteration   5: 59239905.163 ops/s
> Iteration   1: 59300748.074 ops/s
> Iteration   2: 59249666.044 ops/s
> Iteration   3: 59268597.051 ops/s
> Iteration   4: 59322074.572 ops/s
> Iteration   5: 59059259.317 ops/s
>
> And this is a run with O3 disabled:
>
> # Run progress: 0.00% complete, ETA 00:01:40
> # Fork: 1 of 1
> # Warmup Iteration   1: 55882128.787 ops/s
> # Warmup Iteration   2: 53102361.751 ops/s
> # Warmup Iteration   3: 66964755.699 ops/s
> <---------------------------------
> # Warmup Iteration   4: 66414428.355 ops/s
> # Warmup Iteration   5: 65328475.276 ops/s
> Iteration   1: 64229192.993 ops/s
> Iteration   2: 65191719.319 ops/s
> Iteration   3: 65352022.471 ops/s
> Iteration   4: 65152090.426 ops/s
> Iteration   5: 65320545.712 ops/s
>
>
> In both cases, the 3rd warmup iteration sees a performance jump - with
> O3 the jump is backwards, while without O3 the jump is forward, which is
> quite typical for a JMH benchmark as C2 optimizations start to kick in.
>
> For these reasons, I'm reluctant to update my benchmark numbers to
> reflect the O3 behavior (although I agree that, since the HotSpot code
> is compiled with that optimization, it would make more sense to use that
> as a reference).
>
> Maurizio
>
> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
>
>
>
> On 17/09/18 16:18, Maurizio Cimadamore wrote:
> >
> >
> > On 17/09/18 15:08, Samuel Audet wrote:
> >> Yes, neither the blackhole nor the random number makes any difference,
> >> but not calling gcc with -O3 does. Running the compiler with
> >> optimizations on is pretty common, but they are not enabled by default.
> > A bit better
> >
> > PanamaBenchmark.testMethod  thrpt    5  28018170.076 ± 8491668.248 ops/s
> >
> > But not much of a difference (I did not expect much, as the body of
> > the native method is extremely simple).
> >
> > Maurizio
> >>
> >> Samuel
> >>
> >>
> >> Mon, Sep 17, 2018, 21:37 Maurizio Cimadamore
> >> <maurizio.cimadamore at oracle.com>:
> >>
> >>     Hi Samuel,
> >>     I was planning to upload the benchmark IDE project in the near
> >>     future (I need to clean it up a bit, so that it can be opened with
> >>     ease).
> >>
> >>     My getpid example is like this; this is the Java decl:
> >>
> >>     public class GetPid {
> >>
> >>          static {
> >>              System.loadLibrary("getpid");
> >>          }
> >>
> >>          native static long getpid();
> >>
> >>          native double exp(double base);
> >>     }
> >>
> >>     This is the JNI code:
> >>
> >>     #include <jni.h>
> >>     #include <math.h>     /* exp */
> >>     #include <unistd.h>   /* getpid */
> >>
> >>     JNIEXPORT jlong JNICALL Java_org_panama_GetPid_getpid
> >>        (JNIEnv *env, jobject recv) {
> >>         return getpid();
> >>     }
> >>
> >>     JNIEXPORT jdouble JNICALL Java_org_panama_GetPid_exp
> >>        (JNIEnv *env, jobject recv, jdouble arg) {
> >>         return exp(arg);
> >>     }
> >>
> >>     And this is the benchmark:
> >>
> >>     import org.openjdk.jmh.annotations.Benchmark;
> >>
> >>     public class PanamaBenchmark {
> >>          static GetPid pidlib = new GetPid();
> >>
> >>          @Benchmark
> >>          public long testJNIPid() {
> >>              return pidlib.getpid();
> >>          }
> >>
> >>          @Benchmark
> >>          public double testJNIExp() {
> >>              return pidlib.exp(10d);
> >>          }
> >>     }
> >>
> >>
> >>     I think this should be rather standard?
> >>
> >>     I'm on Ubuntu 16.04.1, using GCC 5.4.0. The command I use to
> >>     compile the C lib is this:
> >>
> >>     gcc -I<path to jni.h> -l<path to jni lib> -shared -o libgetpid.so
> >>     -fPIC GetPid.c
> >>
> >>     One difference I see between our two examples is the use of
> >>     Blackhole. In my bench, I'm just returning the result of 'exp' -
> >>     which should be equivalent and, actually, preferred, as described
> >>     here:
> >>
> >>
> http://hg.openjdk.java.net/code-tools/jmh/file/3769055ad883/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java#l51
> >>
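> >>     In other words (just a sketch, reusing the pidlib field from the
> >>     benchmark above - not the code of either of our benchmarks), these
> >>     two shapes should be equivalent as far as JMH is concerned:
> >>
> >>         @Benchmark
> >>         public double testReturn() {
> >>             return pidlib.exp(10d);       // return value is blackholed implicitly by JMH
> >>         }
> >>
> >>         @Benchmark
> >>         public void testBlackhole(Blackhole bh) {
> >>             bh.consume(pidlib.exp(10d));  // explicit blackhole, same effect
> >>         }
> >>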
> >>     Another minor difference I see is that I pass a constant argument,
> >>     while
> >>     you generate a random number on each iteration.
> >>
> >>     I tried to cut and paste your benchmark and I got this:
> >>
> >>     Benchmark                    Mode  Cnt         Score         Error  Units
> >>     PanamaBenchmark.testMethod  thrpt    5  26362701.827 ± 1357012.981  ops/s
> >>
> >>
> >>     Which looks exactly the same as what I've got. So, for whatever
> >>     reason, my machine seems to be slower than the one you are using.
> >>     For what it's worth, this website [1] seems to confirm the
> >>     difference. While clock speeds are similar, your machine has a
> >>     higher Turbo Boost frequency and it's 3-4 years newer than mine, so
> >>     I'd expect that to make a difference in terms of internal
> >>     optimizations etc. Note that I'm able to beat the numbers of my
> >>     workstation using my laptop, which sports a slightly higher
> >>     frequency and only has 2 cores and 8G of RAM.
> >>
> >>     [1] -
> >>
> https://www.cpubenchmark.net/compare/Intel-Xeon-E5-2673-v4-vs-Intel-Xeon-E5-2665/2888vs1439
> >>
> >>     Maurizio
> >>
> >>
> >>
> >>     On 17/09/18 11:00, Samuel Audet wrote:
> >>     > Thanks for the figures, Maurizio! It's good to finally be speaking
> >>     > in numbers. :)
> >>     >
> >>     > However, you're not providing a lot of details about how you
> >>     actually
> >>     > ran the experiments. So I've decided to run a JMH benchmark on
> >>     what we
> >>     > get by default with JavaCPP and this declaration:
> >>     >
> >>     >     @Platform(include = "math.h")
> >>     >     public class MyBenchmark {
> >>     >         static { Loader.load(); }
> >>     >
> >>     >         @NoException
> >>     >         public static native double exp(double x);
> >>     >
> >>     >         @State(Scope.Thread)
> >>     >         public static class MyState {
> >>     >             double x;
> >>     >
> >>     >             @Setup(Level.Iteration)
> >>     >             public void setupMethod() {
> >>     >                 x = Math.random();
> >>     >             }
> >>     >         }
> >>     >
> >>     >         @Benchmark
> >>     >         public void testMethod(MyState s, Blackhole bh) {
> >>     >             bh.consume(exp(s.x));
> >>     >         }
> >>     >     }
> >>     >
> >>     > The relevant portion of generated JNI looks like this:
> >>     >
> >>     >     JNIEXPORT jdouble JNICALL
> >>     Java_org_sample_MyBenchmark_exp(JNIEnv*
> >>     > env, jclass cls, jdouble arg0) {
> >>     >         jdouble rarg = 0;
> >>     >         double rval = exp(arg0);
> >>     >         rarg = (jdouble)rval;
> >>     >         return rarg;
> >>     >     }
> >>     >
> >>     > And with access to just 2 virtual cores of an Intel(R) Xeon(R) CPU
> >>     > E5-2673 v4 @ 2.30GHz and 8 GB of RAM on the cloud (so probably
> >>     slower
> >>     > than your E5-2665 @ 2.40GHz) running Ubuntu 14.04 with GCC 4.9 and
> >>     > OpenJDK 8, I get these numbers:
> >>     > Benchmark                Mode  Cnt         Score       Error  Units
> >>     > MyBenchmark.testMethod  thrpt   25  37183556.094 ± 460795.746  ops/s
> >>     >
> >>     > I'm not sure how that compares with your numbers exactly, but it
> >>     does
> >>     > seem to me that what you get for JNI is a bit low. If you could
> >>     > provide more details about how to reproduce your results, that
> >>     would
> >>     > be great.
> >>     >
> >>     > Samuel
> >>     >
> >>     >
> >>     > On 09/14/2018 10:19 PM, Maurizio Cimadamore wrote:
> >>     >> Hi,
> >>     >> over the last few weeks I've been busy playing with Panama and
> >>     >> assessing performance with JMH. For those just interested in raw
> >>     >> numbers, the results of my explorations can be found here [1]. But
> >>     >> as with all benchmarks, I think it's better to spend a few words to
> >>     >> understand what these numbers actually _mean_.
> >>     >>
> >>     >> To evaluate the performance of Panama, I first created a baseline
> >>     >> using JNI - more specifically, I wanted to assess the performance
> >>     >> of three calls (all part of the C std library), namely `getpid`,
> >>     >> `exp` and `qsort`.
> >>     >>
> >>     >> The first example is the de facto benchmark for FFIs - since it
> >>     does
> >>     >> relatively little computation, it is a good test to measure the
> >>     >> 'latency' of the FFI approach (e.g. how long does it take to
> >> go to
> >>     >> native). The second example is also relatively simple, but this
> >>     >> time the function takes a double argument. The third test is akin
> >>     >> to an FFI torture test, since not only does it pass substantially
> >>     >> more arguments (4), but one of these arguments is also a callback
> >>     >> - a pointer to a function that is used to sort the contents of the
> >>     >> input array.
> >>     >>
> >>     >> As expected, the first batch of JNI results shows that `getpid` is
> >>     >> the fastest, followed by `exp`, and then by `qsort`. Note that
> >>     >> qsort is not even close to the other two tests in terms of raw
> >>     >> numbers - that's because, to sort the array, we need to do
> >>     >> (N * log N) upcalls into Java. In the benchmark, N = 8 and we do
> >>     >> the upcalls using the JNI function JNIEnv::CallIntMethod.
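> >>     >>
> >>     >> For illustration, the Java side of such a JNI qsort binding might
> >>     >> look roughly like this (hypothetical names, not the actual
> >>     >> benchmark code):
> >>     >>
> >>     >> public class QsortLib {
> >>     >>      static { System.loadLibrary("qsort"); }
> >>     >>
> >>     >>      // upcalled N*log(N) times from the native comparator, via
> >>     >>      // JNIEnv::CallIntMethod
> >>     >>      int compare(int a, int b) { return Integer.compare(a, b); }
> >>     >>
> >>     >>      // native wrapper around the C qsort, upcalling 'compare' above
> >>     >>      native void qsort(int[] data);
> >>     >> }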
> >>     >>
> >>     >> Now let's examine the second batch of results; these call
> >>     `getpid`,
> >>     >> `exp` and `qsort` using Panama. The numbers here are considerably
> >>     >> lower than the JNI ones for all three benchmarks - although the
> >>     >> first two seem to be the most problematic. To explain these
> >>     results
> >>     >> we need to peek under the hood. Panama implements foreign calls
> >>     >> through a so-called 'universal adapter' which, given a calling
> >>     >> scheme and a bunch of arguments (machine words), shuffles these
> >>     >> arguments into the right registers/stack slots and then jumps to
> >>     >> the target
> >> native
> >>     >> function - after which another round of adaptation must be
> >>     performed
> >>     >> (e.g. to recover the return value from the right register/memory
> >>     >> location).
> >>     >>
> >>     >> Needless to say, all this generality comes at a cost - some of
> >> the
> >>     >> cost is in Java - e.g. all arguments have to be packaged up
> >> into a
> >>     >> long array (although this component doesn't seem to show up
> >>     much in
> >>     >> the generated JVM compiled code). A lot of the cost is in the
> >>     adapter
> >>     >> logic itself - which has to look at the 'call recipe' and move
> >>     >> arguments around accordingly - more specifically, in order to
> >> call
> >>     >> the native function, the adapter creates a bunch of helper C++
> >>     >> objects and structs which model the CPU state (e.g. in the
> >>     >> ShuffleDowncallContext struct, we find a field for each
> >>     register to
> >>     >> be modeled in the target architecture). The adapter has to
> >>     first move
> >>     >> the values coming from the Java world (stored in the
> >>     aforementioned
> >>     >> long array) into the right context fields (and it needs to do
> >>     so by
> >>     >> looking at the recipe, which involves iteration over the recipe
> >>     >> elements). After that's done, we can jump into the assembly
> >>     stub that
> >>     >> does the native call - this stub will take as input one of those
> >>     >> ShuffleDowncallContext structs and will load the corresponding
> >>     >> registers/create necessary stack slots ahead of the call.
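> >>     >>
> >>     >> As a rough illustration of the Java-side packaging step (a
> >>     >> hypothetical helper, not the actual Panama code), for a
> >>     >> (pointer, double) call the long array could be filled like this:
> >>     >>
> >>     >> static long[] packageArgs(long pointerAddr, double x) {
> >>     >>      // each argument is flattened to a machine word; the universal
> >>     >>      // adapter then moves these words into registers/stack slots
> >>     >>      // according to the call recipe
> >>     >>      return new long[] {
> >>     >>          pointerAddr,                   // pointer -> raw address
> >>     >>          Double.doubleToRawLongBits(x)  // double -> raw 64-bit pattern
> >>     >>      };
> >>     >> }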
> >>     >>
> >>     >> As you can see, there's quite a lot of action going on here,
> >>     and this
> >>     >> explains the benchmark numbers; of course, if you are calling a
> >>     >> native function that does a lot of computation, this adaptation
> >>     cost
> >>     >> will wash out - but for relatively quick calls such as 'getpid'
> >>     and
> >>     >> 'exp' the latency dominates the picture.
> >>     >>
> >>     >> Digression: the callback logic suffers pretty much from the same
> >>     >> issues, albeit in a reversed order - this time it's the Java code
> >>     >> which receives a 'snapshot' of the register values from a
> >>     generated
> >>     >> assembly adapter; the Java code can then read such values
> >>     (using the
> >>     >> Pointer API), turn them into Java objects, call the target Java
> >>     >> method and store the results (after another conversion) in the
> >>     right
> >>     >> location of the snapshot. The assembly adapter will then pick
> >>     up the
> >>     >> values set onto the snapshot by the Java code, store them into
> >> the
> >>     >> corresponding registers and return control to the native
> >>     callee. In
> >>     >> the remainder of this email we will not discuss callbacks in
> >>     details
> >>     >> - we will just posit that for any optimization technique that
> >>     can be
> >>     >> defined, there exists a _dual_ strategy that works with
> >> callbacks.
> >>     >>
> >>     >> How can we make sensible native calls go faster? Well, one
> >> obvious
> >>     >> way would be to optimize the universal adapter so that we get a
> >>     >> specialized assembly stub for each code shape. If we do that,
> >>     we can
> >>     >> move pretty much all of the computation described above from
> >>     >> execution time to the stub generation time, so that, by the
> >>     time we
> >>     >> have to call the native function, we just have to populate the
> >>     right
> >>     >> registers (the specialized stub knows where to find them) and
> >>     jump.
> >>     >> While this sounds like a good approach, it feels like there's
> >>     >> also a role
> >>     >> for the JIT somewhere in there - after all, the JVM knows which
> >>     calls
> >>     >> are hot and in need of optimization, so perhaps this
> >>     specialization
> >>     >> process (some or all of it) could happen dynamically. And this is
> >>     >> indeed an approach we'd like to aim for in the long run.
> >>     >>
> >>     >> Now, a few years ago, Vlad put together a patch which now lives
> >>     in the
> >>     >> 'linkToNative' branch [6, 7] - the goal of this patch is to
> >>     implement
> >>     >> the approach described above: generate a specialized assembly
> >>     adapter
> >>     >> for a given native signature, and then leverage the JIT to
> >>     optimize
> >>     >> it away, turning the adapter into a bare, direct, native method
> >>     call.
> >>     >> As you can see from the third batch of benchmarks, if we tweak
> >>     Panama
> >>     >> to use the linkToNative machinery, the speed up is really
> >>     impressive,
> >>     >> and we end up being much faster than JNI (up to 4x for getPid).
> >>     >>
> >>     >> Unfortunately, the technology in the linkToNative branch is not
> >>     >> ready for prime time (yet) - first, it doesn't cover some useful cases
> >>     >> (e.g. varargs, multiple returns via registers, arguments
> >> passed in
> >>     >> memory). That is, the logic assumes there's a 1-1 mapping
> >>     between a
> >>     >> Java signature and the native function to be called - and that
> >> the
> >>     >> arguments passed from Java will either be longs or doubles.
> >>     While we
> >>     >> can work around this limitation and define the necessary
> >>     marshalling
> >>     >> logic in Java (as I have done to run this benchmark), some of the
> >>     >> limitations (multiple returns, structs passed by value which
> >>     are too
> >>     >> big) cannot simply be worked around. But that's fine, we can
> >> still
> >>     >> have a fast path for those calls which have certain
> >>     characteristics
> >>     >> and a slow path (through the universal adapter) for all the
> >>     other calls.
> >>     >>
> >>     >> But there's a second and more serious issue lurking: as you can
> >>     see
> >>     >> in the benchmark, I was not able to get the qsort benchmark
> >>     running
> >>     >> when using the linkToNative backend. The reason is that the
> >>     >> linkToNative code is still pretty raw, and it doesn't fully
> >>     adhere to
> >>     >> the JVM internal conventions - e.g. there are missing thread
> >> state
> >>     >> transitions which, in the case of upcalls into Java, create
> >> issues
> >>     >> when it comes to garbage collection, as the GC cannot parse the
> >>     >> native stack in the correct way.
> >>     >>
> >>     >> This means that, while there's a clear shining path ahead of
> >>     us, it
> >>     >> is simply too early to just use the linkToNative backend from
> >>     Panama.
> >>     >> For this reason, I've been looking into some kind of stopgap
> >>     solution
> >>     >> - another way of optimizing native calls (and upcalls into
> >>     Java) that
> >>     >> doesn't require too much VM magic. Now, a crucial observation is
> >>     >> that, in many native calls, there is indeed a 1-1 mapping between
> >>     >> Java arguments and native arguments (and back, for return
> >> values).
> >>     >> That is, we can think of calling a native function as a process
> >>     >> that takes a bunch of Java arguments, turns them into native
> >>     >> arguments (either doubles or longs), calls the native method and
> >>     >> then turns the result back into Java.
> >>     >>
> >>     >> The mapping between Java arguments and C values is quite simple:
> >>     >>
> >>     >> * primitives: either long or double, depending on whether they
> >>     >> describe an integral value or a floating point one.
> >>     >> * pointers: they convert to a long
> >>     >> * callbacks: they also convert to a long
> >>     >> * structs: they are recursively decomposed into fields and each
> >>     >> field is marshalled separately (assuming the struct is not too big,
> >>     >> in which case it is passed in memory) - see the small sketch below
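> >>     >>
> >>     >> As a small worked example of the struct rule (hypothetical layout,
> >>     >> not actual Panama code): a struct with an int field and a double
> >>     >> field, passed by value, contributes one long and one double to the
> >>     >> native shape:
> >>     >>
> >>     >> // hypothetical: struct Foo { int i; double d; } passed by value is
> >>     >> // decomposed field by field into (long, double)
> >>     >> static Object[] decomposeFoo(int i, double d) {
> >>     >>      return new Object[] { (long) i, d };
> >>     >> }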
> >>     >>
> >>     >> So, in principle, we could define a bunch of native entry
> >>     points in
> >>     >> the VM, one per shape, which take a bunch of longs and doubles and
> >>     >> call an underlying function with those arguments. For instance,
> >>     let's
> >>     >> consider the case of a native function which is modelled in
> >>     Java as:
> >>     >>
> >>     >> int m(Pointer<Foo>, double)
> >>     >>
> >>     >> To call this native function we have to first turn the Java
> >>     arguments
> >>     >> into a (long, double) pair. Then we need to call a native adapter
> >>     >> that looks like the following:
> >>     >>
> >>     >> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong
> >>     addr,
> >>     >> jlong arg0, jdouble arg1) {
> >>     >>      return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
> >>     >> }
> >>     >>
> >>     >> And this will take care of calling the native function and
> >>     returning
> >>     >> the value back. This is, admittedly, a very simple solution; of
> >>     >> course there are limitations: we have to define a bunch of
> >>     >> specialized native entry points (and Java entry points, for
> >>     >> callbacks). But here we can play a trick: most modern ABIs pass
> >>     >> arguments in registers; for instance the System V ABI [5] uses up
> >>     >> to 6 (!!) integer registers and 7 (!!) SSE registers for FP values
> >>     >> - this gives us a total of 13 registers available for argument
> >>     >> passing.
> >>     >> Which covers quite a lot of cases. Now, if we have a call where
> >>     _all_
> >>     >> arguments are passed in registers, then the order in which these
> >>     >> arguments are declared in the adapter doesn't matter! That is,
> >>     since
> >>     >> FP values will always be passed in different registers from
> >>     >> integral values, we can just define entry points which look like
> >>     >> these:
> >>     >>
> >>     >> invokeNative_V_DDDDD
> >>     >> invokeNative_V_JDDDD
> >>     >> invokeNative_V_JJDDD
> >>     >> invokeNative_V_JJJDD
> >>     >> invokeNative_V_JJJJD
> >>     >> invokeNative_V_JJJJJ
> >>     >>
> >>     >> That is, for a given arity (5 in this case), we can just put all
> >>     >> long arguments in front, and the double arguments after that. In
> >>     >> other words, we don't need to generate all possible permutations
> >>     >> of J/D in all positions - the adapter will always do the same thing
> >>     >> (read: load from the same registers) for all equivalent
> >>     >> combinations. This keeps the number of entry points in check - and
> >>     >> it also poses some
> >>     challenges
> >>     >> to the Java logic in charge of marshalling/unmarshalling, as
> >>     there's
> >>     >> an extra permutation step involved (although that is not
> >> something
> >>     >> super-hard to address).
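> >>     >>
> >>     >> A minimal sketch of that extra permutation step (a hypothetical
> >>     >> helper, not the actual marshalling code) could look like this:
> >>     >>
> >>     >> // reorder the marshalled values so that all integral (J) values come
> >>     >> // before all FP (D) values, matching the canonical entry point shape
> >>     >> // for the given arity (e.g. invokeNative_V_JJDDD)
> >>     >> static Object[] permute(Object[] args) {
> >>     >>      Object[] out = new Object[args.length];
> >>     >>      int next = 0;
> >>     >>      for (Object a : args)
> >>     >>          if (a instanceof Long) out[next++] = a;    // J values first
> >>     >>      for (Object a : args)
> >>     >>          if (a instanceof Double) out[next++] = a;  // D values after
> >>     >>      return out;
> >>     >> }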
> >>     >>
> >>     >> You can see the performance numbers associated with this
> >>     invocation
> >>     >> scheme (which I've dubbed 'direct') in the 4th batch of the
> >>     benchmark
> >>     >> results. These numbers are on par with (and slightly better than)
> >>     >> JNI in all three cases considered, which is, I think, a very
> >>     >> positive
> >>     >> result, given that to write these benchmarks I did not have to
> >>     write
> >>     >> a single line of JNI code. In other words, this optimization
> >> gives
> >>     >> you the same speed as JNI, with improved ease of use (**).
> >>     >>
> >>     >> Now, since the 'direct' optimization builds on top of the VM
> >>     native
> >>     >> call adapters, this approach is significantly more robust than
> >>     >> linkToNative and I have not run into any weird VM crashes when
> >>     >> playing with it. The downside is that, for obvious reasons, this
> >>     >> approach cannot get much faster than JNI - that is, it
> >> cannot
> >>     >> get close to the numbers obtained with the linkToNative backend,
> >>     >> which features much deeper optimizations. But I think that,
> >>     despite
> >>     >> its limitations, it's still a good opportunistic improvement
> >>     that is
> >>     >> worth pursuing in the short term (while we sort out the
> >>     linkToNative
> >>     >> story). For this reason, I will soon be submitting a review which
> >>     >> incorporates the changes for the 'direct' invocation schemes.
> >>     >>
> >>     >> Cheers
> >>     >> Maurizio
> >>     >>
> >>     >> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
> >>     >> [2] - https://github.com/jnr/jnr-ffi
> >>     >> [3] - https://github.com/jnr/jffi
> >>     >> [4] - https://sourceware.org/libffi/
> >>     >> [5] - https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
> >>     >> [6] - http://cr.openjdk.java.net/~jrose/panama/native-call-primitive.html
> >>     >> [7] - http://hg.openjdk.java.net/panama/dev/shortlog/b9ebb1bb8354
> >>     >>
> >>     >> (**) the benchmark also contains a 5th row in which I repeated the
> >>     >> same tests, this time using JNR [2]. JNR is built on top of libjffi
> >>     >> [3], a JNI library in turn built on top of the popular libffi [4].
> >>     >> I wanted to have some numbers for JNR because it's another solution
> >>     >> that allows for better ease of use, taking care of marshalling Java
> >>     >> values into C and back; since the goals of JNR are similar in
> >>     >> spirit to some of the goals of the Panama/foreign work, I thought
> >>     >> it would be worth having a comparison of these approaches. For the
> >>     >> record, I think the JNR numbers are very respectable, given that
> >>     >> JNR had to do all the hard work outside of the JDK!
> >>     >>
> >>     >>
> >>     >
> >>
> >
>
>

