<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>I'm dropping most of the direct recipients and going back to
      using just panama-dev and hotspot-dev, as it appears that our
      server is having issues handling too many recipients (the message
      that got delivered today was written a few days ago :-) ).</p>

    <p>I suggest everybody do the same and just use the mailing lists
      for further replies to this thread.</p>

    <p>Cheers<br>

      Maurizio<br>

    </p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 05/07/2022 12:33, Maurizio

      Cimadamore wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:f699a00e-4626-808e-9a37-3fb8808149b2@oracle.com">

      <p>Hi,<br>
        As Erik explained in his reply, what we call "critical JNI"
        comes in two pieces: one removes Java-to-native thread
        transitions (which is what Wojciech is referring to), while
        the other interacts with the GC locker (basically to allow
        critical JNI code to access Java arrays w/o copying). I think
        the latter part is the more problematic one GC-wise.<br>

      </p>

      <p>Then, regarding the former, I think there are still questions
        as to whether dropping transitions is the best way to get the
        required performance boost; for instance, yesterday I tried out
        an experimental patch from Jorn (kudos) which
        re-enables an opt-in for "trivial" native calls in the Panama
        API. I used it to test clock_gettime and, while there's an
        improvement, the results I got were not as conclusive as one
        might have expected. This is what I get w/ state transitions:</p>

      <p>```<br>
        Benchmark                                 Mode  Cnt   Score    Error  Units<br>
        ClockgettimeTest.panama_monotonic         avgt   30  27.814 ±  0.165  ns/op<br>
        ClockgettimeTest.panama_monotonic_coarse  avgt   30  12.094 ±  0.103  ns/op<br>
        ClockgettimeTest.panama_monotonic_raw     avgt   30  27.719 ±  0.393  ns/op<br>
        ClockgettimeTest.panama_realtime          avgt   30  27.133 ±  0.280  ns/op<br>
        ClockgettimeTest.panama_realtime_coarse   avgt   30  26.812 ±  0.384  ns/op<br>
        ```<br>
      </p>

      <p>And this is what I get with transitions removed:</p>

      <p>```<br>
        Benchmark                                 Mode  Cnt   Score    Error  Units<br>
        ClockgettimeTest.panama_monotonic         avgt   30  22.383 ±  0.213  ns/op<br>
        ClockgettimeTest.panama_monotonic_coarse  avgt   30   6.312 ±  0.117  ns/op<br>
        ClockgettimeTest.panama_monotonic_raw     avgt   30  22.731 ±  0.279  ns/op<br>
        ClockgettimeTest.panama_realtime          avgt   30  22.503 ±  0.292  ns/op<br>
        ClockgettimeTest.panama_realtime_coarse   avgt   30  21.853 ±  0.100  ns/op<br>
        ```<br>
      </p>

      <p>Here we can see a gain of roughly 5ns, obtained by dropping the
        transition. The only case where this makes a significant
        difference is the monotonic_coarse flavor. In the other
        cases there's a difference, yes, but a less pronounced one, simply
        because the term we're comparing against is bigger: a 5ns gain is
        easy to see if your function runs for 10ns in total, but
        such a gain starts to get lost in the "noise" when functions run
        for longer. And that's the main issue with removing
        Java->native transitions: the "window" in which this
        optimization yields a positive effect is extremely narrow
        (anything lasting longer than 30ns probably won't show
        much difference), but, as you can see from the PR in [1], the VM
        changes required to support it touch quite a bit of stuff!</p>
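      <p>To make the "narrow window" argument concrete, here is a small sketch (a hypothetical helper class, not part of the benchmark sources) that computes the absolute and relative saving for each row of the two tables, plus a hypothetical ~250ns call that saves the same 5ns:</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TransitionGain {
    /** Absolute saving in ns/op when the state transition is dropped. */
    static double savedNanos(double withTransition, double withoutTransition) {
        return withTransition - withoutTransition;
    }

    /** Saving as a percentage of the original (with-transition) cost. */
    static double percentGain(double withTransition, double withoutTransition) {
        return 100.0 * savedNanos(withTransition, withoutTransition) / withTransition;
    }

    public static void main(String[] args) {
        // Average times (ns/op) copied from the two benchmark tables:
        // {with state transitions, with transitions removed}
        Map<String, double[]> results = new LinkedHashMap<>();
        results.put("panama_monotonic",        new double[] {27.814, 22.383});
        results.put("panama_monotonic_coarse", new double[] {12.094,  6.312});
        results.put("panama_monotonic_raw",    new double[] {27.719, 22.731});
        results.put("panama_realtime",         new double[] {27.133, 22.503});
        results.put("panama_realtime_coarse",  new double[] {26.812, 21.853});

        for (Map.Entry<String, double[]> e : results.entrySet()) {
            double[] r = e.getValue();
            System.out.printf("%-25s saved %.2f ns (%.1f%%)%n",
                    e.getKey(), savedNanos(r[0], r[1]), percentGain(r[0], r[1]));
        }

        // Hypothetical longer call (e.g. one involving a real syscall): same 5 ns saved.
        System.out.printf("%-25s saved %.2f ns (%.1f%%)%n",
                "hypothetical 250ns call", savedNanos(250, 245), percentGain(250, 245));
    }
}
```

      <p>The coarse clock saves close to half of its cost (about 48%), the other clocks around 17-20%, and the hypothetical 250ns call only 2%, which is the shrinking-benefit effect described above.</p>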

      <p>Luckily, selectively disabling transitions from Panama is
        slightly more straightforward and, perhaps, for stuff like
        recvmsg syscalls that are bypassed, there's not much else we can
        do: while one could imagine Panama special-casing calls to
        clock_gettime, as that's a known "leaf", the same cannot be done
        with recvmsg, which is in general a blocking call. Panama also
        has a "trusted mode" flag (--enable-native-access), so there is
        a way in the Panama API to distinguish between safe and unsafe
        API points, which also helps here. The risk, of course, is for
        developers to see whatever mechanism is provided as some kind of
        "make my code go fast please" switch and apply it blindly, w/o fully
        understanding the consequences. What I said before about the
        "extremely narrow window" remains true: in the vast majority of
        cases (like 99%) dropping state transitions can result in very
        big downsides, while the corresponding upsides are not big
        enough to even be noticeable (the Q/A in [2] arrives at a very
        similar conclusion).<br>

      </p>

      <p>All this said, selectively disabling state transitions for
        native calls made using the Panama foreign API seems the most
        straightforward way to offset the performance delta introduced
        by the removal of critical JNI. In part it's because the Panama
        API is more flexible, e.g. function descriptors allow us to
        model the distinction between a trivial and a non-trivial call; in
        part it's because, as stated above, Panama can already reason
        about calls that are "unsafe" and that require extra
        permissions. And, finally, it's also because, if we added back
        critical JNI, we'd probably add it back w/o its most problematic
        GC locker parts (that's what [1] does AFAIK), which means it
        wouldn't be a complete code reversal. So, perhaps, coming up with a
        fresh mechanism to drop transitions (only) could also be less
        confusing for developers. Of course, this would require
        developers such as Wojciech to rewrite some of their code to use
        Panama instead of JNI.</p>
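      <p>As a rough illustration of opting a single "trivial" call out of state transitions through the Panama API, here is a sketch against the finalized FFM API (JDK 22+), whose closest equivalent to the opt-in discussed above is the <code>Linker.Option.critical(boolean)</code> linker option; the experimental patch mentioned here predates it, so its naming may differ. The sketch assumes 64-bit Linux, where <code>struct timespec</code> is two 64-bit fields and <code>CLOCK_MONOTONIC</code> is 1; the class and method names are hypothetical. Run it with <code>--enable-native-access=ALL-UNNAMED</code> to avoid the restricted-method warning.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class CriticalClockGettime {
    // Assumption: Linux, where CLOCK_MONOTONIC == 1.
    static final int CLOCK_MONOTONIC = 1;

    // Assumption: 64-bit Linux, struct timespec { time_t tv_sec; long tv_nsec; }.
    static final MemoryLayout TIMESPEC = MemoryLayout.structLayout(
            ValueLayout.JAVA_LONG.withName("tv_sec"),
            ValueLayout.JAVA_LONG.withName("tv_nsec"));

    // int clock_gettime(clockid_t clk_id, struct timespec *tp)
    static final MethodHandle CLOCK_GETTIME;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment symbol = linker.defaultLookup().find("clock_gettime").orElseThrow();
        CLOCK_GETTIME = linker.downcallHandle(
                symbol,
                FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS),
                // Marks the callee as short-running and never calling back into Java; the
                // implementation may use this hint to skip the Java-to-native state transition.
                // Remove this option to get a regular downcall.
                Linker.Option.critical(false));
    }

    /** Reads CLOCK_MONOTONIC into a caller-provided timespec buffer; returns nanoseconds. */
    static long monotonicNanos(MemorySegment timespec) {
        int rc;
        try {
            rc = (int) CLOCK_GETTIME.invokeExact(CLOCK_MONOTONIC, timespec);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        if (rc != 0) {
            throw new IllegalStateException("clock_gettime returned " + rc);
        }
        long sec = timespec.get(ValueLayout.JAVA_LONG, 0);
        long nsec = timespec.get(ValueLayout.JAVA_LONG, 8);
        return sec * 1_000_000_000L + nsec;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Allocate the struct once and reuse it, so the hot path does not allocate.
            MemorySegment ts = arena.allocate(TIMESPEC);
            long first = monotonicNanos(ts);
            long second = monotonicNanos(ts);
            System.out.println("elapsed between calls: " + (second - first) + " ns");
        }
    }
}
```

      <p>The opt-in lives in the linker options of a single downcall handle rather than in a global mode, which matches the idea above of modelling trivial vs non-trivial calls per function.</p>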

      <p>And, coming back to clock_gettime, my feeling is that with the
        right tools (e.g. some intrinsics), we can make that go a lot
        faster than what is shown above. Being able to quickly get a
        timestamp seems a widely applicable enough use case to deserve
        some special treatment. So, perhaps, it's worth considering a
        _spectrum of solutions_ for improving the status quo, rather
        than investing solely in the removal of thread transitions.<br>

      </p>

      <p>Maurizio<br>

      </p>

      <p>[1] - <a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk19/pull/90/files" moz-do-not-send="true">https://github.com/openjdk/jdk19/pull/90/files<br>

        </a>[2] - <a class="moz-txt-link-freetext" href="https://youtu.be/LoyBTqkSkZk?t=742" moz-do-not-send="true">https://youtu.be/LoyBTqkSkZk?t=742</a></p>

      <p><br>

      </p>

      <div class="moz-cite-prefix">On 04/07/2022 18:38, Vitaly

        Davidovich wrote:<br>

      </div>

      <blockquote type="cite" cite="mid:CAHjP37E62eEbrDtS7HF0eZ0wA65xTWVF_eqpZFubP=4PTXEYVg@mail.gmail.com">

        <div dir="auto">To not sidetrack this thread with my previous

          reply:</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">Maurizio - are you saying Java criticals are
          *already* hindering ZGC and/or other planned Hotspot
          improvements? Or that theoretically they could, and you’d like
          to remove/deprecate them now(ish)?</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">If it’s the latter, perhaps it’s prudent to keep
          them around until a compelling case surfaces where they
          preclude or severely restrict evolution of the platform? If
          it’s the former, I would be curious what that is, but would also
          understand the rationale behind wanting to remove them.</div>

        <div><br>

          <div class="gmail_quote">

            <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at

              1:26 PM Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>>

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

              <div><br>

              </div>

              <div><br>

                <div class="gmail_quote">

                  <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022

                    at 1:13 PM Wojciech Kudla <<a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>>

                    wrote:<br>

                  </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                    <div dir="ltr">

                      <div>

                        <div>

                          <div>Thanks for your input, Vitaly. I'd be
                            interested to find out more about the nature
                            of the HW noise you observed in your
                            benchmarks, as our results were very
                            consistent and it was pretty straightforward
                            to pinpoint the culprit as JNI call
                            overhead. Maybe it was just easier for us
                            because we disallow C- and P-state
                            transitions and put a lot of effort into
                            eliminating platform jitter in general. Were
                            you maybe running on a CPU model that
                            doesn't support constant TSC? I would also
                            suggest retrying with LAPIC interrupts
                            suppressed (with cli/sti) to see whether
                            it's the kernel and not the hardware.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <div dir="auto">This was on a Broadwell Xeon chipset
                    with constant TSC.  All the typical jitter sources
                    were reduced: C/P states disabled in BIOS, max turbo
                    enabled, IRQs steered away, core isolated, etc.  By
                    the way, by noise I don’t mean the results
                    themselves were noisy - they were constant run to
                    run.  I just meant the delta between normal vs
                    critical JNI entrypoints was very minimal, i.e. “in
                    the noise”, particularly with rdtsc.</div>

                  <div dir="auto"><br>

                  </div>

                  <div dir="auto">I can try to remeasure on newer Intel

                    but see below …</div>

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                    <div dir="ltr">

                      <div>

                        <div>

                          <div dir="auto"><br>

                            <br>

                          </div>

                          100% agree on rdtsc(p) and snippets. There are
                          some narrow use cases where one can get
                          substantial speedups with direct access to
                          prefetch or by abusing misprediction to keep
                          the icache hot. These scenarios are sadly only
                          available with inline assembly. I know of a
                          few shops that go to the length of forking
                          Graal, etc. to achieve that, but am quite
                          convinced such capabilities would be welcome
                          and utilized by many more groups if they were
                          easily accessible from Java.</div>

                      </div>

                    </div>

                  </blockquote>

                  <div dir="auto">I’m of the firm (and perhaps

                    controversial for some :)) opinion these days that

                    Java is simply the wrong platform/tool for low

                    latency cases that warrant this level of control. 

                    There’re very strong headwinds even outside of JNI

                    costs.  And the “real” problem with JNI, besides

                    transition costs, is lack of inlining into the

                    native calls.  So even if JVM transition costs are

                    fully eliminated, there’s still an optimization

                    fence due to lost inlining (not unlike native code

                    calling native fns via shared libs).</div>

                  <div dir="auto"><br>

                  </div>

                  <div dir="auto">That’s not to say that perf regressions
                    are welcome - nobody likes those :).</div>

                </div>

              </div>

              <div>

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                    <div dir="ltr">

                      <div>

                        <div dir="auto"><br>

                          <br>

                        </div>

                        Thanks,<br>

                      </div>

                      W.<br>

                    </div>

                    <br>

                    <div class="gmail_quote">

                      <div dir="ltr" class="gmail_attr">On Mon, Jul 4,

                        2022 at 5:51 PM Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>>

                        wrote:<br>

                      </div>

                      <blockquote class="gmail_quote" style="margin:0px

                        0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                        <div dir="auto">I’d add rdtsc(p) wrapper
                          functions to the list.  These are usually
                          either inline asm or a compiler intrinsic in the
                          JNI entrypoint.  In addition, any native libs
                          exposed via JNI that have “trivial” functions
                          are also candidates for faster calling
                          conventions.  There are sometimes ways to
                          mitigate the call overhead (e.g. batching), but
                          it’s not always feasible.</div>

                        <div dir="auto"><br>

                        </div>

                        <div dir="auto">I’ll add that the last time I tried
                          to measure the improvement of Java criticals
                          for clock_gettime (and rdtsc), it looked to be
                          in the noise on the hardware I was testing
                          on.  It got to the point where I had to
                          instrument the critical and normal JNI
                          entrypoints to confirm the critical one was being
                          hit.  The critical calling convention isn’t
                          significantly different *if* basic primitives
                          (or no args at all) are passed as args. 
                          JNIEnv*, IIRC, is loaded from a register, so
                          that’s minor.  jclass (for static calls, which
                          is what’s relevant here) should be a compiled
                          constant.  A critical call still has a GCLocker
                          check.  So I’m not actually sure what the
                          significant difference is for “lightweight”
                          (i.e. few primitive or no args, primitive return
                          types) calls.</div>

                        <div dir="auto"><br>

                        </div>

                        <div dir="auto">In general, I do think it’d be

                          nice if there was a faster native call

                          sequence, even if it comes with a caveat

                          emptor and/or special requirements on the

                          callee (not unlike the requirements for

                          criticals).  I think Vladimir Ivanov was

                          working on “snippets” that allowed dynamic

                          construction of a native call, possibly

                          including assembly.  Not sure where that

                          exploration is these days, but that would be a

                          welcome capability.</div>

                        <div dir="auto"><br>

                        </div>

                        <div dir="auto">My $.02.  Happy 4th of July for

                          those celebrating!</div>

                        <div dir="auto"><br>

                        </div>

                        <div dir="auto">Vitaly</div>

                        <div><br>

                          <div class="gmail_quote">

                            <div dir="ltr" class="gmail_attr">On Mon,

                              Jul 4, 2022 at 12:04 PM Maurizio

                              Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>

                              wrote:<br>

                            </div>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                              <div>

                                <p>Hi,<br>
                                  while I'm not an expert with some of
                                  the IO calls you mention (some of my
                                  colleagues are more knowledgeable in
                                  this area, so I'm sure they will have
                                  more info), my general sense is that,
                                  as with getrusage, if there is a
                                  system call involved, you already pay
                                  a hefty price for the user-to-kernel
                                  transition. On my machine this seems to
                                  cost around 200ns. In these cases,
                                  using JNI critical to shave off a
                                  dozen nanoseconds (at best!) seems
                                  just not worth it.</p>

                                <p>So, of the functions in your list,
                                  the ones where I *believe*
                                  dropping transitions would have the
                                  most effect are (if we exclude getpid,
                                  for which another approach is
                                  possible) clock_gettime and getcpu,
                                  as they might use the vDSO [1],
                                  which typically brings the performance
                                  of these calls closer to that of calls
                                  to shared lib functions.<br>

                                </p>

                                <p>If you have examples, e.g. where
                                  performance of recvmsg (or related
                                  calls) varies significantly between
                                  base JNI and critical JNI, please send
                                  them our way; I'm sure some of my
                                  colleagues would be interested in
                                  taking a look.<br>

                                </p>

                                <p>Popping back a couple of levels, I
                                  think it would be helpful to also
                                  define what's an acceptable regression
                                  in this context. Of course, in an
                                  ideal world, we'd like to see no
                                  performance regression at all. But JNI
                                  critical is an unsupported interface,
                                  which might misbehave with modern
                                  garbage collectors (e.g. ZGC) and
                                  requires quite a bit of internal
                                  complexity which might, in the
                                  medium/long run, hinder the evolution
                                  of the Java platform (all these things
                                  have _some_ cost, even if the cost is
                                  not directly material to developers).
                                  In this vein, I think calls like
                                  clock_gettime tend to be more
                                  problematic: as they complete very
                                  quickly, you see the cost of
                                  transitions a lot more. In other
                                  cases, where syscalls are involved,
                                  the costs associated with transitions
                                  are more likely to be "in the noise". Of
                                  course, if we look at absolute numbers,
                                  dropping transitions would always
                                  yield "faster" code; but at the same
                                  time, going from 250ns to 245ns is
                                  very unlikely to result in a visible
                                  performance difference when
                                  considering an application as a whole,
                                  so I think it's critical here to
                                  decide _which_ use cases to
                                  prioritize.<br>

                                </p>

                                <p>I think a good outcome of this

                                  discussion would be if we could come

                                  to some shared understanding of which

                                  native calls are truly problematic

                                  (e.g. clock_gettime-like), and then

                                  for the JDK to provide better (and

                                  more maintainable) alternatives for

                                  those (which might even be faster than

                                  using critical JNI).<br>

                                </p>

                                <p>Thanks<br>

                                  Maurizio<br>

                                </p>

                                <p>[1] - <a href="https://man7.org/linux/man-pages/man7/vdso.7.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man7/vdso.7.html</a><br>

                                </p>

                              </div>

                              <div>

                                <div>On 04/07/2022 12:23, Wojciech Kudla

                                  wrote:<br>

                                </div>

                                <blockquote type="cite">

                                  <div dir="ltr">

                                    <div>

                                      <div>Thanks Maurizio,<br>

                                        <br>

                                      </div>

                                      I raised this case mainly about
                                      clock_gettime and recvmsg/sendmsg;
                                      I think we're focusing on the
                                      wrong things here. Feel free to
                                      drop the two syscalls from the
                                      discussion entirely, but the main
                                      use cases I have been presenting
                                      throughout this thread definitely
                                      stand.<br>

                                      <br>

                                    </div>

                                    <div>Thanks<br>

                                    </div>

                                    <br>

                                  </div>

                                  <br>

                                  <div class="gmail_quote">

                                    <div dir="ltr" class="gmail_attr">On

                                      Mon, Jul 4, 2022 at 10:54 AM

                                      Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>

                                      wrote:<br>

                                    </div>

                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                                      <div>

                                        <p>Hi Wojtek,<br>

                                          thanks for sharing this list,

                                          I think this is a good

                                          starting point to understand

                                          more about your use case.</p>

                                        <p>Last week I was looking
                                          at "getrusage" (as you
                                          mentioned it in an earlier
                                          email), and I was surprised to
                                          see that the call took a
                                          pointer to a (fairly big)
                                          struct which then needed to be
                                          initialized with some
                                          thread-local state:</p>

                                        <p><a href="https://man7.org/linux/man-pages/man2/getrusage.2.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man2/getrusage.2.html</a></p>

                                        <p>I've looked at the
                                          implementation, and it seems
                                          to be doing a memset on the
                                          user-provided struct pointer,
                                          plus all the field
                                          assignments. Eyeballing the
                                          implementation, this does not
                                          seem to me like a "classic"
                                          use case where dropping
                                          transitions would help much. I
                                          mean, surely dropping
                                          transitions would help shave
                                          some nanoseconds off the call,
                                          but it doesn't seem to me that
                                          the call would be short-lived
                                          enough to make a difference.
                                          Do you have some benchmarks on
                                          this one? I did some [1] and
                                          the call overhead seemed to
                                          come in at 260ns/op - w/o
                                          transitions you might perhaps
                                          be able to get to 250ns, but
                                          that's in the noise?<br>

                                        </p>

                                        <p>As for getpid, note that you

                                          can do (since Java 9):<br>

                                          <br>

                                          ProcessHandle.current().pid();<br>

                                          <br>

                                          I believe the impl caches the

                                          result, so it shouldn't even

                                          make the native call.<br>

                                        </p>
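                                        <p>A minimal sketch of that getpid replacement, using only the standard <code>ProcessHandle</code> API (Java 9+) with no JNI or Panama code involved; the class and method names are hypothetical, and whether the value is cached is an implementation detail, as noted above.</p>

```java
public class CurrentPid {
    /** Returns the OS process id of the running JVM via the standard ProcessHandle API. */
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        long pid = currentPid();
        // The pid of a running process does not change, so repeated calls should agree.
        System.out.println("pid = " + pid + ", stable = " + (pid == currentPid()));
    }
}
```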

                                        <p>Maurizio</p>

                                        <p>[1] - <a href="http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java</a><br>

                                        </p>

                                        <div>On 02/07/2022 07:42,

                                          Wojciech Kudla wrote:<br>

                                        </div>

                                        <blockquote type="cite">

                                          <div dir="ltr">

                                            <div>

                                              <div>Hi Maurizio,<br>

                                                <br>

                                              </div>

                                              Thanks for staying on

                                              this.<br>

                                              <br>

                                              > Could you please

                                              provide a rough list of

                                              the native calls you make

                                              where you believe critical

                                              JNI is having a real

                                              impact in the performance

                                              of your application?<br>

                                            </div>

                                            <div><br>

                                              Off the top of my head:<br>

                                            </div>

                                            <div>clock_gettime<br>

                                            </div>

                                            <div>recvmsg<br>

                                            </div>

                                            <div>recvmmsg</div>

                                            <div>sendmsg<br>

                                            </div>

                                            <div>sendmmsg</div>

                                            <div>select<br>

                                            </div>

                                            <div>getpid</div>

                                            <div>getcpu<br>

                                            </div>

                                            <div>getrusage<br>

                                            </div>

                                            <div><br>

                                            </div>

                                            <div>> Also, could you

                                              please tell us whether any

                                              of these calls need to

                                              interact with Java arrays?<br>

                                            </div>

                                            <div>No arrays or objects of
                                              any type are involved.
                                              Everything is done by
                                              passing raw pointers as
                                              longs and other primitive
                                              types as function
                                              arguments.<br>

                                            </div>

                                            <div><br>

                                              > In other words, do

                                              you use critical JNI to

                                              remove the cost associated

                                              with thread transitions,

                                              or are you also taking

                                              advantage of accessing

                                              on-heap memory _directly_

                                              from native code?<br>

                                            </div>

                                            <div>Critical JNI natives
                                              are used solely to remove
                                              the cost of thread
                                              transitions. We don't get
                                              anywhere near the Java
                                              heap in native code.<br>

                                              <br>

                                            </div>

                                            <div>In general I think it

                                              makes a lot of sense for

                                              Java as a

                                              language/platform to have

                                              some guards around unsafe

                                              code, but on the other

                                              hand the popularity of

                                              libraries employing Unsafe

                                              and their success in more

                                              performance-oriented

                                              corners of software

                                              engineering is a clear

                                              indicator that there is a need

                                              for the JVM to provide

                                              access to more low-level

                                              primitives and mechanisms.

                                              <br>

                                            </div>

                                            <div>I think it's entirely

                                              fair to tell developers

                                              that all bets are off when

                                              they get into some

                                              non-idiomatic scenarios,

                                              but please don't take away

                                              a feature that greatly

                                              contributed to Java's

                                              success.<br>

                                              <br>

                                            </div>

                                            <div>Kind regards,<br>

                                            </div>

                                            <div>Wojtek<br>

                                            </div>

                                          </div>

                                          <br>

                                          <div class="gmail_quote">

                                            <div dir="ltr" class="gmail_attr">On Wed,

                                              Jun 29, 2022 at 5:20 PM

                                              Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>

                                              wrote:<br>

                                            </div>

                                            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                                              <div>

                                                <p>Hi Wojciech,<br>

                                                  picking up this thread

                                                  again. After some

                                                  internal discussion,

                                                  we realize that we

                                                  don't know enough

                                                  about your use case.

                                                  While re-enabling JNI

                                                  critical would

                                                  obviously provide a

                                                  quick fix, we're

                                                  afraid that (a)

                                                  developers might end

                                                  up depending on JNI

                                                  critical when they

                                                  don't need to (perhaps

                                                  also unaware of the

                                                  consequences of

                                                  depending on it) and

                                                  (b) that there might

                                                  actually be _better_

                                                  (as in: much faster)

                                                  solutions than using

                                                  critical native calls

                                                  to address at least

                                                  some of your use cases

                                                  (that seemed to be the

                                                  case with the

                                                  clock_gettime example

                                                  you mentioned). Could

                                                  you please provide a

                                                  rough list of the

                                                  native calls you make

                                                  where you believe

                                                  critical JNI is having

                                                  a real impact in the

                                                  performance of your

                                                  application? Also,

                                                  could you please tell

                                                  us whether any of

                                                  these calls need to

                                                  interact with Java

                                                  arrays? In other

                                                  words, do you use

                                                  critical JNI to remove

                                                  the cost associated

                                                  with thread

                                                  transitions, or are

                                                  you also taking

                                                  advantage of accessing

                                                  on-heap memory

                                                  _directly_ from native

                                                  code?</p>

                                                <p>Regards<br>

                                                  Maurizio<br>

                                                </p>

                                                <div>On 13/06/2022

                                                  21:38, Wojciech Kudla

                                                  wrote:<br>

                                                </div>

                                                <blockquote type="cite">

                                                  <div dir="ltr">

                                                    <div>

                                                      <div>

                                                        <div>

                                                          <div>

                                                          <div>Hi Mark,<br>

                                                          <br>

                                                          </div>

                                                          Thanks for

                                                          your input and

                                                          apologies for

                                                          the delayed

                                                          response.<br>

                                                          <br>

                                                          > If the

                                                          platform

                                                          included, say,

                                                          an

                                                          intrinsified

                                                          System.nanoRealTime()<br>

                                                          method that

                                                          returned

                                                          clock_gettime(CLOCK_REALTIME),

                                                          how much would<br>

                                                          that help

                                                          developers in

                                                          your unnamed

                                                          industry?<br>

                                                          <br>

                                                          </div>

                                                          Exposing the
                                                          realtime clock
                                                          with nanosecond
                                                          granularity in
                                                          the JDK would
                                                          be a great step
                                                          forward. I
                                                          should have
                                                          made it clear
                                                          that I
                                                          represent the
                                                          fintech corner
                                                          (investment
                                                          banking, to be
                                                          exact), but the
                                                          issues my
                                                          message touches
                                                          upon span areas
                                                          such as HPC,
                                                          audio
                                                          processing,
                                                          gaming, and the
                                                          defense
                                                          industry, so
                                                          ours is not an
                                                          isolated case.<br>

                                                          <br>
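<p>As a rough illustration of the semantics being discussed, assuming JDK 9 or later: the Java sketch below computes wall-clock nanoseconds since the Unix epoch, i.e. the value clock_gettime(CLOCK_REALTIME) reports, from java.time.Instant. System.nanoRealTime() is only a proposal, so the class NanoRealTimeSketch and its methods are hypothetical names. The sketch shows the value such a method would return, not the low per-call cost that motivates this thread.</p>
<pre>
```java
import java.time.Instant;

// Hypothetical helper, not a JDK API: approximates what the proposed
// System.nanoRealTime() would return, i.e. wall-clock nanoseconds since the
// Unix epoch (the value clock_gettime(CLOCK_REALTIME) reports).
final class NanoRealTimeSketch {

    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private NanoRealTimeSketch() {}

    // Converts an Instant to nanoseconds since 1970-01-01T00:00:00Z.
    // multiplyExact/addExact throw ArithmeticException instead of silently
    // overflowing; a signed 64-bit nanosecond count runs out around 2262.
    static long toEpochNanos(Instant instant) {
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), NANOS_PER_SECOND),
                instant.getNano());
    }

    // Current wall-clock time in epoch nanoseconds. On JDK 9+ Instant.now()
    // typically resolves finer than a millisecond (the exact precision
    // depends on the OS), but this makes no claim about per-call cost: it may
    // allocate and is not guaranteed to be as cheap as currentTimeMillis().
    static long nanoRealTime() {
        return toEpochNanos(Instant.now());
    }
}
```
</pre>
<p>Usage: long t = NanoRealTimeSketch.nanoRealTime(); the exact-arithmetic helpers were chosen so that an out-of-range Instant fails loudly rather than wrapping around.</p>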

                                                          > In a

                                                          similar vein,

                                                          if people are

                                                          finding it

                                                          necessary to

                                                          “replace parts<br>

                                                          of NIO with

                                                          hand-crafted

                                                          native code”

                                                          then it would

                                                          be interesting

                                                          to<br>

                                                          understand

                                                          what their

                                                          requirements

                                                          are<br>

                                                          <br>

                                                        </div>

                                                        As for the other
                                                        example I
                                                        provided, making
                                                        very short-lived
                                                        syscalls such as
                                                        recvmsg/recvmmsg,
                                                        the premise is
                                                        getting access
                                                        to hardware
                                                        timestamps on
                                                        the ingress and
                                                        egress ends, as
                                                        well as enabling
                                                        batch receive
                                                        with a single
                                                        syscall and
                                                        otherwise
                                                        exploiting
                                                        features
                                                        unavailable from
                                                        the JDK (like
                                                        access to the
                                                        CMSG interface,
                                                        scatter/gather,
                                                        etc.).<br>

                                                      </div>

                                                      <div>There are
                                                        also other
                                                        examples of
                                                        calls that we'd
                                                        love to make
                                                        often and at the
                                                        lowest possible
                                                        cost (e.g.
                                                        getrusage), but
                                                        I'm not sure
                                                        there's a strong
                                                        case for some of
                                                        these ideas,
                                                        which is why it
                                                        might be worth
                                                        looking into a
                                                        more generic
                                                        approach for
                                                        performance-sensitive
                                                        code.<br>

                                                      </div>
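<p>For the CPU-time portion of getrusage specifically, a sketch assuming only the standard java.management module: ThreadMXBean already reports the calling thread's CPU and user time in nanoseconds, which corresponds more closely to Linux's RUSAGE_THREAD than to RUSAGE_SELF. The class and method names are hypothetical, the sketch does not cover the other rusage fields (page faults, context switches and so on), and nothing here implies the call is cheaper than a JNI one.</p>
<pre>
```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not a JDK API: reads the calling thread's CPU time
// through the standard ThreadMXBean. This covers only the CPU-time part of
// getrusage() (per thread, like Linux's RUSAGE_THREAD); it says nothing about
// other rusage fields or about per-call cost.
final class ThreadCpuTimeSketch {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    private ThreadCpuTimeSketch() {}

    // User + system CPU time of the calling thread in nanoseconds.
    // Returns -1 if the JVM does not support it or measurement is disabled.
    static long currentThreadCpuNanos() {
        if (!THREADS.isCurrentThreadCpuTimeSupported()) {
            return -1L;
        }
        return THREADS.getCurrentThreadCpuTime();
    }

    // User-mode CPU time of the calling thread in nanoseconds, with the same
    // -1 convention as above.
    static long currentThreadUserNanos() {
        if (!THREADS.isCurrentThreadCpuTimeSupported()) {
            return -1L;
        }
        return THREADS.getCurrentThreadUserTime();
    }
}
```
</pre>
<p>Returning -1 instead of letting UnsupportedOperationException escape keeps the helper usable on JVMs where per-thread CPU time is unavailable, matching the JDK's own convention for disabled measurement.</p>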

                                                      <div>Hope this
                                                        does a better job
                                                        of explaining
                                                        where we're
                                                        coming from than
                                                        my previous
                                                        messages.<br>

                                                      </div>

                                                      <div><br>

                                                      </div>

                                                      Thanks,<br>

                                                    </div>

                                                    W<br>

                                                  </div>

                                                  <br>

                                                  <div class="gmail_quote">

                                                    <div dir="ltr" class="gmail_attr">On

                                                      Tue, Jun 7, 2022

                                                      at 6:31 PM <<a href="mailto:mark.reinhold@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mark.reinhold@oracle.com</a>> wrote:<br>

                                                    </div>

                                                    <blockquote class="gmail_quote" style="margin:0px

                                                      0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">2022/6/6

                                                      0:24:17 -0700, <a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>:<br>

                                                      >> Yes for

                                                      System.nanoTime(),

                                                      but

                                                      System.currentTimeMillis()

                                                      reports<br>

                                                      >>

                                                      CLOCK_REALTIME.<br>

                                                      > <br>

                                                      > Unfortunately

System.currentTimeMillis() offers only millisecond<br>

                                                      > granularity

                                                      which is the

                                                      reason why our

                                                      industry has to

                                                      resort to<br>

                                                      >

                                                      clock_gettime.<br>

                                                      <br>

                                                      If the platform

                                                      included, say, an

                                                      intrinsified

                                                      System.nanoRealTime()<br>

                                                      method that

                                                      returned

clock_gettime(CLOCK_REALTIME), how much would<br>

                                                      that help

                                                      developers in your

                                                      unnamed industry?<br>

                                                      <br>

                                                      In a similar vein,

                                                      if people are

                                                      finding it

                                                      necessary to

                                                      “replace parts<br>

                                                      of NIO with

                                                      hand-crafted

                                                      native code” then

                                                      it would be

                                                      interesting to<br>

                                                      understand what

                                                      their requirements

                                                      are.  Some simple

                                                      enhancements to<br>

                                                      the NIO API would

                                                      be much less

                                                      costly to design

                                                      and implement than

                                                      a<br>

                                                      generalized

                                                      user-level

                                                      native-call

                                                      intrinsification

                                                      mechanism.<br>

                                                      <br>

                                                      - Mark<br>

                                                    </blockquote>

                                                  </div>

                                                </blockquote>

                                              </div>

                                            </blockquote>

                                          </div>

                                        </blockquote>

                                      </div>

                                    </blockquote>

                                  </div>

                                </blockquote>

                              </div>

                            </blockquote>

                          </div>

                        </div>

                        -- <br>

                        <div dir="ltr">Sent from my phone</div>

                      </blockquote>

                    </div>

                  </blockquote>

                </div>

              </div>


            </blockquote>

          </div>

        </div>


      </blockquote>

    </blockquote>

  </body>

</html>