<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>I'm dropping most of the direct recipients and going back to just
      using panama-dev and hotspot-dev, as it appears that our server is
      having issues handling too many recipients (the message that got
      delivered today was written a few days ago :-) ).</p>
    <p>I suggest everybody do the same, and just use the mailing lists
      for further replies to this thread.</p>
    <p>Cheers<br>
      Maurizio<br>
    </p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 05/07/2022 12:33, Maurizio
      Cimadamore wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:f699a00e-4626-808e-9a37-3fb8808149b2@oracle.com">
      
      <p>Hi,<br>
        As Erik explained in his reply, what we call "critical JNI"
        comes in two pieces: one part removes Java-to-native thread
        transitions (which is what Wojciech is referring to), while
        the other part interacts with the GC locker (basically to allow
        critical JNI code to access Java arrays w/o copying). I think
        the latter part is the more problematic one GC-wise.<br>
      </p>
      <p>Then, regarding the former, I think there are still questions
        as to whether dropping transitions is the best way to get the
        performance boost required; for instance, yesterday I did some
        experiments with an experimental patch from Jorn (kudos) which
        re-enables an opt-in for "trivial" native calls in the Panama
        API. I used it to test clock_gettime, and, while there's an
        improvement, the results I got were not as conclusive as one
        might have expected. This is what I get w/ state transitions:</p>
      <pre>```
Benchmark                                 Mode  Cnt   Score   Error  Units
ClockgettimeTest.panama_monotonic         avgt   30  27.814 ± 0.165  ns/op
ClockgettimeTest.panama_monotonic_coarse  avgt   30  12.094 ± 0.103  ns/op
ClockgettimeTest.panama_monotonic_raw     avgt   30  27.719 ± 0.393  ns/op
ClockgettimeTest.panama_realtime          avgt   30  27.133 ± 0.280  ns/op
ClockgettimeTest.panama_realtime_coarse   avgt   30  26.812 ± 0.384  ns/op
```</pre>
      <p>And this is what I get with transitions removed:</p>
      <pre>```
Benchmark                                 Mode  Cnt   Score   Error  Units
ClockgettimeTest.panama_monotonic         avgt   30  22.383 ± 0.213  ns/op
ClockgettimeTest.panama_monotonic_coarse  avgt   30   6.312 ± 0.117  ns/op
ClockgettimeTest.panama_monotonic_raw     avgt   30  22.731 ± 0.279  ns/op
ClockgettimeTest.panama_realtime          avgt   30  22.503 ± 0.292  ns/op
ClockgettimeTest.panama_realtime_coarse   avgt   30  21.853 ± 0.100  ns/op
```</pre>
      <p>Here we can see a gain of roughly 5ns, obtained by dropping the
        transition. The only case where this makes a significant
        difference is the monotonic_coarse flavor. In the other cases
        there's a difference, yes, but it is less pronounced, simply
        because the baseline we're comparing against is bigger: it's easy
        to see a 5ns gain if your function runs for 10ns in total, but
        such a gain starts to get lost in the "noise" when functions run
        for longer. And that's the main issue with removing
        Java->native transitions: the "window" in which this
        optimization yields a positive effect is extremely narrow
        (anything lasting longer than 30ns probably won't see much
        difference), but, as you can see from the PR in [1], the VM
        changes required to support it touch quite a bit of stuff!</p>
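      <p>To put the two tables above in relative terms, here is a small Java sketch that recomputes, from the quoted JMH scores only, the absolute and relative saving per benchmark. The class and method names are illustrative.</p>
      <pre>
```java
// Relative effect of dropping Java->native state transitions, computed from the
// clock_gettime numbers quoted above (JMH avgt scores, ns/op).
class TransitionSavings {
    static final String[] NAMES = {
        "panama_monotonic", "panama_monotonic_coarse", "panama_monotonic_raw",
        "panama_realtime", "panama_realtime_coarse"
    };
    static final double[] WITH_TRANSITIONS = {27.814, 12.094, 27.719, 27.133, 26.812};
    static final double[] WITHOUT_TRANSITIONS = {22.383, 6.312, 22.731, 22.503, 21.853};

    /** Absolute saving, in ns/op, for benchmark i. */
    static double savedNanos(int i) {
        return WITH_TRANSITIONS[i] - WITHOUT_TRANSITIONS[i];
    }

    /** Saving as a percentage of the cost measured with transitions. */
    static double savedPercent(int i) {
        return 100.0 * savedNanos(i) / WITH_TRANSITIONS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i != NAMES.length; i++) {
            System.out.printf("%-24s saved %.3f ns/op (%.1f%%)%n",
                    NAMES[i], savedNanos(i), savedPercent(i));
        }
    }
}
```
</pre>
      <p>The absolute saving stays between roughly 4.6 and 5.8 ns/op for every flavor, which is about 48% of the cost for monotonic_coarse but only 17-20% for the others.</p>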
      <p>Luckily, selectively disabling transitions from Panama is
        slightly more straightforward and, perhaps, for calls like
        recvmsg when the syscall is bypassed, there's not much else we can
        do: while one could imagine Panama special-casing calls to
        clock_gettime, as that's a known "leaf", the same cannot be done
        with recvmsg, which is in general a blocking call. Panama also
        has a "trusted mode" flag (--enable-native-access), so there is
        a way in the Panama API to distinguish between safe and unsafe
        API points, which also helps here. The risk, of course, is for
        developers to see whatever mechanism is provided as some kind of
        "make my code go fast please" switch and apply it blindly, w/o
        fully understanding the consequences. What I said before about the
        "extremely narrow window" remains true: in the vast majority of
        cases (like 99%) dropping state transitions can result in very
        big downsides, while the corresponding upsides are not big
        enough to even be noticeable (the Q/A in [2] arrives at a very
        similar conclusion).<br>
      </p>
      <p>All this said, selectively disabling state transitions for
        native calls made using the Panama foreign API seems the most
        straightforward way to offset the performance delta introduced
        by the removal of critical JNI. In part it's because the Panama
        API is more flexible, e.g. function descriptors allow us to
        model the distinction between a trivial and non-trivial call; in
        part it's because, as stated above, Panama can already reason
        about calls that are "unsafe" and that require extra
        permissions. And, finally, it's also because, if we added back
        critical JNI, we'd probably add it back w/o its most problematic
        GC locker parts (that's what [1] does AFAIK), which means it
        wouldn't be a complete code reversal. So, perhaps, coming up with a
        fresh mechanism to drop transitions (only) could also be less
        confusing for developers. Of course this would require
        developers such as Wojciech to rewrite some of their code to use
        Panama instead of JNI.</p>
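      <p>For readers on newer JDKs: the finalized FFM API in JDK 22 and later exposes a per-downcall opt-in of this kind, Linker.Option.critical(boolean). A minimal sketch of linking clock_gettime both ways might look like this. It assumes JDK 22+, 64-bit Linux (where struct timespec is two 64-bit longs) and CLOCK_MONOTONIC having the Linux value 1; the class and method names are illustrative.</p>
      <pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Sketch (JDK 22+): clock_gettime via the FFM API, linked twice: once as a regular
// downcall and once with Linker.Option.critical(false), which skips the Java->native
// thread-state transition. Assumes 64-bit Linux, where struct timespec is two 64-bit
// longs and CLOCK_MONOTONIC is 1.
class ClockGettimeSketch {
    static final int CLOCK_MONOTONIC = 1;

    static final Linker LINKER = Linker.nativeLinker();
    static final MemorySegment CLOCK_GETTIME =
            LINKER.defaultLookup().find("clock_gettime").orElseThrow();

    // C: int clock_gettime(clockid_t clk_id, struct timespec *tp)
    static final FunctionDescriptor DESC = FunctionDescriptor.of(
            ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS);

    static final MethodHandle REGULAR = LINKER.downcallHandle(CLOCK_GETTIME, DESC);
    // false = the native code will only be handed off-heap segments, never heap ones.
    static final MethodHandle CRITICAL =
            LINKER.downcallHandle(CLOCK_GETTIME, DESC, Linker.Option.critical(false));

    /** Reads CLOCK_MONOTONIC through the given handle, in nanoseconds. */
    static long monotonicNanos(MethodHandle clockGettime) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment ts = arena.allocate(16, 8); // struct timespec { tv_sec; tv_nsec; }
            int rc = (int) clockGettime.invokeExact(CLOCK_MONOTONIC, ts);
            if (rc != 0) {
                throw new IllegalStateException("clock_gettime returned " + rc);
            }
            long sec = ts.get(ValueLayout.JAVA_LONG, 0);
            long nsec = ts.get(ValueLayout.JAVA_LONG, 8);
            return sec * 1_000_000_000L + nsec;
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println("regular:  " + monotonicNanos(REGULAR));
        System.out.println("critical: " + monotonicNanos(CRITICAL));
    }
}
```
</pre>
      <p>A critical downcall is only meant for short, non-blocking functions that do not call back into Java, which is why something like clock_gettime fits and a blocking recvmsg does not.</p>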
      <p>And, coming back to clock_gettime, my feeling is that with the
        right tools (e.g. some intrinsics), we can make that go a lot
        faster than what is shown above. Being able to quickly get a
        timestamp seems a widely applicable enough use case to deserve
        some special treatment. So, perhaps, it's worth considering a
        _spectrum of solutions_ for improving the status quo, rather
        than investing solely in the removal of thread transitions.<br>
      </p>
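      <p>As a point of comparison for the timestamp case, the JDK already has a timestamp source that gets special treatment: as far as I know, HotSpot compiles System.nanoTime() as an intrinsic and, on Linux, backs it with clock_gettime(CLOCK_MONOTONIC). The sketch below is a crude way to see where it lands on a given machine; it is not a substitute for a JMH benchmark, and no particular result is implied. The class name is illustrative.</p>
      <pre>
```java
// Crude (non-JMH) estimate of the per-call cost of System.nanoTime().
// No fork/warmup control and a hand-rolled "blackhole": order of magnitude only.
class NanoTimeCost {
    // Publishing the accumulated value to a volatile field makes sure the
    // results are used, so the loop cannot be treated as dead code.
    static volatile long blackhole;

    static double averageNanosPerCall(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i != iterations; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        averageNanosPerCall(5_000_000); // first pass gives the JIT a chance to compile the loop
        System.out.printf("~%.1f ns per System.nanoTime() call%n",
                averageNanosPerCall(5_000_000));
    }
}
```
</pre>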
      <p>Maurizio<br>
      </p>
      <p>[1] - <a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk19/pull/90/files" moz-do-not-send="true">https://github.com/openjdk/jdk19/pull/90/files</a><br>
        [2] - <a class="moz-txt-link-freetext" href="https://youtu.be/LoyBTqkSkZk?t=742" moz-do-not-send="true">https://youtu.be/LoyBTqkSkZk?t=742</a></p>
      <p><br>
      </p>
      <div class="moz-cite-prefix">On 04/07/2022 18:38, Vitaly
        Davidovich wrote:<br>
      </div>
      <blockquote type="cite" cite="mid:CAHjP37E62eEbrDtS7HF0eZ0wA65xTWVF_eqpZFubP=4PTXEYVg@mail.gmail.com">
        <div dir="auto">To not sidetrack this thread with my previous
          reply:</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Maurizio - are you saying Java criticals are
          *already* hindering ZGC and/or other planned HotSpot
          improvements? Or that theoretically they could and you’d like
          to remove/deprecate them now(ish)?</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">If it’s the latter, perhaps it’s prudent to keep
          them around until a compelling case surfaces where they
          preclude or severely restrict evolution of the platform? If
          it’s the former, I would be curious what that is, but would
          also understand the rationale behind wanting to remove them.</div>
        <div><br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at
              1:26 PM Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>>
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
              <div><br>
              </div>
              <div><br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022
                    at 1:13 PM Wojciech Kudla <<a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                    <div dir="ltr">
                      <div>
                        <div>
                          <div>Thanks for your input, Vitaly. I'd be
                            interested to find out more about the nature
                            of the HW noise you observed in your
                            benchmarks as our results were very
                            consistent and it was pretty straightforward
                            to pinpoint the culprit as JNI call
                            overhead. Maybe it was just easier for us
                            because we disallow C- and P-state
                            transitions and put a lot of effort into
                            eliminating platform jitter in general. Were
                            you maybe running on a CPU model that
                            doesn't support constant TSC? I would also
                            suggest retrying with LAPIC interrupts
                            suppressed (with cli/sti) to see whether
                            it's the kernel and not the hardware.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div dir="auto">This was on a Broadwell Xeon chipset
                    with constant TSC.  All the typical jitter sources
                    were reduced: C/P states disabled in BIOS, max turbo
                    enabled, IRQs steered away, core isolated, etc.  By
                    the way, by noise I don’t mean the results
                    themselves were noisy; they were constant run to
                    run.  I just meant the delta between normal vs
                    critical JNI entrypoints was very minimal, i.e. “in
                    the noise”, particularly with rdtsc.</div>
                  <div dir="auto"><br>
                  </div>
                  <div dir="auto">I can try to remeasure on newer Intel
                    hardware, but see below…</div>
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                    <div dir="ltr">
                      <div>
                        <div>
                          <div dir="auto"><br>
                            <br>
                          </div>
                          100% agree on rdtsc(p) and snippets. There are
                          some narrow use cases where one can get some
                          substantial speed-ups with direct access to
                          prefetch or by abusing misprediction to keep
                          the icache hot. These scenarios are sadly only
                          available with inline assembly. I know of a
                          few shops that go to the length of forking
                          Graal, etc. to achieve that, but I am quite
                          convinced such capabilities would be welcome
                          and utilized by many more groups if they were
                          easily accessible from Java.</div>
                      </div>
                    </div>
                  </blockquote>
                  <div dir="auto">I’m of the firm (and perhaps
                    controversial for some :)) opinion these days that
                    Java is simply the wrong platform/tool for low
                    latency cases that warrant this level of control.
                    There are very strong headwinds even outside of JNI
                    costs.  And the “real” problem with JNI, besides
                    transition costs, is the lack of inlining into the
                    native calls.  So even if JVM transition costs are
                    fully eliminated, there’s still an optimization
                    fence due to lost inlining (not unlike native code
                    calling native fns via shared libs).</div>
                  <div dir="auto"><br>
                  </div>
                  <div dir="auto">That’s not to say that perf regressions
                    are welcome; nobody likes those :).</div>
                </div>
              </div>
              <div>
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                    <div dir="ltr">
                      <div>
                        <div dir="auto"><br>
                          <br>
                        </div>
                        Thanks,<br>
                      </div>
                      W.<br>
                    </div>
                    <br>
                    <div class="gmail_quote">
                      <div dir="ltr" class="gmail_attr">On Mon, Jul 4,
                        2022 at 5:51 PM Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>>
                        wrote:<br>
                      </div>
                      <blockquote class="gmail_quote" style="margin:0px
                        0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                        <div dir="auto">I’d add rdtsc(p) wrapper
                          functions to the list.  These are usually
                          either inline asm or a compiler intrinsic in the
                          JNI entrypoint.  In addition, any native libs
                          exposed via JNI that have “trivial” functions
                          are also candidates for faster calling
                          conventions.  There are sometimes ways to
                          mitigate the call overhead (e.g. batching), but
                          it’s not always feasible.</div>
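                        <p>The arithmetic behind batching is simple: a fixed per-call cost C spread over a batch of n items contributes C/n per item, which is what calls like recvmmsg/sendmmsg exploit relative to recvmsg/sendmsg. A small sketch, using a 200ns per-call overhead (roughly the syscall figure Maurizio mentions elsewhere in this thread) and a purely hypothetical 50ns of work per item; the class name is illustrative.</p>
                        <pre>
```java
// Amortizing a fixed per-call overhead over a batch of items, in the spirit of
// recvmmsg/sendmmsg versus recvmsg/sendmsg. Parameter values in main are hypothetical.
class BatchingCost {
    /**
     * Average cost per item when one call handles batchSize items.
     *
     * @param perCallOverheadNs cost paid once per call (state transition, syscall entry, ...)
     * @param perItemNs         cost of the work done for each item
     * @param batchSize         number of items handled by a single call (must be positive)
     */
    static double costPerItem(double perCallOverheadNs, double perItemNs, int batchSize) {
        if (batchSize >= 1) {
            return perCallOverheadNs / batchSize + perItemNs;
        }
        throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
    }

    public static void main(String[] args) {
        for (int batch : new int[] {1, 8, 32}) {
            System.out.printf("batch %2d -> %.2f ns/item%n", batch, costPerItem(200, 50, batch));
        }
    }
}
```
</pre>
                        <p>With these made-up values, the per-item cost drops from 250ns with one item per call to 75ns with 8 and 56.25ns with 32; batching only helps, of course, when the workload naturally produces batches.</p>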
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">I’ll add that last time I tried
                          to measure the improvement of Java criticals
                          for clock_gettime (and rdtsc) it looked to be
                          in the noise on the hardware I was testing
                          on.  It got to the point where I had to
                          instrument the critical and normal JNI
                          entrypoints to confirm the critical was being
                          hit.  The critical calling convention isn’t
                          significantly different *if* basic primitives
                          (or no args at all) are passed as args.
                          JNIEnv*, IIRC, is loaded from a register so
                          that’s minor.  jclass (for static calls, which
                          is what’s relevant here) should be a compiled
                          constant.  The critical call still has a GCLocker
                          check.  So I’m not actually sure what the
                          significant difference is for “lightweight”
                          (i.e. few or no primitive args, primitive return
                          types) calls.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">In general, I do think it’d be
                          nice if there was a faster native call
                          sequence, even if it comes with a caveat
                          emptor and/or special requirements on the
                          callee (not unlike the requirements for
                          criticals).  I think Vladimir Ivanov was
                          working on “snippets” that allowed dynamic
                          construction of a native call, possibly
                          including assembly.  Not sure where that
                          exploration is these days, but that would be a
                          welcome capability.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">My $.02.  Happy 4th of July for
                          those celebrating!</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">Vitaly</div>
                        <div><br>
                          <div class="gmail_quote">
                            <div dir="ltr" class="gmail_attr">On Mon,
                              Jul 4, 2022 at 12:04 PM Maurizio
                              Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
                              wrote:<br>
                            </div>
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                              <div>
                                <p>Hi,<br>
                                  while I'm not an expert with some of
                                  the IO calls you mention (some of my
                                  colleagues are more knowledgeable in
                                  this area, so I'm sure they will have
                                  more info), my general sense is that,
                                  as with getrusage, if there is a
                                  system call involved, you already pay
                                  a hefty price for the user to kernel
                                  transition. On my machine this seems to
                                  cost around 200ns. In these cases,
                                  using JNI critical to shave off a
                                  dozen nanoseconds (at best!) seems
                                  just not worth it.</p>
                                <p>So, of the functions in your list,
                                  the ones for which I *believe*
                                  dropping transitions would have the
                                  most effect are (if we exclude getpid,
                                  for which another approach is
                                  possible) clock_gettime and getcpu,
                                  as they might use vdso [1],
                                  which typically brings the performance
                                  of these calls closer to calls to
                                  shared lib functions.<br>
                                </p>
                                <p>If you have examples, e.g. where
                                  performance of recvmsg (or related
                                  calls) varies significantly between
                                  base JNI and critical JNI, please send
                                  them our way; I'm sure some of my
                                  colleagues would be interested in taking
                                  a look.<br>
                                </p>
                                <p>Popping back a couple of levels, I
                                  think it would be helpful to also
                                  define what's an acceptable regression
                                  in this context. Of course, in an
                                  ideal world, we'd like to see no
                                  performance regression at all. But JNI
                                  critical is an unsupported interface,
                                  which might misbehave with modern
                                  garbage collectors (e.g. ZGC) and
                                  requires quite a bit of internal
                                  complexity which might, in the
                                  medium/long run, hinder the evolution
                                  of the Java platform (all these things
                                  have _some_ cost, even if the cost is
                                  not directly material to developers).
                                  In this vein, I think calls like
                                  clock_gettime tend to be more
                                  problematic: as they complete very
                                  quickly, you see the cost of
                                  transitions a lot more. In other
                                  cases, where syscalls are involved,
                                  the costs associated with transitions
                                  are more likely to be "in the noise". Of
                                  course if we look at absolute numbers,
                                  dropping transitions would always
                                  yield "faster" code; but at the same
                                  time, going from 250ns to 245ns is
                                  very unlikely to result in a visible
                                  performance difference when
                                  considering an application as a whole,
                                  so I think it's critical here to
                                  decide _which_ use cases to
                                  prioritize.<br>
                                </p>
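                                <p>One way to make the "application as a whole" point concrete is a back-of-the-envelope, Amdahl's-law style estimate: if a fraction f of total run time is spent in these calls, and each call gets cheaper by a factor r = new/old, total run time shrinks by f(1 - r). A small sketch using the 250ns to 245ns example; the fractions of time spent in calls are hypothetical and the class name is illustrative.</p>
                                <pre>
```java
// Back-of-the-envelope (Amdahl's-law style) estimate of how a faster native call
// shows up in total run time. The fractions used in main are hypothetical.
class WholeAppImpact {
    /**
     * @param fractionInCalls share of total run time spent in the native calls (0.0 to 1.0)
     * @param oldCallNs       per-call cost before the change
     * @param newCallNs       per-call cost after the change
     * @return reduction in total run time, as a fraction of the original run time
     */
    static double overallReduction(double fractionInCalls, double oldCallNs, double newCallNs) {
        double newRunTime = (1.0 - fractionInCalls) + fractionInCalls * (newCallNs / oldCallNs);
        return 1.0 - newRunTime;
    }

    public static void main(String[] args) {
        // 250ns -> 245ns is the per-call example from the text.
        for (double f : new double[] {0.01, 0.10, 0.50}) {
            System.out.printf("%4.0f%% of time in calls -> %.2f%% less run time%n",
                    100 * f, 100 * overallReduction(f, 250, 245));
        }
    }
}
```
</pre>
                                <p>Even an application spending half of its time in such calls would only save about 1% of its run time.</p>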
                                <p>I think a good outcome of this
                                  discussion would be if we could come
                                  to some shared understanding of which
                                  native calls are truly problematic
                                  (e.g. clock_gettime-like), and then
                                  for the JDK to provide better (and
                                  more maintainable) alternatives for
                                  those (which might even be faster than
                                  using critical JNI).<br>
                                </p>
                                <p>Thanks<br>
                                  Maurizio<br>
                                </p>
                                <p>[1] - <a href="https://man7.org/linux/man-pages/man7/vdso.7.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man7/vdso.7.html</a><br>
                                </p>
                              </div>
                              <div>
                                <div>On 04/07/2022 12:23, Wojciech Kudla
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div>
                                      <div>Thanks Maurizio,<br>
                                        <br>
                                      </div>
                                      I raised this case mainly about
                                      clock_gettime and recvmsg/sendmsg;
                                      I think we're focusing on the
                                      wrong things here. Feel free to
                                      drop the two syscalls from the
                                      discussion entirely, but the main
                                      use cases I have been presenting
                                      throughout this thread definitely
                                      stand.<br>
                                      <br>
                                    </div>
                                    <div>Thanks<br>
                                    </div>
                                    <br>
                                  </div>
                                  <br>
                                  <div class="gmail_quote">
                                    <div dir="ltr" class="gmail_attr">On
                                      Mon, Jul 4, 2022 at 10:54 AM
                                      Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
                                      wrote:<br>
                                    </div>
                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                                      <div>
                                        <p>Hi Wojtek,<br>
                                          thanks for sharing this list,
                                          I think this is a good
                                          starting point to understand
                                          more about your use case.</p>
                                        <p>Last week I was looking
                                          at "getrusage" (as you
                                          mentioned it in an earlier
                                          email), and I was surprised to
                                          see that the call took a
                                          pointer to a (fairly big)
                                          struct which then needed to be
                                          initialized with some
                                          thread-local state:</p>
                                        <p><a href="https://man7.org/linux/man-pages/man2/getrusage.2.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man2/getrusage.2.html</a></p>
                                        <p>I've looked at the
                                          implementation, and it seems
                                          to be doing memset on the
                                          user-provided struct pointer,
                                          plus all the field
                                          assignments. Eyeballing the
                                          implementation, this does not
                                          seem to me like a "classic"
                                          use case where dropping
                                          transitions would help much. I
                                          mean, surely dropping
                                          transitions would help shave
                                          some nanoseconds off the call,
                                          but it doesn't seem to me that
                                          the call would be short-lived
                                          enough to make a difference.
                                          Do you have some benchmarks on
                                          this one? I did some [1] and
                                          the call overhead seemed to
                                          come in at 260ns/op; w/o
                                          transitions you might perhaps
                                          be able to get to 250ns, but
                                          that's in the noise?<br>
                                        </p>
                                        <p>As for getpid, note that you
                                          can do (since Java 9):<br>
                                          <br>
                                          ProcessHandle.current().pid();<br>
                                          <br>
                                          I believe the impl caches the
                                          result, so it shouldn't even
                                          make the native call.<br>
                                        </p>
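                                        <p>The getpid suggestion above as a complete program. The class and method names are illustrative; whether the JDK caches the pid is an implementation detail, so the sketch doesn't rely on it either way.</p>
                                        <pre>
```java
// Current process id from plain Java (Java 9+): no JNI or Panama code needed.
class CurrentPid {
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("pid = " + currentPid());
    }
}
```
</pre>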
                                        <p>Maurizio</p>
                                        <p>[1] - <a href="http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java</a><br>
                                        </p>
                                        <div>On 02/07/2022 07:42,
                                          Wojciech Kudla wrote:<br>
                                        </div>
                                        <blockquote type="cite">
                                          <div dir="ltr">
                                            <div>
                                              <div>Hi Maurizio,<br>
                                                <br>
                                              </div>
                                              Thanks for staying on
                                              this.<br>
                                              <br>
                                              > Could you please
                                              provide a rough list of
                                              the native calls you make
                                              where you believe critical
                                              JNI is having a real
                                              impact in the performance
                                              of your application?<br>
                                            </div>
                                            <div><br>
                                              Off the top of my head:<br>
                                            </div>
                                            <div>clock_gettime<br>
                                            </div>
                                            <div>recvmsg<br>
                                            </div>
                                            <div>recvmmsg</div>
                                            <div>sendmsg<br>
                                            </div>
                                            <div>sendmmsg</div>
                                            <div>select<br>
                                            </div>
                                            <div>getpid</div>
                                            <div>getcpu<br>
                                            </div>
                                            <div>getrusage<br>
                                            </div>
                                            <div><br>
                                            </div>
                                            <div>> Also, could you
                                              please tell us whether any
                                              of these calls need to
                                              interact with Java arrays?<br>
                                            </div>
                                            <div>No arrays or objects of
                                              any type are involved.
                                              Everything happens by
                                              means of passing raw
                                              pointers as longs and
                                              using other primitive
                                              types as function
                                              arguments.<br>
                                            </div>
                                            <div><br>
                                              > In other words, do
                                              you use critical JNI to
                                              remove the cost associated
                                              with thread transitions,
                                              or are you also taking
                                              advantage of accessing
                                              on-heap memory _directly_
                                              from native code?<br>
                                            </div>
                                            <div>Critical JNI natives
                                              are used solely to remove
                                              the cost of transitions.
                                              We don't get anywhere near
                                              the Java heap in native code.<br>
                                              <br>
                                            </div>
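                                            <div>For calls like the ones listed above (no Java arrays, only raw addresses and primitives), a rough Panama counterpart of a transition-free critical JNI native is a downcall handle marked as critical. This is a hypothetical sketch, not code from this thread: it assumes JDK 22 or later (where the FFM API is final and provides Linker.Option.critical(boolean)), a 64-bit Linux/glibc target where CLOCK_REALTIME is 0 and struct timespec is two 8-byte fields, and the names ClockGettimeFfm and realtimeNanos are made up for illustration.</div>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical example class, not taken from this thread.
// Assumes JDK 22+ (final FFM API) on 64-bit Linux/glibc.
public class ClockGettimeFfm {
    // Assumption: CLOCK_REALTIME is 0 and struct timespec is
    // { long tv_sec; long tv_nsec; }, i.e. 16 bytes, 8-byte aligned.
    private static final int CLOCK_REALTIME = 0;

    // Downcall handle for: int clock_gettime(clockid_t clk, struct timespec *ts)
    private static final MethodHandle CLOCK_GETTIME;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment symbol = linker.defaultLookup()
                .find("clock_gettime")
                .orElseThrow();
        CLOCK_GETTIME = linker.downcallHandle(
                symbol,
                FunctionDescriptor.of(ValueLayout.JAVA_INT,
                        ValueLayout.JAVA_INT, ValueLayout.ADDRESS),
                // Mark the call as critical: no Java-to-native thread state
                // transition. false = the callee never accesses the Java heap.
                Linker.Option.critical(false));
    }

    // Reads CLOCK_REALTIME into the caller-provided off-heap timespec
    // buffer and returns nanoseconds since the Unix epoch.
    public static long realtimeNanos(MemorySegment ts) throws Throwable {
        int rc = (int) CLOCK_GETTIME.invokeExact(CLOCK_REALTIME, ts);
        if (rc != 0) {
            throw new IllegalStateException("clock_gettime failed: " + rc);
        }
        long sec = ts.get(ValueLayout.JAVA_LONG, 0);
        long nsec = ts.get(ValueLayout.JAVA_LONG, 8);
        return sec * 1_000_000_000L + nsec;
    }

    public static void main(String[] args) throws Throwable {
        // Allocate the off-heap timespec once and reuse it for every call.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment ts = arena.allocate(16, 8);
            System.out.println(realtimeNanos(ts));
        }
    }
}
```
</pre>
                                            <div>Passing false to critical() mirrors the answer above that native code never touches the Java heap, and the timespec buffer is allocated once and reused across calls. Critical downcalls are intended for short, non-blocking functions, so potentially blocking calls such as select or a blocking recvmsg are probably poor candidates for this option.</div>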
                                            <div>In general I think it
                                              makes a lot of sense for
                                              Java as a
                                              language/platform to have
                                              some guards around unsafe
                                              code, but on the other
                                              hand, the popularity of
                                              libraries employing Unsafe
                                              and their success in more
                                              performance-oriented
                                              corners of software
                                              engineering is a clear
                                              indicator that there is a
                                              need for the JVM to provide
                                              access to lower-level
                                              primitives and mechanisms.
                                              <br>
                                            </div>
                                            <div>I think it's entirely
                                              fair to tell developers
                                              that all bets are off when
                                              they get into
                                              non-idiomatic scenarios,
                                              but please don't take away
                                              a feature that greatly
                                              contributed to Java's
                                              success.<br>
                                              <br>
                                            </div>
                                            <div>Kind regards,<br>
                                            </div>
                                            <div>Wojtek<br>
                                            </div>
                                          </div>
                                          <br>
                                          <div class="gmail_quote">
                                            <div dir="ltr" class="gmail_attr">On Wed,
                                              Jun 29, 2022 at 5:20 PM
                                              Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
                                              wrote:<br>
                                            </div>
                                            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                                              <div>
                                                <p>Hi Wojciech,<br>
                                                  picking up this thread
                                                  again. After some
                                                  internal discussion,
                                                  we realize that we
                                                  don't know enough
                                                  about your use case.
                                                  While re-enabling JNI
                                                  critical would
                                                  obviously provide a
                                                  quick fix, we're
                                                  afraid that (a)
                                                  developers might end
                                                  up depending on JNI
                                                  critical when they
                                                  don't need to (perhaps
                                                  also unaware of the
                                                  consequences of
                                                  depending on it) and
                                                  (b) that there might
                                                  actually be _better_
                                                  (as in: much faster)
                                                  solutions than using
                                                  critical native calls
                                                  to address at least
                                                  some of your use cases
                                                  (that seemed to be the
                                                  case with the
                                                  clock_gettime example
                                                  you mentioned). Could
                                                  you please provide a
                                                  rough list of the
                                                  native calls you make
                                                  where you believe
                                                  critical JNI is having
                                                  a real impact in the
                                                  performance of your
                                                  application? Also,
                                                  could you please tell
                                                  us whether any of
                                                  these calls need to
                                                  interact with Java
                                                  arrays? In other
                                                  words, do you use
                                                  critical JNI to remove
                                                  the cost associated
                                                  with thread
                                                  transitions, or are
                                                  you also taking
                                                  advantage of accessing
                                                  on-heap memory
                                                  _directly_ from native
                                                  code?</p>
                                                <p>Regards<br>
                                                  Maurizio<br>
                                                </p>
                                                <div>On 13/06/2022
                                                  21:38, Wojciech Kudla
                                                  wrote:<br>
                                                </div>
                                                <blockquote type="cite">
                                                  <div dir="ltr">
                                                    <div>
                                                      <div>
                                                        <div>
                                                          <div>
                                                          <div>Hi Mark,<br>
                                                          <br>
                                                          </div>
                                                          Thanks for
                                                          your input and
                                                          apologies for
                                                          the delayed
                                                          response.<br>
                                                          <br>
                                                          > If the
                                                          platform
                                                          included, say,
                                                          an
                                                          intrinsified
                                                          System.nanoRealTime()<br>
                                                          method that
                                                          returned
                                                          clock_gettime(CLOCK_REALTIME),
                                                          how much would<br>
                                                          that help
                                                          developers in
                                                          your unnamed
                                                          industry?<br>
                                                          <br>
                                                          </div>
                                                          Exposing the
                                                          realtime clock
                                                          with nanosecond
                                                          granularity in
                                                          the JDK would
                                                          be a great
                                                          step forward.
                                                          I should have
                                                          made it clear
                                                          that I
                                                          represent the
                                                          fintech corner
                                                          (investment
                                                          banking, to be
                                                          exact), but the
                                                          issues my
                                                          message
                                                          touches upon
                                                          span areas
                                                          such as HPC,
                                                          audio
                                                          processing,
                                                          gaming, and
                                                          the defense
                                                          industry, so
                                                          ours is not an
                                                          isolated case.<br>
                                                          <br>
                                                          > In a
                                                          similar vein,
                                                          if people are
                                                          finding it
                                                          necessary to
                                                          “replace parts<br>
                                                          of NIO with
                                                          hand-crafted
                                                          native code”
                                                          then it would
                                                          be interesting
                                                          to<br>
                                                          understand
                                                          what their
                                                          requirements
                                                          are<br>
                                                          <br>
                                                        </div>
                                                        As for the other
                                                        example I
                                                        provided, making
                                                        very short-lived
                                                        syscalls such as
                                                        recvmsg/recvmmsg,
                                                        the premise is
                                                        getting access
                                                        to hardware
                                                        timestamps on
                                                        the ingress and
                                                        egress ends, as
                                                        well as enabling
                                                        batch receive
                                                        with a single
                                                        syscall and
                                                        otherwise
                                                        exploiting
                                                        features
                                                        unavailable from
                                                        the JDK (like
                                                        access to the
                                                        CMSG interface,
                                                        scatter/gather,
                                                        etc.).<br>
                                                      </div>
                                                      <div>There are
                                                        also other
                                                        examples of
                                                        calls that we'd
                                                        love to make
                                                        often and at the
                                                        lowest possible
                                                        cost (e.g.
                                                        getrusage), but
                                                        I'm not sure
                                                        there's a strong
                                                        case for some of
                                                        these ideas;
                                                        that's why it
                                                        might be worth
                                                        looking into a
                                                        more generic
                                                        approach for
                                                        performance-sensitive
                                                        code.<br>
                                                      </div>
                                                      <div>Hope this
                                                        does a better job
                                                        of explaining
                                                        where we're
                                                        coming from than
                                                        my previous
                                                        messages.<br>
                                                      </div>
                                                      <div><br>
                                                      </div>
                                                      Thanks,<br>
                                                    </div>
                                                    W<br>
                                                  </div>
                                                  <br>
                                                  <div class="gmail_quote">
                                                    <div dir="ltr" class="gmail_attr">On
                                                      Tue, Jun 7, 2022
                                                      at 6:31 PM <<a href="mailto:mark.reinhold@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mark.reinhold@oracle.com</a>> wrote:<br>
                                                    </div>
                                                    <blockquote class="gmail_quote" style="margin:0px
                                                      0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">2022/6/6
                                                      0:24:17 -0700, <a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>:<br>
                                                      >> Yes for
                                                      System.nanoTime(),
                                                      but
                                                      System.currentTimeMillis()
                                                      reports<br>
                                                      >>
                                                      CLOCK_REALTIME.<br>
                                                      > <br>
                                                      > Unfortunately
System.currentTimeMillis() offers only millisecond<br>
                                                      > granularity
                                                      which is the
                                                      reason why our
                                                      industry has to
                                                      resort to<br>
                                                      >
                                                      clock_gettime.<br>
                                                      <br>
                                                      If the platform
                                                      included, say, an
                                                      intrinsified
                                                      System.nanoRealTime()<br>
                                                      method that
                                                      returned
clock_gettime(CLOCK_REALTIME), how much would<br>
                                                      that help
                                                      developers in your
                                                      unnamed industry?<br>
                                                      <br>
                                                      In a similar vein,
                                                      if people are
                                                      finding it
                                                      necessary to
                                                      “replace parts<br>
                                                      of NIO with
                                                      hand-crafted
                                                      native code” then
                                                      it would be
                                                      interesting to<br>
                                                      understand what
                                                      their requirements
                                                      are.  Some simple
                                                      enhancements to<br>
                                                      the NIO API would
                                                      be much less
                                                      costly to design
                                                      and implement than
                                                      a<br>
                                                      generalized
                                                      user-level
                                                      native-call
                                                      intrinsification
                                                      mechanism.<br>
                                                      <br>
                                                      - Mark<br>
                                                    </blockquote>
                                                  </div>
                                                </blockquote>
                                              </div>
                                            </blockquote>
                                          </div>
                                        </blockquote>
                                      </div>
                                    </blockquote>
                                  </div>
                                </blockquote>
                              </div>
                            </blockquote>
                          </div>
                        </div>
                        -- <br>
                        <div dir="ltr">Sent from my phone</div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </blockquote>
    </blockquote>
  </body>
</html>