<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Hi,<br>
      As Erik explained in his reply, what we call "critical JNI" comes
      in two pieces: one removes Java to native thread transitions
      (which is what Wojciech is referring to), while another part
      interacts with the GC locker (basically to allow critical JNI code
      to access Java arrays w/o copying). I think the latter part is the
      most problematic GC-wise.<br>
    </p>
    <p>Then, regarding the former, I think there are still questions as
      to whether dropping transitions is the best way to get the
      performance boost required; for instance, yesterday I did some
      experiments with an experimental patch from Jorn (kudos) which
      re-enables an opt-in for "trivial" native calls in the Panama API.
      I used it to test clock_gettime, and, while there's an
      improvement, the results I got were not as conclusive as one might
      expect. This is what I get w/ state transitions:</p>
    <p>```<br>
      Benchmark                                 Mode  Cnt   Score  
      Error  Units<br>
      ClockgettimeTest.panama_monotonic         avgt   30  27.814 ±
      0.165  ns/op<br>
      ClockgettimeTest.panama_monotonic_coarse  avgt   30  12.094 ±
      0.103  ns/op<br>
      ClockgettimeTest.panama_monotonic_raw     avgt   30  27.719 ±
      0.393  ns/op<br>
      ClockgettimeTest.panama_realtime          avgt   30  27.133 ±
      0.280  ns/op<br>
      ClockgettimeTest.panama_realtime_coarse   avgt   30  26.812 ±
      0.384  ns/op<br>
      ```<br>
    </p>
    <p>And this is what I get with transitions removed:</p>
    <p>```<br>
      Benchmark                                 Mode  Cnt   Score  
      Error  Units<br>
      ClockgettimeTest.panama_monotonic         avgt   30  22.383 ±
      0.213  ns/op<br>
      ClockgettimeTest.panama_monotonic_coarse  avgt   30   6.312 ±
      0.117  ns/op<br>
      ClockgettimeTest.panama_monotonic_raw     avgt   30  22.731 ±
      0.279  ns/op<br>
      ClockgettimeTest.panama_realtime          avgt   30  22.503 ±
      0.292  ns/op<br>
      ClockgettimeTest.panama_realtime_coarse   avgt   30  21.853 ±
      0.100  ns/op<br>
      ```<br>
    </p>
    <p>Here we can see a gain of 4-5ns, obtained by dropping the
      transition. The only case where this makes a significant
      difference is with the monotonic_coarse flavor. In the other cases
      there's a difference, yes, but not as pronounced, simply because
      the term we're comparing against is bigger: it's easy to see a 5ns
      gain if your function runs for 10ns in total - but such a gain
      starts to get lost in the "noise" when functions run for longer.
      And that's the main issue with removing Java->native
      transitions: the "window" in which this optimization yields a
      positive effect is extremely narrow (anything lasting longer than
      30ns probably won't see much difference), but, as you can
      see from the PR in [1], the VM changes required to support it
      touch quite a bit of stuff!</p>
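    <p>To make the proportions concrete, here is a small sketch (not
      part of the benchmark itself) that takes the scores from the two
      tables above and computes the absolute and relative gain for each
      flavor. The class and method names are made up for illustration;
      only the numbers come from the runs above.</p>
    <pre>
```java
import java.util.Locale;

public class TransitionGain {
    // Scores (ns/op) copied from the two benchmark tables above.
    static final String[] NAMES = {
        "monotonic", "monotonic_coarse", "monotonic_raw", "realtime", "realtime_coarse"
    };
    static final double[] WITH_TRANSITIONS = {27.814, 12.094, 27.719, 27.133, 26.812};
    static final double[] WITHOUT_TRANSITIONS = {22.383, 6.312, 22.731, 22.503, 21.853};

    /** Absolute gain in ns/op obtained by dropping the transition. */
    static double absoluteGainNs(double withNs, double withoutNs) {
        return withNs - withoutNs;
    }

    /** Gain expressed as a fraction of the original call time. */
    static double relativeGain(double withNs, double withoutNs) {
        return (withNs - withoutNs) / withNs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            double abs = absoluteGainNs(WITH_TRANSITIONS[i], WITHOUT_TRANSITIONS[i]);
            double rel = relativeGain(WITH_TRANSITIONS[i], WITHOUT_TRANSITIONS[i]);
            System.out.printf(Locale.ROOT, "%-17s gain = %.3f ns (%.1f%%)%n",
                    NAMES[i], abs, 100 * rel);
        }
    }
}
```
    </pre>
    <p>With these inputs the coarse clock saves close to half of its
      call time, while the other flavors save somewhere under a fifth,
      which is the "narrow window" effect in numbers.</p>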
    <p>Luckily, selectively disabling transitions from Panama is
      slightly more straightforward and, perhaps, for stuff like recvmsg
      syscalls that are bypassed, there's not much else we can do: while
      one could imagine Panama special-casing calls to clock_gettime, as
      that's a known "leaf", the same cannot be done with recvmsg, which
      is in general a blocking call. Panama also has a "trusted mode"
      flag (--enable-native-access), so there is a way in the Panama API
      to distinguish between safe and unsafe API points, which also helps
      with this. The risk of course is for developers to see whatever
      mechanism is provided as some kind of "make my code go fast
      please" and apply it blindly, w/o fully understanding the
      consequences. What I said before about "extremely narrow window"
      remains true: in the vast majority of cases (like 99%) dropping
      state transitions can result in very big downsides, while the
      corresponding upsides are not big enough to even be noticeable
      (the Q/A in [2] arrives at a very similar conclusion).<br>
    </p>
    <p>All this said, selectively disabling state transitions from
      native calls made using the Panama foreign API seems the most
      straightforward way to offset the performance delta introduced by
      the removal of critical JNI. In part it's because the Panama API
      is more flexible, e.g. function descriptors allow us to model the
      distinction between a trivial and non-trivial call; in part it's
      because, as stated above, Panama can already reason about calls
      that are "unsafe" and that require extra permissions. And, finally,
      it's also because, if we added back critical JNI, we'd probably
      add it back w/o its most problematic GC locker parts (that's what
      [1] does AFAIK) - which means it won't be a complete code
      reversal. So, perhaps, coming up with a fresh mechanism to drop
      transitions (only) could also be less confusing for developers. Of
      course this would require developers such as Wojciech to rewrite
      some of the code to use Panama instead of JNI.</p>
    <p>And, coming back to clock_gettime, my feeling is that with the
      right tools (e.g. some intrinsics), we can make that go a lot
      faster than what is shown above. Being able to quickly get a
      timestamp seems a widely-enough applicable use case to deserve
      some special treatment. So, perhaps, it's worth considering a
      _spectrum of solutions_ on how to improve the status quo, rather
      than investing solely in the removal of thread transitions.<br>
    </p>
    <p>Maurizio<br>
    </p>
    <p>[1] - <a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk19/pull/90/files">https://github.com/openjdk/jdk19/pull/90/files<br>
      </a>[2] - <a class="moz-txt-link-freetext" href="https://youtu.be/LoyBTqkSkZk?t=742">https://youtu.be/LoyBTqkSkZk?t=742</a></p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 04/07/2022 18:38, Vitaly Davidovich
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CAHjP37E62eEbrDtS7HF0eZ0wA65xTWVF_eqpZFubP=4PTXEYVg@mail.gmail.com">
      <div dir="auto">To not sidetrack this thread with my previous
        reply:</div>
      <div dir="auto"><br>
      </div>
      <div dir="auto">Maurizio - are you saying java criticals are
        *already* hindering ZGC and/or other planned Hotspot
        improvements? Or that theoretically they could and you’d like to
        remove/deprecate them now(ish)?</div>
      <div dir="auto"><br>
      </div>
      <div dir="auto">If it’s the latter, perhaps it’s prudent to keep
        them around until a compelling case surfaces where they preclude
        or severely restrict evolution of the platform? If it’s the
        former, would be curious what that is but would also understand
        the rationale behind wanting to remove it.</div>
      <div><br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at 1:26
            PM Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
            <div><br>
            </div>
            <div><br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at
                  1:13 PM Wojciech Kudla <<a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                  <div dir="ltr">
                    <div>
                      <div>
                        <div>Thanks for your input, Vitaly. I'd be
                          interested to find out more about the nature
                          of the HW noise you observed in your
                          benchmarks as our results were very consistent
                          and it was pretty straightforward to pinpoint
                          the culprit as JNI call overhead. Maybe it was
                          just easier for us because we disallow C- and
                          P-state transitions and put a lot of effort to
                          eliminate platform jitter in general. Were you
                          maybe running on a CPU model that doesn't
                          support constant TSC? I would also suggest
                          retrying with LAPIC interrupts suppressed
                          (with: cli/sti) to maybe see if it's the
                          kernel and not the hardware.</div>
                      </div>
                    </div>
                  </div>
                </blockquote>
                <div dir="auto">This was on a Broadwell Xeon chipset
                  with constant tsc.  All the typical jitter sources
                  were reduced: C/P states disabled in bios, max turbo
                  enabled, IRQs steered away, core isolated, etc.  By
                  the way, by noise I don’t mean the results themselves
                  were noisy - they were constant run to run.  I just
                  meant the delta between normal vs critical JNI
                  entrypoints was very minimal - ie “in the noise”,
                  particularly with rdtsc.</div>
                <div dir="auto"><br>
                </div>
                <div dir="auto">I can try to remeasure on newer Intel
                  but see below …</div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                  <div dir="ltr">
                    <div>
                      <div>
                        <div dir="auto"><br>
                          <br>
                        </div>
                        100% agree on rdtsc(p) and snippets. There are
                        some narrow use cases where one can get some
                        substantial speed ups with direct access to
                        prefetch or by abusing misprediction to keep
                        icache hot. These scenarios are sadly only
                        available with inline assembly. I know of a few
                        shops that go to the length of forking Graal,
                        etc to achieve that but am quite convinced such
                        capabilities would be welcome and utilized by
                        many more groups if they were easily accessible
                        from java.</div>
                    </div>
                  </div>
                </blockquote>
                <div dir="auto">I’m of the firm (and perhaps
                  controversial for some :)) opinion these days that
                  Java is simply the wrong platform/tool for low latency
                  cases that warrant this level of control.  There’re
                  very strong headwinds even outside of JNI costs.  And
                  the “real” problem with JNI, besides transition costs,
                  is lack of inlining into the native calls.  So even if
                  JVM transition costs are fully eliminated, there’s
                  still an optimization fence due to lost inlining (not
                  unlike native code calling native fns via shared
                  libs).</div>
                <div dir="auto"><br>
                </div>
                <div dir="auto">That’s not to say that perf regressions are
                  welcomed - nobody likes those :).</div>
              </div>
            </div>
            <div>
              <div class="gmail_quote">
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                  <div dir="ltr">
                    <div>
                      <div dir="auto"><br>
                        <br>
                      </div>
                      Thanks,<br>
                    </div>
                    W.<br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Mon, Jul 4,
                      2022 at 5:51 PM Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                      <div dir="auto">I’d add rdtsc(p) wrapper functions
                        to the list.  These are usually either inline
                        asm or compiler intrinsic in the JNI
                        entrypoint.  In addition, any native libs
                        exposed via JNI that have “trivial” functions
                        are also candidates for faster calling
                        conventions.  There’re sometimes ways to mitigate
                        the call overhead (eg batching) but it’s not
                        always feasible.</div>
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">I’ll add that last time I tried to
                        measure the improvement of Java criticals for
                        clock_gettime (and rdtsc) it looked to be in the
                        noise on the hardware I was testing on.  It got
                        to the point where I had to instrument the critical
                        and normal JNI entrypoints to confirm the
                        critical was being hit.  The critical calling
                        convention isn’t significantly different *if*
                        basic primitives (or no args at all) are passed
                        as args.  JNIEnv*, IIRC, is loaded from a
                        register so that’s minor.  jclass (for static
                        calls, which is what’s relevant here) should be
                        a compiled constant.  Critical call still has a
                        GCLocker check.  So I’m not actually sure what
                        the significant difference is for “lightweight”
                        (ie few primitive or no args, primitive return
                        types) calls.</div>
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">In general, I do think it’d be
                        nice if there was a faster native call sequence,
                        even if it comes with a caveat emptor and/or
                        special requirements on the callee (not unlike
                        the requirements for criticals).  I think
                        Vladimir Ivanov was working on “snippets” that
                        allowed dynamic construction of a native call,
                        possibly including assembly.  Not sure where
                        that exploration is these days, but that would
                        be a welcome capability.</div>
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">My $.02.  Happy 4th of July for
                        those celebrating!</div>
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">Vitaly</div>
                      <div><br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Mon, Jul
                            4, 2022 at 12:04 PM Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                            <div>
                              <p>Hi,<br>
                                while I'm not an expert with some of the
                                IO calls you mention (some of my
                                colleagues are more knowledgeable in
                                this area, so I'm sure they will have
                                more info), my general sense is that, as
                                with getrusage, if there is a system
                                call involved, you already pay a hefty
                                price for the user to kernel transition.
                                On my machine this seems to cost around
                                200ns. In these cases, using JNI
                                critical to shave off a dozen
                                nanoseconds (at best!) seems just not
                                worth it.</p>
                              <p>So, of the functions in your list, the
                                ones in which I *believe*  dropping
                                transitions would have the most effect
                                are (if we exclude getpid, for which
                                another approach is possible)
                                clock_gettime and getcpu, I believe, as
                                they might use vdso [1], which typically
                                brings the performance of these calls
                                closer to calls to shared lib functions.<br>
                              </p>
                              <p>If you have examples e.g. where
                                performance of recvmsg (or related
                                calls) varies significantly between base
                                JNI and critical JNI, please send them
                                our way; I'm sure some of my colleagues
                                would be interested to take a look.<br>
                              </p>
                              <p>Popping back a couple of levels, I
                                think it would be helpful to also define
                                what's an acceptable regression in this
                                context. Of course, in an ideal world, 
                                we'd like to see no performance
                                regression at all. But JNI critical is
                                an unsupported interface, which might
                                misbehave with modern garbage collectors
                                (e.g. ZGC) and that requires quite a bit
                                of internal complexity which might, in
                                the medium/long run, hinder the
                                evolution of the Java platform (all
                                these things have _some_ cost, even if
                                the cost is not directly material to
                                developers). In this vein, I think calls
                                like clock_gettime tend to be more
                                problematic: as they complete very
                                quickly, you see the cost of transitions
                                a lot more. In other cases, where
                                syscalls are involved, the costs
                                associated with transitions are more
                                likely to be "in the noise". Of course
                                if we look at absolute numbers, dropping
                                transitions would always yield "faster"
                                code; but at the same time, going from
                                250ns to 245ns is very unlikely to
                                result in visible performance difference
                                when considering an application as a
                                whole, so I think it's critical here to
                                decide _which_ use cases to prioritize.<br>
                              </p>
                              <p>I think a good outcome of this
                                discussion would be if we could come to
                                some shared understanding of which
                                native calls are truly problematic (e.g.
                                clock_gettime-like), and then for the
                                JDK to provide better (and more
                                maintainable) alternatives for those
                                (which might even be faster than using
                                critical JNI).<br>
                              </p>
                              <p>Thanks<br>
                                Maurizio<br>
                              </p>
                              <p>[1] - <a href="https://man7.org/linux/man-pages/man7/vdso.7.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man7/vdso.7.html</a><br>
                              </p>
                            </div>
                            <div>
                              <div>On 04/07/2022 12:23, Wojciech Kudla
                                wrote:<br>
                              </div>
                              <blockquote type="cite">
                                <div dir="ltr">
                                  <div>
                                    <div>Thanks Maurizio,<br>
                                      <br>
                                    </div>
                                    I raised this case mainly about
                                    clock_gettime and recvmsg/sendmsg; I
                                    think we're focusing on the wrong
                                    things here. Feel free to drop the
                                    two syscalls from the discussion
                                    entirely, but the main usecases I
                                    have been presenting throughout this
                                    thread definitely stand.<br>
                                    <br>
                                  </div>
                                  <div>Thanks<br>
                                  </div>
                                  <br>
                                </div>
                                <br>
                                <div class="gmail_quote">
                                  <div dir="ltr" class="gmail_attr">On
                                    Mon, Jul 4, 2022 at 10:54 AM
                                    Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
                                    wrote:<br>
                                  </div>
                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                                    <div>
                                      <p>Hi Wojtek,<br>
                                        thanks for sharing this list, I
                                        think this is a good starting
                                        point to understand more about
                                        your use case.</p>
                                      <p>Last week I was looking at
                                        "getrusage" (as you mentioned it
                                        in an earlier email), and I was
                                        surprised to see that the call
                                        took a pointer to a (fairly big)
                                        struct which then needed to be
                                        initialized with some
                                        thread-local state:</p>
                                      <p><a href="https://man7.org/linux/man-pages/man2/getrusage.2.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man2/getrusage.2.html</a></p>
                                      <p>I've looked at the
                                        implementation, and it seems to
                                        be doing memset on the
                                        user-provided struct pointer,
                                        plus all the fields assignment.
                                        Eyeballing the implementation,
                                        this does not seem to me like a
                                        "classic" use case where
                                        dropping transition would help
                                        much. I mean, surely dropping
                                        transitions would help shaving
                                        some nanoseconds off the call,
                                        but it doesn't seem to me that
                                        the call would be shortlived
                                        enough to make a difference. Do
                                        you have some benchmarks on this
                                        one? I did some [1] and the call
                                        overhead seemed to come up at
                                        260ns/op - w/o transition you
                                        might perhaps be able to get to
                                        250ns, but that's in the noise?<br>
                                      </p>
                                      <p>As for getpid, note that you
                                        can do (since Java 9):<br>
                                        <br>
                                        ProcessHandle.current().pid();<br>
                                        <br>
                                        I believe the impl caches the
                                        result, so it shouldn't even
                                        make the native call.<br>
                                      </p>
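                                      <p>For completeness, a minimal
                                        self-contained version of that
                                        one-liner might look like the
                                        sketch below. The class and
                                        method names are only
                                        illustrative; the one call that
                                        matters is
                                        ProcessHandle.current().pid().</p>
                                      <pre>
```java
public class PidExample {
    /** Returns the PID of the running JVM via the Java 9+ ProcessHandle API. */
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        // No JNI code has to be written by the application for this lookup.
        System.out.println("pid = " + currentPid());
    }
}
```
                                      </pre>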
                                      <p>Maurizio</p>
                                      <p>[1] - <a href="http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java</a><br>
                                      </p>
                                      <div>On 02/07/2022 07:42, Wojciech
                                        Kudla wrote:<br>
                                      </div>
                                      <blockquote type="cite">
                                        <div dir="ltr">
                                          <div>
                                            <div>Hi Maurizio,<br>
                                              <br>
                                            </div>
                                            Thanks for staying on this.<br>
                                            <br>
                                            > Could you please
                                            provide a rough list of the
                                            native calls you make where
                                            you believe critical JNI is
                                            having a real impact in the
                                            performance of your
                                            application?<br>
                                          </div>
                                          <div><br>
                                            Off the top of my head:<br>
                                          </div>
                                          <div>clock_gettime<br>
                                          </div>
                                          <div>recvmsg<br>
                                          </div>
                                          <div>recvmmsg</div>
                                          <div>sendmsg<br>
                                          </div>
                                          <div>sendmmsg</div>
                                          <div>select<br>
                                          </div>
                                          <div>getpid</div>
                                          <div>getcpu<br>
                                          </div>
                                          <div>getrusage<br>
                                          </div>
                                          <div><br>
                                          </div>
                                          <div>> Also, could you
                                            please tell us whether any
                                            of these calls need to
                                            interact with Java arrays?<br>
                                          </div>
                                          <div>No arrays or objects of
                                            any type involved.
                                            Everything happens by the
                                            means of passing raw
                                            pointers as longs and using
                                            other primitive types as
                                            function arguments.<br>
                                          </div>
                                          <div><br>
                                            > In other words, do you
                                            use critical JNI to remove
                                            the cost associated with
                                            thread transitions, or are
                                            you also taking advantage of
                                            accessing on-heap memory
                                            _directly_ from native code?<br>
                                          </div>
                                          <div>Critical JNI natives are
                                            used solely to remove the
                                            cost of transitions. We
                                            don't get anywhere near java
                                            heap in native code.<br>
                                            <br>
                                          </div>
                                          <div>In general I think it
                                            makes a lot of sense for
                                            Java as a language/platform
                                            to have some guards around
                                            unsafe code, but on the
                                            other hand the popularity of
                                            libraries employing Unsafe
                                            and their success in more
                                            performance-oriented corners
                                            of software engineering is a
                                            clear indicator there is a
                                            need for the JVM to provide
                                            access to more low-level
                                            primitives and mechanisms. <br>
                                          </div>
                                          <div>I think it's entirely
                                            fair to tell developers that
                                            all bets are off when they
                                            get into some non-idiomatic
                                            scenarios, but please don't
                                            take away a feature that
                                            greatly contributed to
                                            Java's success.<br>
                                            <br>
                                          </div>
                                          <div>Kind regards,<br>
                                          </div>
                                          <div>Wojtek<br>
                                          </div>
                                        </div>
                                        <br>
                                        <div class="gmail_quote">
                                          <div dir="ltr" class="gmail_attr">On Wed,
                                            Jun 29, 2022 at 5:20 PM
                                            Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;
                                            wrote:<br>
                                          </div>
                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                                            <div>
                                              <p>Hi Wojciech,<br>
                                                picking up this thread
                                                again. After some
                                                internal discussion, we
                                                realize that we don't
                                                know enough about your
                                                use case. While
                                                re-enabling JNI critical
                                                would obviously provide
                                                a quick fix, we're
                                                afraid that (a)
                                                developers might end up
                                                depending on JNI
                                                critical when they don't
                                                need to (perhaps also
                                                unaware of the
                                                consequences of
                                                depending on it) and (b)
                                                that there might
                                                actually be _better_ (as
                                                in: much faster)
                                                solutions than using
                                                critical native calls to
                                                address at least some of
                                                your use cases (that
                                                seemed to be the case
                                                with the clock_gettime
                                                example you mentioned).
                                                Could you please provide
                                                a rough list of the
                                                native calls you make
                                                where you believe
                                                critical JNI is having a
                                                real impact on the
                                                performance of your
                                                application? Also, could
                                                you please tell us
                                                whether any of these
                                                calls need to interact
                                                with Java arrays? In
                                                other words, do you use
                                                critical JNI to remove
                                                the cost associated with
                                                thread transitions, or
                                                are you also taking
                                                advantage of accessing
                                                on-heap memory
                                                _directly_ from native
                                                code?</p>
                                              <p>Regards<br>
                                                Maurizio<br>
                                              </p>
                                              <div>On 13/06/2022 21:38,
                                                Wojciech Kudla wrote:<br>
                                              </div>
                                              <blockquote type="cite">
                                                <div dir="ltr">
                                                  <div>
                                                    <div>
                                                      <div>
                                                        <div>
                                                          <div>Hi Mark,<br>
                                                          <br>
                                                          </div>
                                                          Thanks for
                                                          your input and
                                                          apologies for
                                                          the delayed
                                                          response.<br>
                                                          <br>
                                                          > If the
                                                          platform
                                                          included, say,
                                                          an
                                                          intrinsified
                                                          System.nanoRealTime()<br>
                                                          method that
                                                          returned
                                                          clock_gettime(CLOCK_REALTIME),
                                                          how much would<br>
                                                          that help
                                                          developers in
                                                          your unnamed
                                                          industry?<br>
                                                          <br>
                                                        </div>
                                                        Exposing a
                                                        realtime clock
                                                        with nanosecond
                                                        granularity in
                                                        the JDK would be
                                                        a great step
                                                        forward. I
                                                        should have made
                                                        it clear that I
                                                        represent the
                                                        fintech corner
                                                        (investment
                                                        banking, to be
                                                        exact), but the
                                                        issues my
                                                        message touches
                                                        upon span areas
                                                        such as HPC,
                                                        audio
                                                        processing,
                                                        gaming, and the
                                                        defense
                                                        industry, so
                                                        it's not like we
                                                        have an isolated
                                                        case.<br>
                                                        <br>
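                                                        <div>For context on the realtime clock question above, here is a minimal sketch (not part of the original exchange) of how a wall-clock timestamp can already be expressed in nanoseconds with the standard java.time API. The class and method names are hypothetical, the actual resolution depends on the JDK version and the operating system, and this is not the intrinsified System.nanoRealTime() suggested above, which is only a proposal in this thread.<br>
                                                        </div>
<pre>
```java
import java.time.Clock;
import java.time.Instant;

/** Hypothetical helper; the name RealtimeNanos is illustrative only. */
final class RealtimeNanos {

    private RealtimeNanos() {
    }

    /** Converts an Instant to nanoseconds since the Unix epoch. */
    public static long toEpochNanos(Instant instant) {
        // Math.*Exact throws ArithmeticException instead of silently overflowing.
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L),
                (long) instant.getNano());
    }

    /** Reads the system UTC clock (wall-clock time) as epoch nanoseconds. */
    public static long epochNanos() {
        return toEpochNanos(Clock.systemUTC().instant());
    }

    public static void main(String[] args) {
        System.out.println("epoch nanos: " + epochNanos());
    }
}
```
</pre>
                                                        <div>The returned value is in nanosecond units, but its granularity may be coarser than one nanosecond, and the call may allocate an Instant unless the JIT eliminates it. It is therefore a baseline for comparison, not a drop-in replacement for a direct clock_gettime call on a latency-critical path.<br>
                                                        </div>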
                                                        > In a
                                                        similar vein, if
                                                        people are
                                                        finding it
                                                        necessary to
                                                        “replace parts<br>
                                                        of NIO with
                                                        hand-crafted
                                                        native code”
                                                        then it would be
                                                        interesting to<br>
                                                        understand what
                                                        their
                                                        requirements are<br>
                                                        <br>
                                                      </div>
                                                      As for the other
                                                      example I provided,
                                                      making very
                                                      short-lived
                                                      syscalls such as
                                                      recvmsg/recvmmsg,
                                                      the premise is
                                                      getting access to
                                                      hardware
                                                      timestamps on the
                                                      ingress and egress
                                                      ends, enabling
                                                      batch receive with
                                                      a single syscall,
                                                      and otherwise
                                                      exploiting
                                                      features
                                                      unavailable from
                                                      the JDK (such as
                                                      access to the CMSG
                                                      interface,
                                                      scatter/gather,
                                                      etc.).<br>
                                                    </div>
                                                    <div>There are also
                                                      other examples of
                                                      calls that we'd
                                                      love to make often
                                                      and at the lowest
                                                      possible cost (e.g.
                                                      getrusage), but I'm
                                                      not sure if
                                                      there's a strong
                                                      case for some of
                                                      these ideas, which
                                                      is why it
                                                      might be worth
                                                      looking into a more
                                                      generic approach
                                                      for
                                                      performance-sensitive
                                                      code.<br>
                                                    </div>
                                                    <div>Hope this does
                                                      a better job of
                                                      explaining where
                                                      we're coming from
                                                      than my previous
                                                      messages.<br>
                                                    </div>
                                                    <div><br>
                                                    </div>
                                                    Thanks,<br>
                                                  </div>
                                                  W<br>
                                                </div>
                                                <br>
                                                <div class="gmail_quote">
                                                  <div dir="ltr" class="gmail_attr">On
                                                    Tue, Jun 7, 2022 at
                                                    6:31 PM &lt;<a href="mailto:mark.reinhold@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mark.reinhold@oracle.com</a>&gt; wrote:<br>
                                                  </div>
                                                  <blockquote class="gmail_quote" style="margin:0px
                                                    0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">2022/6/6
                                                    0:24:17 -0700, <a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>:<br>
                                                    >> Yes for
                                                    System.nanoTime(),
                                                    but
                                                    System.currentTimeMillis()
                                                    reports<br>
                                                    >>
                                                    CLOCK_REALTIME.<br>
                                                    > <br>
                                                    > Unfortunately
                                                    System.currentTimeMillis()
                                                    offers only
                                                    millisecond<br>
                                                    > granularity
                                                    which is the reason
                                                    why our industry has
                                                    to resort to<br>
                                                    > clock_gettime.<br>
                                                    <br>
                                                    If the platform
                                                    included, say, an
                                                    intrinsified
                                                    System.nanoRealTime()<br>
                                                    method that returned
clock_gettime(CLOCK_REALTIME), how much would<br>
                                                    that help developers
                                                    in your unnamed
                                                    industry?<br>
                                                    <br>
                                                    In a similar vein,
                                                    if people are
                                                    finding it necessary
                                                    to “replace parts<br>
                                                    of NIO with
                                                    hand-crafted native
                                                    code” then it would
                                                    be interesting to<br>
                                                    understand what
                                                    their requirements
                                                    are.  Some simple
                                                    enhancements to<br>
                                                    the NIO API would be
                                                    much less costly to
                                                    design and implement
                                                    than a<br>
                                                    generalized
                                                    user-level
                                                    native-call
                                                    intrinsification
                                                    mechanism.<br>
                                                    <br>
                                                    - Mark<br>
                                                  </blockquote>
                                                </div>
                                              </blockquote>
                                            </div>
                                          </blockquote>
                                        </div>
                                      </blockquote>
                                    </div>
                                  </blockquote>
                                </div>
                              </blockquote>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                      -- <br>
                      <div dir="ltr">Sent from my phone</div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>