<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>Hi,<br>

      As Erik explained in his reply, what we call "critical JNI" comes

      in two pieces: one removes Java to native thread transitions

      (which is what Wojciech is referring to), while another part

      interacts with the GC locker (basically to allow critical JNI code

      to access Java arrays w/o copying). I think the latter part is the

      most problematic GC-wise.<br>

    </p>

    <p>Then, regarding the former, I think there are still questions as

      to whether dropping transitions is the best way to get the

      performance boost required; for instance, yesterday I did some

      experiments with an experimental patch from Jorn (kudos) which

      re-enables an opt-in for "trivial" native calls in the Panama API.

      I used it to test clock_gettime, and, while there's an

      improvement, the results I got were not as conclusive as one might

      expect. This is what I get w/ state transitions:</p>

    <p>```<br>

      Benchmark                                 Mode  Cnt   Score  

      Error  Units<br>

      ClockgettimeTest.panama_monotonic         avgt   30  27.814 ±

      0.165  ns/op<br>

      ClockgettimeTest.panama_monotonic_coarse  avgt   30  12.094 ±

      0.103  ns/op<br>

      ClockgettimeTest.panama_monotonic_raw     avgt   30  27.719 ±

      0.393  ns/op<br>

      ClockgettimeTest.panama_realtime          avgt   30  27.133 ±

      0.280  ns/op<br>

      ClockgettimeTest.panama_realtime_coarse   avgt   30  26.812 ±

      0.384  ns/op<br>

      ```<br>

    </p>

    <p>And this is what I get with transitions removed:</p>

    <p>```<br>

      Benchmark                                 Mode  Cnt   Score  

      Error  Units<br>

      ClockgettimeTest.panama_monotonic         avgt   30  22.383 ±

      0.213  ns/op<br>

      ClockgettimeTest.panama_monotonic_coarse  avgt   30   6.312 ±

      0.117  ns/op<br>

      ClockgettimeTest.panama_monotonic_raw     avgt   30  22.731 ±

      0.279  ns/op<br>

      ClockgettimeTest.panama_realtime          avgt   30  22.503 ±

      0.292  ns/op<br>

      ClockgettimeTest.panama_realtime_coarse   avgt   30  21.853 ±

      0.100  ns/op<br>

      ```<br>

    </p>

    <p>Here we can see a gain of 4-5ns, obtained by dropping the

      transition. The only case where this makes a significant

      difference is with the monotonic_coarse flavor. In the other cases

      there's a difference, yes, but not as pronounced, simply because

      the term we're comparing against is bigger: it's easy to see a 5ns

      gain if your function runs for 10ns in total - but such a gain

      starts to get lost in the "noise" when functions run for longer.

      And that's the main issue with removing Java->native

      transitions: the "window" in which this optimization yields a

      positive effect is extremely narrow (anything lasting longer than

      30ns probably won't see much difference), but, as you can

      see from the PR in [1], the VM changes required to support it

      touch quite a bit of stuff!</p>
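    <p>To make the "narrow window" argument concrete, here is a small
      sketch that computes the fraction of each call saved by dropping
      the transition. The first two pairs of numbers are scores from the
      tables above; the 250ns/245ns pair is the hypothetical
      syscall-sized call mentioned earlier in this thread. The class and
      method names are made up for illustration.</p>

    <pre>
```java
public class TransitionGain {
    /** Fraction of the call time saved when the transition is dropped. */
    public static double relativeGain(double withTransitionNs, double withoutTransitionNs) {
        return (withTransitionNs - withoutTransitionNs) / withTransitionNs;
    }

    public static void main(String[] args) {
        // Scores (ns/op) copied from the two benchmark tables above.
        System.out.printf("monotonic:        %.1f%%%n", 100 * relativeGain(27.814, 22.383));
        System.out.printf("monotonic_coarse: %.1f%%%n", 100 * relativeGain(12.094, 6.312));
        // Hypothetical syscall-sized call with a similar ~5ns saving.
        System.out.printf("250ns call:       %.1f%%%n", 100 * relativeGain(250.0, 245.0));
    }
}
```
    </pre>

    <p>The absolute saving is about the same in all three cases, but
      its relative weight drops from roughly 48% (monotonic_coarse) to
      roughly 20% (monotonic) to about 2% for the 250ns call.</p>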

    <p>Luckily, selectively disabling transitions from Panama is

      slightly more straightforward and, perhaps, for stuff like recvmsg

      syscalls that are bypassed, there's not much else we can do: while

      one could imagine Panama special-casing calls to clock_gettime, as

      that's a known "leaf", the same cannot be done with recvmsg, which

      is in general a blocking call. Panama also has a "trusted mode"

      flag (--enable-native-access), so there is a way in the Panama API

      to distinguish between safe and unsafe API points, which also helps

      with this. The risk of course is for developers to see whatever

      mechanism is provided as some kind of "make my code go fast

      please" and apply it blindly, w/o fully understanding the

      consequences. What I said before about "extremely narrow window"

      remains true: in the vast majority of cases (like 99%) dropping

      state transitions can result in very big downsides, while the

      corresponding upsides are not big enough to even be noticeable

      (the Q/A in [2] arrives at a very similar conclusion).<br>

    </p>

    <p>All this said, selectively disabling state transitions from

      native calls made using the Panama foreign API seems the most

      straightforward way to offset the performance delta introduced by

      the removal of critical JNI. In part it's because the Panama API

      is more flexible, e.g. function descriptors allow us to model the

      distinction between a trivial and non-trivial call; in part it's

      because, as stated above, Panama can already reason about calls

      that are "unsafe" and that require extra permissions. And, finally

      it's also because, if we added back critical JNI, we'd probably

      add it back w/o its most problematic GC locker parts (that's what

      [1] does AFAIK) - which means it won't be a complete code

      reversal. So, perhaps, coming up with a fresh mechanism to drop

      transitions (only) could also be less confusing for developers. Of

      course this would require developers such as Wojciech to rewrite

      some of the code to use Panama instead of JNI.</p>
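    <p>For illustration only, this is roughly what such an opt-in could
      look like from the Java side. The sketch assumes the shape the API
      took when it was finalized in Java 22, where the opt-in is spelled
      Linker.Option.critical(false); the experimental patch discussed
      above may look different. It also assumes 64-bit Linux
      (CLOCK_MONOTONIC = 1, a 16-byte struct timespec). The class and
      method names are hypothetical.</p>

    <pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class ClockGettimeSketch {
    // Assumption: Linux value of CLOCK_MONOTONIC (from time.h).
    static final int CLOCK_MONOTONIC = 1;
    // Assumption: 64-bit Linux, where struct timespec is two 8-byte fields.
    static final long TIMESPEC_SIZE = 16;

    static final MethodHandle CLOCK_GETTIME;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment symbol = linker.defaultLookup().find("clock_gettime").orElseThrow();
        FunctionDescriptor descriptor = FunctionDescriptor.of(
                ValueLayout.JAVA_INT,   // int return value
                ValueLayout.JAVA_INT,   // clockid_t
                ValueLayout.ADDRESS);   // struct timespec*
        // critical(false) is a hint that the callee is very short-running and
        // needs no heap access, so the implementation may skip the transition.
        CLOCK_GETTIME = linker.downcallHandle(symbol, descriptor, Linker.Option.critical(false));
    }

    public static long monotonicNanos() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment ts = arena.allocate(TIMESPEC_SIZE, 8);
            int rc = (int) CLOCK_GETTIME.invokeExact(CLOCK_MONOTONIC, ts);
            if (rc != 0) {
                throw new IllegalStateException("clock_gettime returned " + rc);
            }
            long seconds = ts.get(ValueLayout.JAVA_LONG, 0); // tv_sec
            long nanos = ts.get(ValueLayout.JAVA_LONG, 8);   // tv_nsec
            return seconds * 1_000_000_000L + nanos;
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable.
            throw new IllegalStateException("downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(monotonicNanos());
    }
}
```
    </pre>

    <p>Dropping the Linker.Option argument gives the regular downcall
      with state transitions, which is the safe default for anything
      that may block, such as recvmsg.</p>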

    <p>And, coming back to clock_gettime, my feeling is that with the

      right tools (e.g. some intrinsics), we can make that go a lot

      faster than what is shown above. Being able to quickly get a

      timestamp seems a widely-enough applicable use case to deserve

      some special treatment. So, perhaps, it's worth considering a

      _spectrum of solutions_ on how to improve the status quo, rather

      than investing solely in the removal of thread transitions.<br>

    </p>

    <p>Maurizio<br>

    </p>

    <p>[1] - <a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk19/pull/90/files">https://github.com/openjdk/jdk19/pull/90/files</a><br>

      [2] - <a class="moz-txt-link-freetext" href="https://youtu.be/LoyBTqkSkZk?t=742">https://youtu.be/LoyBTqkSkZk?t=742</a></p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 04/07/2022 18:38, Vitaly Davidovich

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CAHjP37E62eEbrDtS7HF0eZ0wA65xTWVF_eqpZFubP=4PTXEYVg@mail.gmail.com">

      <div dir="auto">To not sidetrack this thread with my previous

        reply:</div>

      <div dir="auto"><br>

      </div>

      <div dir="auto">Maurizio - are you saying java criticals are

        *already* hindering ZGC and/or other planned Hotspot

        improvements? Or that theoretically they could and you’d like to

        remove/deprecate them now(ish)?</div>

      <div dir="auto"><br>

      </div>

      <div dir="auto">If it’s the latter, perhaps it’s prudent to keep

        them around until a compelling case surfaces where they preclude

        or severely restrict evolution of the platform? If it’s the

        former, would be curious what that is but would also understand

        the rationale behind wanting to remove it.</div>

      <div><br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at 1:26

            PM Vitaly Davidovich &lt;<a href="mailto:vitalyd@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>&gt;

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

            <div><br>

            </div>

            <div><br>

              <div class="gmail_quote">

                <div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at

                  1:13 PM Wojciech Kudla &lt;<a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>&gt;

                  wrote:<br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                  <div dir="ltr">

                    <div>

                      <div>

                        <div>Thanks for your input, Vitaly. I'd be

                          interested to find out more about the nature

                          of the HW noise you observed in your

                          benchmarks as our results were very consistent

                          and it was pretty straightforward to pinpoint

                          the culprit as JNI call overhead. Maybe it was

                          just easier for us because we disallow C- and

                          P-state transitions and put a lot of effort to

                          eliminate platform jitter in general. Were you

                          maybe running on a CPU model that doesn't

                          support constant TSC? I would also suggest

                          retrying with LAPIC interrupts suppressed

                          (with: cli/sti) to maybe see if it's the

                          kernel and not the hardware.</div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <div dir="auto">This was on a Broadwell Xeon chipset

                  with constant tsc.  All the typical jitter sources

                  were reduced: C/P states disabled in bios, max turbo

                  enabled, IRQs steered away, core isolated, etc.  By

                  the way, by noise I don’t mean the results themselves

                  were noisy - they were constant run to run.  I just

                  meant the delta between normal vs critical JNI

                  entrypoints was very minimal - ie “in the noise”,

                  particularly with rdtsc.</div>

                <div dir="auto"><br>

                </div>

                <div dir="auto">I can try to remeasure on newer Intel

                  but see below …</div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                  <div dir="ltr">

                    <div>

                      <div>

                        <div dir="auto"><br>

                          <br>

                        </div>

                        100% agree on rdtsc(p) and snippets. There are

                        some narrow use cases where one can get some

                        substantial speed ups with direct access to

                        prefetch or by abusing misprediction to keep

                        icache hot. These scenarios are sadly only

                        available with inline assembly. I know of a few

                        shops that go to the length of forking Graal,

                        etc to achieve that but am quite convinced such

                        capabilities would be welcome and utilized by

                        many more groups if they were easily accessible

                        from java.</div>

                    </div>

                  </div>

                </blockquote>

                <div dir="auto">I’m of the firm (and perhaps

                  controversial for some :)) opinion these days that

                  Java is simply the wrong platform/tool for low latency

                  cases that warrant this level of control.  There’re

                  very strong headwinds even outside of JNI costs.  And

                  the “real” problem with JNI, besides transition costs,

                  is lack of inlining into the native calls.  So even if

                  JVM transition costs are fully eliminated, there’s

                  still an optimization fence due to lost inlining (not

                  unlike native code calling native fns via shared

                  libs).</div>

                <div dir="auto"><br>

                </div>

                <div dir="auto">That’s not to say that perf regressions are

                  welcomed - nobody likes those :).</div>

              </div>

            </div>

            <div>

              <div class="gmail_quote">

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                  <div dir="ltr">

                    <div>

                      <div dir="auto"><br>

                        <br>

                      </div>

                      Thanks,<br>

                    </div>

                    W.<br>

                  </div>

                  <br>

                  <div class="gmail_quote">

                    <div dir="ltr" class="gmail_attr">On Mon, Jul 4,

                      2022 at 5:51 PM Vitaly Davidovich &lt;<a href="mailto:vitalyd@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">vitalyd@gmail.com</a>&gt;

                      wrote:<br>

                    </div>

                    <blockquote class="gmail_quote" style="margin:0px

                      0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                      <div dir="auto">I’d add rdtsc(p) wrapper functions

                        to the list.  These are usually either inline

                        asm or compiler intrinsic in the JNI

                        entrypoint.  In addition, any native libs

                        exposed via JNI that have “trivial” functions

                        are also candidates for faster calling

                        conventions.  There’re sometimes ways to mitigate

                        the call overhead (eg batching) but it’s not

                        always feasible.</div>

                      <div dir="auto"><br>

                      </div>

                      <div dir="auto">I’ll add that last time I tried to

                        measure the improvement of Java criticals for

                        clock_gettime (and rdtsc) it looked to be in the

                        noise on the hardware I was testing on.  It got

                        to the point where I had to instrument the critical

                        and normal JNI entrypoints to confirm the

                        critical was being hit.  The critical calling

                        convention isn’t significantly different *if*

                        basic primitives (or no args at all) are passed

                        as args.  JNIEnv*, IIRC, is loaded from a

                        register so that’s minor.  jclass (for static

                        calls, which is what’s relevant here) should be

                        a compiled constant.  Critical call still has a

                        GCLocker check.  So I’m not actually sure what

                        the significant difference is for “lightweight”

                        (ie few primitive or no args, primitive return

                        types) calls.</div>

                      <div dir="auto"><br>

                      </div>

                      <div dir="auto">In general, I do think it’d be

                        nice if there was a faster native call sequence,

                        even if it comes with a caveat emptor and/or

                        special requirements on the callee (not unlike

                        the requirements for criticals).  I think

                        Vladimir Ivanov was working on “snippets” that

                        allowed dynamic construction of a native call,

                        possibly including assembly.  Not sure where

                        that exploration is these days, but that would

                        be a welcome capability.</div>

                      <div dir="auto"><br>

                      </div>

                      <div dir="auto">My $.02.  Happy 4th of July for

                        those celebrating!</div>

                      <div dir="auto"><br>

                      </div>

                      <div dir="auto">Vitaly</div>

                      <div><br>

                        <div class="gmail_quote">

                          <div dir="ltr" class="gmail_attr">On Mon, Jul

                            4, 2022 at 12:04 PM Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;

                            wrote:<br>

                          </div>

                          <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                            <div>

                              <p>Hi,<br>

                                while I'm not an expert with some of the

                                IO calls you mention (some of my

                                colleagues are more knowledgeable in

                                this area, so I'm sure they will have

                                more info), my general sense is that, as

                                with getrusage, if there is a system

                                call involved, you already pay a hefty

                                price for the user to kernel transition.

                                On my machine this seems to cost around

                                200ns. In these cases, using JNI

                                critical to shave off a dozen

                                nanoseconds (at best!) seems just not

                                worth it.</p>

                              <p>So, of the functions in your list, the

                                ones in which I *believe*  dropping

                                transitions would have the most effect

                                are (if we exclude getpid, for which

                                another approach is possible)

                                clock_gettime and getcpu, I believe, as

                                they might use vdso [1], which typically

                                brings the performance of these calls

                                closer to calls to shared lib functions.<br>

                              </p>

                              <p>If you have examples e.g. where

                                performance of recvmsg (or related

                                calls) varies significantly between base

                                JNI and critical JNI, please send them

                                our way; I'm sure some of my colleagues

                                would be interested to take a look.<br>

                              </p>

                              <p>Popping back a couple of levels, I

                                think it would be helpful to also define

                                what's an acceptable regression in this

                                context. Of course, in an ideal world, 

                                we'd like to see no performance

                                regression at all. But JNI critical is

                                an unsupported interface, which might

                                misbehave with modern garbage collectors

                                (e.g. ZGC) and that requires quite a bit

                                of internal complexity which might, in

                                the medium/long run, hinder the

                                evolution of the Java platform (all

                                these things have _some_ cost, even if

                                the cost is not directly material to

                                developers). In this vein, I think calls

                                like clock_gettime tend to be more

                                problematic: as they complete very

                                quickly, you see the cost of transitions

                                a lot more. In other cases, where

                                syscalls are involved, the costs

                                associated with transitions are more

                                likely to be "in the noise". Of course

                                if we look at absolute numbers, dropping

                                transitions would always yield "faster"

                                code; but at the same time, going from

                                250ns to 245ns is very unlikely to

                                result in visible performance difference

                                when considering an application as a

                                whole, so I think it's critical here to

                                decide _which_ use cases to prioritize.<br>

                              </p>

                              <p>I think a good outcome of this

                                discussion would be if we could come to

                                some shared understanding of which

                                native calls are truly problematic (e.g.

                                clock_gettime-like), and then for the

                                JDK to provide better (and more

                                maintainable) alternatives for those

                                (which might even be faster than using

                                critical JNI).<br>

                              </p>

                              <p>Thanks<br>

                                Maurizio<br>

                              </p>

                              <p>[1] - <a href="https://man7.org/linux/man-pages/man7/vdso.7.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man7/vdso.7.html</a><br>

                              </p>

                            </div>

                            <div>

                              <div>On 04/07/2022 12:23, Wojciech Kudla

                                wrote:<br>

                              </div>

                              <blockquote type="cite">

                                <div dir="ltr">

                                  <div>

                                    <div>Thanks Maurizio,<br>

                                      <br>

                                    </div>

                                    I raised this case mainly about

                                    clock_gettime and recvmsg/sendmsg; I

                                    think we're focusing on the wrong

                                    things here. Feel free to drop the

                                    two syscalls from the discussion

                                    entirely, but the main usecases I

                                    have been presenting throughout this

                                    thread definitely stand.<br>

                                    <br>

                                  </div>

                                  <div>Thanks<br>

                                  </div>

                                  <br>

                                </div>

                                <br>

                                <div class="gmail_quote">

                                  <div dir="ltr" class="gmail_attr">On

                                    Mon, Jul 4, 2022 at 10:54 AM

                                    Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;

                                    wrote:<br>

                                  </div>

                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                                    <div>

                                      <p>Hi Wojtek,<br>

                                        thanks for sharing this list, I

                                        think this is a good starting

                                        point to understand more about

                                        your use case.</p>

                                      <p>Last week I was looking at

                                        "getrusage" (as you mentioned it

                                        in an earlier email), and I was

                                        surprised to see that the call

                                        took a pointer to a (fairly big)

                                        struct which then needed to be

                                        initialized with some

                                        thread-local state:</p>

                                      <p><a href="https://man7.org/linux/man-pages/man2/getrusage.2.html" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://man7.org/linux/man-pages/man2/getrusage.2.html</a></p>

                                      <p>I've looked at the

                                        implementation, and it seems to

                                        be doing memset on the

                                        user-provided struct pointer,

                                        plus all the field assignments.

                                        Eyeballing the implementation,

                                        this does not seem to me like a

                                        "classic" use case where

                                        dropping transition would help

                                        much. I mean, surely dropping

                                        transitions would help shave

                                        some nanoseconds off the call,

                                        but it doesn't seem to me that

                                        the call would be short-lived

                                        enough to make a difference. Do

                                        you have some benchmarks on this

                                        one? I did some [1] and the call

                                        overhead seemed to come up at

                                        260ns/op - w/o transition you

                                        might perhaps be able to get to

                                        250ns, but that's in the noise?<br>

                                      </p>

                                      <p>As for getpid, note that you

                                        can do (since Java 9):<br>

                                        <br>

                                        ProcessHandle.current().pid();<br>

                                        <br>

                                        I believe the impl caches the

                                        result, so it shouldn't even

                                        make the native call.<br>

                                      </p>
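                                      <p>A minimal sketch of the getpid
                                        alternative mentioned above,
                                        using only the standard
                                        ProcessHandle API (Java 9 or
                                        later). The class and method
                                        names are made up for
                                        illustration.</p>

                                      <pre>
```java
public class CurrentPid {
    /** Returns the pid of the running JVM without any user-written JNI code. */
    public static long currentPid() {
        // ProcessHandle has been part of java.lang since Java 9.
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("pid = " + currentPid());
    }
}
```
                                      </pre>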

                                      <p>Maurizio</p>

                                      <p>[1] - <a href="http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java</a><br>

                                      </p>

                                      <div>On 02/07/2022 07:42, Wojciech

                                        Kudla wrote:<br>

                                      </div>

                                      <blockquote type="cite">

                                        <div dir="ltr">

                                          <div>

                                            <div>Hi Maurizio,<br>

                                              <br>

                                            </div>

                                            Thanks for staying on this.<br>

                                            <br>

                                            > Could you please

                                            provide a rough list of the

                                            native calls you make where

                                            you believe critical JNI is

                                            having a real impact in the

                                            performance of your

                                            application?<br>

                                          </div>

                                          <div><br>

                                            Off the top of my head:<br>

                                          </div>

                                          <div>clock_gettime<br>

                                          </div>

                                          <div>recvmsg<br>

                                          </div>

                                          <div>recvmmsg</div>

                                          <div>sendmsg<br>

                                          </div>

                                          <div>sendmmsg</div>

                                          <div>select<br>

                                          </div>

                                          <div>getpid</div>

                                          <div>getcpu<br>

                                          </div>

                                          <div>getrusage<br>

                                          </div>

                                          <div><br>

                                          </div>

                                          <div>> Also, could you

                                            please tell us whether any

                                            of these calls need to

                                            interact with Java arrays?<br>

                                          </div>

                                          <div>No arrays or objects of

                                            any type involved.

                                            Everything happens by the

                                            means of passing raw

                                            pointers as longs and using

                                            other primitive types as

                                            function arguments.<br>

                                          </div>

                                          <div><br>

                                            > In other words, do you

                                            use critical JNI to remove

                                            the cost associated with

                                            thread transitions, or are

                                            you also taking advantage of

                                            accessing on-heap memory

                                            _directly_ from native code?<br>

                                          </div>

                                          <div>Critical JNI natives are
                                            used solely to remove the
                                            cost of transitions. We
                                            don't get anywhere near the
                                            Java heap in native code.<br>

                                            <br>

                                          </div>

                                          <div>In general, I think it
                                            makes a lot of sense for
                                            Java as a language/platform
                                            to have some guards around
                                            unsafe code. On the other
                                            hand, the popularity of
                                            libraries employing Unsafe,
                                            and their success in the more
                                            performance-oriented corners
                                            of software engineering, are
                                            a clear indication that there
                                            is a need for the JVM to
                                            provide access to more
                                            low-level primitives and
                                            mechanisms.<br>

                                          </div>

                                          <div>I think it's entirely

                                            fair to tell developers that

                                            all bets are off when they

                                            get into some non-idiomatic

                                            scenarios but please don't

                                            take away a feature that

                                            greatly contributed to

                                            Java's success.<br>

                                            <br>

                                          </div>

                                          <div>Kind regards,<br>

                                          </div>

                                          <div>Wojtek<br>

                                          </div>

                                        </div>

                                        <br>

                                        <div class="gmail_quote">

                                          <div dir="ltr" class="gmail_attr">On Wed,

                                            Jun 29, 2022 at 5:20 PM

                                            Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>

                                            wrote:<br>

                                          </div>

                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">

                                            <div>

                                              <p>Hi Wojciech,<br>

                                                picking up this thread

                                                again. After some

                                                internal discussion, we

                                                realize that we don't

                                                know enough about your

                                                use case. While

                                                re-enabling JNI critical

                                                would obviously provide

                                                a quick fix, we're

                                                afraid that (a)

                                                developers might end up

                                                depending on JNI

                                                critical when they don't

                                                need to (perhaps also

                                                unaware of the

                                                consequences of

                                                depending on it) and (b)

                                                that there might

                                                actually be _better_ (as

                                                in: much faster)

                                                solutions than using

                                                critical native calls to

                                                address at least some of

                                                your use cases (that

                                                seemed to be the case

                                                with the clock_gettime

                                                example you mentioned).

                                                Could you please provide

                                                a rough list of the

                                                native calls you make

                                                where you believe
                                                critical JNI is having a
                                                real impact on the
                                                performance of your
                                                application? Also, could

                                                you please tell us

                                                whether any of these

                                                calls need to interact

                                                with Java arrays? In

                                                other words, do you use

                                                critical JNI to remove

                                                the cost associated with

                                                thread transitions, or

                                                are you also taking

                                                advantage of accessing

                                                on-heap memory

                                                _directly_ from native

                                                code?</p>

                                              <p>Regards<br>

                                                Maurizio<br>

                                              </p>

                                              <div>On 13/06/2022 21:38,

                                                Wojciech Kudla wrote:<br>

                                              </div>

                                              <blockquote type="cite">

                                                <div dir="ltr">

                                                  <div>

                                                    <div>

                                                      <div>

                                                        <div>

                                                          <div>Hi Mark,<br>

                                                          <br>

                                                          </div>

                                                          Thanks for

                                                          your input and

                                                          apologies for

                                                          the delayed

                                                          response.<br>

                                                          <br>

                                                          > If the

                                                          platform

                                                          included, say,

                                                          an

                                                          intrinsified

                                                          System.nanoRealTime()<br>

                                                          method that

                                                          returned

                                                          clock_gettime(CLOCK_REALTIME),

                                                          how much would<br>

                                                          that help

                                                          developers in

                                                          your unnamed

                                                          industry?<br>

                                                          <br>

                                                        </div>

                                                        Exposing a
                                                        realtime clock
                                                        with nanosecond
                                                        granularity in
                                                        the JDK would be
                                                        a great step
                                                        forward. I
                                                        should have made
                                                        it clear that I
                                                        represent the
                                                        fintech corner
                                                        (investment
                                                        banking, to be
                                                        exact), but the
                                                        issues my
                                                        message touches
                                                        upon span areas
                                                        such as HPC,
                                                        audio
                                                        processing,
                                                        gaming, and the
                                                        defense industry,
                                                        so it's not like
                                                        we have an
                                                        isolated case.<br>

                                                        <br>

                                                        > In a

                                                        similar vein, if

                                                        people are

                                                        finding it

                                                        necessary to

                                                        “replace parts<br>

                                                        of NIO with

                                                        hand-crafted

                                                        native code”

                                                        then it would be

                                                        interesting to<br>

                                                        understand what

                                                        their

                                                        requirements are<br>

                                                        <br>

                                                      </div>

                                                      As for the other
                                                      example I provided,
                                                      making very
                                                      short-lived
                                                      syscalls such as
                                                      recvmsg/recvmmsg,
                                                      the premise is
                                                      getting access to
                                                      hardware
                                                      timestamps on the
                                                      ingress and egress
                                                      ends, as well as
                                                      enabling batch
                                                      receive with a
                                                      single syscall and
                                                      otherwise
                                                      exploiting
                                                      features
                                                      unavailable from
                                                      the JDK (like
                                                      access to the CMSG
                                                      interface,
                                                      scatter/gather,
                                                      etc.).<br>

                                                    </div>

                                                    <div>There are also
                                                      other examples of
                                                      calls that we'd
                                                      love to make often
                                                      and at the lowest
                                                      possible cost (e.g.
                                                      getrusage), but I'm
                                                      not sure if
                                                      there's a strong
                                                      case for some of
                                                      these ideas.
                                                      That's why it
                                                      might be worth
                                                      looking into a more
                                                      generic approach
                                                      for
                                                      performance-sensitive
                                                      code.<br>

                                                    </div>

                                                    <div>Hope this does
                                                      a better job of
                                                      explaining where
                                                      we're coming from
                                                      than my previous
                                                      messages.<br>

                                                    </div>

                                                    <div><br>

                                                    </div>

                                                    Thanks,<br>

                                                  </div>

                                                  W<br>

                                                </div>

                                                <br>

                                                <div class="gmail_quote">

                                                  <div dir="ltr" class="gmail_attr">On

                                                    Tue, Jun 7, 2022 at

                                                    6:31 PM <<a href="mailto:mark.reinhold@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mark.reinhold@oracle.com</a>> wrote:<br>

                                                  </div>

                                                  <blockquote class="gmail_quote" style="margin:0px

                                                    0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">2022/6/6

                                                    0:24:17 -0700, <a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>:<br>

                                                    >> Yes for

                                                    System.nanoTime(),

                                                    but

                                                    System.currentTimeMillis()

                                                    reports<br>

                                                    >>

                                                    CLOCK_REALTIME.<br>

                                                    > <br>

                                                    > Unfortunately

                                                    System.currentTimeMillis()

                                                    offers only

                                                    millisecond<br>

                                                    > granularity

                                                    which is the reason

                                                    why our industry has

                                                    to resort to<br>

                                                    > clock_gettime.<br>

                                                    <br>

                                                    If the platform

                                                    included, say, an

                                                    intrinsified

                                                    System.nanoRealTime()<br>

                                                    method that returned

clock_gettime(CLOCK_REALTIME), how much would<br>

                                                    that help developers

                                                    in your unnamed

                                                    industry?<br>

                                                    <br>

                                                    In a similar vein,

                                                    if people are

                                                    finding it necessary

                                                    to “replace parts<br>

                                                    of NIO with

                                                    hand-crafted native

                                                    code” then it would

                                                    be interesting to<br>

                                                    understand what

                                                    their requirements

                                                    are.  Some simple

                                                    enhancements to<br>

                                                    the NIO API would be

                                                    much less costly to

                                                    design and implement

                                                    than a<br>

                                                    generalized

                                                    user-level

                                                    native-call

                                                    intrinsification

                                                    mechanism.<br>

                                                    <br>

                                                    - Mark<br>

                                                  </blockquote>
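                                                  <div>For reference on the clock discussion above, here is a minimal sketch of what plain Java can do today without any native call, assuming JDK 9 or later. It derives a CLOCK_REALTIME-style epoch timestamp in nanosecond units from java.time.Instant. The class name and the epochNanos helper are hypothetical names chosen for illustration, not JDK APIs. The actual resolution of the returned value depends on the JDK version and the platform and may well be coarser than one nanosecond, and this is not equivalent to the intrinsified System.nanoRealTime() floated above.</div>
<pre>
```java
import java.time.Instant;

// Hypothetical illustration class; not part of the JDK or of this thread's code.
final class RealtimeClockSketch {

    // Hypothetical helper: wall-clock time as nanoseconds since the Unix epoch.
    // The unit is nanoseconds, but the real resolution is platform dependent.
    static long epochNanos() {
        Instant now = Instant.now();
        // Exact arithmetic so that an overflow fails loudly instead of wrapping.
        long secondsAsNanos = Math.multiplyExact(now.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(secondsAsNanos, now.getNano());
    }

    public static void main(String[] args) {
        long wallNanos = epochNanos();
        long wallMillis = System.currentTimeMillis();
        System.out.println("Instant-based epoch nanos: " + wallNanos);
        System.out.println("currentTimeMillis:         " + wallMillis);
    }
}
```
</pre>
                                                  <div>Comparing the two printed values shows whether the platform reports anything below millisecond granularity. Whether the cost of such a call is acceptable on a hot path would still need to be measured, for instance with a JMH benchmark like the ones used elsewhere in this thread.</div>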

                                                </div>

                                              </blockquote>

                                            </div>

                                          </blockquote>

                                        </div>

                                      </blockquote>

                                    </div>

                                  </blockquote>

                                </div>

                              </blockquote>

                            </div>

                          </blockquote>

                        </div>

                      </div>

                      -- <br>

                      <div dir="ltr">Sent from my phone</div>

                    </blockquote>

                  </div>

                </blockquote>

              </div>

            </div>


          </blockquote>

        </div>

      </div>


    </blockquote>

  </body>

</html>