<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Hi Wojtek,<br>
      thanks for sharing this list, I think this is a good starting
      point to understand more about your use case.</p>
    <p>Last week I was looking at "getrusage" (as you mentioned it
      in an earlier email), and I was surprised to see that the call
      takes a pointer to a (fairly big) struct which then needs to be
      initialized with some thread-local state:</p>
    <p><a class="moz-txt-link-freetext" href="https://man7.org/linux/man-pages/man2/getrusage.2.html">https://man7.org/linux/man-pages/man2/getrusage.2.html</a></p>
    <p>I've looked at the implementation, and it seems to be doing a
      memset on the user-provided struct pointer, plus all the field
      assignments. Eyeballing the implementation, this does not seem to
      me like a "classic" use case where dropping transitions would help
      much. I mean, surely dropping transitions would help shave some
      nanoseconds off the call, but it doesn't seem to me that the call
      would be short-lived enough to make a difference. Do you have some
      benchmarks on this one? I did some [1] and the call overhead
      seemed to come out at 260ns/op - w/o transitions you might perhaps
      be able to get to 250ns, but that's in the noise?<br>
    </p>
    <p>As for getpid, note that you can do (since Java 9):<br>
      <br>
      ProcessHandle.current().pid();<br>
      <br>
      I believe the impl caches the result, so it shouldn't even make
      the native call.<br>
    </p>
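    <p>For illustration, a minimal sketch of the ProcessHandle approach
      (the class name PidExample and the helper currentPid are made up
      for this example; whether the pid is cached is an implementation
      detail, not something the API guarantees):</p>
    <pre>
```java
// Hypothetical example class; only ProcessHandle.current().pid() comes from the JDK.
public class PidExample {

    // Returns the pid of the running JVM process without any user-written
    // JNI or native downcall (requires Java 9 or later).
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        long pid = currentPid();
        System.out.println("pid = " + pid);
    }
}
```
    </pre>
    <p>This can be run directly with "java PidExample.java" on Java 11
      or later; the call itself only needs Java 9.</p>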
    <p>Maurizio</p>
    <p>[1] -
      <a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java">http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java</a><br>
    </p>
    <div class="moz-cite-prefix">On 02/07/2022 07:42, Wojciech Kudla
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CADV2yPmwYu0_Bef8wzPrjW9=67bU1X=385weFq6WWwCm+HpgFw@mail.gmail.com">
      
      <div dir="ltr">
        <div>
          <div>Hi Maurizio,<br>
            <br>
          </div>
          Thanks for staying on this.<br>
          <br>
          > Could you please provide a rough list of the native calls
          you make where you believe critical JNI is having a real
          impact on the performance of your application?<br>
        </div>
        <div><br>
          Off the top of my head:<br>
        </div>
        <div>clock_gettime<br>
        </div>
        <div>recvmsg<br>
        </div>
        <div>recvmmsg</div>
        <div>sendmsg<br>
        </div>
        <div>sendmmsg</div>
        <div>select<br>
        </div>
        <div>getpid</div>
        <div>getcpu<br>
        </div>
        <div>getrusage<br>
        </div>
        <div><br>
        </div>
        <div>> Also, could you please tell us whether any of these
          calls need to interact with Java arrays?<br>
        </div>
        <div>No arrays or objects of any type involved. Everything
          happens by means of passing raw pointers as longs and
          using other primitive types as function arguments.<br>
        </div>
        <div><br>
          > In other words, do you use critical JNI to remove the
          cost associated with thread transitions, or are you also
          taking advantage of accessing on-heap memory _directly_ from
          native code?<br>
        </div>
        <div>Critical JNI natives are used solely to remove the cost of
          transitions. We don't get anywhere near the Java heap in native
          code.<br>
          <br>
        </div>
        <div>In general I think it makes a lot of sense for Java as a
          language/platform to have some guards around unsafe code, but
          on the other hand the popularity of libraries employing Unsafe
          and their success in more performance-oriented corners of
          software engineering is a clear indicator there is a need for
          the JVM to provide access to more low-level primitives and
          mechanisms. <br>
        </div>
        <div>I think it's entirely fair to tell developers that all bets
          are off when they get into some non-idiomatic scenarios but
          please don't take away a feature that greatly contributed to
          Java's success.<br>
          <br>
        </div>
        <div>Kind regards,<br>
        </div>
        <div>Wojtek<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Jun 29, 2022 at 5:20
          PM Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p>Hi Wojciech,<br>
              picking up this thread again. After some internal
              discussion, we realize that we don't know enough about
              your use case. While re-enabling JNI critical would
              obviously provide a quick fix, we're afraid that (a)
              developers might end up depending on JNI critical when
              they don't need to (perhaps also unaware of the
              consequences of depending on it) and (b) that there might
              actually be _better_ (as in: much faster) solutions than
              using critical native calls to address at least some of
              your use cases (that seemed to be the case with the
              clock_gettime example you mentioned). Could you please
              provide a rough list of the native calls you make where
              you believe critical JNI is having a real impact on the
              performance of your application? Also, could you please
              tell us whether any of these calls need to interact with
              Java arrays? In other words, do you use critical JNI to
              remove the cost associated with thread transitions, or are
              you also taking advantage of accessing on-heap memory
              _directly_ from native code?</p>
            <p>Regards<br>
              Maurizio<br>
            </p>
            <div>On 13/06/2022 21:38, Wojciech Kudla wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div>
                  <div>
                    <div>
                      <div>
                        <div>Hi Mark,<br>
                          <br>
                        </div>
                        Thanks for your input and apologies for the
                        delayed response.<br>
                        <br>
                        > If the platform included, say, an
                        intrinsified System.nanoRealTime()<br>
                        method that returned
                        clock_gettime(CLOCK_REALTIME), how much would<br>
                        that help developers in your unnamed industry?<br>
                        <br>
                      </div>
                      Exposing a realtime clock with nanosecond
                      granularity in the JDK would be a great step
                      forward. I should have made it clear that I
                      represent the fintech corner (investment banking to be
                      exact) but the issues my message touches upon span
                      areas such as HPC, audio processing, gaming, and
                      the defense industry so it's not like we have an
                      isolated case.<br>
                      <br>
                      > In a similar vein, if people are finding it
                      necessary to “replace parts<br>
                      of NIO with hand-crafted native code” then it
                      would be interesting to<br>
                      understand what their requirements are<br>
                      <br>
                    </div>
                    As for the other example I provided, making very
                    short-lived syscalls such as recvmsg/recvmmsg, the
                    premise is getting access to hardware timestamps on
                    the ingress and egress ends as well as enabling
                    batch receive with a single syscall and otherwise
                    exploiting features unavailable from the JDK (like
                    access to the CMSG interface, scatter/gather, etc).<br>
                  </div>
                  <div>There are also other examples of calls that we'd
                    love to make often and at the lowest possible cost (e.g.
                    getrusage) but I'm not sure if there's a strong case
                    for some of these ideas, which is why it might be
                    worth looking into a more generic approach for
                    performance-sensitive code.<br>
                  </div>
                  <div>Hope this does a better job of explaining where
                    we're coming from than my previous messages.<br>
                  </div>
                  <div><br>
                  </div>
                  Thanks,<br>
                </div>
                W<br>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Tue, Jun 7, 2022 at
                  6:31 PM <<a href="mailto:mark.reinhold@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mark.reinhold@oracle.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">2022/6/6 0:24:17
                  -0700, <a href="mailto:wkudla.kernel@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">wkudla.kernel@gmail.com</a>:<br>
                  >> Yes for System.nanoTime(), but
                  System.currentTimeMillis() reports<br>
                  >> CLOCK_REALTIME.<br>
                  > <br>
                  > Unfortunately System.currentTimeMillis() offers
                  only millisecond<br>
                  > granularity which is the reason why our industry
                  has to resort to<br>
                  > clock_gettime.<br>
                  <br>
                  If the platform included, say, an intrinsified
                  System.nanoRealTime()<br>
                  method that returned clock_gettime(CLOCK_REALTIME),
                  how much would<br>
                  that help developers in your unnamed industry?<br>
                  <br>
                  In a similar vein, if people are finding it necessary
                  to “replace parts<br>
                  of NIO with hand-crafted native code” then it would be
                  interesting to<br>
                  understand what their requirements are.  Some simple
                  enhancements to<br>
                  the NIO API would be much less costly to design and
                  implement than a<br>
                  generalized user-level native-call intrinsification
                  mechanism.<br>
                  <br>
                  - Mark<br>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>