<div dir="ltr"><div><div>Thanks Maurizio,<br><br></div>I raised this case mainly because of clock_gettime and recvmsg/sendmsg; I think we're focusing on the wrong things here. Feel free to drop the other two syscalls (getpid and getrusage) from the discussion entirely, but the main use cases I have been presenting throughout this thread definitely stand.<br><br></div><div>Thanks<br></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com">maurizio.cimadamore@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Wojtek,<br>
thanks for sharing this list; I think this is a good starting
point to understand more about your use case.</p>
<p>Last week I looked at "getrusage" (since you mentioned it
in an earlier email), and I was surprised to see that the call
takes a pointer to a (fairly big) struct which then needs to be
initialized with some thread-local state:</p>
<p><a href="https://man7.org/linux/man-pages/man2/getrusage.2.html" target="_blank">https://man7.org/linux/man-pages/man2/getrusage.2.html</a></p>
<p>I've looked at the implementation, and it seems to be doing
a memset on the user-provided struct pointer, plus assigning all
the fields. Eyeballing the implementation, this does not seem to
me like a "classic" use case where dropping transitions would help
much. Surely dropping transitions would help shave some
nanoseconds off the call, but it doesn't seem to me that the call
is short-lived enough for that to make a difference. Do you have some
benchmarks for this one? I did some [1] and the call overhead
seemed to come in at 260ns/op; without transitions you might perhaps
be able to get to 250ns, but isn't that within the noise?<br>
</p>
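<p>If the only value a caller needs from getrusage is per-thread CPU time (roughly ru_utime + ru_stime with RUSAGE_THREAD), the JDK's ThreadMXBean can report it without user-written native code. This is a sketch under that assumption only: it does not cover the other rusage fields (page faults, context switches, and so on), the class name is hypothetical, and its per-call cost has not been benchmarked here.</p>
<pre>
```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuTimeSketch {
    // Returns the current thread's CPU time (user + system) in nanoseconds,
    // or -1 if the JVM does not support or has disabled thread CPU timing.
    static long currentThreadCpuNanos() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return -1;
        }
        return threads.getCurrentThreadCpuTime();
    }

    public static void main(String[] args) {
        long before = currentThreadCpuNanos();
        // Burn a little CPU so the counter has something to report.
        long acc = 0;
        for (int i = 0; i != 5_000_000; i++) {
            acc += i % 7;
        }
        long after = currentThreadCpuNanos();
        System.out.println("acc=" + acc + ", cpu delta=" + (after - before) + " ns");
    }
}
```
</pre>
<p>Whether this is cheaper than a raw getrusage call would need the same kind of benchmark as [1].</p>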
<p>As for getpid, note that you can do (since Java 9):<br>
<br>
ProcessHandle.current().pid();<br>
<br>
I believe the impl caches the result, so it shouldn't even make
the native call.<br>
</p>
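<p>A minimal sketch of the getpid replacement mentioned above (class and method names are hypothetical). It relies only on ProcessHandle.current().pid(); whether the JDK caches the value internally is an implementation detail the code does not depend on.</p>
<pre>
```java
public class PidSketch {
    // ProcessHandle (Java 9+) exposes the current process id without any
    // user-written native code.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        long first = currentPid();
        long second = currentPid();
        // The pid of a running process does not change, so repeated calls agree.
        System.out.println("pid=" + first + ", stable=" + (first == second));
    }
}
```
</pre>
<p>Code that reads the pid in a hot loop could also store it once in a static final field at startup.</p>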
<p>Maurizio</p>
<p>[1] -
<a href="http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java" target="_blank">http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java</a><br>
</p>
<div>On 02/07/2022 07:42, Wojciech Kudla
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Hi Maurizio,<br>
<br>
</div>
Thanks for staying on this.<br>
<br>
> Could you please provide a rough list of the native calls
you make where you believe critical JNI is having a real
impact in the performance of your application?<br>
</div>
<div><br>
Off the top of my head:<br>
</div>
<div>clock_gettime<br>
</div>
<div>recvmsg<br>
</div>
<div>recvmmsg</div>
<div>sendmsg<br>
</div>
<div>sendmmsg</div>
<div>select<br>
</div>
<div>getpid</div>
<div>getcpu<br>
</div>
<div>getrusage<br>
</div>
<div><br>
</div>
<div>> Also, could you please tell us whether any of these
calls need to interact with Java arrays?<br>
</div>
<div>No arrays or objects of any type are involved. Everything
happens by means of passing raw pointers as longs and
using other primitive types as function arguments.<br>
</div>
<div><br>
> In other words, do you use critical JNI to remove the
cost associated with thread transitions, or are you also
taking advantage of accessing on-heap memory _directly_ from
native code?<br>
</div>
<div>Critical JNI natives are used solely to remove the cost of
transitions. We don't get anywhere near the Java heap in native
code.<br>
<br>
</div>
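<p>For comparison, a transition-free downcall of this kind can be sketched with the FFM API as finalized in JDK 22 (JEP 454), which provides Linker.Option.critical(boolean allowHeapAccess). This is an illustrative sketch, not code from this thread: it assumes JDK 22 or later on Linux with a 64-bit struct timespec and CLOCK_REALTIME equal to 0, and the class name is hypothetical. Passing false matches the "no Java heap access" usage described above.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class CriticalClockSketch {
    // Assumption: Linux, where CLOCK_REALTIME is 0 and struct timespec is
    // two 64-bit fields { tv_sec, tv_nsec }.
    static final int CLOCK_REALTIME = 0;
    static final MethodHandle CLOCK_GETTIME;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment symbol = linker.defaultLookup().find("clock_gettime").orElseThrow();
        // int clock_gettime(clockid_t clk_id, struct timespec *tp)
        FunctionDescriptor desc = FunctionDescriptor.of(
                ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
        // critical(false): short call, no upcalls into Java, no Java heap
        // memory passed, so the thread-state transition can be skipped.
        CLOCK_GETTIME = linker.downcallHandle(symbol, desc, Linker.Option.critical(false));
    }

    // Returns CLOCK_REALTIME as nanoseconds since the Unix epoch.
    static long realtimeNanos() throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment ts = arena.allocate(16, 8); // struct timespec
            int rc = (int) CLOCK_GETTIME.invokeExact(CLOCK_REALTIME, ts);
            if (rc != 0) {
                throw new IllegalStateException("clock_gettime failed: " + rc);
            }
            long sec = ts.get(ValueLayout.JAVA_LONG, 0);
            long nsec = ts.get(ValueLayout.JAVA_LONG, 8);
            return sec * 1_000_000_000L + nsec;
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println("CLOCK_REALTIME ns: " + realtimeNanos());
    }
}
```
</pre>
<p>A latency-sensitive caller would likely allocate the timespec segment once rather than per call. Linker.downcallHandle is a restricted method, so the JVM may print a warning unless started with --enable-native-access.</p>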
<div>In general I think it makes a lot of sense for Java as a
language/platform to have some guards around unsafe code, but
on the other hand the popularity of libraries employing Unsafe,
and their success in more performance-oriented corners of
software engineering, clearly indicate that the JVM needs to
provide access to more low-level primitives and
mechanisms.<br>
</div>
<div>I think it's entirely fair to tell developers that all bets
are off when they get into some non-idiomatic scenarios, but
please don't take away a feature that greatly contributed to
Java's success.<br>
<br>
</div>
<div>Kind regards,<br>
</div>
<div>Wojtek<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Jun 29, 2022 at 5:20
PM Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank">maurizio.cimadamore@oracle.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Wojciech,<br>
picking up this thread again. After some internal
discussion, we realized that we don't know enough about
your use case. While re-enabling JNI critical would
obviously provide a quick fix, we're afraid that (a)
developers might end up depending on JNI critical when
they don't need to (perhaps also unaware of the
consequences of depending on it) and (b) that there might
actually be _better_ (as in: much faster) solutions than
using critical native calls to address at least some of
your use cases (that seemed to be the case with the
clock_gettime example you mentioned). Could you please
provide a rough list of the native calls you make where
you believe critical JNI is having a real impact in the
performance of your application? Also, could you please
tell us whether any of these calls need to interact with
Java arrays? In other words, do you use critical JNI to
remove the cost associated with thread transitions, or are
you also taking advantage of accessing on-heap memory
_directly_ from native code?</p>
<p>Regards<br>
Maurizio<br>
</p>
<div>On 13/06/2022 21:38, Wojciech Kudla wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Hi Mark,<br>
<br>
</div>
Thanks for your input and apologies for the
delayed response.<br>
<br>
> If the platform included, say, an
intrinsified System.nanoRealTime()<br>
method that returned
clock_gettime(CLOCK_REALTIME), how much would<br>
that help developers in your unnamed industry?<br>
<br>
</div>
Exposing a realtime clock with nanosecond
granularity in the JDK would be a great step
forward. I should have made it clear that I
represent the fintech corner (investment banking, to be
exact), but the issues my message touches upon span
areas such as HPC, audio processing, gaming, and
the defense industry, so it's not like we have an
isolated case.<br>
<br>
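<p>Until something like the proposed System.nanoRealTime() exists, one portable approximation is to derive nanoseconds since the epoch from java.time.Instant. This sketch uses only standard JDK APIs; nanoRealTimeApprox is a hypothetical helper, not a JDK method, and the resolution it actually delivers depends on the JDK version and platform, so it may still be coarser than clock_gettime itself.</p>
<pre>
```java
import java.time.Instant;

public class RealtimeNanosSketch {
    // Hypothetical helper, NOT a JDK method: approximates a nanosecond
    // realtime clock via Instant. Actual resolution is platform-dependent.
    static long nanoRealTimeApprox() {
        Instant now = Instant.now();
        return Math.addExact(
                Math.multiplyExact(now.getEpochSecond(), 1_000_000_000L),
                now.getNano());
    }

    public static void main(String[] args) {
        long nanos = nanoRealTimeApprox();
        long millis = System.currentTimeMillis();
        System.out.println("Instant-based ns: " + nanos + " (ms clock: " + millis + ")");
    }
}
```
</pre>
<p>Unlike an intrinsic, this goes through an Instant object on every call, so it is a convenience rather than a low-latency substitute.</p>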
> In a similar vein, if people are finding it
necessary to “replace parts<br>
of NIO with hand-crafted native code” then it
would be interesting to<br>
understand what their requirements are<br>
<br>
</div>
As for the other example I provided, making very
short-lived syscalls such as recvmsg/recvmmsg, the
premise is getting access to hardware timestamps on
the ingress and egress ends, as well as enabling
batch receive with a single syscall and otherwise
exploiting features unavailable from the JDK (like
access to the CMSG interface, scatter/gather, etc.).<br>
</div>
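<p>To make the contrast concrete, this sketch uses only the standard DatagramChannel API over loopback (class and method names are hypothetical). Each receive() call returns a single datagram, and the API offers neither a batch-receive variant comparable to recvmmsg nor access to ancillary (CMSG) data such as hardware timestamps.</p>
<pre>
```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

public class PerDatagramReceiveSketch {
    // Sends `count` datagrams to a loopback channel and reads them back
    // with one receive() call per datagram; returns how many were read.
    static int roundTrip(int count) throws Exception {
        InetSocketAddress loopback = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        try (DatagramChannel rx = DatagramChannel.open().bind(loopback);
             DatagramChannel tx = DatagramChannel.open()) {
            for (int i = 0; i != count; i++) {
                byte[] payload = ("msg-" + i).getBytes(StandardCharsets.US_ASCII);
                tx.send(ByteBuffer.wrap(payload), rx.getLocalAddress());
            }
            ByteBuffer buf = ByteBuffer.allocateDirect(1500);
            int received = 0;
            for (int i = 0; i != count; i++) {
                buf.clear();
                // Blocking: one datagram per call, no control messages exposed.
                rx.receive(buf);
                received++;
            }
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("received " + roundTrip(3) + " datagrams");
    }
}
```
</pre>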
<div>There are also other examples of calls that we'd
love to make often and at the lowest possible cost (e.g.
getrusage), but I'm not sure there's a strong case
for some of these ideas; that's why it might be
worth looking into a more generic approach for
performance-sensitive code.<br>
</div>
<div>Hope this does a better job of explaining where
we're coming from than my previous messages.<br>
</div>
<div><br>
</div>
Thanks,<br>
</div>
W<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Jun 7, 2022 at
6:31 PM <<a href="mailto:mark.reinhold@oracle.com" target="_blank">mark.reinhold@oracle.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">2022/6/6 0:24:17
-0700, <a href="mailto:wkudla.kernel@gmail.com" target="_blank">wkudla.kernel@gmail.com</a>:<br>
>> Yes for System.nanoTime(), but
System.currentTimeMillis() reports<br>
>> CLOCK_REALTIME.<br>
> <br>
> Unfortunately System.currentTimeMillis() offers
only millisecond<br>
> granularity which is the reason why our industry
has to resort to<br>
> clock_gettime.<br>
<br>
If the platform included, say, an intrinsified
System.nanoRealTime()<br>
method that returned clock_gettime(CLOCK_REALTIME),
how much would<br>
that help developers in your unnamed industry?<br>
<br>
In a similar vein, if people are finding it necessary
to “replace parts<br>
of NIO with hand-crafted native code” then it would be
interesting to<br>
understand what their requirements are. Some simple
enhancements to<br>
the NIO API would be much less costly to design and
implement than a<br>
generalized user-level native-call intrinsification
mechanism.<br>
<br>
- Mark<br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>