Obsoleting JavaCritical

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Jul 4 09:53:47 UTC 2022


Hi Wojtek,
thanks for sharing this list, I think this is a good starting point to 
understand more about your use case.

Last week I looked at "getrusage" (as you mentioned it in an 
earlier email), and I was surprised to see that the call takes a pointer 
to a (fairly big) struct which then needs to be initialized with some 
thread-local state:

https://man7.org/linux/man-pages/man2/getrusage.2.html

I've looked at the implementation, and it seems to do a memset on 
the user-provided struct pointer, followed by all the field assignments. 
Eyeballing the implementation, this does not look to me like a "classic" 
use case where dropping transitions would help much. Surely 
dropping transitions would shave some nanoseconds off the call, 
but it doesn't seem to me that the call is short-lived enough for that 
to make a difference. Do you have some benchmarks on this one? I did some 
[1] and the call overhead came out at about 260ns/op. Without transitions 
you might perhaps be able to get to 250ns, but that's in the noise?

As for getpid, note that you can do (since Java 9):

ProcessHandle.current().pid();

I believe the implementation caches the result, so it shouldn't even 
make the native call.
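
A minimal sketch of the getpid replacement, assuming Java 9 or later. The 
class name PidLookup and the method currentPid() are made up for 
illustration; only ProcessHandle.current().pid() comes from the JDK. 
Whether the value is cached internally is an implementation detail I have 
not verified here, so the sketch only checks that repeated calls agree:

```java
// Hypothetical helper class, not part of any library.
class PidLookup {
    // Returns the id of the current process using the standard
    // ProcessHandle API (available since Java 9), without any JNI code.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        long first = currentPid();
        long second = currentPid();
        // The pid of a running process does not change between calls.
        System.out.println("pid = " + first + ", stable = " + (first == second));
    }
}
```

If this turns out to be cheap enough on your hot path, the getpid entry 
could probably come off the list of critical natives.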

Maurizio

[1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java

On 02/07/2022 07:42, Wojciech Kudla wrote:
> Hi Maurizio,
>
> Thanks for staying on this.
>
> > Could you please provide a rough list of the native calls you make 
> where you believe critical JNI is having a real impact in the 
> performance of your application?
>
> Off the top of my head:
> clock_gettime
> recvmsg
> recvmmsg
> sendmsg
> sendmmsg
> select
> getpid
> getcpu
> getrusage
>
> > Also, could you please tell us whether any of these calls need to 
> interact with Java arrays?
> No arrays or objects of any type involved. Everything happens by 
> means of passing raw pointers as longs and using other primitive types 
> as function arguments.
>
> > In other words, do you use critical JNI to remove the cost 
> associated with thread transitions, or are you also taking advantage 
> of accessing on-heap memory _directly_ from native code?
> Critical JNI natives are used solely to remove the cost of 
> transitions. We don't get anywhere near the Java heap in native code.
>
> In general I think it makes a lot of sense for Java as a 
> language/platform to have some guards around unsafe code, but on the 
> other hand the popularity of libraries employing Unsafe and their 
> success in more performance-oriented corners of software engineering 
> is a clear indicator there is a need for the JVM to provide access to 
> more low-level primitives and mechanisms.
> I think it's entirely fair to tell developers that all bets are off 
> when they get into some non-idiomatic scenarios but please don't take 
> away a feature that greatly contributed to Java's success.
>
> Kind regards,
> Wojtek
>
> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>     Hi Wojciech,
>     picking up this thread again. After some internal discussion, we
>     realize that we don't know enough about your use case. While
>     re-enabling JNI critical would obviously provide a quick fix,
>     we're afraid that (a) developers might end up depending on JNI
>     critical when they don't need to (perhaps also unaware of the
>     consequences of depending on it) and (b) that there might actually
>     be _better_ (as in: much faster) solutions than using critical
>     native calls to address at least some of your use cases (that
>     seemed to be the case with the clock_gettime example you
>     mentioned). Could you please provide a rough list of the native
>     calls you make where you believe critical JNI is having a real
>     impact in the performance of your application? Also, could you
>     please tell us whether any of these calls need to interact with
>     Java arrays? In other words, do you use critical JNI to remove the
>     cost associated with thread transitions, or are you also taking
>     advantage of accessing on-heap memory _directly_ from native code?
>
>     Regards
>     Maurizio
>
>     On 13/06/2022 21:38, Wojciech Kudla wrote:
>>     Hi Mark,
>>
>>     Thanks for your input and apologies for the delayed response.
>>
>>     > If the platform included, say, an intrinsified
>>     System.nanoRealTime()
>>     method that returned clock_gettime(CLOCK_REALTIME), how much would
>>     that help developers in your unnamed industry?
>>
>>     Exposing a realtime clock with nanosecond granularity in the JDK
>>     would be a great step forward. I should have made it clear that I
>>     represent the fintech corner (investment banking to be exact), but the
>>     issues my message touches upon span areas such as HPC, audio
>>     processing, gaming, and the defense industry, so it's not like we have
>>     an isolated case.
>>
>>     > In a similar vein, if people are finding it necessary to
>>     “replace parts
>>     of NIO with hand-crafted native code” then it would be interesting to
>>     understand what their requirements are
>>
>>     As for the other example I provided, making very short-lived
>>     syscalls such as recvmsg/recvmmsg, the premise is getting access
>>     to hardware timestamps on the ingress and egress ends, as well as
>>     enabling batch receive with a single syscall and otherwise
>>     exploiting features unavailable from the JDK (like access to the CMSG
>>     interface, scatter/gather, etc.).
>>     There are also other examples of calls that we'd love to make
>>     often and at the lowest possible cost (e.g. getrusage), but I'm not
>>     sure if there's a strong case for some of these ideas; that's why
>>     it might be worth looking into a more generic approach for
>>     performance-sensitive code.
>>     Hope this does a better job of explaining where we're coming from
>>     than my previous messages.
>>
>>     Thanks,
>>     W
>>
>>     On Tue, Jun 7, 2022 at 6:31 PM <mark.reinhold at oracle.com> wrote:
>>
>>         2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com:
>>         >> Yes for System.nanoTime(), but System.currentTimeMillis()
>>         reports
>>         >> CLOCK_REALTIME.
>>         >
>>         > Unfortunately System.currentTimeMillis() offers only
>>         millisecond
>>         > granularity which is the reason why our industry has to
>>         resort to
>>         > clock_gettime.
>>
>>         If the platform included, say, an intrinsified
>>         System.nanoRealTime()
>>         method that returned clock_gettime(CLOCK_REALTIME), how much
>>         would
>>         that help developers in your unnamed industry?
>>
>>         In a similar vein, if people are finding it necessary to
>>         “replace parts
>>         of NIO with hand-crafted native code” then it would be
>>         interesting to
>>         understand what their requirements are.  Some simple
>>         enhancements to
>>         the NIO API would be much less costly to design and implement
>>         than a
>>         generalized user-level native-call intrinsification mechanism.
>>
>>         - Mark
>>

More information about the panama-dev mailing list