Using JFR as an alternative of SecurityManager for monitoring network

Mon May 3 14:55:10 UTC 2021

Hi Lim,

Thanks for providing details of your use case; it is always helpful to have something concrete to discuss. There were references to JFR in some of the Security Manager deprecation discussions as an alternative for monitoring.
I will focus primarily on the JFR-related questions, because the general discussion of the Security Manager deprecation is held on other, more suitable lists.

A proviso for this response is that we are only talking about using a Security Manager to do the monitoring. We are not talking about request management, that is, blocking or preventing requests, because that is something JFR, as a framework, will never allow the user to do. A central design tenet in JFR is to prevent users from introducing blocking to applications/systems, intentionally or inadvertently. JFR does not allow for synchronous callbacks, which, to a certain extent, explains some of the results you are seeing.

Your example is interesting and a bit disconcerting, as it demonstrates how easy it is to hook into I/O traffic by installing a custom Security Manager. It is convenient to have all request attempts get funnelled through a single tap point. At the same time, this very point can also become a bottleneck and hurt performance and scalability if one is not careful.

Here are some answers to your questions:

"In SM only [4], I can see exactly what is performed before the action is happen.
Note that all sockets connections are logged and with the addition of URL.
The SecurityManager has prefixed with "[SM]". Optionally, I can get the stacktrace using SecurityManager.getClassContext() if needed."

[MG] Yes, all socket connections are now also serialized over the output stream. To accomplish something similar using JFR, one could imagine a new event, perhaps called "CreateSocket", corresponding to the call to "checkConnect()".
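To make the idea concrete, here is a minimal sketch of what such an event could look like if an application defined it itself with the public jdk.jfr API. The event name "example.CreateSocket", the class names and the fields are assumptions made for illustration; no such event exists in the JDK today, and an event placed inside the JDK by the networking experts might look quite different. Note that it only records that an attempt was made. It cannot veto the connection the way checkConnect() can.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class CreateSocketSketch {

    // Hypothetical, application-defined event; not part of the JDK.
    @Name("example.CreateSocket")
    @Label("Create Socket")
    @Category("Example")
    static class CreateSocketEvent extends Event {
        @Label("Host")
        String host;
        @Label("Port")
        int port;
    }

    // Call this where the application is about to open a connection.
    static void recordAttempt(String host, int port) {
        CreateSocketEvent event = new CreateSocketEvent();
        event.host = host;
        event.port = port;
        event.commit(); // records the event; it never blocks or rejects the caller
    }

    // Records a single attempt and reads it back from a temporary .jfr file.
    static List<RecordedEvent> captureOne(String host, int port) throws Exception {
        Path file = Files.createTempFile("create-socket", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable(CreateSocketEvent.class);
            recording.start();
            recordAttempt(host, port);
            recording.stop();
            recording.dump(file);
            // Keep only our own event in case other events were recorded as well.
            return RecordingFile.readAllEvents(file).stream()
                    .filter(e -> e.getEventType().getName().equals("example.CreateSocket"))
                    .collect(Collectors.toList());
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        for (RecordedEvent e : captureOne("example.org", 443)) {
            System.out.println(e.getEventType().getName() + " "
                    + e.getString("host") + ":" + e.getInt("port"));
        }
    }
}
```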

"In JFR Only [5], the first InetAddress connection is not captured by the JFR with SocketRead and SocketWrite event enabled when SM is able to monitor it."

[MG] That is because the SM is invoked before performing a cached name resolution lookup, and a cached lookup does not generate any network activity. Hence, no SocketRead/SocketWrite JFR events are generated.

"On HttpURLConnection - The message is display after getting the reply from the website which is on "[16:00:13.444]". In line 7, the message displayed on [16:00:13.499] but the action has happened on StartTime='16:00:13.058'
with "Write Event". Note that there is the continuation of the events until all the bytes has been read/written."

[MG] Yes, the difference here is that JFR is asynchronous to avoid serialization. The JFR events will have a timestamp taken when the event happened, but the display of the event will naturally need to occur after being recorded. By default, the JVM emits events to the disk every second.
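The gap between when an event happened and when a stream consumer sees it can be observed with a small sketch like the following. It is an illustration, not your NetMonitorJFR code: the class and method names are made up, the traffic is a local loopback exchange, and the threshold is set to zero on the assumption that short operations would otherwise not be recorded. Whether socket events are emitted for this exact exchange may depend on the JDK version.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

class DeliveryLagSketch {

    // Returns, per delivered socket event, the time between the event's own
    // start timestamp and the moment the handler was invoked.
    static List<Duration> observeLag() throws Exception {
        List<Duration> lags = new CopyOnWriteArrayList<>();
        CountDownLatch seen = new CountDownLatch(1);
        Consumer<RecordedEvent> handler = event -> {
            Duration lag = Duration.between(event.getStartTime(), Instant.now());
            System.out.println(event.getEventType().getName()
                    + " startTime=" + event.getStartTime() + " delivered after " + lag);
            lags.add(lag);
            seen.countDown();
        };
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.SocketWrite").withThreshold(Duration.ZERO);
            rs.enable("jdk.SocketRead").withThreshold(Duration.ZERO);
            rs.onEvent("jdk.SocketWrite", handler);
            rs.onEvent("jdk.SocketRead", handler);
            rs.startAsync();                 // handlers run on a separate thread
            loopbackExchange();              // the I/O itself is never delayed by the handler
            seen.await(5, TimeUnit.SECONDS); // wait for the next flush to reach the handler
        }
        return lags;
    }

    // Sends a few bytes over a loopback connection so there is something to record.
    static void loopbackExchange() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            OutputStream out = client.getOutputStream();
            out.write("ping".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = accepted.getInputStream();
            in.read(new byte[4]);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("events delivered: " + observeLag().size());
    }
}
```

The I/O completes first and the handler runs later, which is the ordering you observed in your logs.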

"On HttpClient - which starts from line 120 "[16:00:15.488]" shows the similarity as the above but the first two events are useless because the "Host", "Address" and "Port" does not record the actual destination.
Only after the third event, I can obtain the useful information but at the same time, the StackTrace information become useless since it does not show the originating class."

[MG] The JFR events record the stack traces for the code that is doing the network calls. You are correct that it does not include the original requestor class, which does not make any network calls itself but only issues an asynchronous operation.

"Unexpectedly, The SM managed to monitor the *reading and writing file* by the JFR that I *explicitly used streamed version*. Why does JFR in this case create temp files and delete it? Without security manager, I wouldn't have know Streamed JFR write to disk. Shouldn't the stream is only kept in memory?"

[MG] Writing data to disk is by design and is central to the inner workings of JFR, and provides many benefits. The disk offers much more space to store history compared to memory. It is yet another performance design not to allow tight coupling with consumers who cannot read and process in-memory data quickly enough, as this would block the system or cause data to be lost.

We also use the disk in the streaming case because streaming needs to co-exist with non-streaming recordings. In addition, storing the data on disk allows for cross-process streaming: another process does not need to keep up with the data production in-memory but can read at its own pace from disk. All disk writes in JFR are asynchronous, as it is a background task that writes out data continuously to free up precious in-memory space. In the future, there might be a more critical use case for only working in RAM. If that is the case today, perhaps in a disk-constrained environment, it can be solved by mounting a RAM disk and letting JFR use it instead by specifying -XX:FlightRecorderOptions:repository=/ramdisk.

"I have several questions about JFR, particularity the streamed version

1. I not sure why "jdk.jfr.internal.tool.PrettyWriter" is not exposed because I found useful method like "formatMethod" so that I do not need to manually parse the stacktrace in my JFR code [3]."

[MG] There are many ways to format output, and it is difficult to expose something to cover all possible cases. We want more time to see how values are being formatted before considering an API for it.
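In the meantime, the public consumer API already exposes the parts of a frame, so a formatter of your own can stay small. The sketch below is one possible layout and does not reproduce the output of PrettyWriter; the class names and the "example.Probe" event are assumptions used only to obtain a stack trace to format.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

class FrameFormatSketch {

    // Hypothetical event used only to get a stack trace to format.
    @Name("example.Probe")
    static class ProbeEvent extends Event {
    }

    // One frame as "<declaring class>.<method> line: <n>".
    static String formatFrame(RecordedFrame frame) {
        RecordedMethod method = frame.getMethod();
        return method.getType().getName() + "." + method.getName()
                + " line: " + frame.getLineNumber();
    }

    // All Java frames of an event, top frame first; empty if no stack trace was recorded.
    static List<String> formatStack(RecordedEvent event) {
        List<String> lines = new ArrayList<>();
        RecordedStackTrace trace = event.getStackTrace();
        if (trace != null) {
            for (RecordedFrame frame : trace.getFrames()) {
                if (frame.isJavaFrame()) {
                    lines.add(formatFrame(frame));
                }
            }
        }
        return lines;
    }

    // Records one probe event with a stack trace and formats it.
    static List<String> sample() throws Exception {
        Path file = Files.createTempFile("probe", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable(ProbeEvent.class).withStackTrace();
            recording.start();
            new ProbeEvent().commit();
            recording.stop();
            recording.dump(file);
            for (RecordedEvent e : RecordingFile.readAllEvents(file)) {
                if (e.getEventType().getName().equals("example.Probe")) {
                    return formatStack(e);
                }
            }
            return new ArrayList<>();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        sample().forEach(System.out::println);
    }
}
```

The same formatStack method can be called from a RecordingStream handler, since the handler receives the same RecordedEvent type.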

"2. Is it an implementation detail that using Streamed JFR create/delete files on disk? Since if the program crash/force terminate, the temp .jfr is not deleted and depending on the events enabled, it can consume a lot of disk space."

[MG] See the earlier note about the disk usage. It is correct that temporary files can linger on disk in cases of abnormal termination, such as crashes. At the same time, it is also a benefit, especially for support personnel, in that they have the history available regarding what led up to the problematic situation. It can be compared somewhat to core file creation on a crash, only that the .jfr files are usually much smaller than a core file.

"3. Currently the streamed event are show *after* the network calls are happened, while the SM is show *before* the network calls took place.
Although there is "StartTime" which shows the exact time when the event happened, it is not show orderly in the log [5][6]."

[MG] Again, JFR does not let users introduce serialization.

"4. Is there way to get "before" an actual event is occurred like:
rs.beforeEvent("jdk.SocketWrite", System.out::println) so that the log is shown like in the SecurityManager implementation [4]."

[MG] No, not just like that using the API. "Before" is a bit meaningless in terms of JFR, because events are points in time. "Before" connotes interception, most likely a means to "hook into" an operation about to be attempted. Again, there are many reasons why JFR does not allow this.

"5. In the HttpClient method of calling network, the first 2 events address and host is not actual destination, and the port seems random.
Is this normal because the host is not resolved yet?"

[MG] The asynchronous implementation of HttpClient does a lot of work internally. The parts that end up in JFR might not be easily translatable and may need more detailed processing to get to the relevant data.

"6. Will JDK-8265962 - "Evaluate adding Networking JFR events", that was described on another thread [7], will address the InetAddress if implemented?"

[MG] JDK-8265962 - "Evaluate adding Networking JFR events" will be worked on by domain experts related to networking in the JDK. Their work will focus on providing general, highly performant and scalable monitoring in this area. I'm not quite sure what you are referring to with "will address the InetAddress if implemented". Insights into name resolution are tricky as you have the caching aspects to consider (as seen with your example).  In addition, it is questionable if name resolutions that do not involve network traffic should be reported. Naively, one could put in an event to report name resolutions, but this comes at the cost of heavy traffic and large amounts of data generated. What is of interest, in my opinion, is the actual network traffic issued, and perhaps not so much the internal setup mechanism of name resolution.

"I think that using JFR is a good approach but it is far from usable in my use case, I'm not sure this is the best practice for using JFR in programmatic way and how can it be improved. Or if possible, how do I make it "emulate" the behavior of SM?"

[MG] As you have probably seen by now, many things differ between using a Security Manager as a monitoring tool vs using JFR out-of-the-box. A better comparison/experience would probably have been achieved if JFR events were already located in the corresponding places to reflect your particular use case more directly. But this is good feedback, as it points out areas of interest where JFR could provide the data you are after. The challenge is to craft highly performant and scalable events and introduce them in the proper code locations. It is one of the reasons why domain experts primarily handle them.

We need to keep in mind that there exists a fundamental difference here:

Using the Security Manager in this way is convenient and obviously powerful but also dangerous. The user must be cautious not to accidentally introduce serialization into the JVM because all requests now traverse arbitrary user code. 

JFR, on the other hand, is designed not to let this happen, which is one of the main reasons it can maintain its low overhead, high performance, and scalability. One of the trade-offs it sacrifices for this is the convenience and power that comes with synchronous callbacks.

If there are important data points that are missing today that would improve overall monitoring aspects, they can be complemented and introduced into the JDK. Perhaps HttpClient can be instrumented with specific events to let you get the URL information you need? Again, this is something that will need to be decided by the domain experts, so perhaps you can add a note to JDK-8265962 if you have specific concerns regarding what data you believe would be helpful.

Thank you
Markus

-----Original Message-----
From: hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.java.net> On Behalf Of Lim
Sent: den 29 april 2021 12:07
To: hotspot-jfr-dev at openjdk.java.net
Subject: Using JFR as an alternative of SecurityManager for monitoring network

Hi,

Since the SecurityManager will be deprecated in JEP 411, I have been evaluating JFR for my use case which is monitoring libraries in a program for network communications.

So I decided to make sample scenario that describes my use case below:

Main.java [1];
The code have three parts that perform network connections:
  Obtain the IP addresses from hostname.
  Using the older HttpURLConnection method, searched from the internet.
  The new HttpClient method.

NetMonitorSM.java [2];
Using SecurityManager to monitor Network and file usage (".jfr" file read/write/delete).

NetMonitorJFR.java [3];
Using JFR to monitor Socket Connection, which is the only way to determine if there are network connections.

Below are the logs that I seen using different methods for the monitoring.

In SM only [4], I can see exactly what is performed before the action is happen.
Note that all sockets connections are logged and with the addition of URL.
The SecurityManager has prefixed with "[SM]". Optionally, I can get the stacktrace using SecurityManager.getClassContext() if needed.

In JFR Only [5], the first InetAddress connection is not captured by the JFR with SocketRead and SocketWrite event enabled when SM is able to monitor it.

On HttpURLConnection - The message is display after getting the reply from the website which is on "[16:00:13.444]". In line 7, the message displayed on [16:00:13.499] but the action has happened on StartTime='16:00:13.058'
with "Write Event". Note that there is the continuation of the events until all the bytes has been read/written.

On HttpClient - which starts from line 120 "[16:00:15.488]" shows the similarity as the above but the first two events are useless because the "Host", "Address" and "Port" does not record the actual destination.
Only after the third event, I can obtain the useful information but at the same time, the StackTrace information become useless since it does not show the originating class.

With both SecurityManager and Java Flight Recorder enabled [6], It gives an interesting insight of how SM interacts with JFR.
First is the SM logs are shown first before the JFR event shows.

Unexpectedly, The SM managed to monitor the *reading and writing file* by the JFR that I *explicitly used streamed version*. Why does JFR in this case create temp files and delete it? Without security manager, I wouldn't have know Streamed JFR write to disk. Shouldn't the stream is only kept in memory?

I have several questions about JFR, particularity the streamed version

1. I not sure why "jdk.jfr.internal.tool.PrettyWriter" is not exposed because I found useful method like "formatMethod" so that I do not need to manually parse the stacktrace in my JFR code [3].

2. Is it an implementation detail that using Streamed JFR create/delete files on disk? Since if the program crash/force terminate, the temp .jfr is not deleted and depending on the events enabled, it can consume a lot of disk space.

3. Currently the streamed event are show *after* the network calls are happened, while the SM is show *before* the network calls took place.
Although there is "StartTime" which shows the exact time when the event happened, it is not show orderly in the log [5][6].

4. Is there way to get "before" an actual event is occurred like:
rs.beforeEvent("jdk.SocketWrite", System.out::println) so that the log is shown like in the SecurityManager implementation [4].

5. In the HttpClient method of calling network, the first 2 events address and host is not actual destination, and the port seems random.
Is this normal because the host is not resolved yet?

6. Will JDK-8265962 - "Evaluate adding Networking JFR events", that was described on another thread [7], will address the InetAddress if implemented?

I think that using JFR is a good approach but it is far from usable in my use case, I'm not sure this is the best practice for using JFR in programmatic way and how can it be improved. Or if possible, how do I make it "emulate" the behavior of SM?

Thanks

[1] https://paste.ee/p/vCfZr#section0
[2] https://paste.ee/p/vCfZr#section1
[3] https://paste.ee/p/vCfZr#section2
[4] https://paste.ee/p/Mzczr#section0
[5] https://paste.ee/p/Mzczr#section1
[6] https://paste.ee/p/Mzczr#section2
[7] https://mail.openjdk.java.net/pipermail/security-dev/2021-April/025633.html