Using JFR as an alternative of SecurityManager for monitoring network

Mon May 31 17:37:10 UTC 2021

Hi Lim,

Nothing technical will change with the JEP for JDK 17; it will only communicate a warning that the feature is targeted for removal sometime in the future.

By "serialization" I mean operations having to occur serially instead of in parallel and/or concurrently. It is the effect of locks and critical sections being introduced, intentionally or unintentionally; the impact can range from insignificant to serious, depending on contention.
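
To make the term concrete, here is a minimal sketch (not taken from any of the code discussed in this thread) of a hook that serializes its callers. The class name SerializedHook and the method check() are hypothetical; check() merely stands in for something like a Security Manager callback. Because the method is synchronized, at most one thread is ever inside it, no matter how many threads call it:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedHook {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    // Hypothetical hook. "synchronized" turns the whole body into a critical section.
    public synchronized void check(String target) {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // stands in for slow work such as writing a log line
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inside.decrementAndGet();
    }

    // Calls the hook from several threads and returns the highest number of
    // threads that were ever inside it at the same time.
    public static int maxConcurrency(int threads) {
        SerializedHook hook = new SerializedHook();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String target = "host-" + i;
            workers[i] = new Thread(() -> hook.check(target));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return hook.maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println("max threads inside hook: " + maxConcurrency(8));
    }
}
```

With eight caller threads, maxConcurrency(8) reports 1: every caller has to wait its turn, and the waiting grows with contention.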

> Does JFR provide real-time events for "time sensitive" situations, 
> which I receive the event when it happens? For example, there is a 
> client that sends JFR stream events to a central monitoring server.
> When the client has performed suspicious connections, the monitoring 
> server should isolate the client from the network and generate an 
> alert. If there is a large delay between the event happening and 
> dispatch, it can potentially affect the whole network.
> Will "EventSettings.withoutThreshold().withPeriod(Duration.ofSeconds(0))"
> help achieve this goal?
>

There is no such thing at the moment. The settings you list only control how much data is recorded (that is, they control the cutoff for recording outliers). Also, the VM only emits events to disk once every second. So even if an event is recorded, there is a delay of roughly a second before it is materialized and made available for consumption.
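
For reference, a minimal sketch of what such a streaming consumer can look like with the jdk.jfr.consumer.RecordingStream API (JDK 14 or later). The class name SocketEventLag and its helper methods are made up for illustration, and the three-second window is arbitrary. It enables the socket events without a threshold and prints, for each event, how long after its start time the handler actually saw it:

```java
import java.time.Duration;
import java.time.Instant;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

public class SocketEventLag {

    // Time between the event's recorded start time and the moment it was observed.
    static Duration lag(Instant eventStart, Instant observedAt) {
        return Duration.between(eventStart, observedAt);
    }

    // Handler invoked by the stream some time after the event was committed.
    static void report(RecordedEvent event) {
        Duration lag = lag(event.getStartTime(), Instant.now());
        System.out.printf("%s host=%s port=%d startTime=%s lag=%dms%n",
                event.getEventType().getName(),
                event.getString("host"),
                event.getInt("port"),
                event.getStartTime(),
                lag.toMillis());
    }

    public static void main(String[] args) throws InterruptedException {
        try (RecordingStream rs = new RecordingStream()) {
            // Record every socket read/write, not only the slow outliers.
            rs.enable("jdk.SocketRead").withoutThreshold();
            rs.enable("jdk.SocketWrite").withoutThreshold();
            rs.onEvent("jdk.SocketRead", SocketEventLag::report);
            rs.onEvent("jdk.SocketWrite", SocketEventLag::report);
            rs.startAsync();
            // The application's own network activity would run here.
            Thread.sleep(3_000);
        }
    }
}
```

In this sketch the printed lag is not under the consumer's control; removing the threshold only affects which events get recorded, not how quickly they are delivered.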

> What I mean by "before" is actions prior to establishing the 
> connection (SocketRead/Write), that will log the creation (when it 
> created, where the class is originated, what is requested), since it 
> will only happen once only, not continuously like SocketRead/Write 
> that will "pollute the log". Otherwise I agree.

There is currently some work being done in the networking group to add a CreateSocket event.

> Will serialization occur if I only log to a file (via log4j)? I can 
> understand the potential if arbitrary user code were parsed for future 
> processing.

It depends on the implementation of whatever is called from the Security Manager hooks. Writing to a file or to an output stream requires some means of coordinating threads, so asynchronous behavior is preferred.
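
As a rough illustration of the asynchronous approach, the sketch below has callers only enqueue a message, while a single background thread does the slow writing. AsyncAuditLog and its methods are hypothetical names, and an in-memory list stands in for the file or output stream. The queue still coordinates threads, but the critical section shrinks to a short enqueue instead of the I/O itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncAuditLog implements AutoCloseable {
    // Unique sentinel object, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = new ArrayList<>(); // stands in for a file
    private final Thread writer;

    public AsyncAuditLog() {
        writer = new Thread(this::drain, "audit-log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Called from the hooks: only enqueues, never performs the write itself.
    public void log(String message) {
        queue.offer(message);
    }

    // Runs on the background thread and performs the (slow) writes one by one.
    private void drain() {
        try {
            for (String m = queue.take(); m != STOP; m = queue.take()) {
                sink.add(m);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Flushes everything enqueued so far and stops the writer thread.
    @Override
    public void close() {
        queue.offer(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Only meaningful after close(), once the writer thread has finished.
    public List<String> written() {
        return sink;
    }
}
```

Note that the queue here is unbounded, so a slow writer trades blocking for memory growth; a real implementation would need to decide what to do when the writer falls behind. Log4j 2 offers asynchronous loggers and appenders built on the same idea.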

> Currently I'm unable to add a note to JDK-8265962 because I have no 
> account there, so I will note here what I believe would be helpful:
>
> DNS request event without cache. (Honestly, I will be more interested 
> in non-cached requests) DNS request event with cache.
> I think both of these events can become one with 
> "getValue("isCached")" to differentiate it.
>
> Socket creation event - before a connection is requested, prior to 
> SocketRead/Write.
> Socket received event - When receiving an initial connection handshake 
> from client.
>
> URL event - getting URL info similar to URLPermission. It should work 
> on HttpURLConnection/HttpClient/or any potential URLs that can call
> network)
>
> A flow of the URL event that I imagined:
> URL event -> Socket Create event -> SocketRead/SocketWrite
>
> In my opinion, these events should happen before any connection is 
> made so if there is an anomaly, it can be known earlier.
>
> Side note (might be out of this network monitoring scope):
> I found that "jdk.FileRead" and "jdk.FileWrite" have similar behavior 
> to SocketRead/Write which will produce logs until all the bytes are 
> read/written.
> Currently there isn't a way to determine if a file is accessed or 
> deleted so I imagine events like FileAccess, FileCreate and FileDelete 
> can be added, if it is scalable and performant.

I know the networking group has started to look into some of these, at least the Socket creation event.

In general, I would not rely on JFR to let you "act" on something "before" it happens. Although each event carries a timestamp detailing exactly when it happened, for performance reasons there are many layers of internal caching as the event moves towards materialization.

Thanks
Markus

-----Original Message-----
From: Lim <lim.chainz11+mailing at gmail.com> 
Sent: 31 May 2021 18:27
To: Markus Gronlund <markus.gronlund at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net
Subject: Re: Using JFR as an alternative of SecurityManager for monitoring network

Hi Markus, I would appreciate your feedback on my previous message, since the JEP is targeted to Java 17.

Thanks

On Tue, May 18, 2021 at 12:09 PM Lim <lim.chainz11+mailing at gmail.com> wrote:
>
> Hi Markus,
>
> Thank you for your explanation about the JFR behavior that I'm seeing, 
> it helps me understand more about the design of this framework but I 
> have some areas that I do not quite understand.
>
> In your reply, you mentioned serialization. May I ask what 
> role it plays? Is it about the "Serializable" interface or about getting 
> the result in a serial way? Since I display the events in the logs, does 
> that mean I have accidentally introduced serialization if I parse the 
> events in order?
>
> >[MG] Yes, all socket connections are now also serialized over the output stream. To accomplish something similar using JFR, one could imagine a new event, perhaps called "CreateSocket", corresponding to the call to "checkConnect()"
>
> It will be great if this event is available in the next JDK version.
>
> >[MG] That is because the SM is invoked before performing a cached name resolution lookup, and a cached lookup does not generate any network activity. Hence no SocketRead/SocketWrite JFR events generated.
>
> As I understand it, InetAddress performs both cached and 
> non-cached name resolution, and the SM does not differentiate between 
> the two. How can a lookup be cached when the name resolution has never 
> been made before?
>
> >[MG] Yes, the difference here is that JFR is asynchronous to avoid serialization. The JFR events will have a timestamp taken when the event happened, but the display of the event will naturally need to occur after being recorded. By default, the JVM emits events to the disk every second.
>
> Does JFR provide real-time events for "time sensitive" situations, 
> which I receive the event when it happens? For example, there is a 
> client that sends JFR stream events to a central monitoring server.
> When the client has performed suspicious connections, the monitoring 
> server should isolate the client from the network and generate an 
> alert. If there is a large delay between the event happening and 
> dispatch, it can potentially affect the whole network.
> Will "EventSettings.withoutThreshold().withPeriod(Duration.ofSeconds(0))"
> help achieve this goal?
>
> >[MG] The JFR events record the stack traces for the code that is doing the network calls. You are correct that it does not include the original requestor class, which does not perform any calling, but only issues an asynchronous operation.
>
> Is there a way that I can get the requester class somehow with the 
> correct timestamp and destination address? Both synchronous and 
> asynchronous operation of HTTPClient have similar behavior.
>
> >[MG] No, not just like that using the API. "Before" is a bit meaningless in terms of JFR, because events are points in time. "Before" connotes interception, most likely a means to "hook into" an operation about to be attempted. Again, there are many reasons why JFR does not allow this.
>
> What I mean by "before" is actions prior to establishing the 
> connection (SocketRead/Write), that will log the creation (when it 
> created, where the class is originated, what is requested), since it 
> will only happen once only, not continuously like SocketRead/Write 
> that will "pollute the log". Otherwise I agree.
>
> >[MG] The asynchronous implementation of HttpClient performs a lot of things - the parts that end up in JFR might not be easily translatable and may need some more detailed processing to get to the relevant data.
>
> I still do not quite understand why HttpClient will *make a request to 
> localhost first* with a random port number (which has the requester 
> class), and then to the intended destination (which does not have the 
> requester class). I can do further processing, such as checking that the 
> address is not localhost, but in a "time-sensitive" situation it may 
> delay any action taken.
>
> >In addition, it is questionable if name resolutions that do not involve network traffic should be reported. Naively, one could put in an event to report name resolutions, but this comes at the cost of heavy traffic and large amounts of data generated. What is of interest, in my opinion, is the actual network traffic issued, and perhaps not so much the internal setup mechanism of name resolution.
>
> In my opinion, there is a risk of DNS exfiltration being used to 
> covertly leak data out of the network without any actual network 
> traffic occurring. OS- or network-level monitoring may be able to 
> thwart such attempts, but it cannot tell which class performed the 
> name resolution requests, which makes it hard to find the cause. 
> In addition, problematic libraries that continuously produce 
> DNS queries could be found more easily.
>
> >Using the Security Manager in this way is convenient and obviously powerful but also dangerous. The user must be cautious not to accidentally introduce serialization into the JVM because all requests now traverse arbitrary user code.
>
> Will serialization occur if I only log to a file (via log4j)? I can 
> understand the potential if arbitrary user code were parsed for future 
> processing.
>
> >If there are important data points that are missing today that would improve overall monitoring aspects, they can be complemented and introduced into the JDK. Perhaps HttpClient can be instrumented with specific events to let you get the URL information you need? Again, this is something that will need to be decided by the domain experts, so perhaps you can add a note to JDK-8265962 if you have specific concerns regarding what data you believe would be helpful.
>
> Currently I'm unable to add a note to JDK-8265962 because I have no 
> account there, so I will note here what I believe would be helpful:
>
> DNS request event without cache. (Honestly, I will be more interested 
> in non-cached requests) DNS request event with cache.
> I think both of these events can become one with 
> "getValue("isCached")" to differentiate it.
>
> Socket creation event - before a connection is requested, prior to 
> SocketRead/Write.
> Socket received event - When receiving an initial connection handshake 
> from client.
>
> URL event - getting URL info similar to URLPermission. It should work 
> on HttpURLConnection/HttpClient/or any potential URLs that can call
> network)
>
> A flow of the URL event that I imagined:
> URL event -> Socket Create event -> SocketRead/SocketWrite
>
> In my opinion, these events should happen before any connection is 
> made so if there is an anomaly, it can be known earlier.
>
> Side note (might be out of this network monitoring scope):
> I found that "jdk.FileRead" and "jdk.FileWrite" have similar behavior 
> to SocketRead/Write which will produce logs until all the bytes are 
> read/written.
> Currently there isn't a way to determine if a file is accessed or 
> deleted so I imagine events like FileAccess, FileCreate and FileDelete 
> can be added, if it is scalable and performant.
>
> Thanks
>
> On Mon, May 3, 2021 at 10:55 PM Markus Gronlund 
> <markus.gronlund at oracle.com> wrote:
> >
> > Hi Lim,
> >
> > Thanks for providing details of your use case; it is always helpful to have something concrete to discuss. There were references to JFR in some of the Security Manager deprecation discussions as an alternative for monitoring.
> > I will focus primarily on the JFR-related questions because the general discussion of the Security Manager deprecation is held on other, more suitable lists.
> >
> > A proviso for this response is that we are only talking about using a Security Manager to do the monitoring. We are not talking about request management, blocking/preventing requests because that is something JFR, as a framework, will never allow the user to do. A central design tenet in JFR is to prevent users from introducing blocking to applications/systems, intentionally or inadvertently. JFR does not allow for synchronous callbacks, which, to a certain extent, explains some of the results you are seeing.
> >
> > Your example is interesting and a bit disconcerting, as it demonstrates how easy it is to hook into I/O traffic by installing a custom Security Manager. It is convenient to have all request attempts get funnelled through a single tap point. At the same time, this very point can also become a bottleneck and hurt performance and scalability if one is not careful.
> >
> > Here are some answers to your questions:
> >
> > "In SM only [4], I can see exactly what is performed before the action is happen.
> > Note that all sockets connections are logged and with the addition of URL.
> > The SecurityManager has prefixed with "[SM]". Optionally, I can get the stacktrace using SecurityManager.getClassContext() if needed."
> >
> > [MG] Yes, all socket connections are now also serialized over the output stream. To accomplish something similar using JFR, one could imagine a new event, perhaps called "CreateSocket", corresponding to the call to "checkConnect()".
> >
> > "In JFR Only [5], the first InetAddress connection is not captured by the JFR with SocketRead and SocketWrite event enabled when SM is able to monitor it."
> >
> > [MG] That is because the SM is invoked before performing a cached name resolution lookup, and a cached lookup does not generate any network activity. Hence no SocketRead/SocketWrite JFR events generated.
> >
> > "On HttpURLConnection - The message is display after getting the reply from the website which is on "[16:00:13.444]". In line 7, the message displayed on [16:00:13.499] but the action has happened on StartTime='16:00:13.058'
> > with "Write Event". Note that there is the continuation of the events until all the bytes has been read/written."
> >
> > [MG] Yes, the difference here is that JFR is asynchronous to avoid serialization. The JFR events will have a timestamp taken when the event happened, but the display of the event will naturally need to occur after being recorded. By default, the JVM emits events to the disk every second.
> >
> > "On HttpClient - which starts from line 120 "[16:00:15.488]" shows the similarity as the above but the first two events are useless because the "Host", "Address" and "Port" does not record the actual destination.
> > Only after the third event, I can obtain the useful information but at the same time, the StackTrace information become useless since it does not show the originating class."
> >
> > [MG] The JFR events record the stack traces for the code that is doing the network calls. You are correct that it does not include the original requestor class, which does not perform any calling, but only issues an asynchronous operation.
> >
> > "Unexpectedly, The SM managed to monitor the *reading and writing file* by the JFR that I *explicitly used streamed version*. Why does JFR in this case create temp files and delete it? Without security manager, I wouldn't have know Streamed JFR write to disk. Shouldn't the stream is only kept in memory?"
> >
> > [MG] Keeping data on disk is by design, is central to the inner workings of JFR, and provides many benefits. The disk offers much more space to store history compared to memory. It is yet another performance-driven design decision not to allow tight coupling with consumers that cannot process in-memory data quickly enough, as this would block the system or cause data to be lost.
> >
> > We also use the disk in the streaming case because streaming needs to co-exist with non-streaming recordings. In addition, storing the data on disk allows for cross-process streaming - another process does not need to keep up with the data production in-memory but can read at its own pace from disk. All disk writes in JFR are asynchronous, as it is a background task that writes out data continuously to free up precious in-memory space. In the future, there might be a more critical use case for only working in RAM. If that is the case today, perhaps in a disk-constrained environment, it can be solved by mounting a RAM disk and letting JFR use it instead by specifying -XX:FlightRecorderOptions:repository=/ramdisk.
> >
> > "I have several questions about JFR, particularity the streamed 
> > version
> >
> > 1. I not sure why "jdk.jfr.internal.tool.PrettyWriter" is not exposed because I found useful method like "formatMethod" so that I do not need to manually parse the stacktrace in my JFR code [3]."
> >
> > [MG] There are many ways to format output, and it is difficult to expose something to cover all possible cases. We want more time to see how values are being formatted before considering an API for it.
> >
> > "2. Is it an implementation detail that using Streamed JFR create/delete files on disk? Since if the program crash/force terminate, the temp .jfr is not deleted and depending on the events enabled, it can consume a lot of disk space."
> >
> > [MG] See the earlier note about the disk usage. It is correct that temporary files can linger on disk in cases of abnormal termination, such as crashes. At the same time, it is also a benefit, especially for support personnel, in that they have the history available regarding what led up to the problematic situation. It can be compared somewhat to core file creation on a crash, only that the .jfr files are usually much smaller than a core file.
> >
> > "3. Currently the streamed event are show *after* the network calls are happened, while the SM is show *before* the network calls took place.
> > Although there is "StartTime" which shows the exact time when the event happened, it is not show orderly in the log [5][6]."
> >
> > [MG] Again, JFR does not let users introduce serialization.
> >
> > "4. Is there way to get "before" an actual event is occurred like:
> > rs.beforeEvent("jdk.SocketWrite", System.out::println) so that the log is shown like in the SecurityManager implementation [4]."
> >
> > [MG] No, not just like that using the API. "Before" is a bit meaningless in terms of JFR, because events are points in time. "Before" connotes interception, most likely a means to "hook into" an operation about to be attempted. Again, there are many reasons why JFR does not allow this.
> >
> > "5. In the HttpClient method of calling network, the first 2 events address and host is not actual destination, and the port seems random.
> > Is this normal because the host is not resolved yet?"
> >
> > [MG] The asynchronous implementation of HttpClient performs a lot of things - the parts that end up in JFR might not be easily translatable and may need some more detailed processing to get to the relevant data.
> >
> > 6. Will JDK-8265962 - "Evaluate adding Networking JFR events", that was described on another thread [7], will address the InetAddress if implemented?
> >
> > [MG] JDK-8265962 - "Evaluate adding Networking JFR events" will be worked on by domain experts related to networking in the JDK. Their work will focus on providing general, highly performant and scalable monitoring in this area. I'm not quite sure what you are referring to with "will address the InetAddress if implemented". Insights into name resolution are tricky as you have the caching aspects to consider (as seen with your example).  In addition, it is questionable if name resolutions that do not involve network traffic should be reported. Naively, one could put in an event to report name resolutions, but this comes at the cost of heavy traffic and large amounts of data generated. What is of interest, in my opinion, is the actual network traffic issued, and perhaps not so much the internal setup mechanism of name resolution.
> >
> > "I think that using JFR is a good approach but it is far from usable in my use case, I'm not sure this is the best practice for using JFR in programmatic way and how can it be improved. Or if possible, how do I make it "emulate" the behavior of SM?"
> >
> > [MG] As you have probably seen by now, many things differ between using a Security Manager as a monitoring tool vs using JFR out-of-the-box. A better comparison/experience would probably have been achieved if JFR events were already located in the corresponding places to reflect on your particular use case more directly. But this is good feedback, as it denotes areas of interest and JFR can provide the data you are after. The challenge is to craft highly performant and scalable events and introduce them in the proper code locations. It is one of the reasons why domain experts primarily handle them.
> >
> > We need to keep in mind that there exists a fundamental difference here:
> >
> > Using the Security Manager in this way is convenient and obviously powerful but also dangerous. The user must be cautious not to accidentally introduce serialization into the JVM because all requests now traverse arbitrary user code.
> >
> > JFR, on the other hand, is designed not to let this happen, which is one of the main reasons it can maintain its low overhead, high performance, and scalability. One of the trade-offs it sacrifices for this is the convenience and power that comes with synchronous callbacks.
> >
> > If there are important data points that are missing today that would improve overall monitoring aspects, they can be complemented and introduced into the JDK. Perhaps HttpClient can be instrumented with specific events to let you get the URL information you need? Again, this is something that will need to be decided by the domain experts, so perhaps you can add a note to JDK-8265962 if you have specific concerns regarding what data you believe would be helpful.
> >
> > Thank you
> > Markus
> >
> > -----Original Message-----
> > From: hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.java.net> On 
> > Behalf Of Lim
> > Sent: 29 April 2021 12:07
> > To: hotspot-jfr-dev at openjdk.java.net
> > Subject: Using JFR as an alternative of SecurityManager for 
> > monitoring network
> >
> > Hi,
> >
> > Since the SecurityManager will be deprecated in JEP 411, I have been evaluating JFR for my use case which is monitoring libraries in a program for network communications.
> >
> > So I decided to make a sample scenario that describes my use case:
> >
> > Main.java [1];
> > The code has three parts that perform network connections:
> >   Obtaining the IP addresses from a hostname.
> >   Using the older HttpURLConnection method, found by searching the internet.
> >   Using the newer HttpClient method.
> >
> > NetMonitorSM.java [2];
> > Using SecurityManager to monitor Network and file usage (".jfr" file read/write/delete).
> >
> > NetMonitorJFR.java [3];
> > Using JFR to monitor Socket Connection, which is the only way to determine if there are network connections.
> >
> >
> > Below are the logs that I saw using the different monitoring methods.
> >
> > In SM only [4], I can see exactly what is performed before the action happens.
> > Note that all socket connections are logged, along with the URL.
> > The SecurityManager output is prefixed with "[SM]". Optionally, I can get the stack trace using SecurityManager.getClassContext() if needed.
> >
> > In JFR only [5], the first InetAddress connection is not captured by JFR with the SocketRead and SocketWrite events enabled, whereas the SM is able to monitor it.
> >
> > On HttpURLConnection - The message is displayed after getting the reply from the website, which is at "[16:00:13.444]". In line 7, the message is displayed at [16:00:13.499] but the action happened at StartTime='16:00:13.058'
> > with "Write Event". Note that the events continue until all the bytes have been read/written.
> >
> > On HttpClient - which starts from line 120 "[16:00:15.488]", the behavior is similar to the above, but the first two events are useless because "Host", "Address" and "Port" do not record the actual destination.
> > Only after the third event can I obtain the useful information, but at the same time the stack trace information becomes useless since it does not show the originating class.
> >
> > With both SecurityManager and Java Flight Recorder enabled [6], it gives an interesting insight into how the SM interacts with JFR.
> > First, the SM logs are shown before the JFR events.
> >
> > Unexpectedly, the SM managed to monitor the *reading and writing of files* by JFR even though I *explicitly used the streamed version*. Why does JFR in this case create temp files and delete them? Without the security manager, I wouldn't have known that streamed JFR writes to disk. Shouldn't the stream be kept only in memory?
> >
> >
> > I have several questions about JFR, particularly the streamed 
> > version:
> >
> > 1. I am not sure why "jdk.jfr.internal.tool.PrettyWriter" is not exposed, because I found useful methods like "formatMethod" that would save me from manually parsing the stack trace in my JFR code [3].
> >
> > 2. Is it an implementation detail that streamed JFR creates/deletes files on disk? If the program crashes or is forcibly terminated, the temp .jfr file is not deleted and, depending on the events enabled, it can consume a lot of disk space.
> >
> > 3. Currently the streamed events are shown *after* the network calls have happened, while the SM output is shown *before* the network calls take place.
> > Although there is "StartTime", which shows the exact time when the event happened, the events are not shown in order in the log [5][6].
> >
> > 4. Is there a way to get notified "before" an actual event occurs, like:
> > rs.beforeEvent("jdk.SocketWrite", System.out::println), so that the log is shown like in the SecurityManager implementation [4]?
> >
> > 5. In the HttpClient method of calling the network, the address and host of the first 2 events are not the actual destination, and the port seems random.
> > Is this normal because the host is not resolved yet?
> >
> > 6. Will JDK-8265962 - "Evaluate adding Networking JFR events", which was described on another thread [7], address the InetAddress case if implemented?
> >
> > I think that using JFR is a good approach, but it is far from usable in my use case. I'm not sure whether this is the best practice for using JFR programmatically or how it can be improved. Or, if possible, how do I make it "emulate" the behavior of the SM?
> >
> > Thanks
> >
> > [1] https://paste.ee/p/vCfZr#section0
> > [2] https://paste.ee/p/vCfZr#section1
> > [3] https://paste.ee/p/vCfZr#section2
> > [4] https://paste.ee/p/Mzczr#section0
> > [5] https://paste.ee/p/Mzczr#section1
> > [6] https://paste.ee/p/Mzczr#section2
> > [7] https://mail.openjdk.java.net/pipermail/security-dev/2021-April/025633.html