RFR: 8310996: Add JFR event for connect operations
Adds a JFR event for socket connect operations. Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events. ------------- Commit messages: - fix default settings - fix merge - Merge branch 'master' into JDK-8310996 - added tests and support for sockets. - 8310996: Add JFR event for connect operations Changes: https://git.openjdk.org/jdk/pull/21528/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310996 Stats: 247 lines in 12 files changed: 236 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Wed, 16 Oct 2024 01:19:15 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
src/java.base/share/classes/sun/nio/ch/NioSocketImpl.java line 624:
622: SocketConnectEvent.commit(start, duration, isa.getHostString(), address.getHostAddress(), port, connected); 623: } 624: }
Would it be possible to update the JBS or PR description to indicate if the intent is to record an event when the connection cannot be established? I'm asking the change will only record an event when a connection is successfully established ("connected" is always true here). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1802451480
On Wed, 16 Oct 2024 06:40:33 GMT, Alan Bateman <alanb@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
src/java.base/share/classes/sun/nio/ch/NioSocketImpl.java line 624:
622: SocketConnectEvent.commit(start, duration, isa.getHostString(), address.getHostAddress(), port, connected); 623: } 624: }
Would it be possible to update the JBS or PR description to indicate if the intent is to record an event when the connection cannot be established? I'm asking because the change will only record an event when a connection is successfully established ("connected" is always true here).
JFR will record exceptions already of course but I think for troubleshooting purposes, recording an event when "connect" hangs and eventually fails is very useful to have.
Capturing all calls even if they threw an exception does seem pretty useful. I'll update the JBS ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1807983535
On Wed, 16 Oct 2024 01:19:15 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java line 1006:
1004: boolean connected = implConnect(sa); 1005: SocketConnectEvent.offer(start, connected, sa); 1006: return connected;
It would be useful if the JBS or PR could say what the intent it for SocketChannels that are configured non-blocking. I assume that implConnect will execute in <20ms (the threshold in the JFR config) so no event will be ever be recorded when configured non-blocking. It would be possible to save the timestamp when connecting, and then use in finishConnect but that does depend on timely usage. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1802458279
On Wed, 16 Oct 2024 06:46:54 GMT, Alan Bateman <alanb@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java line 1006:
1004: boolean connected = implConnect(sa); 1005: SocketConnectEvent.offer(start, connected, sa); 1006: return connected;
It would be useful if the JBS or PR could say what the intent it for SocketChannels that are configured non-blocking. I assume that implConnect will execute in <20ms (the threshold in the JFR config) so no event will be ever be recorded when configured non-blocking. It would be possible to save the timestamp when connecting, and then use in finishConnect but that does depend on timely usage.
The non-blocking case is poorly handled. I'll update the JBS. The untimely use of finishConnect would cause an artificially bad looking event duration which might be a bit misleading. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1807984988
On Mon, 21 Oct 2024 00:39:27 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
The non-blocking case is poorly handled. I'll update the JBS. The untimely use of finishConnect would cause an artificially bad looking event duration which might be a bit misleading.
My preference for now would be to focus on the blocking case. Why did my application pause for 30s? The other part to this will be the lookup methods in InetAddress as sometimes the apparent hang isn't establishing the connection but the DNS lookup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1808149416
On Wed, 16 Oct 2024 01:19:15 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
I think this change should have a release note. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21528#issuecomment-2416674860
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: Added support for connection failure and non-blocking connections. If an exception is thrown while attempting a connection, the message from the exception is stored in the event. The start time of the initial connect call is stored and used when a finishConnect call is successful or an exception is thrown. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/05227c9d..7113bd75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=00-01 Stats: 91 lines in 4 files changed: 44 ins; 14 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: improved exception names and handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/7113bd75..a8898ffc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=01-02 Stats: 33 lines in 3 files changed: 9 ins; 8 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Tue, 5 Nov 2024 16:48:14 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
improved exception names and handling
Discussed with Tim as there are a number of issues that will need attention. A SocketConnectEvent should only be "offered" for connect events, not cases such as "already connected" or "socket closed". For SocketChannel there is also a socket adaptor (blockingConnect method) that will need to be updated. The non-blocking connect/finishConnect is complicated and there are several issues there. Finally, remoteAddress requires stateLock. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21528#issuecomment-2457759513
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: suggested changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/a8898ffc..ce9d39e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=02-03 Stats: 76 lines in 2 files changed: 49 ins; 15 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Wed, 6 Nov 2024 00:15:44 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
suggested changes
src/jdk.jfr/share/classes/jdk/jfr/internal/JDKEvents.java line 2:
1: /* 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved.
I assume you didn't mean to change that. src/jdk.jfr/share/conf/jfr/profile.jfc line 741:
739: <setting name="stackTrace">true</setting> 740: <setting name="threshold" control="socket-threshold">10 ms</setting> 741: </event>
In default.jfr the threshold for the socket events is 20ms, but 10ms in profile.jfc. Is that intentional? test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 106:
104: 105: try (SocketChannel sc = SocketChannel.open(ssc.getLocalAddress())) { 106: addExpectedEvent(IOEvent.createSocketConnectEvent(sc.socket()));
This is SocketChannel in blocking mode where the connect succeeds. There is also the non-blocking and where connect fails. In addition the connection can established with the socket adaptor. So 6 possible cases for SocketChannel if the test is expanded. test/jdk/jdk/jfr/event/io/TestSocketEvents.java line 108:
106: try (Socket s = new Socket()) { 107: s.connect(ss.getLocalSocketAddress()); 108: addExpectedEvent(IOEvent.createSocketConnectEvent(s));
This is Socket.connect success case, I assume we'll need a test for fail case too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1830512546 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1830516492 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1830545551 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1830537634
On Wed, 6 Nov 2024 07:31:37 GMT, Alan Bateman <alanb@openjdk.org> wrote:
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
suggested changes
src/jdk.jfr/share/classes/jdk/jfr/internal/JDKEvents.java line 2:
1: /* 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved.
I assume you didn't mean to change that.
Not sure how that happened, but I'll fix it
test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 106:
104: 105: try (SocketChannel sc = SocketChannel.open(ssc.getLocalAddress())) { 106: addExpectedEvent(IOEvent.createSocketConnectEvent(sc.socket()));
This is SocketChannel in blocking mode where the connect succeeds. There is also the non-blocking and where connect fails. In addition the connection can established with the socket adaptor. So 6 possible cases for SocketChannel if the test is expanded.
yes, testing still to be done
test/jdk/jdk/jfr/event/io/TestSocketEvents.java line 108:
106: try (Socket s = new Socket()) { 107: s.connect(ss.getLocalSocketAddress()); 108: addExpectedEvent(IOEvent.createSocketConnectEvent(s));
This is Socket.connect success case, I assume we'll need a test for fail case too.
yes ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1831527507 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1831529879 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1831529362
On Wed, 6 Nov 2024 00:15:44 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
suggested changes
The update in ce9d39e2 has the changes that I discussed with Tim yesterday. Specifically, it fixes several issues with the non-blocking case, only records an event if the connect method actually attempts to establish a connection, and fixes the socket adaptor. So I think this to NioSocketImpl and SocketChannelImpl are good now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21528#issuecomment-2458949631
On Wed, 6 Nov 2024 00:15:44 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
suggested changes
The code changes look reasonable to me. I had to re-read the documentation of `finishConnect` to get to that point. I mistakenly thought that this method was reserved for the fully non blocking case. I hadn't realised you could start connect() in non blocking mode and finish it in blocking mode... Thanks @AlanBateman for the thorough review of all these odd cases. ------------- PR Review: https://git.openjdk.org/jdk/pull/21528#pullrequestreview-2420545416
On Wed, 6 Nov 2024 00:15:44 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
suggested changes
@tprinzing Are you planning on all the tests? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21528#issuecomment-2473184324
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: Added more tests for socket connect events. - SocketAdapter connect - SocketAdapter connect with exception - Socket connect with exception - SocketChannel connect with exception - SocketChannel non-blocking connect - SocketChannel non-blocking connect with exception ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/ce9d39e2..13f81570 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=03-04 Stats: 177 lines in 5 files changed: 162 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Fri, 22 Nov 2024 20:32:50 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
Added more tests for socket connect events.
- SocketAdapter connect - SocketAdapter connect with exception - Socket connect with exception - SocketChannel connect with exception - SocketChannel non-blocking connect - SocketChannel non-blocking connect with exception
Thanks for the update, it's good to have tests for the 8 possible cases that might record a SocketConnect event. If you have the energy for it, we could also add tests to check that a SocketConnect is not recorded for cases where the connect method fails for other reasons, e.g. already connected, Socket closed, ... but only if you want. src/java.base/share/classes/jdk/internal/event/SocketConnectEvent.java line 37:
35: * {@code jdk.jfr.events.SocketConnectEvent } where the metadata for the event is 36: * provided with annotations. Some of the methods are replaced by generated 37: * methods when jfr is enabled. Note that the order of the arguments of the
In passing, there is a mix of "JFR" and "jfr" through-out, I assume you want "JFR" everywhere. src/java.base/share/classes/jdk/internal/event/SocketConnectEvent.java line 51:
49: 50: /** 51: * Actually commit an event. The implementation is generated automatically.
I assume "Actually" should be removed, this method commits an event. src/jdk.jfr/share/classes/jdk/jfr/events/SocketConnectEvent.java line 38:
36: @Label("Socket Connect") 37: @Category("Java Application") 38: @Description("Connecting a socket")
I wonder if we can improve on this description. The event is recorded when a connection is established or cannot be established so it's more like "Socket Connection". test/jdk/jdk/jfr/event/io/IOHelper.java line 135:
133: } 134: 135: public static void checkConnectionEventException(RecordedEvent event, IOException ioe) {
I assume this should checkConnectEventException. test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 157:
155: while (! sc.finishConnect()) { 156: Thread.sleep(1); 157: }
The connect method returns true if the connection is established, you should only need to poll finishConnect if the connection is not established immediately. Using a sleep is okay here (although 1ms is very low) and I'm guessing you've done this to avoid using a Selector. test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 216:
214: } catch (IOException ioe) { 215: // we expect this 216: connectException = ioe;
I think you'll need to adjust the try-catch to only set connectException if the connect or finishConnect methods fails. If the open or configureBlocking fails then connectException should not be set, instead the test should fail. ------------- PR Review: https://git.openjdk.org/jdk/pull/21528#pullrequestreview-2456365810 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855145320 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855145369 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855145935 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855146240 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855150781 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855151098
On Sat, 23 Nov 2024 08:02:42 GMT, Alan Bateman <alanb@openjdk.org> wrote:
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
Added more tests for socket connect events.
- SocketAdapter connect - SocketAdapter connect with exception - Socket connect with exception - SocketChannel connect with exception - SocketChannel non-blocking connect - SocketChannel non-blocking connect with exception
src/jdk.jfr/share/classes/jdk/jfr/events/SocketConnectEvent.java line 38:
36: @Label("Socket Connect") 37: @Category("Java Application") 38: @Description("Connecting a socket")
I wonder if we can improve on this description. The event is recorded when a connection is established or cannot be established so it's more like "Socket Connection".
As I understand it, the event succeeds if exceptionMessage is null. Perhaps this should be made more explicit with a failed field. Alternatively, there could be two events: one for success and one for failure. What is the typical duration of a failed event? If it is above 10-20 ms, two events might not be as useful since all failures will be recorded anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1855573527
On Sun, 24 Nov 2024 23:15:02 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Perhaps this should be made more explicit with a failed field. Alternatively, there could be two events: one for success and one for failure. What is the typical duration of a failed event? If it is above 10-20 ms, two events might not be as useful since all failures will be recorded anyway.
If a connection cannot be established then it might be immediate, 10s of milliseconds, maybe 60+ seconds in some cases. A slow down or stall waiting for a connection to be established seems a useful event to have recorded. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1856011185
On Mon, 25 Nov 2024 07:54:34 GMT, Alan Bateman <alanb@openjdk.org> wrote:
If a connection cannot be established then it might be immediate, 10s of milliseconds, maybe 60+ seconds in some cases. A slow down or stall waiting for a connection to be established seems a useful event to have recorded.
If it's immediate, a potential Socket Connection Failure event could overflow the buffers and we can't have it with threshold = 0s. Otherwise, it might be interesting to have something like: `$ jfr view socket-connection-failures recording.jfr` to see a complete list of failures per host/port and then have: `$ jfr view slow-socket-connections recording.jfr` to find which ones are slow, i.e. more than 10-20 ms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1856393748
On Mon, 25 Nov 2024 10:59:35 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Perhaps this should be made more explicit with a failed field. Alternatively, there could be two events: one for success and one for failure. What is the typical duration of a failed event? If it is above 10-20 ms, two events might not be as useful since all failures will be recorded anyway.
If a connection cannot be established then it might be immediate, 10s of milliseconds, maybe 60+ seconds in some cases. A slow down or stall waiting for a connection to be established seems a useful event to have recorded.
If a connection cannot be established then it might be immediate, 10s of milliseconds, maybe 60+ seconds in some cases. A slow down or stall waiting for a connection to be established seems a useful event to have recorded.
If it's immediate, a potential Socket Connection Failure event could overflow the buffers and we can't have it with threshold = 0s. Otherwise, it might be interesting to have something like:
`$ jfr view socket-connection-failures recording.jfr`
to see a complete list of failures per host/port and then have:
`$ jfr view slow-socket-connections recording.jfr`
to find which ones are slow, i.e. more than 10-20 ms.
Having a view for connect failures that doesn't require exceptions=all could be useful. Does this mean two events, one an instant event for the failures, the other a duration event for the connections that are slow to establish? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1856496052
On Mon, 25 Nov 2024 12:00:52 GMT, Alan Bateman <alanb@openjdk.org> wrote:
If a connection cannot be established then it might be immediate, 10s of milliseconds, maybe 60+ seconds in some cases. A slow down or stall waiting for a connection to be established seems a useful event to have recorded.
If it's immediate, a potential Socket Connection Failure event could overflow the buffers and we can't have it with threshold = 0s. Otherwise, it might be interesting to have something like:
`$ jfr view socket-connection-failures recording.jfr`
to see a complete list of failures per host/port and then have:
`$ jfr view slow-socket-connections recording.jfr`
to find which ones are slow, i.e. more than 10-20 ms.
Having a view for connect failures that doesn't require exceptions=all could be useful. Does this mean two events, one an instant event for the failures, the other a duration event for the connections that are slow to establish?
A connection failure introduces a latency in the application, so probably best to have such an event durational as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1856610190
On Mon, 25 Nov 2024 13:23:23 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Having a view for connect failures that doesn't require exceptions=all could be useful. Does this mean two events, one an instant event for the failures, the other a duration event for the connections that are slow to establish?
A connection failure introduces a latency in the application, so probably best to have such an event durational as well.
@egahlin The updated PR proposes two duration events: jdk.SocketConnect for when a connection is established, and jdk.SocketConnectFailed for when a connection cannot be established. Naming aside, it seems that would allow the jfr views that you listed above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867647214
On Tue, 3 Dec 2024 12:34:20 GMT, Alan Bateman <alanb@openjdk.org> wrote:
A connection failure introduces a latency in the application, so probably best to have such an event durational as well.
@egahlin The updated PR proposes two duration events: jdk.SocketConnect for when a connection is established, and jdk.SocketConnectFailed for when a connection cannot be established. Naming aside, it seems that would allow the jfr views that you listed above.
We could have two views with only one event. The query for the view could filter for exceptionMessage != null or a failure property. The advantage of having two events is that the failure event could have a threshold of 0 ms. We are planning to add a throttling mechanism for exception events, perhaps per call site. The same mechanism could be used for a failed event. If you receive 500 events per second for a call site, there is little value in having additional events. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1868506166
On Tue, 3 Dec 2024 23:42:34 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
@egahlin The updated PR proposes two duration events: jdk.SocketConnect for when a connection is established, and jdk.SocketConnectFailed for when a connection cannot be established. Naming aside, it seems that would allow the jfr views that you listed above.
We could have two views with only one event. The query for the view could filter for exceptionMessage != null or a failure property. The advantage of having two events is that the failure event could have a threshold of 0 ms.
We are planning to add a throttling mechanism for exception events, perhaps per call site. The same mechanism could be used for a failed event. If you receive 500 events per second for a call site, there is little value in having additional events.
We need to help Tim on the question of whether there is one or two events. An application that makes outbound network connections may run slowly for several reasons. A duration event may help to diagnose this, irrespective of whether the connection is established successfully or it fails, so one event is okay. Separately, another big source of latency is the name service / DNS lookup which happens before getting to the Socket connect. Maybe further work could add events to InetAddress for this. When hunting misbehaving behavior then focusing on the cases where a connection cannot be established may be more interesting. So it's possible someone might want to run with threshold=0 to see all failed events. If there is throttling support, and since we control call site for both the successful and failed cases, could we live with one event? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1869400913
On Wed, 4 Dec 2024 12:26:20 GMT, Alan Bateman <alanb@openjdk.org> wrote:
We could have two views with only one event. The query for the view could filter for exceptionMessage != null or a failure property. The advantage of having two events is that the failure event could have a threshold of 0 ms.
We are planning to add a throttling mechanism for exception events, perhaps per call site. The same mechanism could be used for a failed event. If you receive 500 events per second for a call site, there is little value in having additional events.
We need to help Tim on the question of whether there is one or two events.
An application that makes outbound network connections may run slowly for several reasons. A duration event may help to diagnose this, irrespective of whether the connection is established successfully or it fails, so one event is okay. Separately, another big source of latency is the name service / DNS lookup which happens before getting to the Socket connect. Maybe further work could add events to InetAddress for this.
When hunting misbehaving behavior then focusing on the cases where a connection cannot be established may be more interesting. So it's possible someone might want to run with threshold=0 to see all failed events. If there is throttling support, and since we control call site for both the successful and failed cases, could we live with one event?
I'm not sure if one or two events are most suitable. If possible, I would like to discuss it with Markus to get some more input. He will back in January. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1881347326
On Thu, 12 Dec 2024 03:58:29 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
We need to help Tim on the question of whether there is one or two events.
An application that makes outbound network connections may run slowly for several reasons. A duration event may help to diagnose this, irrespective of whether the connection is established successfully or it fails, so one event is okay. Separately, another big source of latency is the name service / DNS lookup which happens before getting to the Socket connect. Maybe further work could add events to InetAddress for this.
When hunting misbehaving behavior then focusing on the cases where a connection cannot be established may be more interesting. So it's possible someone might want to run with threshold=0 to see all failed events. If there is throttling support, and since we control call site for both the successful and failed cases, could we live with one event?
I'm not sure if one or two events are most suitable. If possible, I would like to discuss it with Markus to get some more input. He will back in January.
Regarding, one or two events. I'm fine with integrating as-is and then revisit before FC, if needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1893019773
On Thu, 19 Dec 2024 19:16:55 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
I'm not sure if one or two events are most suitable. If possible, I would like to discuss it with Markus to get some more input. He will back in January.
Regarding one or two events. I'm fine with integrating as-is and then revisit it before FC, if needed.
I talked to Markus, and we think two events would provide the greatest flexibility for future settings and events. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1922757394
On Sat, 23 Nov 2024 08:36:03 GMT, Alan Bateman <alanb@openjdk.org> wrote:
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
Added more tests for socket connect events.
- SocketAdapter connect - SocketAdapter connect with exception - Socket connect with exception - SocketChannel connect with exception - SocketChannel non-blocking connect - SocketChannel non-blocking connect with exception
test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 157:
155: while (! sc.finishConnect()) { 156: Thread.sleep(1); 157: }
The connect method returns true if the connection is established, you should only need to poll finishConnect if the connection is not established immediately. Using a sleep is okay here (although 1ms is very low) and I'm guessing you've done this to avoid using a Selector.
It should be OK to sleep for longer if you don't want to use a selector.
test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 216:
214: } catch (IOException ioe) { 215: // we expect this 216: connectException = ioe;
I think you'll need to adjust the try-catch to only set connectException if the connect or finishConnect methods fails. If the open or configureBlocking fails then connectException should not be set, instead the test should fail.
In addition there's no guarantee that connect will fail - so the test should guard for the case where connect might succeed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867981816 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1868041972
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: split socket connect failure out to its own event. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/13f81570..f7b3be00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=04-05 Stats: 327 lines in 16 files changed: 240 ins; 21 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Tue, 3 Dec 2024 00:40:24 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
split socket connect failure out to its own event.
Would it be possible to sync up the branch from master as it hasn't been sync'ed up since Oct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21528#issuecomment-2513817462
On Tue, 3 Dec 2024 00:40:24 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
split socket connect failure out to its own event.
src/java.base/share/classes/jdk/internal/event/SocketConnectFailedEvent.java line 145:
143: if (shouldCommit(duration)) { 144: commit(start, duration, host, address.getHostAddress(), port, connectEx.getMessage()); 145: }
Would it be better to pass `connectEx.toString()` here to capture the type of the exception and avoid the awkward case where the message could be the empty string or null? Same in the other offer() method above... src/jdk.jfr/share/classes/jdk/jfr/events/SocketConnectFailedEvent.java line 52:
50: @Label("Connect Exception Message") 51: public String connectExceptionMessage; 52: }
There seem to be a missing newline at the end of this file src/jdk.jfr/share/conf/jfr/default.jfc line 746:
744: <setting name="enabled">true</setting> 745: <setting name="stackTrace">true</setting> 746: <setting name="threshold" control="socket-threshold">20 ms</setting>
Does this event needs a threshold? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867888476 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867909881 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867912497
On Tue, 3 Dec 2024 15:12:10 GMT, Daniel Fuchs <dfuchs@openjdk.org> wrote:
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
split socket connect failure out to its own event.
src/jdk.jfr/share/conf/jfr/default.jfc line 746:
744: <setting name="enabled">true</setting> 745: <setting name="stackTrace">true</setting> 746: <setting name="threshold" control="socket-threshold">20 ms</setting>
Does this event needs a threshold?
I think it needs to be a duration event, otherwise it's possible to a floor events when connections fail immediately. With the current proposal then it is at least configurable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1868102788
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into JDK-8310996 # Conflicts: # src/java.base/share/classes/sun/nio/ch/NioSocketImpl.java - split socket connect failure out to its own event. - Added more tests for socket connect events. - SocketAdapter connect - SocketAdapter connect with exception - Socket connect with exception - SocketChannel connect with exception - SocketChannel non-blocking connect - SocketChannel non-blocking connect with exception - suggested changes - improved exception names and handling - Added support for connection failure and non-blocking connections. If an exception is thrown while attempting a connection, the message from the exception is stored in the event. The start time of the initial connect call is stored and used when a finishConnect call is successful or an exception is thrown. - fix default settings - fix merge - Merge branch 'master' into JDK-8310996 # Conflicts: # src/jdk.jfr/share/classes/jdk/jfr/internal/JDKEvents.java # src/jdk.jfr/share/classes/jdk/jfr/internal/MirrorEvents.java - added tests and support for sockets. - ... and 1 more: https://git.openjdk.org/jdk/compare/dfa5620f...a379609e ------------- Changes: https://git.openjdk.org/jdk/pull/21528/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=06 Stats: 738 lines in 16 files changed: 686 ins; 7 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Tue, 3 Dec 2024 15:40:02 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
- Merge branch 'master' into JDK-8310996
# Conflicts: # src/java.base/share/classes/sun/nio/ch/NioSocketImpl.java - split socket connect failure out to its own event. - Added more tests for socket connect events.
- SocketAdapter connect - SocketAdapter connect with exception - Socket connect with exception - SocketChannel connect with exception - SocketChannel non-blocking connect - SocketChannel non-blocking connect with exception - suggested changes - improved exception names and handling - Added support for connection failure and non-blocking connections.
If an exception is thrown while attempting a connection, the message from the exception is stored in the event. The start time of the initial connect call is stored and used when a finishConnect call is successful or an exception is thrown. - fix default settings - fix merge - Merge branch 'master' into JDK-8310996
# Conflicts: # src/jdk.jfr/share/classes/jdk/jfr/internal/JDKEvents.java # src/jdk.jfr/share/classes/jdk/jfr/internal/MirrorEvents.java - added tests and support for sockets. - ... and 1 more: https://git.openjdk.org/jdk/compare/dfa5620f...a379609e
test/jdk/jdk/jfr/event/io/TestSocketAdapterEvents.java line 149:
147: } catch (IOException ioe) { 148: // we expect this 149: connectException = ioe;
Though unlikely the connect() call could succeed if another (test) process managed to rebind and listen to the port that our server socket just freed. An alternative could be to try to connect to a TCP port reserved by IANA, such as ports 225-241 for instance. Hopefully there should be nothing listening at these ports. Another possibility would be to restart the whole thing untill connect actually fails. You could try to find a refusing port that will generate an exception using something similar to https://github.com/openjdk/jdk/blob/3eaa7615cd7dc67eb78fb0a8f89d4e6662a0db37... Note that I had some trouble with that on windows (attempting to connect to port 47 was using up the whole connect timeout on windows) so you might need to experiment a bit with different reserved ports if you want to go down this route. In anycase, the case where connect() unexpectedly succeed should be handled, for instance by throwing a `jtreg.SkippedException`, or by simply printing a warning and returning, or by retrying with another port until an exception is obtained. test/jdk/jdk/jfr/event/io/TestSocketChannelEvents.java line 185:
183: // we expect this 184: connectException = ioe; 185: }
Same remark as previouly: it's not guaranteed that connect() will not succeed. Also I notice that we're using main/othervm here - and this method is not last one called in main() - so we might not want to throw SkippedException here. Possibly printing a warning and returning would be a better option. Or try with another port... test/jdk/jdk/jfr/event/io/TestSocketEvents.java line 146:
144: // we expect this 145: connectException = ioe; 146: }
Same remark as for the other test: exception is not a guaranteed outcome. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867974742 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1867983998 PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1868044001
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: requests fixes - Use IOException.toString() instead of getMessage() in case it's empty - Attempts to test connect exceptions may fail due to unexpected successful connect. Tests quit with uncompleted status if the connect is successful and are retried a small number of times until the test can be performed properly. If the retries are exceeded an exception is generated indicating the test can't be setup properly. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/a379609e..5e55ffc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=06-07 Stats: 62 lines in 6 files changed: 43 ins; 3 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Tue, 10 Dec 2024 21:41:18 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
requests fixes
- Use IOException.toString() instead of getMessage() in case it's empty - Attempts to test connect exceptions may fail due to unexpected successful connect. Tests quit with uncompleted status if the connect is successful and are retried a small number of times until the test can be performed properly. If the retries are exceeded an exception is generated indicating the test can't be setup properly.
test/jdk/jdk/jfr/event/io/TestSocketAdapterEvents.java line 154:
152: s.connect(addr); 153: // unexpected, abandon the test 154: return false;
The main issue with using the ephemeral port range is that you might manage to connect to a server opened by another test, and that might cause the other test to fail if it's not expecting connections to get closed. If instead you use ports in the IANA reserved port range - at least you know that you won't connect to other tests running on the same machine. Have you tried to connect to port 225 for instance, and increase the port number up to 241 in case you still manage to connect? Ports 225-241 are reserved by IANA - so there should be nobody listening there. I had some trouble on windows 2016 with port 47, but hopefully ports 225-241 will not have the same issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1881736555
On Thu, 12 Dec 2024 09:48:45 GMT, Daniel Fuchs <dfuchs@openjdk.org> wrote:
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
requests fixes
- Use IOException.toString() instead of getMessage() in case it's empty - Attempts to test connect exceptions may fail due to unexpected successful connect. Tests quit with uncompleted status if the connect is successful and are retried a small number of times until the test can be performed properly. If the retries are exceeded an exception is generated indicating the test can't be setup properly.
test/jdk/jdk/jfr/event/io/TestSocketAdapterEvents.java line 154:
152: s.connect(addr); 153: // unexpected, abandon the test 154: return false;
The main issue with using the ephemeral port range is that you might manage to connect to a server opened by another test, and that might cause the other test to fail if it's not expecting connections to get closed.
If instead you use ports in the IANA reserved port range - at least you know that you won't connect to other tests running on the same machine.
Have you tried to connect to port 225 for instance, and increase the port number up to 241 in case you still manage to connect?
Ports 225-241 are reserved by IANA - so there should be nobody listening there. I had some trouble on windows 2016 with port 47, but hopefully ports 225-241 will not have the same issue.
I've changed the exception tests to use the port range you suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1887538817
On Tue, 10 Dec 2024 21:41:18 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
requests fixes
- Use IOException.toString() instead of getMessage() in case it's empty - Attempts to test connect exceptions may fail due to unexpected successful connect. Tests quit with uncompleted status if the connect is successful and are retried a small number of times until the test can be performed properly. If the retries are exceeded an exception is generated indicating the test can't be setup properly.
src/java.base/share/classes/jdk/internal/event/SocketConnectFailedEvent.java line 117:
115: long duration = timestamp() - start; 116: if (shouldCommit(duration)) { 117: String msg = connectEx.toString();
Should the description for this field in the event be updated too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21528#discussion_r1881757052
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision: Refactored connect exception tests to a single IOHelper method that takes a function argument that is excpected to produce the appropriate exception. IANA reserved ports in the range 225 to 241 are used to attempt to produce a connect exception. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21528/files - new: https://git.openjdk.org/jdk/pull/21528/files/5e55ffc6..81cea489 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21528&range=07-08 Stats: 206 lines in 4 files changed: 54 ins; 114 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/21528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21528/head:pull/21528 PR: https://git.openjdk.org/jdk/pull/21528
On Mon, 16 Dec 2024 20:24:20 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
Tim Prinzing has updated the pull request incrementally with one additional commit since the last revision:
Refactored connect exception tests to a single IOHelper method that takes a function argument that is excpected to produce the appropriate exception. IANA reserved ports in the range 225 to 241 are used to attempt to produce a connect exception.
The test updates LGTM ------------- PR Comment: https://git.openjdk.org/jdk/pull/21528#issuecomment-2551514817
On Wed, 16 Oct 2024 01:19:15 GMT, Tim Prinzing <tprinzing@openjdk.org> wrote:
Adds a JFR event for socket connect operations.
Existing tests TestSocketEvents and TestSocketChannelEvents modified to also check for connect events.
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21528
participants (6)
-
Alan Bateman
-
Daniel Fuchs
-
duke
-
Erik Gahlin
-
Sean Mullan
-
Tim Prinzing