From martinrb at google.com Tue Aug 13 22:39:06 2019
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 13 Aug 2019 15:39:06 -0700
Subject: Still plagued by "Agent communication error"
Message-ID:

We continue to see rare "Agent communication error" problems when
running jtreg tests.
We believe something has gone wrong in the JDK under test, but we
never get any details.
The failure is correlated with running specific tests, and specific JDKs.
Anecdotally, it appears to be more common with fastdebug JDKs.

A sample snippet:

TEST RESULT: Error. Agent communication error:
java.net.SocketException: Broken pipe (Write failed); check console
log for any additional details

From Alan.Bateman at oracle.com Wed Aug 14 06:49:12 2019
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 14 Aug 2019 07:49:12 +0100
Subject: Still plagued by "Agent communication error"
In-Reply-To:
References:
Message-ID:

On 13/08/2019 23:39, Martin Buchholz wrote:
> We continue to see rare "Agent communication error" problems when
> running jtreg tests.
> We believe something has gone wrong in the JDK under test, but we
> never get any details.
> The failure is correlated with running specific tests, and specific JDKs.
> Anecdotally, it appears to be more common with fastdebug JDKs.
>
> A sample snippet:
>
> TEST RESULT: Error. Agent communication error:
> java.net.SocketException: Broken pipe (Write failed); check console
> log for any additional details

Are the agent VMs crashing? Maybe the fastdebug builds are hitting
asserts earlier than the crash with product bits. Have you looked at
hs_err logs or core files on the systems?
-Alan

From martinrb at google.com Wed Aug 14 07:28:08 2019
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 14 Aug 2019 00:28:08 -0700
Subject: Still plagued by "Agent communication error"
In-Reply-To:
References:
Message-ID:

On Tue, Aug 13, 2019 at 11:49 PM Alan Bateman wrote:

> On 13/08/2019 23:39, Martin Buchholz wrote:
> > We continue to see rare "Agent communication error" problems when
> > running jtreg tests.
> > We believe something has gone wrong in the JDK under test, but we
> > never get any details.
> > The failure is correlated with running specific tests, and specific JDKs.
> > Anecdotally, it appears to be more common with fastdebug JDKs.
> >
> > A sample snippet:
> >
> > TEST RESULT: Error. Agent communication error:
> > java.net.SocketException: Broken pipe (Write failed); check console
> > log for any additional details
> Are the agent VMs crashing? Maybe the fastdebug builds are hitting
> asserts earlier than the crash with product bits. Have you looked at
> hs_err logs or core files on the systems?
>

It's not so easy for us to get hs_err log files; we only get jtreg
stdout/stderr.
While we could/should improve our infrastructure ...
jtreg provides helpful diagnostics in other cases, e.g. thread dump on
test timeout, so it would be good to be helpful even when communication
with the agent breaks down. Maybe jtreg should use one of those VM flags
to get agent VMs to send failure data to stderr?

From martinrb at google.com Wed Aug 14 13:57:39 2019
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 14 Aug 2019 06:57:39 -0700
Subject: Still plagued by "Agent communication error"
In-Reply-To:
References:
Message-ID:

Here's another way to look at it:

We have a failure to communicate with an agent process.
That's probably a subprocess.
It probably died with some serious error, a hotspot crash or OOM.
Probably the agent process printed something helpful to stderr before it
terminated.
What happened to that output?
The failure is probably related to whatever test it was supposed to be
running at the time, so I'd want that information in e.g. the jtr file.

On Wed, Aug 14, 2019 at 12:28 AM Martin Buchholz wrote:

>
>
> On Tue, Aug 13, 2019 at 11:49 PM Alan Bateman
> wrote:
>
>> On 13/08/2019 23:39, Martin Buchholz wrote:
>> > We continue to see rare "Agent communication error" problems when
>> > running jtreg tests.
>> > We believe something has gone wrong in the JDK under test, but we
>> > never get any details.
>> > The failure is correlated with running specific tests, and specific
>> JDKs.
>> > Anecdotally, it appears to be more common with fastdebug JDKs.
>> >
>> > A sample snippet:
>> >
>> > TEST RESULT: Error. Agent communication error:
>> > java.net.SocketException: Broken pipe (Write failed); check console
>> > log for any additional details
>> Are the agent VMs crashing? Maybe the fastdebug builds are hitting
>> asserts earlier than the crash with product bits. Have you looked at
>> hs_err logs or core files on the systems?
>>
>
> It's not so easy for us to get hs_err log files; we only get jtreg
> stdout/stderr.
> While we could/should improve our infrastructure ...
> jtreg provides helpful diagnostics in other cases, e.g. thread dump on
> test timeout, so it would be good to be helpful even when communication
> with the agent breaks down. Maybe jtreg should use one of those VM flags
> to get agent VMs to send failure data to stderr?
>
>
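
The idea raised in this thread, keeping whatever the agent subprocess wrote
to stderr and attaching it to the failure report, could look roughly like
the sketch below. This is not jtreg code and does not describe how jtreg
launches its agents. The class name AgentStderrCapture, the method
runAndReport, and the report layout are all hypothetical, chosen only for
illustration. The sketch assumes the parent starts the child directly with
ProcessBuilder and talks to it over the child's stdin, where a write to a
dead child is the kind of place a "Broken pipe" IOException would surface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of jtreg): keeps a child's stderr for failure reports. */
final class AgentStderrCapture {

    /** Starts the child, sends it one message, and returns a diagnostic report. */
    static String runAndReport(List<String> command, String message)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard stdout so the child cannot block on a full stdout pipe.
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process child = builder.start();

        // Drain stderr on its own thread so nothing the child prints is lost.
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        Thread drainer = new Thread(() -> {
            try (InputStream err = child.getErrorStream()) {
                err.transferTo(captured);
            } catch (IOException ignored) {
                // The stream ends when the child exits; keep what was read so far.
            }
        });
        drainer.setDaemon(true);
        drainer.start();

        String commError = "none";
        try (OutputStream toChild = child.getOutputStream()) {
            toChild.write(message.getBytes(StandardCharsets.UTF_8));
            toChild.flush();
        } catch (IOException e) {
            // A child that has already died typically shows up here.
            commError = e.toString();
        }

        boolean exited = child.waitFor(30, TimeUnit.SECONDS);
        if (!exited) {
            child.destroyForcibly();
        }
        drainer.join(5_000);

        // Put the exit status and the captured stderr next to the communication error.
        StringBuilder report = new StringBuilder();
        report.append("communication error: ").append(commError).append('\n');
        report.append("exit status: ")
              .append(exited ? String.valueOf(child.exitValue()) : "none (killed after timeout)")
              .append('\n');
        report.append("stderr:\n").append(captured.toString(StandardCharsets.UTF_8));
        return report.toString();
    }
}
```

For example, calling AgentStderrCapture.runAndReport with a command list
that starts a JVM with a deliberately invalid option, and any short message,
returns a report whose stderr section holds the launcher's own error text.
Whether a real HotSpot crash report reaches stderr depends on the VM; a flag
such as -XX:+ErrorFileToStderr may help on JDKs that support it, which is an
assumption here and should be checked against the JDK under test.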