RFR(xs): 8080925: Make error log write timeout parameter configurable

Thomas Stüfe thomas.stuefe at gmail.com
Thu Jun 25 07:10:37 UTC 2015


On Thu, Jun 25, 2015 at 8:55 AM, David Holmes <david.holmes at oracle.com>
wrote:

> On 25/06/2015 4:41 PM, Thomas Stüfe wrote:
>
>> Hi Christian, David,
>>
>> yes, I admit you are right. We run similar tests since a long time at
>> SAP, and even with a lot of leeway we run into situations where the VM
>> does not finish in time. Very rarely (maybe once a month) but still.
>> Especially on older hardware, like AS/400. In almost all situations
>> these were false positives, only once this was a legitimate error.
>>
>> My suggestions would be
>>
>> 1) Either I could just not test. The change is arguably easy enough to
>> omit a specific test.
>> 2) Or I could add a change (which would be a good idea anyway) to,
>> before I abort, print out a last line into the hs-err file: "-- Timeout
>> During Error Reporting after xxx seconds --". In the test, instead of
>> measuring time, I would scan for this trailing line in the hs-err file.
>> That would have the additional advantage that we later know why the
>> hs-err file is torn.
>>
>
> You could perhaps augment the existing line:
>
> 1293           os::sleep(this, ErrorLogTimeout * 1000, false);
> 1294           fdStream err(defaultStream::output_fd());
> 1295           err.print_raw_cr("# [ timer expired, abort... ]");
>
>
It is not that easy. That output goes to stderr, not the hs-err file.

hs-err file is written concurrently by the VMError thread. I need to access
the FILE* of the open hs-err file. I also need to interrupt the VMError
thread to prevent garbled output.

There are various ways to do this. In our VM, we signal the error reporter
thread with SIGILL and thus interrupt him - then he himself writes the
"Timeout" line before closing the hs-err file. This would be a bit
over-engineered if it were only to add the last "Timeout" line, but we use
it also to interrupt long-running error reporting Steps, in order to
prevent one slow or hanging error reporting step from suffocating the
succeeding error reporting steps. I wanted to contribute this to the
OpenJDK with a later change, but I do not know if you guys think this as
too complicated.

Thomas


> David
>
>  I also could do (1) now, and (2) in a separate patch - I have vacations
>> coming up in a week and would like to finish this beforehand.
>>
>> Thomas
>>
>>
>> On Thu, Jun 25, 2015 at 7:34 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     On 25/06/2015 2:01 AM, Christian Tornqvist wrote:
>>
>>         Hi Thomas,
>>
>>         I'm concerned with the time measurement. We run a lot of our
>>         tests concurrently on a wide spread of hardware, I wouldn't be
>>         surprised if it sometimes took more than 4s to get the JVM to a
>>         state where the ErrorHandlerTests are invoked.
>>
>>
>>     I share Christian's concern, particularly as you start measuring
>>     before even exec'ing the new JVM:
>>
>>        70     long t1 = System.currentTimeMillis();
>>        71
>>        72     OutputAnalyzer output_detail = new
>> OutputAnalyzer(pb.start());
>>
>>     There's really no reliable way to validate a timeout using an
>>     external observer: you either have so much leeway that you won't
>>     spot an unexpected delay; or too little such that some
>>     platforms/devices don't make it. The framework timeout will at least
>>     detect if the underlying VM execution fails to terminate at all.
>>
>>     Sorry I don't have any better suggestions for testing this.
>>
>>     David
>>     -----
>>
>>
>>
>>         Thanks,
>>         Christian
>>
>>         -----Original Message-----
>>         From: hotspot-runtime-dev
>>         [mailto:hotspot-runtime-dev-bounces at openjdk.java.net
>>         <mailto:hotspot-runtime-dev-bounces at openjdk.java.net>] On Behalf
>>         Of Coleen Phillimore
>>         Sent: Wednesday, June 24, 2015 11:45 AM
>>         To: hotspot-runtime-dev at openjdk.java.net
>>         <mailto:hotspot-runtime-dev at openjdk.java.net>
>>         Subject: Re: RFR(xs): 8080925: Make error log write timeout
>>         parameter configurable
>>
>>
>>         Hi Thomas,
>>
>>         This change looks okay, I guess.  I've been avoiding tests that
>>         crash the JVM because I don't know if this will mess up test
>>         execution in any
>>         way.   Also I guess I don't see any other way to add the test
>>         without a
>>         command line parameter, like TestSafeFetchInErrorHandler , but I
>>         feel
>>         like we're swamped in command line parameters!   But you just
>>         got rid of
>>         one that was broken, so I guess this is fine.
>>
>>         I can sponsor your change if someone from our SQE group okays
>>         the test.
>>
>>         Thanks,
>>         Coleen
>>
>>         On 6/24/15 11:30 AM, Thomas Stüfe wrote:
>>
>>             Hi,
>>
>>             This change is in review limbo since a month...
>>
>>             Could I have a second reviewer, please, and a sponsor?
>>
>>             Or, if the change is not wanted, a reason why?
>>
>>             Thanks a lot,
>>
>>             Thomas
>>
>>
>>             On Mon, Jun 15, 2015 at 2:58 PM, Thomas Stüfe
>>             <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>>
>>             wrote:
>>
>>                 Hi all,
>>
>>                 may I have a second review for this tiny change? I also
>>                 need a sponsor.
>>
>>                 Thanks!
>>
>>                 On Fri, May 22, 2015 at 4:21 PM, Thomas Stüfe
>>                 <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com
>> >>
>>                 wrote:
>>
>>                     Hi Staffan,
>>
>>                     thanks!
>>
>>                     ...Thomas
>>
>>                     On Fri, May 22, 2015 at 1:55 PM, Staffan Larsen <
>>                     staffan.larsen at oracle.com
>>                     <mailto:staffan.larsen at oracle.com>> wrote:
>>
>>                         Looks good to me.
>>
>>                         A small nit in globals.hpp:925: "an timeout” ->
>>                         “a timeout”. No
>>                         need for a new webrev if you fix this.
>>
>>                         Thanks,
>>                         /Staffan
>>
>>                             On 22 maj 2015, at 13:33, Thomas Stüfe
>>                             <thomas.stuefe at gmail.com
>>                             <mailto:thomas.stuefe at gmail.com>>
>>
>>                         wrote:
>>
>>                             Hi,
>>
>>                             please review this small change:
>>
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8080925/webrev.00/webre
>>                             v/
>>
>>
>> https://bugs.openjdk.java.net/browse/JDK-8080925
>>
>>                             This is a small addition which makes the
>>                             timeout to write error
>>                             log
>>
>>                         files
>>
>>                             configurable.
>>
>>                             Thanks & Kind Regards,
>>
>>                             Thomas
>>
>>
>>
>>
>>
>>


More information about the hotspot-runtime-dev mailing list