RFR(S): 8074354: tests that intentionally crash the VM can timeout creating a core file
Yumin Qi
yumin.qi at oracle.com
Wed Apr 1 16:57:17 UTC 2015
Hi, Dan and Thomas,
I have a new webrev at
http://cr.openjdk.java.net/~minqi/8074354/webrev06/
What changes:
1) os_windows.cpp, os::abort(bool ....):
a) Since when abort() is called, the last function called
before exit process, VMError::log already closed, which is the file
logging. defaultStream still available to use, so use it for writing
error or success messages. VMError::out is the defaultStream, for using
it, added a function out_stream() in VMError.
b) Delete the codes which decide to write core based on
client/server version with debug. The new code is if
CreateCoredumpOnCrash set or dump_core is on, the coredump will be tried
no matter which version it is. There is no special reason not to dump
core with client version I think. Any concern for this change?
c) Use of 'goto' in this function. I tried not to use it, but
using it makes code clear. I remember somewhere suggest not using 'goto'
in cplusplus.
2) changed comments to make them consistent as indicated by Dan.
3) Added/modified head version for files.
Tests: JPRT, manual test on local Windows.
Thanks
Yumin
On 3/31/2015 1:25 AM, Thomas Stüfe wrote:
> Hi Daniel, Yumin,
>
> sorry, a lot of Yumins changes were based on suggestions from me, so
> here more background:
>
> Yumin reversed the order of error log writing and core dumping. The
> reason for that are explained in detail at the start of the mail
> thread, in short:
>
> - on Windows core file dumping may hang - its rare but it happens -
> preventing error logs. Depending on who you ask, error logs are more
> important than minidumps, so it makes sense to first write the erorr log.
> - This also brings windows core paths closer to UNIX code paths. See
> below.
>
> About writing to stderr in os::abort():
>
> After this change, calling VmError::report_coredump_status() in
> os::abort() will not do anything because the error log is already
> written at that point. I suggested to Yumin writing to stderr instead
> in os::abort() because that mimicks UNIX behaviour: On UNIX, you call
> abort(3), which writes a core and aborts. It writes a short message to
> stderr "Aborted (core dumped)" or just "Aborted".
>
> After Yumins change, on Windows os::abort also writes the core, then
> aborts. It made sense to me that this function should behave the same
> way as on UNIX: if core writing fails, write to stderr. There is no
> way otherwise to communicate a core dump writing failure. After
> writing the core, process stops.
>
> ---
>
> About the control flows:
>
> Before Yumins change:
>
> On Windows
> 1) os::check_or_create_dump(): we wrote the core dump and then
> information about the dump (file location, success or error) was
> handed to the misnamed VmError::report_coredump_status(), which just
> stored that information.
> 2) VmError::report(): Error log was written and in step 63 information
> about the core (already written at that point) was added to error log.
> 3) os::abort() just did an exit(3)
>
> On Unix,
> 1) os::check_or_create_dump(): checks the limits to guess whether core
> dumping later will succeed. Again, that information is handed to
> VmError::report_coredump_status()
> 2) VmError::report(): Error log is written and in step 63, information
> about the core's probable future location is added to error log. Here,
> messages use past tense, which is misleading, because the core is not
> yet written.
> 3) os::abort() calls abort(3), which stops and writes a core.
>
> Yumin pushed core file writing on Windows down to os::abort(). This
> brings the control flow much closer to Unix:
>
> (1) call os::check_dump_limit() to check the prerequisites for core
> file writing and gather a bit information to put in the error log: On
> Unix, limit checks, on windows, so far nothing much but precalculating
> the error file name.
> (2) VmError::report(): Error log is written and information about the
> core's probable location is added to error log.
> (3) os::abort() is called, which on all platforms:
> - dumps the core
> - if failing, writes a one-liner to stderr
> - stops the process.
>
> I think, if one agrees that reversing order of core dumping and error
> log writing on windows is a good thing, this change looks ok to me.
> Some improvements could make it clearer:
>
> - maybe rename "os::check_dump_limit()" (was my suggestion, sorry) to
> "os::check_core_prerequisites()" - that leaves room to flesh it out a
> bit more on Windows, where we do not have UNIX limits.
> - make the messages in VMError::report() future sense: "Will write
> core file to location.....".
> - Make the messages written in os::abort() on Windows clearer, and
> shorter, and we also do not need the buffer to be static here.
> Actually, if we want to be as on UNIX, just dumping "core file writing
> succeeded" or "failed" - maybe with an error number from
> GetLastError() - will be enough because the file location is already
> printed in the error log. Like on UNIX.
>
> Kind Regards, Thomas
>
>
> On Mon, Mar 30, 2015 at 9:51 PM, Daniel D. Daugherty
> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>
> Re: report_coredump_status() is really record_coredump_status()
>
> The report_coredump_status() function is designed to record two
> things about core dumps for later reporting by the code that
> generates the hs_err_pid file. That's why the original
> report_coredump_status() didn't output anything to stderr.
>
> By changing the Windows calls to report_coredump_status() into
> jio_fprintf() calls you have:
>
> - put output onto stderr that should go to the hs_err_pid file
> - made the windows code paths work differently than the non-windows
> code paths
>
>
> Dan
>
>
>
> On 3/30/15 1:28 PM, Yumin Qi wrote:
>
> Thanks Dan
>
> On 3/30/2015 11:25 AM, Daniel D. Daugherty wrote:
>
> On 3/25/15 12:11 PM, Yumin Qi wrote:
>
> Hi, all
>
> I updated the webrev with a new change to test case:
> test/runtime/Unsafe/RangeCheck.java
>
> Add -XX:-CreateCoredumpOnCrash to test, no dump
> needed on this case.
>
> new webrev:
> http://cr.openjdk.java.net/~minqi/8074354/webrev05/
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>
>
> General nit - please update copyright lines to 2015 as needed
>
> src/share/vm/runtime/globals.hpp
> No comments.
>
> src/share/vm/runtime/os.hpp
> line 720: // On Windows this will create an actual
> minidump, on Linux/Solaris it will simply check core dump
> limits
> line 721: static void check_dump_limit(char* buffer,
> size_t bufferSize);
> The comment on line 720 no longer matches with the
> function
> rename from check_or_create_dump() to
> check_dump_limit().
>
> You have this comment in vmError.cpp:
> 943 // check core dump limits on
> Linux/Solaris, nothing on Windows
> 944 os::check_dump_limit(buffer, sizeof(buffer));
>
> so the two comments are out of sync.
>
> Will convert them to be consistent.
>
> src/share/vm/runtime/arguments.cpp
> line 3252: } else if (match_option(option,
> "-XX:+CreateMinidumpOnCrash", &tail)) {
> line 3256: } else if (match_option(option,
> "-XX:-CreateMinidumpOnCrash", &tail)) {
> These two checks should use the match_option() version
> without a '&tail' parameter.
>
> Will use version w/o tail.
>
> src/share/vm/utilities/vmError.cpp
> line 217: bool VMError::coredump_status = true;
> // presume we can dump core file first
> I don't think you should init this to true. It
> confuses
> things with VMError::report_coredump_status(). It
> will also
> confuse this code:
>
> 526 STEP(63, "(printing core file information)")
> 529 if (coredump_status) {
> 530 st->print("Core dump written. Default
> location: %s", coredump_message);
>
> because coredump_status won't accurately reflect
> whether
> the coredump_message field has been set.
>
> line 943: // check core dump limits on Linux/Solaris,
> nothing on Windows
> This updated comment doesn't match the unmodified
> comment in os.hpp:
>
> 720: // On Windows this will create an actual
> minidump, on Linux/Solaris it will simply check core dump
> limits
>
> I will consider your comments, then revise with a new version.
> I remember here there is a case, if status not set, it will
> output an inconsistent message. Will check again.
>
> src/os/aix/vm/os_aix.cpp
> No comments.
>
> src/os/bsd/vm/os_bsd.cpp
> No comments.
>
> src/os/linux/vm/os_linux.cpp
> No comments.
>
> src/os/posix/vm/os_posix.cpp
> No comments.
>
> src/os/solaris/vm/os_solaris.cpp
> No comments.
>
> src/os/windows/vm/os_windows.cpp
> line 1008: static char buffer[O_BUFLEN];
> Why switch from a buffer parameter to a static buffer?
> What happens with parallel calls to abort()? Will
> the static
> buffer get stomped by the competing threads?
>
> The original buffer is static too. Carrying an buffer for
> abort seems not right. This buffer in fact only for storing
> file name (based on pid) here.
> abort() will dump the core file, should we prevent
> multi-thread calling to this function? I did not check this
> part, will check if it is MT-safe.
>
>
> line 1015: // If running on a client version of
> Windows and user has not explicitly enabled dumping
> line 1016: if (!os::win32::is_windows_server() &&
> !CreateCoredumpOnCrash) {
> The default for CreateCoredumpOnCrash is now
> 'true' so the
> comment and logic are no longer correct here. The
> Windows
> Client VM will generate minidumps by default.
>
> old line 1007:
> VMError::report_coredump_status("Minidumps are not enabled
> by default on client versions of Windows", false);
> new line 1017: jio_fprintf(stderr, "Minidumps are not
> enabled by default on client versions of Windows.\n");
> There are a number of places where we change from
> VMError::report_coredump_status() to jio_fprintf()
> and I'm not quite seeing why this was done.
>
> Update: VMError::report_coredump_status() is
> misnamed. It doesn't
> report anything in the sense that it doesn't print
> anything. It
> does record two pieces of information about core
> dumps so maybe
> it should be VMError::record_coredump_status().
>
> line 1100: jio_fprintf(stderr, "%s.\n", buffer);
> At this point, buffer contains the path to the
> mini-dump file so the above line simply prints
> that on stderr. Why?
>
> Yes, I see that the old code did something similar:
>
> 1090 VMError::report_coredump_status(buffer,
> true);
>
> but report_coredump_status() didn't actually print
> anything. It just squirreled away these in
> vmError.cpp:
>
> 217 bool VMError::coredump_status;
> 218 char VMError::coredump_message[O_BUFLEN];
>
> See comments above about how report_coredump_status()
> is misnamed.
>
> At this point, I'm convinced that all the changes from
> VMError::report_coredump_status() to jio_fprintf() are
> a bad idea.
>
>
> Changing to jio_fprintf is because report_coredump_status did
> not output anything, it is just a logging as you indicated. It
> is reasonable we change it to record_coredump_status instead.
> How about add printout to report_coredump_status? So I do not
> need to use jio_printf here.
>
> Other consideration of using jio_fprintf after call shutdown
> called, the static output stream still exists, but not
> guaranteed to work as expected.
>
>
> Thanks
> Yumin
>
> test/runtime/ErrorHandling/ProblematicFrameTest.java
> No comments.
>
> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
> No comments.
>
> test/runtime/ErrorHandling/SecondaryErrorTest.java
> No comments.
>
> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
> No comments.
>
> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
> No comments.
>
> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
> No comments.
>
> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
> No comments.
>
> test/runtime/Unsafe/RangeCheck.java
> No comments.
>
> test/runtime/memory/ReadFromNoaccessArea.java
> No comments.
>
>
>
> Thanks
> Yumin
>
>
> On 3/23/2015 11:48 AM, Yumin Qi wrote:
>
> Since Thomas is not a reviewer in open jdk, I need
> two volunteers (at least one "R"eviewer).
>
> Dan, since you already reviewed previous version,
> could you have a look?
>
> Thanks
> Yumin
>
> On 3/20/2015 3:20 PM, Yumin Qi wrote:
>
> Thomas,
>
> Thanks. Yes, it is webrev04. Also, I have
> updated webrev04 to add another test case
> change:
> test/runtime/memory/ReadFromNoaccessArea.java
> (Thanks Dan's update)
>
> Thanks
> Yumin
>
> On 3/20/2015 11:55 AM, Thomas Stüfe wrote:
>
>
> Hi Yumin,
>
> I think you meant to post webrev.04?
>
> I looked at 04, and it looks nice. Thank you!
>
> (still only reviewer with r)
>
> Thomas
>
> On Mar 20, 2015 6:20 PM, "Yumin Qi"
> <yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>> wrote:
>
> Hi, All
>
> http://cr.openjdk.java.net/~minqi/8074354/webrev03/
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>
> This version is based on Thomas'
> suggestion.
> Tests passed
> vm.runtime.quick.testlist, JPRT and manual
> tests.
>
> Thanks
> Yumin
>
> On 3/18/2015 10:06 AM, Yumin Qi wrote:
>
>
> On 3/18/2015 4:26 AM, Thomas Stüfe
> wrote:
>
> Hi Yumin,
>
> - just aesthetics, but on all
> implementations of
> os_abort(), at least on the
> UNIX variants, maybe we could
> consistently rename the
> parameters to be "siginfo" and
> "context" instead of using
> Windows terms?
>
> Sure.
>
> - On
> check_create_dump_limit(), you do not need
> exceptionRecord, contextRecord
> parameters anymore. The
> only parameters now needed are
> buffer and buffer size,
> which is not even an output
> parameter, just a scratch
> buffer for this function to
> use for printf. For output, it
> calls
> VMError::report_coredump_status(...) at
> the end to
> provide information about the
> core file.
>
> - I would rename that function
> from
> check_create_dump_limit() to
> check_dump_limit() - nothing
> is created anymore.
>
> Yes, no need to carry those two
> parameters. I tried to keep
> less changes, but decided to
> change now.
>
> - on Windows, in os::abort(),
> there is no point anymore in
> calling
> VMError::report_coredump_status(...)
> because that
> information is only used
> during error log writing which
> already happened. It only
> makes sense to call this
> function in
> check_create_dump_limit(), which happens
> before error log writing.
> Instead, maybe error messages
> like "Call to
> MiniDumpWriteDump() failed"
> should just be written to
> stderr? This would be
> consistent with Unix, where the OS
> writes a short message on
> stderr if core file writing fails.
>
> Sure, will direct output to stderr.
>
> - there is now a new test,
> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
> - could you please also add
> "-XX:-CreateCoredumpOnCrash" ?
> I just added the test and did
> not want to add the option
> before it was available.
>
> OK
>
> Thanks
> Yumin
>
> Thanks for your work!
>
> Thomas
>
>
> On Tue, Mar 17, 2015 at 2:04
> AM, Yumin Qi
> <yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>>>
> wrote:
>
> Hi, Dan and all
>
> I have updated webrev at:
> http://cr.openjdk.java.net/~minqi/8074354/webrev03/
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>
> 1) bug changed synopsis
> 8074354: Make
> CreateMinidumpOnCrash a new name and
> available on all platforms
> 2) tests: JPRT,
> vm.runtime.quick.testlist (on
> Windows), jtreg on
> Windows and Linux/Unix.
>
> Thanks
> Yumin
>
>
> On 3/12/2015 12:59 PM,
> Daniel D. Daugherty wrote:
>
> I believe there are
> both JavaTest/JTREG tests in
> the hotspot repo
> and there are VM/NSK
> tests from back when we did
> phone home...
>
> Check with Christian
> or Misha...
>
> Dan
>
>
> On 3/12/15 1:50 PM,
> Yumin Qi wrote:
>
> Thanks, Dan
>
> Will look at and
> run those tests. Are they
> under nsk?
>
> Yumin
>
> On 3/12/2015 12:29
> PM, Daniel D. Daugherty wrote:
>
> Yumin,
>
> There are some
> error handler tests. You
> should find
> those and make
> sure that you
> run them on Windows since you're
> changing the order
> there...
>
> Dan
>
>
> On 3/12/15
> 12:42 PM, Yumin Qi wrote:
>
> Hi, Thomas
> and all
>
> The
> second version of the change:
> webrev:
> http://cr.openjdk.java.net/~minqi/8074354/webrev02/
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev02/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev02/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev02/>
>
> In this
> version:
> 1)
> switch order on Windows to first
> print log
> file as
> other platform, then dump core
> file if
> needed.
> When VMError created after
> crash, siginfo
> and
> context which are ExceptionRecord and
> ContextRecord on Windows are recorded,
> mini (or
> full, due
> to 'or' used so type will be
> mini
> anyway)
> dump creates dump file based
> on those two
> as before.
> 2) to
> make os::abort to get above
> two pointers,
> added two
> more fields to the function
> parameters:
> 3)
> changes for
> test/runtime/ErrorHandling/SecondaryError.java
> ---
> added
> "-XX:-CreateCoredumpOnCrash"
>
> -
> static void abort(bool
> dump_core = true);
> +
> static void abort(bool dump_core
> = true,
> void
> *siginfo = NULL, void *context =
> NULL);
>
> Tests:
> JPRT and manually.
>
> Thanks
> Yumin
>
> On
> 3/11/2015 3:47 AM, Thomas Stüfe wrote:
>
> Hi Yumin,
>
> There
> is also
> test/runtime/ErrorHandling/SecondaryErrorTest.java
> -
> could you please add
> "-XX:-CreateCoredumpOnCrash" ?
> Thank you!
>
> Beyond
> that, as I wrote in the bug
> report
> comments:
>
> "This
> is also a problem on Windows -
> MiniDumpWriteDump() may hang
> infinitly. And on
> Windows this is worse than under
> UNIX because
> we
> create the Dump before writing
> the hs-err
> file,
> so if the Dump hangs, we get
> no error
> log. I
> would like to revert the
> order: create
> the
> minidump after writing the
> error log, the
> same
> way Unix does it. We did this
> in our JVM
> (SAP)
> because for us, error logs
> are more
> useful
> than minidumps. "
>
> So, I
> would like to see os::abort
> on Windows
> call
> MiniDumpWriteDump(), and thus
> the mini
> dump
> writing moved after the error log
> writing. This would also make the
> code way
> cleaner because the control flow
> would be the
> same
> on all platforms.
>
> I
> understand that this may be out
> of scope for
> your
> change, but I would like to
> know what
> others
> think about this.
>
> Kind
> regards, Thomas
>
>
>
>
>
>
>
> On
> Wed, Mar 11, 2015 at 8:02 AM,
> Yumin Qi
> <yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>
> <mailto:yumin.qi at oracle.com
> <mailto:yumin.qi at oracle.com>>>>> wrote:
>
> Please review:
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8074354
> webrev:
> http://cr.openjdk.java.net/~minqi/8074354/webrev01/
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
>
> Summary: Tests timed out when
> VM crashes
> and
> dumping core file
> which in the test case is not
> needed. To
> make
> core not created,
> the fix changed
> CreateMinidumpOnCrash to
> CreateCoredumpOnCrash,
> the former is only used on
> Windows and the
> latter
> for all
> platforms. When VM crashes on
> non Windows,
> core
> file generated as
> default if OS sets core dump
> allowed.
> Default value of
> CreateCoredumpOnCrash set to
> 'true' to
> keep
> same behavior on non
> Windows platforms, but changed
> for Windows
> ---
> original is false,
> not create minidump on
> Windows. With
> CreateCoredumpOnCrash turned
> off, no core file will be
> generated.
> CreateMinidumpOnCrash still
> can be used on commandline but
> only as
> alias
> for the new flag.
>
> Tests: JPRT, manual tests setting
> CreateMinidumpOnCrash on
> commandline to verify flag
> change as alias.
>
> Thanks
> Yumin
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
More information about the hotspot-runtime-dev
mailing list