RFR(S): 8074354: tests that intentionally crash the VM can timeout creating a core file
Thomas Stüfe
thomas.stuefe at gmail.com
Fri Apr 10 11:20:09 UTC 2015
Hi Yumin,
Hi Yumin,
this looks mostly fine for me. I only was able to test it on Linux, but
will test it on other OSes next week.
Here two remaining small points:
- in os::check_dump_limit, maybe a comment would be helpful, because the
contract for this function is a bit difficult to understand. Proposal:
"Check or prepare a core dump to be taken at a later point in the same
thread in os::abort(). Use the caller provided buffer as a scratch buffer.
Output status message via VMError::report_cordedump_status() (on success,
the core dump file location, on error, a short error message) - this status
message which will be written into the error log."
- It would make sense to improve the output in VMError.report() in STEP(63)
a bit:
Problem 1: we use past-tense, which is misleading ("Core dump written",
"Failed to write core"). We did not generate the core dump yet, we just
plan to do it, and it still may fail for various reasons (e.g. even if the
limits are fine, the current directory may be write protected. There is no
way to catch all errors beforehand). If the user reads "Core dump written",
he assumes the core dump was written successfully. If abort(3) later fails
to write the core dump, user will be confused.
Problem 2: this is more obscure: on Linux core dumps may not be written to
the file system at all but be processed by some other program. On my Ubuntu
14.4 machine I see this output:
#
# Core dump written. Default location: Core dumps may be processed with
"/usr/share/apport/apport %p %s %c %P" (or dumping to
/shared/projects/openjdk/jdk9-hs-rt/output/core.26261)
#
This output is a bit jarring.
My suggestion for better texts would be:
"Core dump will be written. %s"
"No core dump will be written. %s"
which hopefully fits all cases better. But I don't know, other suggestions
are welcome :-)
Otherwise, this looks fine.
Thanks for your work!
Thomas
On Fri, Apr 10, 2015 at 5:18 AM, Yumin Qi <yumin.qi at oracle.com> wrote:
> Hi, Thomas and Dan
>
> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>
> On 4/9/2015 7:51 PM, Yumin Qi wrote:
>
>> Please hold on, I missed the test cases:
>>
>> *test/runtime/Unsafe/RangeCheck.java
>> *
>> *test/runtime/memory/ReadFromNoaccessArea.java
>>
>> Added.
>
>> Also some file heads need to be updated with this year.
>>
>> Changed
>
> Thanks
> Yumin
>
> Thanks
>> Yumin
>>
>>
>> *
>> On 4/9/2015 7:07 PM, Yumin Qi wrote:
>>
>>> Hi, Thomas and Dan
>>>
>>> The same URL is updated with new version.
>>>
>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>
>>> I remove the public access to VMError::out, VMError::coredump_message
>>> and the size of VMError::coredump_message.
>>> Added private static HANDLE in os_windows.cpp for dumpFile as
>>> suggested by Thomas.
>>>
>>> Tests passes JPRT and manual tests.
>>>
>>> Thanks
>>> Yumin
>>>
>>>
>>> On 4/9/2015 1:32 AM, Thomas Stüfe wrote:
>>>
>>>> Hi Yumin,
>>>>
>>>> I do not think exposing VMError::coredump_status buffer just for the
>>>> sake of using it as a scratch buffer in os::abort() is a good idea. It is
>>>> very confusing and has nothing to do with the purpose of
>>>> VMerror::coredump_status.
>>>>
>>>> Also, VMerror::coredump_status is static, so you do not even gain
>>>> anything over just using a global or function scope static buffer. Either
>>>> we are worried about thread safety in os::abort(), or we aren't. If we
>>>> aren't, just use a local static buffer, it is much clearer and no need to
>>>> expose VMError::coredump_status like this.
>>>>
>>>> Apart from that, see my remarks in yesterdays mail.
>>>>
>>>> Cheers, Thomas
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Apr 9, 2015 at 5:16 AM, Yumin Qi <yumin.qi at oracle.com <mailto:
>>>> yumin.qi at oracle.com>> wrote:
>>>>
>>>> Hi, Dan
>>>>
>>>> New webrev at
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>
>>>> To make code clean, add os::abort_internal which is only used
>>>> for Windows.
>>>> Since passing VMError::coredump_message and its size as
>>>> parameters across multiple functions makes function have a bigger
>>>> signature, I change the way to access them -- add two functions in
>>>> VMError to access them: coredump_msg_buffer() and
>>>> coredump_msg_buffer_size(). They are static members, so can save
>>>> stack space for passing around.
>>>>
>>>> in os_windows.cpp, abort(bool, void*, void*):
>>>> if error encountered, exit with error messages.
>>>> If no error, VMError::coredump_message contains the dump
>>>> path/name. I just use them so no need to have extra space for name
>>>> allocation.
>>>>
>>>> Passed JPRT and manual tests.
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> On 4/7/2015 9:23 AM, Daniel D. Daugherty wrote:
>>>>
>>>>> On 4/1/15 10:57 AM, Yumin Qi wrote:
>>>>>
>>>>>> Hi, Dan and Thomas,
>>>>>>
>>>>>> I have a new webrev at
>>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev06/
>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>>>>
>>>>>
>>>>>
>>>>> src/share/vm/runtime/globals.hpp
>>>>> Sorry, I didn't notice this before:
>>>>>
>>>>> L939: product(bool, CreateCoredumpOnCrash,
>>>>> true, \
>>>>> L940: "Create minidump on VM fatal
>>>>> error") \
>>>>>
>>>>> The "minidump" on L940 should change also. Don't know if
>>>>> you
>>>>> want to say "core/mini dump" or something else.
>>>>>
>>>>>
>>>>> src/share/vm/runtime/os.hpp
>>>>> L719: // On Linux/Solaris it will simply check core dump
>>>>> limits while on Windows will do nothing.
>>>>> grammar: 'on Windows will do nothing'
>>>>> -> 'on Windows it will do nothing'
>>>>>
>>>>>
>>>>> src/share/vm/runtime/arguments.cpp
>>>>> No comments.
>>>>>
>>>>> src/share/vm/utilities/vmError.hpp
>>>>> No comments.
>>>>>
>>>>> src/share/vm/utilities/vmError.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/aix/vm/os_aix.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/bsd/vm/os_bsd.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/linux/vm/os_linux.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/posix/vm/os_posix.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/solaris/vm/os_solaris.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/windows/vm/os_windows.cpp
>>>>> L92: jio_snprintf(buffer, bufferSize,
>>>>> "%s\\hs_err_pid%u.mdmp", get_current_directory(NULL, 0),
>>>>> current_process_id());
>>>>> Other non-Windows calls to get_current_directory() do a
>>>>> NULL
>>>>> check before using the return value. The original Windows
>>>>> code that called get_current_directory() did not make this
>>>>> check, but I think it's better to be consistent and safe.
>>>>>
>>>>> L1007: static char buffer[O_BUFLEN];
>>>>> L1008: char filename[FILENAME_MAX];
>>>>> Two more potentially large stack variables here. Will we
>>>>> run into stack overflow on Win* more frequently when what
>>>>> we really wanted was the reason for the abort() call?
>>>>>
>>>>> The reason the original code used 'buffer' to construct the
>>>>> dump file name was to save space. The original code was
>>>>> also
>>>>> careful to not need the file name in any error messages
>>>>> after
>>>>> the point at which the dump file was successfully created.
>>>>>
>>>>> It would be better (more resilient) if you could pass in
>>>>> an already existing buffer to abort() like the original
>>>>> code passed into os::check_or_create_dump(). Yes, that
>>>>> would require visiting all existing calls to abort().
>>>>>
>>>>> L1015: assert(out->is_open(), "defaultStream should be
>>>>> open");
>>>>> Is it possible for os::abort() to be called early enough on
>>>>> Windows such that VMError::out is not actually setup? Yes,
>>>>> this would have to be absolutely catastrophic...
>>>>>
>>>>> See the comment in os_solaris.cpp L1520-1522.
>>>>>
>>>>> L1018: jio_snprintf(buffer, sizeof(buffer), "%s", "Either
>>>>> CreateCoredumpOnCrash is disabled from command"
>>>>> L1019 " line or coredump is not available");
>>>>> When you are using jio_snprintf() as a replacement for:
>>>>>
>>>>> strncpy(buffer, "my string", sizeof(buffer));
>>>>> buffer[sizeof(buffer) - 1] = '\0';
>>>>>
>>>>> you don't need the "%s" format string. So this code would
>>>>> be:
>>>>>
>>>>> jio_snprintf(buffer, sizeof(buffer),
>>>>> "Either CreateCoredumpOnCrash is disabled
>>>>> from "
>>>>> "command line or coredump is not available");
>>>>>
>>>>> Same comment for: L1026, L1039
>>>>>
>>>>> L1020: goto done;
>>>>> The use of 'goto' makes me cringe. You can refactor much
>>>>> of this logic into an abort_internal() function that
>>>>> abort() calls and keep all the original "return" logic
>>>>> to bail out.
>>>>>
>>>>> L1045: // Older versions of dbghelp.h doesn't contain all
>>>>> grammar: 'doesn't' -> 'do not'
>>>>>
>>>>> L1052: cwd = get_current_directory(NULL, 0);
>>>>> L1053: jio_snprintf(filename, sizeof(filename),
>>>>> "%s\\hs_err_pid%u.mdmp", cwd, current_process_id());
>>>>> Other non-Windows calls to get_current_directory() do a
>>>>> NULL
>>>>> check before using the return value. The original Windows
>>>>> code that called get_current_directory() did not make this
>>>>> check, but I think it's better to be consistent and safe.
>>>>>
>>>>> L1092: // well done.
>>>>> Like a overcooked steak? :-) Not sure what you're trying
>>>>> to say here...
>>>>>
>>>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>>> No comments on these four.
>>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>>> What changes:
>>>>>>
>>>>>> 1) os_windows.cpp, os::abort(bool ....):
>>>>>> a) Since when abort() is called, the last function
>>>>>> called before exit process, VMError::log already closed, which
>>>>>> is the file logging. defaultStream still available to use, so
>>>>>> use it for writing error or success messages. VMError::out is
>>>>>> the defaultStream, for using it, added a function out_stream()
>>>>>> in VMError.
>>>>>> b) Delete the codes which decide to write core based on
>>>>>> client/server version with debug. The new code is if
>>>>>> CreateCoredumpOnCrash set or dump_core is on, the coredump will
>>>>>> be tried no matter which version it is. There is no special
>>>>>> reason not to dump core with client version I think. Any concern
>>>>>> for this change?
>>>>>> c) Use of 'goto' in this function. I tried not to use it,
>>>>>> but using it makes code clear. I remember somewhere suggest not
>>>>>> using 'goto' in cplusplus.
>>>>>>
>>>>>> 2) changed comments to make them consistent as indicated by
>>>>>> Dan.
>>>>>>
>>>>>> 3) Added/modified head version for files.
>>>>>>
>>>>>> Tests: JPRT, manual test on local Windows.
>>>>>>
>>>>>> Thanks
>>>>>> Yumin
>>>>>>
>>>>>>
>>>>>> On 3/31/2015 1:25 AM, Thomas Stüfe wrote:
>>>>>>
>>>>>>> Hi Daniel, Yumin,
>>>>>>>
>>>>>>> sorry, a lot of Yumins changes were based on suggestions from
>>>>>>> me, so here more background:
>>>>>>>
>>>>>>> Yumin reversed the order of error log writing and core dumping.
>>>>>>> The reason for that are explained in detail at the start of the
>>>>>>> mail thread, in short:
>>>>>>>
>>>>>>> - on Windows core file dumping may hang - its rare but it
>>>>>>> happens - preventing error logs. Depending on who you ask,
>>>>>>> error logs are more important than minidumps, so it makes sense
>>>>>>> to first write the erorr log.
>>>>>>> - This also brings windows core paths closer to UNIX code
>>>>>>> paths. See below.
>>>>>>>
>>>>>>> About writing to stderr in os::abort():
>>>>>>>
>>>>>>> After this change, calling VmError::report_coredump_status() in
>>>>>>> os::abort() will not do anything because the error log is
>>>>>>> already written at that point. I suggested to Yumin writing to
>>>>>>> stderr instead in os::abort() because that mimicks UNIX
>>>>>>> behaviour: On UNIX, you call abort(3), which writes a core and
>>>>>>> aborts. It writes a short message to stderr "Aborted (core
>>>>>>> dumped)" or just "Aborted".
>>>>>>>
>>>>>>> After Yumins change, on Windows os::abort also writes the core,
>>>>>>> then aborts. It made sense to me that this function should
>>>>>>> behave the same way as on UNIX: if core writing fails, write to
>>>>>>> stderr. There is no way otherwise to communicate a core dump
>>>>>>> writing failure. After writing the core, process stops.
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> About the control flows:
>>>>>>>
>>>>>>> Before Yumins change:
>>>>>>>
>>>>>>> On Windows
>>>>>>> 1) os::check_or_create_dump(): we wrote the core dump and then
>>>>>>> information about the dump (file location, success or error)
>>>>>>> was handed to the misnamed VmError::report_coredump_status(),
>>>>>>> which just stored that information.
>>>>>>> 2) VmError::report(): Error log was written and in step 63
>>>>>>> information about the core (already written at that point) was
>>>>>>> added to error log.
>>>>>>> 3) os::abort() just did an exit(3)
>>>>>>>
>>>>>>> On Unix,
>>>>>>> 1) os::check_or_create_dump(): checks the limits to guess
>>>>>>> whether core dumping later will succeed. Again, that
>>>>>>> information is handed to VmError::report_coredump_status()
>>>>>>> 2) VmError::report(): Error log is written and in step 63,
>>>>>>> information about the core's probable future location is added
>>>>>>> to error log. Here, messages use past tense, which is
>>>>>>> misleading, because the core is not yet written.
>>>>>>> 3) os::abort() calls abort(3), which stops and writes a core.
>>>>>>>
>>>>>>> Yumin pushed core file writing on Windows down to os::abort().
>>>>>>> This brings the control flow much closer to Unix:
>>>>>>>
>>>>>>> (1) call os::check_dump_limit() to check the prerequisites for
>>>>>>> core file writing and gather a bit information to put in the
>>>>>>> error log: On Unix, limit checks, on windows, so far nothing
>>>>>>> much but precalculating the error file name.
>>>>>>> (2) VmError::report(): Error log is written and information
>>>>>>> about the core's probable location is added to error log.
>>>>>>> (3) os::abort() is called, which on all platforms:
>>>>>>> - dumps the core
>>>>>>> - if failing, writes a one-liner to stderr
>>>>>>> - stops the process.
>>>>>>>
>>>>>>> I think, if one agrees that reversing order of core dumping and
>>>>>>> error log writing on windows is a good thing, this change looks
>>>>>>> ok to me. Some improvements could make it clearer:
>>>>>>>
>>>>>>> - maybe rename "os::check_dump_limit()" (was my suggestion,
>>>>>>> sorry) to "os::check_core_prerequisites()" - that leaves room
>>>>>>> to flesh it out a bit more on Windows, where we do not have
>>>>>>> UNIX limits.
>>>>>>> - make the messages in VMError::report() future sense: "Will
>>>>>>> write core file to location.....".
>>>>>>> - Make the messages written in os::abort() on Windows clearer,
>>>>>>> and shorter, and we also do not need the buffer to be static
>>>>>>> here. Actually, if we want to be as on UNIX, just dumping "core
>>>>>>> file writing succeeded" or "failed" - maybe with an error
>>>>>>> number from GetLastError() - will be enough because the file
>>>>>>> location is already printed in the error log. Like on UNIX.
>>>>>>>
>>>>>>> Kind Regards, Thomas
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 30, 2015 at 9:51 PM, Daniel D. Daugherty
>>>>>>> <daniel.daugherty at oracle.com
>>>>>>> <mailto:daniel.daugherty at oracle.com>> wrote:
>>>>>>>
>>>>>>> Re: report_coredump_status() is really
>>>>>>> record_coredump_status()
>>>>>>>
>>>>>>> The report_coredump_status() function is designed to record
>>>>>>> two
>>>>>>> things about core dumps for later reporting by the code that
>>>>>>> generates the hs_err_pid file. That's why the original
>>>>>>> report_coredump_status() didn't output anything to stderr.
>>>>>>>
>>>>>>> By changing the Windows calls to report_coredump_status()
>>>>>>> into
>>>>>>> jio_fprintf() calls you have:
>>>>>>>
>>>>>>> - put output onto stderr that should go to the hs_err_pid
>>>>>>> file
>>>>>>> - made the windows code paths work differently than the
>>>>>>> non-windows
>>>>>>> code paths
>>>>>>>
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/30/15 1:28 PM, Yumin Qi wrote:
>>>>>>>
>>>>>>> Thanks Dan
>>>>>>>
>>>>>>> On 3/30/2015 11:25 AM, Daniel D. Daugherty wrote:
>>>>>>>
>>>>>>> On 3/25/15 12:11 PM, Yumin Qi wrote:
>>>>>>>
>>>>>>> Hi, all
>>>>>>>
>>>>>>> I updated the webrev with a new change to
>>>>>>> test case:
>>>>>>> test/runtime/Unsafe/RangeCheck.java
>>>>>>>
>>>>>>> Add -XX:-CreateCoredumpOnCrash to test, no
>>>>>>> dump needed on this case.
>>>>>>>
>>>>>>> new webrev:
>>>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev05/
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>>>>>
>>>>>>>
>>>>>>> General nit - please update copyright lines to 2015
>>>>>>> as needed
>>>>>>>
>>>>>>> src/share/vm/runtime/globals.hpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/share/vm/runtime/os.hpp
>>>>>>> line 720: // On Windows this will create an
>>>>>>> actual minidump, on Linux/Solaris it will simply
>>>>>>> check core dump limits
>>>>>>> line 721: static void check_dump_limit(char*
>>>>>>> buffer, size_t bufferSize);
>>>>>>> The comment on line 720 no longer matches
>>>>>>> with the function
>>>>>>> rename from check_or_create_dump() to
>>>>>>> check_dump_limit().
>>>>>>>
>>>>>>> You have this comment in vmError.cpp:
>>>>>>> 943 // check core dump limits on
>>>>>>> Linux/Solaris, nothing on Windows
>>>>>>> 944 os::check_dump_limit(buffer,
>>>>>>> sizeof(buffer));
>>>>>>>
>>>>>>> so the two comments are out of sync.
>>>>>>>
>>>>>>> Will convert them to be consistent.
>>>>>>>
>>>>>>> src/share/vm/runtime/arguments.cpp
>>>>>>> line 3252: } else if (match_option(option,
>>>>>>> "-XX:+CreateMinidumpOnCrash", &tail)) {
>>>>>>> line 3256: } else if (match_option(option,
>>>>>>> "-XX:-CreateMinidumpOnCrash", &tail)) {
>>>>>>> These two checks should use the
>>>>>>> match_option() version
>>>>>>> without a '&tail' parameter.
>>>>>>>
>>>>>>> Will use version w/o tail.
>>>>>>>
>>>>>>> src/share/vm/utilities/vmError.cpp
>>>>>>> line 217: bool VMError::coredump_status =
>>>>>>> true; // presume we can dump core file first
>>>>>>> I don't think you should init this to true.
>>>>>>> It confuses
>>>>>>> things with
>>>>>>> VMError::report_coredump_status(). It will also
>>>>>>> confuse this code:
>>>>>>>
>>>>>>> 526 STEP(63, "(printing core file
>>>>>>> information)")
>>>>>>> 529 if (coredump_status) {
>>>>>>> 530 st->print("Core dump written. Default
>>>>>>> location: %s", coredump_message);
>>>>>>>
>>>>>>> because coredump_status won't accurately
>>>>>>> reflect whether
>>>>>>> the coredump_message field has been set.
>>>>>>>
>>>>>>> line 943: // check core dump limits on
>>>>>>> Linux/Solaris, nothing on Windows
>>>>>>> This updated comment doesn't match the
>>>>>>> unmodified
>>>>>>> comment in os.hpp:
>>>>>>>
>>>>>>> 720: // On Windows this will create an
>>>>>>> actual minidump, on Linux/Solaris it will simply
>>>>>>> check core dump limits
>>>>>>>
>>>>>>> I will consider your comments, then revise with a new
>>>>>>> version. I remember here there is a case, if status not
>>>>>>> set, it will output an inconsistent message. Will check
>>>>>>> again.
>>>>>>>
>>>>>>> src/os/aix/vm/os_aix.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/os/bsd/vm/os_bsd.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/os/linux/vm/os_linux.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/os/posix/vm/os_posix.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/os/solaris/vm/os_solaris.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/os/windows/vm/os_windows.cpp
>>>>>>> line 1008: static char buffer[O_BUFLEN];
>>>>>>> Why switch from a buffer parameter to a
>>>>>>> static buffer?
>>>>>>> What happens with parallel calls to
>>>>>>> abort()? Will the static
>>>>>>> buffer get stomped by the competing threads?
>>>>>>>
>>>>>>> The original buffer is static too. Carrying an buffer
>>>>>>> for abort seems not right. This buffer in fact only for
>>>>>>> storing file name (based on pid) here.
>>>>>>> abort() will dump the core file, should we prevent
>>>>>>> multi-thread calling to this function? I did not check
>>>>>>> this part, will check if it is MT-safe.
>>>>>>>
>>>>>>>
>>>>>>> line 1015: // If running on a client version
>>>>>>> of Windows and user has not explicitly enabled
>>>>>>> dumping
>>>>>>> line 1016: if
>>>>>>> (!os::win32::is_windows_server() &&
>>>>>>> !CreateCoredumpOnCrash) {
>>>>>>> The default for CreateCoredumpOnCrash is
>>>>>>> now 'true' so the
>>>>>>> comment and logic are no longer correct
>>>>>>> here. The Windows
>>>>>>> Client VM will generate minidumps by default.
>>>>>>>
>>>>>>> old line 1007:
>>>>>>> VMError::report_coredump_status("Minidumps are not
>>>>>>> enabled by default on client versions of Windows",
>>>>>>> false);
>>>>>>> new line 1017: jio_fprintf(stderr, "Minidumps
>>>>>>> are not enabled by default on client versions of
>>>>>>> Windows.\n");
>>>>>>> There are a number of places where we
>>>>>>> change from
>>>>>>> VMError::report_coredump_status() to jio_fprintf()
>>>>>>> and I'm not quite seeing why this was done.
>>>>>>>
>>>>>>> Update: VMError::report_coredump_status()
>>>>>>> is misnamed. It doesn't
>>>>>>> report anything in the sense that it
>>>>>>> doesn't print anything. It
>>>>>>> does record two pieces of information about
>>>>>>> core dumps so maybe
>>>>>>> it should be VMError::record_coredump_
>>>>>>> status().
>>>>>>>
>>>>>>> line 1100: jio_fprintf(stderr, "%s.\n", buffer);
>>>>>>> At this point, buffer contains the path to
>>>>>>> the
>>>>>>> mini-dump file so the above line simply
>>>>>>> prints
>>>>>>> that on stderr. Why?
>>>>>>>
>>>>>>> Yes, I see that the old code did something
>>>>>>> similar:
>>>>>>>
>>>>>>> 1090
>>>>>>> VMError::report_coredump_status(buffer, true);
>>>>>>>
>>>>>>> but report_coredump_status() didn't
>>>>>>> actually print
>>>>>>> anything. It just squirreled away these in
>>>>>>> vmError.cpp:
>>>>>>>
>>>>>>> 217 bool VMError::coredump_status;
>>>>>>> 218 char VMError::coredump_message[O_
>>>>>>> BUFLEN];
>>>>>>>
>>>>>>> See comments above about how
>>>>>>> report_coredump_status()
>>>>>>> is misnamed.
>>>>>>>
>>>>>>> At this point, I'm convinced that all the
>>>>>>> changes from
>>>>>>> VMError::report_coredump_status() to jio_fprintf()
>>>>>>> are
>>>>>>> a bad idea.
>>>>>>>
>>>>>>>
>>>>>>> Changing to jio_fprintf is because
>>>>>>> report_coredump_status did not output anything, it is
>>>>>>> just a logging as you indicated. It is reasonable we
>>>>>>> change it to record_coredump_status instead. How about
>>>>>>> add printout to report_coredump_status? So I do not
>>>>>>> need to use jio_printf here.
>>>>>>>
>>>>>>> Other consideration of using jio_fprintf after call
>>>>>>> shutdown called, the static output stream still exists,
>>>>>>> but not guaranteed to work as expected.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/Unsafe/RangeCheck.java
>>>>>>> No comments.
>>>>>>>
>>>>>>> test/runtime/memory/ReadFromNoaccessArea.java
>>>>>>> No comments.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>>
>>>>>>> On 3/23/2015 11:48 AM, Yumin Qi wrote:
>>>>>>>
>>>>>>> Since Thomas is not a reviewer in open jdk,
>>>>>>> I need two volunteers (at least one
>>>>>>> "R"eviewer).
>>>>>>>
>>>>>>> Dan, since you already reviewed previous
>>>>>>> version, could you have a look?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 3/20/2015 3:20 PM, Yumin Qi wrote:
>>>>>>>
>>>>>>> Thomas,
>>>>>>>
>>>>>>> Thanks. Yes, it is webrev04. Also, I
>>>>>>> have updated webrev04 to add another
>>>>>>> test case change:
>>>>>>> test/runtime/memory/ReadFromNoaccessArea.java
>>>>>>> (Thanks Dan's update)
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 3/20/2015 11:55 AM, Thomas Stüfe
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Yumin,
>>>>>>>
>>>>>>> I think you meant to post webrev.04?
>>>>>>>
>>>>>>> I looked at 04, and it looks nice.
>>>>>>> Thank you!
>>>>>>>
>>>>>>> (still only reviewer with r)
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Mar 20, 2015 6:20 PM, "Yumin Qi"
>>>>>>> <yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>> wrote:
>>>>>>>
>>>>>>> Hi, All
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev03/
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>>>>>>>
>>>>>>> This version is based on
>>>>>>> Thomas' suggestion.
>>>>>>> Tests passed
>>>>>>> vm.runtime.quick.testlist, JPRT and
>>>>>>> manual tests.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 3/18/2015 10:06 AM, Yumin Qi
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 3/18/2015 4:26 AM,
>>>>>>> Thomas Stüfe wrote:
>>>>>>>
>>>>>>> Hi Yumin,
>>>>>>>
>>>>>>> - just aesthetics, but
>>>>>>> on all implementations of
>>>>>>> os_abort(), at least on
>>>>>>> the UNIX variants, maybe we could
>>>>>>> consistently rename the
>>>>>>> parameters to be "siginfo" and
>>>>>>> "context" instead of
>>>>>>> using Windows terms?
>>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>>> - On
>>>>>>> check_create_dump_limit(), you do
>>>>>>> not need
>>>>>>> exceptionRecord, contextRecord
>>>>>>> parameters anymore. The
>>>>>>> only parameters now
>>>>>>> needed are buffer and buffer size,
>>>>>>> which is not even an
>>>>>>> output parameter, just a scratch
>>>>>>> buffer for this
>>>>>>> function to use for printf. For
>>>>>>> output, it
>>>>>>> calls
>>>>>>> VMError::report_coredump_status(...) at
>>>>>>> the end to
>>>>>>> provide information
>>>>>>> about the core file.
>>>>>>>
>>>>>>> - I would rename that
>>>>>>> function from
>>>>>>> check_create_dump_limit() to
>>>>>>> check_dump_limit() - nothing
>>>>>>> is created anymore.
>>>>>>>
>>>>>>> Yes, no need to carry those
>>>>>>> two parameters. I tried to keep
>>>>>>> less changes, but decided
>>>>>>> to change now.
>>>>>>>
>>>>>>> - on Windows, in
>>>>>>> os::abort(), there is no point
>>>>>>> anymore in
>>>>>>> calling
>>>>>>> VMError::report_coredump_status(...) because
>>>>>>> that
>>>>>>> information is only
>>>>>>> used during error log writing which
>>>>>>> already happened. It
>>>>>>> only makes sense to call this
>>>>>>> function in
>>>>>>> check_create_dump_limit(), which
>>>>>>> happens
>>>>>>> before error log writing.
>>>>>>> Instead, maybe error
>>>>>>> messages like "Call to
>>>>>>> MiniDumpWriteDump() failed" should
>>>>>>> just be written to
>>>>>>> stderr? This would be
>>>>>>> consistent with Unix, where the OS
>>>>>>> writes a short message
>>>>>>> on stderr if core file writing fails.
>>>>>>>
>>>>>>> Sure, will direct output to
>>>>>>> stderr.
>>>>>>>
>>>>>>> - there is now a new
>>>>>>> test,
>>>>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>>>>> - could you please also
>>>>>>> add "-XX:-CreateCoredumpOnCrash" ?
>>>>>>> I just added the test
>>>>>>> and did not want to add the option
>>>>>>> before it was available.
>>>>>>>
>>>>>>> OK
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> Thanks for your work!
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 17, 2015 at
>>>>>>> 2:04 AM, Yumin Qi
>>>>>>> <yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi, Dan and all
>>>>>>>
>>>>>>> I have updated
>>>>>>> webrev at:
>>>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev03/
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev03/>
>>>>>>>
>>>>>>> 1) bug changed
>>>>>>> synopsis
>>>>>>> 8074354: Make
>>>>>>> CreateMinidumpOnCrash a new name and
>>>>>>> available on all platforms
>>>>>>> 2) tests: JPRT,
>>>>>>> vm.runtime.quick.testlist (on
>>>>>>> Windows), jtreg on
>>>>>>> Windows and
>>>>>>> Linux/Unix.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/2015 12:59
>>>>>>> PM, Daniel D. Daugherty wrote:
>>>>>>>
>>>>>>> I believe there
>>>>>>> are both JavaTest/JTREG tests in
>>>>>>> the hotspot repo
>>>>>>> and there are
>>>>>>> VM/NSK tests from back when we did
>>>>>>> phone home...
>>>>>>>
>>>>>>> Check with Christian or Misha...
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/15 1:50
>>>>>>> PM, Yumin Qi wrote:
>>>>>>>
>>>>>>> Thanks, Dan
>>>>>>>
>>>>>>> Will look at and run those tests.
>>>>>>> Are they
>>>>>>> under nsk?
>>>>>>>
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 3/12/2015 12:29 PM, Daniel D.
>>>>>>> Daugherty wrote:
>>>>>>>
>>>>>>> Yumin,
>>>>>>>
>>>>>>> There are some error handler
>>>>>>> tests. You
>>>>>>> should find
>>>>>>> those and make
>>>>>>> sure that you run them on
>>>>>>> Windows since you're
>>>>>>> changing the order
>>>>>>> there...
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/15 12:42 PM, Yumin Qi
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi, Thomas and all
>>>>>>>
>>>>>>> The second version of
>>>>>>> the change:
>>>>>>> webrev:
>>>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev02/
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev02/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev02/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev02/>
>>>>>>>
>>>>>>> In this version:
>>>>>>> 1) switch order on
>>>>>>> Windows to first
>>>>>>> print log
>>>>>>> file as other platform,
>>>>>>> then dump core
>>>>>>> file if
>>>>>>> needed. When VMError
>>>>>>> created after
>>>>>>> crash, siginfo
>>>>>>> and context which are
>>>>>>> ExceptionRecord and
>>>>>>> ContextRecord on Windows
>>>>>>> are recorded,
>>>>>>> mini (or
>>>>>>> full, due to 'or' used so
>>>>>>> type will be
>>>>>>> mini
>>>>>>> anyway) dump creates dump
>>>>>>> file based
>>>>>>> on those two
>>>>>>> as before.
>>>>>>> 2) to make os::abort to
>>>>>>> get above
>>>>>>> two pointers,
>>>>>>> added two more fields to
>>>>>>> the function
>>>>>>> parameters:
>>>>>>> 3) changes for
>>>>>>> test/runtime/ErrorHandling/SecondaryError.java
>>>>>>> ---
>>>>>>> added
>>>>>>> "-XX:-CreateCoredumpOnCrash"
>>>>>>>
>>>>>>> - static void abort(bool
>>>>>>> dump_core = true);
>>>>>>> + static void
>>>>>>> abort(bool dump_core
>>>>>>> = true,
>>>>>>> void *siginfo = NULL, void
>>>>>>> *context =
>>>>>>> NULL);
>>>>>>>
>>>>>>> Tests: JPRT and manually.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 3/11/2015 3:47 AM,
>>>>>>> Thomas Stüfe wrote:
>>>>>>>
>>>>>>> Hi Yumin,
>>>>>>>
>>>>>>> There is also
>>>>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>>>>> - could you please add
>>>>>>> "-XX:-CreateCoredumpOnCrash" ?
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Beyond that, as I wrote
>>>>>>> in the bug
>>>>>>> report
>>>>>>> comments:
>>>>>>>
>>>>>>> "This is also a problem
>>>>>>> on Windows -
>>>>>>> MiniDumpWriteDump() may hang
>>>>>>> infinitly. And on
>>>>>>> Windows this is worse
>>>>>>> than under
>>>>>>> UNIX because
>>>>>>> we create the Dump
>>>>>>> before writing
>>>>>>> the hs-err
>>>>>>> file, so if the Dump
>>>>>>> hangs, we get
>>>>>>> no error
>>>>>>> log. I would like to
>>>>>>> revert the
>>>>>>> order: create
>>>>>>> the minidump after
>>>>>>> writing the
>>>>>>> error log, the
>>>>>>> same way Unix does it.
>>>>>>> We did this
>>>>>>> in our JVM
>>>>>>> (SAP) because for us,
>>>>>>> error logs
>>>>>>> are more
>>>>>>> useful than minidumps. "
>>>>>>>
>>>>>>> So, I would like to see
>>>>>>> os::abort
>>>>>>> on Windows
>>>>>>> call
>>>>>>> MiniDumpWriteDump(), and thus
>>>>>>> the mini
>>>>>>> dump writing moved
>>>>>>> after the error log
>>>>>>> writing. This would
>>>>>>> also make the
>>>>>>> code way
>>>>>>> cleaner because the
>>>>>>> control flow
>>>>>>> would be the
>>>>>>> same on all platforms.
>>>>>>>
>>>>>>> I understand that this
>>>>>>> may be out
>>>>>>> of scope for
>>>>>>> your change, but I
>>>>>>> would like to
>>>>>>> know what
>>>>>>> others think about this.
>>>>>>>
>>>>>>> Kind regards, Thomas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 11, 2015 at
>>>>>>> 8:02 AM,
>>>>>>> Yumin Qi
>>>>>>> <yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>
>>>>>>> <mailto:yumin.qi at oracle.com
>>>>>>> <mailto:yumin.qi at oracle.com>>>>> wrote:
>>>>>>>
>>>>>>> Please review:
>>>>>>>
>>>>>>> bug:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8074354
>>>>>>> webrev:
>>>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev01/
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev01/>
>>>>>>>
>>>>>>> Summary: Tests
>>>>>>> timed out when
>>>>>>> VM crashes
>>>>>>> and dumping core file
>>>>>>> which in the test
>>>>>>> case is not
>>>>>>> needed. To
>>>>>>> make core not created,
>>>>>>> the fix changed
>>>>>>> CreateMinidumpOnCrash to
>>>>>>> CreateCoredumpOnCrash,
>>>>>>> the former is only
>>>>>>> used on
>>>>>>> Windows and the
>>>>>>> latter for all
>>>>>>> platforms. When VM crashes on
>>>>>>> non Windows,
>>>>>>> core file generated as
>>>>>>> default if OS sets
>>>>>>> core dump
>>>>>>> allowed.
>>>>>>> Default value of
>>>>>>> CreateCoredumpOnCrash set to
>>>>>>> 'true' to
>>>>>>> keep same behavior on non
>>>>>>> Windows platforms,
>>>>>>> but changed
>>>>>>> for Windows
>>>>>>> --- original is false,
>>>>>>> not create minidump
>>>>>>> on
>>>>>>> Windows. With
>>>>>>> CreateCoredumpOnCrash turned
>>>>>>> off, no core file
>>>>>>> will be
>>>>>>> generated.
>>>>>>> CreateMinidumpOnCrash still
>>>>>>> can be used on
>>>>>>> commandline but
>>>>>>> only as
>>>>>>> alias for the new flag.
>>>>>>>
>>>>>>> Tests: JPRT, manual
>>>>>>> tests setting
>>>>>>> CreateMinidumpOnCrash on
>>>>>>> commandline to verify flag
>>>>>>> change as alias.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list