RFR(S): 8074354: tests that intentionally crash the VM can timeout creating a core file
Yumin Qi
yumin.qi at oracle.com
Mon Apr 13 22:25:25 UTC 2015
Hi, Dan
Same Webrev updated.
http://cr.openjdk.java.net/~minqi/8074354/webrev08/
Thanks
Yumin
On 4/13/2015 3:05 PM, Yumin Qi wrote:
> Dan,
>
> On 4/13/2015 2:09 PM, Daniel D. Daugherty wrote:
>> On 4/13/15 10:58 AM, Yumin Qi wrote:
>>> Thomas,
>>>
>>> Answers embeded below. The new web is just the same link updated
>>> (08):
>>>
>>> http://cr.openjdk.java.net/~minqi/8074354/webrev08/
>>
>> src/share/vm/runtime/globals.hpp
>> No comments.
>>
>> src/share/vm/runtime/os.hpp
>> L720: // ON Posix compatible OS it will
>> typo: 'ON' -> 'On'
>>
>> L724: ... or a short error message, depends
>> grammar: 'depends' -> 'depending'
>>
>> L725: // on checking result.
>> grammar: 'on checking' -> 'on the checking'
>>
>> src/share/vm/runtime/arguments.cpp
>> No comments.
>>
>> src/share/vm/utilities/vmError.hpp
>> No comments.
>>
>> src/share/vm/utilities/vmError.cpp
>> No comments.
>>
>> src/os/aix/vm/os_aix.cpp
>> src/os/bsd/vm/os_bsd.cpp
>> src/os/linux/vm/os_linux.cpp
>> src/os/posix/vm/os_posix.cpp
>> src/os/solaris/vm/os_solaris.cpp
>> No comments on the above five files.
>>
>> src/os/windows/vm/os_windows.cpp
>> L1010: jio_snprintf(buffer, buffsz, "Failed to create coredump
>> file (0x%x).", GetLastError());
>> Because this is a Win* specific file you can use "minidump"
>> here instead of "coredump".
>>
>> I made a similar comment in the previous round.
>>
> Yes, I will change 'coredump' into 'minidump'
>> L1028: if (!dump_core || !dumpFile) {
>> The "!dumpFile" part is an implied boolean which we don't
>> like to use in HotSpot. Perhaps: dumpFile == NULL
>>
> Will change to Hotspot style.
>> L1078: win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>> The "CloseHandle(dumpFile)" call is missing.
>>
> There is no need to call CloseHandle here explicitly, since we will
> exit next and all resources will be reclaimed by OS. I remove it at
> last since think it is not needed here. If you check previous calls to
> win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
> I did not place CloseHandle(dumpFile) either.
> If needed, I think we should put the call before all exit entry.
>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>> No comments on the above two files.
>>
>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>> No comments.
>>
>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>> test/runtime/Unsafe/RangeCheck.java
>> test/runtime/memory/ReadFromNoaccessArea.java
>> No comments on the above six files.
> Thanks
> Yumin
>
>>
>>
>> Dan
>>
>>
>>>
>>>
>>> Thanks
>>> Yumin
>>>
>>> On 4/13/2015 12:30 AM, Thomas Stüfe wrote:
>>>> Hi Yumin,
>>>>
>>>> some small things:
>>>>
>>>> os_windows.cpp:1012
>>>>
>>>> + if (dumpFile == INVALID_HANDLE_VALUE) {
>>>> + jio_snprintf(buffer, buffsz, "Failed to create coredump file");
>>>> + status = false;
>>>> + }
>>>>
>>>> Please print out the error number too:
>>>> + jio_snprintf(buffer, buffsz, "Failed to create coredump file
>>>> (0x%x).", ::GetLastError());
>>>>
>>>>
>>> Changed as above. No :: global scope operator needed here.
>>>>
>>>> os_windows.cpp:1036
>>>>
>>>> + assert(dumpFile != NULL, "dumpFile should not be NULL");
>>>>
>>>> Please remove this assert. I would not do any asserts at this
>>>> stage, we are dying anyway and a recursive assert would just
>>>> confuse things. Moreover, CreateFile may have failed for "normal"
>>>> reasons.
>>>>
>>> removed.
>>>> Instead, if dumpFile is NULL, just skip the whole minidump stuff in
>>>> os::abort; just behave the same way as for os::abort(false):
>>>>
>>>> + if (!dump_core || !dumpFile) {
>>>> + win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>>>> }
>>>>
>>>>
>>> changed as expected.
>>>>
>>>> os_windows.cpp: 1076
>>>>
>>>> I would just skip the whole "FormatMessage" stuff. It is just
>>>> sugar, but it will allocate buffers, which we may not want at this
>>>> stage. I think the plain error code from GetLastError() is
>>>> sufficient here. So I would just:
>>>>
>>>> 1076 // Older versions of dbghelp.dll (the one shipped with
>>>> Win2003 for example) may not support all
>>>> 1077 // the dump types we really want. If first call fails, lets
>>>> fall back to just use MiniDumpWithFullMemory then.
>>>> 1078 if (_MiniDumpWriteDump(hProcess, processId, dumpFile,
>>>> dumpType, pmei, NULL, NULL) == false &&
>>>> 1079 _MiniDumpWriteDump(hProcess, processId, dumpFile,
>>>> (MINIDUMP_TYPE)MiniDumpWithFullMemory, pmei, NULL, NULL) == false) {
>>>> jio_fprintf(stderr, "Call to MiniDumpWriteDump() failed
>>>> (Error 0x%x)\n", ::GetLastError());
>>>> 1089 } else {
>>>> 1090 ....
>>>>
>>>>
>>> Agree, this is the last step before exit, with output of last error
>>> code, user can get what happened --- though a little effort to
>>> research.
>>>
>>>> Rest is fine.
>>>> Thomas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 13, 2015 at 7:59 AM, Yumin Qi <yumin.qi at oracle.com
>>>> <mailto:yumin.qi at oracle.com>> wrote:
>>>>
>>>> Thomas and Dan,
>>>>
>>>> Please review the changes at
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev08/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev08/>
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> On 4/10/2015 10:14 PM, Yumin Qi wrote:
>>>>
>>>> Thomas,
>>>>
>>>> Thanks for review again.
>>>>
>>>> On 4/10/2015 4:20 AM, Thomas Stüfe wrote:
>>>>
>>>> Hi Yumin,
>>>>
>>>> Hi Yumin,
>>>>
>>>> this looks mostly fine for me. I only was able to test
>>>> it on Linux, but will test it on other OSes next week.
>>>>
>>>> Here two remaining small points:
>>>>
>>>> - in os::check_dump_limit, maybe a comment would be
>>>> helpful, because the contract for this function is a bit
>>>> difficult to understand. Proposal:
>>>>
>>>> "Check or prepare a core dump to be taken at a later
>>>> point in the same thread in os::abort(). Use the caller
>>>> provided buffer as a scratch buffer. Output status
>>>> message via VMError::report_cordedump_status() (on
>>>> success, the core dump file location, on error, a short
>>>> error message) - this status message which will be
>>>> written into the error log."
>>>>
>>>> I will take the comments, new comments:
>>>>
>>>> // ON Posix compatible OS it will simply check core dump
>>>> limits while on Windows
>>>> // it will check if dump file can be created. Check or
>>>> prepare a core dump to be
>>>> // taken at a later point in the same thread in
>>>> os::abort(). Use the caller
>>>> // provided buffer as a scratch buffer. The status message
>>>> which will be written
>>>> // into the error log either is file location or a short
>>>> error message, depends
>>>> // on checking result.
>>>> static void check_dump_limit(char* buffer, size_t
>>>> bufferSize);
>>>>
>>>> - It would make sense to improve the output in
>>>> VMError.report() in STEP(63) a bit:
>>>>
>>>> Problem 1: we use past-tense, which is misleading ("Core
>>>> dump written", "Failed to write core"). We did not
>>>> generate the core dump yet, we just plan to do it, and
>>>> it still may fail for various reasons (e.g. even if the
>>>> limits are fine, the current directory may be write
>>>> protected. There is no way to catch all errors
>>>> beforehand). If the user reads "Core dump written", he
>>>> assumes the core dump was written successfully. If
>>>> abort(3) later fails to write the core dump, user will
>>>> be confused.
>>>>
>>>> Will change the message to
>>>> if (coredump_status) {
>>>> st->print("Core dump will be written. %s",
>>>> coredump_message);
>>>> } else {
>>>> st->print("No core dump will be written. %s",
>>>> coredump_message);
>>>> }
>>>>
>>>> as your suggested.
>>>> Whether the coredump can be successfully created depends on
>>>> following core file generation procedure. User can trace
>>>> error messages if failed to create.
>>>>
>>>> Problem 2: this is more obscure: on Linux core dumps may
>>>> not be written to the file system at all but be
>>>> processed by some other program. On my Ubuntu 14.4
>>>> machine I see this output:
>>>>
>>>> #
>>>> # Core dump written. Default location: Core dumps may be
>>>> processed with "/usr/share/apport/apport %p %s %c %P"
>>>> (or dumping to
>>>> /shared/projects/openjdk/jdk9-hs-rt/output/core.26261)
>>>> #
>>>>
>>>> This output is a bit jarring.
>>>>
>>>> I have not tested on Ubuntu so no comments.
>>>>
>>>> I will post a new webrev based on your and Dan's comments.
>>>> Thanks for your patience.
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> My suggestion for better texts would be:
>>>>
>>>> "Core dump will be written. %s"
>>>> "No core dump will be written. %s"
>>>>
>>>> which hopefully fits all cases better. But I don't know,
>>>> other suggestions are welcome :-)
>>>>
>>>> Otherwise, this looks fine.
>>>>
>>>> Thanks for your work!
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Fri, Apr 10, 2015 at 5:18 AM, Yumin Qi
>>>> <yumin.qi at oracle.com <mailto:yumin.qi at oracle.com>
>>>> <mailto:yumin.qi at oracle.com
>>>> <mailto:yumin.qi at oracle.com>>> wrote:
>>>>
>>>> Hi, Thomas and Dan
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>
>>>> On 4/9/2015 7:51 PM, Yumin Qi wrote:
>>>>
>>>> Please hold on, I missed the test cases:
>>>>
>>>> *test/runtime/Unsafe/RangeCheck.java
>>>> *
>>>> *test/runtime/memory/ReadFromNoaccessArea.java
>>>>
>>>> Added.
>>>>
>>>> Also some file heads need to be updated with
>>>> this year.
>>>>
>>>> Changed
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> *
>>>> On 4/9/2015 7:07 PM, Yumin Qi wrote:
>>>>
>>>> Hi, Thomas and Dan
>>>>
>>>> The same URL is updated with new version.
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>
>>>> I remove the public access to VMError::out,
>>>> VMError::coredump_message and the size of
>>>> VMError::coredump_message.
>>>> Added private static HANDLE in
>>>> os_windows.cpp for
>>>> dumpFile as suggested by Thomas.
>>>>
>>>> Tests passes JPRT and manual tests.
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> On 4/9/2015 1:32 AM, Thomas Stüfe wrote:
>>>>
>>>> Hi Yumin,
>>>>
>>>> I do not think exposing
>>>> VMError::coredump_status
>>>> buffer just for the sake of using it as
>>>> a scratch
>>>> buffer in os::abort() is a good idea. It
>>>> is very
>>>> confusing and has nothing to do with the
>>>> purpose of
>>>> VMerror::coredump_status.
>>>>
>>>> Also, VMerror::coredump_status is
>>>> static, so you do
>>>> not even gain anything over just using a
>>>> global or
>>>> function scope static buffer. Either we
>>>> are worried
>>>> about thread safety in os::abort(), or
>>>> we aren't. If
>>>> we aren't, just use a local static
>>>> buffer, it is much
>>>> clearer and no need to expose
>>>> VMError::coredump_status
>>>> like this.
>>>>
>>>> Apart from that, see my remarks in
>>>> yesterdays mail.
>>>>
>>>> Cheers, Thomas
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Apr 9, 2015 at 5:16 AM, Yumin Qi
>>>> <yumin.qi at oracle.com
>>>> <mailto:yumin.qi at oracle.com> <mailto:yumin.qi at oracle.com
>>>> <mailto:yumin.qi at oracle.com>>
>>>> <mailto:yumin.qi at oracle.com
>>>> <mailto:yumin.qi at oracle.com>
>>>> <mailto:yumin.qi at oracle.com
>>>> <mailto:yumin.qi at oracle.com>>>> wrote:
>>>>
>>>> Hi, Dan
>>>>
>>>> New webrev at
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>
>>>> To make code clean, add
>>>> os::abort_internal which
>>>> is only used
>>>> for Windows.
>>>> Since passing
>>>> VMError::coredump_message and its
>>>> size as
>>>> parameters across multiple functions
>>>> makes
>>>> function have a bigger
>>>> signature, I change the way to
>>>> access them -- add
>>>> two functions in
>>>> VMError to access them:
>>>> coredump_msg_buffer() and
>>>> coredump_msg_buffer_size(). They are
>>>> static
>>>> members, so can save
>>>> stack space for passing around.
>>>>
>>>> in os_windows.cpp, abort(bool,
>>>> void*, void*):
>>>> if error encountered, exit with
>>>> error messages.
>>>> If no error,
>>>> VMError::coredump_message contains
>>>> the dump
>>>> path/name. I just use them so no
>>>> need to have
>>>> extra space for name
>>>> allocation.
>>>>
>>>> Passed JPRT and manual tests.
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> On 4/7/2015 9:23 AM, Daniel D.
>>>> Daugherty wrote:
>>>>
>>>> On 4/1/15 10:57 AM, Yumin Qi
>>>> wrote:
>>>>
>>>> Hi, Dan and Thomas,
>>>>
>>>> I have a new webrev at
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev06/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>>
>>>>
>>>>
>>>> src/share/vm/runtime/globals.hpp
>>>> Sorry, I didn't notice this
>>>> before:
>>>>
>>>> L939: product(bool,
>>>> CreateCoredumpOnCrash,
>>>> true, \
>>>> L940: "Create
>>>> minidump on VM fatal
>>>> error") \
>>>>
>>>> The "minidump" on L940
>>>> should change
>>>> also. Don't know if you
>>>> want to say "core/mini
>>>> dump" or
>>>> something else.
>>>>
>>>>
>>>> src/share/vm/runtime/os.hpp
>>>> L719: // On Linux/Solaris
>>>> it will simply
>>>> check core dump
>>>> limits while on Windows will do
>>>> nothing.
>>>> grammar: 'on Windows
>>>> will do nothing'
>>>> -> 'on Windows it
>>>> will do nothing'
>>>>
>>>>
>>>> src/share/vm/runtime/arguments.cpp
>>>> No comments.
>>>>
>>>> src/share/vm/utilities/vmError.hpp
>>>> No comments.
>>>>
>>>> src/share/vm/utilities/vmError.cpp
>>>> No comments.
>>>>
>>>> src/os/aix/vm/os_aix.cpp
>>>> No comments.
>>>>
>>>> src/os/bsd/vm/os_bsd.cpp
>>>> No comments.
>>>>
>>>> src/os/linux/vm/os_linux.cpp
>>>> No comments.
>>>>
>>>> src/os/posix/vm/os_posix.cpp
>>>> No comments.
>>>>
>>>> src/os/solaris/vm/os_solaris.cpp
>>>> No comments.
>>>>
>>>> src/os/windows/vm/os_windows.cpp
>>>> L92: jio_snprintf(buffer,
>>>> bufferSize,
>>>> "%s\\hs_err_pid%u.mdmp",
>>>> get_current_directory(NULL, 0),
>>>> current_process_id());
>>>> Other non-Windows calls to
>>>> get_current_directory() do a NULL
>>>> check before using the
>>>> return value.
>>>> The original Windows
>>>> code that called
>>>> get_current_directory() did not make
>>>> this
>>>> check, but I think it's
>>>> better to be
>>>> consistent and safe.
>>>>
>>>> L1007: static char
>>>> buffer[O_BUFLEN];
>>>> L1008: char
>>>> filename[FILENAME_MAX];
>>>> Two more potentially
>>>> large stack
>>>> variables here. Will we
>>>> run into stack overflow
>>>> on Win* more
>>>> frequently when what
>>>> we really wanted was the
>>>> reason for
>>>> the abort() call?
>>>>
>>>> The reason the original
>>>> code used
>>>> 'buffer' to construct the
>>>> dump file name was to
>>>> save space. The
>>>> original code was also
>>>> careful to not need the
>>>> file name in
>>>> any error messages after
>>>> the point at which the
>>>> dump file was
>>>> successfully created.
>>>>
>>>> It would be better (more
>>>> resilient) if
>>>> you could pass in
>>>> an already existing
>>>> buffer to abort()
>>>> like the original
>>>> code passed into
>>>> os::check_or_create_dump(). Yes, that
>>>> would require visiting
>>>> all existing
>>>> calls to abort().
>>>>
>>>> L1015: assert(out->is_open(),
>>>> "defaultStream should be open");
>>>> Is it possible for
>>>> os::abort() to be
>>>> called early enough on
>>>> Windows such that
>>>> VMError::out is not
>>>> actually setup? Yes,
>>>> this would have to be
>>>> absolutely
>>>> catastrophic...
>>>>
>>>> See the comment in
>>>> os_solaris.cpp
>>>> L1520-1522.
>>>>
>>>> L1018: jio_snprintf(buffer,
>>>> sizeof(buffer), "%s", "Either
>>>> CreateCoredumpOnCrash is
>>>> disabled from command"
>>>> L1019 "
>>>> line or coredump
>>>> is not available");
>>>> When you are using
>>>> jio_snprintf() as a
>>>> replacement for:
>>>>
>>>> strncpy(buffer, "my
>>>> string",
>>>> sizeof(buffer));
>>>> buffer[sizeof(buffer) - 1] = '\0';
>>>>
>>>> you don't need the "%s"
>>>> format string.
>>>> So this code would be:
>>>>
>>>> jio_snprintf(buffer,
>>>> sizeof(buffer),
>>>> "Either
>>>> CreateCoredumpOnCrash is disabled
>>>> from "
>>>> "command line or coredump
>>>> is not available");
>>>>
>>>> Same comment for: L1026,
>>>> L1039
>>>>
>>>> L1020: goto done;
>>>> The use of 'goto' makes
>>>> me cringe. You
>>>> can refactor much
>>>> of this logic into an
>>>> abort_internal()
>>>> function that
>>>> abort() calls and keep
>>>> all the
>>>> original "return" logic
>>>> to bail out.
>>>>
>>>> L1045: // Older versions of
>>>> dbghelp.h
>>>> doesn't contain all
>>>> grammar: 'doesn't' ->
>>>> 'do not'
>>>>
>>>> L1052: cwd =
>>>> get_current_directory(NULL, 0);
>>>> L1053: jio_snprintf(filename,
>>>> sizeof(filename),
>>>> "%s\\hs_err_pid%u.mdmp", cwd,
>>>> current_process_id());
>>>> Other non-Windows calls to
>>>> get_current_directory() do a NULL
>>>> check before using the
>>>> return value.
>>>> The original Windows
>>>> code that called
>>>> get_current_directory() did not make
>>>> this
>>>> check, but I think it's
>>>> better to be
>>>> consistent and safe.
>>>>
>>>> L1092: // well done.
>>>> Like a overcooked steak?
>>>> :-) Not sure
>>>> what you're trying
>>>> to say here...
>>>>
>>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>> No comments.
>>>>
>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>> No comments.
>>>>
>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>> No comments.
>>>>
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>> No comments on these four.
>>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> What changes:
>>>>
>>>> 1) os_windows.cpp,
>>>> os::abort(bool ....):
>>>> a) Since when abort()
>>>> is called, the
>>>> last function called before
>>>> exit process,
>>>> VMError::log already closed, which
>>>> is the file logging.
>>>> defaultStream still
>>>> available to use, so
>>>> use it for writing error or
>>>> success
>>>> messages. VMError::out is
>>>> the defaultStream, for using
>>>> it, added a
>>>> function out_stream() in
>>>> VMError.
>>>> b) Delete the codes
>>>> which decide to
>>>> write core based on
>>>> client/server version with
>>>> debug. The new
>>>> code is if
>>>> CreateCoredumpOnCrash set or
>>>> dump_core is
>>>> on, the coredump will
>>>> be tried no matter which
>>>> version it is.
>>>> There is no special
>>>> reason not to dump core with
>>>> client
>>>> version I think. Any concern
>>>> for this change?
>>>> c) Use of 'goto' in
>>>> this function. I
>>>> tried not to use it,
>>>> but using it makes code
>>>> clear. I remember
>>>> somewhere suggest not
>>>> using 'goto' in cplusplus.
>>>>
>>>> 2) changed comments to
>>>> make them
>>>> consistent as indicated by Dan.
>>>>
>>>> 3) Added/modified head
>>>> version for files.
>>>>
>>>> Tests: JPRT, manual test
>>>> on local Windows.
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> On 3/31/2015 1:25 AM, Thomas
>>>> Stüfe wrote:
>>>>
>>>> Hi Daniel, Yumin,
>>>>
>>>> sorry, a lot of Yumins
>>>> changes were
>>>> based on suggestions from
>>>> me, so here more
>>>> background:
>>>>
>>>> Yumin reversed the order
>>>> of error log
>>>> writing and core dumping.
>>>> The reason for that are
>>>> explained in
>>>> detail at the start of the
>>>> mail thread, in short:
>>>>
>>>> - on Windows core file
>>>> dumping may
>>>> hang - its rare but it
>>>> happens - preventing
>>>> error logs.
>>>> Depending on who you ask,
>>>> error logs are more
>>>> important than
>>>> minidumps, so it makes sense
>>>> to first write the erorr
>>>> log.
>>>> - This also brings
>>>> windows core paths
>>>> closer to UNIX code
>>>> paths. See below.
>>>>
>>>> About writing to stderr
>>>> in os::abort():
>>>>
>>>> After this change, calling
>>>> VmError::report_coredump_status() in
>>>> os::abort() will not do
>>>> anything
>>>> because the error log is
>>>> already written at that
>>>> point. I
>>>> suggested to Yumin writing to
>>>> stderr instead in
>>>> os::abort() because
>>>> that mimicks UNIX
>>>> behaviour: On UNIX, you
>>>> call abort(3),
>>>> which writes a core and
>>>> aborts. It writes a
>>>> short message to
>>>> stderr "Aborted (core
>>>> dumped)" or just
>>>> "Aborted".
>>>>
>>>> After Yumins change, on
>>>> Windows
>>>> os::abort also writes the
>>>> core,
>>>> then aborts. It made
>>>> sense to me that
>>>> this function should
>>>> behave the same way as
>>>> on UNIX: if
>>>> core writing fails, write to
>>>> stderr. There is no way
>>>> otherwise to
>>>> communicate a core dump
>>>> writing failure. After
>>>> writing the
>>>> core, process stops.
>>>>
>>>> ---
>>>>
>>>> About the control flows:
>>>>
>>>> Before Yumins change:
>>>>
>>>> On Windows
>>>> 1)
>>>> os::check_or_create_dump(): we
>>>> wrote the core dump and then
>>>> information about the
>>>> dump (file
>>>> location, success or error)
>>>> was handed to the misnamed
>>>> VmError::report_coredump_status(),
>>>> which just stored that
>>>> information.
>>>> 2) VmError::report():
>>>> Error log was
>>>> written and in step 63
>>>> information about the
>>>> core (already
>>>> written at that point) was
>>>> added to error log.
>>>> 3) os::abort() just did
>>>> an exit(3)
>>>>
>>>> On Unix,
>>>> 1)
>>>> os::check_or_create_dump(): checks
>>>> the limits to guess
>>>> whether core dumping
>>>> later will
>>>> succeed. Again, that
>>>> information is handed to
>>>> VmError::report_coredump_status()
>>>> 2) VmError::report():
>>>> Error log is
>>>> written and in step 63,
>>>> information about the
>>>> core's probable
>>>> future location is added
>>>> to error log. Here,
>>>> messages use past
>>>> tense, which is
>>>> misleading, because the
>>>> core is not
>>>> yet written.
>>>> 3) os::abort() calls
>>>> abort(3), which
>>>> stops and writes a core.
>>>>
>>>> Yumin pushed core file
>>>> writing on
>>>> Windows down to os::abort().
>>>> This brings the control
>>>> flow much
>>>> closer to Unix:
>>>>
>>>> (1) call
>>>> os::check_dump_limit() to
>>>> check the prerequisites for
>>>> core file writing and
>>>> gather a bit
>>>> information to put in the
>>>> error log: On Unix,
>>>> limit checks, on
>>>> windows, so far nothing
>>>> much but precalculating
>>>> the error file
>>>> name.
>>>> (2) VmError::report():
>>>> Error log is
>>>> written and information
>>>> about the core's
>>>> probable location is
>>>> added to error log.
>>>> (3) os::abort() is
>>>> called, which on
>>>> all platforms:
>>>> - dumps the core
>>>> - if failing, writes a
>>>> one-liner to
>>>> stderr
>>>> - stops the process.
>>>>
>>>> I think, if one agrees
>>>> that reversing
>>>> order of core dumping and
>>>> error log writing on
>>>> windows is a good
>>>> thing, this change looks
>>>> ok to me. Some
>>>> improvements could make
>>>> it clearer:
>>>>
>>>> - maybe rename
>>>> "os::check_dump_limit()"
>>>> (was my suggestion,
>>>> sorry) to
>>>> "os::check_core_prerequisites()" - that
>>>> leaves room
>>>> to flesh it out a bit
>>>> more on Windows,
>>>> where we do not have
>>>> UNIX limits.
>>>> - make the messages in
>>>> VMError::report() future
>>>> sense: "Will
>>>> write core file to
>>>> location.....".
>>>> - Make the messages
>>>> written in
>>>> os::abort() on Windows
>>>> clearer,
>>>> and shorter, and we also
>>>> do not need
>>>> the buffer to be static
>>>> here. Actually, if we
>>>> want to be as on
>>>> UNIX, just dumping "core
>>>> file writing succeeded"
>>>> or "failed" -
>>>> maybe with an error
>>>> number from
>>>> GetLastError() - will be
>>>> enough because the file
>>>> location is already
>>>> printed in the
>>>> error log. Like on UNIX.
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>>
>>>> On Mon, Mar 30, 2015 at
>>>> 9:51 PM,
>>>> Daniel D. Daugherty
>>>> <daniel.daugherty at oracle.com
>>>> <mailto:daniel.daugherty at oracle.com>
>>>> <mailto:daniel.daugherty at oracle.com
>>>> <mailto:daniel.daugherty at oracle.com>>
>>>> <mailto:daniel.daugherty at oracle.com
>>>> <mailto:daniel.daugherty at oracle.com>
>>>> <mailto:daniel.daugherty at oracle.com
>>>> <mailto:daniel.daugherty at oracle.com>>>> wrote:
>>>>
>>>> Re:
>>>> report_coredump_status() is
>>>> really
>>>> record_coredump_status()
>>>>
>>>> The
>>>> report_coredump_status()
>>>> function is designed to
>>>> record two
>>>> things about core
>>>> dumps for later
>>>> reporting by the code that
>>>> generates the
>>>> hs_err_pid file.
>>>> That's why the original
>>>> report_coredump_status() didn't
>>>> output anything to stderr.
>>>>
>>>> By changing the
>>>> Windows calls to
>>>> report_coredump_status() into
>>>> jio_fprintf() calls
>>>> you have:
>>>>
>>>> - put output onto
>>>> stderr that
>>>> should go to the hs_err_pid
>>>> file
>>>> - made the windows
>>>> code paths work
>>>> differently than the
>>>> non-windows
>>>> code paths
>>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> On 3/30/15 1:28 PM,
>>>> Yumin Qi wrote:
>>>>
>>>> Thanks Dan
>>>>
>>>> On 3/30/2015
>>>> 11:25 AM, Daniel
>>>> D. Daugherty wrote:
>>>>
>>>> On 3/25/15
>>>> 12:11 PM, Yumin
>>>> Qi wrote:
>>>>
>>>> Hi, all
>>>>
>>>> I updated the
>>>> webrev with a new change to
>>>> test case:
>>>> test/runtime/Unsafe/RangeCheck.java
>>>>
>>>> Add
>>>> -XX:-CreateCoredumpOnCrash to test, no
>>>> dump
>>>> needed on this case.
>>>>
>>>> new webrev:
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev05/
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>>
>>>>
>>>> General nit
>>>> - please
>>>> update copyright lines to 2015
>>>> as needed
>>>>
>>>> src/share/vm/runtime/globals.hpp
>>>> No comments.
>>>>
>>>> src/share/vm/runtime/os.hpp
>>>> line
>>>> 720: // On
>>>> Windows this will create an
>>>> actual
>>>> minidump, on
>>>> Linux/Solaris it will simply
>>>> check core
>>>> dump limits
>>>> line
>>>> 721: static
>>>> void check_dump_limit(char*
>>>> buffer,
>>>> size_t bufferSize);
>>>> The comment on
>>>> line 720 no longer matches
>>>> with the
>>>> function
>>>> rename from
>>>> check_or_create_dump() to
>>>> check_dump_limit().
>>>>
>>>> You have this
>>>> comment in vmError.cpp:
>>>> 943 // check
>>>> core dump limits on
>>>> Linux/Solaris, nothing on
>>>> Windows
>>>> 944
>>>> os::check_dump_limit(buffer,
>>>> sizeof(buffer));
>>>>
>>>> so the two
>>>> comments are out of sync.
>>>>
>>>> Will convert
>>>> them to be
>>>> consistent.
>>>>
>>>> src/share/vm/runtime/arguments.cpp
>>>> line
>>>> 3252: } else if
>>>> (match_option(option,
>>>> "-XX:+CreateMinidumpOnCrash", &tail)) {
>>>> line
>>>> 3256: } else if
>>>> (match_option(option,
>>>> "-XX:-CreateMinidumpOnCrash", &tail)) {
>>>> These two checks
>>>> should use the
>>>> match_option() version
>>>> without a '&tail'
>>>> parameter.
>>>>
>>>> Will use version
>>>> w/o tail.
>>>>
>>>> src/share/vm/utilities/vmError.cpp
>>>> line
>>>> 217: bool
>>>> VMError::coredump_status =
>>>> true; //
>>>> presume we
>>>> can dump core file first
>>>> I don't think you
>>>> should init this to true.
>>>> It confuses
>>>> things with
>>>> VMError::report_coredump_status(). It will
>>>> also
>>>> confuse this code:
>>>>
>>>> 526 STEP(63,
>>>> "(printing core file
>>>> information)")
>>>> 529 if
>>>> (coredump_status) {
>>>> 530
>>>> st->print("Core dump
>>>> written. Default
>>>> location: %s",
>>>> coredump_message);
>>>>
>>>> because
>>>> coredump_status won't
>>>> accurately
>>>> reflect whether
>>>> the
>>>> coredump_message field has
>>>> been set.
>>>>
>>>> line
>>>> 943: // check
>>>> core dump limits on
>>>> Linux/Solaris, nothing on
>>>> Windows
>>>> This updated
>>>> comment doesn't match the
>>>> unmodified
>>>> comment in os.hpp:
>>>>
>>>> 720: // On
>>>> Windows this will create an
>>>> actual
>>>> minidump, on
>>>> Linux/Solaris it will simply
>>>> check core
>>>> dump limits
>>>>
>>>> I will consider
>>>> your comments,
>>>> then revise with a new
>>>> version. I
>>>> remember here there
>>>> is a case, if status not
>>>> set, it will
>>>> output an
>>>> inconsistent message. Will
>>>> check
>>>> again.
>>>>
>>>> src/os/aix/vm/os_aix.cpp
>>>> No comments.
>>>>
>>>> src/os/bsd/vm/os_bsd.cpp
>>>> No comments.
>>>>
>>>> src/os/linux/vm/os_linux.cpp
>>>> No comments.
>>>>
>>>> src/os/posix/vm/os_posix.cpp
>>>> No comments.
>>>>
>>>> src/os/solaris/vm/os_solaris.cpp
>>>> No comments.
>>>>
>>>> src/os/windows/vm/os_windows.cpp
>>>> line
>>>> 1008: static char
>>>> buffer[O_BUFLEN];
>>>> Why switch from a
>>>> buffer parameter to a
>>>> static buffer?
>>>> What happens with
>>>> parallel calls to
>>>> abort()?
>>>> Will the static
>>>> buffer get stomped
>>>> by the competing threads?
>>>>
>>>> The original
>>>> buffer is static
>>>> too. Carrying an buffer
>>>> for abort seems
>>>> not right.
>>>> This buffer in fact only for
>>>> storing file
>>>> name (based on
>>>> pid) here.
>>>> abort() will
>>>> dump the core
>>>> file, should we prevent
>>>> multi-thread
>>>> calling to this
>>>> function? I did not check
>>>> this part, will
>>>> check if it is
>>>> MT-safe.
>>>>
>>>>
>>>> line
>>>> 1015: // If
>>>> running on a client version
>>>> of Windows
>>>> and user has
>>>> not explicitly enabled dumping
>>>> line
>>>> 1016: if
>>>> (!os::win32::is_windows_server() &&
>>>> !CreateCoredumpOnCrash) {
>>>> The default for
>>>> CreateCoredumpOnCrash is
>>>> now 'true'
>>>> so the
>>>> comment and logic
>>>> are no longer correct
>>>> here. The
>>>> Windows
>>>> Client VM will
>>>> generate minidumps by default.
>>>>
>>>> old line
>>>> 1007:
>>>> VMError::report_coredump_status("Minidumps
>>>> are not
>>>> enabled by
>>>> default on
>>>> client versions of Windows",
>>>> false);
>>>> new line
>>>> 1017:
>>>> jio_fprintf(stderr, "Minidumps
>>>> are not
>>>> enabled by default
>>>> on client versions of
>>>> Windows.\n");
>>>> There are a number
>>>> of places where we
>>>> change from
>>>> VMError::report_coredump_status() to
>>>> jio_fprintf()
>>>> and I'm not quite
>>>> seeing why this was done.
>>>>
>>>> Update:
>>>> VMError::report_coredump_status()
>>>> is misnamed.
>>>> It doesn't
>>>> report anything in
>>>> the sense that it
>>>> doesn't
>>>> print anything. It
>>>> does record two
>>>> pieces of information about
>>>> core dumps
>>>> so maybe
>>>> it should be
>>>> VMError::record_coredump_status().
>>>>
>>>> line 1100:
>>>> jio_fprintf(stderr, "%s.\n",
>>>> buffer);
>>>> At this point,
>>>> buffer contains the path to
>>>> the
>>>> mini-dump file so
>>>> the above line simply prints
>>>> that on stderr. Why?
>>>>
>>>> Yes, I see that
>>>> the old code did something
>>>> similar:
>>>>
>>>> 1090
>>>> VMError::report_coredump_status(buffer,
>>>> true);
>>>>
>>>> but
>>>> report_coredump_status() didn't
>>>> actually print
>>>> anything. It just
>>>> squirreled away these in
>>>> vmError.cpp:
>>>>
>>>> 217 bool
>>>> VMError::coredump_status;
>>>> 218 char
>>>> VMError::coredump_message[O_BUFLEN];
>>>>
>>>> See comments above
>>>> about how
>>>> report_coredump_status()
>>>> is misnamed.
>>>>
>>>> At this
>>>> point, I'm
>>>> convinced that all the
>>>> changes from
>>>> VMError::report_coredump_status() to
>>>> jio_fprintf() are
>>>> a bad idea.
>>>>
>>>>
>>>> Changing to
>>>> jio_fprintf is because
>>>> report_coredump_status did not
>>>> output anything, it is
>>>> just a logging
>>>> as you
>>>> indicated. It is reasonable we
>>>> change it to
>>>> record_coredump_status
>>>> instead. How about
>>>> add printout to
>>>> report_coredump_status? So I
>>>> do not
>>>> need to use
>>>> jio_printf here.
>>>>
>>>> Other
>>>> consideration of using
>>>> jio_fprintf after call
>>>> shutdown called,
>>>> the static
>>>> output stream still exists,
>>>> but not
>>>> guaranteed to work as
>>>> expected.
>>>>
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>> No comments.
>>>>
>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>> No comments.
>>>>
>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>> No comments.
>>>>
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>> No comments.
>>>>
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>> No comments.
>>>>
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>> No comments.
>>>>
>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>> No comments.
>>>>
>>>> test/runtime/Unsafe/RangeCheck.java
>>>> No comments.
>>>>
>>>> test/runtime/memory/ReadFromNoaccessArea.java
>>>> No comments.
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> On
>>>> 3/23/2015 11:48 AM,
>>>> Yumin Qi wrote:
>>>>
>>>> Since Thomas is
>>>> not a reviewer in open jdk,
>>>> I need two
>>>> volunteers (at least one
>>>> "R"eviewer).
>>>>
>>>> Dan, since you
>>>> already reviewed previous
>>>> version, could you
>>>> have a look?
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> On 3/20/2015 3:20
>>>> PM, Yumin Qi wrote:
>>>>
>>>> Thomas,
>>>>
>>>> Thanks. Yes,
>>>> it is webrev04. Also, I
>>>> have updated
>>>> webrev04 to add another
>>>> test case change:
>>>> test/runtime/memory/ReadFromNoaccessArea.java
>>>> (Thanks Dan's
>>>> update)
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> On 3/20/2015
>>>> 11:55 AM, Thomas Stüfe wrote:
>>>>
>>>>
>>>> Hi Yumin,
>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list