RFR(S): 8074354: tests that intentionally crash the VM can timeout creating a core file
Yumin Qi
yumin.qi at oracle.com
Tue Apr 14 05:13:40 UTC 2015
Dan, Thanks.
Changed to add CloseHandle(dumpFile) before all exit paths.
Please review at same link:
http://cr.openjdk.java.net/~minqi/8074354/webrev08/
Yumin
On 4/13/2015 7:04 PM, Daniel D. Daugherty wrote:
> Yumin,
>
> I would prefer that the handle is properly closed on all the
> exit paths. Not closing the handle might even trigger a future
> Microsoft sanity check; kind of like when Microsoft added checks
> for duplicate close() calls...
>
> Dan
>
>
> On 4/13/15 4:05 PM, Yumin Qi wrote:
>> Dan,
>>
>> On 4/13/2015 2:09 PM, Daniel D. Daugherty wrote:
>>> On 4/13/15 10:58 AM, Yumin Qi wrote:
>>>> Thomas,
>>>>
>>>> Answers embeded below. The new web is just the same link updated
>>>> (08):
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev08/
>>>
>>> src/share/vm/runtime/globals.hpp
>>> No comments.
>>>
>>> src/share/vm/runtime/os.hpp
>>> L720: // ON Posix compatible OS it will
>>> typo: 'ON' -> 'On'
>>>
>>> L724: ... or a short error message, depends
>>> grammar: 'depends' -> 'depending'
>>>
>>> L725: // on checking result.
>>> grammar: 'on checking' -> 'on the checking'
>>>
>>> src/share/vm/runtime/arguments.cpp
>>> No comments.
>>>
>>> src/share/vm/utilities/vmError.hpp
>>> No comments.
>>>
>>> src/share/vm/utilities/vmError.cpp
>>> No comments.
>>>
>>> src/os/aix/vm/os_aix.cpp
>>> src/os/bsd/vm/os_bsd.cpp
>>> src/os/linux/vm/os_linux.cpp
>>> src/os/posix/vm/os_posix.cpp
>>> src/os/solaris/vm/os_solaris.cpp
>>> No comments on the above five files.
>>>
>>> src/os/windows/vm/os_windows.cpp
>>> L1010: jio_snprintf(buffer, buffsz, "Failed to create coredump
>>> file (0x%x).", GetLastError());
>>> Because this is a Win* specific file you can use "minidump"
>>> here instead of "coredump".
>>>
>>> I made a similar comment in the previous round.
>>>
>> Yes, I will change 'coredump' into 'minidump'
>>> L1028: if (!dump_core || !dumpFile) {
>>> The "!dumpFile" part is an implied boolean which we don't
>>> like to use in HotSpot. Perhaps: dumpFile == NULL
>>>
>> Will change to Hotspot style.
>>> L1078: win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>>> The "CloseHandle(dumpFile)" call is missing.
>>>
>> There is no need to call CloseHandle here explicitly, since we will
>> exit next and all resources will be reclaimed by OS. I remove it at
>> last since think it is not needed here. If you check previous calls to
>> win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>> I did not place CloseHandle(dumpFile) either.
>> If needed, I think we should put the call before all exit entry.
>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>> No comments on the above two files.
>>>
>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>> No comments.
>>>
>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>> test/runtime/Unsafe/RangeCheck.java
>>> test/runtime/memory/ReadFromNoaccessArea.java
>>> No comments on the above six files.
>> Thanks
>> Yumin
>>
>>>
>>>
>>> Dan
>>>
>>>
>>>>
>>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>> On 4/13/2015 12:30 AM, Thomas Stüfe wrote:
>>>>> Hi Yumin,
>>>>>
>>>>> some small things:
>>>>>
>>>>> os_windows.cpp:1012
>>>>>
>>>>> + if (dumpFile == INVALID_HANDLE_VALUE) {
>>>>> + jio_snprintf(buffer, buffsz, "Failed to create coredump file");
>>>>> + status = false;
>>>>> + }
>>>>>
>>>>> Please print out the error number too:
>>>>> + jio_snprintf(buffer, buffsz, "Failed to create coredump
>>>>> file (0x%x).", ::GetLastError());
>>>>>
>>>>>
>>>> Changed as above. No :: global scope operator needed here.
>>>>>
>>>>> os_windows.cpp:1036
>>>>>
>>>>> + assert(dumpFile != NULL, "dumpFile should not be NULL");
>>>>>
>>>>> Please remove this assert. I would not do any asserts at this
>>>>> stage, we are dying anyway and a recursive assert would just
>>>>> confuse things. Moreover, CreateFile may have failed for "normal"
>>>>> reasons.
>>>>>
>>>> removed.
>>>>> Instead, if dumpFile is NULL, just skip the whole minidump stuff
>>>>> in os::abort; just behave the same way as for os::abort(false):
>>>>>
>>>>> + if (!dump_core || !dumpFile) {
>>>>> + win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>>>>> }
>>>>>
>>>>>
>>>> changed as expected.
>>>>>
>>>>> os_windows.cpp: 1076
>>>>>
>>>>> I would just skip the whole "FormatMessage" stuff. It is just
>>>>> sugar, but it will allocate buffers, which we may not want at this
>>>>> stage. I think the plain error code from GetLastError() is
>>>>> sufficient here. So I would just:
>>>>>
>>>>> 1076 // Older versions of dbghelp.dll (the one shipped with
>>>>> Win2003 for example) may not support all
>>>>> 1077 // the dump types we really want. If first call fails, lets
>>>>> fall back to just use MiniDumpWithFullMemory then.
>>>>> 1078 if (_MiniDumpWriteDump(hProcess, processId, dumpFile,
>>>>> dumpType, pmei, NULL, NULL) == false &&
>>>>> 1079 _MiniDumpWriteDump(hProcess, processId, dumpFile,
>>>>> (MINIDUMP_TYPE)MiniDumpWithFullMemory, pmei, NULL, NULL) == false) {
>>>>> jio_fprintf(stderr, "Call to MiniDumpWriteDump() failed
>>>>> (Error 0x%x)\n", ::GetLastError());
>>>>> 1089 } else {
>>>>> 1090 ....
>>>>>
>>>>>
>>>> Agree, this is the last step before exit, with output of last error
>>>> code, user can get what happened --- though a little effort to
>>>> research.
>>>>
>>>>> Rest is fine.
>>>>> Thomas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 13, 2015 at 7:59 AM, Yumin Qi <yumin.qi at oracle.com
>>>>> <mailto:yumin.qi at oracle.com>> wrote:
>>>>>
>>>>> Thomas and Dan,
>>>>>
>>>>> Please review the changes at
>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev08/
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev08/>
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>> On 4/10/2015 10:14 PM, Yumin Qi wrote:
>>>>>
>>>>> Thomas,
>>>>>
>>>>> Thanks for review again.
>>>>>
>>>>> On 4/10/2015 4:20 AM, Thomas Stüfe wrote:
>>>>>
>>>>> Hi Yumin,
>>>>>
>>>>> Hi Yumin,
>>>>>
>>>>> this looks mostly fine for me. I only was able to test
>>>>> it on Linux, but will test it on other OSes next week.
>>>>>
>>>>> Here two remaining small points:
>>>>>
>>>>> - in os::check_dump_limit, maybe a comment would be
>>>>> helpful, because the contract for this function is a
>>>>> bit difficult to understand. Proposal:
>>>>>
>>>>> "Check or prepare a core dump to be taken at a later
>>>>> point in the same thread in os::abort(). Use the
>>>>> caller provided buffer as a scratch buffer. Output
>>>>> status message via VMError::report_cordedump_status()
>>>>> (on success, the core dump file location, on error, a
>>>>> short error message) - this status message which will
>>>>> be written into the error log."
>>>>>
>>>>> I will take the comments, new comments:
>>>>>
>>>>> // ON Posix compatible OS it will simply check core dump
>>>>> limits while on Windows
>>>>> // it will check if dump file can be created. Check or
>>>>> prepare a core dump to be
>>>>> // taken at a later point in the same thread in
>>>>> os::abort(). Use the caller
>>>>> // provided buffer as a scratch buffer. The status
>>>>> message which will be written
>>>>> // into the error log either is file location or a short
>>>>> error message, depends
>>>>> // on checking result.
>>>>> static void check_dump_limit(char* buffer, size_t
>>>>> bufferSize);
>>>>>
>>>>> - It would make sense to improve the output in
>>>>> VMError.report() in STEP(63) a bit:
>>>>>
>>>>> Problem 1: we use past-tense, which is misleading
>>>>> ("Core dump written", "Failed to write core"). We did
>>>>> not generate the core dump yet, we just plan to do it,
>>>>> and it still may fail for various reasons (e.g. even
>>>>> if the limits are fine, the current directory may be
>>>>> write protected. There is no way to catch all errors
>>>>> beforehand). If the user reads "Core dump written", he
>>>>> assumes the core dump was written successfully. If
>>>>> abort(3) later fails to write the core dump, user will
>>>>> be confused.
>>>>>
>>>>> Will change the message to
>>>>> if (coredump_status) {
>>>>> st->print("Core dump will be written. %s",
>>>>> coredump_message);
>>>>> } else {
>>>>> st->print("No core dump will be written. %s",
>>>>> coredump_message);
>>>>> }
>>>>>
>>>>> as your suggested.
>>>>> Whether the coredump can be successfully created depends
>>>>> on following core file generation procedure. User can
>>>>> trace error messages if failed to create.
>>>>>
>>>>> Problem 2: this is more obscure: on Linux core dumps
>>>>> may not be written to the file system at all but be
>>>>> processed by some other program. On my Ubuntu 14.4
>>>>> machine I see this output:
>>>>>
>>>>> #
>>>>> # Core dump written. Default location: Core dumps may
>>>>> be processed with "/usr/share/apport/apport %p %s %c
>>>>> %P" (or dumping to
>>>>> /shared/projects/openjdk/jdk9-hs-rt/output/core.26261)
>>>>> #
>>>>>
>>>>> This output is a bit jarring.
>>>>>
>>>>> I have not tested on Ubuntu so no comments.
>>>>>
>>>>> I will post a new webrev based on your and Dan's comments.
>>>>> Thanks for your patience.
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>> My suggestion for better texts would be:
>>>>>
>>>>> "Core dump will be written. %s"
>>>>> "No core dump will be written. %s"
>>>>>
>>>>> which hopefully fits all cases better. But I don't
>>>>> know, other suggestions are welcome :-)
>>>>>
>>>>> Otherwise, this looks fine.
>>>>>
>>>>> Thanks for your work!
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Fri, Apr 10, 2015 at 5:18 AM, Yumin Qi
>>>>> <yumin.qi at oracle.com <mailto:yumin.qi at oracle.com>
>>>>> <mailto:yumin.qi at oracle.com
>>>>> <mailto:yumin.qi at oracle.com>>> wrote:
>>>>>
>>>>> Hi, Thomas and Dan
>>>>>
>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>>
>>>>> On 4/9/2015 7:51 PM, Yumin Qi wrote:
>>>>>
>>>>> Please hold on, I missed the test cases:
>>>>>
>>>>> *test/runtime/Unsafe/RangeCheck.java
>>>>> *
>>>>> *test/runtime/memory/ReadFromNoaccessArea.java
>>>>>
>>>>> Added.
>>>>>
>>>>> Also some file heads need to be updated with
>>>>> this year.
>>>>>
>>>>> Changed
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>> *
>>>>> On 4/9/2015 7:07 PM, Yumin Qi wrote:
>>>>>
>>>>> Hi, Thomas and Dan
>>>>>
>>>>> The same URL is updated with new version.
>>>>>
>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>>
>>>>> I remove the public access to VMError::out,
>>>>> VMError::coredump_message and the size of
>>>>> VMError::coredump_message.
>>>>> Added private static HANDLE in
>>>>> os_windows.cpp for
>>>>> dumpFile as suggested by Thomas.
>>>>>
>>>>> Tests passes JPRT and manual tests.
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>> On 4/9/2015 1:32 AM, Thomas Stüfe wrote:
>>>>>
>>>>> Hi Yumin,
>>>>>
>>>>> I do not think exposing
>>>>> VMError::coredump_status
>>>>> buffer just for the sake of using it
>>>>> as a scratch
>>>>> buffer in os::abort() is a good idea.
>>>>> It is very
>>>>> confusing and has nothing to do with
>>>>> the purpose of
>>>>> VMerror::coredump_status.
>>>>>
>>>>> Also, VMerror::coredump_status is
>>>>> static, so you do
>>>>> not even gain anything over just using
>>>>> a global or
>>>>> function scope static buffer. Either
>>>>> we are worried
>>>>> about thread safety in os::abort(), or
>>>>> we aren't. If
>>>>> we aren't, just use a local static
>>>>> buffer, it is much
>>>>> clearer and no need to expose
>>>>> VMError::coredump_status
>>>>> like this.
>>>>>
>>>>> Apart from that, see my remarks in
>>>>> yesterdays mail.
>>>>>
>>>>> Cheers, Thomas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 9, 2015 at 5:16 AM, Yumin Qi
>>>>> <yumin.qi at oracle.com
>>>>> <mailto:yumin.qi at oracle.com>
>>>>> <mailto:yumin.qi at oracle.com <mailto:yumin.qi at oracle.com>>
>>>>> <mailto:yumin.qi at oracle.com
>>>>> <mailto:yumin.qi at oracle.com>
>>>>> <mailto:yumin.qi at oracle.com
>>>>> <mailto:yumin.qi at oracle.com>>>> wrote:
>>>>>
>>>>> Hi, Dan
>>>>>
>>>>> New webrev at
>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>>>
>>>>> To make code clean, add
>>>>> os::abort_internal which
>>>>> is only used
>>>>> for Windows.
>>>>> Since passing
>>>>> VMError::coredump_message and its
>>>>> size as
>>>>> parameters across multiple
>>>>> functions makes
>>>>> function have a bigger
>>>>> signature, I change the way to
>>>>> access them -- add
>>>>> two functions in
>>>>> VMError to access them:
>>>>> coredump_msg_buffer() and
>>>>> coredump_msg_buffer_size(). They
>>>>> are static
>>>>> members, so can save
>>>>> stack space for passing around.
>>>>>
>>>>> in os_windows.cpp, abort(bool,
>>>>> void*, void*):
>>>>> if error encountered, exit with
>>>>> error messages.
>>>>> If no error,
>>>>> VMError::coredump_message contains
>>>>> the dump
>>>>> path/name. I just use them so no
>>>>> need to have
>>>>> extra space for name
>>>>> allocation.
>>>>>
>>>>> Passed JPRT and manual tests.
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>> On 4/7/2015 9:23 AM, Daniel D.
>>>>> Daugherty wrote:
>>>>>
>>>>> On 4/1/15 10:57 AM, Yumin Qi
>>>>> wrote:
>>>>>
>>>>> Hi, Dan and Thomas,
>>>>>
>>>>> I have a new webrev at
>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev06/
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>>>
>>>>>
>>>>>
>>>>> src/share/vm/runtime/globals.hpp
>>>>> Sorry, I didn't notice
>>>>> this before:
>>>>>
>>>>> L939: product(bool,
>>>>> CreateCoredumpOnCrash,
>>>>> true, \
>>>>> L940: "Create minidump on
>>>>> VM fatal
>>>>> error") \
>>>>>
>>>>> The "minidump" on L940
>>>>> should change
>>>>> also. Don't know if you
>>>>> want to say "core/mini
>>>>> dump" or
>>>>> something else.
>>>>>
>>>>>
>>>>> src/share/vm/runtime/os.hpp
>>>>> L719: // On
>>>>> Linux/Solaris it will simply
>>>>> check core dump
>>>>> limits while on Windows will
>>>>> do nothing.
>>>>> grammar: 'on Windows
>>>>> will do nothing'
>>>>> -> 'on Windows
>>>>> it will do nothing'
>>>>>
>>>>>
>>>>> src/share/vm/runtime/arguments.cpp
>>>>> No comments.
>>>>>
>>>>> src/share/vm/utilities/vmError.hpp
>>>>> No comments.
>>>>>
>>>>> src/share/vm/utilities/vmError.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/aix/vm/os_aix.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/bsd/vm/os_bsd.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/linux/vm/os_linux.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/posix/vm/os_posix.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/solaris/vm/os_solaris.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/windows/vm/os_windows.cpp
>>>>> L92: jio_snprintf(buffer,
>>>>> bufferSize,
>>>>> "%s\\hs_err_pid%u.mdmp",
>>>>> get_current_directory(NULL, 0),
>>>>> current_process_id());
>>>>> Other non-Windows calls to
>>>>> get_current_directory() do a NULL
>>>>> check before using the
>>>>> return value.
>>>>> The original Windows
>>>>> code that called
>>>>> get_current_directory() did not
>>>>> make this
>>>>> check, but I think
>>>>> it's better to be
>>>>> consistent and safe.
>>>>>
>>>>> L1007: static char
>>>>> buffer[O_BUFLEN];
>>>>> L1008: char
>>>>> filename[FILENAME_MAX];
>>>>> Two more potentially
>>>>> large stack
>>>>> variables here. Will we
>>>>> run into stack
>>>>> overflow on Win* more
>>>>> frequently when what
>>>>> we really wanted was
>>>>> the reason for
>>>>> the abort() call?
>>>>>
>>>>> The reason the
>>>>> original code used
>>>>> 'buffer' to construct the
>>>>> dump file name was to
>>>>> save space. The
>>>>> original code was also
>>>>> careful to not need
>>>>> the file name in
>>>>> any error messages after
>>>>> the point at which the
>>>>> dump file was
>>>>> successfully created.
>>>>>
>>>>> It would be better
>>>>> (more resilient) if
>>>>> you could pass in
>>>>> an already existing
>>>>> buffer to abort()
>>>>> like the original
>>>>> code passed into
>>>>> os::check_or_create_dump(). Yes, that
>>>>> would require visiting
>>>>> all existing
>>>>> calls to abort().
>>>>>
>>>>> L1015: assert(out->is_open(),
>>>>> "defaultStream should be open");
>>>>> Is it possible for
>>>>> os::abort() to be
>>>>> called early enough on
>>>>> Windows such that
>>>>> VMError::out is not
>>>>> actually setup? Yes,
>>>>> this would have to be
>>>>> absolutely
>>>>> catastrophic...
>>>>>
>>>>> See the comment in
>>>>> os_solaris.cpp
>>>>> L1520-1522.
>>>>>
>>>>> L1018: jio_snprintf(buffer,
>>>>> sizeof(buffer), "%s", "Either
>>>>> CreateCoredumpOnCrash is
>>>>> disabled from command"
>>>>> L1019 " line or coredump
>>>>> is not available");
>>>>> When you are using
>>>>> jio_snprintf() as a
>>>>> replacement for:
>>>>>
>>>>> strncpy(buffer, "my string",
>>>>> sizeof(buffer));
>>>>> buffer[sizeof(buffer) - 1] = '\0';
>>>>>
>>>>> you don't need the
>>>>> "%s" format string.
>>>>> So this code would be:
>>>>>
>>>>> jio_snprintf(buffer, sizeof(buffer),
>>>>> "Either
>>>>> CreateCoredumpOnCrash is disabled
>>>>> from "
>>>>> "command line or coredump
>>>>> is not available");
>>>>>
>>>>> Same comment for:
>>>>> L1026, L1039
>>>>>
>>>>> L1020: goto done;
>>>>> The use of 'goto'
>>>>> makes me cringe. You
>>>>> can refactor much
>>>>> of this logic into an
>>>>> abort_internal()
>>>>> function that
>>>>> abort() calls and keep
>>>>> all the
>>>>> original "return" logic
>>>>> to bail out.
>>>>>
>>>>> L1045: // Older versions
>>>>> of dbghelp.h
>>>>> doesn't contain all
>>>>> grammar: 'doesn't' ->
>>>>> 'do not'
>>>>>
>>>>> L1052: cwd =
>>>>> get_current_directory(NULL, 0);
>>>>> L1053: jio_snprintf(filename,
>>>>> sizeof(filename),
>>>>> "%s\\hs_err_pid%u.mdmp", cwd,
>>>>> current_process_id());
>>>>> Other non-Windows calls to
>>>>> get_current_directory() do a NULL
>>>>> check before using the
>>>>> return value.
>>>>> The original Windows
>>>>> code that called
>>>>> get_current_directory() did not
>>>>> make this
>>>>> check, but I think
>>>>> it's better to be
>>>>> consistent and safe.
>>>>>
>>>>> L1092: // well done.
>>>>> Like a overcooked
>>>>> steak? :-) Not sure
>>>>> what you're trying
>>>>> to say here...
>>>>>
>>>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>>> No comments on these four.
>>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>> What changes:
>>>>>
>>>>> 1) os_windows.cpp,
>>>>> os::abort(bool ....):
>>>>> a) Since when
>>>>> abort() is called, the
>>>>> last function called
>>>>> before exit process,
>>>>> VMError::log already closed, which
>>>>> is the file logging.
>>>>> defaultStream still
>>>>> available to use, so
>>>>> use it for writing error
>>>>> or success
>>>>> messages. VMError::out is
>>>>> the defaultStream, for
>>>>> using it, added a
>>>>> function out_stream() in VMError.
>>>>> b) Delete the codes
>>>>> which decide to
>>>>> write core based on
>>>>> client/server version with
>>>>> debug. The new
>>>>> code is if
>>>>> CreateCoredumpOnCrash set or dump_core is
>>>>> on, the coredump will
>>>>> be tried no matter which
>>>>> version it is.
>>>>> There is no special
>>>>> reason not to dump core
>>>>> with client
>>>>> version I think. Any concern
>>>>> for this change?
>>>>> c) Use of 'goto' in
>>>>> this function. I
>>>>> tried not to use it,
>>>>> but using it makes code
>>>>> clear. I remember
>>>>> somewhere suggest not
>>>>> using 'goto' in cplusplus.
>>>>>
>>>>> 2) changed comments to
>>>>> make them
>>>>> consistent as indicated by Dan.
>>>>>
>>>>> 3) Added/modified head
>>>>> version for files.
>>>>>
>>>>> Tests: JPRT, manual
>>>>> test on local Windows.
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>> On 3/31/2015 1:25 AM,
>>>>> Thomas Stüfe wrote:
>>>>>
>>>>> Hi Daniel, Yumin,
>>>>>
>>>>> sorry, a lot of Yumins
>>>>> changes were
>>>>> based on suggestions from
>>>>> me, so here more
>>>>> background:
>>>>>
>>>>> Yumin reversed the
>>>>> order of error log
>>>>> writing and core dumping.
>>>>> The reason for that
>>>>> are explained in
>>>>> detail at the start of the
>>>>> mail thread, in short:
>>>>>
>>>>> - on Windows core file
>>>>> dumping may
>>>>> hang - its rare but it
>>>>> happens - preventing
>>>>> error logs.
>>>>> Depending on who you ask,
>>>>> error logs are more
>>>>> important than
>>>>> minidumps, so it makes sense
>>>>> to first write the
>>>>> erorr log.
>>>>> - This also brings
>>>>> windows core paths
>>>>> closer to UNIX code
>>>>> paths. See below.
>>>>>
>>>>> About writing to
>>>>> stderr in os::abort():
>>>>>
>>>>> After this change, calling
>>>>> VmError::report_coredump_status() in
>>>>> os::abort() will not
>>>>> do anything
>>>>> because the error log is
>>>>> already written at
>>>>> that point. I
>>>>> suggested to Yumin writing to
>>>>> stderr instead in
>>>>> os::abort() because
>>>>> that mimicks UNIX
>>>>> behaviour: On UNIX,
>>>>> you call abort(3),
>>>>> which writes a core and
>>>>> aborts. It writes a
>>>>> short message to
>>>>> stderr "Aborted (core
>>>>> dumped)" or just
>>>>> "Aborted".
>>>>>
>>>>> After Yumins change,
>>>>> on Windows
>>>>> os::abort also writes the
>>>>> core,
>>>>> then aborts. It made
>>>>> sense to me that
>>>>> this function should
>>>>> behave the same way as
>>>>> on UNIX: if
>>>>> core writing fails, write to
>>>>> stderr. There is no
>>>>> way otherwise to
>>>>> communicate a core dump
>>>>> writing failure. After
>>>>> writing the
>>>>> core, process stops.
>>>>>
>>>>> ---
>>>>>
>>>>> About the control flows:
>>>>>
>>>>> Before Yumins change:
>>>>>
>>>>> On Windows
>>>>> 1)
>>>>> os::check_or_create_dump(): we
>>>>> wrote the core dump and then
>>>>> information about the
>>>>> dump (file
>>>>> location, success or error)
>>>>> was handed to the misnamed
>>>>> VmError::report_coredump_status(),
>>>>> which just stored that
>>>>> information.
>>>>> 2) VmError::report():
>>>>> Error log was
>>>>> written and in step 63
>>>>> information about the
>>>>> core (already
>>>>> written at that point) was
>>>>> added to error log.
>>>>> 3) os::abort() just
>>>>> did an exit(3)
>>>>>
>>>>> On Unix,
>>>>> 1)
>>>>> os::check_or_create_dump(): checks
>>>>> the limits to guess
>>>>> whether core dumping
>>>>> later will
>>>>> succeed. Again, that
>>>>> information is handed to
>>>>> VmError::report_coredump_status()
>>>>> 2) VmError::report():
>>>>> Error log is
>>>>> written and in step 63,
>>>>> information about the
>>>>> core's probable
>>>>> future location is added
>>>>> to error log. Here,
>>>>> messages use past
>>>>> tense, which is
>>>>> misleading, because
>>>>> the core is not
>>>>> yet written.
>>>>> 3) os::abort() calls
>>>>> abort(3), which
>>>>> stops and writes a core.
>>>>>
>>>>> Yumin pushed core file
>>>>> writing on
>>>>> Windows down to os::abort().
>>>>> This brings the
>>>>> control flow much
>>>>> closer to Unix:
>>>>>
>>>>> (1) call
>>>>> os::check_dump_limit() to
>>>>> check the prerequisites for
>>>>> core file writing and
>>>>> gather a bit
>>>>> information to put in the
>>>>> error log: On Unix,
>>>>> limit checks, on
>>>>> windows, so far nothing
>>>>> much but
>>>>> precalculating the error file
>>>>> name.
>>>>> (2) VmError::report():
>>>>> Error log is
>>>>> written and information
>>>>> about the core's
>>>>> probable location is
>>>>> added to error log.
>>>>> (3) os::abort() is
>>>>> called, which on
>>>>> all platforms:
>>>>> - dumps the core
>>>>> - if failing, writes
>>>>> a one-liner to
>>>>> stderr
>>>>> - stops the process.
>>>>>
>>>>> I think, if one agrees
>>>>> that reversing
>>>>> order of core dumping and
>>>>> error log writing on
>>>>> windows is a good
>>>>> thing, this change looks
>>>>> ok to me. Some
>>>>> improvements could make
>>>>> it clearer:
>>>>>
>>>>> - maybe rename
>>>>> "os::check_dump_limit()" (was my suggestion,
>>>>> sorry) to
>>>>> "os::check_core_prerequisites()" - that
>>>>> leaves room
>>>>> to flesh it out a bit
>>>>> more on Windows,
>>>>> where we do not have
>>>>> UNIX limits.
>>>>> - make the messages in
>>>>> VMError::report() future
>>>>> sense: "Will
>>>>> write core file to
>>>>> location.....".
>>>>> - Make the messages
>>>>> written in
>>>>> os::abort() on Windows
>>>>> clearer,
>>>>> and shorter, and we
>>>>> also do not need
>>>>> the buffer to be static
>>>>> here. Actually, if we
>>>>> want to be as on
>>>>> UNIX, just dumping "core
>>>>> file writing
>>>>> succeeded" or "failed" -
>>>>> maybe with an error
>>>>> number from
>>>>> GetLastError() - will be
>>>>> enough because the file
>>>>> location is already
>>>>> printed in the
>>>>> error log. Like on UNIX.
>>>>>
>>>>> Kind Regards, Thomas
>>>>>
>>>>>
>>>>> On Mon, Mar 30, 2015
>>>>> at 9:51 PM,
>>>>> Daniel D. Daugherty
>>>>>
>>>>> <daniel.daugherty at oracle.com
>>>>> <mailto:daniel.daugherty at oracle.com>
>>>>> <mailto:daniel.daugherty at oracle.com
>>>>> <mailto:daniel.daugherty at oracle.com>>
>>>>> <mailto:daniel.daugherty at oracle.com
>>>>> <mailto:daniel.daugherty at oracle.com>
>>>>> <mailto:daniel.daugherty at oracle.com
>>>>> <mailto:daniel.daugherty at oracle.com>>>> wrote:
>>>>>
>>>>> Re:
>>>>> report_coredump_status() is
>>>>> really
>>>>> record_coredump_status()
>>>>>
>>>>> The
>>>>> report_coredump_status()
>>>>> function is designed to
>>>>> record two
>>>>> things about core
>>>>> dumps for later
>>>>> reporting by the code that
>>>>> generates the
>>>>> hs_err_pid file.
>>>>> That's why the original
>>>>> report_coredump_status() didn't
>>>>> output anything to stderr.
>>>>>
>>>>> By changing the
>>>>> Windows calls to
>>>>> report_coredump_status() into
>>>>> jio_fprintf() calls you have:
>>>>>
>>>>> - put output onto
>>>>> stderr that
>>>>> should go to the
>>>>> hs_err_pid file
>>>>> - made the windows
>>>>> code paths work
>>>>> differently than the
>>>>> non-windows
>>>>> code paths
>>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>> On 3/30/15 1:28
>>>>> PM, Yumin Qi wrote:
>>>>>
>>>>> Thanks Dan
>>>>>
>>>>> On 3/30/2015
>>>>> 11:25 AM, Daniel
>>>>> D. Daugherty wrote:
>>>>>
>>>>> On 3/25/15
>>>>> 12:11 PM, Yumin
>>>>> Qi wrote:
>>>>>
>>>>> Hi, all
>>>>>
>>>>> I updated the
>>>>> webrev with a new change to
>>>>> test case:
>>>>> test/runtime/Unsafe/RangeCheck.java
>>>>>
>>>>> Add
>>>>> -XX:-CreateCoredumpOnCrash to test, no
>>>>> dump needed on this case.
>>>>>
>>>>> new webrev:
>>>>> http://cr.openjdk.java.net/~minqi/8074354/webrev05/
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>>> <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>>>
>>>>>
>>>>> General nit - please
>>>>> update copyright lines to 2015
>>>>> as needed
>>>>>
>>>>> src/share/vm/runtime/globals.hpp
>>>>> No comments.
>>>>>
>>>>> src/share/vm/runtime/os.hpp
>>>>> line 720: // On
>>>>> Windows this will create an
>>>>> actual minidump, on
>>>>> Linux/Solaris it will simply
>>>>> check core dump limits
>>>>> line 721: static
>>>>> void check_dump_limit(char*
>>>>> buffer, size_t bufferSize);
>>>>> The comment on
>>>>> line 720 no longer matches
>>>>> with the
>>>>> function
>>>>> rename from
>>>>> check_or_create_dump() to
>>>>> check_dump_limit().
>>>>>
>>>>> You have this
>>>>> comment in vmError.cpp:
>>>>> 943 // check
>>>>> core dump limits on
>>>>> Linux/Solaris, nothing on
>>>>> Windows
>>>>> 944
>>>>> os::check_dump_limit(buffer,
>>>>> sizeof(buffer));
>>>>>
>>>>> so the two
>>>>> comments are out of sync.
>>>>>
>>>>> Will convert
>>>>> them to be
>>>>> consistent.
>>>>>
>>>>> src/share/vm/runtime/arguments.cpp
>>>>> line 3252: } else if
>>>>> (match_option(option,
>>>>> "-XX:+CreateMinidumpOnCrash", &tail)) {
>>>>> line 3256: } else if
>>>>> (match_option(option,
>>>>> "-XX:-CreateMinidumpOnCrash", &tail)) {
>>>>> These two checks
>>>>> should use the
>>>>> match_option() version
>>>>> without a '&tail'
>>>>> parameter.
>>>>>
>>>>> Will use
>>>>> version w/o tail.
>>>>>
>>>>> src/share/vm/utilities/vmError.cpp
>>>>> line 217: bool
>>>>> VMError::coredump_status =
>>>>> true; // presume we
>>>>> can dump core file first
>>>>> I don't think you
>>>>> should init this to true.
>>>>> It confuses
>>>>> things with
>>>>> VMError::report_coredump_status(). It will
>>>>> also
>>>>> confuse this code:
>>>>>
>>>>> 526 STEP(63,
>>>>> "(printing core file
>>>>> information)")
>>>>> 529 if
>>>>> (coredump_status) {
>>>>> 530
>>>>> st->print("Core dump
>>>>> written. Default
>>>>> location: %s",
>>>>> coredump_message);
>>>>>
>>>>> because
>>>>> coredump_status won't
>>>>> accurately
>>>>> reflect whether
>>>>> the
>>>>> coredump_message field has
>>>>> been set.
>>>>>
>>>>> line 943: // check
>>>>> core dump limits on
>>>>> Linux/Solaris, nothing on
>>>>> Windows
>>>>> This updated
>>>>> comment doesn't match the
>>>>> unmodified
>>>>> comment in os.hpp:
>>>>>
>>>>> 720: // On
>>>>> Windows this will create an
>>>>> actual minidump, on
>>>>> Linux/Solaris it will simply
>>>>> check core dump limits
>>>>>
>>>>> I will
>>>>> consider your comments,
>>>>> then revise with a new
>>>>> version. I
>>>>> remember here there
>>>>> is a case, if status not
>>>>> set, it will
>>>>> output an
>>>>> inconsistent message. Will
>>>>> check
>>>>> again.
>>>>>
>>>>> src/os/aix/vm/os_aix.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/bsd/vm/os_bsd.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/linux/vm/os_linux.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/posix/vm/os_posix.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/solaris/vm/os_solaris.cpp
>>>>> No comments.
>>>>>
>>>>> src/os/windows/vm/os_windows.cpp
>>>>> line 1008: static char
>>>>> buffer[O_BUFLEN];
>>>>> Why switch from a
>>>>> buffer parameter to a
>>>>> static buffer?
>>>>> What happens with
>>>>> parallel calls to
>>>>> abort()? Will the static
>>>>> buffer get stomped
>>>>> by the competing threads?
>>>>>
>>>>> The original
>>>>> buffer is static
>>>>> too. Carrying an buffer
>>>>> for abort
>>>>> seems not right.
>>>>> This buffer in fact only for
>>>>> storing file
>>>>> name (based on
>>>>> pid) here.
>>>>> abort() will
>>>>> dump the core
>>>>> file, should we prevent
>>>>> multi-thread calling to this
>>>>> function? I did not check
>>>>> this part,
>>>>> will check if it is
>>>>> MT-safe.
>>>>>
>>>>>
>>>>> line 1015: // If
>>>>> running on a client version
>>>>> of Windows
>>>>> and user has
>>>>> not explicitly enabled dumping
>>>>> line 1016: if
>>>>> (!os::win32::is_windows_server() &&
>>>>> !CreateCoredumpOnCrash) {
>>>>> The default for
>>>>> CreateCoredumpOnCrash is
>>>>> now 'true'
>>>>> so the
>>>>> comment and logic
>>>>> are no longer correct
>>>>> here. The Windows
>>>>> Client VM will
>>>>> generate minidumps by default.
>>>>>
>>>>> old line 1007:
>>>>> VMError::report_coredump_status("Minidumps
>>>>> are not
>>>>> enabled by default on
>>>>> client versions of Windows",
>>>>> false);
>>>>> new line 1017:
>>>>> jio_fprintf(stderr, "Minidumps
>>>>> are not
>>>>> enabled by default
>>>>> on client versions of
>>>>> Windows.\n");
>>>>> There are a number
>>>>> of places where we
>>>>> change from
>>>>> VMError::report_coredump_status() to
>>>>> jio_fprintf()
>>>>> and I'm not quite
>>>>> seeing why this was done.
>>>>>
>>>>> Update:
>>>>> VMError::report_coredump_status()
>>>>> is
>>>>> misnamed. It doesn't
>>>>> report anything in
>>>>> the sense that it
>>>>> doesn't print anything. It
>>>>> does record two
>>>>> pieces of information about
>>>>> core dumps
>>>>> so maybe
>>>>> it should be
>>>>> VMError::record_coredump_status().
>>>>>
>>>>> line 1100:
>>>>> jio_fprintf(stderr,
>>>>> "%s.\n", buffer);
>>>>> At this point,
>>>>> buffer contains the path
>>>>> to the
>>>>> mini-dump file so
>>>>> the above line simply prints
>>>>> that on stderr. Why?
>>>>>
>>>>> Yes, I see that
>>>>> the old code did something
>>>>> similar:
>>>>>
>>>>> 1090
>>>>> VMError::report_coredump_status(buffer,
>>>>> true);
>>>>>
>>>>> but
>>>>> report_coredump_status() didn't
>>>>> actually print
>>>>> anything. It just
>>>>> squirreled away these in
>>>>> vmError.cpp:
>>>>>
>>>>> 217 bool
>>>>> VMError::coredump_status;
>>>>> 218 char
>>>>> VMError::coredump_message[O_BUFLEN];
>>>>>
>>>>> See comments above
>>>>> about how
>>>>> report_coredump_status()
>>>>> is misnamed.
>>>>>
>>>>> At this point, I'm
>>>>> convinced that all the
>>>>> changes from
>>>>> VMError::report_coredump_status() to
>>>>> jio_fprintf() are
>>>>> a bad idea.
>>>>>
>>>>>
>>>>> Changing to
>>>>> jio_fprintf is because
>>>>> report_coredump_status did not
>>>>> output anything, it is
>>>>> just a logging
>>>>> as you
>>>>> indicated. It is reasonable we
>>>>> change it to
>>>>> record_coredump_status instead. How about
>>>>> add printout to
>>>>> report_coredump_status? So I do not
>>>>> need to use
>>>>> jio_printf here.
>>>>>
>>>>> Other
>>>>> consideration of using
>>>>> jio_fprintf after call
>>>>> shutdown
>>>>> called, the static
>>>>> output stream still exists,
>>>>> but not
>>>>> guaranteed to work as
>>>>> expected.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>> test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/Unsafe/RangeCheck.java
>>>>> No comments.
>>>>>
>>>>> test/runtime/memory/ReadFromNoaccessArea.java
>>>>> No comments.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>> On 3/23/2015 11:48 AM,
>>>>> Yumin Qi wrote:
>>>>>
>>>>> Since Thomas is
>>>>> not a reviewer in open jdk,
>>>>> I need two
>>>>> volunteers (at least one
>>>>> "R"eviewer).
>>>>>
>>>>> Dan, since you
>>>>> already reviewed previous
>>>>> version, could you
>>>>> have a look?
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>> On 3/20/2015 3:20
>>>>> PM, Yumin Qi wrote:
>>>>>
>>>>> Thomas,
>>>>>
>>>>> Thanks. Yes,
>>>>> it is webrev04. Also, I
>>>>> have updated
>>>>> webrev04 to add another
>>>>> test case change:
>>>>> test/runtime/memory/ReadFromNoaccessArea.java
>>>>> (Thanks Dan's
>>>>> update)
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>> On 3/20/2015
>>>>> 11:55 AM, Thomas Stüfe wrote:
>>>>>
>>>>>
>>>>> Hi Yumin,
>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list