RFR(S): 8074354: tests that intentionally crash the VM can timeout creating a core file

Yumin Qi yumin.qi at oracle.com
Mon Apr 13 22:05:40 UTC 2015


Dan,

On 4/13/2015 2:09 PM, Daniel D. Daugherty wrote:
> On 4/13/15 10:58 AM, Yumin Qi wrote:
>> Thomas,
>>
>>   Answers embeded below. The new web is just the same link updated (08):
>>
>> http://cr.openjdk.java.net/~minqi/8074354/webrev08/
>
> src/share/vm/runtime/globals.hpp
>     No comments.
>
> src/share/vm/runtime/os.hpp
>     L720: // ON Posix compatible OS it will
>         typo: 'ON' -> 'On'
>
>     L724: ... or a short error message, depends
>         grammar: 'depends' -> 'depending'
>
>     L725: // on checking result.
>         grammar: 'on checking' -> 'on the checking'
>
> src/share/vm/runtime/arguments.cpp
>     No comments.
>
> src/share/vm/utilities/vmError.hpp
>     No comments.
>
> src/share/vm/utilities/vmError.cpp
>     No comments.
>
> src/os/aix/vm/os_aix.cpp
> src/os/bsd/vm/os_bsd.cpp
> src/os/linux/vm/os_linux.cpp
> src/os/posix/vm/os_posix.cpp
> src/os/solaris/vm/os_solaris.cpp
>     No comments on the above five files.
>
> src/os/windows/vm/os_windows.cpp
>     L1010: jio_snprintf(buffer, buffsz, "Failed to create coredump 
> file (0x%x).", GetLastError());
>         Because this is a Win* specific file you can use "minidump"
>         here instead of "coredump".
>
>         I made a similar comment in the previous round.
>
Yes, I will change 'coredump' into 'minidump'
>     L1028: if (!dump_core || !dumpFile) {
>         The "!dumpFile" part is an implied boolean which we don't
>         like to use in HotSpot. Perhaps: dumpFile == NULL
>
Will change to Hotspot style.
>     L1078: win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>         The "CloseHandle(dumpFile)" call is missing.
>
There is no need to call CloseHandle here explicitly, since we will exit 
next and all resources will be reclaimed by OS. I remove it at last 
since think it is not needed here. If you check previous calls to
   win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
   I did not place CloseHandle(dumpFile) either.
   If needed, I think we should put the call before all exit entry.
> test/runtime/ErrorHandling/ProblematicFrameTest.java
> test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>     No comments on the above two files.
>
> test/runtime/ErrorHandling/SecondaryErrorTest.java
>     No comments.
>
> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
> test/runtime/Unsafe/RangeCheck.java
> test/runtime/memory/ReadFromNoaccessArea.java
>     No comments on the above six files.
Thanks
Yumin

>
>
> Dan
>
>
>>
>>
>> Thanks
>> Yumin
>>
>> On 4/13/2015 12:30 AM, Thomas Stüfe wrote:
>>> Hi Yumin,
>>>
>>> some small things:
>>>
>>> os_windows.cpp:1012
>>>
>>> +    if (dumpFile == INVALID_HANDLE_VALUE) {
>>> +      jio_snprintf(buffer, buffsz, "Failed to create coredump file");
>>> +      status = false;
>>> +    }
>>>
>>> Please print out the error number too:
>>> +      jio_snprintf(buffer, buffsz, "Failed to create coredump file 
>>> (0x%x).", ::GetLastError());
>>>
>>>
>> Changed as above. No :: global scope operator needed here.
>>>
>>> os_windows.cpp:1036
>>>
>>> +  assert(dumpFile != NULL, "dumpFile should not be NULL");
>>>
>>> Please remove this assert. I would not do any asserts at this stage, 
>>> we are dying anyway and a recursive assert would just confuse 
>>> things. Moreover, CreateFile may have failed for "normal" reasons.
>>>
>> removed.
>>> Instead, if dumpFile is NULL, just skip the whole minidump stuff in 
>>> os::abort; just behave the same way as for os::abort(false):
>>>
>>> +  if (!dump_core || !dumpFile) {
>>> +    win32::exit_process_or_thread(win32::EPT_PROCESS, 1);
>>>    }
>>>
>>>
>> changed as expected.
>>>
>>> os_windows.cpp: 1076
>>>
>>> I would just skip the whole "FormatMessage" stuff. It is just sugar, 
>>> but it will allocate buffers, which we may not want at this stage. I 
>>> think the plain error code from GetLastError() is sufficient here. 
>>> So I would just:
>>>
>>> 1076   // Older versions of dbghelp.dll (the one shipped with 
>>> Win2003 for example) may not support all
>>> 1077   // the dump types we really want. If first call fails, lets 
>>> fall back to just use MiniDumpWithFullMemory then.
>>> 1078   if (_MiniDumpWriteDump(hProcess, processId, dumpFile, 
>>> dumpType, pmei, NULL, NULL) == false &&
>>> 1079       _MiniDumpWriteDump(hProcess, processId, dumpFile, 
>>> (MINIDUMP_TYPE)MiniDumpWithFullMemory, pmei, NULL, NULL) == false) {
>>>          jio_fprintf(stderr, "Call to MiniDumpWriteDump() failed 
>>> (Error 0x%x)\n", ::GetLastError());
>>> 1089   } else {
>>> 1090    ....
>>>
>>>
>> Agree, this is the last step before exit, with output of last error 
>> code, user can get what happened --- though a little effort to research.
>>
>>> Rest is fine.
>>> Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Apr 13, 2015 at 7:59 AM, Yumin Qi <yumin.qi at oracle.com 
>>> <mailto:yumin.qi at oracle.com>> wrote:
>>>
>>>     Thomas and Dan,
>>>
>>>       Please review the changes at
>>>     http://cr.openjdk.java.net/~minqi/8074354/webrev08/
>>>     <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev08/>
>>>
>>>     Thanks
>>>     Yumin
>>>
>>>     On 4/10/2015 10:14 PM, Yumin Qi wrote:
>>>
>>>         Thomas,
>>>
>>>         Thanks for review again.
>>>
>>>         On 4/10/2015 4:20 AM, Thomas Stüfe wrote:
>>>
>>>             Hi Yumin,
>>>
>>>             Hi Yumin,
>>>
>>>             this looks mostly fine for me. I only was able to test
>>>             it on Linux, but will test it on other OSes next week.
>>>
>>>             Here two remaining small points:
>>>
>>>             - in os::check_dump_limit, maybe a comment would be
>>>             helpful, because the contract for this function is a bit
>>>             difficult to understand. Proposal:
>>>
>>>             "Check or prepare a core dump to be taken at a later
>>>             point in the same thread in os::abort(). Use the caller
>>>             provided buffer as a scratch buffer. Output status
>>>             message via VMError::report_cordedump_status() (on
>>>             success, the core dump file location, on error, a short
>>>             error message) - this status message which will be
>>>             written into the error log."
>>>
>>>         I will take the comments, new comments:
>>>
>>>           // ON Posix compatible OS it will simply check core dump
>>>         limits while on Windows
>>>           // it will check if dump file can be created. Check or
>>>         prepare a core dump to be
>>>           // taken at a later point in the same thread in
>>>         os::abort(). Use the caller
>>>           // provided buffer as a scratch buffer. The status message
>>>         which will be written
>>>           // into the error log either is file location or a short
>>>         error message, depends
>>>           // on checking result.
>>>           static void check_dump_limit(char* buffer, size_t bufferSize);
>>>
>>>             - It would make sense to improve the output in
>>>             VMError.report() in STEP(63) a bit:
>>>
>>>             Problem 1: we use past-tense, which is misleading ("Core
>>>             dump written", "Failed to write core"). We did not
>>>             generate the core dump yet, we just plan to do it, and
>>>             it still may fail for various reasons (e.g. even if the
>>>             limits are fine, the current directory may be write
>>>             protected. There is no way to catch all errors
>>>             beforehand). If the user reads "Core dump written", he
>>>             assumes the core dump was written successfully. If
>>>             abort(3) later fails to write the core dump, user will
>>>             be confused.
>>>
>>>         Will change the message to
>>>               if (coredump_status) {
>>>                 st->print("Core dump will be written. %s",
>>>         coredump_message);
>>>               } else {
>>>                 st->print("No core dump will be written. %s",
>>>         coredump_message);
>>>               }
>>>
>>>         as your suggested.
>>>         Whether the coredump can be successfully created depends on
>>>         following core file generation procedure. User can trace
>>>         error messages if failed to create.
>>>
>>>             Problem 2: this is more obscure: on Linux core dumps may
>>>             not be written to the file system at all but be
>>>             processed by some other program. On my Ubuntu 14.4
>>>             machine I see this output:
>>>
>>>             #
>>>             # Core dump written. Default location: Core dumps may be
>>>             processed with "/usr/share/apport/apport %p %s %c %P"
>>>             (or dumping to
>>>             /shared/projects/openjdk/jdk9-hs-rt/output/core.26261)
>>>             #
>>>
>>>             This output is a bit jarring.
>>>
>>>         I have not tested on Ubuntu so no comments.
>>>
>>>         I will post a new webrev based on your and Dan's comments.
>>>         Thanks for your patience.
>>>
>>>         Thanks
>>>         Yumin
>>>
>>>             My suggestion for better texts would be:
>>>
>>>             "Core dump will be written. %s"
>>>             "No core dump will be written. %s"
>>>
>>>             which hopefully fits all cases better. But I don't know,
>>>             other suggestions are welcome :-)
>>>
>>>             Otherwise, this looks fine.
>>>
>>>             Thanks for your work!
>>>
>>>             Thomas
>>>
>>>
>>>             On Fri, Apr 10, 2015 at 5:18 AM, Yumin Qi
>>>             <yumin.qi at oracle.com <mailto:yumin.qi at oracle.com>
>>>             <mailto:yumin.qi at oracle.com
>>>             <mailto:yumin.qi at oracle.com>>> wrote:
>>>
>>>                 Hi, Thomas and Dan
>>>
>>>             http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>
>>>                 On 4/9/2015 7:51 PM, Yumin Qi wrote:
>>>
>>>                     Please hold on, I missed the test cases:
>>>
>>>                     *test/runtime/Unsafe/RangeCheck.java
>>>                     *
>>>             *test/runtime/memory/ReadFromNoaccessArea.java
>>>
>>>                 Added.
>>>
>>>                     Also some file heads need to be updated with
>>>             this year.
>>>
>>>                 Changed
>>>
>>>                 Thanks
>>>                 Yumin
>>>
>>>                     Thanks
>>>                     Yumin
>>>
>>>
>>>                     *
>>>                     On 4/9/2015 7:07 PM, Yumin Qi wrote:
>>>
>>>                         Hi, Thomas and Dan
>>>
>>>                           The same URL is updated with new version.
>>>
>>>             http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>
>>>                           I remove the public access to VMError::out,
>>>                         VMError::coredump_message and the size of
>>>                         VMError::coredump_message.
>>>                           Added private static HANDLE in
>>>             os_windows.cpp for
>>>                         dumpFile as suggested by Thomas.
>>>
>>>                           Tests passes JPRT and manual tests.
>>>
>>>                         Thanks
>>>                         Yumin
>>>
>>>
>>>                         On 4/9/2015 1:32 AM, Thomas Stüfe wrote:
>>>
>>>                             Hi Yumin,
>>>
>>>                             I do not think exposing
>>>             VMError::coredump_status
>>>                             buffer just for the sake of using it as
>>>             a scratch
>>>                             buffer in os::abort() is a good idea. It
>>>             is very
>>>                             confusing and has nothing to do with the
>>>             purpose of
>>>                             VMerror::coredump_status.
>>>
>>>                             Also, VMerror::coredump_status is
>>>             static, so you do
>>>                             not even gain anything over just using a
>>>             global or
>>>                             function scope static buffer. Either we
>>>             are worried
>>>                             about thread safety in os::abort(), or
>>>             we aren't. If
>>>                             we aren't, just use a local static
>>>             buffer, it is much
>>>                             clearer and no need to expose
>>>             VMError::coredump_status
>>>                             like this.
>>>
>>>                             Apart from that, see my remarks in
>>>             yesterdays mail.
>>>
>>>                             Cheers, Thomas
>>>
>>>
>>>
>>>
>>>                             On Thu, Apr 9, 2015 at 5:16 AM, Yumin Qi
>>>                             <yumin.qi at oracle.com
>>>             <mailto:yumin.qi at oracle.com> <mailto:yumin.qi at oracle.com
>>>             <mailto:yumin.qi at oracle.com>>
>>>                             <mailto:yumin.qi at oracle.com
>>>             <mailto:yumin.qi at oracle.com>
>>>                             <mailto:yumin.qi at oracle.com
>>>             <mailto:yumin.qi at oracle.com>>>> wrote:
>>>
>>>                                 Hi, Dan
>>>
>>>                                   New webrev at
>>>             http://cr.openjdk.java.net/~minqi/8074354/webrev07/
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev07/>
>>>
>>>                                   To make code clean, add
>>>             os::abort_internal which
>>>                             is only used
>>>                                 for Windows.
>>>                                   Since passing
>>>             VMError::coredump_message and its
>>>                             size as
>>>                                 parameters across multiple functions
>>>             makes
>>>                             function have a bigger
>>>                                 signature, I change the way to
>>>             access them -- add
>>>                             two functions in
>>>                                 VMError to access them:
>>>             coredump_msg_buffer() and
>>>                                 coredump_msg_buffer_size(). They are
>>>             static
>>>                             members, so can save
>>>                                 stack space for passing around.
>>>
>>>                                   in os_windows.cpp, abort(bool,
>>>             void*, void*):
>>>                                   if error encountered, exit with
>>>             error messages.
>>>                                   If no error,
>>>             VMError::coredump_message contains
>>>                             the dump
>>>                                 path/name. I just use them so no
>>>             need to have
>>>                             extra space for name
>>>                                 allocation.
>>>
>>>                                   Passed JPRT and manual tests.
>>>
>>>                                   Thanks
>>>                                   Yumin
>>>
>>>
>>>                                 On 4/7/2015 9:23 AM, Daniel D.
>>>             Daugherty wrote:
>>>
>>>                                     On 4/1/15 10:57 AM, Yumin Qi wrote:
>>>
>>>                                       Hi, Dan and Thomas,
>>>
>>>                                           I have a new webrev at
>>>             http://cr.openjdk.java.net/~minqi/8074354/webrev06/
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev06/>
>>>
>>>
>>>
>>>             src/share/vm/runtime/globals.hpp
>>>                                         Sorry, I didn't notice this
>>>             before:
>>>
>>>                                         L939:   product(bool,
>>>             CreateCoredumpOnCrash,
>>>                                     true,         \
>>>                                         L940:           "Create
>>>             minidump on VM fatal
>>>                                     error")         \
>>>
>>>                                             The "minidump" on L940
>>>             should change
>>>                                 also. Don't know if you
>>>                                             want to say "core/mini
>>>             dump" or
>>>                                 something else.
>>>
>>>
>>>                                     src/share/vm/runtime/os.hpp
>>>                                         L719:   // On Linux/Solaris
>>>             it will simply
>>>                                 check core dump
>>>                                     limits while on Windows will do
>>>             nothing.
>>>                                             grammar: 'on Windows
>>>             will do nothing'
>>>                                                   -> 'on Windows it
>>>             will do nothing'
>>>
>>>
>>>             src/share/vm/runtime/arguments.cpp
>>>                                         No comments.
>>>
>>>             src/share/vm/utilities/vmError.hpp
>>>                                         No comments.
>>>
>>>             src/share/vm/utilities/vmError.cpp
>>>                                         No comments.
>>>
>>>                                     src/os/aix/vm/os_aix.cpp
>>>                                         No comments.
>>>
>>>                                     src/os/bsd/vm/os_bsd.cpp
>>>                                         No comments.
>>>
>>>                                     src/os/linux/vm/os_linux.cpp
>>>                                         No comments.
>>>
>>>                                     src/os/posix/vm/os_posix.cpp
>>>                                         No comments.
>>>
>>>             src/os/solaris/vm/os_solaris.cpp
>>>                                         No comments.
>>>
>>>             src/os/windows/vm/os_windows.cpp
>>>                                         L92: jio_snprintf(buffer,
>>>             bufferSize,
>>>                                     "%s\\hs_err_pid%u.mdmp",
>>>                                 get_current_directory(NULL, 0),
>>>                                     current_process_id());
>>>                                             Other non-Windows calls to
>>>                                 get_current_directory() do a NULL
>>>                                             check before using the
>>>             return value.
>>>                                 The original Windows
>>>                                             code that called
>>>                                 get_current_directory() did not make
>>>             this
>>>                                             check, but I think it's
>>>             better to be
>>>                                 consistent and safe.
>>>
>>>                                         L1007:   static char
>>>             buffer[O_BUFLEN];
>>>                                         L1008:   char
>>>             filename[FILENAME_MAX];
>>>                                             Two more potentially
>>>             large stack
>>>                                 variables here. Will we
>>>                                             run into stack overflow
>>>             on Win* more
>>>                                 frequently when what
>>>                                             we really wanted was the
>>>             reason for
>>>                                 the abort() call?
>>>
>>>                                             The reason the original
>>>             code used
>>>                                 'buffer' to construct the
>>>                                             dump file name was to
>>>             save space. The
>>>                                 original code was also
>>>                                             careful to not need the
>>>             file name in
>>>                                 any error messages after
>>>                                             the point at which the
>>>             dump file was
>>>                                 successfully created.
>>>
>>>                                             It would be better (more
>>>             resilient) if
>>>                                 you could pass in
>>>                                             an already existing
>>>             buffer to abort()
>>>                                 like the original
>>>                                             code passed into
>>>                                 os::check_or_create_dump(). Yes, that
>>>                                             would require visiting
>>>             all existing
>>>                                 calls to abort().
>>>
>>>                                         L1015:  assert(out->is_open(),
>>>                                 "defaultStream should be open");
>>>                                             Is it possible for
>>>             os::abort() to be
>>>                                 called early enough on
>>>                                             Windows such that
>>>             VMError::out is not
>>>                                 actually setup? Yes,
>>>                                             this would have to be
>>>             absolutely
>>>                                 catastrophic...
>>>
>>>                                             See the comment in
>>>             os_solaris.cpp
>>>                                 L1520-1522.
>>>
>>>                                         L1018: jio_snprintf(buffer,
>>>                                 sizeof(buffer), "%s", "Either
>>>                                     CreateCoredumpOnCrash is
>>>             disabled from command"
>>>                                         L1019                  "
>>>             line or coredump
>>>                                 is not available");
>>>                                             When you are using
>>>             jio_snprintf() as a
>>>                                 replacement for:
>>>
>>>                                                 strncpy(buffer, "my
>>>             string",
>>>                                 sizeof(buffer));
>>>             buffer[sizeof(buffer) - 1] = '\0';
>>>
>>>                                             you don't need the "%s"
>>>             format string.
>>>                                 So this code would be:
>>>
>>>                                             jio_snprintf(buffer,
>>>             sizeof(buffer),
>>>                                                          "Either
>>>                                 CreateCoredumpOnCrash is disabled from "
>>>              "command line or coredump
>>>                                 is not available");
>>>
>>>                                             Same comment for: L1026,
>>>             L1039
>>>
>>>                                         L1020: goto done;
>>>                                             The use of 'goto' makes
>>>             me cringe. You
>>>                                 can refactor much
>>>                                             of this logic into an
>>>             abort_internal()
>>>                                 function that
>>>                                             abort() calls and keep
>>>             all the
>>>                                 original "return" logic
>>>                                             to bail out.
>>>
>>>                                         L1045: // Older versions of
>>>             dbghelp.h
>>>                                 doesn't contain all
>>>                                             grammar: 'doesn't' ->
>>>             'do not'
>>>
>>>                                         L1052:   cwd =
>>>             get_current_directory(NULL, 0);
>>>                                         L1053: jio_snprintf(filename,
>>>                                 sizeof(filename),
>>>                                     "%s\\hs_err_pid%u.mdmp", cwd,
>>>                                 current_process_id());
>>>                                             Other non-Windows calls to
>>>                                 get_current_directory() do a NULL
>>>                                             check before using the
>>>             return value.
>>>                                 The original Windows
>>>                                             code that called
>>>                                 get_current_directory() did not make
>>>             this
>>>                                             check, but I think it's
>>>             better to be
>>>                                 consistent and safe.
>>>
>>>                                         L1092:   // well done.
>>>                                             Like a overcooked steak?
>>>             :-) Not sure
>>>                                 what you're trying
>>>                                             to say here...
>>>
>>>             test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>                                         No comments.
>>>
>>>             test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>                                         No comments.
>>>
>>>             test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>                                         No comments.
>>>
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>                                         No comments on these four.
>>>
>>>
>>>                                     Dan
>>>
>>>
>>>
>>>                                           What changes:
>>>
>>>                                            1) os_windows.cpp,
>>>             os::abort(bool ....):
>>>                                               a) Since when abort()
>>>             is called, the
>>>                                     last function     called before
>>>             exit process,
>>>                                     VMError::log already closed, which
>>>                                         is the file logging.
>>>             defaultStream still
>>>                                     available to use, so
>>>                                         use it for writing error or
>>>             success
>>>                                     messages. VMError::out is
>>>                                         the defaultStream, for using
>>>             it, added a
>>>                                     function out_stream()     in
>>>             VMError.
>>>                                               b) Delete the codes
>>>             which decide to
>>>                                     write core based on
>>>                                         client/server version with
>>>             debug. The new
>>>                                     code is if
>>>                                         CreateCoredumpOnCrash set or
>>>             dump_core is
>>>                                     on, the coredump will
>>>                                         be tried no matter which
>>>             version it is.
>>>                                     There is no special
>>>                                         reason not to dump core with
>>>             client
>>>                                     version I think. Any concern
>>>                                         for this change?
>>>                                               c) Use of 'goto' in
>>>             this function. I
>>>                                     tried not to use it,
>>>                                         but using it makes code
>>>             clear. I remember
>>>                                     somewhere suggest not
>>>                                         using 'goto' in cplusplus.
>>>
>>>                                             2) changed comments to
>>>             make them
>>>                                     consistent as indicated by Dan.
>>>
>>>                                             3) Added/modified head
>>>             version for files.
>>>
>>>                                             Tests: JPRT, manual test
>>>             on local Windows.
>>>
>>>                                         Thanks
>>>                                         Yumin
>>>
>>>
>>>                                         On 3/31/2015 1:25 AM, Thomas
>>>             Stüfe wrote:
>>>
>>>                                             Hi Daniel, Yumin,
>>>
>>>                                             sorry, a lot of Yumins
>>>             changes were
>>>                                         based on suggestions from
>>>                                             me, so here more background:
>>>
>>>                                             Yumin reversed the order
>>>             of error log
>>>                                         writing and core dumping.
>>>                                             The reason for that are
>>>             explained in
>>>                                         detail at the start of the
>>>                                             mail thread, in short:
>>>
>>>                                             - on Windows core file
>>>             dumping may
>>>                                         hang - its rare but it
>>>                                             happens - preventing
>>>             error logs.
>>>                                         Depending on who you ask,
>>>                                             error logs are more
>>>             important than
>>>                                         minidumps, so it makes sense
>>>                                             to first write the erorr
>>>             log.
>>>                                             - This also brings
>>>             windows core paths
>>>                                         closer to UNIX code
>>>                                             paths. See below.
>>>
>>>                                             About writing to stderr
>>>             in os::abort():
>>>
>>>                                             After this change, calling
>>>             VmError::report_coredump_status() in
>>>                                             os::abort() will not do
>>>             anything
>>>                                         because the error log is
>>>                                             already written at that
>>>             point. I
>>>                                         suggested to Yumin writing to
>>>                                             stderr instead in
>>>             os::abort() because
>>>                                         that mimicks UNIX
>>>                                             behaviour: On UNIX, you
>>>             call abort(3),
>>>                                         which writes a core and
>>>                                             aborts. It writes a
>>>             short message to
>>>                                         stderr "Aborted (core
>>>                                             dumped)" or just "Aborted".
>>>
>>>                                             After Yumins change, on
>>>             Windows
>>>                                         os::abort also writes the core,
>>>                                             then aborts. It made
>>>             sense to me that
>>>                                         this function should
>>>                                             behave the same way as
>>>             on UNIX: if
>>>                                         core writing fails, write to
>>>                                             stderr. There is no way
>>>             otherwise to
>>>                                         communicate a core dump
>>>                                             writing failure. After
>>>             writing the
>>>                                         core, process stops.
>>>
>>>                                             ---
>>>
>>>                                             About the control flows:
>>>
>>>                                             Before Yumins change:
>>>
>>>                                             On Windows
>>>                                             1)
>>>             os::check_or_create_dump(): we
>>>                                         wrote the core dump and then
>>>                                             information about the
>>>             dump (file
>>>                                         location, success or error)
>>>                                             was handed to the misnamed
>>>             VmError::report_coredump_status(),
>>>                                             which just stored that
>>>             information.
>>>                                             2) VmError::report():
>>>             Error log was
>>>                                         written and in step 63
>>>                                             information about the
>>>             core (already
>>>                                         written at that point) was
>>>                                             added to error log.
>>>                                             3) os::abort() just did
>>>             an exit(3)
>>>
>>>                                             On Unix,
>>>                                             1)
>>>             os::check_or_create_dump(): checks
>>>                                         the limits to guess
>>>                                             whether core dumping
>>>             later will
>>>                                         succeed. Again, that
>>>                                             information is handed to
>>>             VmError::report_coredump_status()
>>>                                             2) VmError::report():
>>>             Error log is
>>>                                         written and in step 63,
>>>                                             information about the
>>>             core's probable
>>>                                         future location is added
>>>                                             to error log. Here,
>>>             messages use past
>>>                                         tense, which is
>>>                                             misleading, because the
>>>             core is not
>>>                                         yet written.
>>>                                             3) os::abort() calls
>>>             abort(3), which
>>>                                         stops and writes a core.
>>>
>>>                                             Yumin pushed core file
>>>             writing on
>>>                                         Windows down to os::abort().
>>>                                             This brings the control
>>>             flow much
>>>                                         closer to Unix:
>>>
>>>                                             (1) call
>>>             os::check_dump_limit() to
>>>                                         check the prerequisites for
>>>                                             core file writing and
>>>             gather a bit
>>>                                         information to put in the
>>>                                             error log: On Unix,
>>>             limit checks, on
>>>                                         windows, so far nothing
>>>                                             much but precalculating
>>>             the error file
>>>                                         name.
>>>                                             (2) VmError::report():
>>>             Error log is
>>>                                         written and information
>>>                                             about the core's
>>>             probable location is
>>>                                         added to error log.
>>>                                             (3) os::abort() is
>>>             called, which on
>>>                                         all platforms:
>>>                                              - dumps the core
>>>                                              - if failing, writes a
>>>             one-liner to
>>>                                         stderr
>>>                                              - stops the process.
>>>
>>>                                             I think, if one agrees
>>>             that reversing
>>>                                         order of core dumping and
>>>                                             error log writing on
>>>             windows is a good
>>>                                         thing, this change looks
>>>                                             ok to me. Some
>>>             improvements could make
>>>                                         it clearer:
>>>
>>>                                             - maybe rename
>>>                                         "os::check_dump_limit()"
>>>             (was my suggestion,
>>>                                             sorry) to
>>>             "os::check_core_prerequisites()" - that
>>>                                         leaves room
>>>                                             to flesh it out a bit
>>>             more on Windows,
>>>                                         where we do not have
>>>                                             UNIX limits.
>>>                                             - make the messages in
>>>                                         VMError::report() future
>>>             sense: "Will
>>>                                             write core file to
>>>             location.....".
>>>                                             - Make the messages
>>>             written in
>>>                                         os::abort() on Windows clearer,
>>>                                             and shorter, and we also
>>>             do not need
>>>                                         the buffer to be static
>>>                                             here. Actually, if we
>>>             want to be as on
>>>                                         UNIX, just dumping "core
>>>                                             file writing succeeded"
>>>             or "failed" -
>>>                                         maybe with an error
>>>                                             number from
>>>             GetLastError() - will be
>>>                                         enough because the file
>>>                                             location is already
>>>             printed in the
>>>                                         error log. Like on UNIX.
>>>
>>>                                             Kind Regards, Thomas
>>>
>>>
>>>                                             On Mon, Mar 30, 2015 at
>>>             9:51 PM,
>>>                                         Daniel D. Daugherty
>>>                                            
>>>             <daniel.daugherty at oracle.com
>>>             <mailto:daniel.daugherty at oracle.com>
>>>             <mailto:daniel.daugherty at oracle.com
>>>             <mailto:daniel.daugherty at oracle.com>>
>>>             <mailto:daniel.daugherty at oracle.com
>>>             <mailto:daniel.daugherty at oracle.com>
>>>             <mailto:daniel.daugherty at oracle.com
>>>             <mailto:daniel.daugherty at oracle.com>>>> wrote:
>>>
>>>                                                 Re:
>>>             report_coredump_status() is
>>>                                         really record_coredump_status()
>>>
>>>                                                 The
>>>             report_coredump_status()
>>>                                         function is designed to
>>>             record two
>>>                                                 things about core
>>>             dumps for later
>>>                                         reporting by the code that
>>>                                                 generates the
>>>             hs_err_pid file.
>>>                                         That's why the original
>>>             report_coredump_status() didn't
>>>                                         output anything to stderr.
>>>
>>>                                                 By changing the
>>>             Windows calls to
>>>                                         report_coredump_status() into
>>>                                                 jio_fprintf() calls
>>>             you have:
>>>
>>>                                                 - put output onto
>>>             stderr that
>>>                                         should go to the hs_err_pid file
>>>                                                 - made the windows
>>>             code paths work
>>>                                         differently than the
>>>                                                 non-windows
>>>                                                   code paths
>>>
>>>
>>>                                                 Dan
>>>
>>>
>>>
>>>                                                 On 3/30/15 1:28 PM,
>>>             Yumin Qi wrote:
>>>
>>>                                                     Thanks Dan
>>>
>>>                                                     On 3/30/2015
>>>             11:25 AM, Daniel
>>>                                         D. Daugherty wrote:
>>>
>>>                                                         On 3/25/15
>>>             12:11 PM, Yumin
>>>                                         Qi wrote:
>>>
>>>                                                             Hi, all
>>>
>>>             I updated the
>>>                                         webrev with a new change to
>>>                                                             test case:
>>>             test/runtime/Unsafe/RangeCheck.java
>>>
>>>             Add
>>>             -XX:-CreateCoredumpOnCrash to test, no
>>>                                                             dump
>>>             needed on this case.
>>>
>>>             new  webrev:
>>>             http://cr.openjdk.java.net/~minqi/8074354/webrev05/
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>             <http://cr.openjdk.java.net/%7Eminqi/8074354/webrev05/>
>>>
>>>
>>>                                                         General nit
>>>             - please
>>>                                         update copyright lines to 2015
>>>                                                         as needed
>>>
>>>             src/share/vm/runtime/globals.hpp
>>>                                                             No comments.
>>>
>>>             src/share/vm/runtime/os.hpp
>>>                                                             line
>>>             720:   // On
>>>                                         Windows this will create an
>>>                                                         actual
>>>             minidump, on
>>>                                         Linux/Solaris it will simply
>>>                                                         check core
>>>             dump limits
>>>                                                             line
>>>             721: static
>>>                                         void check_dump_limit(char*
>>>                                                         buffer,
>>>             size_t bufferSize);
>>>             The comment on
>>>                                         line 720 no longer matches
>>>                                                         with the
>>>             function
>>>             rename from
>>>                                         check_or_create_dump() to
>>>             check_dump_limit().
>>>
>>>             You have this
>>>                                         comment in vmError.cpp:
>>>             943     // check
>>>                                         core dump limits on
>>>             Linux/Solaris, nothing on
>>>                                         Windows
>>>             944
>>>             os::check_dump_limit(buffer,
>>>             sizeof(buffer));
>>>
>>>             so the two
>>>                                         comments are out of sync.
>>>
>>>                                                     Will convert
>>>             them to be
>>>                                         consistent.
>>>
>>>             src/share/vm/runtime/arguments.cpp
>>>                                                             line
>>>             3252: } else if
>>>                                         (match_option(option,
>>>             "-XX:+CreateMinidumpOnCrash", &tail)) {
>>>                                                             line
>>>             3256: } else if
>>>                                         (match_option(option,
>>>             "-XX:-CreateMinidumpOnCrash", &tail)) {
>>>             These two checks
>>>                                         should use the
>>>             match_option() version
>>>             without a '&tail'
>>>                                         parameter.
>>>
>>>                                                     Will use version
>>>             w/o tail.
>>>
>>>             src/share/vm/utilities/vmError.cpp
>>>                                                             line
>>>             217: bool
>>>                                         VMError::coredump_status =
>>>                                                         true;    //
>>>             presume we
>>>                                         can dump core file first
>>>             I don't think you
>>>                                         should init this to true.
>>>                                                         It confuses
>>>             things with
>>>             VMError::report_coredump_status(). It will
>>>                                         also
>>>             confuse this code:
>>>
>>>             526 STEP(63,
>>>                                         "(printing core file
>>>             information)")
>>>             529       if
>>>                                         (coredump_status) {
>>>             530
>>>                                         st->print("Core dump
>>>             written. Default
>>>             location: %s",
>>>                                         coredump_message);
>>>
>>>             because
>>>                                         coredump_status won't accurately
>>>                                                         reflect whether
>>>             the
>>>                                         coredump_message field has
>>>             been set.
>>>
>>>                                                             line
>>>             943: // check
>>>                                         core dump limits on
>>>             Linux/Solaris, nothing on
>>>                                         Windows
>>>             This updated
>>>                                         comment doesn't match the
>>>             unmodified
>>>             comment in os.hpp:
>>>
>>>             720:   // On
>>>                                         Windows this will create an
>>>                                                         actual
>>>             minidump, on
>>>                                         Linux/Solaris it will simply
>>>                                                         check core
>>>             dump limits
>>>
>>>                                                     I will consider
>>>             your comments,
>>>                                         then revise with a new
>>>                                                     version. I
>>>             remember here there
>>>                                         is a case, if status not
>>>                                                     set, it will
>>>             output an
>>>                                         inconsistent message. Will check
>>>                                                     again.
>>>
>>>             src/os/aix/vm/os_aix.cpp
>>>                                                             No comments.
>>>
>>>             src/os/bsd/vm/os_bsd.cpp
>>>                                                             No comments.
>>>
>>>             src/os/linux/vm/os_linux.cpp
>>>                                                             No comments.
>>>
>>>             src/os/posix/vm/os_posix.cpp
>>>                                                             No comments.
>>>
>>>             src/os/solaris/vm/os_solaris.cpp
>>>                                                             No comments.
>>>
>>>             src/os/windows/vm/os_windows.cpp
>>>                                                             line
>>>             1008: static char
>>>                                         buffer[O_BUFLEN];
>>>             Why switch from a
>>>                                         buffer parameter to a
>>>                                                         static buffer?
>>>             What happens with
>>>                                         parallel calls to
>>>                                                         abort()?
>>>             Will the static
>>>             buffer get stomped
>>>                                         by the competing threads?
>>>
>>>                                                     The original
>>>             buffer is static
>>>                                         too.  Carrying an buffer
>>>                                                     for abort seems
>>>             not right.
>>>                                         This buffer in fact only for
>>>                                                     storing file
>>>             name (based on
>>>                                         pid) here.
>>>                                                     abort() will
>>>             dump the core
>>>                                         file, should we prevent
>>>                                                     multi-thread
>>>             calling to this
>>>                                         function? I did not check
>>>                                                     this part, will
>>>             check if it is
>>>                                         MT-safe.
>>>
>>>
>>>                                                             line
>>>             1015:   // If
>>>                                         running on a client version
>>>                                                         of Windows
>>>             and user has
>>>                                         not explicitly enabled dumping
>>>                                                             line
>>>             1016:   if
>>>             (!os::win32::is_windows_server() &&
>>>             !CreateCoredumpOnCrash) {
>>>             The default for
>>>                                         CreateCoredumpOnCrash is
>>>                                                         now 'true'
>>>             so the
>>>             comment and logic
>>>                                         are no longer correct
>>>                                                         here. The
>>>             Windows
>>>             Client VM will
>>>                                         generate minidumps by default.
>>>
>>>                                                             old line
>>>             1007:
>>>             VMError::report_coredump_status("Minidumps
>>>                                         are not
>>>                                                         enabled by
>>>             default on
>>>                                         client versions of Windows",
>>>                                                         false);
>>>                                                             new line
>>>             1017:
>>>                                         jio_fprintf(stderr, "Minidumps
>>>                                                         are not
>>>             enabled by default
>>>                                         on client versions of
>>>             Windows.\n");
>>>             There are a number
>>>                                         of places where we
>>>                                                         change from
>>>             VMError::report_coredump_status() to
>>>                                         jio_fprintf()
>>>             and I'm not quite
>>>                                         seeing why this was done.
>>>
>>>             Update:
>>>             VMError::report_coredump_status()
>>>                                                         is misnamed.
>>>             It doesn't
>>>             report anything in
>>>                                         the sense that it
>>>                                                         doesn't
>>>             print anything. It
>>>             does record two
>>>                                         pieces of information about
>>>                                                         core dumps
>>>             so maybe
>>>             it should be
>>>             VMError::record_coredump_status().
>>>
>>>                                                             line 1100:
>>>                                         jio_fprintf(stderr, "%s.\n",
>>>             buffer);
>>>             At this point,
>>>                                         buffer contains the path to the
>>>             mini-dump file so
>>>                                         the above line simply prints
>>>             that on stderr. Why?
>>>
>>>             Yes, I see that
>>>                                         the old code did something
>>>                                                         similar:
>>>
>>>             1090
>>>             VMError::report_coredump_status(buffer,
>>>                                         true);
>>>
>>>             but
>>>                                         report_coredump_status() didn't
>>>                                                         actually print
>>>             anything. It just
>>>                                         squirreled away these in
>>>             vmError.cpp:
>>>
>>>             217 bool
>>>             VMError::coredump_status;
>>>             218 char
>>>             VMError::coredump_message[O_BUFLEN];
>>>
>>>             See comments above
>>>                                         about how
>>>             report_coredump_status()
>>>             is misnamed.
>>>
>>>                                                             At this
>>>             point, I'm
>>>                                         convinced that all the
>>>                                                         changes from
>>>             VMError::report_coredump_status() to
>>>                                         jio_fprintf() are
>>>                                                             a bad idea.
>>>
>>>
>>>                                                     Changing to
>>>             jio_fprintf is because
>>>             report_coredump_status did not
>>>                                         output anything, it is
>>>                                                     just a logging
>>>             as you
>>>                                         indicated. It is reasonable we
>>>                                                     change it to
>>>                                         record_coredump_status
>>>             instead. How about
>>>                                                     add printout to
>>>                                         report_coredump_status? So I
>>>             do not
>>>                                                     need to use
>>>             jio_printf here.
>>>
>>>                                                     Other
>>>             consideration of using
>>>                                         jio_fprintf after call
>>>                                                     shutdown called,
>>>             the static
>>>                                         output stream still exists,
>>>                                                     but not
>>>             guaranteed to work as
>>>                                         expected.
>>>
>>>
>>>                                                     Thanks
>>>                                                     Yumin
>>>
>>>             test/runtime/ErrorHandling/ProblematicFrameTest.java
>>>                                                             No comments.
>>>
>>>             test/runtime/ErrorHandling/SafeFetchInErrorHandlingTest.java
>>>                                                             No comments.
>>>
>>>             test/runtime/ErrorHandling/SecondaryErrorTest.java
>>>                                                             No comments.
>>>
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency1.java
>>>                                                             No comments.
>>>
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency2.java
>>>                                                             No comments.
>>>
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency3.java
>>>                                                             No comments.
>>>
>>>             test/runtime/Safepoint/AssertSafepointCheckConsistency4.java
>>>                                                             No comments.
>>>
>>>             test/runtime/Unsafe/RangeCheck.java
>>>                                                             No comments.
>>>
>>>             test/runtime/memory/ReadFromNoaccessArea.java
>>>                                                             No comments.
>>>
>>>
>>>
>>>             Thanks
>>>             Yumin
>>>
>>>
>>>                                                             On
>>>             3/23/2015 11:48 AM,
>>>                                         Yumin Qi wrote:
>>>
>>>             Since Thomas is
>>>                                         not a reviewer in open jdk,
>>>             I need two
>>>                                         volunteers (at least one
>>>             "R"eviewer).
>>>
>>>             Dan, since you
>>>                                         already reviewed previous
>>>             version, could you
>>>                                         have a look?
>>>
>>>             Thanks
>>>             Yumin
>>>
>>>             On 3/20/2015 3:20
>>>                                         PM, Yumin Qi wrote:
>>>
>>>                 Thomas,
>>>
>>>                  Thanks. Yes,
>>>                                         it is webrev04. Also, I
>>>                 have updated
>>>                                         webrev04 to add another
>>>                 test case change:
>>>             test/runtime/memory/ReadFromNoaccessArea.java
>>>                  (Thanks Dan's
>>>                                         update)
>>>
>>>                 Thanks
>>>                 Yumin
>>>
>>>                 On 3/20/2015
>>>                                         11:55 AM, Thomas Stüfe wrote:
>>>
>>>
>>>                     Hi Yumin,
>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list