RFR: 8306580: Propagate CDS dumping errors instead of directly exiting the VM

Thu May 23 21:54:03 UTC 2024

On Thu, 23 May 2024 15:48:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Currently, when CDS dumping runs into an unrecoverable error (e.g., file I/O error, out of memory), it calls MetaspaceShared::unrecoverable_writing_error(), which directly exits the VM. Some of these errors can be propagated to the caller for a normal exit.
>> 
>> This change introduces `MetaspaceShared::writing_error()` to report errors without exiting the VM. The function `MetaspaceShared::unrecoverable_writing_error()` should now be used only for errors that require the VM to exit. Verified with tier1-5 tests.
>
> src/hotspot/share/cds/archiveBuilder.cpp line 332:
> 
>> 330:   if (!rs.is_reserved()) {
>> 331:     log_error(cds)("Failed to reserve " SIZE_FORMAT " bytes of output buffer.", buffer_size);
>> 332:     MetaspaceShared::writing_error();
> 
> I don't understand how that could work. Would the subsequent access to rs.base not crash the VM?

I agree with Thomas. We need to stop any further operations and exit the safepoint. Something like this:

void VM_PopulateDumpSharedSpace::doit() {
  ...
  StaticArchiveBuilder builder;
  builder.gather_source_objs();
>>>>
  if (builder.reserve_buffer() == nullptr) {
    // report error ...
    this->_failed = true; // checked by the caller after VMThread::execute()
    return;
  }
<<<<

The failure needs to be propagated to the main thread.

  VM_PopulateDumpSharedSpace op;
  VMThread::execute(&op);
>>>>
  if (op._failed) {
    THROW_MSG(.....);
  }
<<<<
}

And the VM will eventually exit with an unhandled exception.
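
To make the shape of this concrete outside HotSpot, here is a minimal standalone sketch of the same pattern in plain C++. It is not HotSpot code: DumpOp, execute(), dump() and the "size 0 means the reservation failed" trigger are all hypothetical names and assumptions made up for the illustration, and a std::runtime_error stands in for the Java exception that THROW_MSG would raise. The operation records the failure and returns early instead of touching the buffer; the caller checks the flag once the operation has finished.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for a VM operation such as VM_PopulateDumpSharedSpace.
class DumpOp {
  bool _failed = false;
  std::size_t _buffer_size;
  char* _buffer = nullptr;

  // Assumption for the sketch: a request of size 0 simulates a failed
  // reservation; otherwise allocate without throwing.
  static char* reserve_buffer(std::size_t size) {
    if (size == 0) {
      return nullptr;
    }
    return new (std::nothrow) char[size];
  }

public:
  explicit DumpOp(std::size_t buffer_size) : _buffer_size(buffer_size) {}
  ~DumpOp() { delete[] _buffer; }

  bool failed() const { return _failed; }

  void doit() {
    _buffer = reserve_buffer(_buffer_size);
    if (_buffer == nullptr) {
      // Report the error, record it, and stop: nothing below may run,
      // because it would dereference the missing buffer.
      _failed = true;
      return;
    }
    _buffer[0] = 0;  // only reached when the reservation succeeded
  }
};

// Stand-in for VMThread::execute(). It runs the operation synchronously
// here; in the VM the operation runs on the VM thread at a safepoint.
void execute(DumpOp& op) {
  op.doit();
}

// Stand-in for the code on the main thread that starts the dump.
void dump(std::size_t buffer_size) {
  DumpOp op(buffer_size);
  execute(op);
  if (op.failed()) {
    // Propagate the failure to the caller instead of exiting directly.
    throw std::runtime_error("Failed to reserve output buffer");
  }
}
```

With this sketch, dump(0) throws and dump(16) returns normally. The point is only that the decision to terminate moves from the place where the error is detected to the caller, which can unwind normally.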

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19370#discussion_r1612354497