RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v3]

Markus Grönlund mgronlun at openjdk.java.net
Fri Feb 18 11:23:51 UTC 2022


On Fri, 18 Feb 2022 05:44:34 GMT, KIRIYAMA Takuya <duke at openjdk.java.net> wrote:

>> I think JFR should report an error message and jvm should shut down safely instead of gurantee failure.
>> 
>> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below
>> by using JfrJavaSupport::abort().
>> 
>> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp)
>> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp)
>> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM...
>> 
>> I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort().
>> I added a argument to JfrJavaSupport::abort() which tells os::abort() not to put out core 
>> because there is no space on device.
>> Could you please review the fix?
>
> KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device.

diff --git a/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp b/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp
index 95b96e02c06..015d4ebe065 100644
--- a/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp
+++ b/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp
@@ -563,14 +563,16 @@ void JfrJavaSupport::throw_runtime_exception(const char* message, TRAPS) {
 
 void JfrJavaSupport::abort(jstring errorMsg, JavaThread* t) {
   DEBUG_ONLY(check_java_thread_in_vm(t));
-
   ResourceMark rm(t);
-  const char* const error_msg = c_str(errorMsg, t);
-  if (error_msg != NULL) {
-    log_error(jfr, system)("%s",error_msg);
+  abort(c_str(errorMsg, t));
+}
+
+void JfrJavaSupport::abort(const char* error_msg, bool dump_core /* true */) {
+  if (error_msg != nullptr) {
+    log_error(jfr, system)("%s", error_msg);
   }
   log_error(jfr, system)("%s", "An irrecoverable error in Jfr. Shutting down VM...");
-  vm_abort();
+  vm_abort(dump_core);
 }
 
 JfrJavaSupport::CAUSE JfrJavaSupport::_cause = JfrJavaSupport::VM_ERROR;
diff --git a/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp b/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp
index 53d6eed68a8..1ec5a884b4b 100644
--- a/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp
+++ b/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp
@@ -112,6 +112,7 @@ class JfrJavaSupport : public AllStatic {
 
   // critical
   static void abort(jstring errorMsg, TRAPS);
+  static void abort(const char* error_msg, bool dump_core = true);
   static void uncaught_exception(jthrowable throwable, JavaThread* t);
 
   // asserts
diff --git a/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp b/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp
index 3a7ec286381..73404a1aede 100644
--- a/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp
+++ b/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp
@@ -25,8 +25,8 @@
 #ifndef SHARE_JFR_WRITERS_JFRSTREAMWRITERHOST_INLINE_HPP
 #define SHARE_JFR_WRITERS_JFRSTREAMWRITERHOST_INLINE_HPP
 
+#include "jfr/jni/jfrJavaSupport.hpp"
 #include "jfr/writers/jfrStreamWriterHost.hpp"
-
 #include "runtime/os.hpp"
 
 template <typename Adapter, typename AP>
@@ -77,6 +77,9 @@ inline void StreamWriterHost<Adapter, AP>::write_bytes(const u1* buf, intptr_t l
   while (len > 0) {
     const unsigned int nBytes = len > INT_MAX ? INT_MAX : (unsigned int)len;
     const ssize_t num_written = os::write(_fd, buf, nBytes);
+    if (errno == ENOSPC) {
+      JfrJavaSupport::abort("Failed to write to jfr stream because no space left on device", false);
+    }
     guarantee(num_written > 0, "Nothing got written, or os::write() failed");
     _stream_pos += num_written;
     len -= num_written;

src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 88:

> 86:         JavaThread* jt = JavaThread::current();
> 87:         ThreadInVMfromNative transition(jt);
> 88:         JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false);

Hi again Takuya, I'm sorry, but I should have noticed this earlier:

I now see that the code needs to allocate a Java string oop to conform to the existing abort function signature, which caters to invocations from Java. Then abort() immediately strips out the c-string from the oop. To be correct, also headers for logging/log.hpp and runtime/thread.inline.hpp should need be included.

I believe we can simplify this by updating the abort() signature so that we don't need to drag in those extra dependencies. Please see my following comment where I suggest a way to do this.

Thanks for your patience
Markus

-------------

PR: https://git.openjdk.java.net/jdk/pull/7227


More information about the hotspot-jfr-dev mailing list