RFR: 8280684: JfrRecorderService fails with guarantee(num_written > 0) when no space left on device.
KIRIYAMA Takuya
duke at openjdk.java.net
Wed Feb 16 07:59:08 UTC 2022
On Thu, 10 Feb 2022 19:05:41 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:
>> I think JFR should report an error message and the JVM should shut down safely instead of failing a guarantee.
>>
>> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below
>> by using JfrJavaSupport::abort().
>>
>> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp)
>> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp)
>> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM...
>>
>> I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing a guarantee.
>> I added an argument to JfrJavaSupport::abort() that tells os::abort() not to write a core file,
>> because there is no space left on the device.
>> Could you please review the fix?
>
> src/hotspot/share/jfr/jni/jfrJavaSupport.hpp line 103:
>
>> 101:
>> 102: // critical
>> 103: static void abort(jstring errorMsg, TRAPS, bool dump_core=true);
>
> Not sure this is necessary. The existing core dump logic already handles the case where a core file cannot be generated due to disk full.
Thank you for your review.
Whether or not HotSpot generates a core file is determined by the argument of vm_abort(bool dump_core). If the argument is "true", vm_abort(bool dump_core) calls os::abort(bool dump_core) to generate a core file.
See the following code:
https://github.com/openjdk/jdk/blob/3c160ab5bec0c2364ec3f43c5a5789098d4699e5/src/hotspot/share/runtime/java.cpp#L625
I think JfrJavaSupport::abort() should pass "false" as an argument to vm_abort(bool dump_core).
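
To illustrate the flag forwarding described above, here is a minimal standalone sketch. It is not HotSpot code: every name in it (ShutdownRecord, model_os_abort, model_vm_abort, model_jfr_abort, check_write) and the error message text are hypothetical stand-ins, and instead of really aborting the process it only records whether a core dump would have been requested.

```cpp
#include <iostream>
#include <string>

// Records what the modeled shutdown path decided, instead of really aborting.
struct ShutdownRecord {
  std::string message;
  bool core_dump_requested = false;
};

// Hypothetical stand-in for os::abort(bool dump_core).
void model_os_abort(ShutdownRecord& rec, bool dump_core) {
  rec.core_dump_requested = dump_core;
}

// Hypothetical stand-in for vm_abort(bool dump_core): forwards the flag unchanged.
void model_vm_abort(ShutdownRecord& rec, bool dump_core) {
  model_os_abort(rec, dump_core);
}

// Hypothetical stand-in for JfrJavaSupport::abort() with the proposed
// defaulted parameter, so callers that omit it keep requesting a core dump.
void model_jfr_abort(ShutdownRecord& rec, const std::string& msg,
                     bool dump_core = true) {
  rec.message = msg;
  std::cerr << msg << '\n';
  model_vm_abort(rec, dump_core);
}

// Hypothetical write check in place of guarantee(num_written > 0).
// Returns true when bytes were written; otherwise it requests a shutdown
// without a core file, since the device is assumed to be full.
bool check_write(ShutdownRecord& rec, long num_written) {
  if (num_written > 0) {
    return true;
  }
  model_jfr_abort(rec, "Failed to write to the JFR stream", false);
  return false;
}
```

In this model, check_write(rec, 0) leaves rec.core_dump_requested as false, while a call to model_jfr_abort(rec, msg) without the third argument sets it to true, which matches the default in the proposed signature.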
> test/hotspot/jtreg/runtime/jfr/TestJFRDiskFull.java line 127:
>
>> 125: raf.close();
>> 126: }
>> 127: }
>
> I appreciate the effort, but we can't have a test that intentionally provokes a disk full situation. Instead, the updated error message will have to be manually verified.
I use `@run main/manual` in TestJFRDiskFull.java. I believe this tag marks the test as one to be run manually.
I manually confirmed that this test passes with jtreg after this fix.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7227