RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v4]

Thomas Stuefe stuefe at openjdk.java.net
Wed Oct 6 06:42:13 UTC 2021


On Wed, 6 Oct 2021 00:37:41 GMT, Xin Liu <xliu at openjdk.org> wrote:

> I don't like workaround. I intend to do thing right. In my understanding, fork_and_exec() is going to step into Native, isn't it?

The problem I have with this is that force-unlocking all mutexes changes the state and possibly destroys evidence in core files. 

For instance, let's say you have a crash in a thread owning a piece of memory, and unlocking all mutexes unlocks the mutex guarding this memory, lets other threads access and change the memory. The core file will reflect this. And this will be almost impossible to prove too. 

I really would prefer to limit this workaround to the specific use case where it is needed.

> 
> I would like to clean up [prepare_for_emergency_dump](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp#L426) in next step. It's easier to maintain than just iterating locks manually. that's why I would like to have a natural name. How about `VMError::transition_java_current_to_native_from_vm()`?

Does that run before or after the hs-err file is written?


> > However, if we do this for normal crashes+OnError too, I'd like to see a test for that (you can simply reuse your test, add a second `@run` section and just start the VM with `-XX:ErrorHandlerTest=14 -XX:OnError=... -version`). Of course, that one would require `vm.debug`.
> 
> Actually, `-XX:+ErrorHandlerTest=14` is my first attempt. it turns out difficult to build a testcase which reliably reproduces this bug on all different platforms. I don't want to increase the burden of long-term operations.
> 
> In this case, ErrorHandleTest triggers `VMError::controlled_crash` early. I have seen that it's earlier than AttachListener. As a result, jcmd can't surely attach HotSpot.
> 
> For normal crashes+OnError scenario, I think `runtime/ErrorHandling/TestOnError.java` has covered it.

+1

As I wrote, I think this seems like a separate issue. OnError should always work, also if VM has an error in init phase.

>> test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 25:
>> 
>>> 23: 
>>> 24: /*
>>> 25:  * @test TestOutOfMemoryErrorFromNIO
>> 
>> I would rename the test. The point is not to test OOM from NIO, but to test XX:OnError with jcmd which attaches to the parent. 
>> 
>> You could do the same with any OOM, right? E.g. just start the VM with a very small MaxMetaspaceSize. You would not even need a main function for that, since the VM would not come up but throw a OOM right away,
>> 
>> Proposal (but feel free to come up with something better): "TestOnErrorWithSelfAttachingJCmd"
>
> I found HotSpot is very subtle when it comes to `MaxMetaspaceSize` and `-XX:AbortVMOnException=java.lang.OutOfMemoryError`
> 
> On my host, the watershed is 3M. If I give it 2M, it will ignore `AbortVMOnException`. 
> 
> 
> java -XX:MaxMetaspaceSize=2M -XX:AbortVMOnException=java.lang.OutOfMemoryError 
> Error occurred during initialization of VM
> OutOfMemoryError: Metaspace
> 
> 
> Flip code blocks here can solve this problem. is it intentional? 
> 
> 
> diff --git a/src/hotspot/share/memory/metaspace.cpp b/src/hotspot/share/memory/metaspace.cpp
> index 74b5d1a0b30..efe430f7d3c 100644
> --- a/src/hotspot/share/memory/metaspace.cpp
> +++ b/src/hotspot/share/memory/metaspace.cpp
> @@ -974,15 +974,16 @@ void Metaspace::report_metadata_oome(ClassLoaderData* loader_data, size_t word_s
>          space_string);
>    }
> 
> -  if (!is_init_completed()) {
> -    vm_exit_during_initialization("OutOfMemoryError", space_string);
> -  }
> -
>    if (out_of_compressed_class_space) {
>      THROW_OOP(Universe::out_of_memory_error_class_metaspace());
>    } else {
>      THROW_OOP(Universe::out_of_memory_error_metaspace());
>    }
> +
> +  if (!is_init_completed()) {
> +    vm_exit_during_initialization("OutOfMemoryError", space_string);
> +  }
> +
>  }

I don't think you can flip it. Throwing the exception may involve further metaspace allocations, which gives you a circle. But for OnError this should not matter. OnError should also work for errors during initialization (whether the subsequent command, e.g. jcmd, succeeds in accessing the half-inited VM is another story).

I think this is a separate issue: make OnError work when the VM bails out during initialization.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590


More information about the hotspot-dev mailing list