RFR: 8341310: Test containers/docker/TestJcmdWithSideCar.java fails after JDK-8327114 [v2]

Sebastian Lövdahl duke at openjdk.org
Wed Oct 2 17:53:35 UTC 2024


On Wed, 2 Oct 2024 16:50:05 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> The change of [JDK-8327114](https://bugs.openjdk.org/browse/JDK-8327114) also increased test coverage. In particular, the `TestJcmdWithSideCar.java` test got enhanced to cover these cases (prior to [JDK-8327114](https://bugs.openjdk.org/browse/JDK-8327114) only case 1 was tested):
>> 
>> 1. Shared volumes between attachee and attacher and shared pid namespace
>> 2. Shared volumes between attachee and attacher and shared pid namespace, both running with elevated privileges
>> 3. Shared pid namespace between attachee and attacher only
>> 4. Shared pid namespace between attachee and attacher, both running with elevated privileges
>> 
>> The OpenJDK attach code is able to handle cases 1 through 3 which pass, but the last case, `4`, hasn't been implemented yet when running as regular user and directing the container runtime to map the container user to that user as well. Thus, the test fails. For now I propose to disable the 4th test case. It can get re-enabled once the product code got updated to account for this case (tracked in https://bugs.openjdk.org/browse/JDK-8341349).
>> 
>> While at it, I've discovered that the `EventGeneratorLoop` running container always runs for a fixed time: `30` seconds. That's irrespective of the attacher containers being done. Therefore, two iterations of the loop spawning this container running the fixed set of time will at least run `60` seconds. In my test runs using `-summary:time` it was `70` seconds:
>> 
>> 
>> Passed: containers/docker/TestJcmdWithSideCar.java
>>   build: 2.08 seconds
>>   compile: 2.068 seconds
>>   build: 4.842 seconds
>>   compile: 4.841 seconds
>>   driver: 70.776 seconds
>> Test results: passed: 1
>> 
>> 
>> I don't think this is needed. We can destroy the process running `EventGeneratorLoop` and then wait for it to exit rather than sitting there and waiting until it is finished (even though we no longer do anything with it). This improvement has been implemented in https://github.com/openjdk/jdk/commit/5b2f646c73b747f6fff364347031074d24e49822 (separate commit). After this, the total runtime of the test reduces to about `22` seconds:
>> 
>> 
>> Passed: containers/docker/TestJcmdWithSideCar.java
>>   build: 2.169 seconds
>>   compile: 2.157 seconds
>>   build: 4.964 seconds
>>   compile: 4.963 seconds
>>   driver: 22.928 seconds
>> Test results: passed: 1
>> 
>> 
>> The actual test skip of the 4th test case is: https://github.com/openjdk/jdk/commit/95a7cc05f00e94190af583b9f9dbc659c7be879f
>> 
>> Thoughts? Could somebody please run th...
>
> Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove the attachee container if it exists

Right, this turned into a rabbit hole of its own.. 😁 

`make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"` is still failing:


Full child process STDOUT was saved to docker-stdout-139602.log
[2024-10-02T17:02:06.397293068Z] Waiting for completion for process 139602
[2024-10-02T17:02:06.397362093Z] Waiting for completion finished for process 139602
[COMMAND]
docker ps 
[2024-10-02T17:02:07.771881444Z] Gathering output for process 139723
[ELAPSED: 19 ms]
[STDERR]

[STDOUT]
CONTAINER ID   IMAGE                                                              COMMAND                  CREATED         STATUS         PORTS     NAMES
1db392c08e7c   jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd   "/jdk/bin/java -XX:+…"   5 seconds ago   Up 5 seconds             test-container-main

Full child process STDOUT was saved to docker-stdout-139723.log
[2024-10-02T17:02:07.790740367Z] Waiting for completion for process 139723
[2024-10-02T17:02:07.790915466Z] Waiting for completion finished for process 139723
[COMMAND]
docker rm test-container-main 
[2024-10-02T17:02:07.792750591Z] Gathering output for process 139734
[ELAPSED: 15 ms]
[STDERR]
Error response from daemon: You cannot remove a running container 1db392c08e7cbac3a256597c2ee693fdbbeab3a814cb229c83ac38c0c805f42d. Stop the container before attempting removal or force remove


I _think_ the underlying problem is that killing the process from the Java test is not properly propagated to the `EventGeneratorLoop` process running as a Docker container. Among others, I found https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker talking about it. So, in the Docker case the container continues running for the whole 30 seconds even though we called `p.destroy()`. `docker rm` for the next test case will fail because the container is still running. So, I started out by trying this:


diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
index 5132f14d788..4726284cff1 100644
--- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
+++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
@@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws Exception {
             OutputAnalyzer out = DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps")
                 .shouldHaveExitValue(0);
             if (out.contains(MAIN_CONTAINER_NAME)) {
-                DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", MAIN_CONTAINER_NAME)
+                DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", "--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME)
                    .shouldHaveExitValue(0);
             }
             // start "main" container (the observee)
             DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop");


..only to be met by:


[main-container-process] docker: Error response from daemon: Conflict. The container name "/test-container-main" is already in use by container "f2754e97e6eecd6b0538e8a8c3d7c12214d16f9f32a9e2fe313c1d9f3258732c". You have to remove (or rename) that container to be able to reuse that name.
[main-container-process] See 'docker run --help'.
java.lang.RuntimeException: Timed out while waiting for main() to start
	at TestJcmdWithSideCar$MainContainer.waitForMainMethodStart(TestJcmdWithSideCar.java:293)
	at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:110)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:573)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
	at java.base/java.lang.Thread.run(Thread.java:1576)


Because we start containers with `--rm`, they seem to be asynchronously removed _after_ `stop` has finished. Finally, if we wait a little bit after the `stop` command, `make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"` works again.


diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
index 5132f14d788..4726284cff1 100644
--- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
+++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
@@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws Exception {
             OutputAnalyzer out = DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps")
                 .shouldHaveExitValue(0);
             if (out.contains(MAIN_CONTAINER_NAME)) {
-                DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", MAIN_CONTAINER_NAME)
+                DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", "--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME)
                    .shouldHaveExitValue(0);
+                Thread.sleep(1_000L);
             }
             // start "main" container (the observee)
             DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop");

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21289#issuecomment-2389263714


More information about the hotspot-runtime-dev mailing list