RFR: 8341436: containers/docker/TestJcmdWithSideCar.java takes needlessly long to run

Kevin Walls kevinw at openjdk.org
Mon Oct 7 12:00:37 UTC 2024


On Thu, 3 Oct 2024 19:13:46 GMT, Sebastian Lövdahl <duke at openjdk.org> wrote:

> The fix is twofold.
> 
> 1. Stop the main container after an iteration is done. The main container is started with its runtime defined as 120 seconds, which means that each iteration takes 120 seconds. In reality, one iteration takes a few seconds while 115 seconds is spent waiting on the main container exiting.
> 
> 2. Change the name of the main container to be unique per iteration. Containers are started with `--rm`, which means they are removed after exiting. However, the removal is done asynchronously _after_ the `stop` command has returned. This means that the second iteration may get an error if the same container name is used if the removal was not done before the container is started in the next iteration.
> 
> On my machine, this cuts down the test runtime using Podman from 4m 13s to 17s. Using Docker, the runtime goes from 4m 15s to 41s.
> 
> Podman only runs half the test cases (since JDK-8341310) which explain some of the difference. But there is also something strange going on in the Docker case; every `docker stop` call takes 10 seconds, and I have not been able to figure out what exactly causes it.
> 
> Doing a manual `kill [container Java process PID]` gracefully terminates the Java process and container, but `docker stop` never does. Instead, it blocks for 10 seconds before abruptly killing the process using `SIGKILL`. I confirmed this with a simplified case and both
> `strace -e 'trace=!all' docker run --init eclipse-temurin:23 java ..` and `strace -e 'trace=!all' docker run eclipse-temurin:23 java ..`, no signals were ever visible when calling either `docker stop` or `docker kill`.
> 
> https://www.docker.com/blog/docker-best-practices-choosing-between-run-cmd-and-entrypoint/ and "What is PID 1 and why does it matter?" talks about why [`--init`](https://docs.docker.com/reference/cli/docker/container/run/#init) is supposed to help.

Hi, I think this looks good.
Having main container name be unique is good.  Do you just add "elevated-" + elevated here, it would be great to add even a random ID in there as well, as we can have multiple tests running on the same machine at times...

The other problem ( https://bugs.openjdk.org/browse/JDK-8341518 ) seems to be EventGeneratorLoop terminating too early, we have some logs where it's gone after 10 or 12 seconds, not 120.  It should maybe check a wall-clock time to ensure it lives long enough.  I can do that separately unless you think it makes sense to include it here, I don't think it will conflict.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21331#issuecomment-2396725847


More information about the serviceability-dev mailing list