RFR: 8341436: containers/docker/TestJcmdWithSideCar.java takes needlessly long to run [v3]
Kevin Walls
kevinw at openjdk.org
Wed Oct 9 11:14:01 UTC 2024
On Mon, 7 Oct 2024 19:37:51 GMT, Sebastian Lövdahl <duke at openjdk.org> wrote:
>> The fix is twofold.
>>
>> 1. Stop the main container after an iteration is done. The main container is started with its runtime defined as 120 seconds, which means that each iteration takes 120 seconds. In reality, one iteration takes a few seconds while 115 seconds is spent waiting on the main container exiting.
>>
>> 2. Change the name of the main container to be unique per iteration. Containers are started with `--rm`, which means they are removed after exiting. However, the removal is done asynchronously _after_ the `stop` command has returned. This means that the second iteration may get an error when the same container name is reused and the removal has not finished before the container is started in the next iteration.
>>
>> On my machine, this cuts down the test runtime using Podman from 4m 13s to 17s. Using Docker, the runtime goes from 4m 15s to 41s.
>>
>> Podman only runs half the test cases (since JDK-8341310), which explains some of the difference. But there is also something strange going on in the Docker case; every `docker stop` call takes 10 seconds, and I have not been able to figure out what exactly causes it.
>>
>> Doing a manual `kill [container Java process PID]` gracefully terminates the Java process and container, but `docker stop` never does. Instead, it blocks for 10 seconds before abruptly killing the process using `SIGKILL`. I confirmed this with a simplified case: with both
>> `strace -e 'trace=!all' docker run --init eclipse-temurin:23 java ..` and `strace -e 'trace=!all' docker run eclipse-temurin:23 java ..`, no signals were ever visible when calling either `docker stop` or `docker kill`.
>>
>> https://www.docker.com/blog/docker-best-practices-choosing-between-run-cmd-and-entrypoint/ and "What is PID 1 and why does it matter?" talk about why [`--init`](https://docs.docker.com/reference/cli/docker/container/run/#init) is supposed to help.
>
> Sebastian Lövdahl has updated the pull request incrementally with one additional commit since the last revision:
>
> Have EventGeneratorLoop end after a more predictable duration
Hmm, actually no, I _can_ still get a failure.
This change is still worthwhile, but it is not a reason to un-problemlist the test just yet (so thanks, yes, problemlisting was a good step).
Previously, the EventGeneratorLoop had shown its ending message, so it was not surprising that the test failed.
Now I don't see that message (good, EventGeneratorLoop is still running), but I can still get the same kind of failure
(java.lang.RuntimeException: 'sun.tools.jcmd.JCmd' missing from stdout/stderr). More to do in 8341518...
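
For readers following along, the two-part fix described in the quoted PR text can be sketched roughly as below. This is only an illustration, not the code from the pull request: the class `SideCarNames`, its helper methods, and the base name `test-container-main` are all hypothetical, and the sketch only builds the container name and the `stop` command line rather than invoking a container engine.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the actual test.
public class SideCarNames {
    // Counter shared by all iterations, so every name is handed out once.
    private static final AtomicInteger ITERATION = new AtomicInteger();

    // Part 2 of the fix: a distinct main-container name per iteration, so an
    // asynchronous `--rm` cleanup of the previous container cannot collide
    // with the container started in the next iteration.
    public static String nextMainContainerName(String base) {
        return base + "-" + ITERATION.incrementAndGet();
    }

    // Part 1 of the fix: the command line that stops the main container once
    // an iteration is done, instead of waiting out its configured runtime.
    // `engine` is assumed to be "docker" or "podman".
    public static List<String> stopCommand(String engine, String containerName) {
        return List.of(engine, "stop", containerName);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            String name = nextMainContainerName("test-container-main");
            // A real test would launch the container and run jcmd here.
            System.out.println(String.join(" ", stopCommand("docker", name)));
        }
    }
}
```

A counter keeps the names readable in logs; anything else that is unique per iteration would serve equally well.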
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21331#issuecomment-2402025140