RFR: 7903580: Allow for re-attempting agent creation when an attempt fails [v2]

Jonathan Gibbons jjg at openjdk.org
Mon Nov 13 18:51:31 UTC 2023


On Mon, 13 Nov 2023 15:23:52 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Can I please get a review of this change which proposes to implement the enhancement request noted in https://bugs.openjdk.org/browse/CODETOOLS-7903580?
>> 
>> Several times in our CI instances we have noticed that a test (action) fails because it is unable to create an `Agent` instance due to the socket connection not being established between the `AgentServer` and the `ServerSocket` running within the jtreg process. This causes the test itself to fail with `Error. Cannot get VM for test: java.net.SocketTimeoutException: Accept timed out`. It has also been noticed that most of the times this is intermittent and subsequent attempt for a different test (action) passes. 
>> 
>> The proposed change here introduces a new configuration parameter `--agent-attempts` under "Agent Pool Options", which allows for configuring the maximum number of attempts that are allowed for getting an agent for an action. This is an optional parameter and by default it has been set to a value of `2`, which by default then allows the agent creation to be retried once if the previous attempt fails. This then means that existing installations/usages of jtreg need not set a value for `--agent-attempts` to enroll for this feature. A value of `1` for this parameter implies that the agent creation attempt will be done only once, which is what currently happens in the absence of this proposed feature.
>> 
>> Inability to obtain an agent, for whatever reason, after attempting `--agent-attempts` times continues to result in test (action) failure, like it does today.
>> 
>> Some additional minor logging related changes have been done too, to help debugging some of the current observed failures.
>> 
>> This change has been tested in the following manners:
>> 
>> 1. This modified build of jtreg has been used to run tier1, tier2 and tier3 of JDK mainline and it has been verified that nothing regresses with this change. That however doesn't mean that agent creation re-attempt logic was tested in these runs (since the first attempt never failed).
>> 2. Locally this build was jtreg was additionally modified to selectively not connect to the `ServerSocket`'s port thus simulating a "Accept timed out" exception. That change then verified that the re-attempt logic does indeed correctly kick in and the test action passes after picking up the newly created agent.
>> 3. `--agent-attempts 1` was passed to the above modified build (from step 2) to verify that the re-attempt isn't attempted when `--agent-attempts 1` an...
>
> Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update the tests to match the log message change

Changes requested by jjg (Lead).

src/share/classes/com/sun/javatest/regtest/exec/Agent.java line 101:

> 99:         try {
> 100:             pidMethod = Process.class.getDeclaredMethod("pid"); // only available in Java 9+
> 101:         } catch (NoSuchMethodException | SecurityException e) {

It can throw `UnsupportedOperationException`
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Process.html#pid()

src/share/classes/com/sun/javatest/regtest/exec/Agent.java line 772:

> 770:          * Sets the maximum attempts to create or obtain an agent VM
> 771:          * @param numAttempts number of attempts
> 772:          * @throws IllegalArgumentException if {@code numAttempts} is lesser than {@code 1}

s/lesser/less/

src/share/classes/com/sun/javatest/regtest/exec/Agent.java line 990:

> 988:         private Duration idleTimeout;
> 989:         // default is 2 i.e. we retry a failed agent selection once
> 990:         private int numAgentSelectionAttempts = 2;

The default should be 1, meaning that the new behavior is opt-in, not opt-out.

You can always set a different default in the layers of software wrapping `jtreg`.

src/share/classes/com/sun/javatest/regtest/exec/Agent.java line 990:

> 988:         private Duration idleTimeout;
> 989:         // default is 2 i.e. we retry a failed agent selection once
> 990:         private int numAgentSelectionAttempts = 2;

The limit of 2 is set here but also later on.

src/share/classes/com/sun/javatest/regtest/tool/Tool.java line 462:

> 460:         },
> 461: 
> 462:         new Option(GNU, AGENT_POOL, null, "--agent-attempts") {

It might be good to add a FAQ entry for this option as well.

Q. `jtreg` has trouble starting agents; what can I do?
A. This can happen because _<fill-in-the-blank>_. Use the `--sgents-attempts` option to retry creating a agent up to a given number of times.

src/share/classes/com/sun/javatest/regtest/tool/i18n.properties line 304:

> 302:     couldn't be obtained from the pool. Similarly, a value of 2 will imply \
> 303:     that if the agent VM creation fails or an existing one couldn't be \
> 304:     obtained from the pool, then 1 retry will be attempted. \

The text about "queries" seems "weird".  Suggest:

The number of attempts jtreg will make to access an agent with the desired characteristics, either by creating a new one, or obtaining one from the pool of reusable VMs. The minimum value, and the default value, is 1.

-------------

PR Review: https://git.openjdk.org/jtreg/pull/173#pullrequestreview-1727960909
PR Review Comment: https://git.openjdk.org/jtreg/pull/173#discussion_r1391513864
PR Review Comment: https://git.openjdk.org/jtreg/pull/173#discussion_r1391516667
PR Review Comment: https://git.openjdk.org/jtreg/pull/173#discussion_r1391518324
PR Review Comment: https://git.openjdk.org/jtreg/pull/173#discussion_r1391528719
PR Review Comment: https://git.openjdk.org/jtreg/pull/173#discussion_r1391527547
PR Review Comment: https://git.openjdk.org/jtreg/pull/173#discussion_r1391534296


More information about the jtreg-dev mailing list