RFR: 8170896: TEST_BUG: java/rmi/server/Unreferenced/leaseCheckInterval/LeaseCheckInterval.java failed with unreferenced() not invoked after 20.0 seconds [v5]

Tue Jan 27 23:53:23 UTC 2026

On Thu, 22 Jan 2026 11:05:06 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Can I please get a review of this test-only change which addresses intermittent failures in `java/rmi/server/Unreferenced/leaseCheckInterval/LeaseCheckInterval.java`?
>> 
>> The `@summary` in that test's test definition about what this test does:
>> 
>>>  @summary When the "java.rmi.dgc.leaseValue" system property is set to a
>>>  value much lower than its default (10 minutes), then the server-side
>>>  user-visible detection of DGC lease expiration-- in the form of
>>>  Unreferenced.unreferenced() invocations and possibly even local garbage
>>>  collection (including weak reference notification, finalization, etc.)--
>>>  may be delayed longer than expected.  While this is not a spec violation
>>>  (because there are no timeliness guarantees for any of these garbage
>>>  collection-related events), the user might expect that an unreferenced()
>>>  invocation for an object whose last client has terminated abnormally
>>>  should occur on relatively the same time order as the lease value
>>>  granted.
>> 
>> In its current form, the test uses a lease expiry of 10 seconds, launches a trivial `java` application which looks up the bound object from the registry and then terminates itself. After launching that trivial java application, the test then waits for 20 seconds, expecting that the `Unreferenced.unreferenced()` callback (upon lease expiry of 10 seconds) will be called within those 20 seconds. This wait intermittently fails because the `Unreferenced.unreferenced()` doesn't get called within those 20 seconds. 
>> 
>> Experiments show that the reason for these intermittent failures is due to the `SelfTerminator` application which does the registry lookup (and which involves connection establishment and communication over a socket) can sometimes take several seconds (5 or more for example). That effectively means that by the time this `SelfTerminator` starts its termination after the lookup, it's already several seconds into the "wait()" in the test.
>> 
>> The commit in this PR cleans up the test to more accurately track the duration of how long it took between the lease expiry and the `Unreferenced.unreferenced()` callback to be invoked. Additionally, just to make the test more robust, the maximum expected duration has been increased to 60 seconds instead of 20 seconds. Given the text in the test's summary, I think this increase is still within the expectations of how long it takes for the callback to be invoked after the client has exited abnormally.
>> 
>> The test continue...
>
> Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - merge latest from master branch
>  - merge latest from master branch
>  - Mark's suggestion - move the duration check to separate method
>  - merge latest from master branch
>  - Mark's review - use CountDownLatch
>  - merge latest from master branch
>  - 8170896: TEST_BUG: java/rmi/server/Unreferenced/leaseCheckInterval/LeaseCheckInterval.java failed with unreferenced() not invoked after 20.0 seconds

Thanks everybody for working on cleaning up this old test. The current state is a significant simplification from the earlier version. It seems like it's also a lot more reliable, though only time will tell! :-) I think there is a way to simplify the test further and to narrow the timing window.

The test forks a subprocess that runs the SelfTerminator program. This obtains a remote ref, causing the parent JVM to create a DGC lease. The subprocess then halts, which stops the JVM without relinquishing the DGC lease. Thus the parent has a lease that nobody will ever renew or cancel, creating the conditions for the lease to time out.

The observation is that the timeout logic is entirely within the parent JVM. The child's role is simply to obtain a lease and halt immediately. The parent JVM could be changed to call JavaVM.waitFor() to await the child's termination and to obtain a timestamp. This would remove the need to create a temporary file into which the child writes a timestamp from which the parent reads the timestamp. In addition, waitFor() allows the parent to check the child's exit status and signal an error if the status is nonzero.

I guess one could argue that having the child obtain the timestamp immediately after the registry call is closer to the beginning of the actual lease. However, I don't think this makes much of a practical difference. In addition, comparing values of Instant.now() from different JVMs _probably_ ends up using the same time base for comparison, but comparing timestamps from different JVMs might be subject to some weird time skew.

With this change the parent's pseudocode is this:

1. fork child
2. await child exit, check exit status
3. get start timestamp
4. latch.await()
5. get end timestamp, compare

Note that using the CountDownLatch still works, even if countDown() is called before await(), so this logic is still sound.

Anyway I think this simplifies things a bit by avoiding file I/O (and the explanations that go along with it) and make the timestamp logic easier to follow since it's all done in one place.

Another small cleanup item: is it necessary to have a `@modules` declaration in the test? I don't see anything in the test that uses RMI internals. (The Test Library might need this stuff, though, not sure.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28919#issuecomment-3808160797