RFR(S): 8235454: [TESTBUG] Basic test for JFR event streaming for jdk_jfr_sanity
Igor Ignatyev
igor.ignatyev at oracle.com
Fri Dec 13 04:47:47 UTC 2019
Hi Misha,
Besides the obvious problem with the test's current failure mode (all failures will be reported as 'Error. Program java timed out...', and such failures tend to be miscategorized as environment or test bugs), you also get a more subtle problem: your test doesn't fully define what it verifies. The most important part is now defined by the test harness, and because the harness has no knowledge of the test's specifics, it can only assert very broadly that all test code finishes within 2 minutes (subject to scaling by the timeout factor). Such a broad assertion can lead to false-negative results from this test. At one extreme, onEvent() takes 2 minutes minus delta minus epsilon and onClose() takes just delta, which is close to 0 ms, where epsilon is spent on test overhead such as spawning a JVM; at the other extreme, onEvent() takes delta and onClose() takes 2 minutes minus delta minus epsilon. The latter might be acceptable, but the former sounds like a bug to me.
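To make that concrete, here is roughly the shape of the check I have in mind. This is an untested sketch; the latch names, timeout value, and failure messages are placeholders, not anything taken from the actual test:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class TimedAwaitSketch {
    // hypothetical latches that the onEvent()/onClose() handlers would count down
    static final CountDownLatch eventLatch = new CountDownLatch(1);
    static final CountDownLatch closeLatch = new CountDownLatch(1);

    public static void main(String... args) throws Exception {
        // stand-in for the streaming part of the test: something eventually
        // counts the latches down (in the real test, the stream handlers do)
        new Thread(() -> { eventLatch.countDown(); closeLatch.countDown(); }).start();

        // each phase gets its own budget, so a stalled onEvent() and a stalled
        // onClose() fail with distinct, specific messages instead of a generic
        // harness timeout
        if (!eventLatch.await(30, TimeUnit.SECONDS)) {
            throw new RuntimeException("onEvent() was not called within 30 seconds");
        }
        if (!closeLatch.await(30, TimeUnit.SECONDS)) {
            throw new RuntimeException("onClose() was not called within 30 seconds");
        }
    }
}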
With that being said, I'm fine either way; I just wanted to make sure the drawbacks of the current approach are clear and understood.
Thanks,
-- Igor
> On Dec 10, 2019, at 11:43 AM, mikhailo.seledtsov at oracle.com wrote:
>
> Hi Erik, Igor,
>
> Pasting Igor's comment as well:
>
>> The only problem I have with the test is its failure mode: other than by crashing, the test can fail only by timing out, which isn't the nicest way to fail. I understand that there is no specific guarantee about when an event is received by a consumer, nor are there such guarantees for on-close, but I'd really like us to find a better way to detect and report failures. For example, you could replace the no-arg CountDownLatch::await calls with the variant that takes a timeout (await(long, j.u.c.TimeUnit)), use some reasonable timeout value, and report a test failure if the return value isn't true.
>
>
>
> IMO, whether or not to use an explicit timeout in await() depends on the goal of the particular test case. If the goal is to test the timing/responsiveness of the API, we should use a timeout, provided there is an explicit or implicit guarantee for event arrival (specifically, that onEvent() and onClose() should fire within some specific time). If the goal is simply to test that events arrive, we should not use a timeout.
>
> In this case, this is a basic/sanity test, and the goal is not to test specific timing. Hence, I am in favor of leaving the test as is and letting the test harness signal the timeout. The user will see what happened from the post-test diagnostics provided. I can slightly improve the test by adding a printout after each await(), to let the user know whether the await() completed.
>
> Let me know what you think.
>
>
> Thanks,
> Misha
>
> On 12/9/19 8:41 PM, Erik Gahlin wrote:
>> Looks good
>>
>> When it comes to the failure mode:
>>
>> I think it is better to let the test harness determine what is a reasonable timeout; if needed, it can be changed from the outside for all tests. Also, if the test fails with a timeout, the test harness will (hopefully) take a thread dump, which is useful for troubleshooting.
>>
>> Erik
>>
>>> On 10 Dec 2019, at 02:36, mikhailo.seledtsov at oracle.com wrote:
>>>
>>> Please review this change that introduces a very simple basic/sanity test for JFR streaming, and adds this test to jdk_jfr_sanity (which runs in tier-1).
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8235454
>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8235454.00/
>>> Testing:
>>> 1. Ran this test over 200 times on multiple platforms in the test cluster: ALL PASS
>>>
>>> 2. Ran the group jdk_jfr_sanity 80 times: ALL PASS
>>>
>>> Thank you,
>>> Misha
>>>
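For reference, the overall shape of the test being discussed (a CountDownLatch per callback, no-arg await() calls, and a printout after each, as Misha suggests above) would be roughly as follows. This is only a sketch with made-up class and event names, not the actual code from the webrev:

import java.util.concurrent.CountDownLatch;
import jdk.jfr.Event;
import jdk.jfr.consumer.RecordingStream;

public class StreamingSanitySketch {
    // made-up event type; the real test's event may differ
    static class SanityEvent extends Event {
    }

    public static void main(String... args) throws Exception {
        CountDownLatch eventArrived = new CountDownLatch(1);
        CountDownLatch streamClosed = new CountDownLatch(1);

        try (RecordingStream rs = new RecordingStream()) {
            rs.enable(SanityEvent.class);
            rs.onEvent(SanityEvent.class.getName(), e -> eventArrived.countDown());
            rs.onClose(streamClosed::countDown);
            rs.startAsync();

            new SanityEvent().commit();

            // no-arg await: if the event never arrives, the failure surfaces as a
            // harness timeout rather than a test-specific error message
            eventArrived.await();
            System.out.println("onEvent() received the event");
        }
        streamClosed.await();
        System.out.println("onClose() was called");
    }
}

With the approach Igor suggests, the two await() calls would instead take a timeout and throw with a specific message when it expires, as in the sketch earlier in this message.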