jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows

Thu Jan 10 13:44:02 PST 2013

On 1/10/13 7:20 AM, Jaroslav Bachorik wrote:
> Update: http://cr.openjdk.java.net/~jbachorik/8005472/webrev.04

Thanks for the update.

Note, argv[0] is used before argv.length is checked, so if no args are passed 
this gives index out of bounds instead of the usage message.

I see you take pains to write and flush the URL to stdout before writing the 
signaling file. Good. The obvious alternative (which I started writing but then 
erased) is just to put the URL into the signaling file. But this has a race 
between creation of the file and the writing of its contents. So, what you have 
works. (This kind of rendezvous problem occurs a lot; it seems like there ought 
to be a simpler way.)

I suspect the -e option caused hangs because if something failed, it would 
leave the server running, spoiling the next test run. The usual way to deal 
with this is to use the shell 'trap' statement, to kill subprocesses and remove 
temp files before exiting the shell. Probably a good practice in general, but 
perhaps too much shell hackery for this change. (Up to you if you want to 
tackle it.)

Regarding how the test is detecting success/failure, the concern is that if the 
client fails for some reason other than the failure being checked for, the test 
will still report passing. Since the error message is coming out of the client 
JVM, in principle it ought to be possible to redirect it somehow in order to do 
the assertion checking in Java. With the current shell scheme, not only are 
other failures reported as the test passing, these other failures are erased in 
the grep pipeline, so they're not even visible in the test log.

This last issue is rather far afield from this webrev, and fixing it will 
probably require some rearchitecting of the test. So maybe it should be 
considered independently. I just happened to notice this going on, and I 
noticed the similarity to what's going on in the RMI tests.

s'marks

> On 01/10/2013 09:52 AM, Stuart Marks wrote:
>> On 1/7/13 3:23 AM, Jaroslav Bachorik wrote:
>>> On 01/04/2013 11:37 PM, Kelly O'Hair wrote:
>>>> I suspect it is not hanging because it does not exist, but that some
>>>> other windows process has it's hands on it.
>>>> This is the stdout file from the server being started up right?
>>>> Could the server from a previous test run be still running?
>>>
>>> Exactly. Amy confirmed this and provided a patch which resolves the
>>> hanging problem.
>>>
>>> The update patch is at
>>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01
>>
>> Hi Jaroslav,
>>
>> The change to remove the parentheses from around the server program
>> looks right. It avoids forking an extra process (at least in some
>> shells) and lets $! refer to the actual JVM, not an intermediate shell
>> process. The rm -f from Kelly's suggestion is good too.
>>
>> But there are other things wrong with the script. I don't think they
>> could cause hanging, but they could cause the script to fail in
>> unforeseen ways, or even to report success incorrectly.
>>
>> One problem is introduced by the change, where the Server's stderr is
>> also redirected into $URL_PATH along with stdout. This means that if the
>> Server program reports any errors, they'll get mixed into the URL_PATH
>> file instead of appearing in the test log. The URL_PATH file's contents
>> is never reported, so these error messages will be invisible.
>
> Fixed, only the stdout is redirected to $URL_PATH.
>
>>
>> The exit status of some of the critical commands (such as the
>> compilations) isn't checked, so if javac fails for some reason, the test
>> might not report failure. Instead, some weird error might or might not
>> be reported later (though one will still see the javac errors in the log).
>
> Fixed, introduced the check. The "set -e" was hanging the script so I
> have to check for the exit status manually.
>
>>
>> I don't think the sleep at line 80 is necessary, since the client runs
>> synchronously and should have exited by this point.
>
> And it's gone.
>
>>
>> The wait loop checking for the existence of the URL_PATH file doesn't
>> actually guarantee that the server is running or has initialized yet.
>> The file is actually created by the shell before the Server JVM starts
>> up. Thus, runClient might try to read from it before the server has
>> written anything to it. Or, as mentioned above, the server might have
>> written some error messages into the URL_PATH file instead of the
>> expected contents. Thus, the contents of the JMXURL variable can quite
>> possibly be incorrect.
>
> The err is not redirected to the file. A separate file is used to signal
> the availability of the server and that file is created from the java
> code after the server has been started. Also, the err and out  streams
> are flushed to make sure the JMX URL makes it into the file.
>
>>
>> If this occurs, what will happen when the client runs? It may emit some
>> error message, and this will be filtered out by the grep pipeline. Thus,
>> HAS_ERRORS might end up empty, and the test will report passing, even
>> though everything has failed!
>
> Shouldn't happen with only the controlled stdout redirected to the file.
>
>>
>> For this changeset I'd recommend at a minimum removing the redirection
>> of stderr to URL_PATH. If the server fails we'll at least see errors in
>> the test log.
>>
>> For checking the notification message, is there a way to modify the
>> client to report an exit status or throw an exception? Throwing an
>> exception from main() will exit the JVM with a nonzero status, so this
>> can be checked more easily from the script. I think this is less
>> error-prone than grepping the output for a specific error message. The
>> test should fail if there is *any* error; it should not succeed if an
>> expected error is absent.
>
> This is unfortunately not possible. The notification processing needs to
> be robust enough to prevent exiting JVM in cases like this. Therefore it
> only reports the problem, dumps the notification and carries on. The
> only place one can find something went wrong is the err stream.
>
>>
>> You might consider having jtreg build the client and server classes.
>> This might simplify some of the setup. Also, jtreg is meticulous about
>> aborting the test if any compilations fail, so it takes care of that for
>> you.
>
> I need same name classes with incompatible code compiled to two
> different locations - client and server. I was not able to figure out
> how to use jtreg to accomplish that.
>
> -JB-
>
>>
>> It would be nice if there were a better way to have the client
>> rendezvous with the server. I hate to suggest it, but sleeping
>> unconditionally after starting the server is probably necessary.
>> Anything more robust probably requires rearchitecting the test, though.
>>
>> Sorry to dump all this on you. But one of the shell-based RMI tests
>> suffers from *exactly* the same pathologies. (I have yet to fix it.)
>> Unfortunately, I believe that there are a lot of other shell-based tests
>> in the test suite that have similar problems. The lesson here is that
>> writing reliable shell tests is a lot harder than it seems.
>>
>> Thanks,
>>
>> s'marks
>