jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows

Tue May 14 15:44:57 PDT 2013

Hi, sorry for the delay in my reply, and thanks for the update.

A timeout of 30 seconds should be sufficient.

Regarding duplicates: I was just thinking, if you're expecting exactly 10 
notifications, you should ensure that you receive exactly 10 notifications, and 
they're the right ones. But if duplicates is the only possible way that more 
than 10 notifications could occur, then that's probably fine. I just don't know 
enough about JMX to know.

Now... I took a look at the Client.java implementation, and ... groan (sorry) 
... I see you're using the single-element array trick to work around the 
inability to mutate local variables from an anonymous inner class.

I strongly recommend that you avoid using this trick.

I don't know what thread the notification callbacks run on. If they are on 
different threads, then there are inherent memory visibility problems, since 
array elements cannot be declared volatile. There is also potential concurrent 
access to seqSet, since it's accessed from both callbacks.

(Hm, seqSet isn't modified anywhere. Should the notification sequence numbers 
be added to the set somewhere?)

One approach for dealing with this is to make all the mutable data structures 
thread-safe, e.g. by wrapping seqSet using Collections.synchronizedSet() and 
using an AtomicBoolean instead of boolean[1].

Another approach would be to have the callbacks just append the notification 
objects to a synchronized list, and then have the main test thread do 
postprocessing on the list to ensure that it got the notifications it expected 
and that there were no duplicates.

Either way is fine, in order to avoid the threading issues.

Thanks, and sorry to belabor this review.

s'marks

On 5/9/13 2:25 AM, Jaroslav Bachorik wrote:
> Hi Stuart,
>
> On St 8. květen 2013, 01:50:22 CEST, Stuart Marks wrote:
>> Hi Jaroslav,
>>
>> Great to see this shell test get rewritten!
>>
>> Looks like you're avoiding multiple JVM processes as well, by loading
>> the different versions of the classes into different classloaders. It
>> looks like a bit of trouble, but probably less than the amount of
>> trouble caused by the shell script.
>>
>> I have a couple minor points.
>>
>> The timeout value of one second seems quite low. Under normal
>> operation, spawning a couple threads and should proceed very quickly.
>> However, our testing environment is quite hostile, and things that
>> seem like they ought to proceed quickly often take considerably longer
>> than one might think. Since you're counting notifications, and in
>> normal operation they all come in, we don't wait for the actual
>> timeout unless there's a failure. So it might make sense to raise the
>> timeout to 10 or perhaps 30 seconds.
>
> I've adjusted the timeout for 30 seconds. Hopefully, it will be enough.
>
>>
>> On the other hand, I have a question about whether counting the number
>> of notifications is correct. Would it be possible for there to be a
>> bug where an extra notification is sent? If so, this might mean that
>> the test would exit prematurely, indicating success?
>
> It would be a severe error in the notification system implementation.
> However, I've added a check for duplicated notifications (each
> notification carries its sequence number) which makes the test fail in
> case of duplication. But I don't expect it to happen.
>
> The updated webrev is at
> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.09
>
>
> -JB-
>
>>
>> s'marks
>>
>> On 5/6/13 2:04 AM, Jaroslav Bachorik wrote:
>>> On Pá 3. květen 2013, 16:16:53 CEST, Daniel Fuchs wrote:
>>>> Hi Jaroslav,
>>>>
>>>> In Client.java - you could consider replacing the AtomicLong
>>>> with a CountDownLatch.
>>>>
>>>> This would allow you to remove the various Thread.sleep() in the
>>>> code (in particular the one at the end).
>>>>
>>>> You could use CountDownLatch.await(long timeout, TimeUnit unit) to
>>>> avoid waiting for ever in case of bugs, and the advantage is that
>>>> the test would be able to exit as soon as the count down latch
>>>> reaches 0, without having to wait for an arbitrary timeout.
>>>
>>> Great, thanks for the pointer! I've changed the test to use the
>>> CountDownLatch.
>>>
>>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.08/
>>>
>>> -JB-
>>>
>>>>
>>>> Very nice to see a shell test go away :-)
>>>>
>>>> -- daniel
>>>>
>>>>
>>>> On 5/3/13 3:41 PM, Jaroslav Bachorik wrote:
>>>>> Please re-review the updated webrev
>>>>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.06
>>>>>
>>>>> I've replaced the shell script with the plain java test. The javac API
>>>>> is used to compile the the auxiliary classes as was recommended. This
>>>>> allowed to simplify the test.
>>>>>
>>>>> The test does not check for a certain string in the standard output
>>>>> anymore - it turns out that it is possible to count the number of all
>>>>> the received JMX notifications (even though some notifications can be
>>>>> lost, we receive a special notification with the number of the lost
>>>>> regular notifications). It is then possible to match the actual number
>>>>> of processed notifications (received + lost) against the expected
>>>>> number
>>>>> - different numbers mean that the notification processing thread had
>>>>> been interrupted unexpectedly.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>>>
>>>>> On 8.2.2013 17:37, Chris Hegarty wrote:
>>>>>>
>>>>>>> Jon Gibbons suggested invoking the compiler API directly from java
>>>>>>> instead of writing a shell script. Doing this seems fairly simple,
>>>>>>> and I
>>>>>>> think it would be advantageous to keep things entirely in Java. I
>>>>>>> may
>>>>>>> attempt to rewrite the defaultSVID test using the compiler API.
>>>>>>
>>>>>> Here's a test that does just that.
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/2de8c6c2d652/test/sun/misc/JarIndex/metaInfFilenames/Basic.java
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Chris.
>>>>>
>>>>
>>>
>>>
>