Need reviewers, jdk7 testing changes
Kelly O'Hair
Kelly.Ohair at Sun.COM
Wed Dec 9 10:30:48 PST 2009
Martin Buchholz wrote:
> On Tue, Dec 8, 2009 at 16:42, Kelly O'Hair <Kelly.Ohair at sun.com> wrote:
>
>>> In theory, the process is supposed to prevent
>>> regression test failures from ever creeping in - that's the whole point.
>>> When they do (inevitably) creep in,
>>> they are supposed to be aggressively targeted.
>>> A gatekeeper demonstrates the failure to a developer,
>>> and the developer is given X time units to fix the test breakage,
>>> or face reversion of the breakage-inducing change.
>> But it's not that simple. In some cases the cause of the failure is
>> a change made to a different repository, by a different team,
>
> Changes that get in this way are signs of process failure.
> E.g. failing tests in java.util caused by hotspot commits
Déjà vu... we have had this discussion before. ;^)
How can any team possibly run ALL the tests?
We would all grind to a halt.
And what is this "signs of process failure"? Are you some kind
of manager now? ;^) "Harrumph... Every failure is a result of
poor planning..." (just yanking your chain ;^)
Testing has become a balancing act for everyone; we do what
we can given the constraints we have. When things fall through
the cracks, we try to patch the cracks so it doesn't happen
again. We automate what we can, but there are limits.
> are signs the hotspot commit was inadequately tested.
I'll let you arm wrestle the hotspot guys over that comment. ;^)
>
>> or even to the base system software via an update.
>> The ProblemList.txt file was meant to deal with that situation,
>> and also tests that are spuriously failing for unknown reasons.
>
> I agree that flaky and platform-dependent failures are a big problem.
>
>> When the gatekeeper can do what you say, that is great, and ideal.
>> But I just don't see it happening that way in all situations.
>>
>>> I would like to see more effort devoted to fixing the tests
>>> (or the code!) rather than adding infrastructure that might
>>> have the effect of hiding the test failures.
>> Sigh... I'm not trying to hide the failures, and if you haven't
>> noticed, I have fixed quite a few tests myself.
>
> I have noticed, and I do appreciate it.
>
> I do some flaky test fixing myself.
Well, to your credit, tests you have worked on in the past are
typically not on my flaky list.
Unfortunately, some of your tests are damn good tests, and cut
a larger swath through the jdk logic than I suspect your
original test focus intended. So your tests may come under more
scrutiny as bugs in various areas get filed, each one pointing
at the nasty little test that found it. ;^)
>
>> If anything, I'm making the fact that we have test failures more
>> public, and the ultimate goal is to fix the tests.
>> But I needed a baseline, a line in the sand, an expectation
>> of what tests should always pass. And an ability to run all
>> the tests in a timely manner.
>
> I prefer the approach taken by my diff-javatest tool
> of not caring about how many tests failed with the
> reference jdk. Just run them twice, and look for
> *new* failures.
What a pain... and what a loss. So many of these tests could
become healthy, functioning members of society, given a little
support and a helping hand. And if they are just bad tests, I
agree with Mr. Scrooge: let's 'decrease the surplus population'. ;^)
On diff-javatest, what if the moon is full and the reference jdk
passed a test but your jdk didn't, for no good reason? I hate that.
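That said, for anyone following along, the core of the diff idea
is just a set difference over the two failure lists. A quick
sketch of my own (this is NOT Martin's actual diff-javatest code,
and the one-test-name-per-line failure files are my assumption
about its inputs):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Set;
    import java.util.TreeSet;

    // Sketch only: report tests that fail with the new jdk but not
    // the reference jdk. Assumes each run produced a file listing
    // its failing test names, one per line.
    public class DiffFailures {

        // Read one failing-test name per line into a sorted set.
        static Set<String> readFailures(String fileName) throws IOException {
            Set<String> failures = new TreeSet<String>();
            BufferedReader in = new BufferedReader(new FileReader(fileName));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (line.length() > 0)
                        failures.add(line);
                }
            } finally {
                in.close();
            }
            return failures;
        }

        public static void main(String[] args) throws IOException {
            // args[0] = reference jdk failures, args[1] = new jdk failures
            Set<String> newFailures = readFailures(args[1]);
            newFailures.removeAll(readFailures(args[0]));  // keep only *new* ones
            for (String test : newFailures)
                System.out.println("NEW FAILURE: " + test);
        }
    }

Of course, a flaky test that happened to pass on the reference run
shows up here as a "new" failure, which is exactly the full-moon
problem above.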
>
>> Now it's time to go through the ProblemList.txt and do a little
>> triage, file a few bugs, fix what tests can be fixed, and/or
>> correct my ProblemList.txt file if I got it wrong.
>>
>>> BTW, I run regtests on java.util collections regularly,
>>> but only on Linux.
>> I think it is an expected situation that many developers can only
>> run the tests on one platform.
>>
>> But do you expect the gatekeeper to run all the tests on all the
>> platforms?
>> And if for example, your tests fail on Windows you would be
>> given X time units to fix it? Could you?
>
> Of course, you want the perfect combination of nasty and nice.
> For some test failures, it's unreasonable to expect the guilty party
> to fix them. But they should be prepared to be helpful.
Agreed, but they should be both helpful and proactive in
preventing it from happening again. We want "good citizen"
developers, which in general I think we have; I'm just trying
to make sure there are lots of tools and mechanisms to allow
everyone to be a "good citizen". ;^)
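For the record, since I keep mentioning it: a ProblemList.txt
entry is just a line naming a test, the bug tracking its failure,
and the platform(s) to skip it on. The entries below are
hypothetical, just to show the general shape (the exact columns
here are from memory, so treat them as illustrative):

    # Test name                      bug id    platforms to exclude
    java/some/area/FlakyTest.java    1234567   windows-all
    java/other/area/SlowTest.java    7654321   generic-all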
-kto
>
> Martin