jtreg testing integrated

Thu May 22 11:59:02 PDT 2008

Mark Wielaard wrote:
> Hi Martin,
> 
> On Tue, 2008-05-20 at 06:00 -0700, Martin Buchholz wrote:
>> On Tue, May 20, 2008 at 2:32 AM, Mark Wielaard <mark at klomp.org> wrote:
>>>> I like a policy of "Read my lips; no new test failures" but OpenJDK
>>>> is not quite there; we get test failure creep when changes in
>>>> one component break another component's tests.
>>> Yes, that would be ideal. At least for openjdk6/icedtea we seem to be
>>> pretty close actually. It will be more challenging for openjdk7. I
>>> haven't quite figured out all the dynamics around "workspace
>>> integration". But I assume we can get the master tree to zero fail and
>>> then demand that any integration cycle doesn't introduce regressions.
>> There are too many tests to require team integrators to run
>> them all on each integration cycle.
> 
> I am not sure. It does take about 3 hours to run all the included tests
> (and I assume that when we add more tests or integrate things like mauve
> it will rise). But I do hope people, not just integrators, will run them
> regularly. Especially when they are working on/integrating larger
> patches. And we can always fall back on autobuilders so we have a full
> report at least soon after something bad happens so there is some chance
> to revert a change relatively quickly.

3 hours for runs with:
    -client and -server?
    one OS?
    32bit and 64bit?

And you are only talking about the tests in the jdk/test area I assume.

The issue I have seen with the testing is that if it isn't done on a good
spread of options, platforms, and variations, something gets missed. There
is also little consistency between developers as to what the official
required test matrix is. Once we have a 100% pass list and a required
matrix, I have a system to help enforce these tests being run, but so
far nobody has given me a way to run just that list.
Once I have it, I think we can keep all or most repositories golden.
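
To make "required matrix" concrete, here is a rough sketch in Python. The
axis values and the function names (required_matrix, missing_runs) are
made up for illustration; nobody has agreed on the real matrix, and this
is not the system I mentioned above.

```python
from itertools import product

# Hypothetical axes, for illustration only; the real required matrix
# has not been agreed on.
VM_MODES = ["-client", "-server"]
OSES = ["linux", "solaris", "windows"]
BITNESS = ["32bit", "64bit"]


def required_matrix(vm_modes=VM_MODES, oses=OSES, bitness=BITNESS):
    """Return every (vm_mode, os, bitness) combination that must be run."""
    return list(product(vm_modes, oses, bitness))


def missing_runs(results, matrix):
    """Return the matrix entries that are not yet 'golden'.

    results maps a matrix entry to a dict of {test_name: passed}.
    An entry counts as missing if it has no recorded run, an empty
    run, or any failing test.
    """
    missing = []
    for config in matrix:
        run = results.get(config)
        if not run or not all(run.values()):
            missing.append(config)
    return missing
```

With these made-up axes there are 12 configurations. If the 3 hours were
per configuration, that would be about 36 machine-hours when run serially,
which is why the answers to the questions above matter.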

The Hotspot team runs many many variations, using options like -Xcomp
and -Xbatch and loads more. But they are trying to stress the VM.

> 
>>   For a few years I've advocated
>> adding another level to the tree of workspaces.  My model is to
>> rename the current MASTER workspace to PURGATORY, and
>> add a "golden MASTER".
>> The idea is that once a week or so all tests are run exhaustively,
>> and when it is confirmed that there are no new test failures,
>> the tested code from PURGATORY is promoted to MASTER.
> 
> This is fascinating. Intuitively I would call for fewer levels instead of
> more because that makes issues show up earlier. It is one of the things
> I haven't really wrapped my head around. The proliferation of separate
> branches/workspaces. One main master tree where all work goes into by
> default and only have separate (ad hoc) branches/workspaces for larger
> work items that might be destabilizing seems an easier model to work
> with.

An easier model for developers to work with, yes, I agree, and I have to admit
that when I came into the Java org from a different part of Sun, I was somewhat
amazed too.  But it does serve a very valuable purpose: it corrals and isolates
the different teams, and prevents major and semi-major regressions from
impacting everyone in the organization.  At a cost, I suppose.

Each layer does add a cost... and the PURGATORY&GOLDEN idea is a good one,
but just like having different team areas, it adds another delay in people
seeing a change show up in the MASTER area.

I'm more of a 'test before integrate' person: streamlining and
automating the testing process, making it part of the developer push process,
adapting the tests as major regressions sneak by (you can never catch all
regressions, no matter what you do), and blocking pushes on any failure.
So I'm trying to throw hardware at the problem until we can possibly do
the "exhaustive testing" that Martin mentions as part of a developer
pushing a change in, before anyone else sees it, all automated.
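
The gate itself could be as simple as the following Python sketch. The
function name gate_push and the shape of the inputs are assumptions for
illustration, not an existing tool; the point is only that a required test
that failed, or was never run, blocks the push.

```python
def gate_push(test_results, required_tests):
    """Decide whether a push may proceed.

    test_results:   dict of test name -> True (pass) / False (fail)
    required_tests: the agreed pass list (any iterable of test names)
    Returns (allowed, reasons); allowed is True only when every
    required test was run and passed.
    """
    reasons = []
    for name in sorted(required_tests):
        if name not in test_results:
            reasons.append(f"{name}: not run")
        elif not test_results[name]:
            reasons.append(f"{name}: failed")
    return (not reasons, reasons)
```

Tests outside the required list are ignored here on purpose, so a flaky
test cannot block anyone until it has earned a place on the list.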

But for automation, I want a test guarantee:
   "these tests should pass 100% of the time on all platforms in all situations"
and then we can think about enforcing it.
None of this wishy washy "this test fails sometimes due to the phase of
the moon" crap. ...  oops, can I say crap in a public email??...  ;^}

-kto

> 
> Cheers,
> 
> Mark
>