From martinrb at google.com Mon May 19 08:30:57 2008 From: martinrb at google.com (Martin Buchholz) Date: Mon, 19 May 2008 08:30:57 -0700 Subject: jtreg testing integrated In-Reply-To: <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> Message-ID: <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> [+compiler-dev, jtreg-use] On Mon, May 19, 2008 at 7:56 AM, Andrew John Hughes wrote: > 2008/5/19 Mark Wielaard : >> >> make jtregcheck -k runs the testsuites of hotspot (4 tests, all PASS), >> langtools (1,342 PASS, 1 FAIL - the version check) and jdk (2,875 tests >> of which about 130 fail - rerunning tests now). corba, jaxp and jaxws >> don't come with any tests. This takes about 3 hours on my machine. Once upon a time, I wrote a test that made sure the hotspot and jdk libraries' ideas of the current version and supported targets were in sync. Unfortunately, it is not a requirement on hotspot integrations that they pass this test, so the test starts failing whenever hotspot starts supporting the class file version number for the next major release. At least this is a strong hint to the javac team to catch up soon by incrementing their supported targets, etc... I like a policy of "Read my lips; no new test failures" but OpenJDK is not quite there; we get test failure creep when changes in one component break another component's tests. >> Most of the failures are because the host javaweb.sfbay.sun.com cannot >> be resolved. The jtreg tests were originally designed to be run only by Sun JDK development and test engineers. If someone can come up with a portable way of testing network services (like ftp clients) without setting up a dedicated machine with a well-known name, that would be good. Alternatively, making the name of this machine configurable when jtreg is run would also be an improvement, and a much simpler one. But the obvious idea of using environment variables doesn't work. Most environment variables are not passed to the running java test program. If it's considered acceptable for IcedTea hackers to get their hands dirty with not-100%-free technology, y'all could try running the jtreg tests against IcedTea, vanilla OpenJDK7, OpenJDK6, and JDK 6u6, and comparing the test failures. I once wrote a script to compare two jtreg test runs, diff-javatest. Jonathan et al, could you work (with me) on releasing that as open source? Martin >> But there are also some genuine failures in java.awt.color, >> jmx.snmp, javax.script, javax.print, ... so enough to do for >> enterprising hackers! >> From Jonathan.Gibbons at Sun.COM Mon May 19 09:52:23 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Mon, 19 May 2008 09:52:23 -0700 Subject: jtreg testing integrated In-Reply-To: <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> Message-ID: <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> Martin, jtreg is now open source, as of just before JavaOne. See http://openjdk.java.net/jtreg There is a small new utility called "jtdiff" that comes with jtreg that may do what you want.
It will do n-way comparison of any JavaTest-type results (meaning jtreg, JCK, etc) where each set of results can be given as a JavaTest work directory, report directory, or just the summary.txt file within a report directory. The output can be plain text or HTML. jtdiff is within jtreg.jar, so the easiest way to invoke it is java -cp jtreg.jar com.sun.javatest.diff.Main -- Jon On May 19, 2008, at 8:30 AM, Martin Buchholz wrote: > [+compiler-dev, jtreg-use] > > On Mon, May 19, 2008 at 7:56 AM, Andrew John Hughes > wrote: >> 2008/5/19 Mark Wielaard : >>> >>> make jtregcheck -k runs the testsuites of hotspot (4 tests, all >>> PASS), >>> langtools (1,342 PASS, 1 FAIL - the version check) and jdk (2,875 >>> tests >>> of which about 130 fail - rerunning tests now). corba, jaxp and >>> jaxws >>> don't come with any tests. This takes about 3 hours on my machine. > > Once upon a time, I wrote a test that made sure the hotspot > and jdk library's idea of the current version and supported targets > were in sync. Unfortunately, it is not a requirement on hotspot > integrations that they pass this test, so the test starts failing > whenever > hotspot starts supporting the class file version number for the next > major release. At least this is a strong hint to the javac team to > catch up soon by incrementing their supported targets, etc... > > I like a policy of "Read my lips; no new test failures" but OpenJDK > is not quite there; we get test failure creep when changes in > one component break another component's tests. > >>> Most of the failures are because the host javaweb.sfbay.sun.com >>> cannot >>> be resolved. > > The jtreg tests were originally designed to be run only by Sun JDK > development and test engineers. If someone can come up with a > portable way of testing network services (like ftp clients) without > setting up a dedicated machine with a well-known name, that > would be good. Alternatively, making the name of this > machine configurable when jtreg is run would also > be an improvement, and a much simpler one. But the obvious > idea of using environment variables doesn't work. Most environment > variables are not passed to the running java test program. > > If it's considered acceptable for IcedTea hackers to get their > hands dirty with not-100%-free technology, y'all could try > running the jtreg tests against IcedTea, vanilla OpenJDK7, > OpenJDK6, and JDK 6u6, and comparing the test failures. > > I once wrote a script to compare two jtreg test runs, diff-javatest. > Jonathan et al, could you work (with me) on releasing that as open > source? > > Martin > >>> But there are also some genuine failures in java.awt.color, >>> jmx.snmp, javax.script, javax.print, ... so enough to do for >>> enterprising hackers! >>> From martinrb at google.com Mon May 19 14:33:59 2008 From: martinrb at google.com (Martin Buchholz) Date: Mon, 19 May 2008 14:33:59 -0700 Subject: jtreg testing integrated In-Reply-To: <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> Message-ID: <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> Jonathan, Thanks for jtdiff. Suggestions: - The various help options are confusing. Just print all the help available if given -help or -usage. - The usage should give a paragraph explaining what it does. 
Not too much work; why, you've practically written the required words in the quoted text below. - My first version of diff-javatest was symmetrical. It printed any difference between the two runs, in both directions. Later I realized that (at least my own) usage invariably had the notion of a "reference" JDK and a "test" JDK. I was interested in tests failing in the "test" JDK but not in the reference JDK, not vice versa; typically the reference JDK results were historical ones produced by someone wearing a QA hat, and were more complete than the ones in the test JDK, where results were more likely to be part of an edit-compile-test cycle. Try printing the usage message from my own diff-javatest script, which should still be accessible inside Sun, in /java/tl, for example. Martin On Mon, May 19, 2008 at 9:52 AM, Jonathan Gibbons wrote: > Martin, > > jtreg is now open source, as of just before JavaOne. See > http://openjdk.java.net/jtreg > > There is a small new utility called "jtdiff" that comes with jtreg that may > do what > you want. It will do n-way comparison of any JavaTest-type results (meaning > jtreg, JCK, etc) > where each set of results can be given as a JavaTest work directory, report > directory, > or just the summary.txt file within a report directory. The output can be > plain text or HTML. > > jtdiff is within jtreg.jar, so the easiest way to invoke it is > java -cp jtreg.jar com.sun.javatest.diff.Main > > -- Jon > > On May 19, 2008, at 8:30 AM, Martin Buchholz wrote: > >> [+compiler-dev, jtreg-use] >> >> On Mon, May 19, 2008 at 7:56 AM, Andrew John Hughes >> wrote: >>> >>> 2008/5/19 Mark Wielaard : >>>> >>>> make jtregcheck -k runs the testsuites of hotspot (4 tests, all PASS), >>>> langtools (1,342 PASS, 1 FAIL - the version check) and jdk (2,875 tests >>>> of which about 130 fail - rerunning tests now). corba, jaxp and jaxws >>>> don't come with any tests. This takes about 3 hours on my machine. >> >> Once upon a time, I wrote a test that made sure the hotspot >> and jdk library's idea of the current version and supported targets >> were in sync. Unfortunately, it is not a requirement on hotspot >> integrations that they pass this test, so the test starts failing whenever >> hotspot starts supporting the class file version number for the next >> major release. At least this is a strong hint to the javac team to >> catch up soon by incrementing their supported targets, etc... >> >> I like a policy of "Read my lips; no new test failures" but OpenJDK >> is not quite there; we get test failure creep when changes in >> one component break another component's tests. >> >>>> Most of the failures are because the host javaweb.sfbay.sun.com cannot >>>> be resolved. >> >> The jtreg tests were originally designed to be run only by Sun JDK >> development and test engineers. If someone can come up with a >> portable way of testing network services (like ftp clients) without >> setting up a dedicated machine with a well-known name, that >> would be good. Alternatively, making the name of this >> machine configurable when jtreg is run would also >> be an improvement, and a much simpler one. But the obvious >> idea of using environment variables doesn't work. Most environment >> variables are not passed to the running java test program. >> >> If it's considered acceptable for IcedTea hackers to get their >> hands dirty with not-100%-free technology, y'all could try >> running the jtreg tests against IcedTea, vanilla OpenJDK7, >> OpenJDK6, and JDK 6u6, and comparing the test failures.
>> >> I once wrote a script to compare two jtreg test runs, diff-javatest. >> Jonathan et al, could you work (with me) on releasing that as open source? >> >> Martin >> >>>> But there are also some genuine failures in java.awt.color, >>>> jmx.snmp, javax.script, javax.print, ... so enough to do for >>>> enterprising hackers! >>>> > > From Jonathan.Gibbons at Sun.COM Mon May 19 16:07:00 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Mon, 19 May 2008 16:07:00 -0700 Subject: jtreg testing integrated In-Reply-To: <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> Message-ID: <48320814.7070201@sun.com> Martin, (removed compiler-dev) Your comments about unsymmetric runs are interesting. jtdiff performs an n-way comparison, and I'd want to keep that functionality. The two use cases I had in mind were: -- Given a set of nightly builds on a set of platforms, compare the results across all the platforms, and report differences -- Given the same set of nightly builds on a set of platforms, for each platform perform a pair-wise comparison against the corresponding results last night/week/month. I'll see about adding an option to specify a reference set of results, for your "developer time" use case. --- Separately, check out the options for handling @ignore tests. Even on older versions of jtreg you can use "-k:!ignore" to ignore @ignore tests. (This works because @ignore tests are given an implicit "ignore" keyword.) With later versions of jtreg, you can use -Ignore:{quiet,error,run} to control how @ignore tests should be handled. Using this option, you should be able to get closer to the goal of "all tests should pass", meaning that there are fewer failures and so less need to compare the output results with jtdiff. -- Jon Martin Buchholz wrote: > Jonathan, > > Thanks for jtdiff. > > Suggestions: > > - The various help options are confusing. > Just print all the help available if given -help or -usage. > - The usage should give a paragraph explaining what it does. > Not too much work; why, you've practically written the required > words in the quoted text below. > - My first version of diff-javatest was symmetrical. It printed any > difference between the two runs, in both directions. Later I > realized that (at least my own) usage invariably had the notion > of a "reference" JDK and a "test" JDK. I was interested in tests > failing in the "test" JDK but not in the reference JDK, not vice > versa; typically the reference JDK results were historical ones > produced by someone wearing a QA hat, and were more complete > than the ones in the test JDK, where results were more likely to be > part of an edit-compile-test cycle. > > Try printing the usage message from my own diff-javatest script, > which should still be accessible inside Sun, in /java/tl, for example. > > Martin > > On Mon, May 19, 2008 at 9:52 AM, Jonathan Gibbons > wrote: > >> Martin, >> >> jtreg is now open source, as of just before JavaOne. See >> http://openjdk.java.net/jtreg >> >> There is a small new utility called "jtdiff" that comes with jtreg that may >> do what >> you want.
It will do n-way comparison of any JavaTest-type results (meaning >> jtreg, JCK, etc) >> where each set of results can be given as a JavaTest work directory, report >> directory, >> or just the summary.txt file within a report directory. The output can be >> plain text or HTML. >> >> jtdiff is within jtreg.jar, so the easiest way to invoke it is >> java -cp jtreg.jar com.sun.javatest.diff.Main >> >> -- Jon >> >> On May 19, 2008, at 8:30 AM, Martin Buchholz wrote: >> >> >>> [+compiler-dev, jtreg-use] >>> >>> On Mon, May 19, 2008 at 7:56 AM, Andrew John Hughes >>> wrote: >>> >>>> 2008/5/19 Mark Wielaard : >>>> >>>>> make jtregcheck -k runs the testsuites of hotspot (4 tests, all PASS), >>>>> langtools (1,342 PASS, 1 FAIL - the version check) and jdk (2,875 tests >>>>> of which about 130 fail - rerunning tests now). corba, jaxp and jaxws >>>>> don't come with any tests. This takes about 3 hours on my machine. >>>>> >>> Once upon a time, I wrote a test that made sure the hotspot >>> and jdk library's idea of the current version and supported targets >>> were in sync. Unfortunately, it is not a requirement on hotspot >>> integrations that they pass this test, so the test starts failing whenever >>> hotspot starts supporting the class file version number for the next >>> major release. At least this is a strong hint to the javac team to >>> catch up soon by incrementing their supported targets, etc... >>> >>> I like a policy of "Read my lips; no new test failures" but OpenJDK >>> is not quite there; we get test failure creep when changes in >>> one component break another component's tests. >>> >>> >>>>> Most of the failures are because the host javaweb.sfbay.sun.com cannot >>>>> be resolved. >>>>> >>> The jtreg tests were originally designed to be run only by Sun JDK >>> development and test engineers. If someone can come up with a >>> portable way of testing network services (like ftp clients) without >>> setting up a dedicated machine with a well-known name, that >>> would be good. Alternatively, making the name of this >>> machine configurable when jtreg is run would also >>> be an improvement, and a much simpler one. But the obvious >>> idea of using environment variables doesn't work. Most environment >>> variables are not passed to the running java test program. >>> >>> If it's considered acceptable for IcedTea hackers to get their >>> hands dirty with not-100%-free technology, y'all could try >>> running the jtreg tests against IcedTea, vanilla OpenJDK7, >>> OpenJDK6, and JDK 6u6, and comparing the test failures. >>> >>> I once wrote a script to compare two jtreg test runs, diff-javatest. >>> Jonathan et al, could you work (with me) on releasing that as open source? >>> >>> Martin >>> >>> >>>>> But there are also some genuine failures in java.awt.color, >>>>> jmx.snmp, javax.script, javax.print, ... so enough to do for >>>>> enterprising hackers!
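[ As a concrete illustration of the @ignore handling described above -- a sketch only: option spellings vary between jtreg versions, -w/-r name the work and report directories, and test/jdk stands in for a real test directory:

  # older jtreg: filter out @ignore tests via their implicit keyword
  $ jtreg -k:'!ignore' -w JTwork -r JTreport test/jdk

  # later jtreg: control how @ignore tests are handled
  $ jtreg -ignore:quiet -w JTwork -r JTreport test/jdk
]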
>>>>> >>>>> >> From martinrb at google.com Tue May 20 05:36:23 2008 From: martinrb at google.com (Martin Buchholz) Date: Tue, 20 May 2008 05:36:23 -0700 Subject: jtreg testing integrated In-Reply-To: <48320814.7070201@sun.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> <48320814.7070201@sun.com> Message-ID: <1ccfd1c10805200536y3df6ed11id6b0476eec250572@mail.gmail.com> Wait, now I remember, I used to wear the integrator hat too, and I wanted the ability to data-mine a long series of javatest runs. I never tackled that problem. The general problem is really hard. You want a report that says things like Test FOO has been failing intermittently since the Ides of March, but only on 64-bit x86 platforms. Martin On Mon, May 19, 2008 at 4:07 PM, Jonathan Gibbons wrote: > Martin, > > (removed compiler-dev) > > Your comments about unsymmetric runs are interesting. jtdiff performs an > n-way > comparison, and I'd want to keep that functionality. > > The two use cases I had in mind were: > > -- Given a set of nightly builds on a set of platforms, compare the results > across > all the platforms, and report differences > > -- Given the same set of nightly builds on a set of platforms, for each > platform > perform a pair-wise comparison against the corresponding results last > night/week/month. > > I'll see about adding an option to specify a reference set of results, for > your > "developer time" use case. > From martinrb at google.com Tue May 20 06:00:53 2008 From: martinrb at google.com (Martin Buchholz) Date: Tue, 20 May 2008 06:00:53 -0700 Subject: jtreg testing integrated In-Reply-To: <1211275953.3284.33.camel@dijkstra.wildebeest.org> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> Message-ID: <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> On Tue, May 20, 2008 at 2:32 AM, Mark Wielaard wrote: >> I like a policy of "Read my lips; no new test failures" but OpenJDK >> is not quite there; we get test failure creep when changes in >> one component break another component's tests. > > Yes, that would be ideal. At least for openjdk6/icedtea we seem to be > pretty close actually. It will be more challenging for openjdk7. I > haven't quite figured out all the dynamics around "workspace > integration". But I assume we can get the master tree to zero fail and > then demand that any integration cycle doesn't introduce regressions. There are too many tests to require team integrators to run them all on each integration cycle. For a few years I've advocated adding another level to the tree of workspaces. My model is to rename the current MASTER workspace to PURGATORY, and add a "golden MASTER". The idea is that once a week or so all tests are run exhaustively, and when it is confirmed that there are no new test failures, the tested code from PURGATORY is promoted to MASTER. > That brings up the question how to export a JTreport/JTwork environment. > I only posted the text results http://icedtea.classpath.org/~mjw/jtreg/ > since the JTreport and JTwork files have all kinds of hard coded > absolute path references.
It would be nice to be able to export it all > so I can upload it to some public site for others to look at and compare > with. My (unavailable) diff-javatest script had to contend with absolute paths in the html in the report directory as well. It made paths relative by removing root dirs. It would be good if javatest's output was made more "portable" in this sense. It's hard, because you really do want direct pointers to failing tests and .jtr files, and their location relative to the report directory cannot in general be relativized. Martin From mark at klomp.org Tue May 20 02:32:33 2008 From: mark at klomp.org (Mark Wielaard) Date: Tue, 20 May 2008 11:32:33 +0200 Subject: jtreg testing integrated In-Reply-To: <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> Message-ID: <1211275953.3284.33.camel@dijkstra.wildebeest.org> Hi Martin, On Mon, 2008-05-19 at 08:30 -0700, Martin Buchholz wrote: > On Mon, May 19, 2008 at 7:56 AM, Andrew John Hughes > wrote: > > 2008/5/19 Mark Wielaard : > >> > >> make jtregcheck -k runs the testsuites of hotspot (4 tests, all PASS), > >> langtools (1,342 PASS, 1 FAIL - the version check) and jdk (2,875 tests > >> of which about 130 fail - rerunning tests now). corba, jaxp and jaxws > >> don't come with any tests. This takes about 3 hours on my machine. > > Once upon a time, I wrote a test that made sure the hotspot > and jdk library's idea of the current version and supported targets > were in sync. Unfortunately, it is not a requirement on hotspot > integrations that they pass this test, so the test starts failing whenever > hotspot starts supporting the class file version number for the next > major release. At least this is a strong hint to the javac team to > catch up soon by incrementing their supported targets, etc... In this case it is a more mundane version check failure: tools/javac/versions/check.sh javac thinks its version is not 1.6.0 but 1.6.0-internal > I like a policy of "Read my lips; no new test failures" but OpenJDK > is not quite there; we get test failure creep when changes in > one component break another component's tests. Yes, that would be ideal. At least for openjdk6/icedtea we seem to be pretty close actually. It will be more challenging for openjdk7. I haven't quite figured out all the dynamics around "workspace integration". But I assume we can get the master tree to zero fail and then demand that any integration cycle doesn't introduce regressions. > The jtreg tests were originally designed to be run only by Sun JDK > development and test engineers. If someone can come up with a > portable way of testing network services (like ftp clients) without > setting up a dedicated machine with a well-known name, that > would be good. Alternatively, making the name of this > machine configurable when jtreg is run would also > be an improvement, and a much simpler one. But the obvious > idea of using environment variables doesn't work. Most environment > variables are not passed to the running java test program. Making it configurable, or even ignorable with keywords would be crucial for distribution testing. Most distributions don't allow their build daemons to access the network. But for quality control it is essential that they do run the full test suite.
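[ One shape that configurability could take -- a sketch, assuming a jtreg new enough to support -javaoption, which passes a system property through to the test JVM (unlike environment variables); test.ftp.host is a hypothetical property name that the affected tests would have to read, and jdk/test/java/net stands in for the network tests:

  $ jtreg -javaoption:-Dtest.ftp.host=icedtea.classpath.org \
      -w JTwork -r JTreport jdk/test/java/net
]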
I haven't made an inventory of what services would be needed on a machine to replace javaweb.sfbay.sun.com with a public machine, but we can certainly run some services on icedtea.classpath.org or maybe one of the openjdk.java.net machines. > If it's considered acceptable for IcedTea hackers to get their > hands dirty with not-100%-free technology, y'all could try > running the jtreg tests against IcedTea, vanilla OpenJDK7, > OpenJDK6, and JDK 6u6, and comparing the test failures. :) Well, the whole idea behind IcedTea is to provide an OpenJDK derivative that doesn't depend on any non-free build or runtime requirements. But I am certainly interested in comparing results. I do think OpenJDK6 and IcedTea are now so close that we shouldn't be seeing any test result differences between the two. That brings up the question how to export a JTreport/JTwork environment. I only posted the text results http://icedtea.classpath.org/~mjw/jtreg/ since the JTreport and JTwork files have all kinds of hard coded absolute path references. It would be nice to be able to export it all so I can upload it to some public site for others to look at and compare with. Cheers, Mark From twisti at complang.tuwien.ac.at Tue May 20 01:23:55 2008 From: twisti at complang.tuwien.ac.at (Christian Thalinger) Date: Tue, 20 May 2008 10:23:55 +0200 Subject: jtreg testing integrated In-Reply-To: <48320814.7070201@sun.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> <48320814.7070201@sun.com> Message-ID: <1211271835.31722.31.camel@imac523d.theobroma-systems.com> On Mon, 2008-05-19 at 16:07 -0700, Jonathan Gibbons wrote: > Martin, > > (removed compiler-dev) > > Your comments about unsymmetric runs are interesting. jtdiff performs > an n-way > comparison, and I'd want to keep that functionality. > > The two use cases I had in mind were: > > -- Given a set of nightly builds on a set of platforms, compare the > results across > all the platforms, and report differences > > -- Given the same set of nightly builds on a set of platforms, for > each > platform > perform a pair-wise comparison against the corresponding results > last night/week/month. That's exactly what I want to have. Does jtdiff currently support anything of the above? - twisti From Jonathan.Gibbons at Sun.COM Tue May 20 08:04:31 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Tue, 20 May 2008 08:04:31 -0700 Subject: jtreg testing integrated In-Reply-To: <1211271835.31722.31.camel@imac523d.theobroma-systems.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> <48320814.7070201@sun.com> <1211271835.31722.31.camel@imac523d.theobroma-systems.com> Message-ID: <3692E890-F5C7-489B-9FED-BB01F0368238@Sun.COM> Twisti, As I said, jtdiff is (just) a relatively simple n-way diff program for comparing sets of jtreg results. By itself, it does not provide any infrastructure for maintaining those results for wider comparison later.
However, if you were to organize your results into a directory tree such as jtreg/DATE/PLATFORM then it should be reasonably easy to write scripts to do the platform-wide or date-based comparison runs. I don't think I mentioned yesterday that jtdiff has an Ant task too, so maybe you can do the processing you need inside Ant. You should also be able to invoke jtdiff directly from other Java code. -- Jon On May 20, 2008, at 1:23 AM, Christian Thalinger wrote: > On Mon, 2008-05-19 at 16:07 -0700, Jonathan Gibbons wrote: >> Martin, >> >> (removed compiler-dev) >> >> Your comments about unsymmetric runs are interesting. jtdiff >> performs >> an n-way >> comparison, and I'd want to keep that functionality. >> >> The two use cases I had in mind were: >> >> -- Given a set of nightly builds on a set of platforms, compare the >> results across >> all the platforms, and report differences >> >> -- Given the same set of nightly builds on a set of platforms, for >> each >> platform >> perform a pair-wise comparison against the corresponding results >> last night/week/month. > > That's exactly what I want to have. Does jtdiff currently support > anything of the above? > > - twisti > From Jonathan.Gibbons at Sun.COM Tue May 20 08:10:09 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Tue, 20 May 2008 08:10:09 -0700 Subject: jtreg testing integrated In-Reply-To: <1ccfd1c10805200536y3df6ed11id6b0476eec250572@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> <48320814.7070201@sun.com> <1ccfd1c10805200536y3df6ed11id6b0476eec250572@mail.gmail.com> Message-ID: <267A8587-D86F-4B86-9525-E9F45271FF77@Sun.COM> Martin, One benefit of jtdiff is that it only requires you to keep the summary.txt file, and not the whole report or work directory. So, it might be possible to keep more historical data and analyse it going forward. Also, jtdiff has "pluggable output formatters", so you could write an XML output formatter and just keep jtdiff reports and analyse those for historical trends. -- Jon On May 20, 2008, at 5:36 AM, Martin Buchholz wrote: > Wait, now I remember, I used to wear the integrator hat too, > and I wanted the ability to data-mine a long series of javatest runs. > I never tackled that problem. > > The general problem is really hard. > You want a report that says things like > > Test FOO has been failing intermittently since the Ides of March, > but only on 64-bit x86 platforms. > > Martin > > On Mon, May 19, 2008 at 4:07 PM, Jonathan Gibbons > wrote: >> Martin, >> >> (removed compiler-dev) >> >> Your comments about unsymmetric runs are interesting. jtdiff >> performs an >> n-way >> comparison, and I'd want to keep that functionality. >> >> The two use cases I had in mind were: >> >> -- Given a set of nightly builds on a set of platforms, compare the >> results >> across >> all the platforms, and report differences >> >> -- Given the same set of nightly builds on a set of platforms, for >> each >> platform >> perform a pair-wise comparison against the corresponding results >> last >> night/week/month. >> >> I'll see about adding an option to specify a reference set of >> results, for >> your >> "developer time" use case.
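[ A sketch of such a summary.txt-based comparison, using the invocation given earlier and the jtreg/DATE/PLATFORM layout suggested above; the date and platform directory names are hypothetical:

  $ java -cp jtreg.jar com.sun.javatest.diff.Main \
      jtreg/2008-05-19/linux-i586/summary.txt \
      jtreg/2008-05-20/linux-i586/summary.txt
]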
>> From Jonathan.Gibbons at Sun.COM Tue May 20 08:24:09 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Tue, 20 May 2008 08:24:09 -0700 Subject: jtreg testing integrated In-Reply-To: <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> Message-ID: On May 20, 2008, at 6:00 AM, Martin Buchholz wrote: > > My (unavailable) diff-javatest script had to contend with absolute > paths in the html in the report directory as well. It made paths > relative by removing root dirs. It would be good if javatest's output > was made more "portable" in this sense. It's hard, because you > really do want direct pointers to failing tests and .jtr files, and > their location relative to the report directory cannot in general > be relativized. > > Martin In times past, we tried to resolve the "report" problem you describe within JavaTest. We actually tried to use relative pointers where possible. The problem was that every solution we came up with broke someone's use case. A particularly notable problem was people running tests on one system and then moving results around to another system. The solution was to provide a utility called "EditLinks" within the JavaTest framework. I assume it is still available within JT Harness. This is a simple utility for post-processing the links within report files so that you can move report files and work directories around as you choose. [ Another possibility in the JT Harness space is that it now has a much more configurable report generator. Perhaps the time has come to look again at the relationship between the work and report directories. JT Harness lives at http://jtharness.dev.java.net with a mailing list at interest at jtharness.dev.java.net. ] -- Jon From Jonathan.Gibbons at Sun.COM Tue May 20 09:56:51 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Tue, 20 May 2008 09:56:51 -0700 Subject: jtreg testing integrated In-Reply-To: <1211275953.3284.33.camel@dijkstra.wildebeest.org> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> Message-ID: <4B402212-E420-4EBF-87F1-CDB4FF648620@sun.com> On May 20, 2008, at 2:32 AM, Mark Wielaard wrote: > Hi Martin, > > On Mon, 2008-05-19 at 08:30 -0700, Martin Buchholz wrote: >> On Mon, May 19, 2008 at 7:56 AM, Andrew John Hughes >> wrote: >>> 2008/5/19 Mark Wielaard : >>>> >>>> make jtregcheck -k runs the testsuites of hotspot (4 tests, all >>>> PASS), >>>> langtools (1,342 PASS, 1 FAIL - the version check) and jdk (2,875 >>>> tests >>>> of which about 130 fail - rerunning tests now). corba, jaxp and >>>> jaxws >>>> don't come with any tests. This takes about 3 hours on my machine. >> >> Once upon a time, I wrote a test that made sure the hotspot >> and jdk library's idea of the current version and supported targets >> were in sync. Unfortunately, it is not a requirement on hotspot >> integrations that they pass this test, so the test starts failing >> whenever >> hotspot starts supporting the class file version number for the next >> major release.
At least this is a strong hint to the javac team to >> catch up soon by incrementing their supported targets, etc... > > In this case it is a more mundane version check failure: > tools/javac/versions/check.sh > javac thinks its version is not 1.6.0 but 1.6.0-internal > >> I like a policy of "Read my lips; no new test failures" but OpenJDK >> is not quite there; we get test failure creep when changes in >> one component break another component's tests. > > Yes, that would be ideal. At least for openjdk6/icedtea we seem to be > pretty close actually. It will be more challenging for openjdk7. I > haven't quite figured out all the dynamics around "workspace > integration". But I assume we can get the master tree to zero fail and > then demand that any integration cycle doesn't introduce regressions. > >> The jtreg tests were originally designed to be run only by Sun JDK >> development and test engineers. If someone can come up with a >> portable way of testing network services (like ftp clients) without >> setting up a dedicated machine with a well-known name, that >> would be good. Alternatively, making the name of this >> machine configurable when jtreg is run would also >> be an improvement, and a much simpler one. But the obvious >> idea of using environment variables doesn't work. Most environment >> variables are not passed to the running java test program. > > Making it configurable, or even ignorable with keywords would be > crucial > for distribution testing. Most distributions don't allow their build > daemons to access the network. But for quality control it is essential > that they do run the full test suite. jtreg allows tests to be tagged with keywords, which can be used on the command line to filter the tests to be executed. > > > I haven't made an inventory of what services would be needed on a > machine to replace javaweb.sfbay.sun.com with a public machine, but > we can certainly run some services on icedtea.classpath.org or maybe > one > of the openjdk.java.net machines. > >> If it's considered acceptable for IcedTea hackers to get their >> hands dirty with not-100%-free technology, y'all could try >> running the jtreg tests against IcedTea, vanilla OpenJDK7, >> OpenJDK6, and JDK 6u6, and comparing the test failures. > > :) Well, the whole idea behind IcedTea is to provide an OpenJDK > derivative that doesn't depend on any non-free build or runtime > requirements. > > But I am certainly interested in comparing results. I do think > OpenJDK6 > and IcedTea are now so close that we shouldn't be seeing any test > result > differences between the two. > > That brings up the question how to export a JTreport/JTwork > environment. > I only posted the text results http://icedtea.classpath.org/~mjw/ > jtreg/ > since the JTreport and JTwork files have all kinds of hard coded > absolute path references. It would be nice to be able to export it all > so I can upload it to some public site for others to look at and > compare > with. At a minimum, you'd want to publish the summary.txt files from the report directory. Note also that JT Harness comes with a couple of servlets you can install for a pretty view of .jtr and .jtx files.
> > > Cheers, > > MArk > From martinrb at google.com Thu May 22 08:12:07 2008 From: martinrb at google.com (Martin Buchholz) Date: Thu, 22 May 2008 08:12:07 -0700 Subject: jtreg testing integrated In-Reply-To: <1211466426.4054.42.camel@dijkstra.wildebeest.org> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> <1211466426.4054.42.camel@dijkstra.wildebeest.org> Message-ID: <1ccfd1c10805220812p2a81641ct49c4f6a8cd339ca7@mail.gmail.com> [+quality-discuss, jdk7-gk] On Thu, May 22, 2008 at 7:27 AM, Mark Wielaard wrote: > Hi Martin, > > On Tue, 2008-05-20 at 06:00 -0700, Martin Buchholz wrote: >> On Tue, May 20, 2008 at 2:32 AM, Mark Wielaard wrote: >> >> I like a policy of "Read my lips; no new test failures" but OpenJDK >> >> is not quite there; we get test failure creep when changes in >> >> one component break another component's tests. >> > >> > Yes, that would be idea. At least for openjdk6/icedtea we seem to be >> > pretty close actually. It will be more challenging for openjdk7. I >> > haven't quite figured out all the dynamics around "workspace >> > integration". But I assume we can get the master tree to zero fail and >> > then demand that any integration cycle doesn't introduce regressions. >> >> There are too many tests to require team integrators to run >> them all on each integration cycle. > > I am not sure. It does take about 3 hours to run all the included tests > (and I assume that when we add more tests or integrate things like mauve > it will rise). Not all the regression tests are open source yet, and not all the test suites available are open source (and some are likely to be permanently encumbered). And we should be adding more static analysis tools to the testing process. It sure would be nice to run all tests with -server and -client, and with different GCs, and on 32 and 64-bit platforms, with Java assertions enabled and disabled, with C++ assertions enabled and disabled. Soon a "full" testing cycle looks like it might take a week. But I do hope people, not just integrators, will run them > regularly. Especially when they are working on/integrating larger > patches. And we can always fall back on autobuilders so we have a full > report at least soon after something bad happens so there is some chance > to revert a change relatively quickly. Much of the world works on this model - commit to trunk, wait for trouble, revert. It's certainly much cheaper, and gets feedback quicker, but creates fear among developers ("Notoriously careless developer X just did a commit. I think I'll wait for a week before pulling") >> For a few years I've advocated >> adding another level to the tree of workspaces. My model is to >> rename the current MASTER workspace to PURGATORY, and >> add a "golden MASTER". >> The idea is that once a week or so all tests are run exhaustively, >> and when it is confirmed that there are no new test failures, >> the tested code from PURGATORY is promoted to MASTER. > > This is fascinating. Intuitively I would call for less levels instead of > more because that makes issues show up earlier. It is one of the things > I haven't really wrapped my head around. The proliferation of separate > branches/workspaces. 
One main master tree where all work goes by > default, with separate (ad hoc) branches/workspaces only for larger > work items that might be destabilizing, seems an easier model to work > with. It's certainly more work for the integrators. But for the developers my model is simple and comfortable. Your integrator will give you a workspace to commit changes to. Commit there whenever you feel like. Go on to the next coding task. Your changes will take a while to percolate into MASTER, but what do you care? When you sync, you pull in changes from MASTER, which are *guaranteed* to not break any of your tests. If you want specific changes quickly, pull from PURGATORY or a less-tested team workspace. If you have a project where you need to share your work with other developers immediately, no problem - just create a project-specific shared workspace that all project team members can commit to directly. Decide on a level of testing the team is comfortable with - including none at all. Developers in my model are more productive partly because they don't have to be afraid of breaking other developers. They can do enough testing for 95% confidence (which for many changes might mean no testing at all) then commit. The system will push back buggy changes automatically. Too many times I've suffered because tests in library land have been broken by changes in hotspot. Nevertheless, the JDK MASTER is remarkably stable for a project with so many developers, largely because of the gradual integration process, with changes going into MASTER only after being tested by integrators. JDK developers don't go around chatting about "build weather" - is the build broken today? AGAIN? This development model doesn't work as well for most open source projects, because they have fewer, smarter, and more dedicated developers, so there is less need. Also, it's hard to find good integrators. Most people (like myself) end up doing it as a part-time job. But just like source code control systems have gotten sexy, perhaps someday "code integration and testing systems" will become sexy, and everyone will want to write one. Martin From mark at klomp.org Thu May 22 08:20:22 2008 From: mark at klomp.org (Mark Wielaard) Date: Thu, 22 May 2008 17:20:22 +0200 Subject: jtreg testing integrated In-Reply-To: References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> Message-ID: <1211469622.4054.48.camel@dijkstra.wildebeest.org> Hi Jon, On Tue, 2008-05-20 at 08:24 -0700, Jonathan Gibbons wrote: > The solution was to provide a utility called "EditLinks" within the > JavaTest > framework. I assume it is still available within JT Harness. This is a > simple > utility for post-processing the links within report files so that you > can move > report files and work directories around as you choose. Hey, that is pretty neat!
So I just did this in my icedtea6 dir: $ for i in hotspot langtools jdk; do java -cp test/jtreg.jar \ com.sun.javatest.EditLinks -e `pwd` \ http://icedtea.classpath.org/~mjw/jtreg test/$i; done And uploaded the results: http://icedtea.classpath.org/~mjw/jtreg/test/ And indeed the links now work and you can get an html overview of the failure lists and correct links to the .jtr files such as: http://icedtea.classpath.org/~mjw/jtreg/test/jdk/JTreport/html/failed.html Great, Mark From mlists at juma.me.uk Thu May 22 08:35:00 2008 From: mlists at juma.me.uk (Ismael Juma) Date: Thu, 22 May 2008 15:35:00 +0000 (UTC) Subject: jtreg testing integrated References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> <1211466426.4054.42.camel@dijkstra.wildebeest.org> <1ccfd1c10805220812p2a81641ct49c4f6a8cd339ca7@mail.gmail.com> Message-ID: Martin Buchholz writes: > This development model doesn't work as well for most > open source projects, because they have fewer, smarter, and more > dedicated developers, so there is less need. It's also related to the rate of change taking place (which often is correlated to the number of developers in the project). As an external observer, it seems to me like the Linux kernel has a similar model to OpenJDK with several integrators at different levels (Linus, Andrew Morton, subsystem maintainers, arch maintainers, etc.). The rate of change that takes place in each Linux kernel release is huge and it would be hard to achieve it in any other way. Regards, Ismael From martinrb at google.com Thu May 22 08:47:23 2008 From: martinrb at google.com (Martin Buchholz) Date: Thu, 22 May 2008 08:47:23 -0700 Subject: jtreg testing integrated In-Reply-To: References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> <1211466426.4054.42.camel@dijkstra.wildebeest.org> <1ccfd1c10805220812p2a81641ct49c4f6a8cd339ca7@mail.gmail.com> Message-ID: <1ccfd1c10805220847o7b76201bj412441ab269fca2b@mail.gmail.com> It's true that the Linux kernel is developed in a distributed tree-of-trees model, just like OpenJDK, and Linus himself is the most outspoken advocate of distributed source code control systems. (I mostly agree with him there) One difference is the culture of testing. The Linux kernel is hard to test, and doesn't seem to have a strong culture of testing, while the JDK has on the order of a million tests available to be run, which makes great stability and reliability possible. Martin On Thu, May 22, 2008 at 8:35 AM, Ismael Juma wrote: > Martin Buchholz writes: >> This development model doesn't work as well for most >> open source projects, because they have fewer, smarter, and more >> dedicated developers, so there is less need. > > It's also related to the rate of change taking place (which often is correlated > to the number of developers in the project). As an external observer, it seems > to me like the Linux kernel has a similar model to OpenJDK with several > integrators at different levels (Linus, Andrew Morton, subsystem maintainers, > arch maintainers, etc.).
The rate of change that takes place in each Linux > kernel release is huge and it would be hard to achieve it in any other way. > > Regards, > Ismael > > From mlists at juma.me.uk Thu May 22 09:25:13 2008 From: mlists at juma.me.uk (Ismael Juma) Date: Thu, 22 May 2008 16:25:13 +0000 (UTC) Subject: jtreg testing integrated References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> <1211466426.4054.42.camel@dijkstra.wildebeest.org> <1ccfd1c10805220812p2a81641ct49c4f6a8cd339ca7@mail.gmail.com> <1ccfd1c10805220847o7b76201bj412441ab269fca2b@mail.gmail.com> Message-ID: Martin Buchholz writes: > One difference is the culture of testing. The Linux kernel > is hard to test, and doesn't seem to have a strong culture > of testing, while the JDK has on the order of a million tests > available to be run, which makes great stability and reliability > possible. While I understand the testing point, I am not convinced that the conclusion follows in practice. More specifically, it seems like you're implying that the JDK is more stable and reliable than the Linux kernel. :) It's always hard to make general judgements based on personal experiences, but I don't remember when I last had a kernel panic and I always use the latest stable release. Admittedly I am mostly a desktop/server user, so I don't touch the flakier parts of the kernel like suspend and resume. HotSpot -server on the other hand has been less than inspiring since jdk6u4 when running some very popular applications. I ran across an easy to reproduce crash running eclipse[1], a different crash running the eclipse compiler[2] and index corruption in Lucene[3]. All of them started from jdk6u4 and none have been fixed as of jdk6u10 b24. So maybe there's still some work to be done to achieve great stability and reliability. :) Yes, I am aware that bugs will always exist, I am just a bit sad that such nasty problems were introduced in a stable release and no fix exists yet. It also doesn't help that HotSpot has always been rock-solid in the past, so you could call me spoiled. :) Regards, Ismael [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6614100 -> Note that this still happens with the latest jdk6u10 beta (b24) unlike what is implied by the resolution of 6659207 (which someone decided 6614100 was a duplicate of). [2] http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=152 -> Note that some people posted similar crash dumps in [1], but the original description of [1] was for a crash in a different point. [3] http://tinyurl.com/64c9px (Lucene JIRA) From mark at klomp.org Thu May 22 07:16:18 2008 From: mark at klomp.org (Mark Wielaard) Date: Thu, 22 May 2008 16:16:18 +0200 Subject: jtreg testing integrated In-Reply-To: <48320814.7070201@sun.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> <48320814.7070201@sun.com> Message-ID: <1211465778.4054.32.camel@dijkstra.wildebeest.org> Hi Jonathan, On Mon, 2008-05-19 at 16:07 -0700, Jonathan Gibbons wrote: > Separately, check out the options for handling @ignore tests. 
Even on > older versions > of jtreg you can use "-k:!ignore" to ignore @ignore tests. (This works > because @ignore > tests are given an implicit "ignore" keyword.) With later versions of > jtreg, you can use > -Ignore:{quiet,error,run} to control how @ignore tests should be > handled. Using this > option, you should be able to get closer to the goal of "all tests should > pass", meaning > that there are fewer failures and so less need to compare the output > results with jtdiff. This is really a great feature! For icedtea we now use "-v1 -a -ignore:quiet", which gives output and results that should be pretty familiar to people. And this is the set that I hope we can get to be all PASS in the default case. One extension might be to have a "-ignore:try" that does try to run it, that doesn't report it as a failure, but that does flag it as an unexpected XPASS to alert people to bugs that are (accidentally) fixed but where the testcase was not yet enabled. Cheers, Mark From mark at klomp.org Thu May 22 07:27:06 2008 From: mark at klomp.org (Mark Wielaard) Date: Thu, 22 May 2008 16:27:06 +0200 Subject: jtreg testing integrated In-Reply-To: <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <1211275953.3284.33.camel@dijkstra.wildebeest.org> <1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com> Message-ID: <1211466426.4054.42.camel@dijkstra.wildebeest.org> Hi Martin, On Tue, 2008-05-20 at 06:00 -0700, Martin Buchholz wrote: > On Tue, May 20, 2008 at 2:32 AM, Mark Wielaard wrote: > >> I like a policy of "Read my lips; no new test failures" but OpenJDK > >> is not quite there; we get test failure creep when changes in > >> one component break another component's tests. > > > > Yes, that would be ideal. At least for openjdk6/icedtea we seem to be > > pretty close actually. It will be more challenging for openjdk7. I > > haven't quite figured out all the dynamics around "workspace > > integration". But I assume we can get the master tree to zero fail and > > then demand that any integration cycle doesn't introduce regressions. > > There are too many tests to require team integrators to run > them all on each integration cycle. I am not sure. It does take about 3 hours to run all the included tests (and I assume that when we add more tests or integrate things like mauve it will rise). But I do hope people, not just integrators, will run them regularly. Especially when they are working on/integrating larger patches. And we can always fall back on autobuilders so we have a full report at least soon after something bad happens so there is some chance to revert a change relatively quickly. > For a few years I've advocated > adding another level to the tree of workspaces. My model is to > rename the current MASTER workspace to PURGATORY, and > add a "golden MASTER". > The idea is that once a week or so all tests are run exhaustively, > and when it is confirmed that there are no new test failures, > the tested code from PURGATORY is promoted to MASTER. This is fascinating. Intuitively I would call for fewer levels instead of more because that makes issues show up earlier. It is one of the things I haven't really wrapped my head around. The proliferation of separate branches/workspaces.
One main master tree where all work goes by default, with separate (ad hoc) branches/workspaces only for larger work items that might be destabilizing, seems an easier model to work with. Cheers, Mark From Jonathan.Gibbons at Sun.COM Thu May 22 10:00:25 2008 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Thu, 22 May 2008 10:00:25 -0700 Subject: jtreg testing integrated In-Reply-To: <1211465778.4054.32.camel@dijkstra.wildebeest.org> References: <1211188871.5783.26.camel@dijkstra.wildebeest.org> <17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com> <1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com> <7A698D2C-CC96-4117-8A8D-9F92CFFF0A2A@sun.com> <1ccfd1c10805191433i66f23073w7575d9d46acbdb6e@mail.gmail.com> <48320814.7070201@sun.com> <1211465778.4054.32.camel@dijkstra.wildebeest.org> Message-ID: <50814D57-B9B9-40F7-BA08-1C95528AD989@Sun.COM> Mark, The suggestion regarding -ignore:try is interesting, but I'd have to think about how it could be done. The underlying JT Harness does not support such a concept, and I doubt that it would be easy to add it. We'd have to create a side table in jtreg of @ignored tests that were executed and which passed. I'll have to go talk to the JT Harness folk to see if I could add that info into a report. One idea that has been on the table for a while is a "known failure list". You do a test run and then compare results against a list containing tests which are regrettably known to fail. Seems to me we could use -ignore:run for jtreg, then invoke jtdiff against the KFL to get a report of which tests did not behave as expected, including tests which now pass but previously did not. -- Jon On May 22, 2008, at 7:16 AM, Mark Wielaard wrote: > Hi Jonathan, > > On Mon, 2008-05-19 at 16:07 -0700, Jonathan Gibbons wrote: >> Separately, check out the options for handling @ignore tests. Even on >> older versions >> of jtreg you can use "-k:!ignore" to ignore @ignore tests. (This >> works >> because @ignore >> tests are given an implicit "ignore" keyword.) With later >> versions of >> jtreg, you can use >> -Ignore:{quiet,error,run} to control how @ignore tests should be >> handled. Using this >> option, you should be able to get closer to the goal of "all tests should >> pass", meaning >> that there are fewer failures and so less need to compare the output >> results with jtdiff. > > This is really a great feature! For icedtea we now use "-v1 -a > -ignore:quiet", which gives output and results that should be pretty > familiar to people. And this is the set that I hope we can get to be > all > PASS in the default case. > > One extension might be to have a "-ignore:try" that does try to run > it, > that doesn't report it as a failure, but that does flag it as an unexpected > XPASS to alert people to bugs that are (accidentally) fixed but where > the testcase was not yet enabled.
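[ A sketch of how that known-failure-list flow might look, assuming a KFL kept in summary.txt format; known-failures/summary.txt is a hypothetical file, and JTreport/text/summary.txt is where jtreg normally writes its report summary:

  # run everything, including @ignore'd tests
  $ jtreg -ignore:run -w JTwork -r JTreport test/jdk

  # diff the fresh results against the known failure list
  $ java -cp jtreg.jar com.sun.javatest.diff.Main \
      known-failures/summary.txt JTreport/text/summary.txt
]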
From Jonathan.Gibbons at Sun.COM  Thu May 22 10:17:38 2008
From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons)
Date: Thu, 22 May 2008 10:17:38 -0700
Subject: jtreg testing integrated
In-Reply-To: <1211469622.4054.48.camel@dijkstra.wildebeest.org>
References: <1211188871.5783.26.camel@dijkstra.wildebeest.org>
	<17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com>
	<1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com>
	<1211275953.3284.33.camel@dijkstra.wildebeest.org>
	<1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com>
	<1211469622.4054.48.camel@dijkstra.wildebeest.org>
Message-ID: <1D341CE9-3799-4B75-B7D6-A064F9F2CFD1@Sun.COM>

Mark,

If you're publishing results on a web server, you might be interested
in some servlets available in JT Harness. They may be a little dusty
now -- we wrote them in the early days of Tomcat and .jsp files, but
they can give you a fancy view of a .jtr file in particular. They're
in javatest.jar, in the com.sun.javatest.servlets package. The
documentation should hopefully be available in the JT Harness online
help.

-- Jon

On May 22, 2008, at 8:20 AM, Mark Wielaard wrote:

> Hi Jon,
>
> On Tue, 2008-05-20 at 08:24 -0700, Jonathan Gibbons wrote:
>> The solution was to provide a utility called "EditLinks" within the
>> JavaTest framework. I assume it is still available within JT Harness.
>> This is a simple utility for post-processing the links within report
>> files so that you can move report files and work directories around
>> as you choose.
>
> Hey, that is pretty neat!
>
> So I just did this in my icedtea6 dir:
> $ for i in hotspot langtools jdk; do java -cp test/jtreg.jar \
>   com.sun.javatest.EditLinks -e `pwd` \
>   http://icedtea.classpath.org/~mjw/jtreg test/$i; done
>
> And uploaded the results:
> http://icedtea.classpath.org/~mjw/jtreg/test/
>
> And indeed the links now work and you can get an html overview of the
> failure lists and correct links to the .jtr files such as:
> http://icedtea.classpath.org/~mjw/jtreg/test/jdk/JTreport/html/failed.html
>
> Great,
>
> Mark
>
From Kelly.Ohair at Sun.COM  Thu May 22 11:59:02 2008
From: Kelly.Ohair at Sun.COM (Kelly O'Hair)
Date: Thu, 22 May 2008 11:59:02 -0700
Subject: jtreg testing integrated
In-Reply-To: <1211466426.4054.42.camel@dijkstra.wildebeest.org>
References: <1211188871.5783.26.camel@dijkstra.wildebeest.org>
	<17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com>
	<1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com>
	<1211275953.3284.33.camel@dijkstra.wildebeest.org>
	<1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com>
	<1211466426.4054.42.camel@dijkstra.wildebeest.org>
Message-ID: <4835C276.5010004@sun.com>

Mark Wielaard wrote:
> Hi Martin,
>
> On Tue, 2008-05-20 at 06:00 -0700, Martin Buchholz wrote:
>> On Tue, May 20, 2008 at 2:32 AM, Mark Wielaard wrote:
>>>> I like a policy of "Read my lips; no new test failures" but OpenJDK
>>>> is not quite there; we get test failure creep when changes in
>>>> one component break another component's tests.
>>> Yes, that would be ideal. At least for openjdk6/icedtea we seem to be
>>> pretty close actually. It will be more challenging for openjdk7. I
>>> haven't quite figured out all the dynamics around "workspace
>>> integration". But I assume we can get the master tree to zero fail and
>>> then demand that any integration cycle doesn't introduce regressions.
>> There are too many tests to require team integrators to run
>> them all on each integration cycle.
>
> I am not sure. It does take about 3 hours to run all the included tests
> (and I assume that when we add more tests or integrate things like mauve
> it will rise). But I do hope people, not just integrators, will run them
> regularly. Especially when they are working on/integrating larger
> patches. And we can always fall back on autobuilders so we have a full
> report at least soon after something bad happens so there is some chance
> to revert a change relatively quickly.

3 hours for runs with:
  -client and -server?
  one OS?
  32bit and 64bit?

And you are only talking about the tests in the jdk/test area I assume.

The issue I have seen with the testing is that if it isn't done on a
good spread of options and platforms and variations, something gets
missed, and there is little consistency between developers as to what
the officially required test matrix is. Once we have a 100% pass list,
and a required matrix, I have a system to help enforce these tests
being run, but so far, nobody has given me a way to run just that
list. Once I have it, I think we can keep all or most repositories
golden.

The Hotspot team runs many, many variations, using options like -Xcomp
and -Xbatch and loads more. But they are trying to stress the VM.

>> For a few years I've advocated
>> adding another level to the tree of workspaces. My model is to
>> rename the current MASTER workspace to PURGATORY, and
>> add a "golden MASTER".
>> The idea is that once a week or so all tests are run exhaustively,
>> and when it is confirmed that there are no new test failures,
>> the tested code from PURGATORY is promoted to MASTER.
>
> This is fascinating. Intuitively I would call for fewer levels instead
> of more, because that makes issues show up earlier. It is one of the
> things I haven't really wrapped my head around: the proliferation of
> separate branches/workspaces. One main master tree where all work goes
> in by default, with separate (ad hoc) branches/workspaces only for
> larger work items that might be destabilizing, seems an easier model
> to work with.

An easier model for developers to work with, yes, I agree, and I have
to admit when I came into the java org from a different part of Sun, I
was somewhat amazed too. But it does serve a very valuable purpose: it
corrals and isolates the different teams, and prevents major and
semi-major regressions from impacting everyone in the organization. At
a cost, I suppose. Each layer does add a cost... and the
PURGATORY&GOLDEN idea is a good one, but just like having different
team areas, it adds another delay in people seeing a change show up in
the MASTER area.

I'm more of a 'test before integrate' person, with streamlining and
automating the testing process, making it part of the developer push
process, adapting the tests as major regressions sneak by (you can
never catch all regressions, no matter what you do), and blocking
pushes on any failure. So I'm trying to throw hardware at the problem
until we can possibly do the "exhaustive testing" that Martin
mentions, as part of a developer pushing a change in, before anyone
else sees it, all automated.

But for automation, I want a test guarantee:
"these tests should pass 100% of the time on all platforms in all
situations" and then we can think about enforcing it. None of this
wishy washy "this test fails sometimes due to the phase of the moon"
crap. ... oops, can I say crap in a public email??... ;^}

-kto

>
> Cheers,
>
> Mark
>
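For a sense of scale, even a tiny slice of such a matrix multiplies
quickly. A sketch, assuming jtreg's -vmoption: flag for passing VM
options through to the JVM under test (the directory names are
arbitrary):

  for vm in client server; do
    for mode in Xmixed Xcomp Xbatch; do
      jtreg -v1 -a -ignore:quiet -vmoption:-$vm -vmoption:-$mode \
            -w build/jtwork-$vm-$mode -r build/jtreport-$vm-$mode \
            test/jdk
    done
  done

That is already six three-hour runs (about 18 hours) per OS per word
size, before GCs or assertion settings are varied at all.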
From Kelly.Ohair at Sun.COM  Thu May 22 12:13:36 2008
From: Kelly.Ohair at Sun.COM (Kelly O'Hair)
Date: Thu, 22 May 2008 12:13:36 -0700
Subject: jtreg testing integrated
In-Reply-To: <1ccfd1c10805220812p2a81641ct49c4f6a8cd339ca7@mail.gmail.com>
References: <1211188871.5783.26.camel@dijkstra.wildebeest.org>
	<17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com>
	<1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com>
	<1211275953.3284.33.camel@dijkstra.wildebeest.org>
	<1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com>
	<1211466426.4054.42.camel@dijkstra.wildebeest.org>
	<1ccfd1c10805220812p2a81641ct49c4f6a8cd339ca7@mail.gmail.com>
Message-ID: <4835C5E0.3060104@sun.com>

It's been my experience that when people say "Too many times ...
broken by changes in hotspot ..." it's not that frequent an event at
all, just that any single event like it becomes a huge effort to
isolate and resolve, so two of them quickly become "too many". And I
indeed understand that very much. It's like a gcc or C compiler bug:
nasty, nasty problems that mostly just make people angry or upset that
it ever happened and that you lost so much time tracking it down.

Nobody likes this to happen, but asking every team to run all the
tests on all platforms before they integrate isn't a very good
solution either, not when many of the tests are not easily runnable by
that team. It's like asking the plumber to run all the electrical
tests to make sure he hasn't caused a short somewhere: easier said
than done, and it will probably cost you much more in the long run.

I will try and look into creating an automated run of some of the jdk
basic java tests, but I need help in creating a set of 100% pass tests
that run in a reasonable amount of time, say 15min-20min on a fast
machine. I could probably get that into their automated system easily
enough, but asking them to run everything just won't fly.

-kto

Martin Buchholz wrote:
> [+quality-discuss, jdk7-gk]
>
> On Thu, May 22, 2008 at 7:27 AM, Mark Wielaard wrote:
>> Hi Martin,
>>
>> On Tue, 2008-05-20 at 06:00 -0700, Martin Buchholz wrote:
>>> On Tue, May 20, 2008 at 2:32 AM, Mark Wielaard wrote:
>>>>> I like a policy of "Read my lips; no new test failures" but OpenJDK
>>>>> is not quite there; we get test failure creep when changes in
>>>>> one component break another component's tests.
>>>> Yes, that would be ideal. At least for openjdk6/icedtea we seem to be
>>>> pretty close actually. It will be more challenging for openjdk7. I
>>>> haven't quite figured out all the dynamics around "workspace
>>>> integration". But I assume we can get the master tree to zero fail and
>>>> then demand that any integration cycle doesn't introduce regressions.
>>> There are too many tests to require team integrators to run
>>> them all on each integration cycle.
>> I am not sure. It does take about 3 hours to run all the included tests
>> (and I assume that when we add more tests or integrate things like mauve
>> it will rise).
>
> Not all the regression tests are open source yet, and not all the
> test suites available are open source (and some are likely to be
> permanently encumbered). And we should be adding more
> static analysis tools to the testing process.
>
> It sure would be nice to run all tests with -server and -client,
> and with different GCs, and on 32 and 64-bit platforms,
> with Java assertions enabled and disabled,
> with C++ assertions enabled and disabled.
>
> Soon a "full" testing cycle looks like it might take a week.
>
>> But I do hope people, not just integrators, will run them
>> regularly. Especially when they are working on/integrating larger
>> patches. And we can always fall back on autobuilders so we have a full
>> report at least soon after something bad happens so there is some chance
>> to revert a change relatively quickly.
>
> Much of the world works on this model -
> commit to trunk, wait for trouble, revert.
> It's certainly much cheaper, and gets feedback quicker,
> but creates fear among developers ("Notoriously careless
> developer X just did a commit. I think I'll wait for a week
> before pulling").
>
>>> For a few years I've advocated
>>> adding another level to the tree of workspaces. My model is to
>>> rename the current MASTER workspace to PURGATORY, and
>>> add a "golden MASTER".
>>> The idea is that once a week or so all tests are run exhaustively,
>>> and when it is confirmed that there are no new test failures,
>>> the tested code from PURGATORY is promoted to MASTER.
>> This is fascinating. Intuitively I would call for fewer levels instead
>> of more, because that makes issues show up earlier. It is one of the
>> things I haven't really wrapped my head around: the proliferation of
>> separate branches/workspaces. One main master tree where all work goes
>> in by default, with separate (ad hoc) branches/workspaces only for
>> larger work items that might be destabilizing, seems an easier model
>> to work with.
>
> It's certainly more work for the integrators. But for the developers
> my model is simple and comfortable. Your integrator will give you
> a workspace to commit changes to.
> Commit there whenever you feel like. Go on to the next coding task.
> Your changes will take a while to percolate into MASTER,
> but what do you care?
> When you sync, you pull in changes from MASTER, which are
> *guaranteed* to not break any of your tests. If you want specific
> changes quickly, pull from PURGATORY or a less-tested team
> workspace.
>
> If you have a project where you need to share your work
> with other developers immediately,
> no problem - just create a project-specific shared workspace
> that all project team members can commit to directly.
> Decide on a level of testing the team is comfortable with -
> including none at all.
>
> Developers in my model are more productive partly because
> they don't have to be afraid of breaking other developers.
> They can do enough testing for 95% confidence
> (which for many changes might mean no testing at all)
> then commit. The system will push back buggy changes
> automatically.
>
> Too many times I've suffered because tests in library land
> have been broken by changes in hotspot. Nevertheless,
> the JDK MASTER is remarkably stable for a project with so
> many developers, largely because of the gradual integration
> process, with changes going into MASTER only after being
> tested by integrators. JDK developers don't go around chatting
> about "build weather" - is the build broken today? AGAIN?
>
> This development model doesn't work as well for most
> open source projects, because they have fewer, smarter, and more
> dedicated developers, so there is less need.
> Also, it's hard to find good integrators. Most people (like myself)
> end up doing it as a part-time job. But just like source code
> control systems have gotten sexy, perhaps someday
> "code integration and testing systems" will become sexy,
> and everyone will want to write one.
>
> Martin
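Kelly's 15min-20min subset could be as simple as a curated list of
test paths, since jtreg accepts individual tests and directories as
arguments. A sketch -- stable-tests.list is a hypothetical file with
one test path per line:

  $ jtreg -v1 -a -ignore:quiet -w build/jtwork -r build/jtreport \
        $(cat stable-tests.list)

An automated push gate could then block on anything other than a 100%
pass of exactly that list.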
From mark at klomp.org  Thu May 22 15:04:47 2008
From: mark at klomp.org (Mark Wielaard)
Date: Fri, 23 May 2008 00:04:47 +0200
Subject: jtreg testing integrated
In-Reply-To: <4835C276.5010004@sun.com>
References: <1211188871.5783.26.camel@dijkstra.wildebeest.org>
	<17c6771e0805190756l3abb06d0g74158054589471fb@mail.gmail.com>
	<1ccfd1c10805190830j37ef4f8bg12de006c9e051298@mail.gmail.com>
	<1211275953.3284.33.camel@dijkstra.wildebeest.org>
	<1ccfd1c10805200600m74b6f735g9159f10a27f8dc26@mail.gmail.com>
	<1211466426.4054.42.camel@dijkstra.wildebeest.org>
	<4835C276.5010004@sun.com>
Message-ID: <1211493887.3181.73.camel@dijkstra.wildebeest.org>

Hi Kelly,

On Thu, 2008-05-22 at 11:59 -0700, Kelly O'Hair wrote:
> Mark Wielaard wrote:
> > I am not sure. It does take about 3 hours to run all the included tests
> > (and I assume that when we add more tests or integrate things like mauve
> > it will rise). But I do hope people, not just integrators, will run them
> > regularly. Especially when they are working on/integrating larger
> > patches. And we can always fall back on autobuilders so we have a full
> > report at least soon after something bad happens so there is some chance
> > to revert a change relatively quickly.
>
> 3 hours for runs with:
>   -client and -server?
>   one OS?
>   32bit and 64bit?
>
> And you are only talking about the tests in the jdk/test area I assume.

No, all the jtreg based tests (-a -ignore:quiet) currently included in
langtools, jdk and hotspot. On an x86_64, dual core, 3.2GHz, Fedora 8.

> The issue I have seen with the testing is that if it isn't done on a
> good spread of options and platforms and variations, something gets
> missed, and there is little consistency between developers as to what
> the officially required test matrix is. Once we have a 100% pass list,
> and a required matrix, I have a system to help enforce these tests
> being run, but so far, nobody has given me a way to run just that
> list. Once I have it, I think we can keep all or most repositories
> golden.

Yes, that would be ideal.

> The Hotspot team runs many, many variations, using options like -Xcomp
> and -Xbatch and loads more. But they are trying to stress the VM.

I assume there are many more hotspot tests than the 4 currently
included. Hopefully they can be liberated so more people can run them.

> I'm more of a 'test before integrate' person, with streamlining and
> automating the testing process, making it part of the developer push
> process, adapting the tests as major regressions sneak by (you can
> never catch all regressions, no matter what you do), and blocking
> pushes on any failure. So I'm trying to throw hardware at the problem
> until we can possibly do the "exhaustive testing" that Martin
> mentions, as part of a developer pushing a change in, before anyone
> else sees it, all automated.

With a more distributed version control system a lot more can be
separated I guess. Your idea of a core set of tests that should always
pass 100% is good. Then autobuilders could take over. And everybody
that cares about a particular architecture/setup/configuration could
add their own autobuilder to the mix and make sure the full blown
testsuite keeps passing completely. For GCC there is a nice system
where, when someone commits something for a platform that they don't
have access to, a build machine runs the tests and sends email to that
person: "After your latest commit the gnats Ada compiler cross
compiled from mips64 to ppc AIX failed the following tests. GO FIX
IT!"
> But for automation, I want a test guarantee:
> "these tests should pass 100% of the time on all platforms in all
> situations" and then we can think about enforcing it. None of this
> wishy washy "this test fails sometimes due to the phase of the moon"
> crap. ... oops, can I say crap in a public email??... ;^}

Yes, that is the biggest danger: "flaky tests". With Mauve we actually
have that problem. And we are constantly fighting it. It is a huge
cost to all involved :{

Cheers,

Mark
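One cheap way to smoke out such flaky tests -- a sketch, not anything
from this thread; the test path is made up, and this assumes jtreg's
final "Test results: ..." status line is the last line of its output:

  $ for i in `seq 1 50`; do \
      jtreg -v1 test/jdk/java/util/SomeSuspectTest.java | tail -1; \
    done | sort | uniq -c

A genuinely stable test reports the same result line all 50 times;
anything else is moon-phase territory and has no place on a 100% pass
list.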