From Joe.Darcy at Sun.COM Fri Feb 1 02:29:02 2008 From: Joe.Darcy at Sun.COM (Joseph D. Darcy) Date: Thu, 31 Jan 2008 18:29:02 -0800 Subject: BigInteger performance improvements In-Reply-To: <47A14D21.8020807@mindspring.com> References: <47A14D21.8020807@mindspring.com> Message-ID: <47A283EE.40903@sun.com> Hello. Yes, this is the right group :-) As "Java Floating-Point Czar" I also look after BigInteger, although I haven't had much time for proactive maintenance there in a while. I think using faster, more mathematically sophisticated algorithms in BigInteger for large values would be a fine change to explore, as long as the performance on small values was preserved and regression tests appropriate for the new algorithms were developed too. I'd prefer to process changes in smaller chunks rather a huge one all at once; however, I may be a bit slow on reviews in the near future due to some other openjdk work I'm leading up (http://openjdk.java.net/projects/jdk6/). -Joe Darcy Alan Eliasen wrote: > I'm planning on tackling the performance issues in the BigInteger > class. In short, inefficient algorithms are used for > multiplication, exponentiation, conversion to strings, etc. I intend to > improve this by adding algorithms with better asymptotic behavior that > will work better for large numbers, while preserving the existing > algorithms for use with smaller numbers. > > This encompasses a lot of different bug reports: > > 4228681: Some BigInteger operations are slow with very large numbers > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4228681 > > (This was closed but never fixed.) > > > 4837946: Implement Karatsuba multiplication algorithm in BigInteger > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946 > > I've already done the work on this one. My implementation is > intended to be easy to read, understand, and check. It significantly > improves multiplication performance for large numbers. 
>
>
> 4646474: BigInteger.pow() algorithm slow in 1.4.0
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4646474
>
> This will be improved in a couple of ways:
>
> * Rewrite pow() to use the above Karatsuba multiplication
> * Implement Karatsuba squaring
> * Find a good threshold for Karatsuba squaring
> * Rewrite pow() to use Karatsuba squaring
> * Add an optimization to use left-shifting for multiples of 2 in the
> base. This improves speed by thousands of times for things like
> Mersenne numbers.
>
>
> 4641897: BigInteger.toString() algorithm slow for large numbers
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4641897
>
> This method uses a very inefficient algorithm for large numbers.
> I plan to replace it with a recursive divide-and-conquer algorithm
> devised by Schoenhage and Strassen. I have developed and tested this in
> my own software. This operates hundreds or thousands of times faster
> than the current version for large numbers. It will also benefit from
> faster multiplication and exponentiation.
>
>
> In the future, we should also add multiplication routines that are
> even more efficient for very large numbers, such as Toom-Cook
> multiplication, which is more efficient than Karatsuba multiplication
> for even larger numbers.
>
> Has anyone else worked on these? Is this the right group?
>
> I will probably submit the Karatsuba multiplication patch soon.
> Would it be preferable to implement *all* of these parts first and
> submit one large patch?
>

From Rebecca.Searls at Sun.COM Fri Feb 1 19:41:15 2008
From: Rebecca.Searls at Sun.COM (Rebecca Searls)
Date: Fri, 01 Feb 2008 14:41:15 -0500
Subject: IOException java.lang.PosixProcess
Message-ID: <47A375DB.20902@Sun.COM>

Please advise how I can track down the cause of this fail.
My Java version is:

java version 1.5.0
gij (GNU libcj) version 4.2.1 (Ubuntu 4.2.1ubuntu5)

--------- Stacktrace ---
cldc-run:
Copying 1 file to /home/guest/x/jwt-client/dist/nbrun12962
Copying 1 file to /home/guest/x/jwt-client/dist/nbrun12962
Jad URL for OTA execution: http://localhost:8082/servlet/org.netbeans.modules.mobility.project.jam.JAMServlet/home/guest/x/jwt-client/dist//jwt-client.jad
Starting emulator in execution mode
java.io.IOException: java.io.IOException: No such file or directory
   at java.lang.PosixProcess.(libgcj.so.81)
   at java.lang.Runtime.execInternal(libgcj.so.81)
   at java.lang.Runtime.exec(libgcj.so.81)
   at java.lang.Runtime.exec(libgcj.so.81)
   at com.sun.kvem.environment.JVM.run(Unknown Source)
   at com.sun.kvem.environment.EmulatorInvoker.runEmulatorOtherVM(Unknown Source)
   at com.sun.kvem.environment.EmulatorInvoker.runEmulator(Unknown Source)
   at com.sun.kvem.environment.ProfileEnvironment$KVMThread.runEmulator(Unknown Source)
   at com.sun.kvem.environment.ProfileEnvironment$KVMThread.run(Unknown Source)
Caused by: java.io.IOException: No such file or directory
   at java.lang.PosixProcess.nativeSpawn(libgcj.so.81)
   at java.lang.PosixProcess.spawn(libgcj.so.81)
   at java.lang.PosixProcess$ProcessManager.run(libgcj.so.81)
ricoh-run:

From eliasen at mindspring.com Tue Feb 5 02:16:23 2008
From: eliasen at mindspring.com (Alan Eliasen)
Date: Mon, 04 Feb 2008 19:16:23 -0700
Subject: BigInteger performance improvements
In-Reply-To: <47A283EE.40903@sun.com>
References: <47A14D21.8020807@mindspring.com> <47A283EE.40903@sun.com>
Message-ID: <47A7C6F7.4020206@mindspring.com>

Joseph D. Darcy wrote:
> Yes, this is the right group :-) As "Java Floating-Point Czar" I also
> look after BigInteger, although I haven't had much time for proactive
> maintenance there in a while.
> I think using faster, more mathematically
> sophisticated algorithms in BigInteger for large values would be a fine
> change to explore, as long as the performance on small values was
> preserved and regression tests appropriate for the new algorithms were
> developed too.

My last step for this patch is improving performance of pow() for small numbers, which got slightly slower for some small arguments. (But some arguments, like those containing powers of 2 in the base, are *much* faster.) The other functions like multiply() are basically unchanged for small numbers. The new code, as you might expect, examines the size of the numbers and runs the old "grade-school" algorithm on small numbers and the Karatsuba algorithm on larger numbers (with the threshold point being determined by benchmark and experiment.)

As to the matter of regression tests, I would presume that Sun already has regression tests for BigInteger to make sure it gets correct results. Can you provide me with these, or are they in the OpenJDK distribution already? I can check these to make sure that they use numbers big enough to trigger the thresholds, and if not, extend them in the same style. Regression tests should hopefully be a no-op, assuming you have some already. I'm not adding any new functionality and no results should change--they should just run faster, of course.

It should of course be impossible to write a regression test that "succeeds with the patch applied, and fails without the patch" as is requested on the OpenJDK page, unless it is time-limited.

Of course, I could extend the regression tests to test *huge* numbers which may or may not be desired if you want the regression tests to run in short time. For example, a single exponentiation that takes milliseconds in my new version takes on the order of 15 minutes with JDK 1.6. How long are you willing to let regression tests take? How many combinations of arguments do you currently test? Do you think more are necessary?
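As a rough illustration of a correctness test that deliberately straddles the crossover, here is a minimal sketch. It is not part of the patch or of Sun's test suite. It multiplies all-ones operands, whose product has a closed form that can be built from shifts alone, so the expected value never goes through multiply(). The class name, the helper names and the 1120-bit placeholder threshold are assumptions made for the example.

```java
import java.math.BigInteger;

class MultiplyBoundaryCheck {
    // 2^k - 1: a k-bit operand of all ones, which forces carries in every word.
    static BigInteger ones(int k) {
        return BigInteger.ONE.shiftLeft(k).subtract(BigInteger.ONE);
    }

    // (2^m - 1)(2^n - 1) = 2^(m+n) - 2^m - 2^n + 1, built from shifts only,
    // so the expected value does not depend on multiply() itself.
    static BigInteger expectedOnesProduct(int m, int n) {
        return BigInteger.ONE.shiftLeft(m + n)
                .subtract(BigInteger.ONE.shiftLeft(m))
                .subtract(BigInteger.ONE.shiftLeft(n))
                .add(BigInteger.ONE);
    }

    // Counts mismatches for operand sizes just below, at, and above
    // thresholdBits (a placeholder for whatever crossover the benchmark picks).
    static int check(int thresholdBits) {
        int[] sizes = { thresholdBits - 32, thresholdBits - 1, thresholdBits,
                        thresholdBits + 1, thresholdBits + 32,
                        2 * thresholdBits, 4 * thresholdBits + 17 };
        int failures = 0;
        for (int m : sizes) {
            for (int n : sizes) {
                BigInteger actual = ones(m).multiply(ones(n));
                if (!actual.equals(expectedOnesProduct(m, n))) {
                    failures++;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // 1120 bits is only an assumed example value for the crossover.
        System.out.println("mismatches: " + check(1120));
    }
}
```

A test of this shape runs in well under a second, so it sidesteps the run-time question; it only checks results, not speed.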
> I'd prefer to process changes in smaller chunks rather a huge one all at > once; however, I may be a bit slow on reviews in the near future due to > some other openjdk work I'm leading up I will be submitting a patch addressing the three functions: multiply(), pow() and (the private) square(). These are intertwined and it would be more work and testing for both of us to separate out the patches and apply them in 3 phases. The patch will definitely not be huge. My patches are designed to be as readable and simple as possible for this phase. They all build on existing functions, and eschew lots of low-level bit-fiddling, as those types of changes are harder to understand and debug. I think it's best to get working algorithms with better asymptotic efficiency, as those will vastly improve performance for large numbers, and tune them by doing more low-level bit fiddling later. Even without being tuned to the nth degree, the new algorithms are vastly faster for large numbers, and identical for small numbers. -- Alan Eliasen | "Furious activity is no substitute eliasen at mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams From Joe.Darcy at Sun.COM Tue Feb 5 05:48:54 2008 From: Joe.Darcy at Sun.COM (Joseph D. Darcy) Date: Mon, 04 Feb 2008 21:48:54 -0800 Subject: BigInteger performance improvements In-Reply-To: <47A7C6F7.4020206@mindspring.com> References: <47A14D21.8020807@mindspring.com> <47A283EE.40903@sun.com> <47A7C6F7.4020206@mindspring.com> Message-ID: <47A7F8C6.8060404@sun.com> Alan Eliasen wrote: > Joseph D. Darcy wrote: > >> Yes, this is the right group :-) As "Java Floating-Point Czar" I also >> look after BigInteger, although I haven't had much time for proactive >> maintenance there in a while. 
>> I think using faster, more mathematically
>> sophisticated algorithms in BigInteger for large values would be a fine
>> change to explore, as long as the performance on small values was
>> preserved and regression tests appropriate for the new algorithms were
>> developed too.
>>
>
> My last step for this patch is improving performance of pow() for
> small numbers, which got slightly slower for some small arguments. (But
> some arguments, like those containing powers of 2 in the base, are
> *much* faster.) The other functions like multiply() are basically
> unchanged for small numbers. The new code, as you might expect,
> examines the size of the numbers and runs the old "grade-school"
> algorithm on small numbers and the Karatsuba algorithm on larger numbers
> (with the threshold point being determined by benchmark and experiment.)
>
> As to the matter of regression tests, I would presume that Sun
> already has regression tests for BigInteger to make sure it gets correct
> results. Can you provide me with these, or are they in the OpenJDK
> distribution already?

Let's see, I haven't moved the existing tests over from the closed world to open, but I can do that after the repositories are accepting changes again.

> I can check these to make sure that they use
> numbers big enough to trigger the thresholds, and if not, extend them
> in the same style.

A general comment is that BigInteger and BigDecimal are rather old classes and our expectations for regression tests have increased over time, which is to say there are fewer regression tests than if the classes were developed today. For my own numerical work, a very large fraction of my engineering time is now spent developing tests rather than code.

> Regression tests should hopefully be a no-op,
> assuming you have some already. I'm not adding any new functionality
> and no results should change--they should just run faster, of course.
>

While all the existing tests should still pass, that doesn't necessarily imply that no new tests should be written :-) Especially for numerics, the tests need to probe the algorithms where they are likely to fail, and the likely failure points can change with the algorithm. Taking one notorious example, the Pentium FDIV instruction passed existing divide tests and ran billions of random tests successfully, but (after the fact) a small program targeted at probing interesting SRT boundaries was able to find an indication of a problem after running for only a fraction of a second:

http://www.cs.berkeley.edu/~wkahan/srtest

> It should of course be impossible to write a regression test that
> "succeeds with the patch applied, and fails without the patch" as is
> requested on the OpenJDK page, unless it is time-limited.
>
> Of course, I could extend the regression tests to test *huge* numbers
> which may or may not be desired if you want the regression tests to run
> in short time. For example, a single exponentiation that takes
> milliseconds in my new version takes on the order of 15 minutes with JDK
> 1.6. How long are you willing to let regression tests take? How many
> combinations of arguments do you currently test? Do you think more are
> necessary?
>

I'm confident the existing tests will need to be augmented; I can work with you developing them.

>> I'd prefer to process changes in smaller chunks rather a huge one all at
>> once; however, I may be a bit slow on reviews in the near future due to
>> some other openjdk work I'm leading up
>>
>
> I will be submitting a patch addressing the three functions:
> multiply(), pow() and (the private) square(). These are intertwined and
> it would be more work and testing for both of us to separate out the
> patches and apply them in 3 phases. The patch will definitely not be huge.
>

Yes, that sounds like a good bundle of initial changes.

> My patches are designed to be as readable and simple as possible for
> this phase.
They all build on existing functions, and eschew lots of > low-level bit-fiddling, as those types of changes are harder to > understand and debug. I think it's best to get working algorithms with > better asymptotic efficiency, as those will vastly improve performance > for large numbers, and tune them by doing more low-level bit fiddling > later. Even without being tuned to the nth degree, the new algorithms > are vastly faster for large numbers, and identical for small numbers Regards, -Joe From mark at klomp.org Fri Feb 8 10:56:41 2008 From: mark at klomp.org (Mark Wielaard) Date: Fri, 8 Feb 2008 10:56:41 +0000 (UTC) Subject: IOException java.lang.PosixProcess References: <47A375DB.20902@Sun.COM> Message-ID: Hi Rebecca, Rebecca Searls writes: > Please advise how I can track down the cause of this fail. > > My Java version is: > > java version 1.5.0 > gij (GNU libcj) version 4.2.1 (Ubuntu 4.2.1ubuntu5) You might have more feedback on the gcj mailinglist java at gcc.gnu.org But looking at the stacktrace below it looks like the JVM.run() method is trying to execute something that doesn't exist. The IOException isn't very helpful since it doesn't include the file name of the thing that couldn't be execed. I'll look into improving that exception message. But if you can share the source of the emulator you are trying to run then we can see what is really being called. 
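For illustration only, one way calling code could make such a failure easier to trace is to wrap the exec call and put the attempted command into the exception. This is a hypothetical helper, not the emulator's or libgcj's code; `ExecDiagnostics` and `execWithContext` are names invented for this sketch.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

class ExecDiagnostics {
    // Hypothetical helper: runs a command and, if spawning fails, rethrows
    // with the full command line and basic facts about the executable path.
    static Process execWithContext(String[] cmd) throws IOException {
        try {
            return Runtime.getRuntime().exec(cmd);
        } catch (IOException e) {
            // The exists/canExecute checks are only meaningful when cmd[0]
            // is a path; a bare command name is resolved through PATH instead.
            File exe = new File(cmd[0]);
            String detail = "cannot exec " + Arrays.toString(cmd)
                    + " (exists=" + exe.exists()
                    + ", canExecute=" + exe.canExecute() + ")";
            throw new IOException(detail, e);
        }
    }

    public static void main(String[] args) {
        try {
            execWithContext(new String[] { "/no/such/emulator", "-version" });
        } catch (IOException e) {
            // The message now names the command that could not be started.
            System.out.println(e.getMessage());
        }
    }
}
```

With a wrapper like this, a "No such file or directory" from the native spawn would at least come with the path that was tried.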
Cheers,

Mark

> --------- Stacktrace ---
> cldc-run:
> Copying 1 file to /home/guest/x/jwt-client/dist/nbrun12962
> Copying 1 file to /home/guest/x/jwt-client/dist/nbrun12962
> Jad URL for OTA execution: http://localhost:8082/servlet/org.netbeans.modules.mobility.project.jam.JAMServlet/home/guest/x/jwt-client/dist//jwt-client.jad
> Starting emulator in execution mode
> java.io.IOException: java.io.IOException: No such file or directory
>    at java.lang.PosixProcess.(libgcj.so.81)
>    at java.lang.Runtime.execInternal(libgcj.so.81)
>    at java.lang.Runtime.exec(libgcj.so.81)
>    at java.lang.Runtime.exec(libgcj.so.81)
>    at com.sun.kvem.environment.JVM.run(Unknown Source)
>    at com.sun.kvem.environment.EmulatorInvoker.runEmulatorOtherVM(Unknown Source)
>    at com.sun.kvem.environment.EmulatorInvoker.runEmulator(Unknown Source)
>    at com.sun.kvem.environment.ProfileEnvironment$KVMThread.runEmulator(Unknown Source)
>    at com.sun.kvem.environment.ProfileEnvironment$KVMThread.run(Unknown Source)
> Caused by: java.io.IOException: No such file or directory
>    at java.lang.PosixProcess.nativeSpawn(libgcj.so.81)
>    at java.lang.PosixProcess.spawn(libgcj.so.81)
>    at java.lang.PosixProcess$ProcessManager.run(libgcj.so.81)
> ricoh-run:
>
>

From linuxhippy at gmail.com Fri Feb 8 15:34:25 2008
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Fri, 8 Feb 2008 16:34:25 +0100
Subject: Performance regression in java.util.zip.Deflater
In-Reply-To: <194f62550801090423s2cd83a1aia4c81541c1e28c04@mail.gmail.com>
References: <194f62550712201120p1d10ac45xf86eb9cacd2eee87@mail.gmail.com> <476ADDAF.2070409@sun.com> <194f62550712201336y3380808bv3726d891873be277@mail.gmail.com> <476AEDCD.6080504@sun.com> <194f62550712201520p30d7b15wa8f2005749a77243@mail.gmail.com> <476B0ABA.6030102@sun.com> <194f62550712201702n6f44efd5hda27c397e8d1ce96@mail.gmail.com> <476B1916.2060502@sun.com> <194f62550801090353x484a856bl3b6bfdc1e65cf58d@mail.gmail.com>
<194f62550801090423s2cd83a1aia4c81541c1e28c04@mail.gmail.com> Message-ID: <194f62550802080734y1c3b7c0ag6d6926655f1c5114@mail.gmail.com> Hi again, Did anybody have some time to look at my patches? I also thought about implementing striding in the CRC32/Adler32 classes which basically suffer from the same block-the-gc behaviour as Inflater/Deflater did before they were "fixed" ;) Furthermore do you have good ideas for regression tests? The usual compression/decompression works fine, can you imagine corner-cases which would be worth special testing? Should the tests written in the jtreg format? Thanks, lg Clemens From David.Bristor at Sun.COM Sat Feb 9 04:19:39 2008 From: David.Bristor at Sun.COM (Dave Bristor) Date: Fri, 08 Feb 2008 20:19:39 -0800 Subject: Performance regression in java.util.zip.Deflater In-Reply-To: <194f62550802080734y1c3b7c0ag6d6926655f1c5114@mail.gmail.com> References: <194f62550712201120p1d10ac45xf86eb9cacd2eee87@mail.gmail.com> <476ADDAF.2070409@sun.com> <194f62550712201336y3380808bv3726d891873be277@mail.gmail.com> <476AEDCD.6080504@sun.com> <194f62550712201520p30d7b15wa8f2005749a77243@mail.gmail.com> <476B0ABA.6030102@sun.com> <194f62550712201702n6f44efd5hda27c397e8d1ce96@mail.gmail.com> <476B1916.2060502@sun.com> <194f62550801090353x484a856bl3b6bfdc1e65cf58d@mail.gmail.com> <194f62550801090423s2cd83a1aia4c81541c1e28c04@mail.gmail.com> <194f62550802080734y1c3b7c0ag6d6926655f1c5114@mail.gmail.com> Message-ID: <47AD29DB.5050806@sun.com> Hi Clemens, I'm sorry to say that I've been tied up with other pressing issues. I will have some feedback more worthwhile than this by Fri/15. Sorry for the delay, Dave Clemens Eisserer wrote: > Hi again, > > Did anybody have some time to look at my patches? > I also thought about implementing striding in the CRC32/Adler32 > classes which basically suffer from the same block-the-gc behaviour as > Inflater/Deflater did before they were "fixed" ;) > > Furthermore do you have good ideas for regression tests? 
> The usual compression/decompression works fine, can you imagine > corner-cases which would be worth special testing? > Should the tests written in the jtreg format? > > Thanks, lg Clemens From linuxhippy at gmail.com Sun Feb 10 17:09:27 2008 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Sun, 10 Feb 2008 18:09:27 +0100 Subject: Performance regression in java.util.zip.Deflater In-Reply-To: <47AD29DB.5050806@sun.com> References: <194f62550712201120p1d10ac45xf86eb9cacd2eee87@mail.gmail.com> <476AEDCD.6080504@sun.com> <194f62550712201520p30d7b15wa8f2005749a77243@mail.gmail.com> <476B0ABA.6030102@sun.com> <194f62550712201702n6f44efd5hda27c397e8d1ce96@mail.gmail.com> <476B1916.2060502@sun.com> <194f62550801090353x484a856bl3b6bfdc1e65cf58d@mail.gmail.com> <194f62550801090423s2cd83a1aia4c81541c1e28c04@mail.gmail.com> <194f62550802080734y1c3b7c0ag6d6926655f1c5114@mail.gmail.com> <47AD29DB.5050806@sun.com> Message-ID: <194f62550802100909x3387290bo733be20f2b61dbeb@mail.gmail.com> Hi Dave, > I'm sorry to say that I've been tied up with other pressing issues. I will > have some feedback more worthwhile than this by Fri/15. I don't have any stress with this. No need to hurry, if theres no time left ... theres no time left ;) Thanks, lg Clemens From David.Bristor at Sun.COM Fri Feb 15 23:43:12 2008 From: David.Bristor at Sun.COM (Dave Bristor) Date: Fri, 15 Feb 2008 15:43:12 -0800 Subject: Early version of striding Deflater In-Reply-To: <194f62550801131532v4a3b443bt550beb6bd34549cb@mail.gmail.com> References: <194f62550801090711q35d8a5f1wb5a4a29480b40f9b@mail.gmail.com> <4787D6B8.5030805@sun.com> <194f62550801131532v4a3b443bt550beb6bd34549cb@mail.gmail.com> Message-ID: <47B62390.7040309@sun.com> Hi Clemens, Here's some file-by-file feedback, answers to questions, etc. 
I've attached 2 files:
* Deflater.c.reformat is by and large the same as the file that you sent, except that it compiles on Solaris without warnings (it wouldn't compile there w/o change; perhaps the Linux compiler (I'm guessing) you used is more lenient), and is formatted more in keeping with the rest of the file's style.

* Deflater.c has some further changes on my part, described below. I've run all our regression tests on this one and it passes. I haven't run JCK tests, nor our more extensive performance suite.

File-by-file commentary:

*** Deflater.c

Doesn't compile, at least not on Solaris. Several warnings. I fixed them. See attached Deflater.c.reformat. (I have not tried compiling on other platforms.)

Some stylistic issues need attention; see e.g. deflateBytes for brace positioning, trailing whitespace, space between keyword and parens.

Should document fields in def_data (compare with zip_util.h).

Some field IDs no longer necessary, since they're passed in as params. I removed them.

There's a slight change to the semantics of "finished". Current code sets Deflater.finished only if setParams is false. Changed code may set it regardless of setParams. I suspect this is OK: if client code changed strategy or level and called Deflater.deflate(), it would invoke deflateParams(), and not alter the value of finished. The client's subsequent call to Deflater.deflate() would call deflate() which AFAICT would cause finished to be set. What do you think?

Why fall through from Z_OK to Z_BUF_ERROR? In Z_BUF_ERROR and default cases, why continue execution of loop instead of returning 0 as does original code? I changed this; see attached Deflater.c.

*** Deflater.java: OK

*** DeflaterOutputStream.java: OK

More inline below.

Clemens Eisserer wrote:
> Hi Dave,
>
> Thanks a lot for your reply.
> To make it short: Of course I understand that this is low-priority
> (also for me, it's a fun-only fix because someone in forums.java.net
> mentioned it) so don't hurry.
> Sorry that I wasted your time with my messy files, they were taken
> from my "playground", that's why they were in such a bad shape - they
> were only intended to give an idea which "road" I was taking. I
> attached the new files taken from the Mercurial repositories and only
> modified at the affected places.
>
>> With a change of this sort, we really do need tests along with a fix. Have
>> you started writing any test cases?
> I completely agree - I have some simple test-cases which test more or
> less only very basic functionality of Deflater and they work well
> (also FlatterTest passes).
> I'll write some more tests which test exotic use-cases like changing
> compression-level, ... during compression.

Great, thanks. It would be a Good Idea to have a test that checks my assumption re finished (see above).

> I have some open questions:
> 1.) Is the separate structure approach to hold the stride-buffers ok?

I think so.

> 2.) Any suggestions for the following names: 1. strm-field in class
> (defAdr), 2. defAdr-parameter, 3. defptr - long_to_ptr of defAdr, 4.
> def_data - name of the structure

Those don't quite match what I see in the code; but what's in the code seems OK:
    def_data for the struct,
    def_adr as a param to deflateBytes, etc. etc.
    defptr to reference a def_data in init and deflateBytes

> 3.) I am not really used to programming in C. Are the address-operations ok
> which I used to get members of the new struct def_data?

It seems OK.

> Thanks for your patience, lg Clemens
>
> Some notes, and changes in random order:
> * Changed deflate-bytes to the old behaviour to return after the call
> to deflateParams

Good; AFAICT it maintains the existing semantics.
> * Verified that it's ok to call deflateParams when there's not enough
> space in the output-buffer to flush all "old" data out (thanks to Mark
> Adler)
> * I changed the method-signature of the native method compared to the
> original, because some variables were read from JNI-code, whereas they
> could have been passed simply down using method parameters. I think
> it's "cleaner" to pass it.

The long argument to deflateBytes is a bit cumbersome, but Ken Russell opined that it provides better performance, so it's a Good Thing:

> Kenneth Russell wrote:
> I strongly agree with the contributor's suggestion. Not only is passing
> the argument from Java less code, it is also faster since field access
> from Java can be optimized by the HotSpot compiler, where field accesses
> through the JNI must go through the same set of boilerplate
> C/C++/assembly every time.

> * Allocation of the stride-buffers together with the z_stream
> structure. z_stream is really large, so the two stride-buffers should
> not add that much overhead. However this has the advantage of not
> mallocing/freeing and also being able to fill the input-stride-buffer
> once for several calls of the native method.

Looks good.

> * Renamed the strm-address-parameter to defadr, because it no longer
> really points to a strm. I did not rename the java field "strm"
> because I did not have an idea for a proper name.

It should have had a different name from day one. I'm slightly loath to make a name change, nor do I have (Friday afternoon) a Really Good Idea.

> * Removed striding from DeflaterOutputStream, (looked how code looked in 1.4.2).

Looks good.

From your other email:

> I also thought about implementing striding in the CRC32/Adler32
> classes which basically suffer from the same block-the-gc behaviour as
> Inflater/Deflater did before they were "fixed" ;)

I suspect these are not as much of an issue: presumably CRC32 and Adler32 calculations are faster than deflation (caveat: I haven't measured them).
Others have pointed out / shamed us because, hey, shouldn't these be in Java anyway?

> Furthermore do you have good ideas for regression tests?
> The usual compression/decompression works fine, can you imagine
> corner-cases which would be worth special testing?
> Should the tests be written in the jtreg format?

I'd like to see tests where there's a possibility of semantics having changed; as noted above re "finished".

I haven't done a performance analysis, but don't expect a regression. If anything, since the striding is kept within native code, there should be fewer Java -> native calls, and better performance (though that is perhaps not measurable).

My last comment is about this change in general. It seems like a reasonable fix, though the corresponding bug:
http://bugs.sun.com/view_bug.do?bug_id=6399199
is a low priority one for us, and this is code we generally feel is best left alone when possible. We are still learning how best to work with contributions from outside of Sun. I will check with others who've maintained this code in the past, to get their opinion of making this kind of change. While I'm currently responsible for jar/zip code, it's only one of the hats I currently wear ;-)

Thanks,
Dave

From David.Bristor at Sun.COM Mon Feb 18 21:21:13 2008
From: David.Bristor at Sun.COM (Dave Bristor)
Date: Mon, 18 Feb 2008 13:21:13 -0800
Subject: Early version of striding Deflater
In-Reply-To: <47B62390.7040309@sun.com>
References: <194f62550801090711q35d8a5f1wb5a4a29480b40f9b@mail.gmail.com> <4787D6B8.5030805@sun.com> <194f62550801131532v4a3b443bt550beb6bd34549cb@mail.gmail.com> <47B62390.7040309@sun.com>
Message-ID: <47B9F6C9.9060801@sun.com>

Here are the attached files that I forgot to attach ;-(

Dave

Dave Bristor wrote:
> Hi Clemens,
>
> Here's some file-by-file feedback, answers to questions, etc.
I've > attached 2 files: > * Deflater.c.reformat is by-and-large the same as the file that you > sent, except that it compiles on Solaris without warnings (it wouldn't > compile there w/o change; perhaps the linux compiler (I'm guessing) you > used is more lenient), and is formatted more in keeping with the rest of > the file's style. > > * Deflater.c has some further changes on my part, described below. I've > run all our regression tests on this one and it passes. I haven't run > JCK tests, nor our more extensive performance suite. > > File-by-file commentary: > > *** Deflate.c > > Doesn't compile, at least not on Solaris. Several warnings. I fixed > them. See attached Deflater.c.reformat. (I have not tried compiling on > other platforms.) > > Some stylistic issues need attention; see e.g. deflateBytes for brace > positioning, trailing whitespace, space between keyword and parens. > > Should document fields in def_data (compare with zip_util.h). > > Some field IDs no longer necessary, since they're passed in as params. > I removed them. > > There's a slight change to the semantics of "finished". Current code > sets Deflater.finished only if setParams is false. Changed code may set > it regardlewss of setParams. I suspect this is OK: if client code > changed strategy or level and called Deflater.deflate(), it would invoke > deflateParams(), and not alter the value of finished. The client's > subsequent call to Deflater.deflate() would call deflate() which AFAICT > would cause finished to be set. What do you think? > > Why fall through from Z_OK to Z_BUF_ERROR? In Z_BUF_ERROR and default > cases, why continue execution of loop instead of returning 0 as does > original code? I changed this; see attached Deflater.c > > *** Deflater.java: OK > > *** DeflaterOutputStream.java: OK > > More inline below. > > Clemens Eisserer wrote: >> Hi Dave, >> >> Thanks a lot for your reply. 
>> To make it short: Of course I understand that this is low-priority >> (also for me, its a fun-only fix because someone in forums.java.net >> mentioned it) so don't hurry. >> Sorry that I wasted your time with my messy files, they were taken >> from my "playground" thats why they were in such a bad shape - they >> were only intended to give an idea which "road" I was taking. I >> attached the new files taken from the mercurial repositories and only >> modified at the affected places. >> >>> With a change of this sort, we really do need tests along with a >>> fix. Have >>> you started writing any test cases? >> I completly agree - I have some simple test-cases which test more or >> less only very basic functionality of Deflater and they work well >> (also FlatterTest passes). >> I'll write some more tests which test exotic use-cases like changing >> compression-level, ... during compression. > > Great, thanks. It would be a Good Idea to have a test that checks my > assumption re finished (see above). > >> I have some open questions: >> 1.) Is the seperate structure approach to hold the stride-buffers ok? > > I think so. > >> 2.) Any suggestions for the following names: 1. strm-field in class >> (defAdr), 2. defAdr-parameter,3. defptr - long_to_ptr of defAdr, 4. >> def_data - name of the structure > > Those don't quite match what I see in the code; but what's in the code > seems OK: > def_data for the struct, > def_adr as a param to deflateBytes, etc. etc. > defptr to reference a def_data in init and deflateBytes > >> 3.) I am not really used to program in C. Are the adress-operations ok >> which I used to get members of the new struct def_data? > > It seems OK. > >> Thanks for your patience, lg Clemens >> >> Some notes, and changes in ramdom order: >> * Changed deflate-bytes to the old behaviour to return after the call >> to deflateParams > > Good; AFAICT at maintains the existing semantics. 
> >> * Verified that its ok to call deflateParams when there's not enough >> space in the output-buffer to flush all "old" data out (thanks to Mark >> Adler) >> * I changed the method-signiture of the native method compared to >> original, because some variables were read from JNI-code, whereas they >> could have been passed simply down using method parameters. I think >> its "cleaner" to pass it. > > The long argument to deflateBytes is a bit cumbersome, but Ken Russell > opined that it provides better performance, so it's a Good Thing: > > > Kenneth Russell wrote: > > I strongly agree with the contributor's suggestion. Not only is passing > > the argument from Java less code, it is also faster since field access > > from Java can be optimized by the HotSpot compiler, where field accesses > > through the JNI must go through the same set of boilerplate > > C/C++/assembly every time. > >> * Allocation of the stride-buffers together with the z_stream >> structure. z_stream is really large, so the two stride-buffers should >> not add that much overhead. However this has the advantage of not >> mallocing/freeing and also beeing able to fill the input-stride-buffer >> once for several calls of the native method. > > Looks good. > >> * Renamed the strm-adress-parameter to defadr, because it no longer >> really points to a strm. I did not rename the java field "strm" >> because I did not have an idea for a proper name. > > It should have had a different name from day one. I'm slightly loathe > to make a name change, nor do I have (Friday afternoon) a Really Good Idea. > >> * Removed striding from DeflaterOutputStream, (looked how code looked >> in 1.4.2). > > Looks good. 
>
> From your other email:
>
> > I also thought about implementing striding in the CRC32/Adler32
> > classes which basically suffer from the same block-the-gc behaviour as
> > Inflater/Deflater did before they were "fixed" ;)
>
> I suspect these are not as much of an issue: presumably CRC32 and
> adler32 calculations are faster than deflation (caveat: I haven't
> measured them). Others have pointed out / shamed us because, hey,
> shouldn't these be in Java anyway?
>
> > Furthermore do you have good ideas for regression tests?
> > The usual compression/decompression works fine, can you imagine
> > corner-cases which would be worth special testing?
> > Should the tests be written in the jtreg format?
>
> I'd like to see tests where there's a possibility of semantics having
> changed; as noted above re "finished".
>
> I haven't done a performance analysis, but don't expect a regression.
> If anything, since the striding is kept within native code, there should
> be fewer Java -> native calls, and better performance (though that is
> perhaps not measurable).
>
> My last comment is about this change in general. It seems like a
> reasonable fix, though the corresponding bug:
>   http://bugs.sun.com/view_bug.do?bug_id=6399199
> is a low priority one for us, and this is code we generally feel is best
> left alone when possible. We are still learning how best to work with
> contributions from outside of Sun. I will check with others who've
> maintained this code in the past, to get their opinion of making this
> kind of change. While I'm currently responsible for jar/zip code, it's
> only one of the hats I currently wear ;-)
>
> Thanks,
> Dave
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Deflater.c.reformat
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Deflater.c
URL:

From eliasen at mindspring.com Tue Feb 19 22:09:31 2008
From: eliasen at mindspring.com (Alan Eliasen)
Date: Tue, 19 Feb 2008 15:09:31 -0700
Subject: [PATCH] 4837946: Implement Karatsuba multiplication algorithm in BigInteger
In-Reply-To: <47A14D21.8020807@mindspring.com>
References: <47A14D21.8020807@mindspring.com>
Message-ID: <47BB539B.8090901@mindspring.com>

Attached is a patch for bug 4837946, for implementing asymptotically
faster algorithms for multiplication of large numbers in the BigInteger
class:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946

This patch implements Karatsuba multiplication and Karatsuba squaring
for numbers above a certain size (found by experimentation). These
patches are designed to be as easy to read as possible, and are
implemented in terms of already-existing methods in BigInteger. Some
more performance might be squeezed out of them by doing more low-level
bit-fiddling, but I wanted to get a working version in and tested.

This is quite a bit faster for large arguments. The crossover point
between the "grade-school" algorithm and the Karatsuba algorithm for
multiplication is set at 35 "ints" or about 1120 bits, which was found
to give the best crossover in timing tests, so you won't see improvement
below that. Above that, it's asymptotically faster (O(n^1.585), compared
to O(n^2) for the grade-school algorithm). Double the number of digits,
and the "grade school" algorithm takes about 4 times as long, while
Karatsuba takes about 3 times as long. It's vastly superior for very
large arguments.

I'd also like to create another RFE for implementing even faster
multiplication algorithms for yet larger numbers, such as Toom-Cook.

Previously, I had indicated that I'd submit faster algorithms for
pow() at the same time, but the number of optimizations for pow() has
grown rather large, and I plan on working on it a bit more and
submitting it separately.
Many/most combinations of operands for pow() are now vastly faster
(some hundreds of thousands of times,) but I'd like to make it faster
(or, at the least, the same performance for all arguments, a few of
which have gotten slightly slower.) Unfortunately, these optimizations
add to the size and complexity of that patch, which is why I'm
submitting it separately.

I have created regression tests that may or may not be what you want;
they simply multiply a very large bunch of numbers together and output
their results to a very large file, which you then "diff" against known
correct values. (My tests produce 345 MB of output per run!) I
validated the results by comparing them to the output of both JDK 1.6
and the Kaffe JVM, which was compiled to use the well-known and
widely-tested GMP libraries for its BigInteger work. All tests pass. I
haven't submitted these tests, but am awaiting getting a copy of the
existing regression tests that Joseph Darcy discussed on this list.

Please let me know if there's a problem with the patch. I had to
hand-edit a few lines to remove the work I'm doing for pow().

--
  Alan Eliasen              | "Furious activity is no substitute
  eliasen at mindspring.com |  for understanding."
  http://futureboy.us/      |          --H.H. Williams
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BigInteger.diff
URL:

From eliasen at mindspring.com Wed Feb 20 22:45:48 2008
From: eliasen at mindspring.com (Alan Eliasen)
Date: Wed, 20 Feb 2008 15:45:48 -0700
Subject: [PATCH] 4837946: Implement Karatsuba multiplication algorithm in BigInteger
In-Reply-To: <87skzo6l5r.fsf@mid.deneb.enyo.de>
References: <47A14D21.8020807@mindspring.com> <47BB539B.8090901@mindspring.com> <87skzo6l5r.fsf@mid.deneb.enyo.de>
Message-ID: <47BCAD9C.1030002@mindspring.com>

Anonymous member wrote:
> Out of curiosity, how does the Java implementation compare to GMP in
> terms of speed?
(Note: These numbers apply to the *revised* patch that I'll be posting
in a few minutes.)

It depends on the size of the numbers. When I run the regression
tests that I wrote, my updated Java version runs in about 2 minutes.
When I run the same regression test in Kaffe using GMP, it takes about 2
*hours*. (Some of the same tests using Java 1.6 take many, many hours
without my optimizations for the pow() function). But those are a lot
of small numbers. There is a lot of overhead in converting from Java to
C types and back, and in Kaffe's relative slowness. And currently, only
multiply() has been improved in my Java patches that I've submitted.
Kaffe is about 25 times slower on the average program anyway.

When you run Kaffe/GMP for very large numbers, Kaffe/GMP starts being
very much faster. OpenJDK still can't compete for numbers that are on
the cutting edge of number theory. But we'll be much better than we
were before.

If I were to write the same code in pure C using GMP, then GMP would
be much faster. But I haven't done that. So it's hard to compare GMP
to Java exactly.

But some numbers. For multiplying the numbers 3^1000000 * 3^1000001,
(with my fixes to do the exponentiation hundreds of thousands of times
faster factored out; without these, JDK 1.6 would be thousands of times
slower,) the times for 20 iterations are:

  JDK 1.6      OpenJDK1.7 with my patches    Kaffe w/GMP
  292987 ms    28650 ms                      866 ms

Thus, for numbers of this size, my algorithms are more than 10 times
faster than JDK 1.6, but 33 times slower than Kaffe/GMP.

For multiplying the somewhat special form 2^1000000 * 2^1000001, (the
arguments of this in binary are a 1 followed by a million zeroes)

  JDK 1.6      OpenJDK1.7 with my patches    Kaffe w/GMP
  117298 ms    386 ms                        474 ms

So, my algorithm is 303 times faster than JDK, and just slightly
faster than Kaffe/GMP.
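
For readers following the thread, the Karatsuba recursion behind these timings can be sketched roughly as below. This is not the code in the attached BigInteger.diff; it is a minimal illustration written only against BigInteger's public API. The class name KaratsubaSketch, the helper multiplyMagnitudes and the bit-based split are assumptions made for the example, and only the cutoff of 35 ints (about 1120 bits) comes from the patch announcement.

```java
import java.math.BigInteger;
import java.util.Random;

// Illustrative sketch only: names and structure are hypothetical, not taken from the patch.
final class KaratsubaSketch {
    // Assumed cutoff: the thread reports 35 ints (about 1120 bits) as the measured crossover.
    static final int THRESHOLD_BITS = 35 * 32;

    /** Multiplies two BigIntegers of any sign using a Karatsuba split on the magnitudes. */
    static BigInteger multiply(BigInteger x, BigInteger y) {
        int sign = x.signum() * y.signum();
        if (sign == 0) {
            return BigInteger.ZERO;
        }
        BigInteger product = multiplyMagnitudes(x.abs(), y.abs());
        return sign < 0 ? product.negate() : product;
    }

    // Both arguments must be non-negative.
    private static BigInteger multiplyMagnitudes(BigInteger x, BigInteger y) {
        if (x.signum() == 0 || y.signum() == 0) {
            return BigInteger.ZERO;
        }
        int bits = Math.max(x.bitLength(), y.bitLength());
        if (bits <= THRESHOLD_BITS) {
            return x.multiply(y); // small operands: fall back to the built-in multiply
        }
        // Split each operand into an upper and a lower half of `half` bits.
        int half = (bits + 1) / 2;
        BigInteger mask = BigInteger.ONE.shiftLeft(half).subtract(BigInteger.ONE);
        BigInteger xl = x.and(mask);
        BigInteger xh = x.shiftRight(half);
        BigInteger yl = y.and(mask);
        BigInteger yh = y.shiftRight(half);

        // Three recursive products instead of four.
        BigInteger p1 = multiplyMagnitudes(xh, yh);
        BigInteger p2 = multiplyMagnitudes(xl, yl);
        BigInteger p3 = multiplyMagnitudes(xh.add(xl), yh.add(yl));

        // x*y = p1*2^(2*half) + (p3 - p1 - p2)*2^half + p2
        BigInteger middle = p3.subtract(p1).subtract(p2);
        return p1.shiftLeft(2 * half).add(middle.shiftLeft(half)).add(p2);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        BigInteger a = new BigInteger(20000, rnd);
        BigInteger b = new BigInteger(20000, rnd).negate();
        // Compare against the library result.
        System.out.println(multiply(a, b).equals(a.multiply(b)));
    }
}
```

Doing three half-size multiplications per level instead of four is what gives the O(n^1.585) growth (log2 of 3 is about 1.585) mentioned earlier in the thread. A production version would presumably split on int boundaries of the magnitude array rather than on bits.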

For multiplying numbers 3^14000000 * 3^14000001, the time for 1
iteration is:

  JDK 1.6       OpenJDK1.7 with my patches    Kaffe w/GMP
  3499115 ms    89505 ms                      910 ms

Thus, for numbers of this size, my patches make it 38.9 times faster,
but still 99 times slower than GMP. Sigh. Well, to be expected.

If you work with large numbers, GMP becomes ridiculously faster.
Kaffe with GMP becomes only slightly slower than pure GMP as the portion
of time in Java gets small.

GMP includes 3-way Toom-Cook multiplication, and the much faster FFT
multiplication (which is roughly O(n log n log log n)) for very large
numbers, and hand-crafted assembly language that uses 64-bit
instructions and limbs on my 64-bit architecture (as opposed to the
32-bit ints used in BigInteger. Hopefully some day in the future,
BigInteger will be rewritten to use longs internally instead of ints.
Multiplying two 64-bit longs does 4 times as much work as multiplying
two 32-bit ints, and will thus likely be significantly faster,
especially on 64-bit architectures.)

So GMP is still very much faster for very large numbers. It is also
*much* faster in turning numbers into strings, and in exponentiation.
The algorithms in BigInteger are horrible for pow() and toString(). See
the following bugs:

4641897: BigInteger.toString() algorithm slow for large numbers
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4641897

4646474: BigInteger.pow() algorithm slow in 1.4.0
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4646474

I have several fixes for both of these. My JVM-based programming
language Frink ( http://futureboy.us/frinkdocs/ ) implements many of
these improvements, and is much faster for a variety of arguments. I
just need to finish improving these for a variety of different
arguments, and considering the threading and memory-size-vs-speed
tradeoffs in implementing toString efficiently.

--
  Alan Eliasen              | "Furious activity is no substitute
  eliasen at mindspring.com |  for understanding."
  http://futureboy.us/      |          --H.H. Williams

From eliasen at mindspring.com Wed Feb 20 22:50:58 2008
From: eliasen at mindspring.com (Alan Eliasen)
Date: Wed, 20 Feb 2008 15:50:58 -0700
Subject: [PATCH] 4837946: Implement Karatsuba multiplication algorithm in BigInteger
In-Reply-To: <47BB539B.8090901@mindspring.com>
References: <47A14D21.8020807@mindspring.com> <47BB539B.8090901@mindspring.com>
Message-ID: <47BCAED2.6090408@mindspring.com>

Attached is a *revised* patch for bug 4837946, for implementing
asymptotically faster algorithms for multiplication of large numbers in
the BigInteger class:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946

It only differs from my patch posted yesterday in one respect--the
new methods getLower() and getUpper() now call
trustedStripLeadingZeroInts() which has two effects:

  * Helps preserve the invariant expected by BigInteger that zero has
    a signum of zero and a mag array with length of zero (due to the way
    that the private constructor BigInteger(int[] mag, int signum) works.)

  * Optimizes some cases like multiplying 2^1000000, where the bit
    string has a large number of zeroes in a row.

Please use this patch instead of the one I posted yesterday.

--
  Alan Eliasen              | "Furious activity is no substitute
  eliasen at mindspring.com |  for understanding."
  http://futureboy.us/      |          --H.H. Williams
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BigInteger.diff
URL:

From Martin.Buchholz at Sun.COM Thu Feb 28 20:50:47 2008
From: Martin.Buchholz at Sun.COM (Martin Buchholz)
Date: Thu, 28 Feb 2008 12:50:47 -0800
Subject: 6633613: (str) StringCoding optimizations to avoid unnecessary array copies with Charset arg
Message-ID: <47C71EA7.5090406@sun.com>

I propose to partially fix

6633613: (str) StringCoding optimizations to avoid unnecessary array copies with Charset arg

with the patches below
(a more ambitious patch will hopefully follow later):

Iris, please review.

Martin

First, warning suppression:

diff --git a/src/share/classes/java/lang/StringCoding.java b/src/share/classes/java/lang/StringCoding.java
--- a/src/share/classes/java/lang/StringCoding.java
+++ b/src/share/classes/java/lang/StringCoding.java
@@ -53,22 +53,23 @@ class StringCoding {

     private StringCoding() { }

-    /* The cached coders for each thread
-     */
-    private static ThreadLocal decoder = new ThreadLocal();
-    private static ThreadLocal encoder = new ThreadLocal();
+    /** The cached coders for each thread */
+    private final static ThreadLocal<SoftReference<StringDecoder>> decoder =
+        new ThreadLocal<SoftReference<StringDecoder>>();
+    private final static ThreadLocal<SoftReference<StringEncoder>> encoder =
+        new ThreadLocal<SoftReference<StringEncoder>>();

     private static boolean warnUnsupportedCharset = true;

-    private static Object deref(ThreadLocal tl) {
-        SoftReference sr = (SoftReference)tl.get();
+    private static <T> T deref(ThreadLocal<SoftReference<T>> tl) {
+        SoftReference<T> sr = tl.get();
         if (sr == null)
             return null;
         return sr.get();
     }

-    private static void set(ThreadLocal tl, Object ob) {
-        tl.set(new SoftReference(ob));
+    private static <T> void set(ThreadLocal<SoftReference<T>> tl, T ob) {
+        tl.set(new SoftReference<T>(ob));
     }

     // Trim the given byte array to the given length
@@ -174,7 +175,7 @@ class StringCoding {
     static char[] decode(String charsetName, byte[] ba, int off, int len)
         throws UnsupportedEncodingException
     {
-        StringDecoder sd = (StringDecoder)deref(decoder);
+        StringDecoder sd = deref(decoder);
         String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
         if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
                               || csn.equals(sd.charsetName()))) {
@@ -273,7 +274,7 @@ class StringCoding {
     static byte[] encode(String charsetName, char[] ca, int off, int len)
         throws UnsupportedEncodingException
     {
-        StringEncoder se = (StringEncoder)deref(encoder);
+        StringEncoder se = deref(encoder);
         String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
         if ((se == null) || !(csn.equals(se.requestedCharsetName())
                               || csn.equals(se.charsetName()))) {

second, actual fix:

diff --git a/src/share/classes/java/lang/StringCoding.java b/src/share/classes/java/lang/StringCoding.java
--- a/src/share/classes/java/lang/StringCoding.java
+++ b/src/share/classes/java/lang/StringCoding.java
@@ -194,8 +194,7 @@ class StringCoding {

     static char[] decode(Charset cs, byte[] ba, int off, int len) {
         StringDecoder sd = new StringDecoder(cs, cs.name());
-        byte[] b = Arrays.copyOf(ba, ba.length);
-        return sd.decode(b, off, len);
+        return sd.decode(Arrays.copyOfRange(ba, off, off + len), 0, len);
     }

     static char[] decode(byte[] ba, int off, int len) {
@@ -293,8 +292,7 @@ class StringCoding {

     static byte[] encode(Charset cs, char[] ca, int off, int len) {
         StringEncoder se = new StringEncoder(cs, cs.name());
-        char[] c = Arrays.copyOf(ca, ca.length);
-        return se.encode(c, off, len);
+        return se.encode(Arrays.copyOfRange(ca, off, off + len), 0, len);
     }

     static byte[] encode(char[] ca, int off, int len) {