RFR 4641897: Faster string conversion of large integers

Alan Eliasen eliasen at mindspring.com
Sat Jun 22 05:26:26 UTC 2013


On 06/21/2013 06:42 PM, Brian Burkhalter wrote:
> I think
> the main open problem for this patch and the Karatsuba one just
> integrated this week is to determine algorithm crossover thresholds
> which are not going to provoke performance degradations for certain
> intermediate bit lengths on the platforms of interest. Any
> suggestions on this topic would also be appreciated.

   I'll send along some of the simple programs I used to tune the
thresholds for Karatsuba and Toom-Cook multiplication.  I posted these
 As has been stated before, tuning of these thresholds is as much of an
art as it is a science, as results will vary greatly for different bit
patterns.  You have to kind of weight toward the bit patterns that occur
most often in real-world programs (e.g powers of 2 +/-1, factorials, etc.)

   It would be great to have some sort of auto-tuner as part of the
build process, but of course that doesn't work when we're building
binary distributions that just get downloaded by end-users.
Arbitrary-precision math packages like GMP have a bunch of hard-coded
thresholds that are selected when your processor architecture is
detected, but if you want to work harder, you can run their autotune
programs to build headers that work well on your specific computer with
its unique cache sizes, memory speeds, etc.  Of course, you'll sometimes
see very different thresholds found when you run the tuning program
multiple times!  It's easy in C/C++ to do conditional #defines based on
detected processor architecture to get good performance, but I know
that's not the Java way.

   The thresholds I chose for the running program were found to work
well on a variety of Linux/Intel and Windows architectures, all the way
back to an 800 MHz Pentium III.  (Unfortunately, it's impossible to
build JDK 8 on those older systems due to its memory requirements.  One
of my semi-fast AMD-64 Linux systems still takes 21 hours to build JDK 8
due to excessive swap.  It could build JDK 7 much faster.)

    I set the thresholds intentionally conservative to work well on all
the architectures I have access to.  The thresholds could have been
moved lower on modern architectures, but with a slight possible impact
to older architectures.  In any case, the performance of the various
algorithms is nearly the same in the crossover regions, so it usually
doesn't matter all that much.

   However, I can see that it's quite possible that JVMs which have very
expensive memory allocation overhead, or where allocation contention is
high, could be impacted more severely, and might have to have their
thresholds adjusted.  It's probably impossible to find a fixed set of
thresholds that perform optimally on all platforms.

   My tuning programs require a fair amount of re-running and changing
of the arguments tested, and keeping track of the thresholds that work
well for all arguments.  Again, more of an art than a science.

   Here's what you have to do:

   1.)  In your JDK code, in BigInteger.java, make KARATSUBA_THRESHOLD
and TOOM_COOK_THRESHOLD "public" instead of "private final".  This
allows the tuning program to tweak them.  (Be sure to change this back
when you're done tuning!)

   2.) Recompile JDK

   3.) Compile BigIntTiming.java

    javac BigIntTiming.java

   4.) Run BigIntTiming to see which thresholds are fastest.  Keep track
of these.  Times are in milliseconds.  Smaller numbers are faster.  The
first column is the Karatsuba threshold, second column is the Toom-Cook
threshold.  (Note that these may need to be modified again when we add
Schoenhage-Strassen multiplication.)

    java BigIntTiming

   I usually start with multiplications of two (small) large numbers,
say 2^200000 * 3^100001, and then change the code to work with much
bigger numbers, e.g. 2^2000000 * 3^1000000 to exercise larger arguments.
  (That is, repeat steps 3) and 4) many times.)  Larger arguments will
test many smaller powers as the recursive algorithms break down the
multiplications into smaller and smaller numbers.   You will want to
change the bounds of the

   for (int i=1; i<20 ...

    loop to adjust to the time spent in various number sizes.  You want
the timings to be significant but not so long as to take forever.


    The code for the tuning program is available at:

    http://futureboy.us/temp/BigIntTiming.java

   Let me know what sort of thresholds you find to work best on various
architectures.

-- 
  Alan Eliasen
  eliasen at mindspring.com
  http://futureboy.us/



More information about the core-libs-dev mailing list