Performance regression in java.util.zip.Deflater
Rémi Forax
forax at univ-mlv.fr
Thu Dec 20 20:00:11 UTC 2007
Clemens Eisserer wrote:
> Hello,
>
> Somebody posted at
> http://forums.java.net/jive/thread.jspa?messageID=251006 that he has
> problems with the performance of java.util.zip.Deflater starting with
> version 1.5.0_07.
>
Hi Clemens,
This bug has a long history:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6399199
but I don't know its current state.
> I ran a very crude micro-benchmark and it seems to confirm this: with
> a small buffer (the original author used a 1000-byte buffer), 1.4.2
> took ~1000 ms whereas 6.0/7.0b23 take ~11000 ms. Even with a 32 KB
> buffer, 1.4.2 is still twice as fast.
> I played a bit with oprofile and it clearly shows that memcpy eats
> nearly all of the CPU time.
>
> The problem is that on every call the whole remaining input buffer is
> copied to the native side. Assuming that each call compresses 2000
> bytes (ratio 50%) "away" from the input, successive calls to
> deflateBytes copy 5000k, 4998k, 4996k, ... bytes.
> This can't be solved easily because we don't know how many bytes zlib
> may consume from the input data.
>
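To put a number on the estimate quoted above, here is a small sketch that
simply adds up the copies under the same assumptions (5,000,000 bytes of
input, 2000 bytes consumed per call). The class and method names are made
up for illustration, and it models only the arithmetic, not what the JNI
code actually does.

```java
public class CopyCost {
    // Hypothetical helper: sums the bytes copied if every call copies the
    // whole remaining input and then consumes consumedPerCall bytes of it.
    static long totalBytesCopied(long inputSize, long consumedPerCall) {
        long total = 0;
        for (long remaining = inputSize; remaining > 0; remaining -= consumedPerCall) {
            total += remaining;
        }
        return total;
    }

    public static void main(String[] args) {
        // Figures taken from the quoted message: 5000k input, 2000 bytes per call.
        System.out.println(totalBytesCopied(5_000_000L, 2_000L));
    }
}
```

Under these assumptions that is 2500 calls and about 6.25 GB copied in
total to compress 5 MB, which would be consistent with memcpy dominating
the profile.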
> I have a few ideas for how this issue could be solved:
>
> 1.) Using DirectByteBuffers for data-transfer.
> pros: Array-like access from the native side, no negative impact on GC.
> cons: Data has to be copied, wasted RAM (because we have two copies,
> one in the byte[] supplied by the user, and one outside the heap in
> the DirectByteBuffer), possible OOMs when native memory runs out.
>
> 2.) Use GetPrimitiveArrayCritical:
> pros: no copying involved at all, no redundant copies of data around.
> cons: quite harsh on the GC (blocked until compression is finished),
> maybe even a scalability limiter.
> I've modified Deflater.c to use GetPrimitiveArrayCritical, and it now
> compresses in 100 ms instead of 11000 ms, even faster than 1.4.2.
> Although this solution looks quite cool, I doubt its behaviour
> complies with Sun's quality expectations.
>
> 3.) Limit the number of bytes transferred to the native side:
> pros: no redundant copies of input data
> cons: still a lot of copying (however not n^2), maybe more JNI calls
> to get the same work done.
>
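Until the native code changes, something in the spirit of option 3 can be
approximated from user code by handing the Deflater its input in slices
with setInput(byte[], int, int). The sketch below is only an illustration:
the class name, the helper names and the 4096-byte chunk size are made up,
and I have not measured it against the JDK versions discussed here.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ChunkedDeflate {
    // Hypothetical helper: feeds the input in slices of chunkSize bytes so
    // that at most one slice is pending on the Deflater at any time.
    static byte[] compressChunked(byte[] data, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1000];
        Deflater deflater = new Deflater();
        try {
            for (int off = 0; off < data.length; off += chunkSize) {
                int len = Math.min(chunkSize, data.length - off);
                deflater.setInput(data, off, len);
                // Drain output until this slice has been consumed.
                while (!deflater.needsInput()) {
                    int n = deflater.deflate(buffer);
                    out.write(buffer, 0, n);
                }
            }
            // No more input: flush whatever zlib still holds.
            deflater.finish();
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
        } finally {
            deflater.end(); // release the native zlib state
        }
        return out.toByteArray();
    }

    // Hypothetical helper used only to check the round trip.
    static byte[] inflate(byte[] packed, int expectedLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            byte[] out = new byte[expectedLength];
            int off = 0;
            while (off < expectedLength && !inflater.finished()) {
                int n = inflater.inflate(out, off, expectedLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected stream
                }
                off += n;
            }
            return Arrays.copyOf(out, off);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[5000000];
        new Random().nextBytes(data);
        byte[] packed = compressChunked(data, 4096);
        boolean ok = Arrays.equals(data, inflate(packed, data.length));
        System.out.println("Compressed to " + packed.length + " bytes, round trip ok: " + ok);
    }
}
```

The trade-off is the one listed under the cons: more calls across JNI for
the same work, in exchange for never copying more than one slice per call.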
> I would welcome suggestions and thoughts in general. Maybe
> somebody knows why the old JVMs performed so much better here?
>
> Thank you in advance, regards, Clemens
>
>
>
> Test-Case:
> import java.io.ByteArrayOutputStream;
> import java.util.Random;
> import java.util.zip.Deflater;
>
> public class DeflaterTest
> {
>     public static byte[] compresserZlib(byte[] donnees)
>     {
>         ByteArrayOutputStream resultat = new ByteArrayOutputStream();
>         byte[] buffer = new byte[1000]; // small output buffer, as in the forum post
>         int nbEcrits;
>
>         Deflater deflater = new Deflater();
>         deflater.setInput(donnees);
>         deflater.setLevel(0); // level 0 = no compression
>         deflater.finish();
>
>         // Each deflate() call fills at most buffer.length bytes of output.
>         while (!deflater.finished())
>         {
>             nbEcrits = deflater.deflate(buffer);
>             resultat.write(buffer, 0, nbEcrits);
>         }
>         deflater.end(); // release the native zlib state
>
>         return resultat.toByteArray();
>     }
>
>     public static void main(String[] args)
>     {
>         // Fill a 5,000,000-byte input with pseudo-random data.
>         Random r = new Random();
>         byte[] buffer = new byte[5000000];
>         for (int i = 0; i < buffer.length; i++)
>         {
>             buffer[i] = (byte) (r.nextInt() % 127);
>         }
>
>         // Time 100 runs; printing a byte of the result keeps it in use.
>         for (int i = 0; i < 100; i++)
>         {
>             long start = System.currentTimeMillis();
>             byte[] result = compresserZlib(buffer);
>             long end = System.currentTimeMillis();
>
>             System.out.println("Run took: " + (end - start) + " " + result[Math.abs(buffer[0])]);
>         }
>     }
> }
>
Rémi