Performance regression in java.util.zip.Deflater
Clemens Eisserer
linuxhippy at gmail.com
Thu Dec 20 19:20:30 UTC 2007
Hello,
Somebody posted at
http://forums.java.net/jive/thread.jspa?messageID=251006 that they are
seeing performance problems with java.util.zip.Deflater starting with
version 1.5.0_07.
I wrote a very simple micro-benchmark (attached below) and it seems to
confirm this: with small output buffers (the original author used a
1000-byte buffer), 1.4.2 took ~1000ms whereas 6.0/7.0b23 take ~11000ms.
Even with a 32kb buffer, 1.4.2 is still twice as fast.
I played a bit with oprofile, and it clearly shows that memcpy eats
almost all of the time.
The problem is that on every call the whole remaining input buffer is
copied to the native side. Assuming that each call compresses 2000 bytes
(a ratio of 50%) of input data "away", successive calls to deflateBytes
copy 5000k, then 4998k, then 4996k bytes, and so on.
This can't be avoided easily, because we don't know in advance how many
bytes zlib will consume from the input data.
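Back-of-the-envelope, with the numbers above that is roughly 2500 calls
copying ~2.5 MB each on average, i.e. on the order of 6 GB of memcpy
traffic for a single 5 MB compression run - which would explain the
oprofile numbers.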
I have a few ideas for how this issue could be solved:
1.) Use DirectByteBuffers for the data transfer.
pros: array-like access from the native side, no negative impact on the GC.
cons: data still has to be copied; wasted RAM, because we hold two copies
(one in the byte[] supplied by the user, one outside the heap in the
DirectByteBuffer); possible OOMs from running out of native memory.
2.) Use GetPrimitiveArrayCritical:
pros: no copying involved at all, no redundant copies of data lying around.
cons: quite harsh on the GC (it can be blocked until the compression call
returns) - maybe even a scalability limiter.
I've modified Deflater.c to use GetPrimitiveArrayCritical, and the
benchmark now compresses in ~100ms instead of 11000ms - even twice as
fast as 1.4.2 (a rough sketch of the pattern follows after the list
below). Although this solution looks quite cool, I doubt its behaviour
complies with Sun's quality expectations.
3.) Limit the amount of bytes transferred to the native side per call:
pros: no redundant copies of the input data.
cons: still a lot of copying (although no longer O(n^2)), and possibly
more JNI calls to get the same work done.
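For illustration, below is a minimal, self-contained sketch of the
GetPrimitiveArrayCritical pattern from option 2. It is not the actual
Deflater.c code - the class and method names (DeflaterSketch.deflateOnce)
are made up, and it compresses a whole byte[] in one shot instead of
streaming - but it shows how the input and output arrays can be handed to
zlib without any intermediate copy:

/* Sketch only - not the JDK's Deflater.c. A hypothetical one-shot JNI
 * method that compresses one byte[] into another, accessing both arrays
 * via GetPrimitiveArrayCritical so no intermediate copy is made.
 * Assumes the output array is large enough for the compressed data. */
#include <jni.h>
#include <string.h>
#include <zlib.h>

JNIEXPORT jint JNICALL
Java_DeflaterSketch_deflateOnce(JNIEnv *env, jclass cls,
                                jbyteArray in, jint inLen,
                                jbyteArray out, jint outLen)
{
    z_stream strm;
    jbyte *inBuf, *outBuf;
    jint produced;
    int ret;

    memset(&strm, 0, sizeof(strm));
    if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* The GC may be blocked until the matching Release calls, so keep
     * this window short and make no other JNI calls in between. */
    inBuf = (jbyte *) (*env)->GetPrimitiveArrayCritical(env, in, NULL);
    if (inBuf == NULL) {
        deflateEnd(&strm);
        return -1;
    }
    outBuf = (jbyte *) (*env)->GetPrimitiveArrayCritical(env, out, NULL);
    if (outBuf == NULL) {
        (*env)->ReleasePrimitiveArrayCritical(env, in, inBuf, JNI_ABORT);
        deflateEnd(&strm);
        return -1;
    }

    strm.next_in   = (Bytef *) inBuf;
    strm.avail_in  = (uInt) inLen;
    strm.next_out  = (Bytef *) outBuf;
    strm.avail_out = (uInt) outLen;

    ret = deflate(&strm, Z_FINISH);
    produced = (jint) (outLen - strm.avail_out);

    /* input was not modified -> JNI_ABORT; output changes must be kept -> 0 */
    (*env)->ReleasePrimitiveArrayCritical(env, out, outBuf, 0);
    (*env)->ReleasePrimitiveArrayCritical(env, in, inBuf, JNI_ABORT);
    deflateEnd(&strm);

    return (ret == Z_STREAM_END) ? produced : -1;
}

The critical constraint is that nothing between the Get/Release pair may
call back into the JVM or block for a long time - which is exactly why a
multi-megabyte deflate() inside that window raises the GC/scalability
concern mentioned above.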
I would be happy about any suggestions and thoughts in general. Maybe
somebody knows why the older JVMs performed so much better here?
Thank you in advance, lg Clemens
Test-Case:
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

public class DeflaterTest
{
    public static byte[] compresserZlib(byte[] donnees)
    {
        ByteArrayOutputStream resultat = new ByteArrayOutputStream();
        byte[] buffer = new byte[1000];
        int nbEcrits;

        Deflater deflater = new Deflater();
        deflater.setInput(donnees);
        deflater.setLevel(0);
        deflater.finish();

        while (!deflater.finished())
        {
            nbEcrits = deflater.deflate(buffer);
            resultat.write(buffer, 0, nbEcrits);
        }
        return resultat.toByteArray();
    }

    public static void main(String[] args)
    {
        Random r = new Random();

        // 5 MB of random input data
        byte[] buffer = new byte[5000000];
        for (int i = 0; i < buffer.length; i++)
        {
            buffer[i] = (byte) (r.nextInt() % 127);
        }

        for (int i = 0; i < 100; i++)
        {
            long start = System.currentTimeMillis();
            byte[] result = compresserZlib(buffer);
            long end = System.currentTimeMillis();

            // print a byte of the result so the compression can't be optimized away
            System.out.println("Run took: " + (end - start) + " " + result[Math.abs(buffer[0])]);
        }
    }
}