4206909 - adding Z_SYNC_FLUSH support to deflaters

Sat Sep 5 07:23:50 UTC 2009

On Fri, Sep 4, 2009 at 6:21 PM, Martin Buchholz <martinrb at google.com> wrote:
> On Fri, Sep 4, 2009 at 18:02, Xueming Shen<Xueming.Shen at sun.com> wrote:
>>> Overall, I'm least happy with #4, since I feel it leaves a bug.  flush()
>>> on a stream should flush everything I've written to the stream.  This
>>> bug is that it currently doesn't, and this doesn't fix it.  It makes it
>>> possible for people to fix it (which isn't possible currently without
>>> using a completely separate implementation), but it doesn't fix the bug.
>
> I think "bug" is going too far.  There are OutputStreams that can't dispose
> of previously read data without reading some more first.  It depends on
> the transformation.  DeflaterOutputStream is in a gray area -
> it "could" write all the data to its underlying stream, but at the cost of
> sacrificing some compression (which is its purpose).

I think you're saying "read" when you mean "write"... and what is the
point of calling flush() if it doesn't actually flush what I wrote?

As for "bug", how about:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4077821

and

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4032746

Both of which say that it breaks the OutputStream interface (and both
were closed as "not reproducible"?)

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4206909

The doc for Flushable.flush says:

    Flushes this stream by writing any buffered output to the
    underlying stream.

I think the lack of flush actually working on a DeflaterOutputStream
is the bug that we're attempting to fix.
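
To make the complaint concrete, here is a minimal sketch of what "flush
actually working" would look like. It assumes the two-argument
DeflaterOutputStream(OutputStream, boolean syncFlush) constructor that
later shipped in Java 7; that flag is not part of the API under
discussion in this thread. FlushDemo and bytesAfterFlush are
hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class FlushDemo {
    // Writes a short message, calls flush(), and reports how many bytes
    // have reached the underlying stream at that point (before close()).
    // Assumption: the syncFlush constructor flag from Java 7 is available.
    static int bytesAfterFlush(boolean syncFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(sink, syncFlush)) {
            dos.write("hello, world".getBytes(StandardCharsets.US_ASCII));
            dos.flush();
            // Measured before the stream is closed, so only what flush()
            // pushed through is counted.
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("syncFlush=false: " + bytesAfterFlush(false) + " bytes after flush()");
        System.out.println("syncFlush=true:  " + bytesAfterFlush(true) + " bytes after flush()");
    }
}
```

With syncFlush=false the written data is still sitting inside the
deflater after flush(), so the first number should come out smaller
than the second.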

>> Understood and "agreed":-) I'm willing to change position under more
>> pressure:-)  And we can add that anytime. It's better than putting it
>> in now and having to take it out later or add in some ugly workaround.
>
> Maybe we should understand the risk.  Doing a SYNC_FLUSH on every
> DOS.flush() won't cause the compression/decompression to "break",
> you will just have very slow and very bad compression.
> How bad could it be?  Suppose we test with random data,
> and doing a SYNC_FLUSH on every byte?  Presumably the "compressed"
> output will be larger than the input by some factor.  If that factor
> is close to 1, then it's probably OK...It's "only" a performance
> problem.
>
> Anyways, I am leaning towards changing DOS to do the Right Thing.

I imagine that doing it on every byte would be horrible, but there is
presumably some trade-off point, at some number of bytes between
flushes, past which it won't make much difference.

Running some tests, it usually takes about 50 bytes between flushes
before the compression is "reasonable", and it becomes equivalent to
the "no flushes" case at about 500-2000 bytes between flushes,
depending on input (at level 6, at least).  At less than 10 bytes
between flushes, the compressed stream is larger than the input.
Lots of variability, though.
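
A measurement along these lines can be sketched as follows. This is not
the test that produced the numbers above; it is a rough reconstruction
that assumes the Deflater.deflate(byte[], int, int, int) overload and
the Deflater.SYNC_FLUSH constant as they later appeared in Java 7. The
class name, the helper names, and the synthetic sample text are all
hypothetical, and results will vary with the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class FlushIntervalCost {
    // Compresses input at the given level, issuing a SYNC_FLUSH after every
    // `interval` bytes (interval <= 0 means never flush), and returns the
    // total number of compressed bytes produced.
    static long compressedSize(byte[] input, int interval, int level) {
        Deflater def = new Deflater(level);
        byte[] buf = new byte[8192];
        long total = 0;
        int step = interval > 0 ? interval : Math.max(1, input.length);
        for (int off = 0; off < input.length; off += step) {
            int len = Math.min(step, input.length - off);
            def.setInput(input, off, len);
            if (interval > 0) {
                // Force everything written so far out of the deflater.
                int n;
                do {
                    n = def.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                    total += n;
                } while (n == buf.length);
            } else {
                // No flushing: let the deflater emit output whenever it likes.
                while (!def.needsInput()) {
                    total += def.deflate(buf, 0, buf.length);
                }
            }
        }
        def.finish();
        while (!def.finished()) {
            total += def.deflate(buf, 0, buf.length);
        }
        def.end();
        return total;
    }

    // Deterministic, vaguely text-like sample input (an assumption; real
    // inputs will compress differently).
    static byte[] sampleText(int n) {
        String[] words = {"flush", "deflate", "stream", "buffer", "sync", "output", "the", "of"};
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder(n + 8);
        while (sb.length() < n) {
            sb.append(words[rnd.nextInt(words.length)]).append(' ');
        }
        return sb.substring(0, n).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = sampleText(100_000);
        int[] intervals = {1, 10, 50, 500, 2000, 0};
        for (int interval : intervals) {
            long size = compressedSize(data, interval, 6);
            String label = interval > 0 ? "flush every " + interval + " bytes" : "no flushes";
            System.out.printf("%-24s %8d bytes (ratio %.3f)%n",
                    label, size, (double) size / data.length);
        }
    }
}
```

Each SYNC_FLUSH ends the current block and appends an empty stored
block, so the per-flush overhead is a handful of bytes; that is why
flushing every byte or two makes the output larger than the input.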

Brandon