4206909 - adding Z_SYNC_FLUSH support to deflaters

Brandon Long blong at google.com
Thu Sep 3 22:19:44 UTC 2009


On 09/03/09 Xueming Shen uttered the following other thing:
>
> Same as the proposed spec/impl change to  
> InflaterInputStream.isAvailable(), it is also a risky/incompatible
> change to override DeflaterOutputStream.flush() method to flush the  
> underlying deflater. Existing apps
> that are using DOS.flush() (directly or in-directly) will not be happy  
> to see the overall compression ratio gets
> worse after migrating to JDK7. I don't have any data to back this  
> assumption, given DOS has been used for
> decade, this risk for sure is real. This of course contradicts to the  
> request of this RFE which expects/assumes
> that the DOS.flush() should sync_flush the underlying deflater.
>
> Basically the change in Deflater alone is "good" enough to provide A  
> solution for this issue (developer currently
> has to use nasty/hack workaround), it's easy to override DOS.flush  
> (and/or IIS.isAvailble()) to whatever you
> like if the existing is not what you want. Having said that, I still  
> hope the consensus is that we should just do
> the RIGHT thing:-)

Agreed, I hope making it do the right thing is the result.  People wrote
an entire pure-java implementation of zlib to work around this problem.
I was the fourth team at Google to run into it separately.  I don't
think anyone who explicitly calls "flush" will get what the expect.

And I wouldn't expect the overall compression ratio to get that much
worse, unless you are calling flush a really large amount of time.
Z_FULL_FLUSH would be a much bigger change, and clearly the wrong thing
to do on a DeflaterOutputStream.flush.

> As you said, the only benefit of adding a pair of syncFlush()/flushed is  
> to match the existing finish()/finished,
> personally I feel the finish()/finished() control flow() is  
> "complicated" enough, why add another one. Still fell
> the best choice is to simply expose what the zlib provides, I'm still  
> debating with myself if  the Z_FINISH should
> be included as one of the flush options as well, someone might prefer  
> not to use that finish()/finished() control
> flow...

The biggest challenge is that you have to loop your calls to deflate
with FLUSH (and FINISH) till it completes, and knowing when you're done
is complicated, the finished/flushed methods simplify that... but if the
entire api was updated, its possible it could be reworked to make the
external looping unnecessary... though that may involve using more
memory and maybe cpu by using a growable output buffer instead of the
fixed size output byte array.

Anyways, hence the "stick to the minimal changes that match the existing
API", if someone decides that there needs to be a java.util.zip2
(java.util.compress?) that fixes up the whole API to make a cleaner
programmatic interface, that's way beyond my scope.

Brandon

> Sherman
>
> Brandon Long wrote:
>> On 09/03/09 Martin Buchholz uttered the following other thing:
>>   
>>> The zlib and Deflater API is hard to understand.
>>> It seems to me that the proper place to specify the flushing behavior is
>>> when calling setInput, not when calling one of the deflate methods.
>>>
>>> I'm imagining something like
>>> setInput(byte[]b, int off, int len, int flags)
>>> or more java-esque
>>> enum FlushingBehavior {
>>> NO_FLUSH, SYNC_FLUSH, FULL_FLUSH, FINISH }
>>> setInput(byte[]b, int off, int len, FlushingBehavior)
>>>     
>>
>> Looking at DeflaterOutputStream, setInput is called in write, but flush
>> is in a very separate place.
>>
>> We're using this to implement a compressed line oriented network
>> protocol, and so we're likely to have buffering (multiple writes) and
>> then a flush when we're done, we're not building up the entire response
>> and calling write just once.
>>
>> So, I'm not sure how you'd get the expected "stream" behavior out of
>> DeflaterOutputStream if you put the flushing behavior on setInput.
>>
>> Having Deflater.deflate take a flush type more closely mimics the
>> zlib/jzlib deflate function.  Having a separate flush call more closely
>> matches finish/finished on the API.
>>
>> GNU's classpath "fixes" this behavior, and their API is a flush call on
>> Deflater (though no flushed, so they're Deflater.deflate probably loops)
>>
>> Looking at Android's JDK, it has the same bug as the OpenJDK (no flush).
>>
>> Anyways, after Martin's review, I changed it to syncFlush and passed
>> tightened up some other things.. and dropped the
>> InflaterInputStream.available change which was similar to yours (he felt
>> that would be harder to get in).  I guess it partially depends on
>> whether we're targetting OpenJDK 6 or 7.
>>
>> Brandon
>>    
>>> Martin
>>>
>>> On Thu, Aug 27, 2009 at 23:46, Xueming Shen<Xueming.Shen at sun.com> wrote:
>>>     
>>>> Another long over-due zip RFE
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4206909
>>>> with 93 votes.
>>>>
>>>> Background info:
>>>>
>>>> zlib has following 6 allowed flush values when invoking deflate()/inflate
>>>>
>>>> #define Z_NO_FLUSH      0
>>>> #define Z_PARTIAL_FLUSH 1 /* will be removed, use Z_SYNC_FLUSH instead */
>>>> #define Z_SYNC_FLUSH    2
>>>> #define Z_FULL_FLUSH    3
>>>> #define Z_FINISH        4
>>>> #define Z_BLOCK         5
>>>>
>>>> The values are not really used by inflate(), so we only need to worry about
>>>> the
>>>> deflate().
>>>>
>>>> Our implementation now uses Z_NO_FLUSH for all deflation except when
>>>> "finish()"
>>>> is invoked, in which case we use Z_FINISH.
>>>>
>>>> When Z_NO_FLUSH is used, the deflater may accumulate/pend/hold input data in
>>>> its internal
>>>> buffer for better compression (in which the existing output data might not
>>>> be a "complete"
>>>> block), some applications (as those mentioned in the bug report)however
>>>> might need to flush
>>>> all pending data out so the output data is a "complete" block, therefor the
>>>> corresponding
>>>> de-compressor can work on those data.
>>>>
>>>> Z_SYNC_FLUSH and Z_FULL_FLUSH achieve this. The difference is that the
>>>> "full_flush" will
>>>> reset the existing hash table and start a new one for the coming new data,
>>>> which is really
>>>> not very useful. And the only difference compared to current "reset()" is
>>>> the reset() will
>>>> need to reallocate the memory, not a big deal.
>>>>
>>>> See zlib.h for more details (?)
>>>>
>>>> The change in Deflater is actually relatively easy, see the webrev
>>>>
>>>> http://cr.openjdk.java.net/~sherman/zipflush/webrev
>>>>
>>>> I would like to hear your opinion on
>>>>
>>>> (1)I'm proposing to simply add a "flush()" method to support the
>>>> Z_SYNC_FLUSH only. The
>>>> alternative is to have a flush(int) method + 2 constant sync_flush and
>>>> full_flush)
>>>>
>>>> (2)What should we do to the flush() method in DeflaterOutputStream class.
>>>> Ideally it should
>>>> be overridden like in the webrev, with API doc to explain that it does 2
>>>> things (1) flush
>>>> the deflater (2)flush the underlying outputstream, I believe this is what
>>>> the developer is
>>>> expecting and what the API should have been implemented at first place.
>>>> However this definitely
>>>> is an "incompatible" change, given the existing "flush the underlying os
>>>> only" behavior has
>>>> been in place for decade. I just wonder which approach we should take. I
>>>> tend to leave it
>>>> un-touched and do our best to "educate" the developer to go with
>>>>
>>>>  79     static class FlushableDOS extends DeflaterOutputStream {
>>>>  80         public FlushableDOS(OutputStream out) {
>>>>  81             super(out);
>>>>  82         }
>>>>  83  84         public void flush() throws IOException {
>>>>  85             int len = 0;
>>>>  86             while ((len = def.flush(buf, 0, buf.length)) > 0) {
>>>>  87                 out.write(buf, 0, len);
>>>>  88             }
>>>>  89             out.flush();
>>>>  90         }
>>>>  91     }
>>>>
>>>> We can add something in the class description to explain about the "flush"
>>>>
>>>> (3)During the debug, I noticed the InflaterInputStream.available() issue.
>>>> The method is currently specified as "0 if EOF 1 otherwise", which does not
>>>> look right. The BufferedReader.readLine() in test case Flush()is blocked in
>>>> StreamDecoder#325, the inReady() at ln#323 keeps returning true for the
>>>> inputstream (InflaterInputStream). I believe this available() should be
>>>> specified/implemented as
>>>>
>>>> 0 if EOF, in.available() if inf.needInput() is true, otherwise 1.
>>>>
>>>> Opinion? This is going to be a separate bugid, if we agree to fix it. Btw,
>>>> how to write above spec in English?:-)
>>>>
>>>> Sherman
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>       
>>
>>   

-- 
  "Have been unavoidably detained by the world.
      Expect us when you see us." -- Neil Gaiman, _Stardust_
                                              http://www.fiction.net/blong/



More information about the core-libs-dev mailing list