4206909 - adding Z_SYNC_FLUSH support to deflaters
Xueming Shen
Xueming.Shen at Sun.COM
Thu Sep 3 22:02:51 UTC 2009
Same as the proposed spec/impl change to
InflaterInputStream.isAvailable(), it is also a risky/incompatible
change to override DeflaterOutputStream.flush() method to flush the
underlying deflater. Existing apps
that are using DOS.flush() (directly or in-directly) will not be happy
to see the overall compression ratio gets
worse after migrating to JDK7. I don't have any data to back this
assumption, given DOS has been used for
decade, this risk for sure is real. This of course contradicts to the
request of this RFE which expects/assumes
that the DOS.flush() should sync_flush the underlying deflater.
Basically the change in Deflater alone is "good" enough to provide A
solution for this issue (developer currently
has to use nasty/hack workaround), it's easy to override DOS.flush
(and/or IIS.isAvailble()) to whatever you
like if the existing is not what you want. Having said that, I still
hope the consensus is that we should just do
the RIGHT thing:-)
As you said, the only benefit of adding a pair of syncFlush()/flushed is
to match the existing finish()/finished,
personally I feel the finish()/finished() control flow() is
"complicated" enough, why add another one. Still fell
the best choice is to simply expose what the zlib provides, I'm still
debating with myself if the Z_FINISH should
be included as one of the flush options as well, someone might prefer
not to use that finish()/finished() control
flow...
Sherman
Brandon Long wrote:
> On 09/03/09 Martin Buchholz uttered the following other thing:
>
>> The zlib and Deflater API is hard to understand.
>> It seems to me that the proper place to specify the flushing behavior is
>> when calling setInput, not when calling one of the deflate methods.
>>
>> I'm imagining something like
>> setInput(byte[]b, int off, int len, int flags)
>> or more java-esque
>> enum FlushingBehavior {
>> NO_FLUSH, SYNC_FLUSH, FULL_FLUSH, FINISH }
>> setInput(byte[]b, int off, int len, FlushingBehavior)
>>
>
> Looking at DeflaterOutputStream, setInput is called in write, but flush
> is in a very separate place.
>
> We're using this to implement a compressed line oriented network
> protocol, and so we're likely to have buffering (multiple writes) and
> then a flush when we're done, we're not building up the entire response
> and calling write just once.
>
> So, I'm not sure how you'd get the expected "stream" behavior out of
> DeflaterOutputStream if you put the flushing behavior on setInput.
>
> Having Deflater.deflate take a flush type more closely mimics the
> zlib/jzlib deflate function. Having a separate flush call more closely
> matches finish/finished on the API.
>
> GNU's classpath "fixes" this behavior, and their API is a flush call on
> Deflater (though no flushed, so they're Deflater.deflate probably loops)
>
> Looking at Android's JDK, it has the same bug as the OpenJDK (no flush).
>
> Anyways, after Martin's review, I changed it to syncFlush and passed
> tightened up some other things.. and dropped the
> InflaterInputStream.available change which was similar to yours (he felt
> that would be harder to get in). I guess it partially depends on
> whether we're targetting OpenJDK 6 or 7.
>
> Brandon
>
>
>> Martin
>>
>> On Thu, Aug 27, 2009 at 23:46, Xueming Shen<Xueming.Shen at sun.com> wrote:
>>
>>> Another long over-due zip RFE
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4206909
>>> with 93 votes.
>>>
>>> Background info:
>>>
>>> zlib has following 6 allowed flush values when invoking deflate()/inflate
>>>
>>> #define Z_NO_FLUSH 0
>>> #define Z_PARTIAL_FLUSH 1 /* will be removed, use Z_SYNC_FLUSH instead */
>>> #define Z_SYNC_FLUSH 2
>>> #define Z_FULL_FLUSH 3
>>> #define Z_FINISH 4
>>> #define Z_BLOCK 5
>>>
>>> The values are not really used by inflate(), so we only need to worry about
>>> the
>>> deflate().
>>>
>>> Our implementation now uses Z_NO_FLUSH for all deflation except when
>>> "finish()"
>>> is invoked, in which case we use Z_FINISH.
>>>
>>> When Z_NO_FLUSH is used, the deflater may accumulate/pend/hold input data in
>>> its internal
>>> buffer for better compression (in which the existing output data might not
>>> be a "complete"
>>> block), some applications (as those mentioned in the bug report)however
>>> might need to flush
>>> all pending data out so the output data is a "complete" block, therefor the
>>> corresponding
>>> de-compressor can work on those data.
>>>
>>> Z_SYNC_FLUSH and Z_FULL_FLUSH achieve this. The difference is that the
>>> "full_flush" will
>>> reset the existing hash table and start a new one for the coming new data,
>>> which is really
>>> not very useful. And the only difference compared to current "reset()" is
>>> the reset() will
>>> need to reallocate the memory, not a big deal.
>>>
>>> See zlib.h for more details (?)
>>>
>>> The change in Deflater is actually relatively easy, see the webrev
>>>
>>> http://cr.openjdk.java.net/~sherman/zipflush/webrev
>>>
>>> I would like to hear your opinion on
>>>
>>> (1)I'm proposing to simply add a "flush()" method to support the
>>> Z_SYNC_FLUSH only. The
>>> alternative is to have a flush(int) method + 2 constant sync_flush and
>>> full_flush)
>>>
>>> (2)What should we do to the flush() method in DeflaterOutputStream class.
>>> Ideally it should
>>> be overridden like in the webrev, with API doc to explain that it does 2
>>> things (1) flush
>>> the deflater (2)flush the underlying outputstream, I believe this is what
>>> the developer is
>>> expecting and what the API should have been implemented at first place.
>>> However this definitely
>>> is an "incompatible" change, given the existing "flush the underlying os
>>> only" behavior has
>>> been in place for decade. I just wonder which approach we should take. I
>>> tend to leave it
>>> un-touched and do our best to "educate" the developer to go with
>>>
>>> 79 static class FlushableDOS extends DeflaterOutputStream {
>>> 80 public FlushableDOS(OutputStream out) {
>>> 81 super(out);
>>> 82 }
>>> 83 84 public void flush() throws IOException {
>>> 85 int len = 0;
>>> 86 while ((len = def.flush(buf, 0, buf.length)) > 0) {
>>> 87 out.write(buf, 0, len);
>>> 88 }
>>> 89 out.flush();
>>> 90 }
>>> 91 }
>>>
>>> We can add something in the class description to explain about the "flush"
>>>
>>> (3)During the debug, I noticed the InflaterInputStream.available() issue.
>>> The method is currently specified as "0 if EOF 1 otherwise", which does not
>>> look right. The BufferedReader.readLine() in test case Flush()is blocked in
>>> StreamDecoder#325, the inReady() at ln#323 keeps returning true for the
>>> inputstream (InflaterInputStream). I believe this available() should be
>>> specified/implemented as
>>>
>>> 0 if EOF, in.available() if inf.needInput() is true, otherwise 1.
>>>
>>> Opinion? This is going to be a separate bugid, if we agree to fix it. Btw,
>>> how to write above spec in English?:-)
>>>
>>> Sherman
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>
More information about the core-libs-dev
mailing list