Enhancement proposal regarding bufferization of InputStream

Thu Apr 15 13:51:04 UTC 2021

Hi Sergey,

Are you taking into account that for many reads the data is not copied 
into the local buffer.
See the comments in BufferedInputStream.read1: 277:280?

How much is the slowdown, when BufferedInputStreams are chained?

Thanks, Roger

On 4/15/21 7:08 AM, Pavel Rappo wrote:
>> On 15 Apr 2021, at 08:33, Сергей Цыпанов <sergei.tsypanov at yandex.ru> wrote:
>>
>> Hello,
>>
>> buffering with j.i.BufferedInputStream is a handy way to improve performance of IO operations.
>> However in many cases buffering is redundant. Consider this code snippet from Spring Framework:
>>
>> static ClassReader getClassReader(Resource rsc) throws Exception {
>> try (var is = new BufferedInputStream(rsc.getInputStream())) {
>>    return new ClassReader(is);
>> }
>> }
>>
>> Interface Resource has lots of implementations some of them return buffered InputStream,
>> others don't, but all of them get wrapped with BufferedInputStream.
>>
>> Apart from obvious misuses like BufferedInputStream cascade such wrapping is not necessary,
>> e.g. when we read a huge file using FileInputStream.readAllBytes(),
>> in others cases it's harmful e.g. when we read a small (comparing to the default
>> size of buffer in BufferedInputStream) file with readAllBytes() or
>> when we read from ByteArrayInputStream which is kind of buffered one by its nature.
>>
>> I think an instance of InputStream should decide itself whether it requires buffering,
>> so I suggest to add a couple of methods into j.i.InputStream:
>>
>> // subclasses can override this
>> protected boolean needsBuffer() {
>> return true;
>> }
>>
>> public InputStream buffered() {
>> return needsBuffer() ? new BufferedInputStream(this) : this;
>> }
>>
>> this allows subclasses of InputStream to override needsBuffer() to declare buffering redundancy.
>> Let's imagine we've overridden needsBuffer() in BufferedInputStream:
>>
>> public class BufferedInputStream {
>> @Override
>> public boolean needsBuffer() {
>>    return true;
>> }
>> }
>>
>> then the code we've mentioned above should be rewritten as
>>
>> static ClassReader getClassReader(Resource rsc) throws Exception {
>> try (var is = rsc.getInputStream().buffered()) {
>>    return new ClassReader(is);
>> }
>> }
>>
>> preventing cascade of BufferedInputStreams.
>>
> When I read this part
>
>> There are also cases when the need for buffering depends on the way how we read from InputStream:
>>
>> new FileInputStream(file).buffered().readAllBytes() // buffering is redundant
> my knee-jerk reaction was that a better solution likely lies with introducing a marker interface and selectively implementing it as opposed to adding two new methods to the existing class and selectively overriding them. Let's call this interface java.io.Buffered: Bufferred is to InputStream as RandomAccess is to List.
>
> Just to be clear: I'm not proposing to actually do this. It's just a thought.
>
> -Pavel
>
>> var data = new DataInputStream(new FileInputStream(file).buffered())
>> for (int i = 0; i < 1000; i++) {
>>   data.readInt();                          // readInt() does 4 calls to InputStream.read() so buffering is needed
>> }
>>
>> here if FileInputStream.needsBuffer() is overridden and returns false (assuming most of reads from it are bulk)
>> then we won't have buffering for DataInputStream. To avoid this we can also
>> add InputStream.buffered(boolean enforceBuffering) to have manual control.
>>
>> To sum up, my proposal is to add those methods to InputStream:
>>
>> protected static boolean needsBuffer() {
>> return true;
>> }
>>
>> public InputStream buffered() {
>> return buffered(needsBuffer());
>> }
>>
>> public InputStream buffered(boolean enforceBuffering) {
>> return enforceBuffering ? new BufferedInputStream(this) : this;
>> }
>>
>> What do you think?
>>
>> With best regards,
>> Sergey Tsypanov