Enhancement proposal regarding bufferization of InputStream

Thu Apr 15 11:08:19 UTC 2021

> On 15 Apr 2021, at 08:33, Сергей Цыпанов <sergei.tsypanov at yandex.ru> wrote:
> 
> Hello,
> 
> buffering with j.i.BufferedInputStream is a handy way to improve performance of IO operations.
> However in many cases buffering is redundant. Consider this code snippet from Spring Framework:
> 
> static ClassReader getClassReader(Resource rsc) throws Exception {
> try (var is = new BufferedInputStream(rsc.getInputStream())) {
>   return new ClassReader(is);
> }
> }
> 
> Interface Resource has lots of implementations some of them return buffered InputStream,
> others don't, but all of them get wrapped with BufferedInputStream.
> 
> Apart from obvious misuses like BufferedInputStream cascade such wrapping is not necessary,
> e.g. when we read a huge file using FileInputStream.readAllBytes(),
> in others cases it's harmful e.g. when we read a small (comparing to the default 
> size of buffer in BufferedInputStream) file with readAllBytes() or
> when we read from ByteArrayInputStream which is kind of buffered one by its nature.
> 
> I think an instance of InputStream should decide itself whether it requires buffering,
> so I suggest to add a couple of methods into j.i.InputStream:
> 
> // subclasses can override this
> protected boolean needsBuffer() {
> return true;
> }
> 
> public InputStream buffered() {
> return needsBuffer() ? new BufferedInputStream(this) : this;
> }
> 
> this allows subclasses of InputStream to override needsBuffer() to declare buffering redundancy.
> Let's imagine we've overridden needsBuffer() in BufferedInputStream:
> 
> public class BufferedInputStream {
> @Override
> public boolean needsBuffer() {
>   return true;
> }
> }
> 
> then the code we've mentioned above should be rewritten as
> 
> static ClassReader getClassReader(Resource rsc) throws Exception {
> try (var is = rsc.getInputStream().buffered()) {
>   return new ClassReader(is);
> }
> }
> 
> preventing cascade of BufferedInputStreams.
> 

When I read this part

> There are also cases when the need for buffering depends on the way how we read from InputStream:
> 
> new FileInputStream(file).buffered().readAllBytes() // buffering is redundant

my knee-jerk reaction was that a better solution likely lies with introducing a marker interface and selectively implementing it as opposed to adding two new methods to the existing class and selectively overriding them. Let's call this interface java.io.Buffered: Bufferred is to InputStream as RandomAccess is to List.

Just to be clear: I'm not proposing to actually do this. It's just a thought.

-Pavel

> var data = new DataInputStream(new FileInputStream(file).buffered())
> for (int i = 0; i < 1000; i++) {
>  data.readInt();                          // readInt() does 4 calls to InputStream.read() so buffering is needed
> }
> 
> here if FileInputStream.needsBuffer() is overridden and returns false (assuming most of reads from it are bulk)
> then we won't have buffering for DataInputStream. To avoid this we can also
> add InputStream.buffered(boolean enforceBuffering) to have manual control.
> 
> To sum up, my proposal is to add those methods to InputStream:
> 
> protected static boolean needsBuffer() {
> return true;
> }
> 
> public InputStream buffered() {
> return buffered(needsBuffer());
> }
> 
> public InputStream buffered(boolean enforceBuffering) {
> return enforceBuffering ? new BufferedInputStream(this) : this;
> }
> 
> What do you think?
> 
> With best regards,
> Sergey Tsypanov