Enhancement proposal regarding bufferization of InputStream
Сергей Цыпанов
sergei.tsypanov at yandex.ru
Thu Apr 15 07:33:12 UTC 2021
Hello,
buffering with j.i.BufferedInputStream is a handy way to improve performance of IO operations.
However in many cases buffering is redundant. Consider this code snippet from Spring Framework:
static ClassReader getClassReader(Resource rsc) throws Exception {
try (var is = new BufferedInputStream(rsc.getInputStream())) {
return new ClassReader(is);
}
}
Interface Resource has lots of implementations some of them return buffered InputStream,
others don't, but all of them get wrapped with BufferedInputStream.
Apart from obvious misuses like BufferedInputStream cascade such wrapping is not necessary,
e.g. when we read a huge file using FileInputStream.readAllBytes(),
in others cases it's harmful e.g. when we read a small (comparing to the default
size of buffer in BufferedInputStream) file with readAllBytes() or
when we read from ByteArrayInputStream which is kind of buffered one by its nature.
I think an instance of InputStream should decide itself whether it requires buffering,
so I suggest to add a couple of methods into j.i.InputStream:
// subclasses can override this
protected boolean needsBuffer() {
return true;
}
public InputStream buffered() {
return needsBuffer() ? new BufferedInputStream(this) : this;
}
this allows subclasses of InputStream to override needsBuffer() to declare buffering redundancy.
Let's imagine we've overridden needsBuffer() in BufferedInputStream:
public class BufferedInputStream {
@Override
public boolean needsBuffer() {
return true;
}
}
then the code we've mentioned above should be rewritten as
static ClassReader getClassReader(Resource rsc) throws Exception {
try (var is = rsc.getInputStream().buffered()) {
return new ClassReader(is);
}
}
preventing cascade of BufferedInputStreams.
There are also cases when the need for buffering depends on the way how we read from InputStream:
new FileInputStream(file).buffered().readAllBytes() // buffering is redundant
var data = new DataInputStream(new FileInputStream(file).buffered())
for (int i = 0; i < 1000; i++) {
data.readInt(); // readInt() does 4 calls to InputStream.read() so buffering is needed
}
here if FileInputStream.needsBuffer() is overridden and returns false (assuming most of reads from it are bulk)
then we won't have buffering for DataInputStream. To avoid this we can also
add InputStream.buffered(boolean enforceBuffering) to have manual control.
To sum up, my proposal is to add those methods to InputStream:
protected static boolean needsBuffer() {
return true;
}
public InputStream buffered() {
return buffered(needsBuffer());
}
public InputStream buffered(boolean enforceBuffering) {
return enforceBuffering ? new BufferedInputStream(this) : this;
}
What do you think?
With best regards,
Sergey Tsypanov
More information about the core-libs-dev
mailing list