Enhancement proposal regarding bufferization of InputStream
Pavel Rappo
pavel.rappo at oracle.com
Thu Apr 15 11:08:19 UTC 2021
> On 15 Apr 2021, at 08:33, Сергей Цыпанов <sergei.tsypanov at yandex.ru> wrote:
>
> Hello,
>
> buffering with j.i.BufferedInputStream is a handy way to improve performance of IO operations.
> However in many cases buffering is redundant. Consider this code snippet from Spring Framework:
>
> static ClassReader getClassReader(Resource rsc) throws Exception {
> try (var is = new BufferedInputStream(rsc.getInputStream())) {
> return new ClassReader(is);
> }
> }
>
> Interface Resource has lots of implementations some of them return buffered InputStream,
> others don't, but all of them get wrapped with BufferedInputStream.
>
> Apart from obvious misuses like BufferedInputStream cascade such wrapping is not necessary,
> e.g. when we read a huge file using FileInputStream.readAllBytes(),
> in others cases it's harmful e.g. when we read a small (comparing to the default
> size of buffer in BufferedInputStream) file with readAllBytes() or
> when we read from ByteArrayInputStream which is kind of buffered one by its nature.
>
> I think an instance of InputStream should decide itself whether it requires buffering,
> so I suggest to add a couple of methods into j.i.InputStream:
>
> // subclasses can override this
> protected boolean needsBuffer() {
> return true;
> }
>
> public InputStream buffered() {
> return needsBuffer() ? new BufferedInputStream(this) : this;
> }
>
> this allows subclasses of InputStream to override needsBuffer() to declare buffering redundancy.
> Let's imagine we've overridden needsBuffer() in BufferedInputStream:
>
> public class BufferedInputStream {
> @Override
> public boolean needsBuffer() {
> return true;
> }
> }
>
> then the code we've mentioned above should be rewritten as
>
> static ClassReader getClassReader(Resource rsc) throws Exception {
> try (var is = rsc.getInputStream().buffered()) {
> return new ClassReader(is);
> }
> }
>
> preventing cascade of BufferedInputStreams.
>
When I read this part
> There are also cases when the need for buffering depends on the way how we read from InputStream:
>
> new FileInputStream(file).buffered().readAllBytes() // buffering is redundant
my knee-jerk reaction was that a better solution likely lies with introducing a marker interface and selectively implementing it as opposed to adding two new methods to the existing class and selectively overriding them. Let's call this interface java.io.Buffered: Bufferred is to InputStream as RandomAccess is to List.
Just to be clear: I'm not proposing to actually do this. It's just a thought.
-Pavel
> var data = new DataInputStream(new FileInputStream(file).buffered())
> for (int i = 0; i < 1000; i++) {
> data.readInt(); // readInt() does 4 calls to InputStream.read() so buffering is needed
> }
>
> here if FileInputStream.needsBuffer() is overridden and returns false (assuming most of reads from it are bulk)
> then we won't have buffering for DataInputStream. To avoid this we can also
> add InputStream.buffered(boolean enforceBuffering) to have manual control.
>
> To sum up, my proposal is to add those methods to InputStream:
>
> protected static boolean needsBuffer() {
> return true;
> }
>
> public InputStream buffered() {
> return buffered(needsBuffer());
> }
>
> public InputStream buffered(boolean enforceBuffering) {
> return enforceBuffering ? new BufferedInputStream(this) : this;
> }
>
> What do you think?
>
> With best regards,
> Sergey Tsypanov
More information about the core-libs-dev
mailing list