Enhancement proposal regarding bufferization of InputStream

Fri Apr 16 21:41:24 UTC 2021

Hi Roger,

> Are you taking into account that for many reads the data is not copied
> into the local buffer.
> See the comments in BufferedInputStream.read1: 277:280?

sure, I'm aware of this optimization. The buffer however is always allocated as BufferedInputStream instantiated,
waisting by default 8 kB of memory.

If I take a benchmark [1] measuring costs of reading metadata of a simple Spring bean it demonstrates [2]
measurable imrovement when buffering is dropped:

original
                                       Mode  Cnt      Score     Error   Units
read                                   avgt  100    122.041 ±   1.286   us/op
read:·gc.alloc.rate.norm               avgt  100  50795.798 ±  13.941    B/op  <<

patched
                                       Mode  Cnt      Score     Error   Units
read                                   avgt  100    119.524 ±   1.171   us/op
read:·gc.alloc.rate.norm               avgt  100  42635.578 ±  10.866    B/op <<

So in this case it's about 3 microseconds and 8 KB per bean.

When I measured the impact of change [2] on Spring Boot application I got start-up time improved
from 760 milliseconds to 718 milliseconds a memory consumption decrease from 48,09 MB to 47,47 MB.

Also see https://github.com/openjdk/jdk/pull/2992

1. https://github.com/stsypanov/ovn/blob/master/src/main/java/com/tsypanov/ovn/MetadataReaderBenchmark.java
2. https://github.com/spring-projects/spring-framework/pull/24946

Regards,
Sergey Tsypanov

> Hi Sergey,
> 
> Are you taking into account that for many reads the data is not copied
> into the local buffer.
> See the comments in BufferedInputStream.read1: 277:280?
> 
> How much is the slowdown, when BufferedInputStreams are chained?
> 
> Thanks, Roger
> 
> On 4/15/21 7:08 AM, Pavel Rappo wrote:
> 
>>> On 15 Apr 2021, at 08:33, Сергей Цыпанов <sergei.tsypanov at yandex.ru> wrote:
>>>
>>> Hello,
>>>
>>> buffering with j.i.BufferedInputStream is a handy way to improve performance of IO operations.
>>> However in many cases buffering is redundant. Consider this code snippet from Spring Framework:
>>>
>>> static ClassReader getClassReader(Resource rsc) throws Exception {
>>> try (var is = new BufferedInputStream(rsc.getInputStream())) {
>>> return new ClassReader(is);
>>> }
>>> }
>>>
>>> Interface Resource has lots of implementations some of them return buffered InputStream,
>>> others don't, but all of them get wrapped with BufferedInputStream.
>>>
>>> Apart from obvious misuses like BufferedInputStream cascade such wrapping is not necessary,
>>> e.g. when we read a huge file using FileInputStream.readAllBytes(),
>>> in others cases it's harmful e.g. when we read a small (comparing to the default
>>> size of buffer in BufferedInputStream) file with readAllBytes() or
>>> when we read from ByteArrayInputStream which is kind of buffered one by its nature.
>>>
>>> I think an instance of InputStream should decide itself whether it requires buffering,
>>> so I suggest to add a couple of methods into j.i.InputStream:
>>>
>>> // subclasses can override this
>>> protected boolean needsBuffer() {
>>> return true;
>>> }
>>>
>>> public InputStream buffered() {
>>> return needsBuffer() ? new BufferedInputStream(this) : this;
>>> }
>>>
>>> this allows subclasses of InputStream to override needsBuffer() to declare buffering redundancy.
>>> Let's imagine we've overridden needsBuffer() in BufferedInputStream:
>>>
>>> public class BufferedInputStream {
>>> @Override
>>> public boolean needsBuffer() {
>>> return true;
>>> }
>>> }
>>>
>>> then the code we've mentioned above should be rewritten as
>>>
>>> static ClassReader getClassReader(Resource rsc) throws Exception {
>>> try (var is = rsc.getInputStream().buffered()) {
>>> return new ClassReader(is);
>>> }
>>> }
>>>
>>> preventing cascade of BufferedInputStreams.
>>
>> When I read this part
>>
>>> There are also cases when the need for buffering depends on the way how we read from InputStream:
>>>
>>> new FileInputStream(file).buffered().readAllBytes() // buffering is redundant
>>
>> my knee-jerk reaction was that a better solution likely lies with introducing a marker interface and selectively implementing it as opposed to adding two new methods to the existing class and selectively overriding them. Let's call this interface java.io.Buffered: Bufferred is to InputStream as RandomAccess is to List.
>>
>> Just to be clear: I'm not proposing to actually do this. It's just a thought.
>>
>> -Pavel
>>
>>> var data = new DataInputStream(new FileInputStream(file).buffered())
>>> for (int i = 0; i < 1000; i++) {
>>> data.readInt(); // readInt() does 4 calls to InputStream.read() so buffering is needed
>>> }
>>>
>>> here if FileInputStream.needsBuffer() is overridden and returns false (assuming most of reads from it are bulk)
>>> then we won't have buffering for DataInputStream. To avoid this we can also
>>> add InputStream.buffered(boolean enforceBuffering) to have manual control.
>>>
>>> To sum up, my proposal is to add those methods to InputStream:
>>>
>>> protected static boolean needsBuffer() {
>>> return true;
>>> }
>>>
>>> public InputStream buffered() {
>>> return buffered(needsBuffer());
>>> }
>>>
>>> public InputStream buffered(boolean enforceBuffering) {
>>> return enforceBuffering ? new BufferedInputStream(this) : this;
>>> }
>>>
>>> What do you think?
>>>
>>> With best regards,
>>> Sergey Tsypanov