Request/discussion: BufferedReader reading using async API while providing sync API
Vitaly Davidovich
vitalyd at gmail.com
Thu Oct 27 21:45:36 UTC 2016
On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
> You are right. Even in windows it does not set the flags for async reads.
> It seems like it is windows itself that does the decision to buffer the
> contents based on its own heuristics.
>
You mean nonblocking, not async, right? Two different things.
> But... Why? Why won't it be? Why is there no API for it? How am I getting
> 100% HDD use and faster times when I fake work to delay getting more data
> and I only have a fluctuating 60-90% (always going up and down) when I use
> an InputStream?
> Is it related to how both classes cache and how frequently and how much
> each one asks for data?
>
> I really would prefer not having to read the source code because it takes
> a real long time T.T.
>
> I end up reinstating... And wondering...
>
> Why doesn't java provide a single-threaded non-block API for file reads
> for all OS that support it? I simply cannot find that information no matter
> how much I search on google, bing, duck duck go... Can any of you point me
> to whomever knows?
>
https://lwn.net/Articles/612483/ for Linux. Unfortunately, the
nonblocking file io story is complicated and messy.
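
For contrast with the plain blocking read() path discussed in this thread, here is a minimal Java sketch of the asynchronous (not nonblocking) route through java.nio.channels.AsynchronousFileChannel. The class name AsyncReadSketch and the helper readAsync are made up for illustration. Whether the read is backed by kernel-level async I/O or by a pool of threads issuing blocking reads depends on the platform and the JDK implementation, so treat this as an API illustration and not as a performance claim.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncReadSketch {

    /**
     * Hypothetical helper: reads up to {@code size} bytes from the start of
     * {@code path} using the asynchronous channel API.
     */
    static byte[] readAsync(Path path, int size)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(size);

            // read() only initiates the operation and returns a Future.
            Future<Integer> pending = ch.read(buf, 0L);

            // The caller is free to do other work here.

            // get() blocks until the read has completed (or failed).
            int n = pending.get();
            if (n < 0) {
                return new byte[0]; // end of file reached, nothing was read
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-read", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            byte[] data = readAsync(tmp, 16);
            System.out.println(new String(data, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that a single read may return fewer bytes than requested, so a real reader would loop, advancing the file position by the number of bytes each completed read reports.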
> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>
> I don't know about Windows specifically, but generally file systems across
> major OS's will implement readahead in their IO scheduler when they detect
> sequential scans.
>
> On Linux, you can also strace your test to confirm which syscalls are
> emitted (you should be seeing plain read()'s there, with FileInputStream
> and FileChannel).
>
> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com> wrote:
>
>> Thanks for the heads up.
>>
>> I'll try that later. These tests are still useful then. Meanwhile, I'll
>> end up also checking how FileChannel queries the OS on windows. I'm getting
>> 100% HDD reads... Could it be that the OS reads the file ahead on its
>> own?... Anyway, I'll look into it. Thanks for the heads up.
>>
>> On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>
>>
>>
>> On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>
>>> Oh... I see. In that case, it means something is terribly wrong. It can
>>> be my initial tests, though.
>>>
>>> I'm testing on both linux and windows and I'm getting performance gains
>>> from using the FileChannel compared to using FileInputStream... The tests
>>> also make sense based on my predictions O_O...
>>>
>> FileInputStream requires copying native buffers holding the read data to
>> the java byte[]. If you're using direct ByteBuffer for FileChannel, that
>> whole memcpy is skipped. Try comparing FileChannel with HeapByteBuffer
>> instead.
>>
>>>
>>> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>
>>>
>>>
>>> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>>
>>>> Did you read the C code?
>>>
>>> I looked at the Linux code in the JDK.
>>>
>>>> Have you got any idea how many functions Windows or Linux (nearly all
>>>> flavors) have for the read operation towards a file?
>>>
>>> I do.
>>>
>>>>
>>>> I have already done that homework myself. I may not have read JVM's
>>>> source code but I know well that there's functions on both Windows and
>>>> Linux that provide such interface I mentioned although they require a
>>>> slightly different treatment (and different constants).
>>>
>>> You should read the JDK (native) source code instead of
>>> guessing/assuming. On Linux, it doesn't use aio facilities for files. The
>>> kernel io scheduler may issue readahead behind the scenes, but there's no
>>> nonblocking file io that's at the heart of your premise.
>>>
>>>>
>>>>
>>>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>>>>
>>>>> It is actually based on the premise that:
>>>>>
>>>>> 1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
>>>>> buffer size to fill in as the same size as ByteBuffer.
>>>>>
>>>>> Why do you say that? AFAICT, it issues a read syscall and that will
>>>>> block if the data isn't in page cache.
>>>>>
>>>>> 2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
>>>>> the JVM to order the OS to execute memcpy() to copy from its memory
>>>>> to the shared memory created at ByteBuffer instantiation (in java 8)
>>>>> using Unsafe and then for the JVM to update the ByteBuffer fields.
>>>>>
>>>>> I think subsequent reads just invoke the same read syscall, passing
>>>>> the current file offset maintained by the file channel instance.
>>>>>
>>>>> 3. The call will not block waiting for I/O and it won't take longer
>>>>> than the JNI interface if no new data exists. However, it will block
>>>>> waiting for the OS to execute memcpy() to the shared memory.
>>>>>
>>>>> So why do you think it won't block?
>>>>>
>>>>>
>>>>> Is my premise wrong?
>>>>>
>>>>> If I read correctly, if I don't use a DirectBuffer, there would be
>>>>> even another intermediate buffer to copy data to before giving it
>>>>> to the "user" which would be useless.
>>>>>
>>>>> If you use a HeapByteBuffer, then there's an extra copy from the
>>>>> native buffer to the Java buffer.
>>>>>
>>>>>
>>>>>
>>>>> On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>>
>>>>> I believe I see where you're coming from. Please correct me if
>>>>> I'm wrong.
>>>>>
>>>>> Your implementation is based on the premise that a call to
>>>>> ReadableByteChannel.read()
>>>>> _initiates_ the operation and returns immediately. The OS then
>>>>> continues to fill
>>>>> the buffer while there's a free space in the buffer and the
>>>>> channel hasn't encountered EOF.
>>>>>
>>>>> Is that right?
>>>>>
>>>>> On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thank you for your time. I'll try to explain it. I hope I
>>>>> can clear it up.
>>>>> First of it, I made a meaning mistake between asynchronous
>>>>> and non-blocking. This implementation uses a non-blocking
>>>>> algorithm internally while providing a blocking-like
>>>>> algorithm on the surface. It is single-threaded and not
>>>>> multi-threaded where one thread fetches data and blocks
>>>>> waiting and the other accumulates it and provides to
>>>>> whichever wants it.
>>>>>
>>>>> Second of it, I had made a mistake of going after
>>>>> BufferedReader instead of going after BufferedInputStream.
>>>>> If you want me to go after BufferedReader it's ok but I
>>>>> only thought that going after BufferedInputStream would be
>>>>> more generically useful than BufferedReader when I started
>>>>> the poc.
>>>>>
>>>>> On to my code:
>>>>> Short answers:
>>>>> • The sleep(int) exists because I don't know how
>>>>> to wait until more data exists in the buffer which is part
>>>>> of read()'s contract.
>>>>> • The ByteBuffer gives a buffer that is filled by
>>>>> the OS (what I believe Channels do) instead of getting
>>>>> data only by demand (what I believe Streams do).
>>>>> Full answers:
>>>>> The blockingFill(boolean) method is a method for a busy
>>>>> wait for a fill which is used exclusively by the read()
>>>>> method. All other methods use the version that does not
>>>>> sleep (fill(boolean)).
>>>>> blockingFill(boolean)'s existence like that is only
>>>>> because the read() method must not return unless either:
>>>>>
>>>>> • The stream ended.
>>>>> • The next byte is ready for reading.
>>>>> Additionally, statistically, that while loop will rarely
>>>>> evaluate to true as reads are in chunks so readPos will be
>>>>> behind writePos most of the time.
>>>>> I have no idea if an interrupt will ever happen, to be
>>>>> honest. The main reasons why I'm using a sleep is because
>>>>> I didn't want a hog onto the CPU in a full thread usage
>>>>> busy wait and because I didn't find any way of doing a
>>>>> thread sleep in order to wake up later when the buffer
>>>>> managed by native code has more data.
>>>>> The Non-blocking part is managed by the buffer the OS
>>>>> keeps filling most if not all the time. That buffer is the
>>>>> field
>>>>>
>>>>> ByteBuffer readBuffer
>>>>> That's the gaining part against the plain old Buffered
>>>>> classes.
>>>>>
>>>>>
>>>>> Did that make sense to you? Feel free to ask anything else
>>>>> you need.
>>>>>
>>>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>>
>>>>> I've skimmed through the code and I'm not sure I can
>>>>> see any asynchronicity
>>>>> (you were pointing at the lack of it in
>>>>> BufferedReader).
>>>>> And the mechanics of this is very puzzling to me, to
>>>>> be honest:
>>>>> void blockingFill(boolean forced) throws IOException {
>>>>>     fill(forced);
>>>>>     while (readPos == writePos) {
>>>>>         try {
>>>>>             Thread.sleep(100);
>>>>>         } catch (InterruptedException e) {
>>>>>             // An interrupt may mean more data is available
>>>>>         }
>>>>>         fill(forced);
>>>>>     }
>>>>> }
>>>>> I thought you were suggesting that we should utilize
>>>>> the tools which OS provides
>>>>> more efficiently. Instead we have something that looks
>>>>> very similarly to a
>>>>> "busy loop" and... also who and when is supposed to
>>>>> interrupt Thread.sleep()?
>>>>> Sorry, I'm not following. Could you please explain how
>>>>> this is supposed to work?
>>>>>
>>>>> On 24 Oct 2016, at 15:59, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Attached and sending!
>>>>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>>
>>>>> Could you please send a new email on this list
>>>>> with the source attached as a
>>>>> text file?
>>>>>
>>>>> On 23 Oct 2016, at 19:14, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Here's my poc/prototype:
>>>>>
>>>>> http://pastebin.com/WRpYWDJF
>>>>>
>>>>> I've implemented the bare minimum of the
>>>>> class that follows the same contract of
>>>>> BufferedReader while signaling all issues
>>>>> I think it may have or has in comments.
>>>>> I also wrote some javadoc to help guiding
>>>>> through the class.
>>>>> I could have used more fields from
>>>>> BufferedReader but the names were so
>>>>> minimalistic that were confusing me. I
>>>>> intend to change them before sending this
>>>>> to openJDK.
>>>>> One of the major problems this has is long
>>>>> overflowing. It is major because it is
>>>>> hidden, it will be extremely rare and it
>>>>> takes a really long time to reproduce.
>>>>> There are different ways of dealing with
>>>>> it. From just documenting to actually
>>>>> making code that works with it.
>>>>> I built a simple test code for it to have
>>>>> some ideas about performance and
>>>>> correctness.
>>>>>
>>>>> http://pastebin.com/eh6LFgwT
>>>>>
>>>>> This doesn't do a thorough test of whether it is
>>>>> actually working correctly but I see no
>>>>> reason for it not working correctly after
>>>>> fixing the 2 bugs that test found.
>>>>> I'll also leave here some conclusions
>>>>> about speed and resource consumption I
>>>>> found.
>>>>> I made tests with default buffer sizes,
>>>>> 5000B, 15_000B and 500_000B. I noticed
>>>>> that, with my hardware, with the 1 530 000
>>>>> 000B file, I was getting around:
>>>>> In all buffers and fake work: 10~15s speed
>>>>> improvement ( from 90% HDD speed to 100%
>>>>> HDD speed)
>>>>> In all buffers and no fake work: 1~2s
>>>>> speed improvement ( from 90% HDD speed to
>>>>> 100% HDD speed)
>>>>> Changing the buffer size was giving
>>>>> different reading speeds but both were
>>>>> quite equal in how much they would change
>>>>> when changing the buffer size.
>>>>> Finally, I could always confirm that I/O
>>>>> was always the slowest thing while this
>>>>> code was running.
>>>>> For the ones wondering about the file
>>>>> size; it is both to avoid OS cache and to
>>>>> make the reading at the main use-case
>>>>> these objects are for (large streams of
>>>>> bytes).
>>>>> @Pavel, are you open for discussion now
>>>>> ;)? Need anything else?
>>>>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>>
>>>>> Just to append to my previous email.
>>>>> BufferedReader wraps any Reader out
>>>>> there.
>>>>> Not specifically FileReader. While
>>>>> you're talking about the case of
>>>>> effective
>>>>> reading from a file.
>>>>> I guess there's one existing
>>>>> possibility to provide exactly what
>>>>> you need (as I
>>>>> understand it) under this method:
>>>>> /**
>>>>>  * Opens a file for reading, returning a {@code BufferedReader} to read text
>>>>>  * from the file in an efficient manner...
>>>>>  ...
>>>>>  */
>>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>> It can return _anything_ as long as it
>>>>> is a BufferedReader. We can do it, but
>>>>> it
>>>>> needs to be investigated not only for
>>>>> your favorite OS but for other OSes as
>>>>> well. Feel free to prototype this and
>>>>> we can discuss it on the list later.
>>>>> Thanks,
>>>>> -Pavel
>>>>>
>>>>> On 21 Oct 2016, at 18:56, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Pavel is right.
>>>>> In reality, I was expecting such
>>>>> BufferedReader to use only a
>>>>> single buffer and have that Buffer
>>>>> being filled asynchronously, not
>>>>> in a different Thread.
>>>>> Additionally, I don't have the
>>>>> intention of having a larger
>>>>> buffer than before unless stated
>>>>> through the API (the constructor).
>>>>> In my idea, internally, it is
>>>>> supposed to use
>>>>> java.nio.channels.AsynchronousFileChannel
>>>>> or equivalent.
>>>>> It does not prevent having two
>>>>> buffers and I do not intend to
>>>>> change BufferedReader itself. I'd
>>>>> do a BufferedAsyncReader of sorts
>>>>> (any name suggestion is welcome as
>>>>> I'm an awful namer).
>>>>> On 21/10/2016 18:38, Roger Riggs
>>>>> wrote:
>>>>>
>>>>> Hi Pavel,
>>>>> I think Brunoais is asking for a
>>>>> double buffering scheme in
>>>>> which the implementation of
>>>>> BufferedReader fills (a second
>>>>> buffer) in parallel with the
>>>>> application reading from the
>>>>> 1st buffer
>>>>> and managing the swaps and
>>>>> async reads transparently.
>>>>> It would not change the API
>>>>> but would change the
>>>>> interactions between the
>>>>> buffered reader
>>>>> and the underlying stream. It
>>>>> would also increase memory
>>>>> requirements and processing
>>>>> by introducing or using a
>>>>> separate thread and the
>>>>> necessary synchronization.
>>>>> Though I think the formal
>>>>> interface semantics could be
>>>>> maintained, I have doubts
>>>>> about compatibility and its
>>>>> unintended consequences on
>>>>> existing subclasses,
>>>>> applications and libraries.
>>>>> $.02, Roger
>>>>> On 10/21/16 1:22 PM, Pavel
>>>>> Rappo wrote:
>>>>>
>>>>> Off the top of my head, I
>>>>> would say it's not
>>>>> possible to change the
>>>>> design of an
>>>>> _extensible_ type that has
>>>>> been out there for 20 or
>>>>> so years. All these I/O
>>>>> streams from java.io were
>>>>> designed for simple
>>>>> synchronous use case.
>>>>> It's not that their design
>>>>> is flawed in some way,
>>>>> it's that they don't
>>>>> seem to
>>>>> suit your needs. Have you
>>>>> considered using
>>>>>
>>>>> java.nio.channels.AsynchronousFileChannel
>>>>> in your applications?
>>>>> -Pavel
>>>>>
>>>>> On 21 Oct 2016, at
>>>>> 17:08, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Any feedback on this?
>>>>> I'm really interested
>>>>> in implementing such
>>>>>
>>>>> BufferedReader/BufferedStreamReader
>>>>> to allow speeding up
>>>>> my applications
>>>>> without having to
>>>>> think in an
>>>>> asynchronous way or
>>>>> multi-threading while
>>>>> programming with it.
>>>>> That's why I'm asking
>>>>> this here.
>>>>> On 13/10/2016 14:45,
>>>>> Brunoais wrote:
>>>>>
>>>>> Hi,
>>>>> I looked at
>>>>> BufferedReader
>>>>> source code for
>>>>> java 9 along with
>>>>> the source code of
>>>>> the
>>>>> channels/streams
>>>>> used. I noticed
>>>>> that, like in java
>>>>> 7, BufferedReader
>>>>> does not use an
>>>>> Async API to load
>>>>> data from files,
>>>>> instead, the data
>>>>> loading is all
>>>>> done synchronously
>>>>> even when the OS
>>>>> allows requesting
>>>>> a file to be read
>>>>> and getting a
>>>>> warning later when
>>>>> the file is
>>>>> effectively read.
>>>>> Why Is
>>>>> BufferedReader not
>>>>> async while
>>>>> providing a sync
>>>>> API?
>>>>>
>>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from my phone
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Sent from my phone
>>>
>>>
>>>
>>
>>
>
>
--
Sent from my phone