Request/discussion: BufferedReader reading using async API while providing sync API
David Holmes
david.holmes at oracle.com
Thu Oct 27 21:09:03 UTC 2016
You might try discussing on net-dev rather than core-libs-dev, to get
additional historical info related to the io and nio file APIs.
David
On 28/10/2016 5:08 AM, Brunoais wrote:
> You are right. Even on Windows it does not set the flags for async
> reads. It seems that Windows itself makes the decision to
> buffer the contents based on its own heuristics.
>
> But... Why? Why won't it be? Why is there no API for it? How am I
> getting 100% HDD use and faster times when I fake work to delay getting
> more data and I only have a fluctuating 60-90% (always going up and
> down) when I use an InputStream?
> Is it related to how both classes cache and how frequently and how much
> each one asks for data?
>
> I really would prefer not having to read the source code because it
> takes a real long time T.T.
>
> I end up repeating myself... And wondering...
>
> Why doesn't Java provide a single-threaded non-blocking API for file reads
> for all OSes that support it? I simply cannot find that information no
> matter how much I search on Google, Bing, DuckDuckGo... Can any of you
> point me to whoever knows?
>
> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>> I don't know about Windows specifically, but generally file systems
>> across major OS's will implement readahead in their IO scheduler when
>> they detect sequential scans.
>>
>> On Linux, you can also strace your test to confirm which syscalls are
>> emitted (you should be seeing plain read()'s there, with
>> FileInputStream and FileChannel).
>>
>> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>
>> Thanks for the heads up.
>>
>> I'll try that later. These tests are still useful then. Meanwhile,
>> I'll end up also checking how FileChannel queries the OS on
>> windows. I'm getting 100% HDD reads... Could it be that the OS
>> reads the file ahead on its own?... Anyway, I'll look into it.
>> Thanks for the heads up.
>>
>>
>> On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>>
>>>
>>> On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>>
>>> Oh... I see. In that case, it means something is terribly
>>> wrong. It can be my initial tests, though.
>>>
>>> I'm testing on both linux and windows and I'm getting
>>> performance gains from using the FileChannel compared to
>>> using FileInputStream... The tests also make sense based on
>>> my predictions O_O...
>>>
>>> FileInputStream requires copying native buffers holding the read
>>> data to the java byte[]. If you're using direct ByteBuffer for
>>> FileChannel, that whole memcpy is skipped. Try comparing
>>> FileChannel with HeapByteBuffer instead.
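
A minimal sketch of the comparison suggested above: the same FileChannel read loop run once with a heap buffer and once with a direct buffer. The class name BufferKindDemo, the helper drain(), the 8192-byte buffer size and the 1 MiB temporary file are all assumptions made for illustration. Timings from a file this small mostly reflect the page cache, so treat the printed numbers as indicative only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BufferKindDemo {
    /** Reads the whole file into the given buffer, chunk by chunk, and returns the byte count. */
    static long drain(Path file, ByteBuffer buffer) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (true) {
                buffer.clear();
                // A plain blocking read; returns -1 once end of file is reached.
                int n = channel.read(buffer);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file; a real measurement needs a file larger than the OS cache.
        Path file = Files.createTempFile("buffer-kind", ".bin");
        try {
            Files.write(file, new byte[1 << 20]);
            long t0 = System.nanoTime();
            long heapBytes = drain(file, ByteBuffer.allocate(8192));
            long t1 = System.nanoTime();
            long directBytes = drain(file, ByteBuffer.allocateDirect(8192));
            long t2 = System.nanoTime();
            System.out.printf("heap:   %d bytes in %d us%n", heapBytes, (t1 - t0) / 1000);
            System.out.printf("direct: %d bytes in %d us%n", directBytes, (t2 - t1) / 1000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Both runs go through the same read path in the channel; only the kind of destination buffer differs, which is the variable the comparison is meant to isolate.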
>>>
>>>
>>> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>>
>>>>
>>>> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>>>
>>>> Did you read the C code?
>>>>
>>>> I looked at the Linux code in the JDK.
>>>>
>>>> Have you got any idea how many functions Windows or
>>>> Linux (nearly all flavors) have for the read operation
>>>> towards a file?
>>>>
>>>> I do.
>>>>
>>>>
>>>> I have already done that homework myself. I may not have
>>>> read JVM's source code but I know well that there's
>>>> functions on both Windows and Linux that provide such
>>>> interface I mentioned although they require a slightly
>>>> different treatment (and different constants).
>>>>
>>>> You should read the JDK (native) source code instead of
>>>> guessing/assuming. On Linux, it doesn't use aio facilities
>>>> for files. The kernel io scheduler may issue readahead
>>>> behind the scenes, but there's no nonblocking file io that's
>>>> at the heart of your premise.
>>>>
>>>>
>>>>
>>>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>
>>>> On Wednesday, October 26, 2016, Brunoais
>>>> <brunoaiss at gmail.com> wrote:
>>>>
>>>> It is actually based on the premise that:
>>>>
>>>> 1. The first call to
>>>> ReadableByteChannel.read(ByteBuffer) sets the OS
>>>> buffer size to fill in as the same size as
>>>> ByteBuffer.
>>>>
>>>> Why do you say that? AFAICT, it issues a read
>>>> syscall and that will block if the data isn't in
>>>> page cache.
>>>>
>>>> 2. The consecutive calls to
>>>> ReadableByteChannel.read(ByteBuffer) order
>>>> the JVM to order the OS to execute memcpy()
>>>> to copy from its memory
>>>> to the shared memory created at ByteBuffer
>>>> instantiation (in Java 8)
>>>> using Unsafe and then for the JVM to update
>>>> the ByteBuffer fields.
>>>>
>>>> I think subsequent reads just invoke the same read
>>>> syscall, passing the current file offset maintained
>>>> by the file channel instance.
>>>>
>>>> 3. The call will not block waiting for I/O and
>>>> it won't take longer
>>>> than the JNI interface if no new data exists.
>>>> However, it will
>>>> block
>>>> waiting for the OS to execute memcpy() to the
>>>> shared memory.
>>>>
>>>> So why do you think it won't block?
>>>>
>>>>
>>>> Is my premise wrong?
>>>>
>>>> If I read correctly, if I don't use a
>>>> DirectBuffer, there would be
>>>> even another intermediate buffer to copy data to
>>>> before giving it
>>>> to the "user" which would be useless.
>>>>
>>>> If you use a HeapByteBuffer, then there's an extra
>>>> copy from the native buffer to the Java buffer.
>>>>
>>>>
>>>>
>>>> On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>
>>>> I believe I see where you're coming from.
>>>> Please correct me if
>>>> I'm wrong.
>>>>
>>>> Your implementation is based on the premise
>>>> that a call to
>>>> ReadableByteChannel.read()
>>>> _initiates_ the operation and returns
>>>> immediately. The OS then
>>>> continues to fill
>>>> the buffer while there's a free space in the
>>>> buffer and the
>>>> channel hasn't encountered EOF.
>>>>
>>>> Is that right?
>>>>
>>>> On 25 Oct 2016, at 22:16, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>>
>>>> Thank you for your time. I'll try to
>>>> explain it. I hope I
>>>> can clear it up.
>>>> First of all, I confused the meanings of
>>>> asynchronous
>>>> and non-blocking. This implementation
>>>> uses a non-blocking
>>>> algorithm internally while providing a
>>>> blocking-like
>>>> algorithm on the surface. It is
>>>> single-threaded and not
>>>> multi-threaded where one thread fetches
>>>> data and blocks
>>>> waiting and the other accumulates it and
>>>> provides to
>>>> whichever wants it.
>>>>
>>>> Second, I had made the mistake of
>>>> going after
>>>> BufferedReader instead of going after
>>>> BufferedInputStream.
>>>> If you want me to go after
>>>> BufferedReader it's ok but I
>>>> only thought that going after
>>>> BufferedInputStream would be
>>>> more generically useful than
>>>> BufferedReader when I started
>>>> the poc.
>>>>
>>>> On to my code:
>>>> Short answers:
>>>> • The sleep(int) exists because
>>>> I don't know how
>>>> to wait until more data exists in the
>>>> buffer which is part
>>>> of read()'s contract.
>>>> • The ByteBuffer gives a buffer
>>>> that is filled by
>>>> the OS (what I believe Channels do)
>>>> instead of getting
>>>> data only by demand (what I
>>>> believe Streams do).
>>>> Full answers:
>>>> The blockingFill(boolean) method is a
>>>> method for a busy
>>>> wait for a fill which is used
>>>> exclusively by the read()
>>>> method. All other methods use the
>>>> version that does not
>>>> sleep (fill(boolean)).
>>>> blockingFill(boolean)'s existence like that is only
>>>> because the read() method must not
>>>> return unless either:
>>>>
>>>> • The stream ended.
>>>> • The next byte is ready for
>>>> reading.
>>>> Additionally, statistically, that while
>>>> loop will rarely
>>>> evaluate to true as reads are in chunks
>>>> so readPos will be
>>>> behind writePos most of the time.
>>>> I have no idea if an interrupt will ever
>>>> happen, to be
>>>> honest. The main reasons why I'm using a
>>>> sleep are that
>>>> I didn't want to hog the CPU with a
>>>> full-thread
>>>> busy wait and that I didn't find any
>>>> way of putting the
>>>> thread to sleep so that it wakes up later
>>>> when the buffer
>>>> managed by native code has more data.
>>>> The Non-blocking part is managed by the
>>>> buffer the OS
>>>> keeps filling most if not all the time.
>>>> That buffer is the
>>>> field
>>>>
>>>> ByteBuffer readBuffer
>>>> That's the gaining part against the
>>>> plain old Buffered
>>>> classes.
>>>>
>>>>
>>>> Did that make sense to you? Feel free to
>>>> ask anything else
>>>> you need.
>>>>
>>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>
>>>> I've skimmed through the code and
>>>> I'm not sure I can
>>>> see any asynchronicity
>>>> (you were pointing at the lack of it
>>>> in BufferedReader).
>>>> And the mechanics of this is very
>>>> puzzling to me, to
>>>> be honest:
>>>>     void blockingFill(boolean forced) throws IOException {
>>>>         fill(forced);
>>>>         while (readPos == writePos) {
>>>>             try {
>>>>                 Thread.sleep(100);
>>>>             } catch (InterruptedException e) {
>>>>                 // An interrupt may mean more data is available
>>>>             }
>>>>             fill(forced);
>>>>         }
>>>>     }
>>>> I thought you were suggesting that
>>>> we should utilize
>>>> the tools which OS provides
>>>> more efficiently. Instead we have
>>>> something that looks
>>>> very similarly to a
>>>> "busy loop" and... also who and when
>>>> is supposed to
>>>> interrupt Thread.sleep()?
>>>> Sorry, I'm not following. Could you
>>>> please explain how
>>>> this is supposed to work?
>>>>
>>>> On 24 Oct 2016, at 15:59, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Attached and sending!
>>>> On 24/10/2016 13:48, Pavel Rappo
>>>> wrote:
>>>>
>>>> Could you please send a new
>>>> email on this list
>>>> with the source attached as a
>>>> text file?
>>>>
>>>> On 23 Oct 2016, at
>>>> 19:14, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Here's my poc/prototype:
>>>>
>>>> http://pastebin.com/WRpYWDJF
>>>>
>>>> I've implemented the
>>>> bare minimum of the
>>>> class that follows the same contract of
>>>> BufferedReader while signaling all issues
>>>> I think it may have or
>>>> has in comments.
>>>> I also wrote some
>>>> javadoc to help guiding
>>>> through the class.
>>>> I could have used more
>>>> fields from
>>>> BufferedReader but the names were so
>>>> minimalistic that they were confusing me. I
>>>> intend to change them before sending this
>>>> to openJDK.
>>>> One of the major
>>>> problems this has is long
>>>> overflowing. It is major because it is
>>>> hidden, it will be extremely rare and it
>>>> takes a really long time to reproduce.
>>>> There are different ways of dealing with
>>>> it. From just
>>>> documenting to actually
>>>> making code that works with it.
>>>> I built a simple test
>>>> code for it to have
>>>> some ideas about
>>>> performance and correctness.
>>>>
>>>> http://pastebin.com/eh6LFgwT
>>>>
>>>> This doesn't do a
>>>> thorough test of whether it is
>>>> actually working correctly but I see no
>>>> reason for it not working correctly after
>>>> fixing the 2 bugs that test found.
>>>> I'll also leave here
>>>> some conclusions
>>>> about speed and resource consumption I found.
>>>> I made tests with
>>>> default buffer sizes,
>>>> 5000B, 15_000B and 500_000B. I noticed
>>>> that, with my hardware, with the 1 530 000
>>>> 000B file, I was getting
>>>> around:
>>>> In all buffers and fake
>>>> work: 10~15s speed
>>>> improvement ( from 90% HDD speed to 100%
>>>> HDD speed)
>>>> In all buffers and no
>>>> fake work: 1~2s
>>>> speed improvement ( from 90% HDD speed to
>>>> 100% HDD speed)
>>>> Changing the buffer size was giving
>>>> different reading speeds but both were
>>>> quite equal in how much they would change
>>>> when changing the buffer
>>>> size.
>>>> Finally, I could always confirm that I/O
>>>> was always the slowest
>>>> thing while this
>>>> code was running.
>>>> For the ones wondering
>>>> about the file
>>>> size; it is both to avoid OS cache and to
>>>> make the reading at the
>>>> main use-case
>>>> these objects are for (large streams of
>>>> bytes).
>>>> @Pavel, are you open for discussion now
>>>> ;)? Need anything else?
>>>> On 21/10/2016 19:21,
>>>> Pavel Rappo wrote:
>>>>
>>>> Just to append to my previous email.
>>>> BufferedReader wraps any Reader out there.
>>>> Not specifically FileReader. While
>>>> you're talking about the case of effective
>>>> reading from a file.
>>>> I guess there's one existing
>>>> possibility to provide exactly what
>>>> you need (as I
>>>> understand it) under this method:
>>>> /**
>>>> * Opens a file for reading,
>>>> returning a {@code BufferedReader} to
>>>> read text
>>>> * from the file in an efficient
>>>> manner...
>>>> ...
>>>> */
>>>>
>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>> It can return _anything_ as long as it
>>>> is a BufferedReader. We can do it, but it
>>>> needs to be investigated not only for
>>>> your favorite OS but for other OSes as
>>>> well. Feel free to prototype this and
>>>> we can discuss it on the list later.
>>>> Thanks,
>>>> -Pavel
>>>>
>>>> On 21 Oct 2016, at 18:56, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Pavel is right.
>>>> In reality, I was expecting such
>>>> BufferedReader to use only a
>>>> single buffer and have that Buffer
>>>> being filled asynchronously, not
>>>> in a different Thread.
>>>> Additionally, I don't have the
>>>> intention of having a larger
>>>> buffer than before unless stated
>>>> through the API (the constructor).
>>>> In my idea, internally, it is
>>>> supposed to use
>>>> java.nio.channels.AsynchronousFileChannel
>>>> or equivalent.
>>>> It does not prevent having two
>>>> buffers and I do not intend to
>>>> change BufferedReader itself. I'd
>>>> do a BufferedAsyncReader of sorts
>>>> (any name suggestion is welcome as
>>>> I'm an awful namer).
>>>> On 21/10/2016 18:38, Roger Riggs
>>>> wrote:
>>>>
>>>> Hi Pavel,
>>>> I think Brunoais is asking for a
>>>> double buffering scheme in
>>>> which the implementation of
>>>> BufferedReader fills (a second
>>>> buffer) in parallel with the
>>>> application reading from the
>>>> 1st buffer
>>>> and managing the swaps and
>>>> async reads transparently.
>>>> It would not change the API
>>>> but would change the
>>>> interactions between the
>>>> buffered reader
>>>> and the underlying stream. It
>>>> would also increase memory
>>>> requirements and processing
>>>> by introducing or using a
>>>> separate thread and the
>>>> necessary synchronization.
>>>> Though I think the formal
>>>> interface semantics could be
>>>> maintained, I have doubts
>>>> about compatibility and its
>>>> unintended consequences on
>>>> existing subclasses,
>>>> applications and libraries.
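
A rough sketch of the double-buffering scheme described above, written as a standalone helper rather than a change to BufferedReader. Everything here (the class DoubleBufferDemo, the methods readChunk() and copyWithPrefetch(), the chunk size) is hypothetical. One background thread fills the next chunk while the caller consumes the current one, and the caller waits on a Future instead of polling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DoubleBufferDemo {
    /** Reads up to chunkSize bytes (chunkSize > 0); returns an empty array at end of stream. */
    static byte[] readChunk(InputStream in, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        int n = in.read(chunk);
        return n <= 0 ? new byte[0] : Arrays.copyOf(chunk, n);
    }

    /** Copies the stream while a helper thread prefetches the next chunk. */
    static byte[] copyWithPrefetch(InputStream in, int chunkSize)
            throws IOException, InterruptedException {
        ExecutorService reader = Executors.newSingleThreadExecutor();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            Future<byte[]> pending = reader.submit(() -> readChunk(in, chunkSize));
            while (true) {
                // Blocks until the background read finishes; no sleep/poll loop.
                byte[] current = pending.get();
                if (current.length == 0) {
                    break;
                }
                // Start filling the second buffer before working on the first one.
                pending = reader.submit(() -> readChunk(in, chunkSize));
                // Stand-in for the application's work on the current chunk.
                out.write(current, 0, current.length);
            }
        } catch (ExecutionException e) {
            throw new IOException("background read failed", e.getCause());
        } finally {
            reader.shutdownNow();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copy = copyWithPrefetch(new ByteArrayInputStream(data), 1024);
        System.out.println("copied correctly: " + Arrays.equals(data, copy));
    }
}
```

The sketch also makes the costs mentioned above visible: a second chunk of memory is live at any time, and an extra thread plus a hand-off through the Future is needed for every chunk.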
>>>> $.02, Roger
>>>> On 10/21/16 1:22 PM, Pavel
>>>> Rappo wrote:
>>>>
>>>> Off the top of my head, I
>>>> would say it's not
>>>> possible to change the
>>>> design of an
>>>> _extensible_ type that has
>>>> been out there for 20 or
>>>> so years. All these I/O
>>>> streams from java.io were
>>>> designed for a simple
>>>> synchronous use case.
>>>> It's not that their design
>>>> is flawed in some way,
>>>> it's that they don't seem to
>>>> suit your needs. Have you
>>>> considered using
>>>> java.nio.channels.AsynchronousFileChannel
>>>> in your applications?
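
For reference, a small sketch of what using java.nio.channels.AsynchronousFileChannel can look like with the Future-returning read(ByteBuffer, long) overload. The class AsyncReadDemo, the method countBytes() and the temporary file are made up for this example. The read is started first and the result is collected later, so the caller is free to do other work in between.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncReadDemo {
    /** Reads the file chunk by chunk through asynchronous reads and returns its length. */
    static long countBytes(Path file, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        long position = 0;
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            while (true) {
                buffer.clear();
                // The channel keeps no current position, so each read names its offset.
                Future<Integer> pending = channel.read(buffer, position);
                // ... the caller could do unrelated work here while the read is in flight ...
                int n = pending.get(); // wait for completion; -1 means end of file
                if (n < 0) {
                    break;
                }
                position += n;
            }
        }
        return position;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input file created only for the demonstration.
        Path file = Files.createTempFile("async-read", ".bin");
        try {
            Files.write(file, new byte[123_456]);
            System.out.println("bytes read: " + countBytes(file, 8192));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The buffer must not be touched between read() and get(), which is why this sketch reuses a single buffer only after each read has completed.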
>>>> -Pavel
>>>>
>>>> On 21 Oct 2016, at
>>>> 17:08, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Any feedback on this?
>>>> I'm really interested
>>>> in implementing such
>>>> BufferedReader/BufferedStreamReader
>>>> to allow speeding up
>>>> my applications
>>>> without having to
>>>> think in an
>>>> asynchronous way or
>>>> multi-threading while
>>>> programming with it.
>>>> That's why I'm asking
>>>> this here.
>>>> On 13/10/2016 14:45,
>>>> Brunoais wrote:
>>>>
>>>> Hi,
>>>> I looked at
>>>> BufferedReader
>>>> source code for
>>>> Java 9 along with
>>>> the source code of
>>>> the
>>>> channels/streams
>>>> used. I noticed
>>>> that, like in java
>>>> 7, BufferedReader
>>>> does not use an
>>>> Async API to load
>>>> data from files,
>>>> instead, the data
>>>> loading is all
>>>> done synchronously
>>>> even when the OS
>>>> allows requesting
>>>> a file to be read
>>>> and getting a
>>>> warning later when
>>>> the file is
>>>> effectively read.
>>>> Why is
>>>> BufferedReader not
>>>> async while
>>>> providing a sync API?
>>>>
>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -- Sent from my phone
>>>>
>>>>
>>>>
>>>>
>>>> -- Sent from my phone
>>>
>>>
>>
>>
>