Request/discussion: BufferedReader reading using async API while providing sync API
Brunoais
brunoaiss at gmail.com
Fri Oct 28 08:16:46 UTC 2016
I'll try going back to a previous version I worked on, which used
Java 7's AsynchronousFileChannel, and work from there. My brief research
shows it can also work with AsynchronousFileChannel mostly without changes.
For now, 1 question:
Is Thread.sleep() an acceptable way of meeting the blocking requirement of
read()? Or do I need to use LockSupport.park() or something like that?
I'll call back here when it is done.
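
A minimal sketch of that direction, not a finished implementation. It
assumes a regular file opened with AsynchronousFileChannel and two direct
buffers. The class name ReadAheadSketch and the method sumBytes are made up
for illustration, and the byte sum only stands in for real work. The next
read is started before the current chunk is consumed, and the only place
the caller waits is Future.get(), so no Thread.sleep() polling is needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of the JDK or of the prototype.
class ReadAheadSketch {

    /** Reads the whole file and returns the sum of its unsigned byte values. */
    static long sumBytes(Path file, int bufferSize)
            throws IOException, InterruptedException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocateDirect(bufferSize);
            ByteBuffer next = ByteBuffer.allocateDirect(bufferSize);
            long position = 0;
            long sum = 0;

            // Start the first read; the call returns without waiting for data.
            Future<Integer> pending = ch.read(current, position);
            while (true) {
                int n;
                try {
                    // Blocks the caller until the outstanding read completes.
                    n = pending.get();
                } catch (ExecutionException e) {
                    throw new IOException(e.getCause());
                }
                if (n < 0) {
                    break; // end of file
                }
                position += n;

                // Request the next chunk before consuming the current one.
                pending = ch.read(next, position);

                current.flip();
                while (current.hasRemaining()) {
                    sum += current.get() & 0xFF; // stand-in for real work
                }
                current.clear();

                // Swap roles: the buffer being filled becomes the current one.
                ByteBuffer tmp = current;
                current = next;
                next = tmp;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java ReadAheadSketch <file>");
            return;
        }
        System.out.println(sumBytes(Paths.get(args[0]), 64 * 1024));
    }
}
```

Whether the channel is backed by OS-level asynchronous I/O or by an
internal thread pool depends on the platform, so this only removes the
polling loop from the caller. It does not guarantee single-threaded
operation underneath.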
On 27/10/2016 22:09, David Holmes wrote:
> You might try discussing on net-dev rather than core-libs-dev, to get
> additional historical info related to the io and nio file APIs.
>
> David
>
> On 28/10/2016 5:08 AM, Brunoais wrote:
>> You are right. Even on Windows it does not set the flags for async
>> reads. It seems that Windows itself makes the decision to
>> buffer the contents based on its own heuristics.
>>
>> But... Why? Why won't it be? Why is there no API for it? How am I
>> getting 100% HDD use and faster times when I fake work to delay getting
>> more data and I only have a fluctuating 60-90% (always going up and
>> down) when I use an InputStream?
>> Is it related to how both classes cache and how frequently and how much
>> each one asks for data?
>>
>> I really would prefer not having to read the source code because it
>> takes a real long time T.T.
>>
>> I end up restating... and wondering...
>>
>> Why doesn't Java provide a single-threaded non-blocking API for file reads
>> on every OS that supports it? I simply cannot find that information no
>> matter how much I search on Google, Bing, DuckDuckGo... Can any of you
>> point me to whoever knows?
>>
>> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>>> I don't know about Windows specifically, but generally file systems
>>> across major OS's will implement readahead in their IO scheduler when
>>> they detect sequential scans.
>>>
>>> On Linux, you can also strace your test to confirm which syscalls are
>>> emitted (you should be seeing plain read()'s there, with
>>> FileInputStream and FileChannel).
>>>
>>> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com
>>> <mailto:brunoaiss at gmail.com>> wrote:
>>>
>>> Thanks for the heads up.
>>>
>>> I'll try that later. These tests are still useful then. Meanwhile,
>>> I'll end up also checking how FileChannel queries the OS on
>>> windows. I'm getting 100% HDD reads... Could it be that the OS
>>> reads the file ahead on its own?... Anyway, I'll look into it.
>>> Thanks for the heads up.
>>>
>>>
>>> On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>>>
>>>>
>>>> On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoaiss at gmail.com
>>>> <mailto:brunoaiss at gmail.com>> wrote:
>>>>
>>>> Oh... I see. In that case, it means something is terribly
>>>> wrong. It can be my initial tests, though.
>>>>
>>>> I'm testing on both linux and windows and I'm getting
>>>> performance gains from using the FileChannel compared to
>>>> using FileInputStream... The tests also make sense based on
>>>> my predictions O_O...
>>>>
>>>> FileInputStream requires copying native buffers holding the read
>>>> data to the java byte[]. If you're using direct ByteBuffer for
>>>> FileChannel, that whole memcpy is skipped. Try comparing
>>>> FileChannel with HeapByteBuffer instead.
>>>>
>>>>
>>>> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>>>
>>>>>
>>>>> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com
>>>>> <mailto:brunoaiss at gmail.com>> wrote:
>>>>>
>>>>> Did you read the C code?
>>>>>
>>>>> I looked at the Linux code in the JDK.
>>>>>
>>>>> Have you got any idea how many functions Windows or
>>>>> Linux (nearly all flavors) have for the read operation
>>>>> towards a file?
>>>>>
>>>>> I do.
>>>>>
>>>>>
>>>>> I have already done that homework myself. I may not have
>>>>> read JVM's source code but I know well that there's
>>>>> functions on both Windows and Linux that provide such
>>>>> interface I mentioned although they require a slightly
>>>>> different treatment (and different constants).
>>>>>
>>>>> You should read the JDK (native) source code instead of
>>>>> guessing/assuming. On Linux, it doesn't use aio facilities
>>>>> for files. The kernel io scheduler may issue readahead
>>>>> behind the scenes, but there's no nonblocking file io that's
>>>>> at the heart of your premise.
>>>>>
>>>>>
>>>>>
>>>>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, October 26, 2016, Brunoais
>>>>> <brunoaiss at gmail.com <mailto:brunoaiss at gmail.com>>
>>>>> wrote:
>>>>>
>>>>> It is actually based on the premise that:
>>>>>
>>>>> 1. The first call to
>>>>> ReadableByteChannel.read(ByteBuffer) sets the OS
>>>>> buffer size to fill in as the same size as
>>>>> ByteBuffer.
>>>>>
>>>>> Why do you say that? AFAICT, it issues a read
>>>>> syscall and that will block if the data isn't in
>>>>> page cache.
>>>>>
>>>>> 2. The consecutive calls to
>>>>> ReadableByteChannel.read(ByteBuffer) order
>>>>> the JVM to order the OS to execute memcpy()
>>>>> to copy from its memory to the shared memory
>>>>> created at ByteBuffer instantiation (in Java 8)
>>>>> using Unsafe, and then for the JVM to update
>>>>> the ByteBuffer fields.
>>>>>
>>>>> I think subsequent reads just invoke the same read
>>>>> syscall, passing the current file offset maintained
>>>>> by the file channel instance.
>>>>>
>>>>> 3. The call will not block waiting for I/O and
>>>>> it won't take longer
>>>>> than the JNI interface if no new data exists.
>>>>> However, it will
>>>>> block
>>>>> waiting for the OS to execute memcpy() to the
>>>>> shared memory.
>>>>>
>>>>> So why do you think it won't block?
>>>>>
>>>>>
>>>>> Is my premise wrong?
>>>>>
>>>>> If I read correctly, if I don't use a
>>>>> DirectBuffer, there would be
>>>>> even another intermediate buffer to copy data to
>>>>> before giving it
>>>>> to the "user" which would be useless.
>>>>>
>>>>> If you use a HeapByteBuffer, then there's an extra
>>>>> copy from the native buffer to the Java buffer.
>>>>>
>>>>>
>>>>>
>>>>> On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>>
>>>>> I believe I see where you're coming from.
>>>>> Please correct me if
>>>>> I'm wrong.
>>>>>
>>>>> Your implementation is based on the premise
>>>>> that a call to
>>>>> ReadableByteChannel.read()
>>>>> _initiates_ the operation and returns
>>>>> immediately. The OS then
>>>>> continues to fill
>>>>> the buffer while there's a free space in the
>>>>> buffer and the
>>>>> channel hasn't encountered EOF.
>>>>>
>>>>> Is that right?
>>>>>
>>>>> On 25 Oct 2016, at 22:16, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thank you for your time. I'll try to
>>>>> explain it. I hope I
>>>>> can clear it up.
>>>>> First off, I mixed up the meanings of
>>>>> asynchronous and non-blocking. This implementation
>>>>> uses a non-blocking algorithm internally while
>>>>> providing a blocking-like algorithm on the surface.
>>>>> It is single-threaded, not multi-threaded with
>>>>> one thread fetching data and blocking while another
>>>>> accumulates it and provides it to whoever wants it.
>>>>>
>>>>> Second, I made the mistake of going after
>>>>> BufferedReader instead of BufferedInputStream.
>>>>> If you want me to go after BufferedReader, that's OK,
>>>>> but when I started the PoC I thought that going after
>>>>> BufferedInputStream would be more generically useful
>>>>> than BufferedReader.
>>>>>
>>>>> On to my code:
>>>>> Short answers:
>>>>> • The sleep(int) exists because
>>>>> I don't know how
>>>>> to wait until more data exists in the
>>>>> buffer which is part
>>>>> of read()'s contract.
>>>>> • The ByteBuffer gives a buffer
>>>>> that is filled by
>>>>> the OS (what I believe Channels do)
>>>>> instead of getting
>>>>> data only by demand (what I
>>>>> believe Streams do).
>>>>> Full answers:
>>>>> The blockingFill(boolean) method performs a busy
>>>>> wait for a fill and is used exclusively by the read()
>>>>> method. All other methods use the version that does not
>>>>> sleep (fill(boolean)).
>>>>> blockingFill(boolean) exists in that form only
>>>>> because the read() method must not
>>>>> return unless either:
>>>>>
>>>>> • The stream ended.
>>>>> • The next byte is ready for
>>>>> reading.
>>>>> Additionally, statistically, that while
>>>>> loop will rarely
>>>>> evaluate to true as reads are in chunks
>>>>> so readPos will be
>>>>> behind writePos most of the time.
>>>>> I have no idea if an interrupt will ever happen, to be
>>>>> honest. The main reasons I'm using a sleep are that
>>>>> I didn't want to hog the CPU with a full-thread
>>>>> busy wait and that I didn't find any way of putting the
>>>>> thread to sleep so that it wakes up later, when the buffer
>>>>> managed by native code has more data.
>>>>> The non-blocking part is managed by the buffer the OS
>>>>> keeps filling most if not all of the time. That buffer is the
>>>>> field
>>>>>
>>>>> ByteBuffer readBuffer
>>>>>
>>>>> That's the gain over the plain old Buffered classes.
>>>>>
>>>>>
>>>>> Did that make sense to you? Feel free to
>>>>> ask anything else
>>>>> you need.
>>>>>
>>>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>>
>>>>> I've skimmed through the code and
>>>>> I'm not sure I can
>>>>> see any asynchronicity
>>>>> (you were pointing at the lack of it
>>>>> in BufferedReader).
>>>>> And the mechanics of this is very
>>>>> puzzling to me, to
>>>>> be honest:
>>>>>     void blockingFill(boolean forced) throws IOException {
>>>>>         fill(forced);
>>>>>         while (readPos == writePos) {
>>>>>             try {
>>>>>                 Thread.sleep(100);
>>>>>             } catch (InterruptedException e) {
>>>>>                 // An interrupt may mean more data is available
>>>>>             }
>>>>>             fill(forced);
>>>>>         }
>>>>>     }
>>>>> I thought you were suggesting that
>>>>> we should utilize
>>>>> the tools which OS provides
>>>>> more efficiently. Instead we have something that looks
>>>>> very similar to a "busy loop" and... also who is
>>>>> supposed to interrupt Thread.sleep(), and when?
>>>>> Sorry, I'm not following. Could you
>>>>> please explain how
>>>>> this is supposed to work?
>>>>>
>>>>> On 24 Oct 2016, at 15:59,
>>>>> Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Attached and sending!
>>>>> On 24/10/2016 13:48, Pavel Rappo
>>>>> wrote:
>>>>>
>>>>> Could you please send a new
>>>>> email on this list
>>>>> with the source attached as a
>>>>> text file?
>>>>>
>>>>> On 23 Oct 2016, at
>>>>> 19:14, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Here's my poc/prototype:
>>>>>
>>>>> http://pastebin.com/WRpYWDJF
>>>>>
>>>>> I've implemented the
>>>>> bare minimum of the
>>>>> class that follows the same contract of
>>>>> BufferedReader while signaling all issues
>>>>> I think it may have or
>>>>> has in comments.
>>>>> I also wrote some
>>>>> javadoc to help guiding
>>>>> through the class.
>>>>> I could have used more fields from
>>>>> BufferedReader but the names were so
>>>>> minimalistic that they were confusing me. I
>>>>> intend to change them before sending this
>>>>> to OpenJDK.
>>>>> One of the major
>>>>> problems this has is long
>>>>> overflowing. It is major because it is
>>>>> hidden, it will be extremely rare and it
>>>>> takes a really long time to reproduce.
>>>>> There are different ways of dealing with
>>>>> it. From just
>>>>> documenting to actually
>>>>> making code that works with it.
>>>>> I built a simple test
>>>>> code for it to have
>>>>> some ideas about
>>>>> performance and correctness.
>>>>>
>>>>> http://pastebin.com/eh6LFgwT
>>>>>
>>>>> This doesn't thoroughly test whether it is
>>>>> actually working correctly, but I see no
>>>>> reason for it not to work correctly after
>>>>> fixing the 2 bugs that test found.
>>>>> I'll also leave here
>>>>> some conclusions
>>>>> about speed and resource consumption I found.
>>>>> I made tests with default buffer sizes,
>>>>> 5000B, 15_000B and 500_000B. I noticed
>>>>> that, with my hardware, with the 1 530 000 000B
>>>>> file, I was getting around:
>>>>> In all buffers and fake work: 10~15s speed
>>>>> improvement (from 90% HDD speed to 100%
>>>>> HDD speed)
>>>>> In all buffers and no fake work: 1~2s
>>>>> speed improvement (from 90% HDD speed to
>>>>> 100% HDD speed)
>>>>> Changing the buffer size was giving
>>>>> different reading speeds but both were
>>>>> quite equal in how much they would change
>>>>> when changing the buffer
>>>>> size.
>>>>> Finally, I could always confirm that I/O
>>>>> was always the slowest
>>>>> thing while this
>>>>> code was running.
>>>>> For the ones wondering
>>>>> about the file
>>>>> size; it is both to avoid OS cache and to
>>>>> make the reading at the
>>>>> main use-case
>>>>> these objects are for (large streams of
>>>>> bytes).
>>>>> @Pavel, are you open for discussion now
>>>>> ;)? Need anything else?
>>>>> On 21/10/2016 19:21,
>>>>> Pavel Rappo wrote:
>>>>>
>>>>> Just to append to my previous email.
>>>>> BufferedReader wraps any Reader out there.
>>>>> Not specifically FileReader. While
>>>>> you're talking about the case of effective
>>>>> reading from a file.
>>>>> I guess there's one existing
>>>>> possibility to provide exactly what
>>>>> you need (as I
>>>>> understand it) under this method:
>>>>> /**
>>>>> * Opens a file for reading,
>>>>> returning a {@code BufferedReader} to
>>>>> read text
>>>>> * from the file in an efficient
>>>>> manner...
>>>>> ...
>>>>> */
>>>>>
>>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>> It can return _anything_ as long as it
>>>>> is a BufferedReader. We can do it, but it
>>>>> needs to be investigated not only for
>>>>> your favorite OS but for other OSes as
>>>>> well. Feel free to prototype this and
>>>>> we can discuss it on the list later.
>>>>> Thanks,
>>>>> -Pavel
>>>>>
>>>>> On 21 Oct 2016, at 18:56, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Pavel is right.
>>>>> In reality, I was expecting such
>>>>> BufferedReader to use only a
>>>>> single buffer and have that Buffer
>>>>> being filled asynchronously, not
>>>>> in a different Thread.
>>>>> Additionally, I don't have the
>>>>> intention of having a larger
>>>>> buffer than before unless stated
>>>>> through the API (the constructor).
>>>>> In my idea, internally, it is
>>>>> supposed to use
>>>>> java.nio.channels.AsynchronousFileChannel
>>>>> or equivalent.
>>>>> It does not prevent having two
>>>>> buffers and I do not intend to
>>>>> change BufferedReader itself. I'd
>>>>> do a BufferedAsyncReader of sorts
>>>>> (any name suggestion is welcome as
>>>>> I'm an awful namer).
>>>>> On 21/10/2016 18:38, Roger Riggs
>>>>> wrote:
>>>>>
>>>>> Hi Pavel,
>>>>> I think Brunoais is asking for a
>>>>> double buffering scheme in
>>>>> which the implementation of
>>>>> BufferedReader fills (a second
>>>>> buffer) in parallel with the
>>>>> application reading from the
>>>>> 1st buffer
>>>>> and managing the swaps and
>>>>> async reads transparently.
>>>>> It would not change the API
>>>>> but would change the
>>>>> interactions between the
>>>>> buffered reader
>>>>> and the underlying stream. It
>>>>> would also increase memory
>>>>> requirements and processing
>>>>> by introducing or using a
>>>>> separate thread and the
>>>>> necessary synchronization.
>>>>> Though I think the formal
>>>>> interface semantics could be
>>>>> maintained, I have doubts
>>>>> about compatibility and its
>>>>> unintended consequences on
>>>>> existing subclasses,
>>>>> applications and libraries.
>>>>> $.02, Roger
>>>>> On 10/21/16 1:22 PM, Pavel
>>>>> Rappo wrote:
>>>>>
>>>>> Off the top of my head, I
>>>>> would say it's not
>>>>> possible to change the
>>>>> design of an
>>>>> _extensible_ type that has
>>>>> been out there for 20 or
>>>>> so years. All these I/O
>>>>> streams from java.io were
>>>>> designed for the simple
>>>>> synchronous use case.
>>>>> It's not that their design
>>>>> is flawed in some way,
>>>>> it's that they don't seem to
>>>>> suit your needs. Have you
>>>>> considered using
>>>>> java.nio.channels.AsynchronousFileChannel
>>>>> in your applications?
>>>>> -Pavel
>>>>>
>>>>> On 21 Oct 2016, at
>>>>> 17:08, Brunoais
>>>>> <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Any feedback on this?
>>>>> I'm really interested
>>>>> in implementing such
>>>>> BufferedReader/BufferedStreamReader
>>>>> to allow speeding up
>>>>> my applications
>>>>> without having to
>>>>> think in an
>>>>> asynchronous way or
>>>>> multi-threading while
>>>>> programming with it.
>>>>> That's why I'm asking
>>>>> this here.
>>>>> On 13/10/2016 14:45,
>>>>> Brunoais wrote:
>>>>>
>>>>> Hi,
>>>>> I looked at
>>>>> BufferedReader
>>>>> source code for
>>>>> Java 9 along with
>>>>> the source code of
>>>>> the
>>>>> channels/streams
>>>>> used. I noticed
>>>>> that, like in java
>>>>> 7, BufferedReader
>>>>> does not use an
>>>>> Async API to load
>>>>> data from files,
>>>>> instead, the data
>>>>> loading is all
>>>>> done synchronously
>>>>> even when the OS
>>>>> allows requesting
>>>>> a file to be read
>>>>> and getting a
>>>>> warning later when
>>>>> the file is
>>>>> effectively read.
>>>>> Why is
>>>>> BufferedReader not
>>>>> async while
>>>>> providing a sync API?
>>>>>
>>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- Sent from my phone
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- Sent from my phone
>>>>
>>>>
>>>
>>>
>>
>
More information about the core-libs-dev
mailing list