Request/discussion: BufferedReader reading using async API while providing sync API
Brunoais
brunoaiss at gmail.com
Fri Oct 28 06:53:34 UTC 2016
On 27/10/2016 22:45, Vitaly Davidovich wrote:
>
>
> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com
> <mailto:brunoaiss at gmail.com>> wrote:
>
> You are right. Even on Windows it does not set the flags for async
> reads. It seems that Windows itself makes the decision
> to buffer the contents based on its own heuristics.
>
> You mean nonblocking, not async, right? Two different things.
Oops, mistyped. The Windows docs seem to call it async...
>
> But... Why? Why won't it be? Why is there no API for it? How am I
> getting 100% HDD use and faster times when I fake work to delay
> getting more data and I only have a fluctuating 60-90% (always
> going up and down) when I use an InputStream?
> Is it related to how both classes cache and how frequently and how
> much each one asks for data?
>
> I really would prefer not having to read the source code because
> it takes a real long time T.T.
>
> I end up repeating myself... And wondering...
>
> Why doesn't Java provide a single-threaded non-blocking API for file
> reads on every OS that supports it? I simply cannot find that
> information no matter how much I search on Google, Bing, DuckDuckGo...
> Can any of you point me to whoever knows?
>
> https://lwn.net/Articles/612483/ for Linux. Unfortunately, the
> nonblocking file io story is complicated and messy.
In both the Windows and the Linux manuals, what they call asynchronous I/O
is non-blocking synchronous I/O from the point of view of the program that
runs on the OS.
http://man7.org/linux/man-pages/man3/aio_read.3.html
http://man7.org/linux/man-pages/man7/aio.7.html
http://man7.org/linux/man-pages/man7/sigevent.7.html
This does not block: the OS writes directly to the user buffer, nothing
runs on a different user thread, and completion is reported either through
a signal or through a function pointer used as a callback. Reading the
manual, it seems the callback can even run on the calling thread itself.
With signals, I do know it is completely non-blocking and single-threaded
(from the "user" thread's perspective). I'd like to see this in Java...
I guess NIO2's AsynchronousFileChannel is all I have for that, then.
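
To make the idea concrete, here is a minimal sketch of a blocking read() sitting on top of AsynchronousFileChannel with two buffers. ReadAheadInputStream is a hypothetical name, this is not the prototype attached earlier in the thread, and error handling is kept to the bare minimum. As far as I know, the JDK may back AsynchronousFileChannel with a thread pool on some platforms, so this only keeps the calling code single-threaded; it does not prove that the OS is using aio underneath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical sketch: a synchronous read() backed by asynchronous read-ahead. */
class ReadAheadInputStream extends InputStream {
    private final AsynchronousFileChannel channel;
    private ByteBuffer current;      // buffer the caller is draining
    private ByteBuffer next;         // buffer the pending read is filling
    private Future<Integer> pending; // result of the read in flight
    private long filePos = 0;        // file offset of the next read request
    private boolean eof = false;

    ReadAheadInputStream(Path path, int bufferSize) throws IOException {
        channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        current = ByteBuffer.allocateDirect(bufferSize);
        current.limit(0);                      // nothing to drain yet
        next = ByteBuffer.allocateDirect(bufferSize);
        pending = channel.read(next, filePos); // start reading right away
    }

    @Override
    public int read() throws IOException {
        while (!current.hasRemaining()) {
            if (!swap()) {
                return -1;
            }
        }
        return current.get() & 0xFF;
    }

    /** Waits for the read in flight, swaps the buffers and issues the next read. */
    private boolean swap() throws IOException {
        if (eof) {
            return false;
        }
        int n;
        try {
            n = pending.get(); // blocks only if the read has not completed yet
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (n < 0) {
            eof = true;
            return false;
        }
        filePos += n;
        ByteBuffer filled = next;
        filled.flip();                         // make the new data readable
        next = current;
        next.clear();
        current = filled;
        pending = channel.read(next, filePos); // keep one read in flight
        return true;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The point of the design is that the caller only ever blocks in pending.get(), and only when it drains one buffer faster than the other one is filled. There is no sleep() and no polling loop.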
> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>> I don't know about Windows specifically, but generally file
>> systems across major OS's will implement readahead in their IO
>> scheduler when they detect sequential scans.
>>
>> On Linux, you can also strace your test to confirm which syscalls
>> are emitted (you should be seeing plain read()'s there, with
>> FileInputStream and FileChannel).
>>
>> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com
>> <mailto:brunoaiss at gmail.com>> wrote:
>>
>> Thanks for the heads up.
>>
>> I'll try that later. These tests are still useful then.
>> Meanwhile, I'll end up also checking how FileChannel queries
>> the OS on windows. I'm getting 100% HDD reads... Could it be
>> that the OS reads the file ahead on its own?... Anyway, I'll
>> look into it. Thanks for the heads up.
>>
>>
>> On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>>
>>>
>>> On Thu, Oct 27, 2016 at 8:34 AM, Brunoais
>>> <brunoaiss at gmail.com
>>> <mailto:brunoaiss at gmail.com>> wrote:
>>>
>>> Oh... I see. In that case, it means something is
>>> terribly wrong. It can be my initial tests, though.
>>>
>>> I'm testing on both linux and windows and I'm getting
>>> performance gains from using the FileChannel compared to
>>> using FileInputStream... The tests also make sense based
>>> on my predictions O_O...
>>>
>>> FileInputStream requires copying native buffers holding the
>>> read data to the java byte[]. If you're using direct
>>> ByteBuffer for FileChannel, that whole memcpy is skipped.
>>> Try comparing FileChannel with HeapByteBuffer instead.
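
A rough way to run that comparison is sketched below. ChannelReadComparison, drain and timeNanos are hypothetical names, and a single System.nanoTime() measurement is only a crude indication: the OS page cache and JIT warm-up can easily dominate the numbers, so it is not a proper benchmark.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for comparing heap and direct buffers with FileChannel. */
class ChannelReadComparison {

    /** Reads the whole file through the given buffer and returns the byte count. */
    static long drain(Path path, ByteBuffer buffer) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            buffer.clear();
            while (ch.read(buffer) != -1) {
                total += buffer.position(); // bytes delivered by this read
                buffer.clear();             // reuse the same buffer
            }
        }
        return total;
    }

    /** Times one full pass over the file, in nanoseconds. */
    static long timeNanos(Path path, ByteBuffer buffer) throws IOException {
        long start = System.nanoTime();
        drain(path, buffer);
        return System.nanoTime() - start;
    }
}
```

Calling timeNanos(path, ByteBuffer.allocate(8192)) and timeNanos(path, ByteBuffer.allocateDirect(8192)) on the same large file, several times each, should give an idea of how much of the difference comes from the extra copy.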
>>>
>>>
>>> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>>
>>>>
>>>> On Thursday, October 27, 2016, Brunoais
>>>> <brunoaiss at gmail.com
>>>> <mailto:brunoaiss at gmail.com>>
>>>> wrote:
>>>>
>>>> Did you read the C code?
>>>>
>>>> I looked at the Linux code in the JDK.
>>>>
>>>> Have you got any idea how many functions Windows or
>>>> Linux (nearly all flavors) have for the read
>>>> operation towards a file?
>>>>
>>>> I do.
>>>>
>>>>
>>>> I have already done that homework myself. I may not
>>>> have read JVM's source code but I know well that
>>>> there's functions on both Windows and Linux that
>>>> provide such interface I mentioned although they
>>>> require a slightly different treatment (and
>>>> different constants).
>>>>
>>>> You should read the JDK (native) source code instead of
>>>> guessing/assuming. On Linux, it doesn't use aio
>>>> facilities for files. The kernel io scheduler may
>>>> issue readahead behind the scenes, but there's no
>>>> nonblocking file io that's at the heart of your premise.
>>>>
>>>>
>>>>
>>>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>
>>>> On Wednesday, October 26, 2016, Brunoais
>>>> <brunoaiss at gmail.com
>>>> <mailto:brunoaiss at gmail.com>> wrote:
>>>>
>>>> It is actually based on the premise that:
>>>>
>>>> 1. The first call to
>>>> ReadableByteChannel.read(ByteBuffer) sets the OS
>>>> buffer size to fill in as the same size
>>>> as ByteBuffer.
>>>>
>>>> Why do you say that? AFAICT, it issues a read
>>>> syscall and that will block if the data isn't
>>>> in page cache.
>>>>
>>>> 2. The consecutive calls to
>>>> ReadableByteChannel.read(ByteBuffer)
>>>> orders
>>>> the JVM to order the OS to execute
>>>> memcpy() to copy from its memory
>>>> to the shared memory created at
>>>> ByteBuffer instantiation (in
>>>> java 8)
>>>> using Unsafe and then for the JVM to
>>>> update the ByteBuffer fields.
>>>>
>>>> I think subsequent reads just invoke the same
>>>> read syscall, passing the current file offset
>>>> maintained by the file channel instance.
>>>>
>>>> 3. The call will not block waiting for I/O
>>>> and it won't take longer
>>>> than the JNI interface if no new data
>>>> exists. However, it will
>>>> block
>>>> waiting for the OS to execute memcpy()
>>>> to the shared memory.
>>>>
>>>> So why do you think it won't block?
>>>>
>>>>
>>>> Is my premise wrong?
>>>>
>>>> If I read correctly, if I don't use a
>>>> DirectBuffer, there would be
>>>> even another intermediate buffer to copy
>>>> data to before giving it
>>>> to the "user" which would be useless.
>>>>
>>>> If you use a HeapByteBuffer, then there's an
>>>> extra copy from the native buffer to the Java
>>>> buffer.
>>>>
>>>>
>>>>
>>>> On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>
>>>> I believe I see where you're coming from.
>>>> Please correct me if
>>>> I'm wrong.
>>>>
>>>> Your implementation is based on the
>>>> premise that a call to
>>>> ReadableByteChannel.read()
>>>> _initiates_ the operation and returns
>>>> immediately. The OS then
>>>> continues to fill
>>>> the buffer while there's a free space
>>>> in the buffer and the
>>>> channel hasn't encountered EOF.
>>>>
>>>> Is that right?
>>>>
>>>> On 25 Oct 2016, at 22:16, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>>
>>>> Thank you for your time. I'll try
>>>> to explain it. I hope I
>>>> can clear it up.
>>>> First of all, I mixed up
>>>> the meanings of asynchronous
>>>> and non-blocking. This
>>>> implementation uses a non-blocking
>>>> algorithm internally while
>>>> providing a blocking-like
>>>> algorithm on the surface. It is
>>>> single-threaded and not
>>>> multi-threaded where one thread
>>>> fetches data and blocks
>>>> waiting and the other accumulates
>>>> it and provides to
>>>> whichever wants it.
>>>>
>>>> Second, I made the mistake
>>>> of going after
>>>> BufferedReader instead of
>>>> BufferedInputStream.
>>>> If you want me to go after
>>>> BufferedReader, that's OK, but
>>>> when I started the PoC I
>>>> thought that going after
>>>> BufferedInputStream would be
>>>> more generally useful than
>>>> BufferedReader.
>>>>
>>>> On to my code:
>>>> Short answers:
>>>> • The sleep(int) exists
>>>> because I don't know how
>>>> to wait until more data exists in
>>>> the buffer which is part
>>>> of read()'s contract.
>>>> • The ByteBuffer gives a
>>>> buffer that is filled by
>>>> the OS (what I believe Channels do)
>>>> instead of getting
>>>> data only by demand (what I
>>>> believe Streams do).
>>>> Full answers:
>>>> The blockingFill(boolean) method is
>>>> a method for a busy
>>>> wait for a fill which is used
>>>> exclusively by the read()
>>>> method. All other methods use the
>>>> version that does not
>>>> sleep (fill(boolean)).
>>>> blockingFill(boolean) only exists in that form
>>>> because the read() method must not
>>>> return unless either:
>>>>
>>>> • The stream ended.
>>>> • The next byte is ready
>>>> for reading.
>>>> Additionally, statistically, that
>>>> while loop will rarely
>>>> evaluate to true as reads are in
>>>> chunks so readPos will be
>>>> behind writePos most of the time.
>>>> I have no idea if an interrupt will
>>>> ever happen, to be
>>>> honest. The main reasons I'm
>>>> using a sleep are that
>>>> I didn't want to hog the CPU with
>>>> a busy wait that uses a full thread,
>>>> and that I didn't find
>>>> any way of putting the
>>>> thread to sleep and waking it up
>>>> later when the buffer
>>>> managed by native code has more data.
>>>> The Non-blocking part is managed by
>>>> the buffer the OS
>>>> keeps filling most if not all the
>>>> time. That buffer is the
>>>> field
>>>>
>>>> ByteBuffer readBuffer
>>>> That's the gaining part against the
>>>> plain old Buffered
>>>> classes.
>>>>
>>>>
>>>> Did that make sense to you? Feel
>>>> free to ask anything else
>>>> you need.
>>>>
>>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>
>>>> I've skimmed through the code
>>>> and I'm not sure I can
>>>> see any asynchronicity
>>>> (you were pointing at the lack
>>>> of it in BufferedReader).
>>>> And the mechanics of this is
>>>> very puzzling to me, to
>>>> be honest:
>>>> void blockingFill(boolean forced) throws IOException {
>>>>     fill(forced);
>>>>     while (readPos == writePos) {
>>>>         try {
>>>>             Thread.sleep(100);
>>>>         } catch (InterruptedException e) {
>>>>             // An interrupt may mean more data is available
>>>>         }
>>>>         fill(forced);
>>>>     }
>>>> }
>>>> I thought you were suggesting
>>>> that we should utilize
>>>> the tools which OS provides
>>>> more efficiently. Instead we
>>>> have something that looks
>>>> very similar to a
>>>> "busy loop" and... also who and
>>>> when is supposed to
>>>> interrupt Thread.sleep()?
>>>> Sorry, I'm not following. Could
>>>> you please explain how
>>>> this is supposed to work?
>>>>
>>>> On 24 Oct 2016, at 15:59,
>>>> Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Attached and sending!
>>>> On 24/10/2016 13:48, Pavel
>>>> Rappo wrote:
>>>>
>>>> Could you please send a new email on this list
>>>> with the source attached as a
>>>> text file?
>>>>
>>>> On 23 Oct 2016, at 19:14, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Here's my poc/prototype:
>>>>
>>>> http://pastebin.com/WRpYWDJF
>>>>
>>>> I've implemented the bare minimum of the
>>>> class that follows the same contract of
>>>> BufferedReader while signaling all issues
>>>> I think it may have or has in comments.
>>>> I also wrote some javadoc to help guiding
>>>> through the class.
>>>> I could have used more fields from
>>>> BufferedReader but the names were so
>>>> minimalistic that they were confusing me. I
>>>> intend to change them before sending this
>>>> to openJDK.
>>>> One of the major problems this has is long
>>>> overflowing. It is major because it is
>>>> hidden, it will be extremely rare and it
>>>> takes a really long time to reproduce.
>>>> There are different ways of dealing with
>>>> it, from just documenting it to actually
>>>> writing code that handles it.
>>>> I built a simple test code for it to have
>>>> some ideas about performance and correctness.
>>>>
>>>> http://pastebin.com/eh6LFgwT
>>>>
>>>> This doesn't do a thorough test of whether it is
>>>> actually working correctly but I see no
>>>> reason for it not working correctly after
>>>> fixing the 2 bugs that test found.
>>>> I'll also leave here some conclusions
>>>> about speed and resource consumption I found.
>>>> I ran tests with default buffer sizes,
>>>> 5000B, 15_000B and 500_000B. I noticed
>>>> that, on my hardware, with the 1 530 000
>>>> 000B file, I was getting around:
>>>> In all buffers and fake work: 10~15s speed
>>>> improvement ( from 90% HDD speed to 100%
>>>> HDD speed)
>>>> In all buffers and no fake work: 1~2s
>>>> speed improvement ( from 90% HDD speed to
>>>> 100% HDD speed)
>>>> Changing the buffer size gave
>>>> different reading speeds, but both implementations
>>>> changed by about the same amount
>>>> when the buffer size changed.
>>>> Finally, I could always confirm that I/O
>>>> was always the slowest thing while this
>>>> code was running.
>>>> For those wondering about the file
>>>> size: it is both to avoid the OS cache and to
>>>> exercise the main use case
>>>> these objects are for (large streams of
>>>> bytes).
>>>> @Pavel, are you open for discussion now
>>>> ;)? Need anything else?
>>>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>
>>>> Just to append to my previous email.
>>>> BufferedReader wraps any Reader out there.
>>>> Not specifically FileReader. While
>>>> you're talking about the case of effective
>>>> reading from a file.
>>>> I guess there's one existing
>>>> possibility to provide exactly what
>>>> you need (as I
>>>> understand it) under this method:
>>>> /**
>>>>  * Opens a file for reading, returning a {@code BufferedReader} to read text
>>>>  * from the file in an efficient manner...
>>>>  ...
>>>>  */
>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>> It can return _anything_ as long as it
>>>> is a BufferedReader. We can do it, but it
>>>> needs to be investigated not only for
>>>> your favorite OS but for other OSes as
>>>> well. Feel free to prototype this and
>>>> we can discuss it on the list later.
>>>> Thanks,
>>>> -Pavel
>>>>
>>>> On 21 Oct 2016, at 18:56, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Pavel is right.
>>>> In reality, I was expecting such
>>>> BufferedReader to use only a
>>>> single buffer and have that Buffer
>>>> being filled asynchronously, not
>>>> in a different Thread.
>>>> Additionally, I don't have the
>>>> intention of having a larger
>>>> buffer than before unless stated
>>>> through the API (the constructor).
>>>> In my idea, internally, it is
>>>> supposed to use
>>>> java.nio.channels.AsynchronousFileChannel
>>>> or equivalent.
>>>> It does not prevent having two
>>>> buffers and I do not intend to
>>>> change BufferedReader itself. I'd
>>>> do a BufferedAsyncReader of sorts
>>>> (any name suggestion is welcome as
>>>> I'm an awful namer).
>>>> On 21/10/2016 18:38, Roger Riggs
>>>> wrote:
>>>>
>>>> Hi Pavel,
>>>> I think Brunoais is asking for a
>>>> double buffering scheme in
>>>> which the implementation of
>>>> BufferedReader fills (a second
>>>> buffer) in parallel with the
>>>> application reading from the
>>>> 1st buffer
>>>> and managing the swaps and
>>>> async reads transparently.
>>>> It would not change the API
>>>> but would change the
>>>> interactions between the
>>>> buffered reader
>>>> and the underlying stream. It
>>>> would also increase memory
>>>> requirements and processing
>>>> by introducing or using a
>>>> separate thread and the
>>>> necessary synchronization.
>>>> Though I think the formal
>>>> interface semantics could be
>>>> maintained, I have doubts
>>>> about compatibility and its
>>>> unintended consequences on
>>>> existing subclasses,
>>>> applications and libraries.
>>>> $.02, Roger
>>>> On 10/21/16 1:22 PM, Pavel
>>>> Rappo wrote:
>>>>
>>>> Off the top of my head, I
>>>> would say it's not
>>>> possible to change the
>>>> design of an
>>>> _extensible_ type that has
>>>> been out there for 20 or
>>>> so years. All these I/O
>>>> streams from java.io were
>>>> designed for a simple
>>>> synchronous use case.
>>>> It's not that their design
>>>> is flawed in some way,
>>>> it's that they don't seem to
>>>> suit your needs. Have you
>>>> considered using
>>>> java.nio.channels.AsynchronousFileChannel
>>>> in your applications?
>>>> -Pavel
>>>>
>>>> On 21 Oct 2016, at
>>>> 17:08, Brunoais
>>>> <brunoaiss at gmail.com>
>>>> wrote:
>>>> Any feedback on this?
>>>> I'm really interested
>>>> in implementing such
>>>> BufferedReader/BufferedStreamReader
>>>> to allow speeding up
>>>> my applications
>>>> without having to
>>>> think in an
>>>> asynchronous way or
>>>> multi-threading while
>>>> programming with it.
>>>> That's why I'm asking
>>>> this here.
>>>> On 13/10/2016 14:45,
>>>> Brunoais wrote:
>>>>
>>>> Hi,
>>>> I looked at
>>>> BufferedReader
>>>> source code for
>>>> Java 9 along with
>>>> the source code of
>>>> the
>>>> channels/streams
>>>> used. I noticed
>>>> that, like in java
>>>> 7, BufferedReader
>>>> does not use an
>>>> Async API to load
>>>> data from files,
>>>> instead, the data
>>>> loading is all
>>>> done synchronously
>>>> even when the OS
>>>> allows requesting
>>>> a file to be read
>>>> and getting a
>>>> warning later when
>>>> the file is
>>>> effectively read.
>>>> Why is
>>>> BufferedReader not
>>>> async while
>>>> providing a sync API?
>>>>
>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from my phone
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
More information about the core-libs-dev
mailing list