NIO performance
Alan Bateman
Alan.Bateman at Sun.COM
Sat Sep 5 07:14:35 PDT 2009
John Hendrikx wrote:
> I noticed that using a FileVisitor to iterate over a single directory
> is far faster than doing a DirectoryStream + readBasicFileAttributes
> combination.
>
> Results for a 9000-file directory:
>
> - Simple DirectoryStream iteration: 300 ms
> - DirectoryStream iteration + readBasicFileAttributes for each entry:
>   9000 ms
> - FileVisitor that skips all subdirectories (but does return
>   BasicFileAttributes for each entry): 480 ms
>
> Using a FileVisitor to iterate over a single directory seems somewhat
> clumsy, so I looked at the implementation to see if there was a better
> way, but I found that it is basically cheating (the Path objects it
> visits seem to be instances of BasicFileAttributesHolder, and reusing
> the attributes they hold is obviously a lot faster than doing your own
> Attributes.readBasicFileAttributes(path) call).
>
> I guess what I'm saying is that I didn't really expect that -- I would
> have expected that for reading a single directory (+ attributes) there
> would be a simple way to do it like DirectoryStream currently
> provides. Currently, I think that many would fall for the trap of
> iterating over a DirectoryStream and calling readBasicFileAttributes
> on each entry which is very slow. Of course now that I figured this
> out it is no real problem to just wrap a FileVisitor in my own class
> to read a single directory.
>
> I hope this feedback is useful.
Files.walkFileTree is essentially an internal iterator built on an
external iterator (DirectoryStream). So for the maxDepth == 1 case it is
reasonable to expect the performance to be the same as using a
DirectoryStream to iterate over all entries in the directory and calling
Attributes.readBasicFileAttributes to read the attributes of each file.
The anomaly you are seeing is Windows-only. Elsewhere (on Solaris and
Linux at least) the performance will be as you would expect. For
example, I did a quick test on Solaris with a directory of 9000 files:
the simple iteration took 22ms, the iteration + reading the attributes
took 88ms, and walkFileTree with maxDepth == 1 took 83ms.

On Windows, the anomaly (that is, why Files.walkFileTree is so much
faster) arises because the attributes are obtained during the directory
traversal, so the implementation can avoid re-reading them. If it
re-read the attributes for each file then it would take about the same
time as calling Attributes.readBasicFileAttributes for each file in the
directory, an operation that is expensive on Windows. One thing to note
is that the difference isn't as obvious with NTFS. For example, I
repeated your test on NTFS with a directory of 9000 files: the simple
iteration took 21ms, the iteration + reading the attributes took 237ms,
and Files.walkFileTree took 20ms. With FAT32, or when the volume is
remote, the difference is very obvious; I would guess this is what you
are testing on.
This issue does raise the question of whether we need a method that
returns a DirectoryStream whose elements are pairs consisting of the
entry and its attributes. It has come up once or twice. From a
performance point of view it would help on Windows, and perhaps with
some custom file systems. The other potential justification is
convenience, in that the basic attributes will often be required when
iterating over a directory. It's worth looking at.
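To make the proposal concrete, here is a hypothetical sketch of what such a stream might look like from the caller's side. AttributedDirectoryStream, Entry, and open are invented names, not part of any JDK API, and the code assumes the final JDK 7+ Files API (Java 8+ for the default Iterator.remove). Because it is a user-level wrapper it still calls Files.readAttributes once per entry, so it illustrates only the convenience argument; the Windows speed-up would need the provider itself to hand over the attributes it already read during traversal.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Iterator;

// Hypothetical "directory stream of (entry, attributes) pairs".
public final class AttributedDirectoryStream
        implements DirectoryStream<AttributedDirectoryStream.Entry> {

    // One element of the stream: a directory entry and its basic attributes.
    public static final class Entry {
        private final Path path;
        private final BasicFileAttributes attrs;

        Entry(Path path, BasicFileAttributes attrs) {
            this.path = path;
            this.attrs = attrs;
        }

        public Path path() { return path; }

        public BasicFileAttributes attributes() { return attrs; }
    }

    private final DirectoryStream<Path> delegate;

    private AttributedDirectoryStream(DirectoryStream<Path> delegate) {
        this.delegate = delegate;
    }

    public static AttributedDirectoryStream open(Path dir) throws IOException {
        return new AttributedDirectoryStream(Files.newDirectoryStream(dir));
    }

    @Override
    public Iterator<Entry> iterator() {
        Iterator<Path> it = delegate.iterator();
        return new Iterator<Entry>() {
            @Override
            public boolean hasNext() { return it.hasNext(); }

            @Override
            public Entry next() {
                Path p = it.next();
                try {
                    // Still one attribute read per entry: a wrapper cannot reuse
                    // attributes the provider may have obtained while listing.
                    return new Entry(p, Files.readAttributes(
                            p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS));
                } catch (IOException e) {
                    // Follow DirectoryStream's convention for I/O errors during iteration.
                    throw new DirectoryIteratorException(e);
                }
            }
        };
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

A caller would use it like any other DirectoryStream, for example `try (AttributedDirectoryStream s = AttributedDirectoryStream.open(dir)) { for (AttributedDirectoryStream.Entry e : s) { ... e.attributes().size() ... } }`.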
-Alan.
More information about the nio-dev mailing list