ListFiles performance vs file system visitor
Paulo Levi
i30817 at gmail.com
Wed Feb 18 11:15:46 PST 2009
I managed a massive improvement by refactoring the code somewhat. On the
older machine with USB 1.1 and a USB drive, it cut the Java time from
41 to 25, which is fairly awesome, and on another, more recent dual-core
machine I can no longer tell the difference.
Testing with 2,306 files and 417 folders, I just got a massive win by
removing the null check and just using list files:
    public static void getFiles(int levels, File[] sum, List<File> files,
            List<File> directories) {
        Comparator<String> orderFiles = Strings.getNaturalComparator();
        getFiles(levels, sum, files, directories, orderFiles);
    }

    private static void getFiles(int levels, File[] sum, List<File> files,
            List<File> directories, Comparator<String> comp) {
        // Directories found at this level start at this index.
        int dirIndex = directories.size();
        List<String[]> subFilesList = new ArrayList<String[]>(50);
        for (File f : sum) {
            // list() returns null when f is not a directory (or on an I/O
            // error), so one call both classifies f and fetches its children.
            String[] subFiles = f.list();
            if (subFiles == null) {
                files.add(f);
            } else {
                directories.add(f);
                Arrays.sort(subFiles, comp);
                subFilesList.add(subFiles);
            }
        }
        if (levels > 0) {
            // Descend into each directory found above, reusing its saved listing.
            for (int dirLen = directories.size(), subCounter = 0;
                    dirIndex < dirLen; dirIndex++, subCounter++) {
                File current = directories.get(dirIndex);
                String[] childs = subFilesList.get(subCounter);
                File[] children = new File[childs.length];
                createFiles(current, childs, children);
                getFiles(levels - 1, children, files, directories, comp);
            }
        }
    }

    // Turns the child names returned by list() into File objects under parent.
    private static void createFiles(File parent, String[] childStrings,
            File[] childsOut) {
        for (int i = 0; i < childStrings.length; i++) {
            childsOut[i] = new File(parent, childStrings[i]);
        }
    }
This is my JUnit test:

    @Test
    public void testGetFiles() {
        List<File> files = new ArrayList<File>();
        List<File> directories = new ArrayList<File>();
        File[] arr = {new File("e:\\LargeDir")};
        long time = System.currentTimeMillis();
        IoUtils.getFiles(5, arr, files, directories);
        System.out.println("Time indexing : "
                + (System.currentTimeMillis() - time));
    }

Output : Time indexing : 2859
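
A single currentTimeMillis measurement like the one above mixes cold-cache and warm-cache effects. As a rough sketch (the class and method names here are hypothetical, not part of IoUtils), the same walk can be timed several times in a row with System.nanoTime, so the first run can be compared against the later ones:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original code: times a task repeatedly.
class WalkTimer {
    // Runs the task `runs` times and returns each elapsed time in milliseconds.
    static long[] timeMillis(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        return elapsed;
    }
}
```

In the test, the call to IoUtils.getFiles (with fresh lists each time) would go inside the Runnable. Whether later runs come out faster depends on how much the OS caches for the drive in question.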
I don't know how to record the Windows time, but it's about the same: 2-4 s.
I guess the bottleneck is USB 1.1 ... It is also strange that this
list-based version is faster. I would have thought it was doing more work,
since list() is executed for both files and directories and the results are
saved in a list (to break the time complexity up), versus just calling
isDirectory on every file and using listFiles only on the directories.
OK, maybe not so strange now that I write it out: presumably each entry now
costs one file system call, where a directory used to cost two.
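
For comparison, here is a minimal sketch of the other variant described above: isDirectory on every entry, and listFiles only on directories. It is a reconstruction under assumptions, not the code that was actually measured. The class name is hypothetical, it sorts with File's own ordering instead of Strings.getNaturalComparator(), and it recurses depth-first, so the order of the directories list differs from the version posted earlier.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical class: sketch of the isDirectory-first traversal.
class IsDirectoryWalk {
    static void getFiles(int levels, File[] sum, List<File> files,
            List<File> directories) {
        for (File f : sum) {
            // First file system call: classify the entry.
            if (!f.isDirectory()) {
                files.add(f);
                continue;
            }
            directories.add(f);
            if (levels > 0) {
                // Second call, for directories only: fetch the children.
                File[] children = f.listFiles();
                if (children != null) { // null signals an I/O error
                    Arrays.sort(children);
                    getFiles(levels - 1, children, files, directories);
                }
            }
        }
    }
}
```

Here every directory costs two calls (isDirectory, then listFiles) and every plain file costs one, while the list()-based version makes a single call per entry.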