RFR 8220792: JavacFileManager.list() performance
Jonathan Gibbons
jonathan.gibbons at oracle.com
Tue Apr 2 22:29:16 UTC 2019
OK, thanks for looking into this; I also looked at the code and agree
with your assessment.
One of these days we'll have some really good unit tests for
JavacFileManager ;-) Until then, this looks good.
-- Jon
On 03/31/2019 05:04 AM, Ron Shapiro wrote:
> My initial inclination was to say that you were right, but that this
> would have been the case even before my change, since already
> ArchiveContainer did nothing to index parent directories of the
> directories that actually have files in the archive. But then I did
> some debugging to see what actually happens, and it seems that the
> archive filesystem visits all parent directories, and thus those are
> also indexed. Here's one example from running jtreg:
>
> indexing archivePath: .../asmtools.jar
> preVisit: /
> preVisit: /org
> preVisit: /org/openjdk
> preVisit: /org/openjdk/asmtools
> preVisit: /org/openjdk/asmtools/util
> preVisit: /org/openjdk/asmtools/jdis
> preVisit: /org/openjdk/asmtools/jdec
> preVisit: /org/openjdk/asmtools/jcoder
> preVisit: /org/openjdk/asmtools/jcdec
> preVisit: /org/openjdk/asmtools/jasm
> preVisit: /org/openjdk/asmtools/common
> preVisit: /org/openjdk/asmtools/asmutils
> preVisit: /META-INF
>
> I checked the code for ZipFileSystem, and it hard-codes the root
> directory as "/" for all zip files no matter the contents.
>
> So for the example above, list("org", recurse=true) should still recur
> properly.
>
> On Fri, Mar 29, 2019 at 9:21 PM Jonathan Gibbons
> <jonathan.gibbons at oracle.com <mailto:jonathan.gibbons at oracle.com>> wrote:
>
> Ron,
>
> Although I've not checked in deep detail, I think there may be an
> inherent problem with the caching you are doing, as regards to
> recursive list.
>
> The list method takes a "recurse" parameter, which can be used to
> list the contents of a package /and its subpackages/.
>
> If you have two containers, one containing package "p", and
> another containing package "p.q", if you try a recursive list of
> "p", I believe your code may miss the contents of "p.q" in the
> second container. If nothing else, this is an important case to be
> checked and tested. Although this feature is not typically used in
> javac itself, it is used in javadoc, which also uses the same file
> manager.
>
> It may be that the cache needs to be smart enough to handle the
> case when an archive has subdirectories of a package, even if it
> does not have any direct contents of the package.
>
> -- Jon
>
> On 3/29/19 7:10 AM, Ron Shapiro wrote:
>> - Added a comment explaining the ordering. LMK if you have a
>> better way of explaining it, as I think you grok it better than I do
>> - Thanks for the pointer to javacFileManagerOptions. I updated to
>> use that.
>> Both are in this updated webrev:
>> http://cr.openjdk.java.net/~ronsh/8220792/webrev.02/
>> <http://cr.openjdk.java.net/%7Eronsh/8220792/webrev.02/>
>>
>> - Regarding indexing: I don't think today anything would detect
>> updates to a jar file if a new RelativeDirectory was added.
>> Perhaps that's intentional, because we don't care/don't even want
>> to try to support someone updating their jarfile in the middle of
>> a compilation. I defer to others about indexing other containers
>> - it looks like the JRTImageContainer depends on the JRTIndex
>> class, which does a lazy computation of the index, so it looks
>> like it would be some additional work to add that to the index.
>> But perhaps it's not even worth it - IIUC, there should only be
>> one JRTImageContainer, and probably a limited amount of
>> DirectoryContainers - the performance issue that I was observing
>> was that with many jars, each needed to do some work for every
>> call to list. We've now reduced O(O(JRTImageContainer = 1) +
>> O(DirectoryContainer = likely to be small) + O(ArchiveContainer =
>> possibly large)) per list() call to O(O(JRTImageContainer = 1) +
>> O(DirectoryContainer = likely to be small) + O(overlapping
>> ArchiveContainers, which is hopefully small))
>>
>>
>>
>> On Wed, Mar 27, 2019 at 1:47 PM Jonathan Gibbons
>> <jonathan.gibbons at oracle.com
>> <mailto:jonathan.gibbons at oracle.com>> wrote:
>>
>>
>> On 3/27/19 9:06 AM, Ron Shapiro wrote:
>> >
>> > > It's not clear to me why you maintain the list of
>> PathAndContainer
>> > as a sorted list, and hence why you need to use
>> java.util.List instead
>> > of javac's List.
>> > There are some tests that ensure that the classpath that
>> javac sees is
>> > equivalent to the classpath the JVM sees vis-a-vis
>> ordering. Without
>> > sorting, javac was seeing a different order, and I presume
>> that is
>> > because the JRTImageContainer and DirectoryContainers could
>> appear
>> > anywhere. Sorting solved that issue.
>>
>> OK, I think I understand the issue and why sorting fixes it.
>>
>> While some tests may check that "the classpath that javac
>> sees is
>> equivalent to the classpath the JVM sees vis-a-vis ordering",
>> that's not
>> the primary issue. The primary issue is that the new cache
>> must preserve
>> the order in which containers are searched on the classpath
>> (i.e just
>> within javac, separate from any part set by the JVM). While
>> this is not
>> so important when looking up classes in "well-behaved"
>> packages, it is
>> important when handling split packages and resources, where
>> the contents
>> in one container may hide the contents in another container.
>>
>> I think this is subtle enough that it is worth a comment in
>> the code,
>> explaining that it is important to preserve the search order
>> used in the
>> uncached Location path.
>>
>> -- Jon
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20190402/d42e8b18/attachment.html>
More information about the compiler-dev
mailing list