--module-source-path option implementation in javac

Jonathan Gibbons jonathan.gibbons at oracle.com
Thu Oct 20 23:22:16 UTC 2016


Currently, the compiler expects sources on a source path (i.e. 
--source-path) to be organized in a specific hierarchy corresponding to 
packages and classes.   For the module source path, this is extended to 
include the enclosing module.  Although we experimented with requiring 
the module directory to be immediately enclosing the package directory, 
this proved to be too onerous in practice, and we weakened the 
requirement to be that the sources for a module should be in one or more 
directories below a directory named for the module.  The use case is for 
complex projects that have variant forms of the source code for a 
module, such as baseline shared code, and OS-specific variants, such as 
you find in OpenJDK itself.

This means that each element on the module source path is composed 
of 3 parts:
     1. one or more paths identifying the directories containing 
module directories
     2. a directory named for the module
     3. one or more paths identifying the subdirectories containing 
roots of the package hierarchies for the module's classes.

Thus, the '*' you see in the module source path should not be construed 
as a wildcard, so much as a token indicating where in the overall path 
the module name is expected to appear.

The 3rd component may be empty, in which case you can drop the second 
component as well; it will be inferred to be at the end of the given paths.

Here are some examples of how this can be used.

If your project has a couple of modules m1, m2, the simplest 
organization is to have a src/ directory, and put the source for m1 in 
src/m1/ and the source for m2 in src/m2/.  If you do that, your module 
source path could be like
     --module-source-path /Users/Me/MyProject/src
The compiler will be able to find everything it needs in this simple 
case under the src/ directory.  If it needs to find class p1.C1 in m1 it 
can look in /Users/Me/MyProject/src/m1/p1/C1.java, etc.
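
The lookup in the simple case can be sketched as follows. This is only an 
illustration of the directory convention, not javac's own code; the class 
name SimpleLayout and the method sourceFileFor are made up for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SimpleLayout {
    /**
     * Maps a module name and a fully qualified class name to the source
     * file expected under a simple module source path element, i.e.
     * <root>/<module>/<package dirs>/<Class>.java.
     */
    static Path sourceFileFor(Path moduleSourceRoot, String moduleName, String className) {
        // p1.C1 -> p1/C1.java
        String relative = className.replace('.', '/') + ".java";
        return moduleSourceRoot.resolve(moduleName).resolve(relative);
    }

    public static void main(String[] args) {
        Path src = Paths.get("/Users/Me/MyProject/src");
        System.out.println(sourceFileFor(src, "m1", "p1.C1"));
    }
}
```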

Now suppose the project is a bit more complicated, and each module has 
some OS-specific code and some OS-independent code.  You might want to 
put the Linux code in src/m1/linux, the Windows code in src/m1/windows, 
and the shared code in src/m1/shared, and ditto for m2.   You can tell 
the compiler about that using a module source path like one of these:
     --module-source-path /Users/Me/MyProject/src/*/{linux,shared}
     --module-source-path /Users/Me/MyProject/src/*/{windows,shared}

Now, maybe the project gets even bigger, and you start generating some 
of the code for each module.  You generate the code for m1 in 
build/gensrc/m1, and the code for m2 in build/gensrc/m2.   You can 
describe that too:
     --module-source-path /Users/Me/MyProject/src/*/{linux,shared}:/Users/Me/MyProject/build/gensrc/*

At this point, it is important to realize that * is more than a 
wildcard. It stands for the same module name in all the places it 
appears.    In other words, the source for m1 will be found in
/Users/Me/MyProject/src/m1/{linux,shared}:/Users/Me/MyProject/build/gensrc/m1
and the source for m2 will likewise be found in
/Users/Me/MyProject/src/m2/{linux,shared}:/Users/Me/MyProject/build/gensrc/m2
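
To make the substitution concrete, here is a minimal sketch that expands a 
module source path for one module name. It is not the parsing code in 
javac; it assumes ':' as the separator (as in the examples above), 
un-nested {a,b} alternatives, and that every element already contains a 
'*'. The names ModuleSourcePathSketch, expandBraces and sourceDirsFor are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ModuleSourcePathSketch {
    /** Expands un-nested {a,b,...} groups, left to right. */
    static List<String> expandBraces(String s) {
        int open = s.indexOf('{');
        if (open < 0) {
            return List.of(s);
        }
        int close = s.indexOf('}', open);
        if (close < 0) {
            throw new IllegalArgumentException("unmatched '{' in " + s);
        }
        String head = s.substring(0, open);
        String tail = s.substring(close + 1);
        List<String> result = new ArrayList<>();
        for (String alternative : s.substring(open + 1, close).split(",")) {
            for (String rest : expandBraces(tail)) {
                result.add(head + alternative + rest);
            }
        }
        return result;
    }

    /** Lists the source directories for one module; '*' stands for its name. */
    static List<String> sourceDirsFor(String moduleSourcePath, String moduleName) {
        List<String> dirs = new ArrayList<>();
        for (String element : moduleSourcePath.split(":")) {
            for (String expanded : expandBraces(element)) {
                // The same module name is substituted wherever '*' appears.
                dirs.add(expanded.replace("*", moduleName));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        String msp = "/Users/Me/MyProject/src/*/{linux,shared}"
                + ":/Users/Me/MyProject/build/gensrc/*";
        sourceDirsFor(msp, "m1").forEach(System.out::println);
        sourceDirsFor(msp, "m2").forEach(System.out::println);
    }
}
```

For m1 this yields the three directories src/m1/linux, src/m1/shared and 
build/gensrc/m1 under /Users/Me/MyProject, in that order.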

Yes, this is complicated, but so is the use case.   I go back to saying 
that the simple case is simple:   if you arrange the code in your 
modules such that you put the code for a module in an enclosing 
directory named for the module, the module source path becomes more like 
a simple path, as in
     --module-source-path /Users/Me/MyProject/src
or if it is in multiple projects, use
     --module-source-path /Users/Me/MyProject/src:/Users/Me/MyOtherProject/src

The requirement that the source must be in/under a directory named for 
the module is a natural extension of the existing naming conventions for 
the directories and files that contain packages and classes.


Additional responses inline.



On 10/18/2016 12:09 AM, Eugene Zhuravlev wrote:
> Hi dev. list members,
>
> We at JetBrains are working on jigsaw-related javac features support 
> in IntelliJ IDEA. Namely, the --module-source-path parameter.
> This option is important when multiple modules are compiled at the 
> same time. While the IDE compiles modules one-by-one, there are 
> certain situations where we have to use multi-module compilation. For 
> example, the case when module-info files for different modules 
> reference each other:
>
> module-info.java in module A:
> module a {
>   exports a to b;
> }
>
>
> module-info.java in module B:
> module b {
>   requires a;
> }
>
> Here we have to compile sources for module A and module B together in 
> one compile session and use --module-source-path parameter so that 
> javac is able to resolve both module descriptors.
>
> My recent investigations show that current javac implementation 
> assumes certain disk layout for the source files that form a module. 
> This leads to restrictions on the --module-source-path argument value. 
> Currently this value is a list of paths where every path may
> optionally contain a "*" wildcard denoting any directory on a
> particular file system level. The code responsible for
> --module-source-path option support is located in
> com.sun.tools.javac.file.Locations.ModuleSourcePathLocationHandler.init()
>
> The code here works differently depending on whether the path element 
> contains an optional '*' wildcard or not. If the path contains the 
> wildcard, the directory name matching this wildcard will be assumed 
> equal to module name (which is another problem) and the path to the 
> module descriptor file is configured correctly.
> If there is no wildcard in the path, the path is not used "as is", but 
> instead its direct sub-directories are analyzed and used as roots 
> where module-info.java can be found. The latter looks more like a bug 
> than intended behavior.

The different behavior is simply the compiler treating a path element 
without a '*' as equivalent to the path with '*' appended. In other 
words, a module source path of
     /Users/Me/MyProject/src
is equivalent to one like this
     /Users/Me/MyProject/src/*

So yes, the behavior you are seeing is intentional, and not a bug.
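
As a rough sketch of that equivalence (again, not the code in Locations, 
and with made-up names ImplicitModuleToken, withModuleToken and 
candidateModuleNames): an element without '*' is treated as if '/*' were 
appended, so the immediate subdirectories of the given directory are the 
candidate module directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ImplicitModuleToken {
    /** An element without '*' behaves like the same element with "/*" appended. */
    static String withModuleToken(String element) {
        return element.contains("*") ? element : element + "/*";
    }

    /**
     * For an element of the form <dir>/*, each immediate subdirectory of
     * <dir> is a candidate module directory, named for its module.
     */
    static List<String> candidateModuleNames(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children
                    .filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```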

>
> From the IDE's point of view there is no need to use "*" wildcards, 
> since the "too much typing" is not an issue for the program. Another 
> reason is that the usage of wildcards is possible only for certain 
> layouts of module A and B sources. In general case, when modules 
> contain several source roots on different file system levels, the 
> usage of wildcards is not possible.

'*' is not a wildcard, and it is not there for brevity.  It is there to 
help the compiler coordinate the different directories containing the 
source code for the module. The '*' character could equally have been 
any other token, like '%' or "MODULE-NAME". It is not a shell character, 
nor is it a file system character, it is simply a special token in the 
syntax for complex module source paths.


> So enumeration of absolute paths to source roots is the only option 
> available for the IDE. Due to the problem mentioned above this does 
> not work either. The IDE could have created the paths with wildcards 
> and this would have worked for some project layouts, but the 
> assumption that the directory name is equal to module name looks too 
> strict and should not be true for many real-life project layouts.

Currently, if you are compiling the source code for many modules at the 
same time, it is a requirement that there be a directory named for the 
module in the path. This is partly to assist the compiler when 
looking up references to classes in other modules, and partly to 
coordinate the directories when the source code for a single module is 
spread across many directories.

>
> So the questions are:
> - Are there any changes planned for the command line interface to 
> address these issues?

The simple answer is no.  We have been discussing -when- to use the 
source path, but nothing regarding the syntax of the module source path.


> - If current command line behavior is correct and intended for some 
> certain situations only, we would kindly ask to consider making module 
> source path configuration more flexible via the compiler tooling API, 
> which is used by IDEs. However, keeping command line interface and 
> tooling API consistent is a good idea too.

It is certainly the case that apart from a slight name change, 
--module-source-path is still the same as originally designed, and it is 
the case that many other new command line options have evolved since 
then.  The most obvious suggestion would be to allow a command line 
option and API to set an explicit package-oriented path for each module 
individually.   Staying clear of the naming bike-shed for now, this could be
         --new-module-source-path MODULE=PATH
             e.g.  --new-module-source-path m1=/Users/Me/MyProject/src/m1
                   --new-module-source-path m2=/Users/Me/MyProject/src/m2
with corresponding API
         public void setLocationForModule(Location location,
                 String moduleName, List<Path> paths) throws IOException
             e.g. fileManager.setLocationForModule(
                      StandardLocation.MODULE_SOURCE_PATH, "m1",
                      List.of(Paths.get("/Users/Me/MyProject/src/m1")));
             e.g. fileManager.setLocationForModule(
                      StandardLocation.MODULE_SOURCE_PATH, "m2",
                      List.of(Paths.get("/Users/Me/MyProject/src/m2")));


But that is just an off-the-wall spur-of-the-moment suggestion.
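
A minimal sketch of how a tool might drive such a per-module API. It 
assumes a JDK 9 or later, where javax.tools.StandardJavaFileManager 
declares setLocationForModule(Location, String, Collection<? extends Path>), 
and it assumes the directories must already exist when they are 
registered. The class name PerModuleSourcePath, the method demo and the 
temporary directories are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

final class PerModuleSourcePath {
    /**
     * Creates throwaway directories for two hypothetical modules, m1 and m2,
     * registers one source root per module, and reports whether the file
     * manager then knows a source location for both modules.
     */
    static boolean demo() {
        // Needs a full JDK: without the compiler module this returns null.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm =
                     compiler.getStandardFileManager(null, null, null)) {
            Path project = Files.createTempDirectory("MyProject");
            Path m1 = Files.createDirectories(project.resolve("src/m1"));
            Path m2 = Files.createDirectories(project.resolve("src/m2"));

            // One explicit, package-oriented path per module.
            fm.setLocationForModule(StandardLocation.MODULE_SOURCE_PATH, "m1", List.of(m1));
            fm.setLocationForModule(StandardLocation.MODULE_SOURCE_PATH, "m2", List.of(m2));

            JavaFileManager.Location l1 =
                    fm.getLocationForModule(StandardLocation.MODULE_SOURCE_PATH, "m1");
            JavaFileManager.Location l2 =
                    fm.getLocationForModule(StandardLocation.MODULE_SOURCE_PATH, "m2");
            return l1 != null && l2 != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both modules registered: " + demo());
    }
}
```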

> Thanks in advance for any comments on the problem,
>

-- Jon

