How to name modules, automatic and otherwise
mark.reinhold at oracle.com
Thu Feb 16 16:48:27 UTC 2017
This note is in reply to the concerns about automatic modules raised by
Robert Scholte and Brian Fox [1], and by Stephen Colebourne and others
[2]. I've collected my conclusions here rather than in separate messages
because there are several distinct yet intertwined issues.
Summary:
- Module names should not include Maven group identifiers, because
modules are more abstract than the artifacts that define them.
- Module names should use the reverse-domain-name-prefix convention
or, preferably, the project-name-prefix convention.
- We should not abandon automatic modules, since they are a key tool
for migration and adoption.
- We can address the problems of automatic modules with two fairly
minor technical enhancements.
If any of these points strikes you as controversial, please read on!
* * *
Module names should not include Maven group identifiers, as Robert
Scholte and Brian Fox suggest [1], even for modules declared explicitly
in `module-info.java` files. Modules in JPMS are a construct of the Java
programming language, implemented in both the compiler and the virtual
machine. As such, they are more abstract entities than the artifacts
that define them. This distinction is useful, both conceptually and
practically, hence module names should remain more abstract.
This distinction is useful conceptually because it makes it easier, as
we read source code, to think clearly about the nature of a module. We
can reason about a module's dependences, exports, services, and so forth
without cluttering our minds with the details of group identifiers and
version constraints. Today, e.g., we can write, and read:
    module foo.data {
        exports com.bar.foo.data;
        requires hibernate.core;
        requires hibernate.jcache;
        requires hibernate.validator;
    }
If we were to extend the syntax of module names to include group
identifiers, and encourage people to use them, then we'd be faced with
something much more verbose:
    module com.bar:foo.data {
        exports com.bar.foo.data;
        requires org.hibernate:hibernate.core;
        requires org.hibernate:hibernate.jcache;
        requires org.hibernate:hibernate.validator;
    }
Group identifiers make perfect sense in the context of a build system
such as Maven, where they bring necessary structure to the names of the
millions of artifacts available across different repositories. Such
structure is superfluous and distracting in the context of a module
system, where the number of relevant modules in any particular situation
is more likely to be in the tens, or hundreds, or (rarely) thousands.
All else being equal, simpler names are better.
At a practical level, the distinction between modules and artifacts is
useful because it leaves the entire problem of artifact selection to the
build system. This allows us to switch from one artifact to another
simply by editing a `pom.xml` file to adjust a version constraint or a
group identifier; if module names included group identifiers then we'd
also have to edit the `module-info.java` file. This flexibility can be
helpful if, e.g., a project is forked and a new module with the same name
and artifact identifier is published under a different group identifier.
We long ago decided not to do version selection in the module system,
which surprised some people but has worked out fairly well. We should
treat group selection in the same manner.
Another practical benefit of the module/artifact distinction is that it
keeps the module system independent of any particular build system, so
that build systems can continue to improve and evolve independently over
time. Maven-style coordinates are the most popular way to name artifacts
in repositories today, but that might not be true ten years from now. It
would be unwise to adopt Maven's naming convention for module names just
because it's popular now, and doubly so to bake Maven's group-identifier
concept into the Java programming language.
* * *
If module names don't include group identifiers, then how should modules
be named? What advice should we give to someone who's creating a new
module from scratch, or modularizing an existing component by writing a
`module-info.java` file for it? (Continue to set aside, for the moment,
the problems of automatic modules.)
In structuring any particular space of names we must balance (at least)
three fundamental tensions: We want names that are long enough to be
descriptive, short enough to be memorable, and unique enough to avoid
needless conflicts.
If you control all of the modules upon which your module depends, and
all of the modules that depend upon it, then you can of course name your
module whatever you want, and change its name at any time. If, however,
you're going to publish your module for use by others -- whether just
within your own organization or to a global repository such as Maven
Central -- then you should take more care. There are two well-known
ways to go about this.
- Choose module names that start with the reversed form of an Internet
domain name that you control, or are at least associated with. The
Java Language Specification has long suggested this convention as a
way to minimize conflicts amongst package names, and it has been
widely though not universally adopted for that purpose.
- Choose module names that start with the name of your project or
product. Module (and package) names that start with reversed domain
names are less likely to conflict but they're unnecessarily verbose,
they start with the least-important information (e.g., `com`, `org`,
or `net`), and they don't read well after exogenous changes such as
open-source donations or corporate acquisitions (e.g., `com.sun.*`).
The reversed domain-name approach was sensible in the early days of Java,
before we had development tools sophisticated enough to help us deal with
the occasional conflict. We have such tools now, so going forward the
superior readability of short module and package names that start with
project or product names is preferable to the onerous verbosity of those
that start with reversed domain names.
This advice will strike some readers as controversial. I respect those
who will choose, for the sake of tradition or an abundance of caution, to
use the reversed domain-name convention for module names and continue to
use that convention for package names. I do know, however, of at least
one major, well-known project whose developers intend to adopt the
project-name-prefix convention for their module names.
* * *
If module names don't include group identifiers, then how should automatic
modules be named? Or are automatic modules so troublesome that we should
remove them from the design?
To answer the second question first: It would be a tragic shame to drop
automatic modules, since without them top-down migration is impossible
unless you're willing to modify artifacts that you don't maintain, which
most people (quite sensibly) aren't. Even if you limit your use of
automatic modules to closed systems, as Stephen Colebourne suggests [2],
they're still of significant value. Let's see if we can rescue them.
The present algorithm for naming automatic modules has two problems:
(A) Conflicts are possible across large artifact repositories, since
the name of an automatic module is computed from the name of the
artifact that defines it. [1]
(B) It's risky to publish a module that `requires` some other module
that has not yet been modularized, and hence must be used as an
automatic module. If the maintainer of that module later chooses
an explicit name different from the automatic name then you must
publish a new version of your module with an updated `requires`
directive. [2]
As to (A), yes, conflicts exist, though it's worth observing that many of
the conflicts in the Maven Central data are due to poorly-chosen artifact
names: `parent`, `library`, `core`, and `common` top the list, which then
falls off in a long-tail distribution. When conflicts are detected,
build tools can rename artifacts, either automatically or, preferably, to
user-specified names that map to sensible automatic-module names. If
renaming artifacts in the filesystem proves impractical then we could
extend the syntax of the `--module-path` option to allow a module name
to be specified for each specifically-named artifact, though strictly
speaking that would be a feature of the JDK rather than JPMS.
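To make problem (A) concrete, here is a rough sketch of how a name might be computed from an artifact's file name. It is an approximation for illustration only, not the JDK's actual implementation: the class name, the method name, and the exact version-stripping rule are assumptions, and the artifact file names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutomaticNameSketch {

    // Assumption: a version string starts at the first "-<digits>" that is
    // followed by "." or by the end of the name.
    private static final Pattern VERSION = Pattern.compile("-(\\d+(\\.|$))");

    // Hypothetical helper: approximates the automatic-module name for a JAR file name.
    static String deriveName(String jarFileName) {
        String name = jarFileName;
        if (name.endsWith(".jar")) {
            name = name.substring(0, name.length() - ".jar".length());
        }
        // Drop the version suffix, if one is found.
        Matcher m = VERSION.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }
        // Replace non-alphanumeric characters with dots and collapse repeats.
        name = name.replaceAll("[^A-Za-z0-9]", ".").replaceAll("\\.{2,}", ".");
        // Trim a leading or trailing dot.
        if (name.startsWith(".")) {
            name = name.substring(1);
        }
        if (name.endsWith(".")) {
            name = name.substring(0, name.length() - 1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(deriveName("hibernate-core-5.2.8.Final.jar")); // hibernate.core
        // Two unrelated artifacts, published under different group identifiers,
        // end up with the same automatic-module name:
        System.out.println(deriveName("core-1.0.jar"));   // core
        System.out.println(deriveName("core-2.1.3.jar")); // core
    }
}
```

Because the group identifier never appears in the file name, nothing in this computation can tell the two `core` artifacts apart, which is why renaming the artifact is the available remedy.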
We can address (B) by enabling the maintainers of existing components to
specify the module names that should be given to their components when
used as automatic modules, without having to write `module-info.java`
files. This can be done very simply, with a single new JAR-file manifest
`Module-Name` attribute, as first suggested almost a year ago [3].
If we add this one feature then the maintainer of an existing component
that, e.g., must still build and run on JDK 7 can choose a module name
for that component, record it in the manifest by adding a few lines to
the `pom.xml` file, and tell users that they can use it as an automatic
module on JDK 9 without fear that the module name will change when the
component is properly modularized some years from now. The actual change
to the component is small and low-risk, so it can reasonably be done in
a patch release. There's no need to write a `module-info.java` file,
and in fact doing so may be inadvisable at this point if the component
depends on other components that have not yet been given module names.
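As a minimal sketch of what recording such a name could look like, the following uses the standard `java.util.jar.Manifest` API to build a manifest that carries the proposed attribute. The attribute name `Module-Name` is the one proposed in this note and is not final; the class name, the helper method, and the module name `foo.data` are illustrative assumptions. In practice a build tool would write this entry when it assembles the JAR file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ModuleNameManifestSketch {

    // Hypothetical helper: builds a manifest declaring the proposed attribute.
    static Manifest withModuleName(String moduleName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the main section is not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // "Module-Name" is the attribute proposed in this note (an assumption, not final).
        main.putValue("Module-Name", moduleName);
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        withModuleName("foo.data").write(out);
        // Prints the manifest text as it would appear in META-INF/MANIFEST.MF.
        System.out.print(out.toString("UTF-8"));
    }
}
```

The component's class files are untouched by this, so the same JAR file can still build and run on older releases.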
This approach for (B) does add one more (optional) step to the migration
path, but it will hopefully lead to a larger number of explicitly-named
modules in the world -- and in particular in Maven Central -- sooner
rather than later.
- Mark
[1] http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2017-January/000537.html
[2] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2017-January/011106.html
[3] http://openjdk.java.net/projects/jigsaw/spec/issues/#ModuleNameInManifest