Module use cases versus descriptors, packaging, and configuration policies
David M. Lloyd
david.lloyd at redhat.com
Thu Sep 24 15:38:28 UTC 2015
Modules as a concept apply to many different use cases:
* Standalone applications (class path replacement) using static module
path modules
* Standalone applications for small/"micro" platforms
* Containers like Java EE and OSGi
* Various custom dynamic plugin systems on client and server
* The JDK itself
Every use case has different needs in terms of the way that modules are
named, located, utilized, built, and otherwise managed. Furthermore,
each use case varies in terms of the different phases of development
(build, test, run, package, deploy).
Because of this, I feel that a one-size-fits-all approach to module
descriptors and loading policies is not in the best interest of most
users. There is an alternative strategy that offers superior
applicability to most if not all of these use cases. I will describe
this strategy in pieces.
* Naming
Each use case has different naming rules. It is more or less agreed
that static module path modules are well suited to a dot-separated,
reverse-domain-name strategy like the one we use for packages. OSGi and Java EE
have looser naming requirements; other systems have other needs. Thus I
propose that the core module system support opaque String module
identifiers, with each module configuration being solely responsible for
any additional name validation rules which apply to it. This allows
each use case to be internally consistent, without hampering the overall
flexibility of the module system.
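A minimal sketch of this idea, assuming a hypothetical ModuleNamePolicy
interface (nothing like it exists in javax.* today): the core system
treats identifiers as opaque strings, and each configuration layers its
own validation rule on top.

```java
public class NamingDemo {
    // Hypothetical: the core module system accepts any String; each
    // configuration is solely responsible for further validation.
    interface ModuleNamePolicy {
        boolean isValid(String name);
    }

    // A static-module-path policy might demand dotted, package-like names...
    static final ModuleNamePolicy STATIC_PATH =
        n -> n.matches("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    // ...while a container with looser rules accepts any non-empty string.
    static final ModuleNamePolicy CONTAINER = n -> !n.isEmpty();

    public static void main(String[] args) {
        System.out.println(STATIC_PATH.isValid("org.example.util"));
        System.out.println(STATIC_PATH.isValid("my bundle (v2)"));
        System.out.println(CONTAINER.isValid("my bundle (v2)"));
    }
}
```

Each configuration stays internally consistent while sharing the same
opaque identifier type.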
* Descriptors
The modules which are shipped with the JDK have a static, "locked"
dependency relationship. Static module path modules have relationships
that might not be known until run time. Java EE and OSGi modules may
have descriptions that are fully dynamically generated from a variety of
inputs, and custom plugin systems may have other strategies for defining
modules. Thus mandating a single internal or external descriptor format
is not a desirable strategy.
Instead this should be made flexible, with each module configuration
being solely responsible for the location and description of modules
that it contains, be it by name or by globally provided service
interface. This enables module systems to use an optimal strategy for
that use case.
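The shape of such a configuration-owned lookup can be sketched with a
hypothetical Configuration interface and Description record (both
invented here for illustration); a container's descriptions might really
be generated dynamically, but a map-backed table shows the contract:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ConfigDemo {
    // Hypothetical: a module "description" is whatever the configuration
    // says it is; here, just an identifier and its dependency names.
    record Description(String name, List<String> dependencies) {}

    // Each configuration is solely responsible for locating and
    // describing the modules it contains.
    interface Configuration {
        Optional<Description> findModule(String id);
    }

    public static void main(String[] args) {
        // Stand-in for a container's dynamically generated descriptions.
        Map<String, Description> table = Map.of(
            "deployment.app.war",
            new Description("deployment.app.war",
                List.of("javax.api", "org.example.lib")));
        Configuration container = id -> Optional.ofNullable(table.get(id));

        System.out.println(container.findModule("deployment.app.war").isPresent());
        System.out.println(container.findModule("missing").isPresent());
    }
}
```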
* Packaging
Defining a packaging format for the static module path modules is
important, but such a packaging format is very unlikely to be applicable
to other systems. Dependencies expressed by such modules are
necessarily limited to other static modules or the JDK itself. In
higher-level configurations like Java EE or OSGi, however (to give just
one example), an additional class of dependencies is required to express
relationships among modules within that container and modules within
other peer containers, on top of the basic static dependencies that
refer to the static modules used to boot the application. Module
contents may also be optimally described in very different formats.
It should be possible for configurations to choose to reuse the static
module system format, or to utilize their own custom format, as best
applies to the configuration. This allows each module configuration to
make the choice that is best for that system.
* Effect on tooling
As I previously described, build tooling is not well applied to run time
module graphs. If the build tooling were instead oriented toward
source artifacts, it would be freed from being restricted to any one
module system, as has previously been proposed.
The legacy (Java 8 and earlier) JDK build tool chain is already very
nearly able to support modular building. Two primary aspects apply
here: annotation processing and compilation.
The annotation processing API in javax.tools is module-ready as it
stands: JavaCompiler.CompilationTask can already accept any Iterable of
annotation processor instances, which can easily be derived and loaded
from installed, runnable static modules, though no build system (that I
know of) adequately takes advantage of this ability today.
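The hook in question is CompilationTask.setProcessors, which is real
API; the sketch below compiles an in-memory source with an explicitly
supplied processor list (here gathered via ServiceLoader as a stand-in
for a module-aware loader, which is the assumed part):

```java
import java.io.StringWriter;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import javax.annotation.processing.Processor;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ProcessorDemo {
    // Minimal in-memory source file so the demo needs no files on disk.
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String name, String code) {
            super(URI.create("string:///" + name + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignore) {
            return code;
        }
    }

    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

        // Processors could come from installed static modules; here we
        // load whatever the class path offers (likely none) as a stand-in.
        List<Processor> processors = new ArrayList<>();
        ServiceLoader.load(Processor.class).forEach(processors::add);

        JavaCompiler.CompilationTask task = compiler.getTask(
                new StringWriter(), null, null,
                List.of("-proc:only"),   // run processing, emit no classes
                null,
                List.of(new StringSource("Hello", "public class Hello {}")));
        task.setProcessors(processors); // the explicit Iterable hook
        System.out.println("compiled=" + task.call());
    }
}
```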
Compilation for a modular world has very similar needs to today's world;
Maven is a very good conceptual example of the closeness between the
two, in the way that artifacts are transitively resolved and wired up.
However, being dependent on the file manager, JavaCompiler is still
predicated on a flat class path. While potentially adequate for many
modular projects, the inherent lack of transitive isolation can cause a
potentially destructive disparity between compilation and run time - in
the best case, resulting in missing classes, but in the worst case,
resulting in confusing linkage errors and even unexpected behavior.
The technical crux of this issue appears to center on the process of
locating dependency class data: the file manager can only select
locations from the StandardLocation gamut, which limits dependency
classes to coming from (essentially) either the class path or
the platform. To solve this problem, a more flexible location concept
is necessary, which allows the compilation task to (a) know (in an
abstract way) what artifact dependencies a given artifact has, (b) know
what the package import and export rules are for each dependency
specification, and (c) notify the file manager which of these is being
examined for the purposes of locating dependency content.
Adding this isolation to the compilation phase allows (among other
things) more correct and consistent behavior, especially with regard to
situations where more than one instance of the same package may occur in
the extended dependency graph of the target module, and it also enforces
more rigorous and accurate dependency specification (since the tendency
for "lucky" transitive dependencies to appear on the compile path will
be greatly reduced). Both of these problems have been observed in the
real world.
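Short of new StandardLocation-style API, some of this isolation can be
approximated today by interposing on the file manager; a sketch, where
the "allowed packages" set stands in for the hypothetical per-dependency
export rules described above:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

public class FilteringFileManager
        extends ForwardingJavaFileManager<StandardJavaFileManager> {

    // Stand-in for the (hypothetical) export rules of a dependency spec.
    private final Set<String> allowedPackages;

    FilteringFileManager(StandardJavaFileManager fm, Set<String> allowed) {
        super(fm);
        this.allowedPackages = allowed;
    }

    @Override
    public Iterable<JavaFileObject> list(Location location, String packageName,
            Set<JavaFileObject.Kind> kinds, boolean recurse) throws IOException {
        // Hide class-path packages that the dependency specification
        // does not export to the compiling module.
        if (location == StandardLocation.CLASS_PATH
                && !allowedPackages.contains(packageName)) {
            return Collections.emptyList();
        }
        return super.list(location, packageName, kinds, recurse);
    }

    public static void main(String[] args) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        StandardJavaFileManager std =
                compiler.getStandardFileManager(null, null, null);
        FilteringFileManager fm =
                new FilteringFileManager(std, Set.of("com.example.api"));
        // A non-exported package appears empty on the class path.
        boolean hidden = !fm.list(StandardLocation.CLASS_PATH,
                "com.example.internal",
                EnumSet.of(JavaFileObject.Kind.CLASS), false)
                .iterator().hasNext();
        System.out.println("hidden=" + hidden);
    }
}
```

This only filters; the (a)/(b)/(c) points above would additionally need
the compiler to tell the file manager *which* dependency edge it is
resolving, which today's API cannot express.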
* Tying build dependencies and run time dependencies together
As I have implied in earlier emails, there is no single easy answer to
this problem. Run time and build time dependencies cannot be identical
in a system that has both long-term viability and multiple possible
distribution points. This is borne out by the wide variety of operating
system and language platforms that have existed over past decades up to
the present.
However these systems have also demonstrated a relatively small number
of approaches that have proven to be viable solutions to this disparity
while ensuring some degree of fidelity between the build, package, and
run phases. I will describe the two that I think are most relevant and
likely to apply well to our situation.
The first solution is what I'll call "build on the host", which is
employed by BSD ports, Gentoo ebuilds, and similar systems. In this
system, the shared module repository consists of packages, which in turn
contain buildable source code bundles (typically from an upstream
source) and environment-specific patches. On installation, packages are
downloaded from the repository and built (and possibly unit-tested)
locally on the target system before being installed into the local
environment.
The second solution, binary installation, is employed by most Linux and
other OS distributions. The repository packages are (more or less)
images of what is to be installed on the local system. These packages
are typically built on a per-distribution basis from an upstream source.
Both of these solutions generally utilize a distribution-specific
version which is usually derived from the upstream version, but with a
local modifier which may reflect build number or the disposition of
local patches in various ways. This emphasizes the reality that, for a
given package, no single version scheme can flow all the way through
from source to build to package to run.
However, this reality doesn't preclude all forms of fidelity across
phases. Maven Central, for example, consists of many thousands of
already well-versioned artifacts, with clear dependency information
which is already often suitable for modular building. Modular packaging
systems could easily reuse this build version information to produce
sensible distribution- and run-time versioning data. Some systems
could, in fact, distribute (license permitting) such artifacts directly;
otherwise, the Maven ecosystem provides a fairly complete set of tooling
(in the form of Sonatype's Nexus) for managing custom build artifacts,
which is another piece of the puzzle.
Maven dependency information, combined with introspection tools that are
aware of the target distribution contents, could easily be used to set
up a very good initial module descriptor in the format appropriate to
that distribution's intended installation target.
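The version-derivation part of this is mechanical; a sketch, where the
name-mangling and "-distBuild" modifier conventions are invented for
illustration (real distributions would pick their own rules):

```java
public class CoordDemo {
    // A Maven coordinate as found in dependency information.
    record Coordinate(String groupId, String artifactId, String version) {}

    // Hypothetical mapping to a dotted module identifier.
    static String toModuleName(Coordinate c) {
        return c.groupId() + "." + c.artifactId().replace('-', '.');
    }

    // Upstream version plus a local modifier, as OS packagers do.
    static String toDistVersion(Coordinate c, int distBuild) {
        return c.version() + "-" + distBuild;
    }

    public static void main(String[] args) {
        Coordinate c = new Coordinate("org.example", "util-core", "1.4.2");
        System.out.println(toModuleName(c));
        System.out.println(toDistVersion(c, 3));
    }
}
```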
* Summary
These concepts allow the JPMS requirements to be met with a far greater
flexibility and capability than has previously been proposed. They
enable the module system itself to support advanced features which are
not necessarily directly exploited by the static module path module
configuration, but which then can be consumed and exploited by various
containers and other module-oriented systems. Examples include module
redefinition and unloading, configurable strategies for native library
loading, customized module content loaders, etc. They also allow a
variety of useful module ecosystems to form, targeting many use cases.
--
- DML
More information about the jpms-spec-experts
mailing list