Module use cases versus descriptors, packaging, and configuration policies
David M. Lloyd
david.lloyd at redhat.com
Thu Sep 24 15:38:28 UTC 2015
Modules as a concept apply to many different use cases:
* Standalone applications (class path replacement) using static module
path modules
* Standalone applications for small/"micro" platforms
* Containers like Java EE and OSGi
* Various custom dynamic plugin systems on client and server
* The JDK itself
Every use case has different needs in terms of the way that modules are
named, located, utilized, built, and otherwise managed. Furthermore,
each use case varies in terms of the different phases of development
(build, test, run, package, deploy).
Because of this, I feel that a one-size-fits-all approach to module
descriptors and loading policies is not in the best interest of most
users. There is an alternative strategy that offers superior
applicability to most if not all of these use cases. I will describe
this strategy in pieces.
* Naming
Each use case has different naming rules. It is more or less agreed
that static module path modules are well suited to a dot-separated,
reverse-domain-name strategy like the one we use for packages. OSGi and Java EE
have looser naming requirements; other systems have other needs. Thus I
propose that the core module system support opaque String module
identifiers, with each module configuration being solely responsible for
any additional name validation rules which apply to it. This allows
each use case to be internally consistent, without hampering the overall
flexibility of the module system.
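A minimal sketch of this idea, assuming a hypothetical ModuleNamePolicy
interface (nothing like it exists in javax.* today): the core system
treats identifiers as opaque strings, and each configuration layers its
own validation rule on top.

```java
public class NamingDemo {
    // Hypothetical: the core module system accepts any String; each
    // configuration is solely responsible for further validation.
    interface ModuleNamePolicy {
        boolean isValid(String name);
    }

    // A static-module-path policy might demand dotted, package-like names...
    static final ModuleNamePolicy STATIC_PATH =
        n -> n.matches("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    // ...while a container with looser rules accepts any non-empty string.
    static final ModuleNamePolicy CONTAINER = n -> !n.isEmpty();

    public static void main(String[] args) {
        System.out.println(STATIC_PATH.isValid("org.example.util"));
        System.out.println(STATIC_PATH.isValid("my bundle (v2)"));
        System.out.println(CONTAINER.isValid("my bundle (v2)"));
    }
}
```

Each configuration stays internally consistent while sharing the same
opaque identifier type.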
* Descriptors
The modules which are shipped with the JDK have a static, "locked"
dependency relationship. Static module path modules have relationships
that might not be known until run time. Java EE and OSGi modules may
have descriptions that are fully dynamically generated from a variety of
inputs, and custom plugin systems may have other strategies for defining
modules. Thus mandating a single internal or external descriptor format
is not a desirable strategy.
Instead this should be made flexible, with each module configuration
being solely responsible for the location and description of modules
that it contains, be it by name or by globally provided service
interface. This enables module systems to use an optimal strategy for
that use case.
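The shape of such a configuration-owned lookup can be sketched with a
hypothetical Configuration interface and Description record (both
invented here for illustration); a container's descriptions might really
be generated dynamically, but a map-backed table shows the contract:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ConfigDemo {
    // Hypothetical: a module "description" is whatever the configuration
    // says it is; here, just an identifier and its dependency names.
    record Description(String name, List<String> dependencies) {}

    // Each configuration is solely responsible for locating and
    // describing the modules it contains.
    interface Configuration {
        Optional<Description> findModule(String id);
    }

    public static void main(String[] args) {
        // Stand-in for a container's dynamically generated descriptions.
        Map<String, Description> table = Map.of(
            "deployment.app.war",
            new Description("deployment.app.war",
                List.of("javax.api", "org.example.lib")));
        Configuration container = id -> Optional.ofNullable(table.get(id));

        System.out.println(container.findModule("deployment.app.war").isPresent());
        System.out.println(container.findModule("missing").isPresent());
    }
}
```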
* Packaging
Defining a packaging format for the static module path modules is
important, but such a packaging format is very unlikely to be applicable
to other systems. Dependencies expressed by such modules are
necessarily limited to other static modules or the JDK itself. In
higher-level configurations like Java EE or OSGi, however (to give just
one example), an additional class of dependencies is required to express
relationships among modules within that container and modules within
other peer containers, on top of the basic static dependencies that
refer to the static modules used to boot the application. Module
contents may also be optimally described in very different formats.
It should be possible for configurations to choose to reuse the static
module system format, or to utilize their own custom format, as best
applies to the configuration. This allows each module configuration to
make the choice that is best for that system.
* Effect on tooling
As I previously described, build tooling is not well applied to run time
module graphs. If the build tooling were instead oriented toward
source artifacts, it would be freed from being restricted to any one
module system, as has previously been proposed.
The legacy (Java 8 and earlier) JDK build tool chain is already very
nearly able to support modular building. Two primary aspects apply
here: annotation processing and compilation.
The annotation processing API in javax.tools is module-ready as it
stands: JavaCompiler.CompilationTask can already accept any Iterable of
annotation processor instances, which can easily be derived and loaded
from installed, runnable static modules, though no build system (that I
know of) adequately takes advantage of this ability today.
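The hook in question is CompilationTask.setProcessors, which is real
API; the sketch below compiles an in-memory source with an explicitly
supplied processor list (here gathered via ServiceLoader as a stand-in
for a module-aware loader, which is the assumed part):

```java
import java.io.StringWriter;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import javax.annotation.processing.Processor;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ProcessorDemo {
    // Minimal in-memory source file so the demo needs no files on disk.
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String name, String code) {
            super(URI.create("string:///" + name + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignore) {
            return code;
        }
    }

    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

        // Processors could come from installed static modules; here we
        // load whatever the class path offers (likely none) as a stand-in.
        List<Processor> processors = new ArrayList<>();
        ServiceLoader.load(Processor.class).forEach(processors::add);

        JavaCompiler.CompilationTask task = compiler.getTask(
                new StringWriter(), null, null,
                List.of("-proc:only"),   // run processing, emit no classes
                null,
                List.of(new StringSource("Hello", "public class Hello {}")));
        task.setProcessors(processors); // the explicit Iterable hook
        System.out.println("compiled=" + task.call());
    }
}
```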
Compilation for a modular world has very similar needs to today's world;
Maven is a very good conceptual example of the closeness between the
two, in the way that artifacts are transitively resolved and wired up.
However, being dependent on the file manager, JavaCompiler is still
predicated on a flat class path. While potentially adequate for many
modular projects, the inherent lack of transitive isolation can cause a
potentially destructive disparity between compilation and run time - in
the best case, resulting in missing classes, but in the worst case,
resulting in confusing linkage errors and even unexpected behavior.
The technical crux of this issue appears to center on the process of
locating dependency class data: the file manager can only select
locations from the StandardLocation gamut, which limits dependency
classes to coming from (essentially) either the class path or
the platform. To solve this problem, a more flexible location concept
is necessary, which allows the compilation task to (a) know (in an
abstract way) what artifact dependencies a given artifact has, (b) know
what the package import and export rules are for each dependency
specification, and (c) notify the file manager which of these is being
examined for the purposes of locating dependency content.
Adding this isolation to the compilation phase allows (among other
things) more correct and consistent behavior, especially with regard to
situations where more than one instance of the same package may occur in
the extended dependency graph of the target module, and it also enforces
more rigorous and accurate dependency specification (since the tendency
for "lucky" transitive dependencies to appear on the compile path will
be greatly reduced). Both of these problems have been observed in the
real world.
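Short of new StandardLocation-style API, some of this isolation can be
approximated today by interposing on the file manager; a sketch, where
the "allowed packages" set stands in for the hypothetical per-dependency
export rules described above:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

public class FilteringFileManager
        extends ForwardingJavaFileManager<StandardJavaFileManager> {

    // Stand-in for the (hypothetical) export rules of a dependency spec.
    private final Set<String> allowedPackages;

    FilteringFileManager(StandardJavaFileManager fm, Set<String> allowed) {
        super(fm);
        this.allowedPackages = allowed;
    }

    @Override
    public Iterable<JavaFileObject> list(Location location, String packageName,
            Set<JavaFileObject.Kind> kinds, boolean recurse) throws IOException {
        // Hide class-path packages that the dependency specification
        // does not export to the compiling module.
        if (location == StandardLocation.CLASS_PATH
                && !allowedPackages.contains(packageName)) {
            return Collections.emptyList();
        }
        return super.list(location, packageName, kinds, recurse);
    }

    public static void main(String[] args) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        StandardJavaFileManager std =
                compiler.getStandardFileManager(null, null, null);
        FilteringFileManager fm =
                new FilteringFileManager(std, Set.of("com.example.api"));
        // A non-exported package appears empty on the class path.
        boolean hidden = !fm.list(StandardLocation.CLASS_PATH,
                "com.example.internal",
                EnumSet.of(JavaFileObject.Kind.CLASS), false)
                .iterator().hasNext();
        System.out.println("hidden=" + hidden);
    }
}
```

This only filters; the (a)/(b)/(c) points above would additionally need
the compiler to tell the file manager *which* dependency edge it is
resolving, which today's API cannot express.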
* Tying build dependencies and run time dependencies together
As I have implied in earlier emails, there is no single easy answer to
this problem. Run time and build time dependencies cannot be identical
in a system that has both long-term viability and multiple possible
distribution points. This is borne out by the wide variety of operating
system and language platforms that have existed over past decades up to
the present.
However these systems have also demonstrated a relatively small number
of approaches that have proven to be viable solutions to this disparity
while ensuring some degree of fidelity between the build, package, and
run phases. I will describe the two that I think are most relevant and
likely to apply well to our situation.
The first solution is what I'll call "build on the host", which is
employed by BSD ports, Gentoo ebuilds, and similar systems. In this
system, the shared module repository consists of packages, which in turn
contain buildable source code bundles (typically from an upstream
source) and environment-specific patches. On installation, packages are
downloaded from the repository and built (and possibly unit-tested)
locally on the target system before being installed into the local
environment.
The second solution, binary installation, is employed by most Linux and
other OS distributions. The repository packages are (more or less)
images of what is to be installed on the local system. These packages
are typically built on a per-distribution basis from an upstream source.
Both of these solutions generally utilize a distribution-specific
version which is usually derived from the upstream version, but with a
local modifier which may reflect build number or the disposition of
local patches in various ways. This emphasizes the reality that, for a
given package, no single version scheme can flow all the way through
from source to build to package to run.
However, this reality doesn't preclude all forms of fidelity across
phases. Maven Central, for example, consists of many thousands of
already well-versioned artifacts, with clear dependency information
which is already often suitable for modular building. Modular packaging
systems could easily reuse this build version information to produce
sensible distribution- and run-time versioning data. Some systems
could, in fact, distribute (license permitting) such artifacts directly;
otherwise, the Maven ecosystem provides a fairly complete set of tooling
(in the form of Sonatype's Nexus) for managing custom build artifacts,
which is another piece of the puzzle.
Maven dependency information, combined with introspection tools that are
aware of the target distribution contents, could easily be used to set
up a very good initial module descriptor in the format appropriate to
that distribution's intended installation target.
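The version-derivation part of this is mechanical; a sketch, where the
name-mangling and "-distBuild" modifier conventions are invented for
illustration (real distributions would pick their own rules):

```java
public class CoordDemo {
    // A Maven coordinate as found in dependency information.
    record Coordinate(String groupId, String artifactId, String version) {}

    // Hypothetical mapping to a dotted module identifier.
    static String toModuleName(Coordinate c) {
        return c.groupId() + "." + c.artifactId().replace('-', '.');
    }

    // Upstream version plus a local modifier, as OS packagers do.
    static String toDistVersion(Coordinate c, int distBuild) {
        return c.version() + "-" + distBuild;
    }

    public static void main(String[] args) {
        Coordinate c = new Coordinate("org.example", "util-core", "1.4.2");
        System.out.println(toModuleName(c));
        System.out.println(toDistVersion(c, 3));
    }
}
```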
* Summary
These concepts allow the JPMS requirements to be met with a far greater
flexibility and capability than has previously been proposed. They
enable the module system itself to support advanced features which are
not necessarily directly exploited by the static module path module
configuration, but which then can be consumed and exploited by various
containers and other module-oriented systems. Examples include module
redefinition and unloading, configurable strategies for native library
loading, customized module content loaders, etc. They also allow a
variety of useful module ecosystems to form, targeting many use cases.
--
- DML
More information about the jpms-spec-experts
mailing list