Mutable modules

Fri May 20 15:20:26 UTC 2016

On 05/20/2016 09:12 AM, Alan Bateman wrote:
> On 18/05/2016 22:47, David M. Lloyd wrote:
>>
>> I mean in *our* current concept of a module, we can add/remove/modify
>> the contents of a module (its "class path") at run time.  It is up to
>> the user to ensure that doing so makes sense.
> I don't think I can relate to the use case. As you probably know,
> ZIP files have historically had their central directory mapped into
> memory. Removing or replacing a file that is memory mapped will likely
> cause processes accessing the mapped file to crash (SIGBUS usually).
> So if users are really doing such hairy things they would need a lot of
> insight into what is running and whether the file is opened before
> taking this risk.

No, we don't support replacement of JAR files; more like we symbolically 
remove the resources and add overlays.  Internally, we generally explode 
JAR files for various reasons.  It has nothing to do with JAR files, 
more to do with the logical presence and absence of class files and 
resources.

>> Our modules each correspond to their own class loader: so far so good,
>> we can just have one Module per class loader.  Problem is that we
>> support circularity, and also we support dependencies that go across
>> module systems with isolated namespaces (basically, our module loaders
>> are a higher order of the exact same concept of class loaders).
> If there are cyclic relationships between your modules then it will be
> problematic. Do you see much of this? If you've read Alex's JavaOne
> slides then you'll know that some of us like Kirk Knoernschild's book on
> Java Application Architecture and section "4.4 Cyclic Dependencies - the
> Death Knell" where he poses the question "Are Cycles Always Bad?". I
> don't want to say too much on this topic here as it is listed as an open
> issue on the JSR issues list.

We use circular dependencies in both our static module layouts and also 
in our dynamic deployment system.  I don't think we have a clear path 
out if we can't support them; it will certainly be a difficult situation.

>> Our modules support specifications including the content of the module
>> ("resource loaders") and the dependencies of the module. At run time,
>> custom ModuleLoader implementations can change the resource loader
>> list and/or the dependency list at any time, causing the module to be
>> relinked on the spot; the most useful aspect of this is the ability to
>> incrementally deploy applications which may include circular
>> dependencies.
> Aside from cycles, what other use cases do you have here? I read
> "change the resource loader list" to mean that the set of resources in
> the module changes, which is a bit weird if those resources are class
> files that have already been loaded. Maybe there is dynamic code
> generation with class bytes generated to the file system or into
> somewhere virtual? Or maybe these resources are something else, data
> files? I'm just trying to understand what you mean as we are using
> different terminology.

Within our module concept, the "class path" is a sort of colloquial term 
which refers to the series of resource loaders used to locate classes 
and resources within each module.  Functionally it's somewhat analogous 
to the URLClassPath concept in the JDK, in the way that resources are 
sought (i.e. by a linear search from start to end of the list).
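
As a rough illustration of that linear search (a minimal sketch only: 
ResourceLoader, MapResourceLoader and ResourceLoaderList are hypothetical 
names made up for this example, not the actual API of our module system 
or of the JDK):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction: one entry on a module's "class path".
interface ResourceLoader {
    Optional<byte[]> find(String path);
}

// Illustrative in-memory loader; a real one might read an exploded JAR.
final class MapResourceLoader implements ResourceLoader {
    private final Map<String, byte[]> content;

    MapResourceLoader(Map<String, byte[]> content) {
        this.content = Map.copyOf(content);
    }

    @Override
    public Optional<byte[]> find(String path) {
        return Optional.ofNullable(content.get(path));
    }
}

// The ordered list of loaders; the first loader that has the path wins.
final class ResourceLoaderList {
    private final List<ResourceLoader> loaders;

    ResourceLoaderList(List<ResourceLoader> loaders) {
        this.loaders = List.copyOf(loaders);
    }

    Optional<byte[]> find(String path) {
        for (ResourceLoader loader : loaders) {
            Optional<byte[]> hit = loader.find(path);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }
}
```

Under this sketch, an "overlay" is just a loader placed earlier in the 
list: it shadows any later loader that serves the same path.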

This is separate from the module dependency list, which, when combined 
with the resource loader list, is used to construct an index by path 
(which is a superset of packages that includes not just classes but also 
resources) which refers to dependencies and/or internal resources.

When the resource list is changed, all future lookups for classes and 
resources will use the new index.  If there are already classes loaded 
from the previous list, and those classes are sufficiently incompatible 
with the new code, obviously this will result in errors; however, this 
is usually not the case when (say) making incremental changes during 
development.  This is generally an edge case, but it is one which we 
presently support.

Changing the dependency list has effects that are somewhat similar. 
Existing loaded classes which have already linked against classes of the 
previous dependency may malfunction when this is done.  But in the hot 
deployment situation, most of the time these changes are additive, so 
the effect is generally to enable previously unlinked code to become 
linkable.  This could happen, for example, when deploying a JAR some 
time after a first JAR was deployed, which resolves some missing 
dependencies in the first JAR.
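
The index and relinking behavior described above could be sketched 
roughly as follows.  This is an assumption-laden illustration, not our 
real implementation: RelinkableModule, relink and sourcesFor are 
hypothetical names, and the "local first, then dependencies in order" 
precedence is an assumption made only to keep the example small.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical module whose path index can be replaced at run time.
final class RelinkableModule {
    // path (e.g. "org/example/api") -> ordered names of sources serving it
    private final AtomicReference<Map<String, List<String>>> index =
            new AtomicReference<>(Map.of());

    // Build a fresh index from the module's own paths and the paths each
    // dependency provides, then publish it in one step.  Pass a
    // LinkedHashMap if the order of dependencies matters.
    void relink(Set<String> localPaths, Map<String, Set<String>> dependencyPaths) {
        Map<String, List<String>> fresh = new HashMap<>();
        for (String path : localPaths) {
            fresh.computeIfAbsent(path, k -> new ArrayList<>()).add("<local>");
        }
        for (Map.Entry<String, Set<String>> dep : dependencyPaths.entrySet()) {
            for (String path : dep.getValue()) {
                fresh.computeIfAbsent(path, k -> new ArrayList<>()).add(dep.getKey());
            }
        }
        index.set(Map.copyOf(fresh));
    }

    // Lookups made after a relink see only the new index.
    List<String> sourcesFor(String path) {
        return index.get().getOrDefault(path, List.of());
    }
}
```

In the additive case, a path such as "org/example/api" has no source 
after the first relink() and gains one after a second relink() that 
includes the later JAR.  Nothing in this sketch touches classes that 
were already loaded through the old index, which is exactly where the 
risk described above comes from.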

>> We also support delegating "fallback" class loading decisions to
>> outside suppliers for dynamic class loading behavior (this was done to
>> support a dynamic OSGi environment).  The ongoing integrity of the
>> system is up to the party doing the relinking (the EE deployer or the
>> OSGi resolver); most of the time it can reason about what is "safe"
>> and what might cause system breakage (but still might be useful to do
>> anyway).  These are the features we can't seem to support under
>> Jigsaw, architecturally speaking.
> This sounds like class loader delegation to resolve types that are not
> in the module.

Exactly.  Our OSGi people have told me in the past that OSGi can't 
function to spec without this ability.
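
For readers unfamiliar with the pattern, here is a minimal sketch of 
fallback delegation using only the standard ClassLoader API.  The class 
name FallbackClassLoader, the canLoad helper and the use of a Function 
as the "outside supplier" are all assumptions for illustration; this is 
not how our module loaders or any OSGi framework are actually written.

```java
import java.util.function.Function;

// Hypothetical loader: normal delegation first, then an outside supplier
// (for example a resolver) decides which loader, if any, serves the class.
final class FallbackClassLoader extends ClassLoader {
    private final Function<String, ClassLoader> fallback;

    FallbackClassLoader(Function<String, ClassLoader> fallback) {
        // Null parent: only bootstrap classes resolve without the fallback.
        super((ClassLoader) null);
        this.fallback = fallback;
    }

    // Called by loadClass() only after the bootstrap lookup has failed.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        ClassLoader chosen = fallback.apply(name);
        if (chosen == null) {
            throw new ClassNotFoundException(name);
        }
        return chosen.loadClass(name);
    }

    // Convenience for experiments: true if the class can be resolved.
    boolean canLoad(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Because the supplier is consulted on every miss, its answer can change 
over time, which is what makes the delegation dynamic.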

>> Specifically this includes (but is not limited to) changing the
>> package set associated with a JDK module at run time, something that
>> this native code block makes impossible.  Also the ability to
>> dynamically change module dependencies is an essential ingredient to
>> making this work.
> Suppose that module m has package p and p.C has been loaded. Are you
> saying that you can drop package p from the module?

Yes - although normally you would only drop p if you knew specifically 
that p.C *hadn't* been loaded, for obvious reasons.  It's probably more 
common to *add* than to drop.

> As things currently stand in JDK 9, packages may be added to modules
> at runtime; the main use case is the dynamic proxy to a public interface
> in a non-exported package. So I can relate to adding packages for code
> gen cases, I'm less sure about a module starting out as an XML API and
> suddenly changing into a JDBC driver. Do you really mean the same module
> instance?

A more apt example might be to update a part of a module which hasn't 
yet been loaded.

>> In my view, architecturally speaking, most of the constraints imposed
>> by the core module framework should be layer policy.  If the system's
>> core module layer wants to maintain strict, static integrity, name
>> constraints, version syntax and semantics, etc., that's fine, but why
>> should all modules everywhere be forced to the same constraints?
> Using module names as an example, it should be possible to develop
> a module that is deployed on the application module path or instantiated
> in a layer of modules that a container creates. The author of the module
> (that chooses the name) isn't going to know in advance how the module is
> deployed. I'm not even sure how such a module could be compiled or how
> anyone could depend on it when the characters or format can vary like
> this. I see there is an issue on the JSR issues list so I don't want to
> say any more on this topic.

The idea of a "module" that I keep referring to is one that is described 
in the JSR requirements: "... named, self-describing program components 
consisting of code and data".  A plain old JAR file meets this 
definition just as readily as the current Jigsaw concept.  Java 8 javac 
can compile things that more than meet the definition of "module".  The 
important and relevant part of a module is the ABI it exposes, not its 
name or even its internal structure (which should be encapsulated in 
numerous senses of the word).  This becomes even more clear when you are 
assembling an environment consisting of hundreds of artifacts from 
hundreds of authors.  We package many, many artifacts whose authors 
never gave a thought to modularity at all.

The responsibility of assigning a name to an ABI and behavior has to lie 
with the environment assembler; it just doesn't make sense in the hands 
of the original author, who is necessarily concerned only with the 
parameters of their problem space, and not with that of any greater 
ecosystem in which their project might find itself.  This is the very 
definition of encapsulation.

For an example of why we may need flexible naming, I can deploy an 
application into a Java EE container called "my-cool-application.ear". 
In this case, I might expect my module to be named 
"my-cool-application.ear" or perhaps "my-cool-application".  I might 
expect nested JARs to correspond to modules named by their relative JAR 
locations (including "/" separators).  Really the only constraint is the 
validity of the name on the filesystem.  Java EE certainly places no 
limitations on this today.

I might name my modules based on Maven group and artifact IDs, which 
have different syntax requirements, or by OSGi bundle name.  You get the 
idea.
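
To make the naming mismatch concrete, the following sketch checks a few 
such names against the JDK's own module name rules.  It assumes the 
java.lang.module API as it was eventually released in JDK 9 and later 
(which postdates this thread); ModuleNameCheck and isLegalJdkModuleName 
are hypothetical helper names.

```java
import java.lang.module.ModuleDescriptor;

// Hypothetical helper: asks the JDK whether a string is a legal module name.
final class ModuleNameCheck {
    static boolean isLegalJdkModuleName(String name) {
        try {
            // newModule() rejects names that are not dot-separated
            // Java identifiers by throwing IllegalArgumentException.
            ModuleDescriptor.newModule(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With this helper, "my-cool-application.ear" and a relative location 
such as "lib/util.jar" are rejected, while "my.cool.application" is 
accepted.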

The point is that, yes, you are correct: you *don't* know how a module 
is going to be deployed; not before Java 9 and likely not after it 
either.  It doesn't even make any sense to define the module name inside 
the module when the name is going to be 100% dependent on the 
environment in which the module is used.

You can't, on one hand, define a universal namespace and syntax for 
modules and their versions in the JDK or establish hard constraints on 
layer and module graph structure, and on the other hand expect other 
module systems with differing existing constraints to unify on the JDK 
module system.  You're basically cutting these systems off at the knees 
and forcing them to reinvent everything, unless you completely 
coincidentally have a system that already conforms to this structure (if 
so, you are either very fortunate or maybe starting off in a rather 
privileged position).

>> There is no way that existing containers and class loading
>> environments (other than, apparently, WebLogic) can conform to
>> Jigsaw's constraints without losing functionality (and I'm trying hard
>> to find ways to make it work).  This is where most of my raised issues
>> are coming from.
> The module system imposes surprisingly few constraints.  If you are using
> your own class loaders then the delegation needs to respect module
> readability, something that should not be controversial.

See earlier posts about the controversy of making "public" no longer be 
"public".  The email thread is still unresolved and unanswered.

> However it is possible that you are still at the starting line because
> you have a dependency graph with cycles and/or modules that don't have
> names that can be expressed as a Java identifier, is that right?

Yes, and also isolated module namespaces which nevertheless need to link 
with one another.  Also version syntax and schemes which are not 
compatible with the Jigsaw scheme.  But these issues are all raised in 
the document.

>> All these problems seem surmountable to me, but it becomes
>> substantially more difficult when it is necessary to report all of a
>> module's packages to the module when it is created, since this
>> information is now not easily changed.
> I'm surprised that this is an issue as module membership is critical to
> access control.

Sure, I get that, but it's only critical to *new* access control rules 
that were introduced with Jigsaw.  The classical rules wherein 
public=public that we have relied on this past decade would ease this 
situation substantially, since in this case the only reason module 
information would pass to the JVM would be for diagnostics; since each 
class has a module membership, the JVM theoretically would already have 
access to everything it needs for this purpose.

As long as we're talking access control though... the idea of replacing 
these rules with "friend packages", and using them to selectively expand 
package-private access, has never been resolved or even seriously 
discussed as far as I can see.  It's been brought up several times and 
basically ignored.  But this idea would allow not only modules but *all* 
Java code to take advantage of better security, by removing public 
qualifiers from things that are not public instead of relying on special 
packages.  Package identity is defined by class loader and package name, 
so modules are not central to the concept, though they can easily take 
advantage of it.  This makes far more sense to me at least (in 
particular, within the JDK code itself - all those "shared secrets" 
classes!) and imposes far less risk on the access control model by 
keeping it homogeneous instead of making it bi-layered.  I would dearly 
like to resurrect this discussion at some point.  Readability and 
exports have been nothing but a problem for users as far as I can see; 
the current security model certainly isn't doing me any favors.

-- 
- DML