Why package deps won't work (Was: Re: Converting plain JARs to Java modules)

Mon Nov 14 18:33:45 PST 2011

Hi David,

More than happy to talk about reality, though Peter already was doing
so. Nothing he said was theoretical; it came directly from his
experiences and the experiences of developers who have been using OSGi
for around 12 years.

Before digging into the points in your message, I need to address the
subject line "Why package deps won't work". Package dependencies will
work and they do work, as is proven every day by the large number of
applications running on OSGi. To claim otherwise would be wilfully
ignorant, so I will assume you are being hyperbolic and really meant
to assert that module deps are just better than package deps.
Therefore I intend to argue on that basis and claim that package deps
are better than module deps; note that I don't claim that module
dependencies "won't work" because JBoss Modules is a proof point that
they do.

On Mon, Nov 14, 2011 at 5:05 PM, David M. Lloyd <david.lloyd at redhat.com> wrote:
> The explanation is quite simple, really - each point can be pretty much
> wiped out by a heavy dose of reality.
>
> 1. "M2P leverages the Java type system unlike m2m that must introduce new
> namespaces outside the Java type system." - this is just fluffy
> buzzwordology.  Considering packages part of the Java type system is a
> pretty liberal interpretation of the term "type system" as they're basically
> just arbitrary name spaces.  That said, there's nothing particularly "new"
> about namespacing modules.  JARs have names. Projects have names.  Maven
> uses its own namespaces for artifacts.  Even the primitive JDK extension
> mechanism uses symbolic names.

From Wikipedia (http://en.wikipedia.org/wiki/Type_system): "A type
system associates a type with each computed value ...  the aim is to
prevent operations expecting a certain kind of value being used with
values for which that operation does not make sense."

The following will not compile:

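	import java.awt.List;
	import java.util.ArrayList;
	// ...
	List list = new ArrayList(); // type error: ArrayList is not a java.awt.List
	list.iterator(); // also an error: java.awt.List has no iterator()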

whereas the following will:

	import java.util.ArrayList;
	import java.util.List;
	// ...
	List list = new ArrayList();
	list.iterator(); // etc

Package names are part of the type system because using an incorrect
package name in Java source can result in type errors during
compilation, and because the type and available operations associated
with each value vary with the package name used.

In contrast, it is impossible to obtain a type error at compilation by
incorrectly naming a JAR, project, Maven artefact etc, because none of
those things are ever referenced in Java source code.
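
To make that concrete, here is a minimal sketch (the class name
QualifiedNames is mine, purely for illustration) showing that the two
List types share a simple name but are unrelated types, distinguished
only by their package:

```java
class QualifiedNames {
    // Same simple name, different packages.
    static final Class<?> UTIL_LIST = java.util.List.class;
    static final Class<?> AWT_LIST = java.awt.List.class;

    static boolean sameSimpleName() {
        return UTIL_LIST.getSimpleName().equals(AWT_LIST.getSimpleName());
    }

    static boolean sameType() {
        return UTIL_LIST.equals(AWT_LIST);
    }

    // True only if an ArrayList could be assigned to a java.awt.List variable.
    static boolean arrayListIsAwtList() {
        return AWT_LIST.isAssignableFrom(java.util.ArrayList.class);
    }

    public static void main(String[] args) {
        System.out.println(UTIL_LIST.getName());   // java.util.List
        System.out.println(AWT_LIST.getName());    // java.awt.List
        System.out.println("same simple name: " + sameSimpleName());
        System.out.println("same type: " + sameType());
        System.out.println("ArrayList is an AWT List: " + arrayListIsAwtList());
    }
}
```

The only thing separating the two types is the package part of the
name, which is exactly the part that appears in an import statement.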

> Even a simple convention of the parent-most package name for the name of a
> module is very simple to grasp and is in fact the solution we've used quite
> effectively thus far.  Thus implying that module names are some new alien
> concept is really not valid.

The claim is not that they are a new alien concept. The claim is that
they are not part of the Java language and type system, and as a
result of this (along with other problems) are less useful for
depending upon than packages.

> 2. "M2P can be used to break the transitive dependency chain, m2m suffers of
> excessive coupling." - How about some facts to back this up?  I've found
> "m2m" coupling to be just right.  In JBoss Modules, we do not export
> transitive dependencies by default.  This results in a very simple and clean
> dependency graph between modules.  Using package dependencies results in a
> *far more complex* dependency graph, because you need edges for every
> package even though *most of the time you want the whole module anyway*.
>  Again, real world here.

Peter is definitely also talking about the real world and so am I. In
an m2m dependency model you cannot avoid transitive dependencies,
whether you expose them to consumers or not. An importer of a module
must be assumed to depend on the whole functionality of that module
whether or not that is actually the case, and therefore all of the
transitive dependencies must be present at runtime. In an m2p world we
have the opportunity to split modules and break apart dependency
graphs, because the unit of coupling is more granular.
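
As a rough illustration, here is a toy model, not how OSGi or JBoss
Modules actually resolve anything. The Mod type, the closure() method
and every module and package name in it are invented for this sketch.
It computes which modules must be installed for a consumer to resolve,
first when an API and its implementation live in one module, then
after they have been split apart:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CouplingSketch {

    /** A toy module descriptor; all names used below are made up. */
    static final class Mod {
        final String name;
        final Set<String> exports = new LinkedHashSet<>();
        final Set<String> requiredModules = new LinkedHashSet<>();  // m2m edges
        final Set<String> importedPackages = new LinkedHashSet<>(); // m2p edges

        Mod(String name) { this.name = name; }
        Mod exports(String... pkgs) { exports.addAll(Arrays.asList(pkgs)); return this; }
        Mod requires(String... mods) { requiredModules.addAll(Arrays.asList(mods)); return this; }
        Mod imports(String... pkgs) { importedPackages.addAll(Arrays.asList(pkgs)); return this; }
    }

    /** Names of every module that must be installed for root to resolve. */
    static Set<String> closure(Mod root, Mod... installed) {
        Map<String, Mod> byName = new HashMap<>();
        Map<String, Mod> exporterOf = new HashMap<>();
        for (Mod m : installed) {
            byName.put(m.name, m);
            // Wire each package to a single exporter (the first one installed).
            for (String p : m.exports) exporterOf.putIfAbsent(p, m);
        }
        Set<String> needed = new TreeSet<>();
        Deque<Mod> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Mod m = todo.pop();
            if (!needed.add(m.name)) continue; // already visited
            for (String r : m.requiredModules) todo.push(lookup(byName, r));
            for (String p : m.importedPackages) todo.push(lookup(exporterOf, p));
        }
        return needed;
    }

    static Mod lookup(Map<String, Mod> index, String key) {
        Mod m = index.get(key);
        if (m == null) throw new IllegalStateException("unresolved: " + key);
        return m;
    }

    static final String API = "com.example.store.api";
    static final Mod JDBC = new Mod("jdbc").exports("org.example.jdbc");
    static final Mod XML = new Mod("xml").exports("org.example.xml");

    // Before refactoring: one module holds both the API and its implementation.
    static final Mod STORE =
        new Mod("store").exports(API).imports("org.example.jdbc", "org.example.xml");
    // After refactoring: API and implementation are separate modules.
    static final Mod STORE_API = new Mod("store-api").exports(API);
    static final Mod STORE_IMPL =
        new Mod("store-impl").imports(API, "org.example.jdbc", "org.example.xml");

    static final Mod APP_M2M = new Mod("app").requires("store"); // depends on a module name
    static final Mod APP_M2P = new Mod("app").imports(API);      // depends on a package name

    public static void main(String[] args) {
        System.out.println(closure(APP_M2M, STORE, JDBC, XML));
        System.out.println(closure(APP_M2P, STORE, JDBC, XML));
        System.out.println(closure(APP_M2P, STORE_API, STORE_IMPL, JDBC, XML));
        try {
            closure(APP_M2M, STORE_API, STORE_IMPL, JDBC, XML);
        } catch (IllegalStateException e) {
            System.out.println("m2m consumer after the split: " + e.getMessage());
        }
    }
}
```

In this model the package-importing consumer is untouched by the split
and its closure shrinks to the API module alone, while the
module-requiring consumer has to be edited before it resolves again.
The model only tracks type-level dependencies; something still has to
supply an implementation at runtime.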

> If we're just going to throw dogma around, I'll put it the other way: m2p is
> a design error by the OSGi spec designers which has since been embraced as a
> religion.

Pejorative accusations about "dogma" or "religion" have no place in a
technical discussion. They are also untrue. All of the practising OSGi
developers I know have arrived at their support for package
dependencies as a result of real world experience, not because of some
willingness to bow down to the Almighty CPEG. I don't know of any
practising OSGi developer who has used both Require-Bundle (m2m) and
Import-Package (m2p) and actually prefers the former.

I do know several who started as strong supporters of Require-Bundle
and switched to being supporters of Import-Package, not because of
Peter's wrathful pontificating but because they encountered the
specific problems that he described and found that they were fixed by
using Import-Package, and indeed that everything worked so much more
cleanly that way. I'm in this camp myself, and so was Jeff McAffer
before he went over to the Dark Side (Microsoft).
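
For anyone unfamiliar with the two headers, the following sketch shows
the same hypothetical consumer declared both ways. Require-Bundle and
Import-Package are the real OSGi manifest headers; the bundle and
package names and the version range are invented, and the
ManifestHeaders class is only a convenience for reading them back with
the JDK's own java.util.jar.Manifest parser:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

class ManifestHeaders {

    // m2m: the consumer names the bundle that happens to provide the code.
    static final String M2M =
        "Manifest-Version: 1.0\n"
        + "Bundle-ManifestVersion: 2\n"
        + "Bundle-SymbolicName: com.example.app\n"
        + "Require-Bundle: com.example.store\n";

    // m2p: the consumer names the package it uses, whoever exports it.
    static final String M2P =
        "Manifest-Version: 1.0\n"
        + "Bundle-ManifestVersion: 2\n"
        + "Bundle-SymbolicName: com.example.app\n"
        + "Import-Package: com.example.store.api;version=\"[1.0,2.0)\"\n";

    /** Returns the value of a main-section header, or null if it is absent. */
    static String header(String manifestText, String name) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
            return manifest.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Require-Bundle: " + header(M2M, "Require-Bundle"));
        System.out.println("Import-Package: " + header(M2P, "Import-Package"));
    }
}
```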

> It offers no significant benefit, other than a couple of edge
> cases which are frankly just as well handled by m2m simply by adding package
> filters.  Which, by the way, I haven't seen the need to do yet in our 200+
> module environment, but which we do have the capability to do.

Substitution, module refactoring and transitive decoupling are hardly
edge cases. However, I can see how these issues might not yet have come
to the fore in a module system designed for a single product with a
small number of modules, and where that product has not yet been
through the mill of multiple version evolutions.

> M2P is a solution just itching for a problem.

This is also not a very useful statement; I assure you that the
problem came before the solution. Better to show why you think the
problem is invalid or should have been solved differently.

> But you're going to have a
> tough time convincing me that users *want* to have to use special tooling
> because they want to depend on a module which has too many packages to list
> out by hand.  And before you cry "wildcards", be sure to consider that some
> modules use package names which are subordinate to other modules' packages,
> which is a perfectly normal and allowable scenario.  Using wildcards for
> package matching could cause amazing levels of havoc.

OSGi never uses wildcards at runtime, and tools such as bnd do not
need wildcards in order to express package-level dependencies. They
extract the set of packages that were actually used by the code, all
of which are available in the class files. This is possible because
packages are part of the type system (see first point above).

So it's not that I don't want to list dependencies by hand; rather, I
only want to do it once. I am already forced to do it in the import
statements of my Java sources. If I had to repeat that dependency
information -- whether in m2p or m2m form -- then I would run the risk
of it getting out of step with the real dependencies.
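
To show that this information really is sitting in the class files,
here is a simplified sketch. It is not bnd's implementation; the
ReferencedPackages class is my own illustration. It walks the constant
pool of a class file and collects the package of every class that has
a CONSTANT_Class entry:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReferencedPackages {

    /** Packages of every class named by a CONSTANT_Class entry in the class file. */
    static Set<String> scan(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];
        List<Integer> classNameIndexes = new ArrayList<>();
        for (int i = 1; i < count; i++) { // constant pool indexes start at 1
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                         // Utf8
                case 7: classNameIndexes.add(in.readUnsignedShort()); break;   // Class
                case 5: case 6: in.readLong(); i++; break;                     // Long, Double use two slots
                case 3: case 4: case 9: case 10: case 11: case 12:
                case 17: case 18: in.readInt(); break;                         // 4-byte payloads
                case 8: case 16: case 19: case 20:
                    in.readUnsignedShort(); break;                             // 2-byte payloads
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> packages = new TreeSet<>();
        for (int index : classNameIndexes) {
            String name = utf8[index];                 // e.g. java/util/List or [Ljava/lang/Object;
            int start = name.lastIndexOf('[') + 1;     // skip array dimensions
            if (start > 0 && name.charAt(start) == 'L') {
                start++;                               // skip the 'L' of an object array element
            }
            int slash = name.lastIndexOf('/');
            if (slash > start) {                       // no slash: primitive array or default package
                packages.add(name.substring(start, slash).replace('/', '.'));
            }
        }
        return packages;
    }

    /** Convenience wrapper: scans a class file found via the system class loader. */
    static Set<String> scanResource(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("no such resource: " + resourcePath);
            }
            return scan(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Uses a JDK class as sample input so the sketch needs no extra files.
        System.out.println(scanResource("java/util/ArrayList.class"));
    }
}
```

This is deliberately incomplete: types that appear only in field or
method descriptors do not get a CONSTANT_Class entry, so a real tool
has to scan those as well, and then subtract the module's own packages
to arrive at the import list.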

> Using package dependencies means you either must have a master package index
> for linking, or you need a resolver which has to have analyzed every module
> you ever plan to load.  Otherwise, O(1) loading of modules is impossible,
> which is absolutely 100% a deal-breaker for JDK modules which must be
> incrementally installable.  And it forbids having packages with the same
> name in more than one JAR without bringing run-time versioning into the
> fold, which is a terrible, terrible can of worms.

Could you please explain why O(1) is the only acceptable complexity
for installing modules? OSGi does indeed support incremental install
and while I accept it is probably not O(1) for each module, it would
likely be no more than O(N), though I haven't done the maths yet to
prove this. Bear in mind that in practice, for small N, the constant
factors can result in O(1) being more expensive than O(N). I have seen
OSGi used with many thousands of modules, so unless you have some data
and a use-case showing package-level resolution as unacceptably slow,
your concern just sounds like premature optimisation.
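
For what it's worth, the "master package index" need not be anything
exotic. The sketch below is a deliberately naive model that ignores
versions and uses constraints, both of which a real resolver must
handle; the PackageIndex class and all the names in it are
hypothetical. Installing a module costs time proportional to that
module's own exports, and each imported package is then wired with a
single hash lookup:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class PackageIndex {
    private final Map<String, String> exporterByPackage = new HashMap<>();

    /** Incremental install: touches only the packages this module exports. */
    void install(String module, String... exportedPackages) {
        for (String pkg : exportedPackages) {
            // Keep the first exporter; a real framework chooses among candidates.
            exporterByPackage.putIfAbsent(pkg, module);
        }
    }

    /** Wires each imported package to its exporter with one lookup per package. */
    Set<String> wire(String... importedPackages) {
        Set<String> exporters = new LinkedHashSet<>();
        for (String pkg : importedPackages) {
            String exporter = exporterByPackage.get(pkg);
            if (exporter == null) {
                throw new IllegalStateException("no exporter for " + pkg);
            }
            exporters.add(exporter);
        }
        return exporters;
    }

    public static void main(String[] args) {
        PackageIndex index = new PackageIndex();
        index.install("store-api", "com.example.store.api");
        index.install("jdbc", "org.example.jdbc");
        System.out.println(index.wire("com.example.store.api", "org.example.jdbc"));
    }
}
```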

There are two reasons to have packages with the same name in more
than one JAR. The first is a situation called split packages, and it
is highly undesirable because it causes the runtime model of the
package to diverge from the compile-time model, and therefore things
like package-private types and members stop working correctly. For
this reason, OSGi's m2p imports support depending only upon a single
exporter of a particular package, i.e. we do not aggregate all exports
of that package.

Unfortunately split packages are sometimes unavoidable in legacy code
that cannot be refactored, e.g. the JDK. To support such scenarios
OSGi has Require-Bundle, i.e. m2m. This does not negate the problems
associated with m2m, it is simply a trade-off that we face with poorly
factored legacy code.

The second reason for multiple packages with the same name is when you
explicitly want to install multiple versions of a library/API and have
them all available within the same runtime. I wouldn't call this a
"can of worms" exactly because it can be done without too much
trouble, though for the sake of a simple life I personally avoid this
situation unless it's necessary.

> Finally it should be perfectly clear to anyone who has read the original
> requirements document that nothing in this module system should prevent OSGi
> from functioning as it is, so there is absolutely no reason to assume that
> any OSGi implementation is so threatened - especially if m2p linking is as
> superior as has been expressed.  Our module system (which is conceptually
> similar to Jigsaw in many regards) in fact does support our OSGi
> implementation quite effectively without itself implementing OSGi's
> package-to-package resolution (which like I said throws O(1) out the
> window).

I agree that Jigsaw's existence doesn't threaten OSGi's, so long as
Java 8 doesn't actually break OSGi (and if it did so, it would
likewise break many other applications and could not be considered
backwards compatible with Java 7).  The two can interoperate through
m2m-type dependencies. Tim Ellison started Project Penrose for the
purpose of investigating, testing and deepening this collaboration.

Nevertheless, the point that I believe Glyn was making is the
following. We accept that m2m dependencies are probably required for
the JDK, which implies a module system like Jigsaw or
OSGi/Require-Bundle rather than OSGi/Import-Package. However, is it
intended to be used for application modularisation as well? This is of
course a question for the Jigsaw team rather than you, David.

As a result of experience in developing and evolving large real-world
applications using a module system that supports BOTH m2m and m2p
dependencies, I believe it would be very unfortunate if a module
system that supports ONLY m2m were to become widely used in the
application space... not because OSGi can't handle the competition,
but because those applications will be fragile and hard to evolve.

My question for you David is as follows. I understand that you prefer
module dependencies, but do you believe that package dependencies have
no value whatsoever and therefore should not be available to
application developers in the Java 8 module system? If so, why did Red
Hat create an OSGi implementation?

Kind regards
Neil

>
> On 11/14/2011 01:49 AM, Glyn Normington wrote:
>>
>> I look forward to David's elaboration of why he thinks "using packages as
>> a dependency unit is a terrible idea" to balance Peter's clear explanation
>> of the benefits of m2p.
>>
>> Meanwhile, it's worth noting that, according to the requirements document,
>> Jigsaw is aimed at platform modularisation and the platform being
>> modularised has some non-optimal division of types across packages (see the
>> package subsets requirement) which favour m2m dependencies. (Note that
>> Apache Harmony was developed with modularity in mind and was able to exploit
>> m2p, so platform modularisation per se needn't be limited to m2m.)
>>
>> So if Jigsaw excludes m2p, it will then be applicable to certain kinds of
>> legacy code modularisation and less applicable to new module development and
>> modularisation of existing code whose division into packages suits m2p. IIRC
>> this was the original positioning of Jigsaw: for use primarily within the
>> OpenJDK codebase and only exposed for application use because it was too
>> inconvenient to hide it.
>>
>> Regards,
>> Glyn
>>
>> On 12 Nov 2011, at 11:59, Peter Kriens wrote:
>>
>>> Neither my wrath, nor the fact that I rarely if ever get angry is
>>> relevant in this discussion ... This is a technical argument that is
>>> solvable by technical people that share the same goals. I prefer package
>>> dependencies because they address the excessive type coupling problem in
>>> object oriented systems, not because they're part of OSGi. Let me argue my
>>> case.
>>>
>>> Module-to-package dependencies (m2p) are preferable over module-to-module
>>> dependencies (m2m) for many reasons but these are the most important
>>> reasons:
>>>
>>> M2P leverages the Java type system unlike m2m that must introduce new
>>> namespaces outside the Java type system.
>>> M2P can be used to break the transitive dependency chain, m2m suffers of
>>> excessive coupling
>>>
>>> Since the first bullet's benefit should be clear I only argue the more
>>> complex second bullet.
>>>
>>> A module is in many regards like a class. A class encapsulates members,
>>> depends on other members/classes, and makes a few members accessible outside
>>> the class. A module has a similar structure but then with types/packages as
>>> members.
>>>
>>> After the initial success of Object Oriented Programming (OO) it was
>>> quickly learned that reuse did not take place at the expected scale due to
>>> excessive type coupling. The problem was that a class aggregated many
>>> dependencies to simplify its implementation but these dependencies were
>>> unrelated to the contract it implemented. Since class dependencies are
>>> transitive most applications disappointingly became an almost fully
>>> connected graph.
>>>
>>> Java's great innovation was the interface because it broke both the
>>> transitivity and aggregation of dependencies. A class could now express its
>>> dependency (use or implement) on a contract (the interface) and was
>>> therefore fully type decoupled from the opposite side.
>>>
>>> An interface can act as a contract because it names the signature of a
>>> set of methods so that the compiler can verify the client and the
>>> implementer.
>>>
>>> Since a module has a very similar structure to a class it suffers from
>>> exactly the same transitive aggregation of dependencies. This is not a
>>> theory, look at the experiences with Maven
>>> (http://www.sonatype.com/people/2011/04/how-not-to-download-the-internet/)
>>> Again, this is not that maven is bad or developers are stupid, it is the
>>> same underlying force that finally resulted in the Java interface.
>>>
>>> The parallel for the class' interface for modules is a named set of
>>> interfaces. This concept already exists in Java: a package. Looking at
>>> almost all JSRs it is clear that our industry already uses packages as
>>> "interfaces" to provider implementations.
>>>
>>> Therefore, just like a class should not depend on other implementation
>>> types, a module should preferably not depend on other modules. A module
>>> should instead depend on contracts. Since modules will be used to provide
>>> components from different sources managed with different life cycles the
>>> excessive type coupling caused by m2m is even more damaging than in c2c.
>>> Proper use of m2p creates significantly less type coupled systems than m2m,
>>> the benefits should be obvious.
>>>
>>> Since there are use cases for m2m (non-type safe languages for example) I
>>> do believe that Jigsaw should still support m2m. However, it would be
>>> greatly beneficial to our industry if we could take advantage of the lessons
>>> learned with the Java interface and realize how surprisingly important the
>>> Java package actually is in our eco system.
>>>
>>> Kind regards,
>>>
>>>        Peter Kriens
>>>
>>>
>>> On 9 nov. 2011, at 15:04, David M. Lloyd wrote:
>>>
>>>> I'll just state now that using packages as a dependency unit is a
>>>> terrible idea, and not some architectural revelation.  That way, Peter's
>>>> wrath will be largely directed at me. :-)
>>>>
>>>> On 11/09/2011 08:02 AM, Peter Kriens wrote:
>>>>>
>>>>> I agree that tools are needed but we must be careful to not expect
>>>>> tools to stopgap an architectural issue. I think it is important to first do
>>>>> good architectural design leveraging existing tools (e.g. the Java type
>>>>> system) before you try to add new tools. It is such a pity (but all too
>>>>> common) that a design allows for classes of errors that would be impossible
>>>>> with a slightly different design.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>>        Peter Kriens
>>>>>
>>>>>
>>>>> On 9 nov. 2011, at 14:49, Alan Bateman wrote:
>>>>>
>>>>>> On 09/11/2011 13:04, Peter Kriens wrote:
>>>>>>>
>>>>>>> The issue is that maven problems are not caused because maven is bad
>>>>>>> or that pom authors are stupid. The reason is that the module-to-module
>>>>>>> dependency architecture in maven (and Jigsaw) is error prone ...
>>>>>>
>>>>>> This thread started out with someone asking about adding module
>>>>>> declarations to existing JAR files, and in that context, I agree it can be
>>>>>> error prone without good tools. I think things should be a lot better when
>>>>>> modules are compiled.
>>>>>>
>>>>>> -Alan.
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> - DML
>>>
>>
>
>
> --
> - DML
>