Advice + proposals regarding automodule naming
rfscholte at apache.org
Tue Jan 17 12:04:08 UTC 2017
In the end every non-jdk.* and non-java.* module in the module-info will
be a dependency in your build tool descriptor. Each such module must match
exactly one versionless dependency, or conflictId as we call it, which is
in general the groupId + artifactId (type and classifier are not relevant
for this story).
By ignoring the groupId, a module can be referred to by multiple
dependencies, so we can expect collisions. For that reason Brian did a quick
scan over Maven Central to count the number of duplicate artifactIds.
Here are the artifactIds with 100+ groupIds:
maven_artifact_id   count(DISTINCT maven_group_id)   count(maven_group_id)
library             391                              6854
core                312                              8188
common              142                              5084
ui                  138                              1414
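To make the scan concrete, here is a minimal Java sketch of the same kind of count: group groupId:artifactId pairs by artifactId and see how many distinct groupIds share each one. The class name, method name and sample coordinates are made up for illustration; this is not the query Brian ran against Maven Central.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ArtifactIdCollisions {

    // Maps each artifactId to the set of distinct groupIds that publish it.
    // Input strings are assumed to have the form "groupId:artifactId".
    static Map<String, Set<String>> groupIdsByArtifactId(List<String> coordinates) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            result.computeIfAbsent(parts[1], k -> new TreeSet<>()).add(parts[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates, not taken from Maven Central.
        List<String> sample = Arrays.asList(
                "com.example.alpha:library",
                "org.example.beta:library",
                "org.example.beta:core");
        for (Map.Entry<String, Set<String>> e : groupIdsByArtifactId(sample).entrySet()) {
            if (e.getValue().size() > 1) {
                // Every artifactId listed here would be an ambiguous module name.
                System.out.println(e.getKey() + " is shared by " + e.getValue());
            }
        }
    }
}
```

Any artifactId reported by such a count is a name that cannot identify a single dependency once the groupId is dropped.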
In theory I could have a Maven project with 391 'library'-jars on the
classpath without any problem. And as long as they are direct dependencies
I have control over this by simply not adding 'library' as a requirement to
module-info. The issues start when different 'library'-jars are transitive
dependencies and when they are marked as required in the module-info file
of my direct or transitive dependencies.
Developers of the 'library'-jars cannot use library as the module name and
are forced to pick another name. As developer of my project in the end I
decide which versions of dependencies are used. If the 'library'-jar gets
a different module name and my dependency is still referring to the old
module name, the project can't be built.
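As an illustration of that situation, consider the module descriptor of a dependency I do not control. All names below are hypothetical; the sketch only assumes that the dependency's author relied on the filename-derived name of a 'library'-jar.

```java
// module-info.java shipped inside a dependency of my project (hypothetical names)
module com.example.framework {
    // Refers to the name derived from library-1.0.jar. If the authors of that jar
    // later publish it under their own module name, this line no longer resolves.
    requires library;
}
```

I cannot edit this descriptor from my own project, so choosing a newer version of the 'library'-jar is enough to break the build.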
What I expect is that developers are forced to remove the requirements
from their module-info because of the mentioned issues. So instead of
increasing the number of requirements it will be reduced. For that reason we
say either use a unique module name from the beginning (GA) or wait until
a dependency has its own module name before adding it as requirement.
As far as I know this is the first time the JDK/JRE decides (proposes) a
name for an entity based on another entity. There are no relations between
method-, class-, or package-names and there doesn't have to be a relation
between the module name and the filename, so please don't try to do so.
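To show what tying the name to the filename means, here is a minimal Java sketch of a filename-based rule: drop the .jar suffix, cut off what looks like a version, and turn every non-alphanumeric character into a dot. The class, the method and the exact regular expressions are assumptions for illustration, not the algorithm in the current Jigsaw builds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AutoModuleNameSketch {

    // Assumed version marker: a hyphen followed by digits and then a dot or end of name.
    private static final Pattern VERSION = Pattern.compile("-(\\d+(\\.|$))");

    // Derives a module name from a jar filename only; the groupId never enters the picture.
    static String deriveName(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        Matcher m = VERSION.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start()); // drop the version part
        }
        return name.replaceAll("[^A-Za-z0-9]", ".") // dashes and the like become dots
                   .replaceAll("\\.{2,}", ".")      // collapse repeated dots
                   .replaceAll("^\\.|\\.$", "");    // trim a leading or trailing dot
    }

    public static void main(String[] args) {
        // Two jars from different (hypothetical) groupIds share the same filename.
        System.out.println(deriveName("library-1.0.jar"));
        System.out.println(deriveName("my-artifact-2.3.1.jar"));
    }
}
```

Whatever the exact rule is, two jars with the same filename but different groupIds always end up with the same module name.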
On Mon, 16 Jan 2017 16:44:03 +0100, Remi Forax <forax at univ-mlv.fr> wrote:
> Hi Robert,
> the problem with automatic modules is more general than just the name;
> automatic modules also create a flat hierarchy which doesn't map well
> to the Maven artifact descriptor.
> I wonder why you want Maven to use automatic modules, or said
> differently, Maven has a lot of information about the artifact, so why do
> you want to forget all this information when fetching a Maven artifact?
> I think that one problem is that you do not want to create a
> module-info.class from the Maven POM and insert it into the jar because
> it will change the artifact*.
> This kind of module is supported by jigsaw under the name of synthetic
> modules. A synthetic module is a module with a module descriptor not
> created by javac but by another tool.
> In my opinion, automatic modules are interesting when you have a jar that
> does not come from Maven Central but from an ad-hoc build tool and
> will be considered a leaf of the dependency DAG.
> Otherwise, for an existing module system, using a synthetic module seems to
> be a better idea.
> * given you have also the problem of split packages, you also need a way
> to merge several artifacts into one modular jar because it's the easy
> way to solve the split package problem.
> ----- Original Message -----
>> From: "Robert Scholte" <rfscholte at apache.org>
>> To: jpms-spec-experts at openjdk.java.net
>> Cc: "Apache Maven Dev" <dev at maven.apache.org>
>> Sent: Monday, 16 January 2017 10:37:08
>> Subject: Advice + proposals regarding automodule naming
>> This is a message from Robert Scholte and Brian Fox. We have both been
>> talking about this topic for several weeks with other Maven developers and
>> came to the conclusion that we should warn the jigsaw team about their
>> current approach regarding auto modules. We will share our experiences,
>> thoughts and conclusions, and will suggest two proposals.
>> Traditionally, the Java ecosystem has been very mature in terms of
>> and namespacing. The reverse fqdn introduced into the java package was a
>> great choice to ensure classes don’t conflict. Popular build tools such as
>> Maven and nearly all those that followed built upon that this key
>> with the introduction of “GroupId” also using the fqdn as part of the
>> to ensure the coordinates were properly namespaced.
>> We’ve seen some ecosystems diverge from this leading to new challenges
>> that ultimately had to be reversed. A great example can be seen in the
>> “tragic mistake from npm creators”, which was to launch without a
>> namespace concept. Eventually, NPM started running out of useful names and
>> had to backtrack to introduce “scopes”, which is really just a namespace.
>> The real problem here is that the major change in namespace was
>> baked in after several years of momentum without it. It’s taken a long
>> time for tooling and best practice to catch up to scopes and in the
>> interim, people have been left with a dual mode, some namespaced, some
>> non-namespaced situation that has created chaos.
>> The real issue at hand here as we consider behaviors in the jigsaw
>> automodule revolves around two well studied concepts.
>> The most important is the “Default effect”, which states that
>> the default behavior will become the most prominent best practice. A
>> default that uses a filename to generate a very short, un-namespaced
>> module id effectively sets the behavior to create generic names that
>> eventually conflict...exactly what we’ve seen in npm.
>> Additionally, the switching costs introduced in overcoming a default
>> un-namespaced module id to one with a unique namespace is also
>> once you consider all the potential users. This is why API change is
>> and changing the module id after the fact from the default is
>> an API change.
>> The second principle at hand is the “Principle of least astonishment”. We
>> want to find a default that doesn’t violate what most users would
>> to be the most obvious. One could argue the current auto module
>> doesn’t violate this principle, but it’s important to consider alternate
>> suggestions in this light.
>> First, let’s explore the potential downsides if the default effect takes
>> hold with the currently generated auto module id. In Apache Maven, the
>> artifact id is the part of the coordinate that generates the filename.
>> This means that com.somecompany:artifact:version will become
>> artifact-version.jar, which would result in automodule id “artifact”.
>> Armed with this understanding, what does an analysis of the Maven
>> ecosystem have to say about potential conflicts in the automodule id?
>> If we ignore the groupid and version of all the components in the Maven
>> Central repository, we end up with over 13,500 (7% of the total
>> group:artifact combinations) conflicts. This does not consider conflicts
>> across other repositories, or within customer portfolios yet it is
>> telling. Conflicts will happen. In some cases, the number of conflicts for
>> the same common names is well above 100. The list of conflicts as of
>> October, 2016 can be seen here. 
>> At this point, hopefully we’ve made the case for at least establishing a
>> default module id that
>> 1. Uses namespaces to minimize id conflicts when possible
>> 2. Leverages the default effect to create a de facto best practice
>> 3. Follows the principle of least astonishment
>> We have two potential proposals that solve these goals.
>> Proposal 1: Leverage existing coordinates when available.
>> Maven is inarguably the most popular build system for Java components,
>> with Maven Central being the default and largest repository of Java
>> components in the world. By default, every jar built by Maven
>> automatically gets a simple properties file inserted into it with its
>> unique coordinates. Now, not every jar in Central was built with Maven,
>> however 94% of them were, as we can find the pom.properties file in
>> 1,806,023 of the 1,913,561 Central components. Talk about the default
>> effect in action!
>> It’s further important to recognize that given a jar with a
>> declaring coordinates, it means that the project itself has chosen those
>> coordinates as their own name. In other words, this is how they refer to
>> themselves, even if other consumers may not be using Maven directly.
>> If automodule were able to peek inside a jar and generate the default id
>> using the groupid and artifactid present in the file, this would nearly
>> eliminate all instances of id conflict because a significant portion of
>> the Java ecosystem is in fact built with Maven. Additionally, the fact
>> that 1.8 million (and counting) modules would have namespace as the
>> default behavior means we’ve taken a huge step in setting the best
>> practice of picking module ids with a namespace. Additionally, since the
>> project itself has chosen these coordinates and uses them as their
>> distribution mechanism, this follows the principle of least astonishment
>> to consumers regardless of their chosen build system. Finally, since all
>> of the above are true, it’s unlikely the project would need to migrate to
>> a new module id when they adopt jigsaw natively, thus avoiding an API
>> switching cost for their users.
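A rough Java sketch of what Proposal 1 could look like: find the pom.properties file that Maven writes under META-INF/maven/<groupId>/<artifactId>/ and build a name from its groupId and artifactId entries. The class and method names are hypothetical, and joining the two values with a dot is only an assumption, since the proposal does not prescribe a format. Ids containing dashes would need further mapping, which this sketch does not attempt.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Optional;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class PomPropertiesName {

    // Builds a namespaced id from the coordinates, or empty if either one is missing.
    static Optional<String> nameFrom(Properties pomProperties) {
        String groupId = pomProperties.getProperty("groupId");
        String artifactId = pomProperties.getProperty("artifactId");
        if (groupId == null || artifactId == null) {
            return Optional.empty();
        }
        return Optional.of(groupId + "." + artifactId); // assumed format
    }

    // Looks inside the jar and uses the first pom.properties entry that it finds.
    static Optional<String> nameFromJar(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String path = entry.getName();
                if (path.startsWith("META-INF/maven/") && path.endsWith("/pom.properties")) {
                    Properties props = new Properties();
                    try (InputStream in = jarFile.getInputStream(entry)) {
                        props.load(in);
                    }
                    return nameFrom(props);
                }
            }
        }
        return Optional.empty(); // jar was not built by Maven, or the file was stripped
    }
}
```

A jar without the file yields an empty result, which matches the "when available" part of the proposal.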
>> Proposal 2: Drop automodules
>> Right now Jigsaw tries to calculate a module name solely based on the name
>> of the jar file, which now already causes issues. Besides the fact that
>> the module name is not guaranteed unique compared with its Maven
>> coordinate, there are extra transformations which make uniqueness even
>> less likely; e.g. dashes are replaced by dots (which are
>> both valid artifactId characters), and in some cases numbers and their
>> following characters are stripped off. For artifacts like
>> jboss-servlet-api_4.0_spec it makes sense, however we already see issues
>> here where commons-lang, commons-lang2 and commons-lang3 get the same
>> module name, even though they have different artifactIds and contain
>> different packages. Choosing different artifactIds and packages was a very
>> wise decision because it made it possible for these jars to live next to
>> each other. Removing the separation the authors created is very unwise.
>> Another known example is the jsrNNN jars, which now all get jsr as the
>> module name.
>> It is highly unlikely that there is one single rule to capture all the use
>> cases and which always results in a module name we can work with.
>> For that reason the other proposal is to simply drop automodules. Don’t
>> try to come up with a name for unnamed jars. It might look like the
>> feature of automodules makes migrating easier because every dependency
>> will get a name so you can complete your module-info for all requirements, but
>> we expect that once Jigsaw comes up to speed the invalid module names will
>> actually block further development due to name collisions or forced
>> renaming by transitive modular jars.
>> The advantage of this proposal is that library builders are not forced to
>> keep the proposed module name in order to maintain backwards compatibility
>> with the default. Instead library builders can pick a more suitable
>> module name. The modular system doesn’t allow the same package to be
>> exported by multiple jars (and automodules export every package). Library
>> builders can fix this in their new jars, however if end users would
>> require both jars because they were specified as requirements in
>> transitive jars, you cannot compile this project. There’s just no
>> dependency-excludes like Maven has, because “requires” in the module-info
>> really means requires. Dropping automodules will prevent these kinds of
>> issues, because a package can only be exported by a named module.
>> Sure, this means that for end users they cannot refer to every jar in
>> their module-info. But at least if they add a “requires” to their
>> module-info, they can ensure that it’ll always refer to the intended
>> modular jar. With build tools like Maven the chance of missing artifacts
>> on the classpath has already been reduced a lot. In general builds have
>> become quite stable, so we don’t expect that developers will translate
>> dependencies to the module-info file, especially if we warn them about
>> possible consequences of depending on automodules. Only referring to
>> modules and even a single “requires” is already a gain. There’s no need
>> to try to speed this up and give the developer the false impression that
>> it’ll keep working when upgrading to real modular jars. Focus should be on
>> the target, not on the path how to reach it.
>> Dropping the automodules will prevent a lot of discussions about what is
>> the correct way to select a module name and will give the responsibility
>> for the name back to the place where it belongs: the developer.
>> The fact that so much of the npm ecosystem is effectively not-namespaced
>> has actually created potential build time malware injection possibilities.
>> If I know a package in use by a company through log analysis, bug report
>> analysis etc, I could go register the same name in the default repo with a
>> very high semver and know that it’s very likely this would be picked up
>> over the intended internally developed module because there’s
>>  https://en.wikipedia.org/wiki/Default_effect_(psychology)
>>  https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>  http://openjdk.java.net/jeps/261 #Risk and assumptions