Advice + proposals regarding automodule naming

Robert Scholte rfscholte at apache.org
Tue Jan 17 12:04:08 UTC 2017


Hi Rémi,

In the end every non-jdk.* and non-java.* module in the module-info will
be a dependency in your build tool descriptor. Such a module must match
exactly one versionless dependency, or conflictId as we call it, which is
in general the groupId + artifactId (type and classifier are not relevant
for this story).
By ignoring the groupId, a module can be referred to by multiple
dependencies, so we can expect collisions. For that reason Brian did a
quick scan over Maven Central to count the number of duplicate artifactIds.

Here are the artifactIds with 100+ groupIds:
maven_artifact_id  count(DISTINCT maven_group_id)  count(maven_group_id)
library            391                             6854
core               312                             8188
common             142                             5084
ui                 138                             1414
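
To make the counting concrete, here is a minimal sketch in Java of how such a scan could group coordinates by artifactId. The class name, method name and sample coordinates are made up for illustration; this is not the query that was actually run against Maven Central.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class DuplicateArtifactIds {

    // Maps each artifactId to the set of distinct groupIds that publish it.
    // Input strings are "groupId:artifactId" pairs (hypothetical sample data).
    static Map<String, Set<String>> groupIdsByArtifactId(List<String> coordinates) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String ga : coordinates) {
            String[] parts = ga.split(":", 2);
            result.computeIfAbsent(parts[1], k -> new TreeSet<>()).add(parts[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "com.example.alpha:library",
                "org.example.beta:library",
                "org.example.beta:core",
                "net.example.gamma:core",
                "net.example.gamma:ui");
        // Report only artifactIds shared by more than one groupId.
        groupIdsByArtifactId(sample).forEach((artifactId, groupIds) -> {
            if (groupIds.size() > 1) {
                System.out.println(artifactId + " is used by "
                        + groupIds.size() + " groupIds: " + groupIds);
            }
        });
    }
}
```

Every artifactId reported here is a name that several unrelated projects would end up sharing once the groupId is ignored.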

In theory I could have a Maven project with 391 'library'-jars on the  
classpath without any problem. And as long as they are direct dependencies  
I have control over this by simply not adding 'library' as requirement to  
module-info. The issues start when different 'library'-jars are transitive
dependencies and when they are marked as required in the module-info file
of my direct or transitive dependencies.
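
Why all those 'library'-jars compete for one module name can be shown with a simplified sketch. It assumes the name is taken from the jar file name by cutting at the version and turning separators into dots. The class and method names are hypothetical, and this is an approximation for illustration, not the exact JDK algorithm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilenameModuleName {

    // Start of a version suffix: a hyphen, then digits, then a dot or the end of the name.
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    // Simplified, assumed derivation: drop ".jar", cut at the version, map separators to dots.
    static String fromJarFileName(String fileName) {
        String name = fileName.endsWith(".jar")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        Matcher m = VERSION_START.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }
        return name.replaceAll("[^A-Za-z0-9]", ".")
                   .replaceAll("\\.{2,}", ".")
                   .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        // Two hypothetical coordinates with different groupIds but the same artifactId.
        String fromOrgA = fromJarFileName("library-1.0.jar");   // e.g. org.a:library:1.0
        String fromOrgB = fromJarFileName("library-2.3.1.jar"); // e.g. org.b:library:2.3.1
        System.out.println(fromOrgA + " / " + fromOrgB);
        System.out.println("collision: " + fromOrgA.equals(fromOrgB));
    }
}
```

The groupId never appears in the file name, so nothing in this derivation can tell the two jars apart.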

Developers of the 'library'-jars cannot use library as the module name and  
are forced to pick another name. As developer of my project in the end I  
decide which versions of dependencies are used. If the 'library'-jar gets  
a different module name and my dependency is still referring to the old  
module name, the project can't be built.

What I expect is that developers will be forced to remove the requirements
from their module-info because of the issues mentioned. So instead of
increasing, the number of requirements will be reduced. For that reason we
say: either use a unique module name from the beginning (GA) or wait until
a dependency has its own module name before adding it as a requirement.
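
A GA-based name could, for instance, be derived from the groupId and artifactId that Maven records in pom.properties. The sketch below assumes a "<groupId>.<artifactId>" convention and replaces characters that cannot appear in a module name with underscores. That convention, the class and method names, and the sample coordinates are assumptions for illustration; neither the JDK nor Maven defines them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class GaModuleName {

    // Assumed convention (not defined in this thread): "<groupId>.<artifactId>",
    // with every character other than letters, digits and dots replaced by '_'.
    static String fromPomProperties(Properties pom) {
        String groupId = pom.getProperty("groupId");
        String artifactId = pom.getProperty("artifactId");
        if (groupId == null || artifactId == null) {
            throw new IllegalArgumentException("groupId and artifactId are required");
        }
        return (groupId + "." + artifactId).replaceAll("[^A-Za-z0-9.]", "_");
    }

    public static void main(String[] args) throws IOException {
        // Content shaped like a Maven-generated pom.properties (hypothetical coordinates).
        String content = "groupId=org.example.alpha\nartifactId=library\nversion=1.0\n";
        Properties pom = new Properties();
        pom.load(new StringReader(content));
        System.out.println(fromPomProperties(pom));
    }
}
```

Because the groupId is part of the result, two different 'library'-jars no longer map to the same name.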

As far as I know this is the first time the JDK/JRE decides (proposes) a  
name for an entity based on another entity. There are no relations between  
method-, class-, or package-names and there doesn't have to be a relation  
between the module name and the filename, so please don't try to do so.

regards,
Robert

On Mon, 16 Jan 2017 16:44:03 +0100, Remi Forax <forax at univ-mlv.fr> wrote:
> Hi Robert,
> the problem with automatic modules is more general than just the name:
> automatic modules also create a flat hierarchy, which doesn't map well
> to the Maven artifact descriptor.
>
> I wonder why you want Maven to use automatic modules, or, said
> differently: Maven has a lot of information about the artifact, so why do
> you want to forget all this information when fetching a Maven artifact?
>
> I think that one problem is that you do not want to create a  
> module-info.class from the Maven POM and insert it into the jar because  
> it will change the artifact*.
> This kind of module is supported by jigsaw under the name of synthetic
> modules. A synthetic module is a module with a module descriptor not
> created by javac but by another tool.
>
> In my opinion, automatic modules are interesting when you have a jar that
> does not come from Maven Central but from an ad-hoc build tool and
> will be considered a leaf of the dependency DAG.
> Otherwise, for an existing module system, using a synthetic module seems
> to be a better idea.
>
> regards,
> Rémi
>
> * given that you also have the problem of split packages, you also need a
> way to merge several artifacts into one modular jar, because it's the easy
> way to solve the split package problem.
>
> ----- Original Message -----
>> From: "Robert Scholte" <rfscholte at apache.org>
>> To: jpms-spec-experts at openjdk.java.net
>> Cc: "Apache Maven Dev" <dev at maven.apache.org>
>> Sent: Monday, 16 January 2017 10:37:08
>> Subject: Advice + proposals regarding automodule naming
>
>> This is a message from Robert Scholte and Brian Fox. We have both been
>> talking about this topic for several weeks with other Maven developers and
>> came to the conclusion that we should warn the jigsaw team about their
>> current approach regarding auto modules. We will share our experiences,
>> thoughts and conclusions, and will suggest two proposals.
>>
>> Traditionally, the Java ecosystem has been very mature in terms of naming
>> and namespacing. The reverse fqdn introduced into the java package was a
>> great choice to ensure classes don’t conflict. Popular build tools such as
>> Maven and nearly all those that followed built upon this key concept
>> with the introduction of “GroupId”, also using the fqdn as part of the
>> name to ensure the coordinates were properly namespaced.
>>
>> We’ve seen some ecosystems diverge from this, leading to new challenges
>> that ultimately had to be reversed. A great example can be seen in the
>> “tragic mistake from npm creators” [1], which was to launch without a
>> namespace concept. Eventually, NPM started running out of useful names and
>> had to backtrack to introduce “scopes”, which is really just a namespace
>> [2]. The real problem here is that the major change in namespace was
>> baked in after several years of momentum without it. It’s taken a long
>> time for tooling and best practice to catch up to scopes and in the
>> interim, people have been left with a dual mode, some namespaced, some not
>> namespaced situation that has created chaos. [3]
>>
>> The real issue at hand here as we consider behaviors in the jigsaw
>> automodule revolves around two well studied concepts.
>>
>> The most important is the “Default effect” [4], which states that whatever
>> the default behavior is will become the most prominent best practice. A
>> default that uses a filename to generate a very short, un-namespaced
>> module id effectively sets the behavior to create generic names that will
>> eventually conflict... exactly what we’ve seen in npm.
>>
>> Additionally, the switching cost introduced in moving from a default
>> un-namespaced module id to one with a unique namespace is also significant
>> once you consider all the potential users. This is why API change is hard,
>> and changing the module id after the fact from the default is effectively
>> an API change.
>>
>> The second principle at hand is the “Principle of least astonishment” [5].
>> We want to find a default that doesn’t violate what most users would
>> consider to be the most obvious. One could argue the current auto module
>> algorithm doesn’t violate this principle, but it’s important to consider
>> alternate suggestions in this light.
>>
>> First, let’s explore the potential downsides if the default effect takes
>> hold with the currently generated auto module id. In Apache Maven, the
>> artifact id is the part of the coordinate that generates the filename.
>> This means that com.somecompany:artifact:version will become
>> artifact-version.jar, which would result in automodule id “artifact”.
>> Armed with this understanding, what does an analysis of the Maven
>> ecosystem have to say about potential conflicts in the automodule id?
>>
>> If we ignore the groupid and version of all the components in the Maven
>> Central repository, we end up with over 13,500 (7% of the total
>> group:artifact combinations) conflicts. This does not consider conflicts
>> across other repositories, or within customer portfolios, yet it is pretty
>> telling. Conflicts will happen. In some cases, the number of conflicts on
>> the same common names is well above 100. The list of conflicts as of
>> October 2016 can be seen here. [6]
>>
>> At this point, hopefully we’ve made the case for at least establishing a
>> default module id that
>> 1. Uses namespaces to minimize id conflicts when possible
>> 2. Leverages the default effect to create a de facto best practice
>> 3. Follows the principle of least astonishment
>>
>> We have two potential proposals that meet these goals.
>>
>> Proposal 1: Leverage existing coordinates when available.
>>
>> Maven is inarguably the most popular build system for Java components,
>> with Maven Central being the default and largest repository of Java
>> components in the world. By default, every jar built by Maven
>> automatically gets a simple properties file inserted into it with its
>> unique coordinates. Now, not every jar in Central was built with Maven,
>> however 94% of them were, as we can find the pom.properties file in
>> 1,806,023 of the 1,913,561 Central components. Talk about the default
>> effect in action!
>>
>> It’s further important to recognize that given a jar with a pom.properties
>> declaring coordinates, it means that the project itself has chosen those
>> coordinates as their own name. In other words, this is how they refer to
>> themselves, even if other consumers may not be using Maven directly.
>>
>> If automodule were able to peek inside a jar and generate the default id
>> using the groupid and artifactid present in the file, this would nearly
>> eliminate all instances of id conflict because a significant portion of
>> the Java ecosystem is in fact built with Maven. Additionally, the fact
>> that 1.8 million (and counting) modules would have a namespace as the
>> default behavior means we’ve taken a huge step in setting the best
>> practice of picking module ids with a namespace. Additionally, since the
>> project itself has chosen these coordinates and uses them as its primary
>> distribution mechanism, this follows the principle of least astonishment
>> for consumers regardless of their chosen build system. Finally, since all
>> of the above are true, it’s unlikely the project would need to migrate to
>> a new module id when they adopt jigsaw natively, thus avoiding an API
>> switching cost for their users.
>>
>> Proposal 2: Drop automodules
>>
>> Right now Jigsaw tries to calculate a module name solely based on the name
>> of the jar file, which already causes issues. Besides the fact that
>> the module name is not guaranteed to be unique compared with its Maven
>> coordinate, there are extra transformations which make it even less
>> likely to be unique; e.g. dashes are replaced by dots (both of which are
>> valid artifactId characters), and in some cases a number and the
>> characters following it are stripped off. For artifacts like
>> jboss-servlet-api_4.0_spec this makes sense, however we already see issues
>> here where commons-lang, commons-lang2 and commons-lang3 get the same
>> module name, even though they have different artifactIds and contain
>> different packages. Choosing different artifactIds and packages was a very
>> wise decision because it made it possible for these jars to live next to
>> each other. Removing the separation chosen by the authors is a very unwise
>> decision.
>>
>> Another known example is the jsrNNN jars, which now all get jsr as the
>> module name.
>>
>> It is highly unlikely that there is one single rule that captures all the
>> use cases and always results in a module name we can work with.
>>
>> For that reason the other proposal is to simply drop automodules. Don’t
>> try to come up with a name for unnamed jars. It might look like the
>> feature of automodules makes migrating easier because every dependency
>> will get a name, so you can complete your module-info for all
>> requirements, but we expect that once Jigsaw gets up to speed the invalid
>> module names will actually block further development due to name
>> collisions or forced renaming by transitive modular jars.
>>
>> The advantage of this proposal is that library builders are not forced to
>> keep the proposed module name in order to maintain backwards compatibility
>> with the default. Instead, library builders can pick a more suitable
>> module name. The module system doesn’t allow the same package to be
>> exported by multiple jars (and automodules export every package). Library
>> builders can fix this in their new jars, however if end users
>> require both jars because they were specified as requirements in different
>> transitive jars, the project cannot be compiled. There’s just no
>> equivalent of Maven’s dependency excludes, because “requires” in the
>> module-info really means requires. Dropping automodules will prevent this
>> kind of issue, because a package can only be exported by a named module.
>>
>> Sure, this means that end users cannot refer to every jar in
>> their module-info. But at least if they add a “requires” to their
>> module-info, they can ensure that it’ll always refer to the intended
>> modular jar. With build tools like Maven the chance of missing artifacts
>> on the classpath has already been reduced a lot. In general builds have
>> become quite stable, so we don’t expect that developers will translate all
>> dependencies to the module-info file, especially if we warn them about the
>> possible consequences of depending on automodules. Only referring to named
>> modules, even with a single “requires”, is already a gain. There’s no
>> reason to try to speed this up and give the developer the false impression
>> that it’ll keep working when upgrading to real modular jars. Focus should
>> be on the target, not on the path to reach it.
>>
>> Dropping the automodules will prevent a lot of discussions about what is
>> the correct way to select a module name and will give the responsibility
>> for the name back to the place where it belongs: the developer.
>>
>> [1] http://stackoverflow.com/questions/22053381/lack-of-available-module-names-on-npm
>> [2] http://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
>> [3] The fact that so much of the npm ecosystem is effectively
>> not-namespaced has actually created potential build time malware
>> injection possibilities. If I know of a package in use by a company
>> through log analysis, bug report analysis etc, I could potentially go
>> register the same name in the default repo with a very high semver and
>> know that it’s very likely this would be picked up over the intended
>> internally developed module because there’s no namespace.
>> [4] https://en.wikipedia.org/wiki/Default_effect_(psychology)
>> [5] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>> [6] https://docs.google.com/spreadsheets/d/1TVR5uTpDYw0827AlvPRu8l95zHnFPL_g61TdPtnjQ5M/edit?usp=sharing
>> [7] http://openjdk.java.net/jeps/261 #Risk and assumptions
>> [8] https://www.mail-archive.com/jigsaw-dev@openjdk.java.net/msg06623.html

