Advice + proposals regarding automodule naming

forax at univ-mlv.fr
Tue Jan 17 22:11:11 UTC 2017


Robert,
I fully agree with you that Maven cannot use automatic modules.
Automatic modules have weird naming rules, export everything, and declare no dependencies of their own*, so they are useless if you already have a trove of information like the Maven POM.

In my opinion, the real question is not how to map existing Maven artifacts to Java modules but rather
how Maven 4 artifacts are mapped to Java modules, and then how to make the transition from Maven 3 artifacts to Maven 4 artifacts as smooth as possible.

Here is my take on what a Maven 4 artifact could be:
 - a Maven 4 artifact can only depend on other Maven 4 artifacts (and there is some way to see a Maven 3 artifact as a Maven 4 artifact if the POM is simple enough),
 - a Maven 4 artifact does not allow split packages (a lot of Maven 3 artifacts use split packages because it's a cool way to do an after-the-fact modularisation
   without changing the name of the module),
 - a Maven 4 artifact's info is specified with info extracted from the module-info and from the POM
   (version is in the POM, exported packages are in the module-info, ...)
 etc.

Once you have the precise rules, it will be easier to see how to map a Maven 3 artifact to a Maven 4 artifact and what the compatibility rules are.

regards,
Rémi

* except if you want to play with configurations that mix modulepath and classpath, but these kinds of configurations are really hard to debug.

----- Original Message -----
> From: "Robert Scholte" <rfscholte at apache.org>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: jpms-spec-experts at openjdk.java.net, "Brian Fox" <brianf at sonatype.com>
> Sent: Tuesday, 17 January 2017 13:04:08
> Subject: Re: Advice + proposals regarding automodule naming

> Hi Rémi,
> 
> In the end every non-jdk.* and non-java.* module in the module-info will
> be a dependency in your buildtool descriptor. Such module must match
> exactly one versionless dependency, or conflictId as we call it, which is
> in general the groupId + artifactId (type and classifier are not relevant
> for this story).
> By ignoring the groupId a module can be referred to by multiple dependencies. So
> we can expect collisions. For that reason Brian did a quick scan over
> Maven Central to count the number of duplicate artifactIds.
> 
> Here are the artifactIds with 100+ groupIds:
> maven_artifact_id  count(DISTINCT maven_group_id)  count(maven_group_id)
> library            391                             6854
> core               312                             8188
> common             142                             5084
> ui                 138                             1414
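
The kind of scan described above can be sketched in a few lines. This is only an illustration, not the query Brian ran: the class name, the method name, and the sample coordinates are all made up, and the input is assumed to be a plain list of "groupId:artifactId" strings.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ArtifactIdCollisions {

    // Groups "groupId:artifactId" strings by artifactId and collects the
    // distinct groupIds that publish an artifact under that name.
    static Map<String, Set<String>> groupIdsByArtifactId(List<String> coordinates) {
        return coordinates.stream()
                .map(c -> c.split(":"))
                .collect(Collectors.groupingBy(
                        parts -> parts[1],
                        TreeMap::new,
                        Collectors.mapping(parts -> parts[0], Collectors.toSet())));
    }

    public static void main(String[] args) {
        // Hypothetical coordinates, for illustration only.
        List<String> coordinates = List.of(
                "com.example.a:library",
                "org.example.b:library",
                "org.example.b:core",
                "net.example.c:unique-name");

        // Print only the artifactIds claimed by more than one groupId.
        groupIdsByArtifactId(coordinates).forEach((artifactId, groupIds) -> {
            if (groupIds.size() > 1) {
                System.out.println(artifactId + " " + groupIds.size());
            }
        });
    }
}
```

Every artifactId printed by such a scan is a name that could not be used as a module name by all of its publishers at once.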
> 
> In theory I could have a Maven project with 391 'library'-jars on the
> classpath without any problem. And as long as they are direct dependencies
> I have control over this by simply not adding 'library' as requirement to
> module-info. The issues start when different 'library'-jars are transitive
> dependencies and when they are marked as required in the module-info file
> of my direct or transitive dependencies.
> 
> Developers of the 'library'-jars cannot use library as the module name and
> are forced to pick another name. As developer of my project in the end I
> decide which versions of dependencies are used. If the 'library'-jar gets
> a different module name and my dependency is still referring to the old
> module name, the project can't be built.
> 
> What I expect is that developers are forced to remove the requirements
> from their module-info because of the mentioned issues. So instead of
> increasing, the number of requirements will be reduced. For that reason we
> say either use a unique module name from the beginning (GA) or wait until
> a dependency has its own module name before adding it as requirement.
> 
> As far as I know this is the first time the JDK/JRE decides (proposes) a
> name for an entity based on another entity. There are no relations between
> method-, class-, or package-names and there doesn't have to be a relation
> between the module name and the filename, so please don't try to do so.
> 
> regards,
> Robert
> 
> On Mon, 16 Jan 2017 16:44:03 +0100, Remi Forax <forax at univ-mlv.fr> wrote:
>> Hi Robert,
>> the problem with automatic modules is more general than just the name,
>> automatic modules also create a flat hierarchy which doesn't map well
>> with the Maven artifact descriptor.
>>
>> I wonder why you want Maven to use automatic modules, or said
>> differently, Maven has a lot of information about the artifact, why do
>> you want to forget all this information when fetching a Maven artifact?
>>
>> I think that one problem is that you do not want to create a
>> module-info.class from the Maven POM and insert it into the jar because
>> it will change the artifact*.
>> This kind of module is supported by jigsaw under the name of synthetic
>> modules. A synthetic module is a module with a module descriptor not
>> created by javac but by another tool.
>>
>> In my opinion, automatic modules are interesting when you have a jar that
>> does not come from Maven Central but comes from an ad-hoc build tool and
>> will be considered as a leaf of the dependency DAG.
>> Otherwise, for an existing module system, using a synthetic module seems to
>> be a better idea.
>>
>> regards,
>> Rémi
>>
>> * given you have also the problem of split packages, you also need a way
>> to merge several artifacts into one modular jar because it's the easy
>> way to solve the split package problem.
>>
>> ----- Original Message -----
>>> From: "Robert Scholte" <rfscholte at apache.org>
>>> To: jpms-spec-experts at openjdk.java.net
>>> Cc: "Apache Maven Dev" <dev at maven.apache.org>
>>> Sent: Monday, 16 January 2017 10:37:08
>>> Subject: Advice + proposals regarding automodule naming
>>
>>> This is a message from Robert Scholte and Brian Fox. We have both been
>>> talking about this topic for several weeks with other Maven developers and
>>> came to the conclusion that we should warn the jigsaw team about their
>>> current approach regarding auto modules. We will share our experiences,
>>> thoughts, and conclusions, and will suggest two proposals.
>>>
>>> Traditionally, the Java ecosystem has been very mature in terms of naming
>>> and namespacing. The reverse fqdn introduced into the java package was a
>>> great choice to ensure classes don’t conflict. Popular build tools such as
>>> Maven and nearly all those that followed built upon this key concept
>>> with the introduction of “GroupId”, also using the fqdn as part of the name
>>> to ensure the coordinates were properly namespaced.
>>>
>>> We’ve seen some ecosystems diverge from this, leading to new challenges
>>> that ultimately had to be reversed. A great example can be seen in the
>>> “tragic mistake from npm creators” [1] which was to launch without a
>>> namespace concept. Eventually, NPM started running out of useful names and
>>> had to backtrack to introduce “scopes” which is really just a namespace
>>> [2]. The real problem here is that the major change in namespace was
>>> baked in after several years of momentum without it. It’s taken a long
>>> time for tooling and best practice to catch up to scopes and in the
>>> interim, people have been left with a dual mode, some namespaced, some not
>>> namespaced situation that has created chaos. [3]
>>>
>>> The real issue at hand here as we consider behaviors in the jigsaw
>>> automodule revolves around two well studied concepts.
>>>
>>> The most important is the “Default effect” [4] which states that whatever
>>> the default behavior is will become the most prominent best practice. A
>>> default that uses a filename to generate a very short, un-namespaced
>>> module id effectively sets the behavior to create generic names that will
>>> eventually conflict... exactly what we’ve seen in npm.
>>>
>>> Additionally, the switching cost introduced in overcoming a default
>>> un-namespaced module id to one with a unique namespace is also significant
>>> once you consider all the potential users. This is why API change is hard,
>>> and changing the module id after the fact from the default is effectively
>>> an API change.
>>>
>>> The second principle at hand is the “Principle of least astonishment” [5].
>>> We want to find a default that doesn’t violate what most users would
>>> consider to be the most obvious. One could argue the current auto module
>>> algorithm doesn’t violate this principle, but it’s important to consider
>>> alternate suggestions in this light.
>>>
>>> First, let’s explore the potential downsides if the default effect takes
>>> hold with the currently generated auto module id. In Apache Maven, the
>>> artifact id is the part of the coordinate that generates the filename.
>>> This means that com.somecompany:artifact:version will become
>>> artifact-version.jar, which would result in automodule id “artifact”.
>>> Armed with this understanding, what does an analysis of the Maven
>>> ecosystem have to say about potential conflicts in the automodule id?
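
To make the collision concrete, here is a minimal sketch of a filename-based derivation. It assumes the rule as documented for java.lang.module.ModuleFinder.of (drop ".jar", cut at the first "-<digits>" version marker, turn non-alphanumerics into dots); the rule in the build discussed in this thread may have differed in detail, for example in how trailing digits were handled. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutoModuleNameSketch {

    // A hyphen followed by digits and then a dot or end of string is
    // treated as the start of the version part of the filename.
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    static String deriveName(String jarFileName) {
        // Drop the ".jar" suffix if present.
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;

        // Keep only what precedes the version marker.
        Matcher m = VERSION_START.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }

        // Non-alphanumerics become dots, repeated dots collapse,
        // leading and trailing dots are removed.
        return name.replaceAll("[^A-Za-z0-9]", ".")
                   .replaceAll("\\.{2,}", ".")
                   .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        // com.somecompany:artifact:1.0 and org.other:artifact:2.3.1 both
        // end up on disk as artifact-<version>.jar.
        System.out.println(deriveName("artifact-1.0.jar"));   // artifact
        System.out.println(deriveName("artifact-2.3.1.jar")); // artifact
    }
}
```

The groupId never reaches the filename, so nothing in this derivation can tell the two jars apart.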
>>>
>>> If we ignore the groupid and version of all the components in the Maven
>>> Central repository, we end up with over 13,500 (7% of the total
>>> group:artifact combinations) conflicts. This does not consider conflicts
>>> across other repositories, or within customer portfolios, yet it is pretty
>>> telling. Conflicts will happen. In some cases, the number of conflicts on
>>> the same common names is well above 100. The list of conflicts as of
>>> October, 2016 can be seen here. [6]
>>>
>>> At this point, hopefully we’ve made the case for at least establishing a
>>> default module id that
>>> 1. Uses namespaces to minimize id conflicts when possible
>>> 2. Leverages the default effect to create a de facto best practice
>>> 3. Follows the principle of least astonishment
>>>
>>> We have two potential proposals that solve these goals.
>>>
>>> Proposal 1: Leverage existing coordinates when available.
>>>
>>> Maven is inarguably the most popular build system for Java components,
>>> with Maven Central being the default and largest repository of Java
>>> components in the world. By default, every jar built by Maven
>>> automatically gets a simple properties file inserted into it with its
>>> unique coordinates. Now, not every jar in Central was built with Maven,
>>> however 94% of them were, as we can find the pom.properties file in
>>> 1,806,023 of the 1,913,561 Central components. Talk about the default
>>> effect in action!
>>>
>>> It’s further important to recognize that given a jar with a pom.properties
>>> declaring coordinates, it means that the project itself has chosen those
>>> coordinates as their own name. In other words, this is how they refer to
>>> themselves, even if other consumers may not be using Maven directly.
>>>
>>> If automodule were able to peek inside a jar and generate the default id
>>> using the groupid and artifactid present in the file, this would nearly
>>> eliminate all instances of id conflict because a significant portion of
>>> the Java ecosystem is in fact built with Maven. Additionally, the fact
>>> that 1.8 million (and counting) modules would have a namespace as the
>>> default behavior means we’ve taken a huge step in setting the best
>>> practice of picking module ids with a namespace. Additionally, since the
>>> project itself has chosen these coordinates and uses them as their primary
>>> distribution mechanism, this follows the principle of least astonishment
>>> to consumers regardless of their chosen build system. Finally, since all
>>> of the above are true, it’s unlikely the project would need to migrate to
>>> a new module id when they adopt jigsaw natively, thus avoiding an API
>>> switching cost for their users.
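
A rough sketch of what "peeking inside a jar" could look like. It relies on the layout Maven uses for the generated file (META-INF/maven/<groupId>/<artifactId>/pom.properties with groupId and artifactId keys). Everything else is an assumption made for illustration: the class and method names are hypothetical, and joining the two values with a dot is only one possible way to form an id, which the proposal above does not specify.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Optional;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class PomPropertiesModuleId {

    // Hypothetical naming rule: "<groupId>.<artifactId>".
    // Characters that are not legal in a module name are not handled here.
    static Optional<String> fromProperties(Properties props) {
        String groupId = props.getProperty("groupId");
        String artifactId = props.getProperty("artifactId");
        if (groupId == null || artifactId == null) {
            return Optional.empty();
        }
        return Optional.of(groupId + "." + artifactId);
    }

    // Uses the first pom.properties found under META-INF/maven/.
    // A jar that bundles other artifacts may contain several such files;
    // choosing among them is out of scope for this sketch.
    static Optional<String> fromJar(JarFile jar) throws IOException {
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            String name = entry.getName();
            if (name.startsWith("META-INF/maven/") && name.endsWith("/pom.properties")) {
                try (InputStream in = jar.getInputStream(entry)) {
                    Properties props = new Properties();
                    props.load(in);
                    return fromProperties(props);
                }
            }
        }
        return Optional.empty();
    }
}
```

A jar without the file yields an empty result, which leaves the question of what to do with the remaining jars open.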
>>>
>>> Proposal 2: Drop automodules
>>> Right now Jigsaw tries to calculate a module name solely based on the name
>>> of the jar file, which already causes issues. Besides the fact that
>>> the module name is not guaranteed to be unique compared with its Maven
>>> coordinate, there are extra transformations which make it even less
>>> likely to be unique; e.g. dashes are replaced by dots (both are
>>> valid artifactId characters), and in some cases a number and the
>>> characters following it are stripped off. For artifacts like
>>> jboss-servlet-api_4.0_spec it makes sense, however we already see issues
>>> here where commons-lang, commons-lang2 and commons-lang3 get the same
>>> module name,
>>> even though they have different artifactIds and contain different
>>> packages. Choosing different artifactIds and packages was a very wise
>>> decision because it made it possible that these jars could live next to
>>> each other. Removing the separation the authors chose is a very unwise
>>> decision.
>>>
>>> Another known example is the jsrNNN jars, which now all get jsr as the
>>> module name.
>>>
>>> It is highly unlikely that there is one single rule to capture all the use
>>> cases and which always results in a module name we can work with.
>>>
>>> For that reason the other proposal is to simply drop automodules. Don’t
>>> try to come up with a name for unnamed jars. It might look like the
>>> feature of automodules makes migrating easier because every dependency
>>> will get a name, so you can complete your module-info for all requirements,
>>> but we expect that once Jigsaw comes up to speed the invalid module names
>>> will actually block further development due to name collisions or forced
>>> renaming by transitive modular jars.
>>>
>>> The advantage of this proposal is that library builders are not forced to
>>> keep the proposed module name in order to maintain backwards compatibility
>>> with the default. Instead library builders can pick a more suitable
>>> module name. The modular system doesn’t allow the same package to be
>>> exported by multiple jars (and an automodule exports every package). Library
>>> builders can fix this in their new jars, however if end users
>>> require both jars because they were specified as requirements in different
>>> transitive jars, the project cannot be compiled. There’s just no
>>> dependency-excludes like Maven has, because “requires” in the module-info
>>> really means requires. Dropping automodules will prevent these kinds of
>>> issues, because a package can only be exported by a named module.
>>>
>>> Sure, this means that end users cannot refer to every jar in
>>> their module-info. But at least if they add a “requires” to their
>>> module-info, they can ensure that it’ll always refer to the intended
>>> modular jar. With build tools like Maven the chance of missing artifacts
>>> on the classpath has already been reduced a lot. In general builds have
>>> become quite stable, so we don’t expect that developers will translate all
>>> dependencies to the module-info file, especially if we warn them about the
>>> possible consequences of depending on automodules. Only referring to named
>>> modules and even a single “requires” is already a gain. There’s no reason
>>> to try to speed this up and give the developer the false impression that
>>> it’ll keep working when upgrading to real modular jars. Focus should be on
>>> the target, not on the path to reach it.
>>>
>>> Dropping the automodules will prevent a lot of discussions about what is
>>> the correct way to select a module name and will give the responsibility
>>> for the name back to the place where it belongs: the developer.
>>>
>>> [1]
>>> http://stackoverflow.com/questions/22053381/lack-of-available-module-names-on-npm
>>> [2]
>>> http://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
>>> [3] The fact that so much of the npm ecosystem is effectively
>>> not-namespaced has actually created potential build time malware
>>> injection possibilities. If I know of a package in use by a company
>>> through log analysis, bug report analysis etc, I could potentially go
>>> register the same name in the default repo with a very high semver and
>>> know that it’s very likely this would be picked up over the intended
>>> internally developed module because there’s no namespace.
>>> [4] https://en.wikipedia.org/wiki/Default_effect_(psychology)
>>> [5] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>> [6]
>>> https://docs.google.com/spreadsheets/d/1TVR5uTpDYw0827AlvPRu8l95zHnFPL_g61TdPtnjQ5M/edit?usp=sharing
>>> [7] http://openjdk.java.net/jeps/261 #Risk and assumptions
>>> [8]
>>> https://www.mail-archive.com/jigsaw-dev@openjdk.java.net/msg06623.html

