Advice + proposals regarding automodule naming

Nicolai Parlog nipa at codefx.org
Wed Jan 25 08:42:50 UTC 2017


 Hi Robert,

I read the entire ensuing thread but I have to admit that I don't
understand all the details. I was always looking for an answer to this
question:

Why would I use the artifactId as an automatic module name?

I always assumed that the entire project name, which in a world
dominated by Maven Central means groupId:artifactId, would have to be
the automatic module name. It would hence also have to be the JAR name.

Now, it looks like this would be problematic for Maven but isn't that
"just" a problem for some particular build tool's implementation? I
don't quite get how Maven's naming strategy for JARs voids the entire
approach to automatic modules. (Now I wish I hadn't written
"world dominated by Maven". ;) )

 so long ... Nicolai



On 16.01.2017 10:37, Robert Scholte wrote:
> This is a message from Robert Scholte and Brian Fox. We have both
> been discussing this topic for several weeks with other Maven
> developers and came to the conclusion that we should warn the
> Jigsaw team about their current approach regarding automodules. We
> will share our experiences, thoughts, and conclusions, and will
> suggest two proposals.
> 
> Traditionally, the Java ecosystem has been very mature in terms of
> naming and namespacing. The reverse FQDN introduced into the Java
> package was a great choice to ensure classes don’t conflict.
> Popular build tools such as Maven and nearly all those that
> followed built upon this key concept with the introduction of
> “GroupId”, also using the FQDN as part of the name to ensure the
> coordinates were properly namespaced.
> 
> We’ve seen some ecosystems diverge from this, leading to new
> challenges and a choice that ultimately had to be reversed. A great
> example can be seen in the “tragic mistake from npm creators” [1],
> which was to launch without a namespace concept. Eventually, npm
> started running out of useful names and had to backtrack to
> introduce “scopes”, which are really just namespaces [2]. The real
> problem here is that the major change in namespace was baked in
> after several years of momentum without it. It’s taken a long time
> for tooling and best practice to catch up to scopes and, in the
> interim, people have been left with a dual-mode situation, some
> namespaced, some not, that has created chaos. [3]
> 
> The real issue at hand here as we consider behaviors in the Jigsaw
> automodule revolves around two well-studied concepts.
> 
> The most important is the “Default effect” [4], which states that
> whatever the default behavior is will become the most prominent
> best practice. A default that uses a filename to generate a very
> short, un-namespaced module id effectively sets the behavior to
> create generic names that will eventually conflict...exactly what
> we’ve seen in npm.
> 
> Additionally, the switching cost of moving from a default
> un-namespaced module id to one with a unique namespace is
> also significant once you consider all the potential users. This is
> why API change is hard, and changing the module id after the fact
> from the default is effectively an API change.
> 
> The second principle at hand is the “Principle of least
> astonishment” [5]. We want to find a default that doesn’t violate what
> most users would consider to be the most obvious. One could argue
> the current auto module algorithm doesn’t violate this principle,
> but it’s important to consider alternate suggestions in this
> light.
> 
> First, let’s explore the potential downsides if the default effect
> takes hold with the currently generated automodule id. In Apache
> Maven, the artifact id is the part of the coordinate that generates
> the filename. This means that com.somecompany:artifact:version will
> become artifact-version.jar, which would result in automodule id
> “artifact”. Armed with this understanding, what does an analysis of
> the Maven ecosystem have to say about potential conflicts in the
> automodule id?
> 
> If we ignore the groupId and version of all the components in the
> Maven Central repository, we end up with over 13,500 conflicts (7%
> of the total group:artifact combinations). This does not consider
> conflicts across other repositories or within customer portfolios,
> yet it is pretty telling. Conflicts will happen. In some cases, the
> number of conflicts on the same common name is well above 100. The
> list of conflicts as of October 2016 can be seen in [6].
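
To make the analysis above concrete, here is a minimal sketch of how such
conflicts could be counted: group "groupId:artifactId" coordinates by
artifactId alone and keep the ids that more than one group claims. The
class name, method name and sample coordinates are hypothetical; this is
not the tooling that produced the numbers in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ArtifactIdConflicts {

    /**
     * Groups "groupId:artifactId" coordinates by artifactId and returns
     * only the artifactIds that occur under more than one coordinate.
     */
    static Map<String, List<String>> conflicts(List<String> coordinates) {
        Map<String, List<String>> byArtifactId = new TreeMap<>();
        for (String coordinate : coordinates) {
            // Everything after the first ':' is treated as the artifactId.
            String artifactId = coordinate.substring(coordinate.indexOf(':') + 1);
            byArtifactId.computeIfAbsent(artifactId, k -> new ArrayList<>()).add(coordinate);
        }
        // An artifactId used by a single coordinate is not a conflict.
        byArtifactId.values().removeIf(group -> group.size() < 2);
        return byArtifactId;
    }
}
```

Fed with com.example.alpha:utils, org.example.beta:utils and
org.example.beta:core (made-up coordinates), only "utils" is reported,
since it is the one artifactId that two different groups share.
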
> 
> At this point, hopefully we’ve made the case for at least
> establishing a default module id that:
>
> 1. Uses namespaces to minimize id conflicts when possible
> 2. Leverages the default effect to create a de facto best practice
> 3. Follows the principle of least astonishment
> 
> We have two potential proposals that meet these goals.
> 
> Proposal 1: Leverage existing coordinates when available.
> 
> Maven is inarguably the most popular build system for Java
> components, with Maven Central being the default and largest
> repository of Java components in the world. By default, every jar
> built by Maven automatically gets a simple properties file inserted
> into it with its unique coordinates. Now, not every jar in Central
> was built with Maven; however, 94% of them were, as we can find the
> pom.properties file in 1,806,023 of the 1,913,561 Central
> components. Talk about the default effect in action!
> 
> It’s also important to recognize that when a jar contains a
> pom.properties declaring coordinates, the project itself has
> chosen those coordinates as its own name. In other words, this is
> how the project refers to itself, even if other consumers may not
> be using Maven directly.
> 
> If automodule were able to peek inside a jar and generate the
> default id using the groupId and artifactId present in the file,
> this would eliminate nearly all instances of id conflict because a
> significant portion of the Java ecosystem is in fact built with
> Maven. Additionally, the fact that 1.8 million (and counting)
> modules would have a namespace as the default behavior means we’ve
> taken a huge step in setting the best practice of picking module
> ids with a namespace. Furthermore, since the project itself has
> chosen these coordinates and uses them as their primary
> distribution mechanism, this follows the principle of least 
> astonishment to consumers regardless of their chosen build system. 
> Finally, since all of the above are true, it’s unlikely the
> project would need to migrate to a new module id when they adopt
> jigsaw natively, thus avoiding an API switching cost for their
> users.
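
As an illustration of Proposal 1, here is a rough sketch of reading the
coordinates from a jar. It relies on the fact that Maven stores the file
under META-INF/maven/<groupId>/<artifactId>/pom.properties with groupId
and artifactId keys. Joining the two with a dot is purely an assumption,
since the message does not say how the id would be assembled, and the
class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Optional;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class CoordinateModuleName {

    /** Joins groupId and artifactId with a dot (assumed rule, not specified in the proposal). */
    static Optional<String> fromProperties(Properties pom) {
        String groupId = pom.getProperty("groupId");
        String artifactId = pom.getProperty("artifactId");
        if (groupId == null || artifactId == null) {
            return Optional.empty();
        }
        return Optional.of(groupId + "." + artifactId);
    }

    /** Uses the first META-INF/maven/.../pom.properties entry found in the jar, if any. */
    static Optional<String> fromJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.startsWith("META-INF/maven/") && name.endsWith("/pom.properties")) {
                    Properties pom = new Properties();
                    try (InputStream in = jar.getInputStream(entry)) {
                        pom.load(in);
                    }
                    return fromProperties(pom);
                }
            }
        }
        return Optional.empty();
    }
}
```

Two caveats with this sketch: a shaded jar can carry several
pom.properties files, so "first one wins" is a simplification, and an
artifactId containing dashes would still need some mapping before it
could serve as a module name.
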
> 
> Proposal 2: Drop automodules
>
> Right now Jigsaw tries to calculate a module name solely based on
> the name of the jar file, which already causes issues. Besides the
> fact that the module name, unlike the Maven coordinate, is not
> guaranteed to be unique, there are extra transformations which make
> it even less likely to be unique; e.g. dashes are replaced by dots
> (both are valid artifactId characters), and in some cases a number
> and the characters following it are stripped off. For artifacts like
> jboss-servlet-api_4.0_spec it makes sense, however we already see
> issues here where commons-lang, commons-lang2 and commons-lang3 get
> the same module name, even though they have different artifactIds
> and contain different packages. Choosing different artifactIds and
> packages was a very wise decision because it made it possible for
> these jars to live next to each other. Removing the separation
> chosen by the authors is a very unwise decision.
> 
> Another known example is the jsrNNN jars, which now all get jsr as
> the module name.
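
The collisions described above can be reproduced with a toy
approximation of the transformations the message mentions: cut the
version, turn dashes into dots, strip a number and whatever follows it.
This is NOT the JDK's implementation. The exact rules and regular
expressions below are assumptions chosen only to mirror the description,
and the class and method names are hypothetical.

```java
final class ApproximateAutoModuleName {

    /** Derives a module name from a jar file name using assumed, simplified rules. */
    static String derive(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        // Assumed rule: drop a "-<version>" suffix, e.g. "-3.5".
        name = name.replaceFirst("-\\d.*$", "");
        // Dashes (and other non-alphanumeric characters) become dots.
        name = name.replaceAll("[^A-Za-z0-9]+", ".");
        // Assumed rule: strip the first number and everything after it.
        name = name.replaceFirst("\\d.*$", "");
        // Remove any leading or trailing dots left over.
        return name.replaceAll("^\\.+|\\.+$", "");
    }
}
```

Under these assumed rules, commons-lang-2.6.jar and
commons-lang3-3.5.jar both end up as commons.lang, and jsr305-3.0.2.jar
collapses to jsr, which is the kind of clash the message warns about.
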
> 
> It is highly unlikely that there is one single rule that captures
> all the use cases and always results in a module name we can work
> with.
> 
> For that reason the other proposal is to simply drop automodules.
> Don’t try to come up with a name for unnamed jars. It might look
> like the automodules feature makes migrating easier because every
> dependency will get a name, so you can complete your module-info
> for all requirements, but we expect that once Jigsaw gets up to
> speed the invalid module names will actually block further
> development due to name collisions or forced renaming by transitive
> modular jars.
> 
> The advantage of this proposal is that library builders are not
> forced to keep the proposed module name in order to maintain
> backwards compatibility with the default. Instead, library builders
> can pick a more suitable module name. The module system doesn’t
> allow the same package to be exported by multiple jars (and
> automodules export every package). Library builders can fix this
> in their new jars, however if end users require both jars because
> they were specified as requirements in different transitive jars,
> the project cannot be compiled. There’s just no equivalent of
> Maven’s dependency excludes, because “requires” in the
> module-info really means requires. Dropping automodules will
> prevent this kind of issue, because a package can only be
> exported by a named module.
> 
> Sure, this means that end users cannot refer to every jar
> in their module-info. But at least if they add a “requires” to
> their module-info, they can ensure that it’ll always refer to the
> intended modular jar. With build tools like Maven the chance of
> missing artifacts on the classpath has already been reduced a lot.
> In general builds have become quite stable, so we don’t expect that
> developers will translate all dependencies to the module-info file,
> especially if we warn them about the possible consequences of
> depending on automodules. Only referring to named modules and even
> a single “requires” is already a gain. There’s no reason to try to
> speed this up and give the developer the false impression that
> it’ll keep working when upgrading to real modular jars. The focus
> should be on the target, not on the path to reach it.
> 
> Dropping the automodules will prevent a lot of discussions about
> what is the correct way to select a module name and will give the
> responsibility for the name back to the place where it belongs: the
> developer.
> 
> [1] http://stackoverflow.com/questions/22053381/lack-of-available-module-names-on-npm
>
> [2] http://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
>
> [3] The fact that so much of the npm ecosystem is effectively
> not-namespaced has actually created potential build-time malware
> injection possibilities. If I know of a package in use by a company
> through log analysis, bug report analysis etc., I could potentially
> go register the same name in the default repo with a very high
> semver and know that it’s very likely this would be picked up over
> the intended internally developed module because there’s no
> namespace.
>
> [4] https://en.wikipedia.org/wiki/Default_effect_(psychology)
>
> [5] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>
> [6] https://docs.google.com/spreadsheets/d/1TVR5uTpDYw0827AlvPRu8l95zHnFPL_g61TdPtnjQ5M/edit?usp=sharing
>
> [7] http://openjdk.java.net/jeps/261 #Risk and assumptions
>
> [8] https://www.mail-archive.com/jigsaw-dev@openjdk.java.net/msg06623.html
>
> 
-- 

PGP Key:
    http://keys.gnupg.net/pks/lookup?op=vindex&search=0xCA3BAD2E9CCCD509

Web:
    http://codefx.org
        a blog about software development
    https://www.sitepoint.com/java
        high-quality Java/JVM content
    http://do-foss.de
        Free and Open Source Software for the City of Dortmund

Twitter:
    https://twitter.com/nipafx


More information about the jpms-spec-observers mailing list