Advice + proposals regarding automodule naming

Robert Scholte rfscholte at apache.org
Thu Jan 19 15:43:39 UTC 2017


>> Hi Rémi,
>
> Robert,
>
>>
>> I'm getting a JavaOne 2015 déjà vu :)
>
> i was not at JavaOne, so let's say we're progressing; at least, i may
> start to understand your problem better ...
>
>>
>> It seems like you expect there will be a new pom-definition to support
>> this kind of extra information.
>> The current POM modelVersion (4.0.0) is not only used by Maven but by a
>> lot of tools, probably even more than we know of. We wonder if they do
>> XSD checking, so we must be very, very careful with every adjustment. So
>> pom-4.0.0 is a fact with all its restrictions. We are working on
>> pom-5.0.0, but we will always make sure there will also be a pom-4.0.0
>> available (either pre-generated or runtime transformed) for the current
>> tools. Also, its definition should work for any software technology,
>> not just for Java.
>> In the beginning I had the idea of working with new scopes to decide if
>> a dependency belongs to the modulepath or classpath, but there's a
>> strict set of scopes in pom-4.0.0, so again no option. And by now I know
>> this is not required: the info is already there once I can read all
>> module-info files.
>> It would have helped if a modular jar had a different extension, so
>> everyone can see from the *outside* what kind of jar it is.
>
> Testing if a jar is a modular jar or not is easy BTW,
>    ModuleFinder.of(Paths.get("my.jar")).findAll().iterator().next().descriptor().isAutomatic()
>
> true means it's a plain old jar, false means it's a modular jar.
>

I know, I'm already using this trick in the maven-dependency-plugin
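
For reference, the check quoted above can be wrapped in a small helper.
This is only a sketch, assuming Java 9 or later; it is not the code used
in the maven-dependency-plugin, and the class name JarKind is made up for
illustration.

```java
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

// Hypothetical helper, not part of any Maven plugin.
class JarKind {

    /**
     * Returns true when the jar has no module descriptor of its own and
     * would therefore be treated as an automatic module; false when it
     * is a modular jar.
     */
    public static boolean isAutomatic(Path jar) {
        // ModuleFinder.of(...) scans the given path; for a single jar file
        // we expect exactly one module reference.
        Set<ModuleReference> refs = ModuleFinder.of(jar).findAll();
        if (refs.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one module in " + jar);
        }
        return refs.iterator().next().descriptor().isAutomatic();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java JarKind <path-to-jar>");
            return;
        }
        Path jar = Paths.get(args[0]);
        System.out.println(jar + (isAutomatic(jar) ? ": plain old jar" : ": modular jar"));
    }
}
```

Note that the file name matters for a plain jar: the module name is derived
from it, so the path has to end in .jar and yield a legal name.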

>>
>> There's no such thing as a Maven4 artifact: any artifact is a file
>> (often a jar) with a coordinate and an extra file with dependency
>> declarations.
>
> for me, Maven4 artifact == jar + POM v5
>
>> During dependency resolution all build-information is ignored! The
>> problem with the module-info file is comparable with the java bytecode
>> version: you have to go in the jar to get this information.
>
> yes,
> but you do not need to know if it's a modular jar or a plain old jar
> during the dependency resolution, you can trust the Maven Central info,
> and then when installing, you can decide which jars should go on the
> classpath and which ones should go on the modulepath
> (or which ones should be upgraded from a plain jar to a modular jar,
> because you can use the POM info to generate a compatible
> module-info.class)
>

There is absolutely no reason to introduce a new POM version for this. The
required information is inside the jar; even when jars are built with
other tools, the info is there. It is actually a very good thing that
Java 9 doesn't require a brand new Maven. I personally advertised that our
challenge was to make it all work with Maven 3.0 and it does.

bq. "i fully agree with you that Maven can not use automatic modules."
Well, Maven could do it, but we don't want to because we cannot trace it
back to the right dependency AND be 100% sure that this was indeed the
dependency intended to be the automodule.
But this is how we look at it from a Maven perspective. Any other
build tool is free to decide what its strategy will be. As long as
developers can refer to an automatic module in their module-info, such a
jar can end up in Maven Central or any other repository and all build
tools must be able to handle it.

In our case Maven Central could think of adding rules to verify that the
module-info never refers to auto modules, but that's just one repository.
As long as there's support for auto modules, they will show up anywhere
and will become another dependency for a Maven project. If such a
dependency becomes a requirement in the module-info, the build will fail
since Maven has detected an auto module and cannot be 100% sure which
dependency is related to it. So we must advise: don't "require" that
module, which is the opposite of what we want to achieve: best practice
should be to add as many *valid* requirements to the module-info as
possible.

regards,
Robert

>>
>> At the moment I'm pretty far with the maven-compiler-plugin, but now
>> every dependency acts like an automodule. My next step would probably
>> be to analyze every module-info file and decide if jars belong to the
>> classpath or modulepath, only allowing modular jars on the module path
>> because of our concerns.
>
> yes,
> as i said in the previous paragraph, you can also decide that with the  
> help of the POM info, you can try to upgrade the jar to make it modular.
>
>>
>> regards,
>> Robert
>
> regards,
> Rémi
>
>
>>
>> On Tue, 17 Jan 2017 23:11:11 +0100, <forax at univ-mlv.fr> wrote:
>>
>>> Robert,
>>> i fully agree with you that Maven can not use automatic modules.
>>> Automatic modules have weird name rules, everything is exported and
>>> they have no dependencies themselves*, so they are useless if you
>>> already have a trove of info like the Maven POM.
>>>
>>> In my opinion, the real question is not how to map existing Maven
>>> artifacts to Java modules but rather
>>> how Maven 4 artifacts are mapped to Java modules, and then how to make
>>> the transition from Maven 3 artifacts to Maven 4 artifacts as smooth
>>> as possible.
>>>
>>> Here is my take on what can be a Maven 4 artifact,
>>>  - a Maven 4 artifact can only depend on other Maven 4 artifacts (and
>>>    there is some way to see a Maven 3 artifact as a Maven 4 artifact
>>>    if the POM is simple enough),
>>>  - a Maven 4 artifact does not allow split packages (a lot of Maven 3
>>>    artifacts use split packages because it's a cool way to do an
>>>    after-the-fact modularisation without changing the name of the
>>>    module)
>>>  - a Maven 4 artifact info is specified with info extracted from the
>>>    module-info and from the POM
>>>    (version is in the POM, exported packages are in the module-info,
>>>    ...)
>>>  etc.
>>>
>>> once you have the precise rules, it will be easier to see how to map a
>>> Maven 3 artifact to a Maven 4 and what are the compatibility rules.
>>>
>>> regards,
>>> Rémi
>>>
>>> * except if you want to play with configurations that mix modulepath
>>> and classpath, but these kinds of configurations are really hard to
>>> debug.
>>>
>>> ----- Original Message -----
>>>> From: "Robert Scholte" <rfscholte at apache.org>
>>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>>> Cc: jpms-spec-experts at openjdk.java.net, "Brian Fox"
>>>> <brianf at sonatype.com>
>>>> Sent: Tuesday, 17 January 2017 13:04:08
>>>> Subject: Re: Advice + proposals regarding automodule naming
>>>
>>>> Hi Rémi,
>>>>
>>>> In the end every non-jdk.* and non-java.* module in the module-info
>>>> will be a dependency in your build tool descriptor. Such a module must
>>>> match exactly one versionless dependency, or conflictId as we call it,
>>>> which is in general the groupId + artifactId (type and classifier are
>>>> not relevant for this story).
>>>> By ignoring the groupId a module can be referred to by multiple
>>>> dependencies, so we can expect collisions. For that reason Brian did a
>>>> quick scan over Maven Central to count the number of duplicate
>>>> artifactIds.
>>>>
>>>> Here are the artifactIds with 100+ groupIds:
>>>> maven_artifact_id  count(DISTINCT maven_group_id)  count(maven_group_id)
>>>> library            391                             6854
>>>> core               312                             8188
>>>> common             142                             5084
>>>> ui                 138                             1414
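
The kind of count shown in the table above can be sketched in a few lines
of Java. This is only an illustration of the idea, not the query Brian
ran against Maven Central; the class name and the sample coordinates in
the usage note are made up.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
class ArtifactIdCollisions {

    /** Maps each artifactId to the distinct groupIds that publish it. */
    public static Map<String, Set<String>> groupIdsByArtifactId(Collection<String> coordinates) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String coordinate : coordinates) {
            // Expected form: groupId:artifactId (anything after that is ignored).
            String[] parts = coordinate.split(":");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected groupId:artifactId, got " + coordinate);
            }
            result.computeIfAbsent(parts[1], k -> new TreeSet<>()).add(parts[0]);
        }
        return result;
    }

    /** Returns the artifactIds that are used by more than one groupId. */
    public static Set<String> collidingArtifactIds(Collection<String> coordinates) {
        return groupIdsByArtifactId(coordinates).entrySet().stream()
                .filter(e -> e.getValue().size() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Given the made-up coordinates com.acme:library, org.example:library and
org.example:core, only 'library' is reported as colliding, because it is
the only artifactId shared by two groupIds.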
>>>>
>>>> In theory I could have a Maven project with 391 'library'-jars on the
>>>> classpath without any problem. And as long as they are direct
>>>> dependencies I have control over this by simply not adding 'library'
>>>> as a requirement to module-info. The issues start when different
>>>> 'library'-jars are transitive dependencies and when they are marked
>>>> as required in the module-info file of my direct or transitive
>>>> dependencies.
>>>>
>>>> Developers of the 'library'-jars cannot use library as the module name
>>>> and are forced to pick another name. As developer of my project, in
>>>> the end I decide which versions of dependencies are used. If the
>>>> 'library'-jar gets a different module name and my dependency is still
>>>> referring to the old module name, the project can't be built.
>>>>
>>>> What I expect is that developers are forced to remove the requirements
>>>> from their module-info because of the mentioned issues. So instead of
>>>> increasing, the number of requirements will be reduced. For that
>>>> reason we say: either use a unique module name from the beginning (GA)
>>>> or wait until a dependency has its own module name before adding it as
>>>> a requirement.
>>>>
>>>> As far as I know this is the first time the JDK/JRE decides (proposes)
>>>> a name for an entity based on another entity. There are no relations
>>>> between method-, class-, or package-names and there doesn't have to be
>>>> a relation between the module name and the filename, so please don't
>>>> try to do so.
>>>>
>>>> regards,
>>>> Robert
>>>>
>>>> On Mon, 16 Jan 2017 16:44:03 +0100, Remi Forax <forax at univ-mlv.fr>
>>>> wrote:
>>>>> Hi Robert,
>>>>> the problem with automatic modules is more general than just the
>>>>> name, automatic modules also create a flat hierarchy which doesn't
>>>>> map well to the Maven artifact descriptor.
>>>>>
>>>>> I wonder why you want Maven to use automatic modules, or said
>>>>> differently, Maven has a lot of information about the artifact, why do
>>>>> you want to forget all this information when fetching a Maven
>>>>> artifact.
>>>>>
>>>>> I think that one problem is that you do not want to create a
>>>>> module-info.class from the Maven POM and insert it into the jar
>>>>> because it will change the artifact*.
>>>>> This kind of module is supported by jigsaw under the name of
>>>>> synthetic modules. A synthetic module is a module with a module
>>>>> descriptor not created by javac but by another tool.
>>>>>
>>>>> In my opinion, automatic modules are interesting when you have jars
>>>>> that do not come from Maven Central but come from an ad-hoc build tool
>>>>> and will be considered as a leaf of the dependency DAG.
>>>>> Otherwise, for an existing module system, using a synthetic module
>>>>> seems to be a better idea.
>>>>>
>>>>> regards,
>>>>> Rémi
>>>>>
>>>>> * given you also have the problem of split packages, you also need a
>>>>> way to merge several artifacts into one modular jar because it's the
>>>>> easy way to solve the split package problem.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Robert Scholte" <rfscholte at apache.org>
>>>>>> To: jpms-spec-experts at openjdk.java.net
>>>>>> Cc: "Apache Maven Dev" <dev at maven.apache.org>
>>>>>> Sent: Monday, 16 January 2017 10:37:08
>>>>>> Subject: Advice + proposals regarding automodule naming
>>>>>
>>>>>> This is a message from Robert Scholte and Brian Fox. We both have
>>>>>> been talking about this topic for several weeks with other Maven
>>>>>> developers and came to the conclusion that we should warn the jigsaw
>>>>>> team about their current approach regarding auto modules. We will
>>>>>> share our experiences, thoughts and conclusions, and will suggest two
>>>>>> proposals.
>>>>>>
>>>>>> Traditionally, the Java ecosystem has been very mature in terms of
>>>>>> naming and namespacing. The reverse fqdn introduced into the java
>>>>>> package was a great choice to ensure classes don’t conflict. Popular
>>>>>> build tools such as Maven and nearly all those that followed built
>>>>>> upon this key concept with the introduction of “GroupId”, also using
>>>>>> the fqdn as part of the name to ensure the coordinates were properly
>>>>>> namespaced.
>>>>>>
>>>>>> We’ve seen some ecosystems diverge from this, leading to new
>>>>>> challenges that ultimately had to be reversed. A great example can be
>>>>>> seen in the “tragic mistake from npm creators” [1], which was to
>>>>>> launch without a namespace concept. Eventually, NPM started running
>>>>>> out of useful names and had to backtrack to introduce “scopes”, which
>>>>>> is really just a namespace [2]. The real problem here is that the
>>>>>> major change in namespace was baked in after several years of
>>>>>> momentum without it. It’s taken a long time for tooling and best
>>>>>> practice to catch up to scopes and in the interim, people have been
>>>>>> left with a dual mode, some namespaced, some not namespaced situation
>>>>>> that has created chaos. [3]
>>>>>>
>>>>>> The real issue at hand here as we consider behaviors in the jigsaw
>>>>>> automodule revolves around two well studied concepts.
>>>>>>
>>>>>> The most important is the “Default effect” [4], which states that
>>>>>> whatever the default behavior is will become the most prominent best
>>>>>> practice. A default that uses a filename to generate a very short,
>>>>>> un-namespaced module id effectively sets the behavior to create
>>>>>> generic names that will eventually conflict... exactly what we’ve
>>>>>> seen in npm.
>>>>>>
>>>>>> Additionally, the switching cost introduced in overcoming a default
>>>>>> un-namespaced module id to one with a unique namespace is also
>>>>>> significant once you consider all the potential users. This is why
>>>>>> API change is hard, and changing the module id after the fact from
>>>>>> the default is effectively an API change.
>>>>>>
>>>>>> The second principle at hand is the “Principle of least
>>>>>> astonishment” [5]. We want to find a default that doesn’t violate what
>>>>>> most users would consider to be the most obvious. One could argue the
>>>>>> current auto module algorithm doesn’t violate this principle, but
>>>>>> it’s important to consider alternate suggestions in this light.
>>>>>>
>>>>>> First, let’s explore the potential downsides if the default effect
>>>>>> takes hold with the currently generated auto module id. In Apache
>>>>>> Maven, the artifact id is the part of the coordinate that generates
>>>>>> the filename. This means that com.somecompany:artifact:version will
>>>>>> become artifact-version.jar, which would result in automodule id
>>>>>> “artifact”. Armed with this understanding, what does an analysis of
>>>>>> the Maven ecosystem have to say about potential conflicts in the
>>>>>> automodule id?
>>>>>>
>>>>>> If we ignore the groupid and version of all the components in the
>>>>>> Maven Central repository, we end up with over 13,500 (7% of the total
>>>>>> group:artifact combinations) conflicts. This does not consider
>>>>>> conflicts across other repositories, or within customer portfolios,
>>>>>> yet it is pretty telling. Conflicts will happen. In some cases, the
>>>>>> number of conflicts on the same common names is well above 100. The
>>>>>> list of conflicts as of October, 2016 can be seen here. [6]
>>>>>>
>>>>>> At this point, hopefully we’ve made the case for at least
>>>>>> establishing a default module id that
>>>>>> 1. Uses namespaces to minimize id conflicts when possible
>>>>>> 2. Leverages the default effect to create a de facto best practice
>>>>>> 3. Follows the principle of least astonishment
>>>>>>
>>>>>> We have two potential proposals that meet these goals.
>>>>>>
>>>>>> Proposal 1: Leverage existing coordinates when available.
>>>>>>
>>>>>> Maven is inarguably the most popular build system for Java
>>>>>> components, with Maven Central being the default and largest
>>>>>> repository of Java components in the world. By default, every jar
>>>>>> built by Maven automatically gets a simple properties file inserted
>>>>>> into it with its unique coordinates. Now, not every jar in Central
>>>>>> was built with Maven, however 94% of them were, as we can find the
>>>>>> pom.properties file in 1,806,023 of the 1,913,561 central components.
>>>>>> Talk about the default effect in action!
>>>>>>
>>>>>> It’s further important to recognize that given a jar with a
>>>>>> pom.properties declaring coordinates, it means that the project
>>>>>> itself has chosen those coordinates as its own name. In other words,
>>>>>> this is how they refer to themselves, even if other consumers may not
>>>>>> be using Maven directly.
>>>>>>
>>>>>> If automodule were able to peek inside a jar and generate the default
>>>>>> id using the groupid and artifactid present in the file, this would
>>>>>> eliminate nearly all instances of id conflict because a significant
>>>>>> portion of the Java ecosystem is in fact built with Maven.
>>>>>> Additionally, the fact that 1.8 million (and counting) modules would
>>>>>> have a namespace as the default behavior means we’ve taken a huge step
>>>>>> in setting the best practice of picking module ids with a namespace.
>>>>>> Additionally, since the project itself has chosen these coordinates
>>>>>> and uses them as their primary distribution mechanism, this follows
>>>>>> the principle of least astonishment to consumers regardless of their
>>>>>> chosen build system. Finally, since all of the above are true, it’s
>>>>>> unlikely the project would need to migrate to a new module id when
>>>>>> they adopt jigsaw natively, thus avoiding an API switching cost for
>>>>>> their users.
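
Proposal 1 could look roughly like the sketch below. It assumes the usual
location META-INF/maven/<groupId>/<artifactId>/pom.properties inside a
Maven-built jar. Joining groupId and artifactId with a dot is only one
possible naming rule and an assumption here, and the class name is made
up. The sketch does not sanitize characters such as dashes, which are
legal in an artifactId but not in a module name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Optional;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not JDK or Maven code.
class PomPropertiesModuleName {

    /**
     * Looks for a Maven pom.properties inside the jar and, if one is found,
     * suggests "groupId.artifactId" as a namespaced default name.
     * Returns an empty Optional for jars without usable pom.properties.
     */
    public static Optional<String> suggestModuleName(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Shaded jars may carry several of these; the first match wins here.
                if (name.startsWith("META-INF/maven/") && name.endsWith("/pom.properties")) {
                    Properties props = new Properties();
                    try (InputStream in = jarFile.getInputStream(entry)) {
                        props.load(in);
                    }
                    String groupId = props.getProperty("groupId");
                    String artifactId = props.getProperty("artifactId");
                    if (groupId != null && artifactId != null) {
                        return Optional.of(groupId + "." + artifactId);
                    }
                }
            }
        }
        return Optional.empty();
    }
}
```
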
>>>>>>
>>>>>> Proposal 2: Drop automodules
>>>>>> Right now Jigsaw tries to calculate a module name solely based on the
>>>>>> name of the jar file, which already causes issues. Besides the fact
>>>>>> that the module name is not guaranteed to be unique compared with its
>>>>>> Maven coordinate, there are extra transformations which make it even
>>>>>> less likely to be unique; e.g. dashes are replaced by dots (both are
>>>>>> valid artifactId characters), and in some cases the number and the
>>>>>> characters following it are stripped off. For artifacts like
>>>>>> jboss-servlet-api_4.0_spec it makes sense, however we already see
>>>>>> issues here where commons-lang, commons-lang2 and commons-lang3 get
>>>>>> the same module name, even though they have different artifactIds
>>>>>> and contain different packages. Choosing different artifactIds and
>>>>>> packages was a very wise decision because it made it possible that
>>>>>> these jars could live next to each other. Removing the separation
>>>>>> chosen by the authors is a very unwise decision.
>>>>>>
>>>>>> Another known example is the jsrNNN jars, which now all get jsr as
>>>>>> the module name.
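
The collisions described above can be reproduced with a deliberately
simplified model of the naming rules. This is not the JDK's actual
algorithm; it is a sketch that assumes the name is cut at the first
digit and that dashes and other separators become dots, which is enough
to show the effect. The class name and the version numbers in the usage
note are made up for illustration.

```java
// Simplified, hypothetical model of filename-based module naming.
class NaiveAutoModuleName {

    /** Derives a module name from a jar file name under the assumed rules. */
    public static String fromFileName(String fileName) {
        // Drop the .jar extension if present.
        String name = fileName.endsWith(".jar")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;

        // Assumed rule: the first digit and everything after it is stripped.
        int cut = 0;
        while (cut < name.length() && !Character.isDigit(name.charAt(cut))) {
            cut++;
        }
        name = name.substring(0, cut);

        // Dashes and other non-letter characters become dots, then leading
        // and trailing dots are trimmed.
        name = name.replaceAll("[^A-Za-z]+", ".");
        return name.replaceAll("^\\.+|\\.+$", "");
    }
}
```

Under these assumed rules commons-lang-2.6.jar and commons-lang3-3.5.jar
both map to commons.lang, and jsr305-3.0.2.jar and jsr250-api-1.0.jar both
map to jsr.
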
>>>>>>
>>>>>> It is highly unlikely there is one single rule to capture all the use
>>>>>> cases and which always results in a module name we can work with.
>>>>>>
>>>>>> For that reason the other proposal is to simply drop automodules.
>>>>>> Don’t try to come up with a name for unnamed jars. It might look like
>>>>>> the feature of automodules makes migrating easier because every
>>>>>> dependency will get a name so you can complete your module-info for
>>>>>> all requirements, but we expect that once Jigsaw gets up to speed the
>>>>>> invalid module names will actually block further development due to
>>>>>> name collisions or forced renaming by transitive modular jars.
>>>>>>
>>>>>> The advantage of this proposal is that library builders are not
>>>>>> forced to keep the proposed module name in order to maintain
>>>>>> backwards compatibility with the default. Instead library builders
>>>>>> can pick a more suitable module name. The modular system doesn’t
>>>>>> allow the same package to be exported by multiple jars (and
>>>>>> automodules export every package). Library builders can fix this in
>>>>>> their new jars, however if end users would require both jars because
>>>>>> they were specified as requirements in different transitive jars, you
>>>>>> cannot compile this project. There’s just no dependency-excludes
>>>>>> like Maven has, because “requires” in the module-info really means
>>>>>> requires. Dropping automodules will prevent these kinds of issues,
>>>>>> because a package can only be exported by a named module.
>>>>>>
>>>>>> Sure, this means that end users cannot refer to every jar in their
>>>>>> module-info. But at least if they add a “requires” to their
>>>>>> module-info, they can ensure that it’ll always refer to the intended
>>>>>> modular jar. With build tools like Maven the chance of missing
>>>>>> artifacts on the classpath has already been reduced a lot. In general
>>>>>> builds have become quite stable, so we don’t expect that developers
>>>>>> will translate all dependencies to the module-info file, especially
>>>>>> if we warn them about the possible consequences of depending on
>>>>>> automodules. Only referring to named modules and even a single
>>>>>> “requires” is already a gain. There’s no reason to try to speed this
>>>>>> up and give the developer the false impression that it’ll keep
>>>>>> working when upgrading to real modular jars. Focus should be on the
>>>>>> target, not on the path how to reach it.
>>>>>>
>>>>>> Dropping the automodules will prevent a lot of discussions about
>>>>>> what is the correct way to select a module name and will give the
>>>>>> responsibility for the name back to the place where it belongs: the
>>>>>> developer.
>>>>>>
>>>>>> [1]
>>>>>> http://stackoverflow.com/questions/22053381/lack-of-available-module-names-on-npm
>>>>>> [2]
>>>>>> http://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
>>>>>> [3] The fact that so much of the npm ecosystem is effectively
>>>>>> not-namespaced has actually created potential build time malware
>>>>>> injection possibilities. If I know of a package in use by a company
>>>>>> through log analysis, bug report analysis etc, I could potentially
>>>>>> go register the same name in the default repo with a very high semver
>>>>>> and know that it’s very likely this would be picked up over the
>>>>>> intended internally developed module because there’s no namespace.
>>>>>> [4] https://en.wikipedia.org/wiki/Default_effect_(psychology)
>>>>>> [5] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>>>>> [6]
>>>>>> https://docs.google.com/spreadsheets/d/1TVR5uTpDYw0827AlvPRu8l95zHnFPL_g61TdPtnjQ5M/edit?usp=sharing
>>>>>> [7] http://openjdk.java.net/jeps/261 #Risk and assumptions
>>>>>> [8]
>>>>>> https://www.mail-archive.com/jigsaw-dev@openjdk.java.net/msg06623.html


More information about the jpms-spec-observers mailing list