Bryan's comments

Bryan Atsatt bryan.atsatt at oracle.com
Wed May 23 12:52:47 PDT 2007


Stanley M. Ho wrote:
> Hi Bryan,
>
> Bryan Atsatt wrote:
>>> "mandatory" would mean something that we can check and enforce. Since
>>> version is declared using annotation in a superpackage, I don't think we
>>> can enforce it at build time. That said, we can still enforce it at
>>> deployment time if this is what you meant.
>>
>> Yes, the 277 spec should describe it as mandatory, and we should enforce
>> it at runtime. If it becomes possible to enforce it a compile time, we
>> would do so.
>
> Based on the updated specification, the module system will provide a
> default version for the module definition which has no version declared
> in the metadata, so each module definition will always have a version.
> Is there any strong reason why we should reject all the module
> definitions which have no version declared in the metadata rather than
> giving them a default version number?

It is more philosophical than technical: if developers are forced to
declare it, they must be conscious of the implications of that choice.

>
>>>> 2.7.4
>>>>
>>>> There may well be cases in which a module wants to re-export a
>>>> subset of
>>>> imported classes/resources. We should consider supporting this case.
>>>
>>> Could you describe the actual use cases for this?
>>
>> Package a.b contains and exports classes C and D.
>> Package x.y imports a.b, but only wants to re-export C.
>
> I have the same concern as Richard. Unless there are real use cases for
> this, I suggest we should go with re-export at the module granularity as
> we discussed before.
>
>>>> Refactoring in this way results in a perhaps unexpected runtime/memory
>>>> overhead. A pure wrapper module that re-exports must have its own class
>>>> loader, and, even though it won't define any classes, the VM *cache*
>>>> will be updated as that loader is used.
>
> If there is no support for re-exporting subset of imported
> classes/resources, this becomes a non-issue.

Sorry, but this doesn't follow. The VM class cache is updated (at least)
when:

loader.defineClass() is called,
the VM calls loader.internalLoadClass(), or
Class.forName() is called.

The cache records that the initiating loader returned a specific class,
regardless of what loader defined it. This enables
loader.findLoadedClass() to work as expected.

Given that:

a. Module has-a single ClassLoader, and
b. A specific Module re-exports classes from an imported Module.

we have two scenarios to consider:

1. The module has both member and re-exported classes. Its loader will
define the member classes, and the VM cache will also be updated with
mappings from that loader to imported-module Class instances.

2. The module has no member classes, only re-exported ones. Its loader
defines nothing, but the cache will still be updated with mappings from
that loader to imported-module Class instances.

In the latter case, if there is only a single imported module, an
optimization could be for this view module to return the imported
module's loader rather than having its own.

But this is not possible when there are imports from more than one
module: a unique loader instance must be created to satisfy the API
contract of Module.getClassLoader(). Even though it will never define
any classes, the VM will still update the cache.

This latter case is what I am referring to as overhead. Of course, with
appropriate updates to the VM/caching behavior, this overhead could be
eliminated. But this would have to be addressed or the overhead *will*
exist.
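To make the caching behavior concrete, here is a small, self-contained
probe (plain SE code, nothing 277-specific; the class names are just for
illustration):

```java
// Sketch: shows that the VM records an *initiating* loader in its class
// cache even when a different loader *defines* the class. This is the
// mechanism behind the overhead described above.
public class InitiatingLoaderDemo {

    // Expose the protected findLoadedClass() so we can inspect the cache.
    static class ProbeLoader extends ClassLoader {
        ProbeLoader(ClassLoader parent) { super(parent); }
        Class<?> probe(String name) { return findLoadedClass(name); }
    }

    public static void main(String[] args) throws Exception {
        ProbeLoader loader =
            new ProbeLoader(ClassLoader.getSystemClassLoader());

        // Fresh loader: not yet an initiating loader for anything.
        System.out.println("before: " + loader.probe("java.util.ArrayList"));

        // Class.forName(..., loader) goes through the VM, which records
        // 'loader' as an initiating loader in its cache.
        Class<?> c = Class.forName("java.util.ArrayList", false, loader);

        // The bootstrap loader *defined* the class (getClassLoader() is
        // null), yet our loader is now cached as an initiating loader.
        System.out.println("definer: " + c.getClassLoader());
        System.out.println("after:   " + loader.probe("java.util.ArrayList"));
    }
}
```

Run with a pure view loader in place of ProbeLoader and the same cache
growth occurs, even though defineClass() is never called.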

>
>> I think we need to think hard about this issue. The OSGi model of import
>> by *package name* decouples the importer from any explicit binding to a
>> bundle/module name. Refactoring under that model is *much* cleaner, and
>> far more natural. As is the usage model. After all, Foo.java import
>> statements contain package/class names, *not* module names. Programmers
>> think in terms of classes and packages.
>>
>> Peter makes this point pretty strongly, and I have to say I agree
>> wholeheartedly:
>>
>> http://www.aqute.biz/Blog/2006-04-29
>
> I agreed that in some situations it is much better to have dependency
> that is loosely coupled. You may want to check out the service-providers
> strawman that I just sent out, and it deals with the exact issue around
> API vs implementation.

That's great, but sort of beside the point :^).

I am specifically referring to the syntax and semantics of "import"
declarations. At the moment, 277 supports only the "Require-Bundle"
style semantics defined by OSGi. This model is inherently
tightly coupled, and its use is greatly discouraged in the OSGi world.
It was not even present in the initial releases.

I strongly believe that it would be a huge mistake for 277 to support
only this model. If we want to simplify things and support only one
model, then we should choose import by package name/version, *not*
import by module name/version.
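For comparison, here is how OSGi expresses the two styles as manifest
headers (the bundle and package names are made up for illustration):

```
Require-Bundle: com.example.impl;bundle-version="1.0.0"

Import-Package: com.example.api;version="[1.0,2.0)"
```

The first binds the importer to one particular bundle by name; the
second binds it only to a package contract, so the exporting module can
be renamed, split, or replaced without touching any importer.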

>
>>>> 5.6
>>>>
>> Sure, for the manifest of a .jam file. But the statement:
>>
>> "All packages defined in a module definition are inherently sealed, and
>> this entry is ignored by the module system."
>>
>> is pretty broad, and would seem to indicate that it applies to the
>> definitions produced by any module system.
>
> I will see if I can clarify it.
>
>>>> 6.2.3/6.3
>>>>
>>>> For EE and other similar systems, it may be useful to have different
>>>> VisibilityPolicy instances per Repository. We may want to have a
>>>> getter/setter here, with the default implementation of get returning
>>>> the
>>>> default policy.
>>>
>>> Could you describe the use cases you had in mind for this?
>>
>> It gives us a nice way to create wrapper repository instances that
>> provide a customized view...
>>
>> EE systems are required to isolate applications from each other. And
>> each may have very different external dependencies. If each repository
>> instance can have its own VisibilityPolicy, then a wrapper Repository
>> can be constructed for each application, using a different policy.
>>
>> Same goes for ImportOverridePolicy.
>
> Interesting. I will look into this further.
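To make the wrapper idea concrete, here is a rough sketch. The
Repository, ModuleDefinition, and VisibilityPolicy types below are
simplified stand-ins I invented for illustration, not the actual 277
API:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the 277 types (illustration only).
interface ModuleDefinition { String getName(); }
interface VisibilityPolicy { boolean isVisible(ModuleDefinition md); }

abstract class Repository {
    abstract List<ModuleDefinition> findAll();
}

// Per-application wrapper: same backing repository, different policy.
// An EE container would construct one of these per application.
class FilteredRepository extends Repository {
    private final Repository backing;
    private final VisibilityPolicy policy;

    FilteredRepository(Repository backing, VisibilityPolicy policy) {
        this.backing = backing;
        this.policy = policy;
    }

    @Override
    List<ModuleDefinition> findAll() {
        List<ModuleDefinition> visible = new ArrayList<ModuleDefinition>();
        for (ModuleDefinition md : backing.findAll()) {
            if (policy.isVisible(md)) {
                visible.add(md);   // only what this app may see
            }
        }
        return visible;
    }
}
```

The same pattern applies to ImportOverridePolicy: the wrapper delegates
storage to the shared repository and applies only the per-application
policy.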
>
>>>> 6.2.6
>>
>> I'm just considering the simple case of deploying a new module to a
>> LocalRepository. I think it will be a bit surprising if some special act
>> is required to make it available. For example, this would fail:
>>
>>   localRepos.install(aModuleNamedSue);
>>   ModuleDefinition sue = localRepos.find("sue");
>>
>> And it is pretty awkward to insert a call to reload() here. Given that
>> the primary client of Repository is ModuleSystem, I guess the
>> preparation logic can call reload() and re-try IFF a suitable definition
>> cannot be found.
>
> I see your point. The good news is that the updated specification does
> not require "reload" to be invoked in your example because a module
> definition would be available for subsequent searches after it has been
> installed, see Section 6.2.4 in the specification. I hope this would
> address your concern.
>
>>>> 6.5.1
>>
>> Yes, but LocalRepository is a specific implementation, and one that
>> requires an expansion directory for each instance, right? We should make
>> that clear.
>
> No. If two LocalRepository instances load module definitions from the
> same source location, the implementation may share the same expansion
> directory, or maybe not. I think the specification should leave this up
> to the implementors, and I don't see good reason to dictate one
> implementation approach over the other.

Ah. You are thinking that two different JVM vendors might implement this
differently. Fair enough. But this means that the actual location of the
expansion directory must also be defined by the JVM.

But LocalRepository instances will likely need to be created by IDEs
and other environments, and they will want to control the location of
the expansion directory. So I am suggesting that LocalRepository needs
an appropriate ctor to enable the caller to specify the location of the
expansion directory.

I think it makes much more sense to assume that LocalRepository (and
URLRepository) has a constant, well-defined model. JVM vendors are free
to create variant classes, and to instantiate those by default.

>
>>>> 7.1.1/7.1.2
>>>>
>> Thinking about this a bit more, the idea of a "default" module system is
>> a bit odd. Certainly there will be one module system used by the JRE
>> itself, and this is what you were thinking about here. And clearly
>> LocalRepository and URLRepository are hard-wired to this same module
>> system, so it makes some sense to be able to say in the javadoc for
>> these classes that they use the "default" module system (which needs
>> to be added, btw). Do you have any other uses in mind? Perhaps we
>> should be more explicit:
>>
>>    public static ModuleSystem getJREModuleSystem();
>>
>> Regardless, what I was really thinking about before is ModuleSystem
>> initialization.
>>
>> ModuleDefinition has a getModuleSystem() method, but how is it
>> implemented? Our model so far appears to assume that ModuleSystem
>> instances will be shared, which is certainly reasonable, but... how? One
>> simple model is that it is module system dependent, e.g. each module
>> system implementation provides a singleton or equivalent. But this just
>> pushes the problem up a layer, since the runtime type of
>> ModuleDefinition will vary based on module system.
>
> As I mentioned previously, the APIs are still evolving significantly,
> and they do not fully reflect what I have in mind. More specifically,
> I've been planning to make ModuleSystem a service so its providers can
> be discovered through the java.util.ServiceLoader API. (Now the service
> providers strawman is available, it should help our discussion.) This
> will make the static method above go away, and the repository
> implementations can simply make use of the ServiceLoader API to discover
> the specific ModuleSystem provider they need.

Perfect.
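For the archives, the lookup a repository implementation would then do
might look roughly like this. ModuleSystem here is a stand-in interface
I declared for the sketch, since the real type is still evolving:

```java
import java.util.ServiceLoader;

// Stand-in for the eventual 277 ModuleSystem type (illustration only).
interface ModuleSystem { String getName(); }

public class Discovery {
    // A repository implementation can locate the module system it needs
    // without ever instantiating one directly.
    static ModuleSystem find(String name) {
        for (ModuleSystem ms : ServiceLoader.load(ModuleSystem.class)) {
            if (ms.getName().equals(name)) {
                return ms;
            }
        }
        return null; // no such provider deployed
    }

    public static void main(String[] args) {
        // With no providers on the class path, nothing is found:
        System.out.println(find("jam")); // prints "null"
    }
}
```

Deploying a provider is then just a matter of shipping the usual
META-INF/services entry with the implementation module.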

>
>> I see that you added a ctor arg/getModuleSystem() method to Repository
>> as well. In this model, the type problem is pushed up one more layer. It
>> is the Repository implementation that must know the ModuleDefinition
>> type, and therefore the ModuleSystem type.
>>
>> But we need to specify exactly how the initial repositories are
>> configured/initialized, using the correct types, so that the JRE can
>> instantiate them. (I took a cut at this in the class loading doc, if you
>> recall.) This should be added to Appendix E--even if some of the details
>> are JVM vendor specific, the requirements must at least be defined.
>
> I expect the repository configuration in the default implementation will
> be documented as part of the JDK documents, similar to what has been
> done to the security policy and other configuration files in the JDK:
>
> http://java.sun.com/javase/6/docs/technotes/guides/security/PolicyFiles.html
>
>
> This way, different JRE implementations are allowed to innovate and we
> don't have to put these implementation-specific details into the
> specification.

Ok, but the spec should at least say that. And there are some
assumptions we are making about this mechanism, particularly that it
will be possible to specify the runtime type of the repository classes.
Shouldn't we say something about that as well?

>
>> But... didn't we decide long ago that a repository should not be
>> restricted to a single module system? With a multi-repository search
>> model, this is less of an issue, but why impose this limit? Composite
>> implementations might have a hard time implementing getModuleSystem().
>
> Right. Keep in mind that the APIs are still evolving. I expect this will
> be changed when the interop proposal is in place.
>
>> Perhaps we should consider a model in which we:
>>
>> 1. Require ModuleSystem implementations to have a globally unique name.
>> (There aren't likely to be hundreds of them, so this shouldn't be much
>> of a hardship!)
>>
>> 2. Have a persistent configuration mechanism for ModuleSystems, like the
>> one for the initial Repository instances. In addition to the runtime
>> types for ModuleSystem subclasses, the instances themselves will likely
>> need configuration, just like Repositories (logging, security,
>> import/visibility policies, etc.)
>>
>> 3. During startup, the JRE initializes the registered ModuleSystems,
>> then the registered Repositories.
>>
>> 4. Add to ModuleSystem:
>>
>>    public abstract String getName();
>>
>>    public static ModuleSystem getModuleSystem(String name);
>>    public static List<ModuleSystem> getModuleSystems();
>>
>> Now repository implementations are not involved in *instantiating*
>> ModuleSystems; they look them up by name.
>>
>> (and having a centralized lookup mechanism may enable other interesting
>> behaviors)
>
> If we make ModuleSystem a service type, you will be able to perform this
> kind of discovery easily using the ServiceLoader API, without requiring
> a different set of API to be created for similar purpose. There is also
> no need to have persistent configuration mechanism for ModuleSystem; if
> a custom module system implementation is deployed as a service-provider
> module into the system repository, you would be able to use the API to
> look it up.

Great.

>
>>>> 7.3.2.2
>>>
>>>> We should also say something about the protocol for resource URLs.
>>
>> Shouldn't we? Like whether there will be a new protocol for them, or not.
>
> My current thinking is that the protocol in the resource URLs would be
> entirely up to each repository implementation. Since there will be
> custom repository implementations, we can't foresee what their
> requirements are, so it would be difficult to define a standard protocol
> that is suitable across all possible repository implementations. If you
> think otherwise, let me know. I will clarify this in the next revision
> of the specification.

I have two concerns:

1. I have seen a number of systems that *assume* a specific protocol
(e.g. "jar"). They do this because they want to parse the URL, either to
construct new URLs or to glean information from them (such as the jar
path). This practice clearly limits implementors. The API contract of
getResource/s does not say anything about the protocol, which makes
perfect sense. But I think it should be even stronger, and say that
callers *must not* assume anything about the protocol.

2. The "jar" protocol as implemented by the JRE includes a jar cache.
This cache is problematic for two reasons:

a. It forces a second ZipFile to be opened (the one managed by the
cache). The repository will already have one open, so using this
protocol is very inefficient.

b. The lifecycle of the jars in the cache cannot be controlled by the
repository implementor. On Windows, files cannot be deleted if they are
open, so undeploy/uninstall can fail.

So either we should clean up the jar protocol to eliminate these issues,
or Repository implementations should be encouraged *not* to use it.
Which makes #1 even more important.
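Concern #1 in code form: the caller below works with whatever protocol
the repository chooses, because it never inspects the URL's structure
(the resource name is just an example):

```java
import java.io.InputStream;
import java.net.URL;

public class ResourceAccess {
    public static void main(String[] args) throws Exception {
        // Whatever protocol this URL carries ("jar", "file", something
        // custom) is the loader's business, not ours.
        URL url = ClassLoader.getSystemClassLoader()
                .getResource("java/lang/Object.class");
        System.out.println("protocol: " + url.getProtocol());

        // Portable: open the stream; never parse the URL string.
        try (InputStream in = url.openStream()) {
            byte[] buf = new byte[4];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```

Any code that instead splits the URL on "!/" to find a jar path would
silently break against a repository that serves resources some other
way.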

>
> - Stanley
>



More information about the jsr277-eg-observer mailing list