Module isolation

Wed Jun 20 16:11:40 PDT 2007

Stanley M. Ho wrote:
> Hi Bryan,
>
> Sorry, I was sidetracked in the last two days, will follow up other
> threads soon.
>
> Bryan Atsatt wrote:
>> This use case seems to presume that the IDE can/will ensure that there
>> is only *one* consumer of the module: itself. What if a different plugin
>> has a dependency on it, and has already been resolved? Is the intention
>> that the IDE will be wired into the module system deeply enough to
>> manage this correctly?
>
> It won't affect existing plugins that have also been initialized with the
> dependency. My assumption is that the IDE should know what it's doing
> when using the APIs.
>
>> An application server, for example, will need to keep one application
>> from accessing the modules of another. And it must *know* that there is
>> no sharing, so that the application lifecycles can remain independent.
>>
>> So we need some notion of a context. A purely private repository
>> instance is one (probably good) possibility. Another is the wrapper
>> Repository approach, but this requires definition copies (and management
>> of sharing content, lifecycle, etc).
>
> The way I see it is that the repository is the context that you want to
> use for isolation. The APIs allow developers only to walk up to the parent
> repository instance from a child repository instance, but not vice
> versa. If the repository instance is only used in a specific domain
> (e.g. webapp) and the repository instance is hidden from other
> applications (e.g. other webapps), this would effectively hide the
> modules in that repository and the repository could be considered
> private. Whether such a repository is constructed using the wrapper
> approach or reusing existing ModuleDefinitionContent is an
> implementation detail. Do you agree?

Yes, but it is a very critical detail! The wrapper approach, if
required, drags in the copy, concurrency and lifecycle issues; all of
which need solutions. I am groping here for a model in which none of
this is required under normal circumstances.

I think the purely private repository model works fine, where the
definitions and module instances are also private.

And, as I said in the app server thread, using the private repository
approach for web-modules literally means a repository containing a
*single definition*. This may be an acceptable solution, but I want to
ensure that everyone is aware of this implication.
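
To make the private-repository model concrete, here is a rough sketch.
All class names (IsoRepository, IsoRepositoryDemo) are hypothetical
stand-ins, not the actual Repository API, and definitions are reduced to
plain strings. It only illustrates the point that lookup walks up to the
parent and never down or sideways, so each web-module ends up in a
repository holding a single definition.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a repository; NOT the actual JSR 277 Repository API.
final class IsoRepository {
    private final IsoRepository parent;   // null for the root repository
    private final Map<String, String> definitions = new HashMap<>();

    IsoRepository(IsoRepository parent) {
        this.parent = parent;
    }

    void install(String name, String definition) {
        definitions.put(name, definition);
    }

    // Searches this repository first, then walks *up* to the parent only.
    String find(String name) {
        String definition = definitions.get(name);
        if (definition != null) {
            return definition;
        }
        return parent != null ? parent.find(name) : null;
    }
}

final class IsoRepositoryDemo {
    public static void main(String[] args) {
        IsoRepository shared = new IsoRepository(null);
        shared.install("logging", "logging-1.0");

        // One private repository per web-module, each with a single definition.
        IsoRepository webA = new IsoRepository(shared);
        webA.install("webapp-a", "webapp-a-1.0");
        IsoRepository webB = new IsoRepository(shared);
        webB.install("webapp-b", "webapp-b-1.0");

        System.out.println(webA.find("logging"));    // logging-1.0 (via parent)
        System.out.println(webA.find("webapp-b"));   // null (sibling is hidden)
        System.out.println(shared.find("webapp-a")); // null (parent cannot see children)
    }
}
```

The isolation here rests entirely on who holds a reference to webA or
webB; nothing in the sketch enforces it beyond that.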

>> Regardless of what approach we take, the releaseModule() idea is too
>> simplistic. Having originally created the detach() method, thinking
>> along similar lines as you are with the plugin case, I do understand the
>> idea; I just no longer think it is sufficient :^).
>>
>> The only "safe" time to release a module is when there are *zero*
>> importers, and, even then, you must hide/release atomically, ensuring
>> that no new imports spring up during the operation.
>
> Releasing a module simply means that its reference is released from the
> module system's cache; it does not affect any existing importer. Also,
> hide and release do not need to happen atomically - I think hiding it
> first would be sufficient to prevent new imports from being resolved, but
> I'll need to double check it with the RI.

I understand what release does :^). The issue is not the release itself,
but the potential subsequent instantiation of a Module from that same
definition. This instance is now a duplicate, with duplicate classes,
which can easily lead to failures: the original instance may still be in
use.
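
A toy model of the sequence I am worried about, under the assumption
that release only drops the cache entry. The Dup* names are made up for
illustration and are not the real ModuleSystem API. In a real system
each Module instance would carry its own class loader, so the second
instance would define a second copy of every class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model; names loosely mirror the discussion, not the real API.
final class DupDefinition {
    final String name;

    DupDefinition(String name) {
        this.name = name;
    }
}

final class DupModule {
    final DupDefinition definition;

    DupModule(DupDefinition definition) {
        this.definition = definition;
    }
}

final class DupModuleSystem {
    private final Map<DupDefinition, DupModule> cache = new HashMap<>();

    // Returns the cached instance, creating one on first use.
    synchronized DupModule getModule(DupDefinition definition) {
        return cache.computeIfAbsent(definition, DupModule::new);
    }

    // Drops only the cache entry; existing importers keep their reference.
    synchronized void releaseModule(DupDefinition definition) {
        cache.remove(definition);
    }
}

final class DupDemo {
    public static void main(String[] args) {
        DupModuleSystem system = new DupModuleSystem();
        DupDefinition definition = new DupDefinition("plugin");

        DupModule first = system.getModule(definition);   // held by an existing importer
        system.releaseModule(definition);
        DupModule second = system.getModule(definition);  // a later importer resolves again

        System.out.println(first == second);                       // false
        System.out.println(first.definition == second.definition); // true
    }
}
```

Both instances are live at once and come from the same definition,
which is exactly the duplicate situation described above.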

>
> Note that I'm not arguing the releaseModule() approach is perfect, but I
> do think this is a use case that we can't ignore; I would welcome better
> suggestions for handling this.
>
>> Other than Module instances, what other "runtime information" is there
>> to keep track of? Caches of exported packages?
>
> The Module instances that have been initialized, the ModuleDefinitions
> that are being instantiated and the corresponding Module instances that
> are being initialized (could be triggered simultaneously from multiple
> threads), and the ModuleDefinitions that have been disabled (e.g. the
> repository has been shut down and no new Module instance should be created).

Right. But only the first of these is long lived, the others are
transient, needed only during resolution. So again, if releaseModule()
either did not exist, *or* if we did not keep references to released
instances, a simple field cache could be used. Just trying to see
through some of the fog here :^)

>
>> For example, if we were to eliminate the releaseModule() method (in
>> favor of some more complete mechanism), then there really is always a
>> 1:1 for Module:ModuleDefinition, and the model is simple and obvious.
>> (And therefore a field cache *could* be used).
>
> To keep this discussion focused, I think we could assume
> Module:ModuleDefinition is 1:1 for now. releaseModule() is just a
> special case. That said, even if the relationship is 1:1, it still
> doesn't mean we *should* put the state in ModuleDefinition, see above. ;-)

Sure. (OTOH, it *would* make ModuleSystem a bit simpler by eliminating
the need for a silly mapping, and would bake the 1:1 model into the
design. And stashing a field like this isn't much different than an
immutable String having a hash field that is lazily stored. But clearly
if you want to track released/stopped instances, this becomes more
involved! I'm going to shut up about this aspect now ;^)
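
For the record, the field-cache idea would look roughly like this. It
is a sketch only: FieldCacheDefinition and FieldCacheModule are
hypothetical names, and it assumes a strict 1:1 relationship with no
release. Unlike String's hash, two racing threads must not each create
their own instance, so the lazy store is guarded with double-checked
locking on a volatile field.

```java
// Hypothetical types illustrating a 1:1 Module:ModuleDefinition field cache.
final class FieldCacheModule {
    final FieldCacheDefinition definition;

    FieldCacheModule(FieldCacheDefinition definition) {
        this.definition = definition;
    }
}

final class FieldCacheDefinition {
    private final String name;

    // Lazily stored, in the spirit of String's cached hash field.
    private volatile FieldCacheModule module;

    FieldCacheDefinition(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Creates the single Module for this definition on first call.
    FieldCacheModule getModule() {
        FieldCacheModule m = module;
        if (m == null) {
            synchronized (this) {
                m = module;
                if (m == null) {
                    m = new FieldCacheModule(this);
                    module = m;
                }
            }
        }
        return m;
    }
}
```

With this shape there is no mapping to maintain, but there is also no
place to record a released or stopped instance, which is the trade-off
mentioned above.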

>
>> So, in effect, we have 100% *private repository* instances.
>>
>> I've been thinking that we need an intermediate somewhere between a
>> shared/public repo instance and an entirely private one, but... that now
>> strikes me as too fuzzy, and I can't see a real use case :^)
>>
>> So an application server would have to create, say, a private
>> LocalRepository instance to hold the modules of a single application.
>> And it would have to ensure that no other application could get its
>> grubby paws on that repository instance.
>>
>> Ok. That works for me. And it eliminates the need for cloning AND for
>> releaseModule():
>>
>> 1. Any given Repository instance is either 100% shared or 100% private,
>> with *no* in-between.
>>
>> 2. The lifecycle of a shared repository instance is that of the process.
>>
>> 3. The lifecycle of a private repository instance is entirely up to the
>> creator of that instance.
>
> I agree that distinguishing repository instances for shared or
> private usage is easier to understand, but the notion of shared and
> private really depends on the usage context, and it's not an attribute
> of the repository itself. Suppose there is a repository with two child
> repositories; this repository would be considered "shared" from the
> perspective of the modules in the child repositories, but this
> repository might still be considered private from the other applications
> in the same JVM.
>
> I think your first three points can be combined as follows:
>
> 1. Any given Repository instance could be used for shared or private
> purposes.
>
> 2. The lifetime of a repository instance is managed by its creator.

Yes, but this doesn't go far enough. For example, the JRE will create
the system repository, and, by this rule, could shut it down at any
time. Clearly this would cause major havoc. Unless we have a complete
lifecycle model for Modules, so that it is possible to ensure that no
active user exists, such "global" modules must live for the lifetime of
the process. Otherwise, we risk collision failures.

>
>> 4. The lifecycle of a ModuleDefinition/Module is at most that of the
>> enclosing Repository instance, and at least is bounded by
>> install/uninstall (no finer granularity).
>
> Yes.

But releaseModule() violates #4. Use of that method enables unlimited
numbers of Module instances from the same definition, during the
lifetime of the enclosing repository.

>
>> It does leave open the issue of dependencies *within* a private
>> repository. The simple model would be to treat the entire repository as
>> atomic, with any change requiring a new Repository instance. This is
>> probably too simplistic, however.
>>
>> In an EE app, web-modules are supposed to be isolated from each other
>> and from other parts of the app (ejb, connectors, etc.). So this
>> requires either a further partitioning of the app into multiple
>> repositories, or some form of access control.
>
> Each web-module should probably be in its own repository if isolation is
> needed. There is no access control for ModuleDefinitions - accessibility
> of a ModuleDefinition is the same as visibility of ModuleDefinition;
> this is also a very different issue. I think we should focus our
> discussion on using repository instance for isolation unless we find
> this approach insufficient.

Sure, again, as long as a single-module repository is deemed an
acceptable solution.

>
>> Further, it is possible to re-start or re-deploy/re-start only a single
>> web-module, *without* restarting the rest of the app. The re-start case
>> could use releaseModule(), though a "real" stop method would be
>> preferable. But this is a very special case in which the specs
>> essentially dictate the possible dependencies between modules. In the
>> general case where the dependencies are not dictated, releaseModule() is
>> problematic.
>> The re-deploy case would use uninstall/install (but would still like
>> stop!).
>
> Whether the solution is stop() or releaseModule(), it still shows that
> we need to support the use case of a long-lived repository with
> short-lived modules. Do you agree?

Sure.

>
>> If you *really* want the releaseModule() functionality, I would suggest
>> that we introduce a PrivateRepository type, and support release *only*
>> on that type.
>
> The notion of private or shared depends on the usage context of the
> repository instance, so it won't be appropriate to surface the concept
> at the API level.

If the semantics of releaseModule() limit its use to very special
circumstances, I think it would be appropriate to capture this in the API.

In the spec, you rule out the use of this method against any java.* or
bootstrap repository module.

But is it ever appropriate to call releaseModule() on a Module from the
system repository? Or any public, global repository?

I was simply exploring the possibility that this functionality only
makes sense on a private repository instance. But I don't care if we
generalize that to say that the creator should be able to control this
behavior.

If so, I'd like that to be manifest in the API somehow. I don't really
care how. As another possibility, why not expose this as an attribute of
Repository, just as we do read-only status?

    public abstract boolean supportsRelease();

And then document ModuleSystem.releaseModule() as a no-op or failure
when the underlying repo returns false.
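
A sketch of how that might fit together, taking the "failure" option.
Only supportsRelease() comes from the suggestion above; the Rel* class
names, the string-keyed cache, and the choice of
UnsupportedOperationException are assumptions made purely for
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical repository base type carrying the proposed attribute.
abstract class RelRepository {
    public abstract boolean supportsRelease();
}

// A shared/global repository refuses release.
final class RelSharedRepository extends RelRepository {
    @Override
    public boolean supportsRelease() {
        return false;
    }
}

// A private repository lets its creator opt in.
final class RelPrivateRepository extends RelRepository {
    @Override
    public boolean supportsRelease() {
        return true;
    }
}

// Hypothetical module system; modules are plain Objects keyed by name.
final class RelModuleSystem {
    private final Map<String, Object> cache = new HashMap<>();

    Object getModule(String definitionName) {
        return cache.computeIfAbsent(definitionName, n -> new Object());
    }

    // Fails when the owning repository does not support release.
    void releaseModule(RelRepository repository, String definitionName) {
        if (!repository.supportsRelease()) {
            throw new UnsupportedOperationException(
                    "repository does not support release");
        }
        cache.remove(definitionName);
    }
}
```

The no-op variant would simply return instead of throwing; either way
the caller can check supportsRelease() up front.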

>
> Bryan, there are many topics in this thread, but I think what you really
> want is to discuss the notion of module isolation. As we have discussed so
> far, I think we all agree that the repository is the appropriate context to
> make module isolation possible. Could you summarize any outstanding
> design concern you have with this approach in a paragraph or two?

My only remaining design concern (on the topic of isolation) is the
semantics of releaseModule(). As long as we nail down the *correct*
usage of this method, I'll be happy.

Since the current design does not provide a means to determine or
control use of released modules, class duplication failures and/or large
memory leaks remain a strong possibility. As I've said from the very
beginning, we have solved this problem in our application server by
choosing to disable the class loaders of "released" modules, and it has
proven extremely useful. I would like to see this supported here as an
optional behavior.
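
One possible reading of "disable", sketched below. This is an
assumption for illustration, not a description of how our application
server actually does it: DisableableClassLoader is a made-up name, and
the sketch only blocks *new* class loads through the loader once the
flag is set. Classes that were already loaded and linked elsewhere keep
working.

```java
// Hypothetical class loader that can be switched off when its module is released.
final class DisableableClassLoader extends ClassLoader {
    private volatile boolean disabled;

    DisableableClassLoader(ClassLoader parent) {
        super(parent);
    }

    // Called by the (hypothetical) module system when the module is released.
    void disable() {
        disabled = true;
    }

    boolean isDisabled() {
        return disabled;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (disabled) {
            // Fail fast instead of silently serving classes from a released module.
            throw new ClassNotFoundException(name + " (module has been released)");
        }
        return super.loadClass(name, resolve);
    }
}
```

A stale importer that tries to load anything further through a released
module then gets an immediate, diagnosable failure rather than a
duplicate class or a quiet leak.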

>
> Thanks,
> - Stanley
>