Re: Inconsistency with service loading by layer or by class loader

Wed Dec 18 15:09:29 UTC 2024

On Wed, Dec 18, 2024 at 3:55 AM Alan Bateman <alan.bateman at oracle.com>
wrote:

> On 17/12/2024 17:21, David Lloyd wrote:
>
> :
>
> I was using it as more of an example about how a thing may be possible and
> allowed by the platform, and thus is achievable, yet not specifically
> presented with a convenient API. That said, I've opened
> https://bugs.openjdk.org/browse/JDK-8346439 as a way to continue the
> discussion, framed as a specific feature request which covers what we need,
> and it does in fact include an `addUses` method.
>
>
> We decided in 2017 to not add these methods. I'm trying to see if there
> is any new insight that would motivate adding these methods now.
>
> ML.Controller::addUses would be a no-op for an automatic module so this
> method will only add a "uses" edge for an explicit module.
>
> If the module has been compiled with references to the service class and
> is calling ServiceLoader.load with that service class then its module
> descriptor should have the appropriate `uses` in the module-info already.
> Has the module author neglected to add this, didn't test, and the
> ML.Controller method will be used to fix this?
>

No. Since we are late-binding all modules, every module we would load would
start with no `requires`, and we use `addReads` on the controller to wire
in the dependencies when the module is lazily linked. This means that any
`uses` declarations present on the descriptor which refer to packages not
found within the module itself will trigger a validation error, so we must
strip them out as well. In my prototype, I have to generate a method stub
in the target module to call `addUses` for this purpose; so, it is already
possible for me to do this but it would be nice to be able to do it
non-stupidly.
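
To make the late-binding idea concrete, here is a minimal sketch (not the
prototype itself). It defines two modules, each in its own layer and with no
`requires` beyond the implicit `java.base`, and wires them together afterwards
with `ModuleLayer.Controller::addReads`. The class name, the module names
`hypothetical.a` and `hypothetical.b`, and the in-memory finder are
assumptions made purely for illustration; the modules contain no packages or
classes.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class LateBindingSketch {

    /** In-memory finder for a single descriptor; no class files are involved. */
    static ModuleFinder finderOf(ModuleDescriptor descriptor) {
        ModuleReference ref = new ModuleReference(descriptor, null) {
            @Override
            public ModuleReader open() {
                throw new UnsupportedOperationException("sketch: no module content");
            }
        };
        return new ModuleFinder() {
            @Override
            public Optional<ModuleReference> find(String name) {
                return name.equals(descriptor.name()) ? Optional.of(ref) : Optional.empty();
            }

            @Override
            public Set<ModuleReference> findAll() {
                return Set.of(ref);
            }
        };
    }

    /** Defines one module in its own layer (parent: boot layer) with its own loader. */
    static ModuleLayer.Controller defineAlone(String name) {
        // The builder adds the mandated dependence on java.base; nothing else is required.
        ModuleDescriptor descriptor = ModuleDescriptor.newModule(name).build();
        ModuleLayer boot = ModuleLayer.boot();
        Configuration cf = boot.configuration()
                .resolve(finderOf(descriptor), ModuleFinder.of(), Set.of(name));
        ClassLoader loader = new ClassLoader(ClassLoader.getSystemClassLoader()) { };
        return ModuleLayer.defineModules(cf, List.of(boot), n -> loader);
    }

    /** Returns the single module defined in the controller's layer. */
    static Module only(ModuleLayer.Controller controller) {
        return controller.layer().modules().iterator().next();
    }

    /** Returns {readsBefore, readsAfter} for two independently defined modules. */
    static boolean[] readsBeforeAndAfter() {
        ModuleLayer.Controller a = defineAlone("hypothetical.a");
        ModuleLayer.Controller b = defineAlone("hypothetical.b");
        Module ma = only(a);
        Module mb = only(b);
        boolean before = ma.canRead(mb);
        a.addReads(ma, mb); // the late-bound equivalent of a `requires` edge
        return new boolean[] { before, ma.canRead(mb) };
    }
}
```

The controller can add `reads` (and `exports`/`opens`) edges this way, but
it has no counterpart for `uses` or `provides`, which is the gap under
discussion.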

This lazy-linking design already seems to work very well, with reasonably
fast startup and correct linking, if you ignore service loading. Right now
I have to force service-providing *and* service-using modules to be unnamed
for services to load (albeit incompletely, since again the `provides`
method would not work in this case).

> The other scenario, and the motivation for Module::addUses, is where the
> service is not known at compile-time, maybe code in the module is doing
> service loading on behalf of another module. In that case, code in the
> module itself should be calling Module::addUses method to add the transient
> `uses` edge. Maybe the module author is not calling Module::addUses and the
> ML.Controller method will be used to fix that?
>

That is correct, as far as it goes. But only because we have to define the
modules with descriptors that do not include the `uses`.
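
For reference, the existing in-module route looks roughly like the sketch
below: code inside the module adds the transient `uses` edge itself before
calling `ServiceLoader`. The class and method names are hypothetical. Note
that `Module::addUses` may only be called from within the module in question,
and does nothing for unnamed or automatic modules, which is why a stub has to
be generated into the target module today.

```java
import java.util.List;
import java.util.ServiceLoader;
import java.util.stream.Collectors;

class TransientUsesSketch {

    /** Lists provider class names for a service that is only known at run time. */
    static <S> List<String> providerNames(Class<S> service) {
        Module self = TransientUsesSketch.class.getModule();
        // Adds a run-time `uses` edge for this module; a no-op if the module
        // is unnamed or automatic.
        self.addUses(service);
        return ServiceLoader.load(service).stream()
                .map(p -> p.type().getName())
                .collect(Collectors.toList());
    }
}
```

A caller would invoke it as, for example,
`TransientUsesSketch.providerNames(java.util.spi.ToolProvider.class)`.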

> ML.Controller::addProvides is also puzzling. A service provider module can
> only be compiled if the provider class is in the module and the service
> class is accessible to it. Has the module author neglected to add the
> `provides` and the ML.Controller method will be used to fix this?
>

It's the same thing. The module descriptor can only reference classes found
within packages that are in the module itself or a dependency, or else
validation will fail. Since we have no `requires` (because we cannot have
eager graph resolution), the set of packages is reduced to only those of
the module itself. Thus this mechanism cannot be used to declare a service
provider if the provider API exists outside of the module (this is the
common case AFAICT).
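
The validation failure can be reproduced with a small sketch along these
lines. It resolves a descriptor that has no `requires` other than `java.base`
and checks whether a `provides` clause survives resolution. All module,
package, and class names here are invented, and the in-memory finder is an
assumption for illustration only.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolutionException;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class ProvidesValidationSketch {

    /** In-memory finder for a single descriptor; no class files are involved. */
    static ModuleFinder finderOf(ModuleDescriptor descriptor) {
        ModuleReference ref = new ModuleReference(descriptor, null) {
            @Override
            public ModuleReader open() {
                throw new UnsupportedOperationException("sketch: no module content");
            }
        };
        return new ModuleFinder() {
            @Override
            public Optional<ModuleReference> find(String name) {
                return name.equals(descriptor.name()) ? Optional.of(ref) : Optional.empty();
            }

            @Override
            public Set<ModuleReference> findAll() {
                return Set.of(ref);
            }
        };
    }

    /** True if the descriptor passes resolution against the boot configuration. */
    static boolean resolves(ModuleDescriptor descriptor) {
        try {
            ModuleLayer.boot().configuration()
                    .resolve(finderOf(descriptor), ModuleFinder.of(), Set.of(descriptor.name()));
            return true;
        } catch (ResolutionException e) {
            return false; // a post-resolution consistency check failed
        }
    }

    /** Service type lives in a package that the module neither contains nor reads. */
    static ModuleDescriptor externalService() {
        return ModuleDescriptor.newModule("hypothetical.provider")
                .provides("com.example.spi.Service", List.of("com.example.impl.ServiceImpl"))
                .build();
    }

    /** Service type lives in the provider's own package, so the check passes. */
    static ModuleDescriptor localService() {
        return ModuleDescriptor.newModule("hypothetical.provider")
                .provides("com.example.impl.Service", List.of("com.example.impl.ServiceImpl"))
                .build();
    }
}
```

Here `resolves(externalService())` is expected to be false and
`resolves(localService())` true. The same consistency check applies to `uses`.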

> Or maybe this is about instrumentation or code generation where the
> container adds a provider implementation to the module? In that case, why
> didn't the container augment the module-info at the same time? Maybe the
> code generation to add the provider implementation happens after the module
> has been loaded?
>

The point of all of this is to define modules that are bound late, so that
we can continue to:

- Resolve only the parts of the module graph that are actually used at run
time
- Resolve modules only when they are used
- Have short or long cycles in the module dependency graph
- Have multiple versions of a given module in the module dependency graph
- Isolate modules from each other so that each module "sees" only the base
layer (well, ideally, only `java.base`, but that isn't possible AFAIK) and
its own dependency set (which may include other modules from the base layer
as well as modules from sibling layers)
- Dynamically add more modules to the graph during run time (and remove
them too, at least if they exist in islands that can be safely unloaded)
- Ensure that upon loading/usage, each module is correct from a
local/relative point of view, rather than a global point of view, much like
classes

> Finally, just to say that your prototype addProvides doesn't specify any
> validation. It looks like it can be used to add any random class and
> implementation class. If a method were to be added then it would minimally
> need to check that the implementation class is in the module and that it
> extends the service class.
>

Yes and no. (I assume you imply that if the implementation provides a
`provider` method, then it's the method return type that would need to be
checked).

Firstly, the services are actually internally registered by class *name*
rather than class object, which seems weaker than necessary (maybe to avoid
a strong class reference?) and might allow any validation to be tricked or
bypassed somehow: there's only an imperfect guarantee that the service and
provider will actually be the *same* as what was registered. This seems to
undermine any validation, though we could just do it anyway I suppose.

Secondly, any layer-per-module architecture must be able to define
providers outside of their own module, otherwise there is no way to find
these providers. When loading a service from a module, unless the module
shares a layer with its implementations, the module itself must be told
where its providers are so that service loading can be done by layer. An
alternative would be to allow a module *layer*'s provider set to be
manipulated instead (which is essentially what this method does in effect
anyway - basically, just drop the `module` argument), which would be an OK
alternative from our POV; it would just be a bit oddly asymmetrical with
respect to `addUses` then. But that might be a good way to satisfy the
"letter of the law".
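
For readers less familiar with the two lookup styles, here is a small sketch
contrasting them using only existing `ServiceLoader` entry points. The class
and method names are hypothetical, and `java.util.spi.ToolProvider` is used
simply because it is a service type available in `java.base`.

```java
import java.util.ServiceLoader;
import java.util.Set;
import java.util.spi.ToolProvider;
import java.util.stream.Collectors;

class LayerVersusLoaderSketch {

    /** Provider class names located in the given layer and its ancestor layers. */
    static Set<String> byLayer(ModuleLayer layer) {
        return ServiceLoader.load(layer, ToolProvider.class).stream()
                .map(p -> p.type().getName())
                .collect(Collectors.toSet());
    }

    /** Provider class names located via the given class loader and its parents. */
    static Set<String> byLoader(ClassLoader loader) {
        return ServiceLoader.load(ToolProvider.class, loader).stream()
                .map(p -> p.type().getName())
                .collect(Collectors.toSet());
    }
}
```

With one layer per module, a lookup by layer only sees providers declared in
that layer or its parents, so providers in sibling layers are not found unless
they are registered somewhere the lookup can reach.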

I tend to believe that the principles behind the restriction of requiring
service implementations to live in the same module as the `provides`
declaration are really only applicable to static, JDK-managed application
layers (those on the boot path, or those created via e.g.
`java.lang.module.ModuleFinder#of(Path...)`). Our app server module system,
which was conceived back in 2010 when Java 6 was the latest Java, relies on
the ability to dynamically resolve `META-INF/services` files so that every
module (which each has its own isolated class loader) has its own view of
what service providers were available for *all* other modules, which works
very well. Resolving and binding whole groups of modules at once to
determine the service graph is essentially infeasible for architectures
like this. This is somewhat analogous to supporting e.g. `URLClassLoader`
for simple applications versus non-hierarchical class loading in advanced
application containers.

-- 
- DML • he/him