Discussion: #LazyConfigurationAndInstantiation
David M. Lloyd
david.lloyd at redhat.com
Wed Jul 13 17:39:42 UTC 2016
On 07/13/2016 09:31 AM, mark.reinhold at oracle.com wrote:
> Reference: http://openjdk.java.net/projects/jigsaw/spec/issues/#LazyConfigurationAndInstantiation
>
> 2016/3/11 13:55:56 -0800, david.lloyd at redhat.com:
>> On 03/11/2016 11:47 AM, mark.reinhold at oracle.com wrote:
>>> 2016/3/2 19:22 -0800, david.lloyd at redhat.com:
>>>> It looks as though the instantiation of a Layer causes a complete load
>>>> of all the modules in the layer.
>>>
>>> What do you mean by "complete load"? Loading all the classes?
>>
>> No, I mean locating and loading all of the descriptors and building all
>> the wiring information for every module.
>
> Yes, that's what happens. Resolving a Configuration requires reading all
> of the relevant module descriptors. Instantiating a Configuration into a
> Layer builds a complete run-time module graph. Both of these operations
> are linear in the number of modules in the graph.
>
>>>> Apart from the consequent lack of
>>>> dynamism, I think this is very likely to cause problems for large
>>>> applications, particularly if such an application might have many
>>>> operation modes, some of which do not require *all* of the bundled
>>>> modules to be loaded every single time.
>>>>
>>>> Can there not instead be an incremental resolution algorithm, akin to
>>>> how classes are lazily loaded?
>>>
>>> Configuring a set of modules and instantiating that configuration as a
>>> layer requires no more than reading the modules' descriptors. Nothing
>>> else from any module definition will be read until it's actually needed.
>>
>> But it does require that all the module descriptors from a given layer
>> be available, and that the load time for the first load of a module in a
>> layer will always be bounded by the size of the layer, rather than just
>> by the dependency subgraph of the module being loaded.
>
> Correct.
>
>> Based on
>> application server deployments that I know about, I think the far upper
>> bound for a realistic number of modules in a layer will probably lie in
>> the thousands to tens of thousands range (though there is always the outlier
>> case where someone has to push it to see how far it goes...), which
>> might translate into a substantial startup overhead.
>
> In our experience with the prototype so far, the time required to resolve
> a Configuration and instantiate it as a Layer is dominated by the time
> required to locate and read module descriptors, typically from inside
> individual artifacts in the filesystem.
>
> One fairly straightforward way to speed that up is to link your modules
> into a custom run-time image, so that the descriptors are all in one
> optimized artifact [1]. If that's not feasible then you could achieve
> much the same effect in the build, installation, or startup process of a
> large application: Construct an optimized cache of the descriptors of all
> your modules, and use a custom ModuleFinder to load the descriptors from
> the cache but other module content from more-traditional artifacts.
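For reference, the descriptor-cache idea described above might look roughly like the following sketch. It uses the java.lang.module API names as they appear in Java 9 (ModuleFinder, ModuleReference, ModuleDescriptor). The class name CachedDescriptorFinder, the in-memory collection standing in for an on-disk cache, and the "artifacts" delegate finder are all assumptions made for illustration; none of them come from the prototype.

```java
import java.io.IOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical finder: descriptors come from a prebuilt cache, content from ordinary artifacts. */
final class CachedDescriptorFinder implements ModuleFinder {
    // One stable reference per module, so repeated find() calls return the same object.
    private final Map<String, ModuleReference> references = new HashMap<>();

    /**
     * @param cached    descriptors assumed to have been loaded from some cache
     * @param artifacts finder for the real artifacts; consulted only when content is opened
     */
    CachedDescriptorFinder(Collection<ModuleDescriptor> cached, ModuleFinder artifacts) {
        for (ModuleDescriptor descriptor : cached) {
            references.put(descriptor.name(), reference(descriptor, artifacts));
        }
    }

    @Override
    public Optional<ModuleReference> find(String name) {
        return Optional.ofNullable(references.get(name));
    }

    @Override
    public Set<ModuleReference> findAll() {
        return new HashSet<>(references.values());
    }

    private static ModuleReference reference(ModuleDescriptor descriptor, ModuleFinder artifacts) {
        // The cache does not record a location, so null ("not known") is passed.
        return new ModuleReference(descriptor, null) {
            @Override
            public ModuleReader open() throws IOException {
                // The artifact is located only now, when module content is first needed.
                return artifacts.find(descriptor.name())
                        .orElseThrow(() -> new IOException("no artifact for " + descriptor.name()))
                        .open();
            }
        };
    }
}
```

With a finder like this, resolution would only touch the cached descriptors, and the slower artifact lookup would be deferred until a module's content is actually opened.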
>
>> Another potential
>> issue is requiring that all modules be present, though this is more
>> closely related to #MutableConfigurations now I think; I suspect this
>> issue could be mitigated with an acceptable solution to that issue.
>
> As I wrote in my reply re. #MutableConfigurations [2], I think this
> approach is at odds with our goal to provide reliable configuration.

I disagree. "Reliable" is only defined in the agreed-upon requirements
by stipulating that modules may have dependence relationships which are
declared. I interpreted that goal simply as an explicit rejection of
various poorly-defined customized class loader behaviors, and a move
towards clearer and more predictable behavior. By my interpretation and
experience, I consider, for example, the ability to change the explicit
dependence relationships to still be "reliable" as long as the effects
of doing so are well-defined, just as I consider the behavior of lazy
class loading to be "reliable" in that it is well-defined and
predictable, and has rules which make sense (referring to the way that
classes are loaded, resolved, and initialized in separate phases, which
allows for lazy on-demand-only progression through those phases but also
allows for almost completely arbitrary interconnection of classes, while
remaining basically predictable).

The first time that "reliable" is specified in terms of concrete
behavior is in the SOTMS document, and these issues are being raised
exactly against the state of the module system, so I think that this
issue as well as #MutableConfigurations serve to directly challenge the
validity of the definition of "reliable" which is used by the proposed
implementation, rather than being invalid due to the presumed validity
of that definition (which was not agreed upon by the EG).

Even the JSR definition (which I acknowledge is quite out of date at
this point) states that OSGi does address the problem of "reliable"
configuration, despite the fact that the OSGi solution allows for
dynamic loading and relinking. This is contrary to the definition in
the SOTMS. Moreover, Jigsaw's stricter interpretation would actually
invalidate those OSGi behaviors, were OSGi to try to merge the bundle
concept with the Jigsaw module concept somehow.

>>> If you really want to avoid configuring all modules in certain operation
>>> modes, do layers not provide sufficient flexibility? Load the core of
>>> your application into the boot layer, figure out which modules you need
>>> for the requested operation mode, and then create an additional layer to
>>> load those modules.
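As a rough illustration of the layering approach suggested above, the sketch below resolves a set of root modules on top of the boot layer and instantiates them as a child layer. It uses the Java 9 release names (ModuleLayer is what this thread calls Layer). The class name ModeLayerSketch and the method layerFor are hypothetical, and choosing a single class loader for the whole layer is just one of several possible policies.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.nio.file.Path;
import java.util.Set;

/** Hypothetical helper: one extra layer per operation mode, stacked on the boot layer. */
final class ModeLayerSketch {
    /**
     * Resolves the given root modules from modulePath, with the boot layer's
     * configuration as parent, and defines them in a new child layer.
     */
    static ModuleLayer layerFor(Set<String> roots, Path... modulePath) {
        ModuleLayer boot = ModuleLayer.boot();
        // Modules already present in the boot layer are found in the parent
        // configuration and are not resolved again.
        Configuration cf = boot.configuration()
                .resolve(ModuleFinder.of(modulePath), ModuleFinder.of(), roots);
        // All modules of the new layer share one class loader in this sketch.
        return boot.defineModulesWithOneLoader(cf, ClassLoader.getSystemClassLoader());
    }
}
```

Note that each call still resolves the complete dependency closure of the requested roots within the new layer before anything is loaded from it.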
>>
>> That can work in some cases, but only if things are loaded one
>> additional time (rather than on-demand as configuration changes, for
>> example) or in such a way that you could just keep adding layers. But
>> in the event of one large intermeshed layer, this won't work without
>> #MutableConfigurations and/or #NonHierarchicalLayers and something to
>> coordinate the load.
>
> Suppose that we allow #NonHierarchicalLayers [3]. Are there realistic
> use cases in which layers would not then provide sufficient flexibility?

I'm not sure that layers will suffice in all cases. In our current system, a
module isn't loaded until it is referenced, and it is not linked until
it is used, much like the class loading and linking rules mentioned
above. Even if we
have separate (non-hierarchical) layers for every single module, because
of the way modules are defined in Jigsaw, I think, given module graph G,
we'd still have to aggressively traverse all of G in order to load any
module that is a part of G, even if nodes in that graph are never
actually referenced at run time, which puts us back at square one.
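To make the eagerness concrete, here is a minimal sketch using the Java 9 names of the API under discussion. It resolves a single root against the system modules and lists everything that resolution pulled in. The class name EagerResolutionSketch and the method resolvedNames are made up for illustration, and the sketch only shows what gets resolved, not what it costs.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.lang.module.ResolvedModule;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical demo: resolution computes the whole dependency closure up front. */
final class EagerResolutionSketch {
    /** Returns the names of all modules that resolving the given root brings in. */
    static Set<String> resolvedNames(String root) {
        // An empty parent configuration means every dependency, including
        // java.base, has to be located through the finder and its descriptor read.
        Configuration cf = Configuration.empty()
                .resolve(ModuleFinder.ofSystem(), ModuleFinder.of(), Set.of(root));
        Set<String> names = new TreeSet<>();
        for (ResolvedModule module : cf.modules()) {
            names.add(module.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // No class from the resolved modules is loaded by this call; only
        // descriptors are consulted, yet the full closure is already fixed.
        System.out.println(resolvedNames("java.logging"));
    }
}
```

Every module in the closure is visited during resolve(), whether or not code in it is ever executed, which is the behavior I am concerned about.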
--
- DML