Discussion: #LazyConfigurationAndInstantiation

Wed Jul 13 17:39:42 UTC 2016

On 07/13/2016 09:31 AM, mark.reinhold at oracle.com wrote:
> Reference: http://openjdk.java.net/projects/jigsaw/spec/issues/#LazyConfigurationAndInstantiation
>
> 2016/3/11 13:55:56 -0800, david.lloyd at redhat.com:
>> On 03/11/2016 11:47 AM, mark.reinhold at oracle.com wrote:
>>> 2016/3/2 19:22 -0800, david.lloyd at redhat.com:
>>>> It looks as though the instantiation of a Layer causes a complete load
>>>> of all the modules in the layer.
>>>
>>> What do you mean by "complete load"?  Loading all the classes?
>>
>> No I mean locating and loading all of the descriptors and building all
>> the wiring information for every module.
>
> Yes, that's what happens.  Resolving a Configuration requires reading all
> of the relevant module descriptors.  Instantiating a Configuration into a
> Layer builds a complete run-time module graph.  Both of these operations
> are linear in the number of modules in the graph.
>
>>>>                                    Apart from the consequent lack of
>>>> dynamism, I think this is very likely to cause problems for large
>>>> applications, particularly if such an application might have many
>>>> operation modes, some of which do not require *all* of the bundled
>>>> modules to be loaded every single time.
>>>>
>>>> Can there not instead be an incremental resolution algorithm, akin to
>>>> how classes are lazily loaded?
>>>
>>> Configuring a set of modules and instantiating that configuration as a
>>> layer requires no more than reading the modules' descriptors.  Nothing
>>> else from any module definition will be read until it's actually needed.
>>
>> But it does require that all the module descriptors from a given layer
>> be available, and that the load time for the first load of a module in a
>> layer will always be bounded by the size of the layer, rather than just
>> by the dependency subgraph of the module being loaded.
>
> Correct.
>
>>                                                          Based on
>> application server deployments that I know about, I think the far upper
>> bound for a realistic number of modules in a layer will probably lie in
>> the thousands to tens of thousands range (though there is always the outlier
>> case where someone has to push it to see how far it goes...), which
>> might translate into a substantial startup overhead.
>
> In our experience with the prototype so far, the time required to resolve
> a Configuration and instantiate it as a Layer is dominated by the time
> required to locate and read module descriptors, typically from inside
> individual artifacts in the filesystem.
>
> One fairly straightforward way to speed that up is to link your modules
> into a custom run-time image, so that the descriptors are all in one
> optimized artifact [1].  If that's not feasible then you could achieve
> much the same effect in the build, installation, or startup process of a
> large application: Construct an optimized cache of the descriptors of all
> your modules, and use a custom ModuleFinder to load the descriptors from
> the cache but other module content from more-traditional artifacts.
>
>>                                                        Another potential
>> issue is requiring that all modules be present, though this is more
>> closely related to #MutableConfigurations now I think; I suspect this
>> issue could be mitigated with an acceptable solution to that issue.
>
> As I wrote in my reply re. #MutableConfigurations [2], I think this
> approach is at odds with our goal to provide reliable configuration.

I disagree.  "Reliable" is only defined in the agreed-upon requirements 
by stipulating that modules may have dependence relationships which are 
declared.  I interpreted that goal simply as an explicit rejection of 
various poorly-defined customized class loader behaviors, and a move 
towards clearer and more predictable behavior.  By my interpretation and
experience, the ability to change the explicit dependence relationships
is still "reliable" as long as the effects of doing so are well-defined.
Likewise, I consider the behavior of lazy class loading to be "reliable"
in that it is well-defined and predictable, and has rules which make
sense: classes are loaded, resolved, and initialized in separate phases,
which allows for lazy on-demand-only progression through those phases
but also allows for almost completely arbitrary interconnection of
classes, while remaining basically predictable.
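
To make the class-loading analogy concrete, here is a minimal sketch of
a class being loaded without being initialized, with initialization
deferred until first use.  The names LazyPhasesSketch and Heavy are
made up for illustration, and nothing here touches the module system;
it only uses Class.forName with initialize=false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of any module system API.
class LazyPhasesSketch {
    static final List<String> EVENTS = new ArrayList<>();

    static class Heavy {
        // Runs at initialization (first active use), not at load time.
        static { EVENTS.add("Heavy initialized"); }
        static int answer() { return 42; }
    }

    static List<String> run() {
        EVENTS.add("start");
        try {
            // Load Heavy, but pass initialize=false so that its static
            // initializer does not run yet.
            Class<?> c = Class.forName(
                LazyPhasesSketch.class.getName() + "$Heavy",
                false,
                LazyPhasesSketch.class.getClassLoader());
            EVENTS.add("loaded " + c.getSimpleName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        Heavy.answer();      // first active use triggers initialization
        EVENTS.add("used");
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The recorded events show "loaded Heavy" before "Heavy initialized":
each phase happens only when something demands it, yet the order is
fully specified.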

The first time that "reliable" is specified in terms of concrete 
behavior is in the SOTMS document, and these issues are being raised 
exactly against the state of the module system, so I think that this 
issue as well as #MutableConfigurations serve to directly challenge the 
validity of the definition of "reliable" which is used by the proposed 
implementation, rather than being invalid due to the presumed validity 
of that definition (which was not agreed upon by the EG).

Even the JSR definition (which I acknowledge is quite out of date at
this point) states that OSGi does address the problem of "reliable"
configuration, despite the fact that the OSGi solution allows for
dynamic loading and relinking.  That is contrary to the definition in
the SOTMS.  Moreover, Jigsaw's stricter interpretation would actually
invalidate those behaviors of OSGi, were OSGi to try to merge the
bundle concept with the Jigsaw module concept somehow.

>>> If you really want to avoid configuring all modules in certain operation
>>> modes, do layers not provide sufficient flexibility?  Load the core of
>>> your application into the boot layer, figure out which modules you need
>>> for the requested operation mode, and then create an additional layer to
>>> load those modules.
>>
>> That can work in some cases, but only if things are loaded one
>> additional time (rather than on-demand as configuration changes, for
>> example) or in such a way that you could just keep adding layers.  But
>> in the event of one large intermeshed layer, this won't work without
>> #MutableConfigurations and/or #NonHierarchicalLayers and something to
>> coordinate the load.
>
> Suppose that we allow #NonHierarchicalLayers [3].  Are there realistic
> use cases in which layers would not then provide sufficient flexibility?

I'm not sure that it will (in all cases).  In our current system, a 
module isn't loaded until it is referenced, and it is not linked until 
it is used, like the aforementioned class linking rules.  Even if we 
have separate (non-hierarchical) layers for every single module, because 
of the way modules are defined in Jigsaw, I think, given module graph G, 
we'd still have to aggressively traverse all of G in order to load any 
module that is a part of G, even if nodes in that graph are never 
actually referenced at run time, which puts us back at square one.
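
As a rough illustration of the cost difference I mean, here is a toy
sketch that counts descriptor reads for the two strategies.  It is not
the Jigsaw API: the module names, the DESCRIPTORS map, and the
readDescriptor helper are all hypothetical stand-ins for locating and
parsing a real module descriptor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: module name -> names of the modules it requires.
class LazyResolveSketch {
    static final Map<String, List<String>> DESCRIPTORS = Map.of(
        "app",     List.of("core", "logging"),
        "core",    List.of(),
        "logging", List.of("core"),
        "reports", List.of("core", "pdf"),
        "pdf",     List.of(),
        "admin",   List.of("core", "reports"));

    static int descriptorReads = 0;

    // Stand-in for locating and parsing one module descriptor.
    static List<String> readDescriptor(String name) {
        descriptorReads++;
        return DESCRIPTORS.get(name);
    }

    // Eager: read every descriptor in the layer up front.
    static Set<String> resolveWholeLayer() {
        Set<String> resolved = new LinkedHashSet<>();
        for (String name : DESCRIPTORS.keySet()) {
            readDescriptor(name);
            resolved.add(name);
        }
        return resolved;
    }

    // Lazy: read only the descriptors reachable from the requested module.
    static Set<String> resolveOnDemand(String root) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            if (!resolved.add(name)) continue;   // already visited
            for (String dep : readDescriptor(name)) pending.push(dep);
        }
        return resolved;
    }

    public static void main(String[] args) {
        descriptorReads = 0;
        resolveWholeLayer();
        System.out.println("whole layer reads: " + descriptorReads);
        descriptorReads = 0;
        resolveOnDemand("app");
        System.out.println("on-demand reads:   " + descriptorReads);
    }
}
```

In this toy graph the eager strategy reads all six descriptors, while
loading "app" on demand reads only three (app, logging, core).  The gap
grows with the number of modules that are never referenced.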

-- 
- DML