Feedback on proposal for #ReflectiveAccessToNonExportedTypes

Mon Jul 18 21:30:04 UTC 2016

2016/7/13 20:27:45 -0700, jason.greene at redhat.com:
> Thanks for your reply! My thoughts are inline. I apologize in advance for
> the length/verbosity. Also, as a general disclaimer, I realize that you
> are all experts; in many of my arguments, I occasionally restate certain
> concepts that I know you are all intimately aware of to frame the
> argument. Corrections are, as always, welcome.

No worries -- we'll try, as always, to be gentle!

> On Jul 13, 2016, at 4:47 PM, mark.reinhold at oracle.com wrote:
>> To put what Alex wrote in a somewhat different way, I'd say that the
>> tension here is between explicit configuration (as one finds today in,
>> e.g., the Maven world) and implicit configuration (IoC).
> 
> Just a small nit: IoC can also be explicit, it's just that the
> explicitness is decoupled from the module, and controlled by another
> party, allowing for more flexibility in the assembled system.

Sure -- I was just trying to characterize the situation from the
standpoint of the module developer rather than that of the assembler
or deployer or container implementor (or any other role).

>> Both approaches
>> are important.  The former is typical of standalone Java SE applications
>> while the latter is typical of Java EE applications, though the two
>> approaches are often intermixed.
> 
> I agree they are certainly intermixed elements of a system, but I'd also
> argue IoC is pervasive in SE applications as well (e.g. inclusion of 330
> and 250 in SE are examples of a desire for SE usage). I can't refute that
> it has greater usage in EE, since it's part of the spec, and thus
> effectively every EE application.

FYI, JSR 330 (DI annotations) is not in Java SE, though it's certainly
used in Java SE applications in combination with various DI frameworks.

JSR 250 ("common" annotations) specifies 14 annotations, but just five
of them are in Java SE.  They're really only there to support JAX-WS, a
component shared with Java EE.  So far as I know they're not used much
in SE applications except in conjunction with JAX-WS.

> I think a better use case categorization of this problem is static
> linkage vs dynamic invocation. In static linkage an explicit symbol
> mapping resolved by the language itself is ideal as it avoids ambiguity,
> and by definition is static. On the other hand with dynamic invocation
> it's common for the caller to utilize introspection and discovery as part
> of the natural flow of executing a dynamic call. Resolving ambiguity is
> not an issue in this case, since it is already handled by the caller as
> part of introspection.

This dichotomy of implementation techniques corresponds well to the
developer-point-of-view dichotomy of explicit vs. implicit configuration
which I described earlier.

>> ...
>> 
>> If I understand correctly, your view of the present proposal is that:
>> 
>> (1) It induces too much boilerplate, requiring developers to write
>>    `exports dynamic P` for every single package `P` that's subject
>>    to reflection by a framework, and
> 
> That's an accurate summary of this point. ...
> 
>> (2) It weakens encapsulation too much, by making the types in such a
>>    package available for reflection at run time by any module in the
>>    system.
> 
> Sorry for the confusion, what I was trying to say on this point was a bit
> different. What I was trying to say was:
> 
> (2') It weakens encapsulation by forcing the introduction of exports
>      introducing potential conflicts that break applications.
> 
> As an example, assume I have three modules with classloader-per-module
> isolation (A, B, and Victim)
> 
> - A exports foo, and has a non-exported package "bar"
> 
> - B exports bar
> 
> - Victim has a module-info with requires A; requires B
> 
> Now A decides to use IoC on some of its classes in bar, so its
> definition is changed to:
> 
> { exports foo; exports dynamic bar; }
> 
> Since exports dynamic is internally a normal export at runtime, module
> resolution fails when loading Victim, because it's now including a
> duplicate package, even though A had no intention of publishing its
> internal bar package for linkage.

Got it.  Thanks for clarifying this -- I agree that it's a problem.

Fortunately I think we can address it simply by revising the semantics of
`exports dynamic p` to omit the package-conflict constraint.  This would
allow split packages to occur more readily at run time, though still
really only in fairly obscure situations involving poorly-written class
loaders.
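
The conflict described above can be reproduced in miniature with the
java.lang.module API as it later shipped in Java 9.  This is only a
sketch under stated assumptions: `exports dynamic` is not available in
that API, so A's revised declaration is modeled as a plain `exports bar`
(the quoted message notes that a dynamic export is internally a normal
export at run time).  The class name `PackageConflictDemo` and the
in-memory finder are illustrative, not part of any proposal.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolutionException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class PackageConflictDemo {

    /** Wraps a descriptor in a content-less reference; enough for resolution only. */
    static ModuleReference ref(ModuleDescriptor d) {
        return new ModuleReference(d, null) {
            @Override public ModuleReader open() {
                throw new UnsupportedOperationException("no module content in this sketch");
            }
        };
    }

    /** A hypothetical in-memory finder over the given descriptors. */
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new HashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), ref(d));
        }
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }
            @Override public Set<ModuleReference> findAll() {
                return new HashSet<>(refs.values());
            }
        };
    }

    /** Resolves Victim (requires A, B); returns false if resolution is rejected. */
    public static boolean resolves(boolean aExportsBar) {
        ModuleDescriptor.Builder a = ModuleDescriptor.newModule("A").exports("foo");
        if (aExportsBar) {
            a.exports("bar");              // stand-in for `exports dynamic bar`
        } else {
            a.packages(Set.of("bar"));     // bar stays a concealed package of A
        }
        ModuleDescriptor b = ModuleDescriptor.newModule("B").exports("bar").build();
        ModuleDescriptor victim = ModuleDescriptor.newModule("Victim")
                .requires("A").requires("B").build();
        try {
            ModuleLayer.boot().configuration().resolve(
                    finderOf(a.build(), b, victim), ModuleFinder.of(), Set.of("Victim"));
            return true;
        } catch (ResolutionException e) {
            return false;                  // two modules export bar to Victim
        }
    }

    public static void main(String[] args) {
        System.out.println("bar concealed in A: resolves = " + resolves(false));
        System.out.println("bar exported by A:  resolves = " + resolves(true));
    }
}
```

With bar concealed in A the configuration should resolve; once A exports
bar as well, the resolver rejects Victim because it would read the same
package from two modules.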

> Qualified exports could in theory address this problem, but they are
> problematic in a dynamic environment since the module is simply not in a
> position to know all of the various modules which would/could enhance
> it. ...

I completely agree.  In general I think it's inappropriate for the author
of a module to write qualified exports except in some very special cases,
and this is most definitely not one of them.

>> These observations lead to your suggestion to allow declarative module
>> boundaries to be overridden by "trusted" framework code.  It's far from
>> clear how to define such a facility in a way that would still allow us to
>> achieve one of our primary goals, namely strong encapsulation, i.e., the
>> ability of the author of a module to declare which types are accessible
>> by other components, and which are not.
> 
> My understanding was the underlying driver was a security concern (that's
> where I was going with that suggestion). Is that accurate?

It's a question of both security (i.e., preventing vulnerabilities) and
integrity (i.e., respecting the intent of a module's author).

For security, the problem with introducing an explicit notion of trust
into any system, or in this case an additional explicit notion of trust,
is that you then have to figure out how to prevent it from being used as
an attack vector.

Yes, we could define a notion of "trusted" modules whose code can reflect
arbitrarily.  Even if we implement and use it completely correctly in the
JDK, however, there would be an API for it, and that API would be used by
external library and application code, and eventually somebody, somewhere,
would make a mistake that leads to a CVSS 10 in some production code.

In the Jigsaw design so far, therefore, we've tried instead to leverage
existing implicit notions of trust wherever possible.

We already trust whoever has control over the invocation of the Java
run-time environment.  If you can edit the `java` command line, or its
equivalent, then you can already do pretty much anything you want, so
there's little additional risk in providing command line options (as
we do [1]) to break module-encapsulation boundaries.

We also already trust whoever writes systems that explicitly load
classes, i.e., container developers.  If you can modify bytecodes on
their way into the JVM then you already have significant power, so
there's little additional risk in giving you the ability to control how
modules are defined and related in the class loaders that you create.
(In fact you already have this ability, since you can rewrite module
descriptors prior to the configuration of a layer, so there's no need
to create a special API for it.)

A natural consequence of this approach is that we need not place total
trust in a framework in order to use it.  A container can arrange for a
framework to have reflective access to just the packages of just the
modules that the framework is going to support, by adding qualified
dynamic exports as needed, rather than grant it full reflective access
to all code in the system.

This approach does mean that container developers have a bit more work
to do, but that seems a reasonable tradeoff if in return we're able to
keep the platform simpler, and thus both easier to understand and easier
to secure.

As to integrity, if the author of a module decides not to export a
specific package then they should be able to expect that decision to be
respected, and not overridden lightly by any random code in the system
that uses reflection.  Whoever controls the `java` command line or sets
up the container in which the module ultimately runs can override that
encapsulation decision, and they may have very good and legitimate
reasons to do so.  At that point, however, they take on the burden of
bearing the consequences of any violation of the module author's intent.
If a future version of the module, e.g., removes an encapsulated package
in a way that causes reflective code to fail then that is not the fault
of the module author but, rather, of whoever decided to break into the
package.

> I think the goal you list above is laudable but I'm hoping there is room
> for nuance.
> 
> One of the other points I was making in my earlier writeup is that you
> have two different access roles at play here. There is interaction with a
> module through an API/contract, and you have runtimes which
> enhance/augment the implementation of a module. Encapsulation is very
> important for the former, but it's counterproductive for the latter. ...

Understood -- this is one more way to characterize the dichotomy at the
root of this discussion.

>>                                * * *
>> 
>> To point (1), we all know that the most common way for developers to
>> write Java code today is with a rich and powerful IDE.  These tools
>> already have plenty of built-in cleverness for generating POJO classes,
>> deriving precise `import` directives, and ameliorating other kinds of
>> boilerplate.  I don't think it would be at all a stretch for such tools
>> to generate precise `exports dynamic` directives on demand, based upon
>> the presence of IoC-style annotations, and maintain their consistency
>> over time.  Just as precise `import` directives in class and interface
>> declarations document dependences upon specific types, so precise
>> `exports dynamic` directives in a module declaration would document
>> the exposition of specific types for reflection at run time.
>> 
>> If we think it likely that some modules will need to export dozens or
>> hundreds of packages, leading to extremely long module declarations, then
>> one possible refinement would be to allow a wildcard: `exports dynamic *`
>> would export all of a module's packages for reflection at run time.  This
>> would likely be straightforward.
> 
> It's certainly true that tooling can address, and even negate the
> issue. However it does require that the IDE understand the specific
> framework in use. For large standards like Java EE, that's a reasonable
> expectation. Although, there are many different frameworks and runtimes
> that do this, so coverage will likely be incomplete and/or lag. I suppose
> an IDE could just default to exporting everything but then that makes
> conflicts more likely.

I think a meta-annotation along the lines of what Stephane Epardaud
suggested nearby [2] is definitely worth considering, so that IDEs can
detect missing dynamic-export declarations and offer to insert them.
It wouldn't help with configuration-based IoC, as you pointed out, but if
we remove the present package-conflict constraint, as suggested above,
then a container could easily insert whatever configuration-driven
dynamic exports are needed.
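
As a rough illustration of the meta-annotation idea, and not a
description of what was actually proposed in [2], a tool could look for
framework annotations that are themselves marked with a well-known
meta-annotation.  Everything named below (`NeedsReflection`,
`ManagedEntity`, `needsDynamicExport`) is hypothetical, and the check is
done reflectively only to keep the sketch runnable; an IDE would do the
equivalent on source code.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class MetaAnnotationSketch {

    /** Hypothetical meta-annotation: marks annotations whose targets need reflective access. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.ANNOTATION_TYPE)
    public @interface NeedsReflection {}

    /** Hypothetical framework annotation, standing in for an IoC-style marker. */
    @NeedsReflection
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ManagedEntity {}

    @ManagedEntity
    public static class Customer {}

    public static class PlainHelper {}

    /** True if any annotation on the type is itself marked with @NeedsReflection. */
    public static boolean needsDynamicExport(Class<?> type) {
        for (Annotation a : type.getAnnotations()) {
            if (a.annotationType().isAnnotationPresent(NeedsReflection.class)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Customer:    " + needsDynamicExport(Customer.class));
        System.out.println("PlainHelper: " + needsDynamicExport(PlainHelper.class));
    }
}
```

A tool built this way needs to know only the one meta-annotation rather
than every framework, which is the appeal of the approach.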

>>                                * * *
>> 
>> To point (2), if some packages in a user module need to be exported for
>> reflection at run time, and a container wishes to ensure that only select
>> "trusted" framework modules can access the types in those packages, then
>> that's already expressible today.  We can also ensure that the set of
>> packages exported by a module is the same whether it's used standalone
>> on Java SE versus inside a container, which as you observe elsewhere in
>> this thread [1] could be problematic.
>> 
>> Suppose, e.g., we have an application module that's written against JPA,
>> rather than any specific JPA implementation, and exports the package
>> containing its entity classes for reflection at run time:
>> 
>>  module com.foo.data {
>>      requires java.persistence;
>>      exports dynamic com.foo.data.model;
>>  }
>> 
>> When used standalone, outside of a container, this module will export the
>> package containing its entity classes for reflection at run time.  The
>> classes will be accessible to every other module, but from a security and
>> integrity standpoint we assume that whoever invokes the run-time system,
>> i.e., whoever provides the command-line arguments to the `java` launcher
>> or its equivalent, is trusted to ensure that no adversarial modules are
>> present.
>> 
>> When used inside a container, the container already has the power to
>> prevent an adversarial module from accessing the module's entity classes.
>> That's because we expect containers to load every application into a
>> unique layer [2], and a container can rewrite module descriptors when
>> configuring a layer.  
> 
> Right, for the reasons I listed above, this is really only workable if
> the container rewrites these values and/or adds them itself. However,
> there are still some challenges with this. Another capability could come
> online in a running system in a hot fashion, after the qualified list was
> computed during initialization. So the set would need to be expanded to
> not just what is required now, but all possible consumers that could be
> required, and these may not be known yet. So as an example, the container
> administrator hot deploys a new service which snapshots internal
> state. The module implementation is not known until that code is
> deployed, forcing the container to bounce all deployments just to
> recompute the export list.
> 
> Perhaps this problem could be addressed by an indirection
> that represents a role or an actor (e.g. something like "export dynamic *
> to role runtime").  I haven't thought that through though.

Happily, I don't think we need to go that far.

A container is in complete control of the class loaders that it creates,
so it already has the power to load arbitrary classes into arbitrary
modules.  If a container needs to add a dynamic export at run time then
it can synthesize a tiny class whose static initializer invokes the
java.lang.Module::addExports method as needed, and then define that class
in the target module.

We already use this technique in the JDK's Nashorn JavaScript engine,
which loads script classes into modules that are defined completely at
run time.  (As Alex mentioned, there will be a talk on this at the
upcoming JVM Language Summit, and the video will be available shortly
thereafter.  We'll post a link here when it's available.)
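
In source form, the helper class that a container would synthesize might
look roughly like the following.  This is a sketch that assumes the API
as it shipped in Java 9 (java.lang.Module); the class name `ExportHelper`
is hypothetical, and it uses a static method rather than a static
initializer so that the package and target can be passed in as
arguments.

```java
public class ExportHelper {

    /**
     * Exports a package of this class's own module to the target module.
     * Module::addExports is caller-sensitive: for a named module it only
     * succeeds when the caller is in that module, which is why the
     * container defines a class like this one inside the module whose
     * package it wants to export.
     */
    public static Module exportTo(String pkg, Module target) {
        return ExportHelper.class.getModule().addExports(pkg, target);
    }

    public static void main(String[] args) {
        // Assumption: run from the class path, so this class is in the
        // unnamed module, where addExports has no effect and simply
        // returns the module.  java.base stands in for a framework module.
        Module framework = Object.class.getModule();
        Module self = exportTo("com.foo.data.model", framework);
        System.out.println("named module? " + self.isNamed());
    }
}
```

In a real container the class would be generated as bytecode and defined
by the class loader that owns the target module; in that setting the
named package must actually belong to the module, otherwise addExports
throws IllegalArgumentException.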

>> ...
>> 
>>                                * * *
>> 
>> To sum up, for (1) I agree that unnecessary boilerplate is a bad thing.
>> Asking the author of a module to be explicit about which packages are
>> exported for reflection at run time, however, is of high value when
>> trying to understand how the module fits into a larger system.  The cost
>> of such explicitness can, moreover, easily be reduced by the tools that
>> almost all Java developers already use.
>> 
>> For (2), I share your concern and I think it can be addressed within the
>> scope of the present design.  At this point I don't see a strong need to
>> introduce a way to enable framework code to violate module boundaries
>> arbitrarily at run time, and I don't know how to do that without,
>> essentially, giving up on one of our primary goals.
> 
> Thanks for sharing your perspective. I can respect pushing hard on an
> ideal. IMHO tweaking, or perhaps slightly reinterpreting, this goal
> doesn't mean it wasn't achieved, it's just adapting to accommodate a very
> valuable set of use cases. Another way to look at it, is that once you
> have containers modifying and generating descriptors you have already
> transferred authority from the module to the runtime, so why not
> formalize that in a mechanism that best enables the use case? I'm hopeful
> we can find a way to do so.

As I've tried to explain above, I think we can support these use cases
with the present design once we remove the package-conflict constraint
for dynamically-exported packages, and we can do so in a way that's
simpler and more secure than a "trusted" module mechanism.

>> We could make it easier to rewrite module descriptors, by providing an
>> API for that purpose rather than expecting container developers to use
>> libraries such as ASM, and perhaps that's worth doing, but it's a
>> different issue.
> 
> It would be nice if there was a way to provide and/or alter a module
> definition in something other than in bytecode, mainly for optimal
> generation reasons, but I certainly understand that some things will have
> to wait as future enhancements.

No bytecode rewriting is, in fact, necessary today.  A container need
only implement the java.lang.module.ModuleFinder interface [3]
with a custom class that uses the existing API to load module-info.class
files into ModuleDescriptor [4] objects and, from those, create revised
ModuleDescriptor objects as needed.  Such objects are immutable, and
today we only provide a builder API for creating them, so rewriting them
is possible but not exactly convenient.  It wouldn't be difficult to add
some convenience APIs to make ModuleDescriptor rewriting easier -- if you
or others have suggestions about how to do that, we'd be happy to consider
them.
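
To make the rewriting step concrete, here is a minimal sketch of copying
a ModuleDescriptor through the builder and adding one qualified
directive.  It assumes the API as it shipped in Java 9, where `opens`
took over the role discussed here as `exports dynamic`; the class name
`DescriptorRewrite`, the method `openTo`, and the framework module name
`org.example.jpa` are hypothetical.  It also assumes the original is a
normal (not open or automatic) module that does not already open the
package, and it leaves out the ModuleFinder that would serve the revised
descriptor.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;

public class DescriptorRewrite {

    /** Returns a copy of the descriptor that also opens pkg to the framework module only. */
    public static ModuleDescriptor openTo(ModuleDescriptor original,
                                          String pkg, String framework) {
        ModuleDescriptor.Builder b =
                ModuleDescriptor.newModule(original.name(), original.modifiers());

        // Descriptors are immutable, so copy every component into a new builder.
        original.requires().forEach(b::requires);
        original.exports().forEach(b::exports);
        original.opens().forEach(b::opens);
        original.uses().forEach(b::uses);
        original.provides().forEach(b::provides);
        b.packages(original.packages());
        original.version().ifPresent(b::version);
        original.mainClass().ifPresent(b::mainClass);

        // The one change: a qualified opens, visible to a single framework module.
        b.opens(pkg, Set.of(framework));
        return b.build();
    }

    public static void main(String[] args) {
        // Stand-in for a descriptor read from a module-info.class file.
        ModuleDescriptor original = ModuleDescriptor.newModule("com.foo.data")
                .requires("java.persistence")
                .packages(Set.of("com.foo.data.model"))
                .build();

        ModuleDescriptor revised =
                openTo(original, "com.foo.data.model", "org.example.jpa");
        System.out.println(revised.opens());
    }
}
```

The amount of copying needed for a one-line change is the inconvenience
mentioned above.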

- Mark

[1] http://openjdk.java.net/jeps/261#Breaking-encapsulation
[2] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2016-July/008644.html
[3] http://cr.openjdk.java.net/~mr/jigsaw/spec/api/java/lang/module/ModuleFinder.html
[4] http://cr.openjdk.java.net/~mr/jigsaw/spec/api/java/lang/module/ModuleDescriptor.html