Feedback on proposal for #ReflectiveAccessToNonExportedTypes

Tue Sep 6 15:50:12 UTC 2016

(Finally getting back to this thread, now that vacation season is over.)

2016/7/21 10:01:11 -0700, jason.greene at redhat.com:
> On Jul 18, 2016, at 4:30 PM, mark.reinhold at oracle.com wrote:
>> 2016/7/13 20:27:45 -0700, jason.greene at redhat.com:
>>> ...
>>> 
>>> Sorry for the confusion, what I was trying to say on this point was a bit
>>> different. What I was trying to say was:
>>> 
>>> (2') It weakens encapsulation by forcing the introduction of exports
>>>   introducing potential conflicts that break applications.
>>> 
>>> As an example, assume I have three modules with classloader-per-module
>>> isolation (A, B, and Victim)
>>> 
>>> - A exports foo, and has a non-exported package “bar”
>>> 
>>> - B exports bar
>>> 
>>> - Victim has a module-info with requires A; requires B
>>> 
>>> Now A decides to use IoC on some of its classes in bar, so its
>>> definition is changed to:
>>> 
>>> { exports foo; exports dynamic bar; }
>>> 
>>> Since exports dynamic is internally a normal export at runtime, module
>>> resolution fails when loading Victim, because it’s now including a
>>> duplicate package, even though A had no intention of publishing its
>>> internal bar package for linkage.
>> 
>> Got it.  Thanks for clarifying this -- I agree that it's a problem.
>> 
>> Fortunately I think we can address it simply by revising the semantics of
>> `exports dynamic p` to omit the package-conflict constraint.  This would
>> allow split packages to occur more readily at run time, though still
>> really only in fairly obscure situations involving poorly-written class
>> loaders.
> 
> That would help, but there are also class visibility issues that would
> need to be addressed as well.
> 
> Example 1 (Ambiguous class names):
> 
> Both A and B export “bar”, and both define “bar.MyClass”, with
> differing definitions. Victim could load A’s supposedly hidden
> MyClass instead of B’s intended MyClass.
> 
> There is also a variant of this where the conflict is between Victim
> and A if A also exports another hidden package that is present in
> Victim itself.

Alan addressed this in a nearby message.

A high-level point worth emphasizing here is that visibility issues are
class-loader issues, and Jigsaw (for the most part) does not dictate how
custom class loaders should work.  It's a good idea for class loaders to
respect the readability relationships set up by the module system, but if
they don't then there's nothing that the module system can really do
about it.

> Example 2 (Unintentional discovery):
> 
> Victim uses ClassLoader.getResources (plural), looking for a standard
> configuration file or class name, and receives entries for both A and
> B. A’s was not intended to be discovered by victim, and leads to a
> failure state. As an example perhaps the configuration file in B
> specifies a class name in B’s dependency, which is not visible to
> Victim. Or, perhaps A’s config leads to duplicate runtime actions being
> configured (since the file was really only intended for A, which also
> processes it)

... which you later amended:

> Sorry for the confusion, this should read:
> 
> " As an example perhaps the configuration file in [A] specifies a class
> name in [A]’s dependency, which is not visible to Victim. Or, perhaps A’s
> config leads to duplicate runtime actions being configured (since the
> file was really only intended for A, which also processes it)"

Alan also addressed this, in a couple of different nearby messages.
Again, what the class loaders do in this example is not really up to the
module system.  This example can, however, be seen as an argument in
favor of #ResourceEncapsulation, as Alan noted.  If module A contains a
resource that's strictly internal to A, and if the module system gives
the author of A a way to encapsulate such resources, then the module
system could help out in this case by refusing to locate that resource.
(Resource location is one way in which Jigsaw may well change how all
class loaders work.)
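
To make Example 2 concrete, here is a minimal sketch of the plural lookup.
It assumes a stock class loader and a resource name chosen only for
illustration; the `ResourceScan` class and its `findAll` method are
hypothetical. It shows that `ClassLoader.getResources` hands back every
match the loader can see, so the caller has no built-in way to tell an
intended entry from an unintended one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class ResourceScan {

    // Collects every URL that the given loader reports for the named resource.
    static List<URL> findAll(ClassLoader loader, String name) {
        try {
            return Collections.list(loader.getResources(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        // The resource name is an assumption; any well-known name would do.
        for (URL url : findAll(loader, "META-INF/MANIFEST.MF")) {
            System.out.println(url);
        }
    }
}
```

Which entries come back depends entirely on how the loader was written,
which is the point made above.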

> You can potentially address 1 with precedence, but not 2. 
> 
> I think you would need to say that export dynamic is only utilizable
> for reflection permissions and has no other similarity with “export”
> (although perhaps that’s what you meant?)

No, that's not what I meant.  `exports dynamic` allows static (i.e.,
bytecode) references too; it just doesn't allow compile-time access.
Anyway, this isn't an accessibility issue, as noted.

> If you combine that approach with a wildcard capability like you
> mentioned earlier then I’ll admit it’s very hard for me to quibble over
> a one line additional requirement in module-info.java.

Glad to hear it.

> Although, for completeness, let me (re?)introduce one other
> consideration that was briefly mentioned (although with sparing
> details) earlier in the thread:
> 
> If you have a custom serialization framework that is supposed to be
> identical to Java serialization in contract, then it becomes impossible
> to mirror using the only available standard means (core reflection),
> since that mechanism disallows non-exported packages. Currently a
> custom serialization framework only needs to handle one non-standard
> case (missing no-arg constructor). Going forward it would need to use
> Unsafe for everything.

If all the classes of all the objects to be serialized by such a
framework are on the class path, or in automatic modules, or in exported
or dynamically-exported packages of explicit modules, then it will just
work.

If any of the classes are strongly encapsulated in explicit modules, by
the choice of the module's author -- as will be the case for most if not
all JDK modules -- then the framework will indeed need to use Unsafe to
serialize those classes.
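
As a rough illustration of the first case, a reflection-based serializer
reading the private state of a class-path class might look like the sketch
below. The `FieldSnapshot` and `Point` names are hypothetical, and the
sketch only covers the declared instance fields of a single class. For a
class that is strongly encapsulated in an explicit module, the
`setAccessible` call is the step that would be refused.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class-path type with private state.
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical helper, for illustration only.
class FieldSnapshot {

    // Reads the declared instance fields of obj's class via core reflection.
    static Map<String, Object> snapshot(Object obj) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip static and compiler-generated fields
            }
            f.setAccessible(true); // suppresses the language-level access check
            try {
                values.put(f.getName(), f.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(snapshot(new Point(1, 2)));
    }
}
```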

Most of the external serialization frameworks that we've seen already use
Unsafe to achieve good performance, and Unsafe isn't going away in JDK 9,
so in the near to medium term I don't see this as a problem.  They might
have to make a few adjustments to work properly on 9, but given that
they're hacking into the internals of the system that shouldn't really
surprise anyone.

In the longer term we really need a better story for serialization so
that external serializers can serialize instances of JDK classes without
relying upon JDK internals.  We have some ideas about how to do that, but
they're completely out of scope for this release.

> ...
>>>> 
>>>> These observations lead to your suggestion to allow declarative module
>>>> boundaries to be overridden by "trusted" framework code.  It's far from
>>>> clear how to define such a facility in a way that would still allow us to
>>>> achieve one of our primary goals, namely strong encapsulation, i.e., the
>>>> ability of the author of a module to declare which types are accessible
>>>> by other components, and which are not.
>>> 
>>> My understanding was the underlying driver was a security concern (that’s
>>> where I was going with that suggestion). Is that accurate?
>> 
>> It's a question of both security (i.e., preventing vulnerabilities) and
>> integrity (i.e., respecting the intent of a module's author).
>> 
>> For security, the problem with introducing an explicit notion of trust
>> into any system, or in this case an additional explicit notion of trust,
>> is that you then have to figure out how to prevent it from being used as
>> an attack vector.
>> 
>> Yes, we could define a notion of "trusted" modules whose code can reflect
>> arbitrarily.  Even if we implement and use it completely correctly in the
>> JDK, however, there would be an API for it, and that API would be used by
>> external library and application code, and eventually somebody, somewhere,
>> would make a mistake that leads to a CVSS 10 in some production code.
> 
> I think it could be done in a fairly clean manner. For example, you
> could utilize something like JCE providers, and code which needs to
> obtain this trust has to be signed by a framework provider with a
> certificate that’s either been signed by a central JDK CA, or has been
> explicitly deployed as a trust within the JVM. There would be a burden
> for framework developers, but I assume they would prefer this over a
> limited model.

Anything having to do with PKI would be, I suspect, a rather extreme
burden for framework developers.

An approach that requires an end user somehow to install a framework
module into a JVM so that it's trusted will have many of the same
downsides that you point out below for command-line options.

> You could also reuse the security manager infrastructure, and just add
> a new permission.  That would mean giving up on protection even without
> a security manager, but if the user isn’t using a security manager then
> that implies they are on a platform with all trusted code.

One of the central goals of this entire project is to improve the
integrity of the platform so that it's less costly to maintain and --
more importantly -- easier to evolve for the future.  Enforcing module
boundaries even when a security manager is not present, and providing
only limited ways to override those boundaries, is critical to achieving
that goal.

>> In the Jigsaw design so far, therefore, we've tried instead to leverage
>> existing implicit notions of trust wherever possible.
>> 
>> We already trust whoever has control over the invocation of the Java
>> run-time environment.  If you can edit the `java` command line, or its
>> equivalent, then you can already do pretty much anything you want, so
>> there's little additional risk in providing command line options (as
>> we do [1]) to break module-encapsulation boundaries.
>> 
>> We also already trust whoever writes systems that explicitly load
>> classes, i.e., container developers.  If you can modify bytecodes on
>> their way into the JVM then you already have significant power, so
>> there's little additional risk in giving you the ability to control how
>> modules are defined and related in the class loaders that you create.
>> (In fact you already have this ability, since you can rewrite module
>> descriptors prior to the configuration of a layer, so there's no need
>> to create a special API for it.)
> 
> So the main issue with these solutions is really the problems we list
> above.
> 
> The command line approach has issues with compatibility with all of the
> various launch mechanisms. For example you can’t just bundle a script
> for the user, because IDEs launch the VM too, and ensuring those are
> all in sync is brittle. The command line is also too early in the boot
> process (unless it’s just export everything in all modules).
> 
> The class loading approach also has the problem of requiring a
> particular launch mechanism; you can’t, for example, support a standard
> SE launch unless you require a particular agent on the command line.

Yes, I agree that these approaches are limited but that is, as noted, by
design.  It will be simpler for everyone if we can devise a system that
does not require placing trust in a framework, or using command-line
flags, or using custom launchers, or whatever, except in extenuating
circumstances or in the case of containers, which are expected to do
such things and already have the power to do so.

>> A natural consequence of this approach is that we need not place total
>> trust in a framework in order to use it.  A container can arrange for a
>> framework to have reflective access to just the packages of just the
>> modules that the framework is going to support, by adding qualified
>> dynamic exports as needed, rather than grant it full reflective access
>> to all code in the system.
>> 
>> This approach does mean that container developers have a bit more work
>> to do, but that seems a reasonable tradeoff if in return we're able to
>> keep the platform simpler, and thus both easier to understand and easier
>> to secure.
> 
> I think it’s fair to require a few more steps of additional burden for an
> advanced capability used by a framework developer; my concern is really
> more about when it spills over onto the end user.

I share your concern.  In the near term, end users will be impacted when
they try to run existing code that reaches into JDK internals.  There's
no way to avoid that if we're going to improve the integrity of the
platform for the long haul, which is why we've been warning people about
it for years and providing a tool (jdeps) to aid in the transition.

This isn't just an issue for the JDK, of course -- giving all module
authors the ability to strongly encapsulate internals in ways that are
difficult to break will be of benefit to everyone in the long run.

>> As to integrity, if the author of a module decides not to export a
>> specific package then they should be able to expect that decision to be
>> respected, and not overridden lightly by any random code in the system
>> that uses reflection.  Whoever controls the `java` command line or sets
>> up the container in which the module ultimately runs can override that
>> encapsulation decision, and they may have very good and legitimate
>> reasons to do so.  At that point, however, they take on the burden of
>> bearing the consequences of any violation of the module author's intent.
>> If a future version of the module, e.g., removes an encapsulated package
>> in a way that causes reflective code to fail then that is not the fault
>> of the module author but, rather, of whoever decided to break into the
>> package.
> 
> Wouldn’t you agree though that we already have this balance today? You
> have to use setAccessible and a security permission to override the
> accessibility contract expressed by the developer.

No, I don't think we do have that balance today.  History shows that
`setAccessible` and a security permission are woefully ineffective
barriers.  For twenty years we've been extremely conservative about
changing the internals of the JDK, since we have no idea what random
but customer-critical code out there might be using reflection to dig
into it.  Every release is a gamble in which we have to be prepared for
last-minute P1 bug reports of major applications failing to run because
they use some long-unmaintained library that relies upon JDK internals.

It's time for this to stop, and for us all to move on.

(Sorry for the rant.  Horror stories available on demand, over beer.)

> ...
> 
>>> 
>>> Perhaps this problem could be addressed by an indirection
>>> that represents a role or an actor (e.g. something like "export dynamic *
>>> to role runtime”).  I haven’t thought that through though.
>> 
>> Happily, I don't think we need to go that far.
> 
> I agree that modifying exports dynamic to avoid side effects is a
> superior solution to the indirection idea.

Good.

>> A container is in complete control of the class loaders that it creates,
>> so it already has the power to load arbitrary classes into arbitrary
>> modules.  If a container needs to add a dynamic export at run time then
>> it can synthesize a tiny class whose static initializer invokes the
>> java.lang.Module::addExports method as needed, and then define that class
>> in the target module.
>> 
>> We already use this technique in the JDK's Nashorn JavaScript engine,
>> which loads script classes into modules that are defined completely at
>> run time.  (As Alex mentioned, there will be a talk on this at the
>> upcoming JVM Language Summit, and the video will be available shortly
>> thereafter.  We'll post a link here when it's available.)
> 
> Thanks, that would be useful to read.

Here's a link to the video: https://www.youtube.com/watch?v=Zk6a6jNZAt0

- Mark