Disallowing the dynamic loading of agents by default

Fri Mar 31 09:46:21 UTC 2017

Hi Mark,

On 30/03/17 16:38, mark.reinhold at oracle.com wrote:
> // Moving the general discussion to jigsaw-dev for the record;
> // bcc'ing {hotspot-runtime,serviceability}-dev for reference.
> 
> Andrew,
> 
> Thanks for your feedback on this topic [1][2][3].

... and thank you for your considered reply.

> First, we apologize for the way in which this topic was raised.  Our
> intent was to post a proposal for discussion prior to code review, but
> unfortunately that review was posted prematurely (as is evident by its
> inclusion of Oracle-internal e-mail addresses and URLs).

Hmm, yes! I must say I didn't notice that. I appreciate the apology but
it's not really necessary. I certainly didn't expect any explanation to
omit some element of miscommunication and/or cock-up :-)

> Second, I agree with your earlier analysis as to the security impact of
> this change.  If an attack is possible via this vector then closing the
> vector would only slow the attack, not prevent it.

Good, I am glad to hear there is not some terrible loop-hole at play
that I am not aware of.

> The motivation for this change is, however, not merely to improve the
> security of the platform but to improve its integrity, which is one of
> the principal goals of the entire modularity effort.  ...

Ok, I understand the motive here although I'm still not personally
convinced by it. I'll come to the practical considerations below. Before
that I'd like to address the question of integrity at a more abstract level.

I'm certainly not against providing -XX:+/-EnableDynamicAgentLoading as a
command line option. I agree that it's probably useful for some users to
have the option to completely lock down the platform to guarantee its
integrity. It seems from what you say above that this lock-down option
is only there to provide 'belt and braces'. In other words, it is only
necessary to guard against a security breach that could be managed by
other means (e.g. a failure to control what jars go into your classpath;
a failure to control access to the JVM uid on the JVM host machine).
I cannot fault the idea of a belt and braces lockdown per se but I am
still not convinced why that extra protection needs to be enabled /by
default/.

You specifically bring up the scenario where rogue code, once entered
into the JVM, might use the attach API to raise its privilege level.

"As things stand today, code in any JAR file can use the
`VirtualMachine.loadAgent` API to load an agent into the JVM in which
it's running and, via that agent, break into any module it likes."
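
For concreteness, the kind of code being described there might look
roughly like the sketch below. It is only an illustration, not anything
taken from the proposal: the class name and the agent jar path argument
are hypothetical, it assumes the attach API (tools.jar on JDK 8, the
jdk.attach module on 9) is available, and it relies on the conventional
but unspecified "pid@hostname" form of the runtime name. Whether a JVM
will actually let code attach to itself depends on the release and on how
it is configured.

```java
import com.sun.tools.attach.VirtualMachine;
import java.lang.management.ManagementFactory;

// Hypothetical illustration: code running inside a JVM attaches to that
// same JVM and asks it to load an agent jar.
final class SelfAttachSketch {

    // Assumes the runtime name has the conventional "pid@hostname" form;
    // the specification does not guarantee this.
    public static String pidFromRuntimeName(String runtimeName) {
        int at = runtimeName.indexOf('@');
        return at < 0 ? runtimeName : runtimeName.substring(0, at);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: SelfAttachSketch <path-to-agent.jar>");
            return;
        }
        String pid = pidFromRuntimeName(
                ManagementFactory.getRuntimeMXBean().getName());
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            // The target JVM loads the jar and runs its agentmain method.
            vm.loadAgent(args[0]);
        } finally {
            vm.detach();
        }
    }
}
```

Note that nothing here needs more than the ability to run code in the JVM
and read a jar from disk, which is the point at issue.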

Yet, you also acknowledge above that this merely constitutes an
opportunistic escalation of a situation that is already a serious
security breach in its own right. I don't think I follow the logic here.

Are you saying that we need the extra braces because there is a real
danger here, one that users cannot rightly always be expected to guard
against? Or are you just being extra cautious? This is really the crux
of the matter because that extra caution has to be weighed against the
extra cost of lost opportunities to deploy agents in abnormal situations.

n.b. I know in the case of Red Hat's middleware that this is a real cost
which will definitely arise no matter how hard we work to educate users
about the necessary advance preparation required. It is also a
significant cost because it will damage our ability to resolve certain
very difficult support issues where only an agent can provide the
information needed. And that is above and beyond the cost of the
re-education task itself. I don't doubt other companies will be affected
similarly.

My mention above of 'abnormal' situations underlines why your argument
about integrity is somewhat moot (to me). Yes, it is important to know
that encapsulation means encapsulation -- at least, I agree that is so
in /normal/ circumstances. However, agents are clearly not normal code
performing the normal program operations of an application. Many agents
are specifically designed for deployment in abnormal situations and
perform abnormal actions. That is precisely what provides the impetus to
deploy agents dynamically.

It is highly valuable in such circumstances, and only in those
circumstances, to be able to allow privileged agent code to
/selectively/ remove certain integrity barriers, even if -- perhaps,
especially because -- any dismantling of the normal rules of operation
only happens modulo the specific licence the agent has been crafted and
configured to grant. Useful agents clearly scope the degree to which
they perturb normality to achieve abnormal results. Careful and
thoughtful users can (must) still feel safe that an agent is not going
to do catastrophic damage to the running application and the integrity
of its data and operation. Ironically, this means that deployment of my
agent is actually a relatively normal (even if infrequent) procedure for
many of our users.

So, while I agree that platform (or even application) integrity is a
valuable property to maintain in normal program operation, I don't think
those concerns are warranted in the case of an agent that has been
deliberately and carefully deployed by those in charge of an
application. I suspect we are probably not going to agree about the
proposed default on these grounds (and I also suspect I will not be the
only one to disagree with your position). So, perhaps we would be better
off moving on to pragmatic concerns.

> I understand your points about the practical difficulties of having to
> educate users about this new option and enhance startup scripts to use
> the option only when invoking JDK 9.  Isn't it already the case, however,
> that migrating existing applications to JDK 9 is often going to require
> the use of a few new options anyway, in order to expose internal APIs?
> If so then would it really be that much more burdensome for users also
> to think explicitly, at the same time, about whether they want to enable
> dynamic agent loading?

If the default is reset to allow dynamic loading then I am happy to
fully endorse this change and see no significant consequences. If this
change is going to happen with your proposed default then I would very
much prefer it to be staged: introduce the flag in 9 but with the
default being to allow dynamic loading of agents (i.e. default to the
status quo); reset the default in 10 to disable loading. The benefits of
that are:

  aware JDK9 users can still use or ignore the option as they see fit

  unaware JDK9 users will not get hit by the change by surprise in JDK9

  unaware JDK10 users may still get hit by surprise but by that stage
  any configuration option they add to their JDK10 scripts will be
  compatible if they need to switch back and forth between JDK10 and JDK9

  implementers of agents and implementers of middleware that might
  benefit from using those agents have more time to prepare their users,
  limiting the potential for any such nasty surprise in JDK10
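
To illustrate the script compatibility point, here is a minimal sketch of
the logic a launcher would need if it has to run on releases both with
and without the flag. It assumes the option is spelled
-XX:+EnableDynamicAgentLoading and first appears in 9; the class and
method names are hypothetical. The version check matters because a JVM
that does not recognise a -XX option will normally refuse to start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher helper: only pass the flag to JVMs assumed to
// understand it (major version 9 or later).
final class AgentLoadingFlag {

    // Flag name as discussed in this thread; assumed spelling.
    public static final String ENABLE_FLAG = "-XX:+EnableDynamicAgentLoading";

    // Extracts the major version from a java.version style string such
    // as "1.8.0_121", "9", "9-ea" or "10.0.1".
    public static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Releases before 9 report "1.x"; the major version is then x.
        return (first == 1 && parts.length > 1)
                ? Integer.parseInt(parts[1])
                : first;
    }

    // Returns the extra JVM options to use for the given version string.
    public static List<String> jvmOptions(String javaVersion) {
        List<String> opts = new ArrayList<>();
        if (majorVersion(javaVersion) >= 9) {
            opts.add(ENABLE_FLAG);
        }
        return opts;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " -> " + jvmOptions(version));
    }
}
```

With the staged approach the same option list works unchanged on both 9
and 10; only scripts that must also launch 8 need the version test.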

> This change would be disruptive to some but it's the best way we've
> found, so far, to preserve platform integrity in the face of dynamic
> agent loading.  If there's a better way to do that, we'd like to know.

No, I don't think there is a better mechanism, only a better default.
That reflects my belief that, while 'preserving platform integrity' is a
highly desirable goal, for most users it does not merit being pursued
'in the face of dynamic agent loading'.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander