Disallowing the dynamic loading of agents by default

Sun Apr 2 23:39:16 UTC 2017

I agree with Andrew's position that if the argument is added in JDK9, it should default to allow dynamic loading of agents.

The position "Isn't it already the case, however, that migrating existing applications to JDK 9 is often going to require the use of a few new options anyway, in order to expose internal APIs?" isn't a valid argument IMO.  Although migration to JDK 9 will be painful, I think that we will eventually get to zero JDK 9-specific command line arguments.  As proposed, this new argument will never go away.

It's highly likely that customers will have scripts that they migrate from JDK 8 to JDK 9.  We don't control that.
And many developers don't use any scripts because, in many cases, they don't care about the garbage collector or memory or whatever the scripts provide.
But they do care about product functionality provided by an agent.

-----Original Message-----
From: Andrew Dinn [mailto:adinn at redhat.com] 
Sent: Friday, March 31, 2017 5:46 AM
To: Mark Reinhold
Cc: jigsaw-dev at openjdk.java.net
Subject: Re: Disallowing the dynamic loading of agents by default

Hi Mark,

On 30/03/17 16:38, mark.reinhold at oracle.com wrote:
> // Moving the general discussion to jigsaw-dev for the record;
> // bcc'ing {hotspot-runtime,serviceability}-dev for reference.
> 
> Andrew,
> 
> Thanks for your feedback on this topic [1][2][3].

... and thank you for your considered reply.

> First, we apologize for the way in which this topic was raised.  Our 
> intent was to post a proposal for discussion prior to code review, but 
> unfortunately that review was posted prematurely (as is evident by its 
> inclusion of Oracle-internal e-mail addresses and URLs).

Hmm, yes! I must say I didn't notice that. I appreciate the apology but it's not really necessary. I certainly didn't expect any explanation to omit some element of miscommunication and/or cock-up :-)

> Second, I agree with your earlier analysis as to the security impact 
> of this change.  If an attack is possible via this vector then closing 
> the vector would only slow the attack, not prevent it.

Good, I am glad to hear there is not some terrible loophole at play that I am not aware of.

> The motivation for this change is, however, not merely to improve the 
> security of the platform but to improve its integrity, which is one of 
> the principal goals of the entire modularity effort.  ...

Ok, I understand the motive here although I'm still not personally convinced by it. I'll come to the practical considerations below. Before that I'd like to address the question of integrity at a more abstract level.

I'm certainly not against providing -XX:+/-EnableDynamicAgentLoading as a command line option. I agree that it's probably useful for some users to have the option to completely lock down the platform to guarantee its integrity. It seems from what you say above that this lock-down option is only there to provide 'belt and braces'. In other words, it is only necessary to guard against a security breach that could be managed by other means (e.g. a failure to control what jars go into your classpath; a failure to control access to the JVM uid on the JVM host machine).
I cannot fault the idea of a belt and braces lockdown per se but I am still not convinced why that extra protection needs to be enabled /by default/.

You specifically bring up the scenario where rogue code, once entered into the JVM, might use the attach API to raise its privilege level.

"As things stand today, code in any JAR file can use the `VirtualMachine.loadAgent` API to load an agent into the JVM in which it's running and, via that agent, break into any module it likes."

Yet, you also acknowledge above that this merely constitutes an opportunistic escalation of a situation that is already a serious security breach in its own right. I don't think I follow the logic here.

Are you saying that we need the extra braces because there is a real danger here, one that users cannot rightly always be expected to guard against? Or are you just being extra cautious? This is really the crux of the matter because that extra caution has to be weighed against the extra cost of lost opportunities to deploy agents in abnormal situations.

n.b. I know in the case of Red Hat's middleware that this is a real cost which will definitely arise no matter how hard we work to educate users about the necessary advance preparation required. It is also a significant cost because it will damage our ability to resolve certain very difficult support issues where only an agent can provide the information needed. And that is above and beyond the cost of the re-education task itself. I don't doubt other companies will be affected similarly.

My mention above of 'abnormal' situations underlines why your argument about integrity is somewhat moot (to me). Yes, it is important to know that encapsulation means encapsulation -- at least, I agree that is so in /normal/ circumstances. However, agents are clearly not normal code performing the normal program operations of an application. Many agents are specifically designed for deployment in abnormal situations and perform abnormal actions. That is precisely what provides the impetus to deploy agents dynamically.

It is highly valuable in such circumstances, and only in those circumstances, to be able to allow privileged agent code to /selectively/ remove certain integrity barriers, even if -- perhaps, especially because -- any dismantling of the normal rules of operation only happens modulo the specific licence the agent has been crafted and configured to grant. Useful agents clearly scope the degree to which they perturb normality to achieve abnormal results. Careful and thoughtful users can (must) still feel safe that an agent is not going to do catastrophic damage to the running application and the integrity of its data and operation. Ironically, this means that deployment of my agent is actually a relatively normal (even if infrequent) procedure for many of our users.

So, while I agree that platform (or even application) integrity is a valuable property to maintain in normal program operation, I don't think those concerns are warranted in the case of an agent that has been deliberately and carefully deployed by those in charge of an application. I suspect we are probably not going to agree about the proposed default on these grounds (and I also suspect I will not be the only one to disagree with your position). So, perhaps we would be better off moving on to pragmatic concerns.

> I understand your points about the practical difficulties of having to 
> educate users about this new option and enhance startup scripts to use 
> the option only when invoking JDK 9.  Isn't it already the case, 
> however, that migrating existing applications to JDK 9 is often going 
> to require the use of a few new options anyway, in order to expose internal APIs?
> If so then would it really be that much more burdensome for users also 
> to think explicitly, at the same time, about whether they want to 
> enable dynamic agent loading?

If the default is reset to allow dynamic loading then I am happy to fully endorse this change and see no significant consequences. If this change is going to happen with your proposed default then I would very much prefer it to be staged: introduce the flag in 9 but with the default being to allow dynamic loading of agents (i.e. default to the status quo); reset the default in 10 to disable loading. The benefits of that are:

  aware JDK9 users can still use or ignore the option as they see fit

  unaware JDK9 users will not get hit by the change by surprise in JDK9

  unaware JDK10 users may still get hit by surprise but by that stage any configuration option they add to their JDK10 scripts will be compatible if they need to switch back and forth between JDK10 and JDK9

  implementers of agents and implementers of middleware that might benefit from using those agents have more time to prepare their users, limiting the potential for any such nasty surprise in JDK10
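
To illustrate the script-compatibility point, here is a minimal sketch of the version check a launcher would need if it has to run on both JDK 8 and JDK 9. It assumes the option is spelled -XX:+EnableDynamicAgentLoading (the usual HotSpot form of the flag named in this thread) and that a JDK 8 JVM would refuse to start when given a -XX option it does not recognise. The class and method names are hypothetical and not part of any proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class AgentFlagSketch {

    // Hypothetical helper: pull the major release out of a java.version-style string.
    // Releases before 9 report "1.8.0_121"; 9 and later report "9", "9.0.1", "10".
    static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Only add the flag when the target JVM is 9 or later (assumed flag spelling).
    static List<String> jvmOptions(String version) {
        List<String> options = new ArrayList<>();
        if (majorVersion(version) >= 9) {
            options.add("-XX:+EnableDynamicAgentLoading");
        }
        return options;
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : System.getProperty("java.version");
        System.out.println(String.join(" ", jvmOptions(version)));
    }
}
```

With a staged rollout, the same option line would be accepted by both JDK 9 and JDK 10, so a check like this would only have to distinguish JDK 8 from everything later.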

> This change would be disruptive to some but it's the best way we've 
> found, so far, to preserve platform integrity in the face of dynamic 
> agent loading.  If there's a better way to do that, we'd like to know.

No, I don't think there is a better mechanism, only a better default.
That reflects my belief that, while 'preserving platform integrity' is a highly desirable goal, for most users it does not merit being pursued 'in the face of dynamic agent loading'.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander