Disallowing the dynamic loading of agents by default

Fri Mar 31 16:37:22 UTC 2017

This attempt to disable "dynamic agent loading" by default reminds me
of the "ptrace protection" feature recently added to the Linux kernel
[1], which can be configured to disallow PTRACE_ATTACH to any other
process running under the same uid. As far as I know, only Ubuntu
disallows PTRACE_ATTACH by default; others like Red Hat don't [2], and
yet others like Debian even switched back to allowing it, because the
restriction was considered a "bad idea" and not "a real protection"
[3].

I agree with the Red Hat/Debian position. And even on systems where
PTRACE_ATTACH is disabled by default, it can still be re-enabled
dynamically [4], i.e. without restarting the offending application.
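
To make [4] concrete, here is a minimal Java sketch that reads the
current Yama setting and says what it means. The class name
YamaPtraceScope and the wording of the descriptions are mine, not
taken from the kernel sources; the meaning of values 0 to 3 follows
the Yama documentation referenced in [1].

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class YamaPtraceScope {
    // Human-readable meaning of a ptrace_scope value (descriptions are
    // paraphrased from the Yama documentation, not quoted).
    static String describe(int scope) {
        switch (scope) {
            case 0: return "classic ptrace permissions (same-uid attach allowed)";
            case 1: return "restricted: attach only to descendants or declared tracers";
            case 2: return "admin-only: attach requires CAP_SYS_PTRACE";
            case 3: return "no attach: PTRACE_ATTACH disabled";
            default: return "unknown value " + scope;
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/proc/sys/kernel/yama/ptrace_scope");
        if (!Files.isReadable(p)) {
            // Kernel built without Yama, or not Linux at all.
            System.out.println("Yama ptrace_scope not available on this system");
            return;
        }
        int scope = Integer.parseInt(new String(Files.readAllBytes(p)).trim());
        System.out.println(scope + ": " + describe(scope));
    }
}
```

As far as I read the Yama documentation, a value of 3 cannot be
changed back at runtime, so the re-enabling described in [4] only
applies to the values 1 and 2.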

I think that introducing an option to control "dynamic agent loading"
would be fine, but I'd rather name it
"-XX:+DisableDynamicAgentLoading", which clearly implies that the
default should be to allow dynamic agent loading.
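
Whichever name is chosen, a running HotSpot JVM can be asked whether
it knows the flag and what its current value is. A minimal sketch,
assuming the name EnableDynamicAgentLoading from Andrew's mail quoted
below (the flag is only a proposal at this point, so on a JVM that
does not have it the lookup simply comes back empty); the class and
method names are made up for illustration:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class AgentLoadingFlag {
    // Returns the current value of a HotSpot VM option, or empty if this
    // JVM does not know an option with that name.
    static Optional<String> vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return Optional.empty(); // JVM without the HotSpot diagnostic bean
        }
        try {
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when the option does not exist.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String flag = "EnableDynamicAgentLoading"; // assumed name, see above
        System.out.println(flag + " = "
            + vmOption(flag).orElse("<unknown to this JVM>"));
    }
}
```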

As a side note, I'd like to mention that Java is running without a
security manager by default. Every reputable Java developer knows this
and every serious Java application installs a security manager if it
cares about security. In my opinion, running with "dynamic agent
loading" enabled by default has much less impact on security than
running without a security manager. If somebody still cares, he would
be free to use "-XX:+DisableDynamicAgentLoading" at his discretion,
but the default should not be the other way round.

Regards,
Volker

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/Documentation/security/Yama.txt?id=2d514487faf188938a4ee4fb3464eeecfbdcf8eb
[2] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/SELinux_Users_and_Administrators_Guide/sect-Security-Enhanced_Linux-Working_with_SELinux-Disable_ptrace.html
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712740
[4] by writing '0' to /proc/sys/kernel/yama/ptrace_scope

On Fri, Mar 31, 2017 at 11:46 AM, Andrew Dinn <adinn at redhat.com> wrote:
> Hi Mark,
>
> On 30/03/17 16:38, mark.reinhold at oracle.com wrote:
>> // Moving the general discussion to jigsaw-dev for the record;
>> // bcc'ing {hotspot-runtime,serviceability}-dev for reference.
>>
>> Andrew,
>>
>> Thanks for your feedback on this topic [1][2][3].
>
> ... and thank you for your considered reply.
>
>> First, we apologize for the way in which this topic was raised.  Our
>> intent was to post a proposal for discussion prior to code review, but
>> unfortunately that review was posted prematurely (as is evident by its
>> inclusion of Oracle-internal e-mail addresses and URLs).
>
> Hmm, yes! I must say I didn't notice that. I appreciate the apology but
> it's not really necessary. I certainly didn't expect any explanation to
> omit some element of miscommunication and/or cock-up :-)
>
>> Second, I agree with your earlier analysis as to the security impact of
>> this change.  If an attack is possible via this vector then closing the
>> vector would only slow the attack, not prevent it.
>
> Good, I am glad to hear there is not some terrible loop-hole at play
> that I am not aware of.
>
>> The motivation for this change is, however, not merely to improve the
>> security of the platform but to improve its integrity, which is one of
>> the principal goals of the entire modularity effort.  ...
>
> Ok, I understand the motive here although I'm still not personally
> convinced by it. I'll come to the practical considerations below. Before
> that I'd like to address the question of integrity at a more abstract level.
>
> I'm certainly not against providing -XX:+/-EnableDynamicAgentLoading as a
> command line option. I agree that it's probably useful for some users to
> have the option to completely lock down the platform to guarantee its
> integrity. It seems from what you say above that this lock-down option
> is only there to provide 'belt and braces'. In other words, it is only
> necessary to guard against a security breach that could be managed by
> other means (e.g. a failure to control what jars go into your classpath;
> a failure to control access to the JVM uid on the JVM host machine).
> I cannot fault the idea of a belt and braces lockdown per se but I am
> still not convinced why that extra protection needs to be enabled /by
> default/.
>
> You specifically bring up the scenario where rogue code, once entered
> into the JVM, might use the attach API to raise its privilege level.
>
> "As things stand today, code in any JAR file can use the
> `VirtualMachine.loadAgent` API to load an agent into the JVM in which
> it's running and, via that agent, break into any module it likes."
>
> Yet, you also acknowledge above that this merely constitutes an
> opportunistic escalation of a situation that is already a serious
> security breach in its own right. I don't think I follow the logic here.
>
> Are you saying that we need the extra braces because there is a real
> danger here, one that users cannot rightly always be expected to guard
> against? Or are you just being extra cautious? This is really the crux
> of the matter because that extra caution has to be weighed against the
> extra cost of lost opportunities to deploy agents in abnormal situations.
>
> n.b. I know in the case of Red Hat's middleware that this is a real cost
> which will definitely arise no matter how hard we work to educate users
> about the necessary advance preparation required. It is also a
> significant cost because it will damage our ability to resolve certain
> very difficult support issues where only an agent can provide the
> information needed. And that is above and beyond the cost of the
> re-education task itself. I don't doubt other companies will be affected
> similarly.
>
> My mention above of 'abnormal' situations underlines why your argument
> about integrity is somewhat moot (to me). Yes, it is important to know
> that encapsulation means encapsulation -- at least, I agree that is so
> in /normal/ circumstances. However, agents are clearly not normal code
> performing the normal program operations of an application. Many agents
> are specifically designed for deployment in abnormal situations and
> perform abnormal actions. That is precisely what provides the impetus to
> deploy agents dynamically.
>
> It is highly valuable in such circumstances, and only in those
> circumstances, to be able to allow privileged agent code to
> /selectively/ remove certain integrity barriers, even if -- perhaps,
> especially because -- any dismantling of the normal rules of operation
> only happens modulo the specific licence the agent has been crafted and
> configured to grant. Useful agents clearly scope the degree to which
> they perturb normality to achieve abnormal results. Careful and
> thoughtful users can (must) still feel safe that an agent is not going
> to do catastrophic damage to the running application and the integrity
> of its data and operation. Ironically, this means that deployment of my
> agent is actually a relatively normal (even if infrequent) procedure for
> many of our users.
>
> So, while I agree that platform (or even application) integrity is a
> valuable property to maintain in normal program operation, I don't think
> those concerns are warranted in the case of an agent that has been
> deliberately and carefully deployed by those in charge of an
> application. I suspect we are probably not going to agree about the
> proposed default on these grounds (and I also suspect I will not be the
> only one to disagree with your position). So, perhaps we would be better
> off moving on to pragmatic concerns.
>
>> I understand your points about the practical difficulties of having to
>> educate users about this new option and enhance startup scripts to use
>> the option only when invoking JDK 9.  Isn't it already the case, however,
>> that migrating existing applications to JDK 9 is often going to require
>> the use of a few new options anyway, in order to expose internal APIs?
>> If so then would it really be that much more burdensome for users also
>> to think explicitly, at the same time, about whether they want to enable
>> dynamic agent loading?
>
> If the default is reset to allow dynamic loading then I am happy to
> fully endorse this change and see no significant consequences. If this
> change is going to happen with your proposed default then I would very
> much prefer it to be staged: introduce the flag in 9 but with the
> default being to allow dynamic loading of agents (i.e. default to the
> status quo); reset the default in 10 to disable loading. The benefits of
> that are:
>
>   aware JDK9 users can still use or ignore the option as they see fit
>
>   unaware JDK9 users will not get hit by the change by surprise in JDK9
>
>   unaware JDK10 users may still get hit by surprise but by that stage
> any configuration option they add to their JDK10 scripts will be
> compatible if they need to switch back and forth between JDK10 and JDK9
>
>   implementers of agents and implementers of middleware that might
> benefit from using those agents have more time to prepare their users,
> limiting the potential for any such nasty surprise in JDK10
>
>> This change would be disruptive to some but it's the best way we've
>> found, so far, to preserve platform integrity in the face of dynamic
>> agent loading.  If there's a better way to do that, we'd like to know.
>
> No, I don't think there is a better mechanism, only a better default.
> That reflects my belief that, while 'preserving platform integrity' is a
> highly desirable goal, for most users it does not merit being pursued
> 'in the face of dynamic agent loading'.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander