JEP draft: Disallow the Dynamic Loading of Agents by Default

Mon May 1 09:57:58 UTC 2023

Hi Volker!

> On 28 Apr 2023, at 16:38, Volker Simonis <volker.simonis at gmail.com> wrote:
> 
> I think it is a little unfortunate to put the usage of s.m.Unsafe and
> JNI/Instrumentation/JVMTI into the same category, especially when it
> comes to blaming developers for their usage. While s.m.Unsafe has
> always been an internal, undocumented and unsupported API, the latter
> three are part of the Java Platform (e.g. "native" is a Java keyword
> and Runtime.loadLibrary() is part of the Java API).

To have integrity by default, these must all become restricted. In fact, not just them: even the fully official and brand-new FFM API, in which a lot of investment has been made very recently, must also be restricted.

That these features must be restricted doesn’t mean they’re wrong or bad. It just means that they’re superpowers, and so the user must explicitly acknowledge the choice to use them at the cost of some integrity. Native libraries are good and integrity is good, but because they’re in conflict, there must be a switch expressing the user’s preference, like the other switches the runtime offers to select between alternatives.

Unsafe, on the other hand, may become more than just restricted over time. It may gradually be emptied out until it’s gone.

> 
> Do you really plan to make JNI an optional feature which will have to
> be manually enabled at startup?

Not optional at all, but an important, useful feature that is restricted; JNI’s replacement, FFM, will be restricted, too (in its use of native libraries). The restriction of FFM is already mentioned in JEP 442. Another JEP addressing JNI will be published soonish.

> What will be the benefit?

Integrity. The ability to bypass encapsulation when needed is not being taken away, but we need the new ability to establish and enforce invariants. We don’t yet have it.

> I understand
> that in an ideal world where you had no user-supplied JNI libraries at
> all, you might be able to perform more/better optimizations. But as
> you'd have to support JNI anyway, wouldn't the maintenance of the
> resulting code become a nightmare. How many "if (JNI) {..} else {..}"
> would we get?

There’s no need for such code. Modules that need JNI will use JNI. The application will simply give them permission to do so with --enable-native-access=MODULE-NAME, as it would also do to allow FFM to use native libraries.

> And what would be the benefit of disabling it by default
> for the user except increased "integrity"?

Not disabled, restricted, and integrity is the benefit for the user, e.g. in the form of programs not breaking (or breaking less) when upgrading the JDK. Integrity is required for the platform to continue to evolve while keeping the ecosystem sustainable.

> I.e. do you have some
> concrete examples of planned features X, Y, Z which will only work
> with disabled JNI?

Not disabled, restricted. Like all encapsulation-breaking restricted superpowers, allowing them might have implications on possible Leyden features. For example, if a private method could be accessed from outside a module — whether through deep reflection or JNI — private methods could not be removed at link time.
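
To make the deep-reflection case concrete, here is a minimal sketch. All class and method names are hypothetical, and for simplicity everything lives in one unnamed module; across named modules the same call would only succeed if the target package were opened, e.g. with --add-opens. As long as a call like this is possible, a link-time tool cannot prove that the private method is unused:

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a class that some module considers internal.
class Internal {
    private String secret() {
        return "internal detail";
    }
}

public class DeepReflectionSketch {
    // Looks up the private method by name and invokes it from outside
    // the declaring class, bypassing the "private" modifier.
    static String callPrivate() throws Exception {
        Method m = Internal.class.getDeclaredMethod("secret");
        m.setAccessible(true); // the encapsulation-breaking step
        return (String) m.invoke(new Internal());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callPrivate());
    }
}
```

Nothing in the source of Internal shows that secret() has a caller, which is exactly why such access must be visible somewhere the application can audit.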

> Will these features be Java SE features or
> implementation specific OpenJDK-only features?

As with all integrity and strong encapsulation features, all limitations will be part of the platform spec.

We realise that in each individual case there might be good reasons to allow knocking down encapsulation barriers. But whereas every application and library author rightfully wants to minimise the burden on their particular code, such individual decisions inevitably lead to a tragedy of the commons (as they already have). We must strive to minimise the overall burden integrated over the entire ecosystem *as a whole*. So the platform will have the right defaults for the ecosystem, and every application will be able to relax encapsulation to suit its particular needs.

Most Java programs don’t use native libraries, agents (startup or dynamic), or deep reflection. Many do, and these features are very powerful and can be very useful, but with great power comes great responsibility, and that responsibility falls on the *application*. Libraries must not silently impose that responsibility on the application in a way that makes it infeasible to exercise.

Moreover, most encapsulation boundaries are never bypassed, but without integrity by default, the platform and its users still can’t be certain that code means what it says as long as any fourth-level dependency can decide on its own that any line of code in the program might mean something else.

> 
> I don't think it is fair to assume that profilers are the only "valid"
> use case for agents and imply that all other use cases are a mis-use
> of the API.

We are not assuming that at all. Only the use of *dynamically loaded* agents *by libraries* is misuse. Dynamically loaded agents were specifically designed to support serviceability tools, not to allow libraries to circumvent the need to ask the application for permission to break encapsulation.
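
For readers who haven't seen the pattern, this is roughly what a library loading an agent dynamically looks like. It is only a sketch: the agent JAR path is hypothetical, and real libraries typically add workarounds (such as spawning a helper process), because attaching to the current VM has itself been refused by default since JDK 9 unless jdk.attach.allowAttachSelf=true is set.

```java
import com.sun.tools.attach.VirtualMachine;

public class SelfAttachSketch {
    // Attaches to the JVM this code is running in and asks it to load
    // a Java agent from the given JAR. The path is supplied by the caller;
    // any concrete value used here is purely illustrative.
    static void loadAgentIntoThisJvm(String agentJarPath) throws Exception {
        String pid = Long.toString(ProcessHandle.current().pid());
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJarPath);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) {
        try {
            loadAgentIntoThisJvm("/hypothetical/path/agent.jar");
            System.out.println("agent loaded");
        } catch (Exception e) {
            // Expected in a default configuration: self-attach is refused,
            // and the hypothetical JAR does not exist anyway.
            System.out.println("agent not loaded: " + e);
        }
    }
}
```

Note that the application's command line plays no part in this: the decision is made entirely inside library code.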

> 
> I don't understand this "Non-Goal"? The Attach API [1] allows to
> dynamically attach to a running JVM and "Once a reference to a virtual
> machine is obtained, the loadAgent, loadAgentLibrary, and
> loadAgentPath methods are used to load agents into target virtual
> machine". So how can you achieve this JEP's goals without
> changing/restricting the Attach API? I therefore think this "Non-Goal"
> should be rephrased to explain which parts of the Attach API will be
> changed and moved to the "Goal" section instead.

It says “for monitoring and management purposes.” These purposes don’t require dynamically loaded agents. They rarely require agents at all, but when they do, they only need agents loaded at startup.

> 
> General comments:
> 
> - You go into great detail to explain why a human-operated tool is
> "superior" (in the sense of trust and security) to a library and
> "would ideally not be subject to the integrity constraints imposed on
> the application". I can't follow this argument, because both, the
> decision to use a specific tool as well as the decision to rely on a
> library is taken by a human.

A tool is not superior. Only:

1. Most libraries that break encapsulation are not chosen by application authors. They are usually low-level libraries chosen by the authors of the libraries that the application uses, i.e. they’re transitive dependencies. I don’t think that applications in the JDK 8 timeframe became non-portable as a result of a conscious choice. Moreover, it is practically infeasible to actually know everything the code you use may do even if you want to. So not only do application authors not know what libraries do (especially deep dependencies), but they *can’t* feasibly know.

2. You expect a mechanic to tune your car engine but you'd probably be surprised to learn that the little tree air freshener climbs down from the rearview mirror at night and crawls into the engine to make modifications. When an operator uses a serviceability tool, they expect it to open up the box and rummage through internals. That’s what servicing often means, in software as in the physical world. They do not expect that of libraries.

> I'd even argue that the decision to
> depend on a specific library which requires the dynamic attach
> mechanism is taken by a more knowledgeable user (i.e. the developer
> himself). Of course both, a tool as well as a library can contain
> malicious code, but I don't see a fundamental difference between the
> two.

Malicious code is not a concern at all; we assume all code, whether in tools or libraries, is trusted and benevolent. (Even when looking at the security aspect in the server-side ecosystem overall, malicious code amounts to a minuscule portion of the danger, judging by the number of attacks. When it comes to server security, benevolent code poses a much greater risk than malicious code, as the vast majority of security attacks exploit vulnerabilities in benevolent, trusted code. Of course, benevolent code poses other risks, covered in the JEP, that are unrelated to security.)

Knowledgeable users who want to allow a library to arbitrarily change the meaning of code in the application are free to give it the permission to do so. But too many applications don’t even know that a dependency of a dependency of a dependency of theirs does it, and so the permission to do it cannot be the default.

> 
> - You may argue that users have to be protected from malicious
> libraries which gain their superpowers by secretly loading agents at
> runtime.

Again, malicious code is largely a non-issue for Java since Applets were removed.

Since you brought up malicious code in previous conversations, too, let me repeat that again: Even though there have been some software supply chain attacks on various language ecosystems, malicious code poses a relatively small risk to Java nowadays and it is *not* a major concern (at least for the moment); most risks — including security risks — are due to nice, helpful code.

> But users who don't know and don't care about their library
> dependencies will just as easy and without reflection (pun intended :)
> add the -XX:+EnableDynamicAgentLoading to their command line arguments
> (making this the new, most often used command line option even
> surpassing the usage of --add-opens :)

Adding permissions by “cargo cult” is, indeed, a problem, but at the very least the command line would still offer an auditable record of the risks taken up by the application. Responsible companies know that in some situations they may be held accountable for their technical decisions and deviations from recommended practices, and will have mechanisms in place to review command-line permissions just as they review code.

As a general rule, while we certainly want to help users do the right thing, we must first give those who want to do the right thing the ability to do so. Without strengthening strong encapsulation, even someone who really wants to know the integrity risks is unable to do so without an infeasible analysis of every line of code in the application and all of its dependencies.

Moreover, because quite a few application authors do want to carefully consider risks, the fact that they need to explicitly accept more risk to use certain libraries would put pressure on libraries to reduce their superpower demands.

> 
> - I still can't understand the benefit of "only" changing the default
> behavior for dynamic agent loading. I could understand this if you'd
> do it with a plan to deprecate and completely remove the dynamic agent
> loading capability. But what are the benefits of changing the default
> if you'll have to support the functionality anyway?

The application can choose to knock down encapsulation barriers as it wishes (after all, it can even modify the Java runtime as it controls it), but we want the command line to offer a map of the codebase and its abilities. You get integrity by default, and an auditable record of the encapsulation choices always. We want tools to have superpowers, and it’s even arguably okay for certain libraries to be granted superpowers in certain situations provided that it’s done with the application’s explicit consent. It’s just that the situation where superpowers are given silently and by default has become untenable for the ecosystem as a whole.

> As mentioned in
> earlier discussions, my main concern with the proposed change is the
> impact it will have on the evolution of Java. Java's dynamic features
> are one of its biggest strength and a major reason for its success.

That’s right, but the Why Now? section covers that in detail. In short, the times — they are a changin’, and Java must be a changin’ with them. Even putting aside the new requirements and more Java-in-Java, the old situation has become untenable as we saw in the 8 -> 9+ migration. The reason the old way worked — until it didn’t — was that for a long while (the 6-8 timeframe) Java was relatively stagnant.

However, that relative stagnation didn’t just allow the encapsulation free-for-all to work; it’s also what made much of it necessary in the first place, to work around shortcomings in the JDK’s development. So not only can we not continue with the old regime, but there’s not as much need for it anymore.

However useful dynamism is at times, we must have the ability to control it. The faster Java evolves, the more important that control becomes. The most important thing to remember is that the need for integrity doesn’t come from some theoretical desire for architectural cleanliness, but from real user needs. Users want smoother upgrades; they want robust security; they want more features, and they want new kinds of features that reduce startup time. All these things require control over Java’s dynamism. Some users may want dynamism, too, but since these desires are in conflict, applications must choose between them, and that’s the idea of integrity by default (dynamism by default and integrity by choice can’t work because of the structure of the Java ecosystem).

> Sacrificing some of them or making their usage increasingly expensive
> requires a broader discussion in the community and shouldn't happen
> "under the hood" of a discussion about the default setting of a
> command line flag.

First, while requiring an auditable map of the codebase certainly does require some effort, let’s not exaggerate it. We’re talking about a cost that is negligible compared to that of developing software, a cost that is imposed only on those who want or need to use relatively advanced superpowered features, and a cost that results in something of value: a map of the codebase and its permissions. Yes, some will be inconvenienced by this, but so are those who cannot easily upgrade JDK versions due to non-portable libraries. You can call it a sacrifice if you wish, but whatever we do or don’t do, *someone’s* convenience will be sacrificed, and this direction reduces rather than increases the overall sacrifice. That’s exactly why this is the direction: we want to reduce the overall pain for users and increase the value they get.

Second, that lengthy discussion about this direction already took place over years when Jigsaw was under development (as did the discussion about agents in particular). What was missing was a summary in JEP form, hence the informational JEP. We’ll post the JEP to jdk-dev once we finish writing it, but it describes a path that Java has already been on for several years.

> 
> - I don't understand why this JEP has scope "SE". As you rightly
> mentioned, the Attach API is a "non-standard" API which can be changed
> at any time and without affecting the Java SE specification, so this
> JEP should rather have scope "JDK" instead. On the other hand, the
> fact that this functionality is not governed by the SE specification
> will allow different OpenJDK distributors to use a different default
> setting for -XX:EnableDynamicAgentLoading which has the potential to
> cause a lot of confusion if we can't settle on a common strategy.

The Attach API is JDK-specific, but agents are an SE feature (well, JVM TI is optional), and the platform spec will say something along the lines of “if an implementation offers a way to attach an agent to a running JVM instance, that capability must be disabled by default and enabled with an explicit flag”.

> 
> - If doing this change at all, I think it would be better to do it in
> a non-LTS release first.

LTS is a service governed by Oracle Sales or something like that on the business side of things. OpenJDK has no concept of LTS. Nevertheless, given that for various business reasons more people are likely to be using JDK 21 than other versions, we can take that expected popularity into account. See my reply to Dan.

— Ron