[External] : Re: Disallowing the dynamic loading of agents by default

Ron Pressler ron.pressler at oracle.com
Mon Mar 20 17:53:25 UTC 2023


Hi Kirk.

While the JEP will reiterate the relevant considerations that led to this change being announced some years ago (and no one denies that dynamically loaded agents are useful), the purpose of my email was to announce that it will finally take effect in JDK 21. All the discussions at the time, over all the relevant areas, informed the design at the core of the platform and its evolution in the past five years, namely that the application must grant explicit consent to anything affecting integrity (i.e. guarantees you can trust). Unless something has changed dramatically since then, or some new information has come to light, reopening the discussions around past decisions that have shaped the platform’s current design is unlikely to yield different results. Many of those discussions are available on jigsaw-dev.

It is because some people may be affected that JEP 261 postponed the changing of that default, so that everyone would have time to prepare themselves and their users. We’ve now given an extra six-month advance notice to give those who haven’t finished preparing their users the time to do so.

This is an opportunity to remind everyone that other capabilities that similarly affect platform and application integrity — such as JNI and Unsafe — will also require the application’s consent on the command line — not in JDK 21, but soon thereafter.

— Ron

On 20 Mar 2023, at 17:02, Kirk Pepperdine <kirk.pepperdine at gmail.com> wrote:

Hi Ron,


On Mar 20, 2023, at 3:10 AM, Ron Pressler <ron.pressler at oracle.com> wrote:

Hi.

The majority of serviceability tools don’t require dynamically loading an agent, and the majority of applications never load an agent dynamically.

While I wouldn’t be surprised that the majority don’t load agents dynamically, I wouldn’t want to diminish the importance of this capability for those that do make use of it. And I believe the number that do load agents dynamically might surprise you. But then, my data on this is likely highly biased. Do you have better data to support this viewpoint?


True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users. The rationale for this change then hasn’t changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21). The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller.

Again, I’m not sure I see the data to support this. But then again, my viewpoint remains highly biased. And I see an assumption that tools will be able to easily adapt to this change. I’m not sure that is entirely true. At least not in a way that in effect returns dynamic attach capabilities with a directly loaded proxy.


It is also true that, when starting an application, you don’t know that you *will* need to load an agent, but in most situations you know that you might. E.g. processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern versions of Java anyway) or canary services that are under trial. The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved.

Again, I’m not sure I’d equate numbers to importance. As an analogy, there are very few people who know how to build cars, and maybe more who know how to fix them, but there are certainly many more who know how to use them.


Finally, some tools that require a dynamically loaded JVM TI agent, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK. If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don’t require an agent. There is plenty of time to enhance the JDK’s built-in profiling capabilities ahead of demand.

At odds with this is that all profilers come with biases. I’ve always stressed that the first thing one needs to do with any profiler is discover its biases and then determine how each bias affects the results, how one should account for it, or even whether another tool should be used. To this point, JFR, a built-in profiler, has significant biases. For example, allocation profiling quite often completely misses allocation hotspots for small objects that are not scalar replaced in all but trivial examples. But let's not pick on JFR, because it is the tool of choice for many other things, and other allocation profilers have other biases, such as altering JIT behavior in ways that may cause EA to fail, thus preventing otherwise eligible allocation hotspots from being scalar replaced. While one tool is blind, the other generates false positives. Knowing this, I can combine the JIT logs with profiler results to help offset the effects of the bias for the latter profiler. However, I can’t do anything about the blind spot.

Finally, if there is any lesson to be learned from the migration from 8 to 9, it is that tooling is a huge anchor preventing people from upgrading. That JDK 8 is still in as widespread use as it is, is in no small part due to the extensive change in the tooling chain. In fact, a number of very useful tools simply didn’t survive, leaving us with less desirable alternatives. The other historical data point that may be worth comparing is the introduction of generics into the language. While this slowed the adoption of JDK 5 (from 1.4.2), it had nowhere near the impact that the degradation of the observability/diagnostic tool chain had on the migration rates from 7 to 8 and then this huge impact of 9. In my opinion, we’ve learned enough from this migration to understand that we may need to re-evaluate decisions that were made prior to these lessons.

Kind regards,
Kirk



— Ron

On 20 Mar 2023, at 01:21, Andrei Pangin <andrei.pangin at gmail.com> wrote:

Hi all,

Serviceability has been one of Java's biggest strengths, but the proposed change is going to have a large negative impact on it.

Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime. JFR cannot close this gap because it lacks capabilities that modern Java profilers have (that's a separate topic though).

When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not even be feasible to add an agent at startup in containerized applications. Starting a profiler on demand from the host OS or from a sidecar is the only viable solution in these cases.

Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other.

The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is getting the value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching to fix trivial bugs or to add debug logs dynamically. A prominent example is that dynamic agents proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046.
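
To make the "value of some field" case concrete, here is a minimal sketch of such an ad-hoc agent. The class name FieldPeekAgent, the helper readStaticField, and the "ClassName#fieldName" argument format are all hypothetical and chosen only for illustration; agentmain and Instrumentation.getAllLoadedClasses are the standard java.lang.instrument entry points. The sketch only handles static fields and assumes the field is reflectively accessible (fields in packages that are not open may not be).

```java
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Field;

public class FieldPeekAgent {

    // Called by the JVM when the agent JAR is loaded into an already running VM.
    // The JAR's manifest must name this class in its Agent-Class attribute.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        // Hypothetical argument format: "com.example.Cache#size"
        String[] parts = agentArgs.split("#", 2);
        Object value = readStaticField(inst.getAllLoadedClasses(), parts[0], parts[1]);
        System.out.println(agentArgs + " = " + value);
    }

    // Finds the named class among the given classes and reads one static field.
    public static Object readStaticField(Class<?>[] loaded, String className, String fieldName) {
        for (Class<?> c : loaded) {
            if (c.getName().equals(className)) {
                try {
                    Field f = c.getDeclaredField(fieldName);
                    f.setAccessible(true); // may be refused for non-open packages
                    return f.get(null);    // null receiver: static fields only
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot read " + className + "#" + fieldName, e);
                }
            }
        }
        throw new IllegalStateException(className + " is not loaded in the target VM");
    }
}
```

An agent like this is packaged as a JAR and loaded through the Attach API, which is exactly the path the proposed default would close unless the target was started with the enabling flag.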

I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases.

Thank you,
Andrei

On Thu, 16 Mar 2023 at 18:48, Ron Pressler <ron.pressler at oracle.com> wrote:
Hi.

In JDK 21 we intend to disallow the dynamic loading of agents by default. This
will affect tools that use the Attach API to load an agent into a JVM some time
after the JVM has started [1]. There is no change to any of the mechanisms that
load an agent at JVM startup (-javaagent/-agentlib on the command line or the
Launcher-Agent-Class attribute in the main JAR's manifest).

This change in default behavior was proposed in 2017 as part of JEP 261 [2][3].
At that time the consensus was to switch to this default not in JDK 9 but in a
later release to give tool maintainers sufficient time to inform their users.
To allow the dynamic loading of agents, users will need to specify
-XX:+EnableDynamicAgentLoading on the command line.
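
For readers unfamiliar with the mechanism, the following is a minimal sketch of what a tool does when it loads an agent dynamically through the Attach API [1]. The class name DynamicAttachSketch and the helper explicitlyEnabled are hypothetical, introduced only for illustration; VirtualMachine.attach, loadAgent and detach come from com.sun.tools.attach. The helper assumes that the last occurrence of the flag wins, as is generally the case for boolean -XX options, and it only sees the arguments it is given.

```java
import com.sun.tools.attach.VirtualMachine;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DynamicAttachSketch {

    // Returns true if the given JVM arguments explicitly enable dynamic agent
    // loading. Assumption: a later +/- occurrence overrides an earlier one.
    public static boolean explicitlyEnabled(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+EnableDynamicAgentLoading")) {
                enabled = true;
            } else if (arg.equals("-XX:-EnableDynamicAgentLoading")) {
                enabled = false;
            }
        }
        return enabled;
    }

    // Attaches to the JVM with the given process id and asks it to load the
    // agent JAR; this is the "dynamic loading" the change is about.
    public static void loadAgent(String pid, String agentJar, String options) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJar, options);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Without arguments, just report how this JVM itself was started.
            List<String> own = ManagementFactory.getRuntimeMXBean().getInputArguments();
            System.out.println("flag explicitly enabled for this JVM: " + explicitlyEnabled(own));
            System.out.println("usage: DynamicAttachSketch <pid> <agent.jar> [options]");
            return;
        }
        loadAgent(args[0], args[1], args.length > 2 ? args[2] : null);
    }
}
```

Under the default described above, the loadAgent call would be refused unless the target JVM was started with -XX:+EnableDynamicAgentLoading; agents named at startup with -javaagent or -agentlib are unaffected.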

I'll post a draft JEP for review shortly.

-- Ron

[1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
[2]: https://openjdk.org/jeps/261
[3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html




