Re: New candidate JEP: 451: Prepare to Disallow the Dynamic Loading of Agents

Fri May 19 21:21:41 UTC 2023

I fully understand why authors of (truly excellent, in your case!) advanced serviceability agents would see any feature that could affect their area of interest as a problem, but you surely understand that our responsibility is toward a far larger ecosystem. Even the ultimate restriction — which is NOT the subject of this JEP — would only require the addition of a command line flag to retain the current behaviour, but all this JEP does is add a warning. That’s the best mechanism that would allow us to estimate the impact of a future restriction, so you should welcome it. This JEP disallows nothing.
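To illustrate the distinction between agents loaded at startup and the flag that retains the current behaviour, here is a minimal Java sketch. It assumes the option named in the JEP 451 text, -XX:+EnableDynamicAgentLoading (the flag is not named in this message), and the standard -javaagent:, -agentpath: and -agentlib: startup options. The class and method names (AgentFlags, startupAgentArgs, dynamicLoadingEnabled) are hypothetical and only inspect the JVM's input arguments; they are not part of the JDK.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class AgentFlags {

    // Returns the JVM arguments that load an agent at startup.
    // Agents loaded this way are not subject to the warning.
    static List<String> startupAgentArgs(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-javaagent:")
                          || a.startsWith("-agentpath:")
                          || a.startsWith("-agentlib:"))
                .collect(Collectors.toList());
    }

    // True if the flag that explicitly allows dynamic agent loading is present
    // (flag name taken from the JEP 451 text; an assumption relative to this email).
    static boolean dynamicLoadingEnabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+EnableDynamicAgentLoading");
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with.
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Agents loaded at startup: " + startupAgentArgs(jvmArgs));
        System.out.println("Dynamic loading explicitly enabled: "
                + dynamicLoadingEnabled(jvmArgs));
    }
}
```

Running this inside a deployment gives a rough idea of whether it already loads its agents at startup or would need the extra flag to keep attaching them later.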

On a more personal note, my fear is not that profilers (in general, not just those that require agents) are now used too much, but that they’re used too little, including profilers that don’t require agents at all. I’m afraid we’ll find out that even if the usage of such tools were to grow by 10x it would still be negligible. I’d love to see (and your help putting such capabilities into the JDK would be much appreciated) greater awareness of the importance of profilers, and much greater use.

As to JFR: I am aware that many improvements are necessary, but a new stacktrace capture mechanism that JFR is about to start using is something that we are unlikely to ever manage to expose through an API because it requires close collaboration with HotSpot internals. Stay tuned, as it’s really exciting! You mention that JVM TI serves multiple uses, and you are absolutely right, but the plan is to incorporate the “read only” side into JFR, as it has some foundational benefits over JVM TI in addition to the one I mentioned (JFR is asynchronous whereas JVM TI is synchronous, which causes a whole lot of pain for VM engineers).

— Ron

> On 19 May 2023, at 19:40, Andrei Pangin <andrei.pangin at gmail.com> wrote:
> 
> Hi Ron,
> I reviewed integrity JEPs once again along with this email thread and I think there are several flaws in the proposal that need to be addressed before implementation.
>  
> 	• First, the JEP draws equality between an agent and an instrumenting agent, which is not true. Instrumentation is just one of the capabilities that an agent needs to request explicitly by calling the JVM TI AddCapabilities function. There are many other read-only features of JVM TI that observability and troubleshooting agents can use without compromising application integrity. Disabling all agents by default just to protect from a few that modify application code is like cracking a nut with a sledgehammer, especially when a more fine-grained approach is already built into JVM TI.
> 
> 	• The JEP states that most serviceability tools do not require dynamic agents. This sounds weird to me. How was that "most" measured? How can half a dozen JDK builtin tools be compared to an infinite number of custom tools that may be, and already are, developed using JVM TI?
> 
> 	• The JEP assumes that existing JDK tools are enough for troubleshooting. I wish they were. How, for example, would you dump an object graph without sensitive user data from a live service? With a JVM TI agent, this is possible. Which builtin tool allows you to find native memory leaks, sources of long time-to-safepoint pauses, or map perf counters to Java code? Unfortunately, none. Even worse, when dynamic agents are disabled, development of new custom tools will become meaningless.
> 
> 	• You emphasized many times that the proposal to disable dynamic agents appeared years ago. And that's actually the problem with this JEP. It relies on outdated assumptions and has not been adjusted to the modern trends. Technology didn't stay still; new use cases became popular, which this proposal does not take into account. Here are some examples:
> 		• Containers became the standard way to ship and deploy applications (btw, a good thing integrity-wise). Container image usually has the minimum amount of software required to run the app: no additional tools, restricted environment. Now consider that I want to monitor the application. Even if I'm allowed to modify the command line, I can't simply add -agentpath, since the agent library is not available in the container. A typical pattern for using serviceability tools with containerized applications is to run a sidecar container that has all required tools and capabilities. How would you suggest attaching a tool to a running container?
> 		• In the last couple of years, with the growing popularity of continuous profilers, a number of solutions appeared for system-wide or infrastructure-wide zero-configuration monitoring. The idea is that you install the observability software, and it automatically discovers all supported processes and starts monitoring/profiling them, regardless of how they were deployed. gProfiler, Parca, Pyroscope, just to name a few examples. The keyword here is "zero-configuration". Observability by default is just as important nowadays as integrity by default.
> 
> 	• The JEP outlines JFR as a universal solution for profiling, claiming it is "far more efficient than anything" in collecting stack traces. This is not true. Async-profiler (6K stars on GitHub, 700+ forks, more than a million downloads) can collect 1000 execution samples per second per core without significant overhead, thanks to hardware performance counters. Scalability of the JFR sampling mechanism is inherently poor: it uses just one dedicated thread to walk through all Java threads in a loop and stop them one by one. JFR does not show non-Java threads in a profile, it is blind to native frames, its notion of thread states is misleading (e.g., Socket.read can spend CPU time in the networking stack or just wait for incoming data, but JFR has no clue). JFR fails to traverse valid Java stacks and silently discards such samples, e.g., you will not see arraycopy in a profile, although it's a common performance bottleneck. JFR is misleading not only in CPU profiling but also in memory profiling, see JDK-8307488. It's utopian to think that JFR can replace external profilers sometime soon - there is not even progress on fixing smaller issues: open bugs hang for years (JDK-8252417, JDK-8153167, JDK-8281677), some are closed as will-not-fix (JDK-8191415). Is it fair to disallow valid usages of profilers at runtime without providing a viable alternative?
> 
> 	• You mentioned two goals: 1) disallow libraries to grant themselves superpowers; 2) minimize the impact on serviceability tools that have to be started by a human operator. However, what this JEP actually suggests is the opposite: disabling dynamic loading of agents does not prevent libraries from obtaining superpowers - they can simply call System.load(). At the same time, disabling dynamic loading of agents has a huge impact on serviceability, up to the complete inability to use external tools at runtime. I understand that the plan is to disallow JNI someday too (unless explicitly allowed via a command line option) for the purpose of integrity. Following your goals, it would be more logical to disallow JNI first, as it is an easier way for libraries to break integrity.
>  
> To summarize the above, the current proposal does not seem to me elaborate enough to be targeted to JDK 21. I would suggest improving it by 1) updating its assumptions; 2) taking the mentioned use cases into account; 3) providing ready-to-use alternatives; 4) matching the plan with the goals.
>  
> Thank you,
> Andrei Pangin
> 
> On Fri, 19 May 2023 at 15:44, Ron Pressler <ron.pressler at oracle.com> wrote:
> Because the discussion of this JEP has veered in many directions, let me summarise where we are:
> 
> This JEP proposes to emit a suppressible warning when a JVM TI or Java agent is loaded into a JVM sometime after startup through the Attach mechanism.
> 
> The warning helps make users aware that an agent has been injected into the JVM and identify deployments that may need adjustment in advance of any future changes to disallow agents from being dynamically loaded without the application's consent. The warning will also let us better judge the impact of such a future change.
> 
> — Ron
> 
> > On 8 May 2023, at 20:17, Mark Reinhold <mark.reinhold at oracle.com> wrote:
> > 
> > https://openjdk.org/jeps/451
> > 
> >  Summary: Issue warnings when agents are loaded dynamically into a
> >  running JVM. These warnings aim to prepare users for a future release
> >  which disallows the dynamic loading of agents by default in order to
> >  improve integrity by default. Serviceability tools that load agents at
> >  startup will not cause warnings to be issued in any release.
> > 
> > - Mark
>