New candidate JEP: 451: Prepare to Disallow the Dynamic Loading of Agents

Tue May 16 17:22:48 UTC 2023

At the core of your arguments is the claim, which we’ve heard told second-hand but rarely if ever reported first-hand, that the inability to control the command line is common. This claim is very important because its implications go well beyond the relatively niche issue of dynamically loaded agents, so I think it merits further discussion. Certainly since the discontinuation of the centralised JRE deployment model in JDK 11, it’s been a deep assumption of Java’s design that deploying a Java application requires control of the command line. If you cannot control the command line, there are things you simply cannot do, including much more basic things than loading agents. If your application needs APM, surely it also needs control over the heap and GC with potentially new options.

Those who restrict access to the command line will need to explain their constraints because that approach is already not viable. Deploying a Java application, certainly one that supports super-advanced uses such as the dynamic loading of agents for code manipulation, without control of the command line is not a model that the platform has supported for a while. Java’s deployment model changed some years ago, and policies that applied to the retired model do not apply for the new one and have to change (again, nothing to do with agents). 

Since we’re talking about changes that only take place in new releases anyway, I find the notion that adding a command line flag is harder than adopting a new runtime version to be somewhat suspect, but if there’s a good reason for that, someone will need to present it.

A service vendor that wishes to allow a Java program to use a new runtime and to create a child process that injects a native library into the parent process will need to explain why they cannot also allow that program to set command line flags. If you want to support JDK 11 and upward to enjoy new features and performance enhancements, you need to also support the changes to the deployment model that accompany these new developments.
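
For concreteness, the command-line option JEP 451 relies on to permit dynamic agent loading is `-XX:+EnableDynamicAgentLoading`. As a rough sketch (the class and method names are hypothetical, not a JDK API), an application could check whether that option was set explicitly by scanning the options reported by `RuntimeMXBean.getInputArguments()`. The sketch assumes HotSpot's last-one-wins handling of repeated `-XX` options, recognizes only the `-XX:+`/`-XX:-` spellings, and deliberately does not guess the default, which depends on the JDK release.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

public final class DynamicAgentFlag {
    private static final String ENABLE = "-XX:+EnableDynamicAgentLoading";
    private static final String DISABLE = "-XX:-EnableDynamicAgentLoading";

    // Returns the explicit setting of EnableDynamicAgentLoading found in the
    // given JVM arguments. Repeated -XX options are resolved last-one-wins;
    // an empty result means the option was never set explicitly.
    public static Optional<Boolean> explicitSetting(List<String> jvmArgs) {
        Optional<Boolean> result = Optional.empty();
        for (String arg : jvmArgs) {
            if (ENABLE.equals(arg)) {
                result = Optional.of(Boolean.TRUE);
            } else if (DISABLE.equals(arg)) {
                result = Optional.of(Boolean.FALSE);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Options this JVM was started with (arguments to main are excluded).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String state = explicitSetting(jvmArgs)
                .map(enabled -> enabled ? "explicitly enabled" : "explicitly disabled")
                .orElse("not set (JDK default applies)");
        System.out.println("Dynamic agent loading: " + state);
    }
}
```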

— Ron

> On 16 May 2023, at 13:36, Jack Shirazi <jacks at fasterj.com> wrote:
> 
> Here I refer to anyone who is operating the JVM as a customer. Scenarios where this JEP will (when moved from deprecation to full enforcement) have an onerous, adverse impact on customers include:
> 
> 1. Most obviously, those customers who have running JVMs, decide they want to add an observability agent, but are unaware of the EnableDynamicAgentLoading JVM option; most operators are unaware of the majority of JVM options. These customers would only be able to add observability agents after:
> 1a. investigating why the attach is failing
> 1b. resetting their command-line
> 1c. waiting until they can restart
> These customers would normally be able to gain APM data immediately, and to troubleshoot an existing running JVM when it has a problem, by attaching an agent. The requirement to reset the command-line would mean that future invocations would be observable and could use an agent for troubleshooting, but often the choice to apply a troubleshooting agent is their first encounter with agent technology; in other words, the first time they need it, they can't use it. It's hardly ideal to insist that only sophisticated or experienced customers can use agent troubleshooting technologies.
> 
> 2. Many customers have a long process between proposing command-line changes and allowing them to be applied in production (including changes via environment variables). Allowing an agent to be attached provides the capability immediately, until that process is complete. The same long process means there is no prospect of switching on remote agent attachment without that same process being followed. These customers currently accept agent attachment before the command-line changes have been approved, because the technology is proven and robust, and typically they use an agent from a vendor that guarantees (backed by a support contract) that it is valid to attach the agent to a production JVM.
> 
> 3. There are types of customers who run 3rd party JVM applications that they configure, but they can only configure the application and any JVM parameters that have been explicitly exposed by the 3rd party. (This is very common for bought-in business processing applications.) Changes to the JVM command-line are rarely allowed in this scenario (it violates the contract). Typically, if the customer wants a change, they make an enhancement request, and even where these are prioritized, they often have to pay more for the changes, so they are reluctant to make requests that are not directly business enhancing. Currently these types of customers can attach APM agents with no issue, providing excellent observability into the request flow in the system to identify problems. These customers will be disadvantaged.
> 
> 4. In complete opposition to the whole thrust of the JEP, I've seen customers who will not allow agents attached via the command-line but will allow them if started by a Java library. Go figure. It's not logical, but that's our industry for you. This is the case for several managed platforms (where the developer does not have direct access to JVM environment variables and arguments), and also where the person in charge of the deployment delegates responsibility for agents to the dev team, who need to do this programmatically. Perhaps these customers/platforms will accept adding the flipped option to the command-line, perhaps not, or perhaps only after finding that they now have an agent attachment issue; in any case it imposes additional process on something that works well at the moment.
> 
> 5. Injecting agents into containers running JVMs is a minefield. Attachment via a script after startup is often easier. Changing the command-line involves setting the JAVA_TOOL_OPTIONS environment variable before starting the container, but if the container already uses that variable, the conflict usually causes one setting or the other to be lost, so it doesn't work in that case.
> 
> 6. For k8s, using JAVA_TOOL_OPTIONS is currently the preferred mechanism (e.g. via mutating webhooks) and works well, so this JEP shouldn't matter there. But there are cases where that doesn't work (e.g. per-container conflicts as above, or where security roles restrict this, etc.), and in those the only alternative is to attach to the pod and attach the agent. This would require rebuilding images (either with the -XX:+EnableDynamicAgentLoading option or with -javaagent/-agentlib). Of course the observability community will attempt to pre-empt the problem by telling everyone to build their images with -XX:+EnableDynamicAgentLoading, but that already points to the JEP being an anti-pattern.
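
For readers less familiar with the mechanism under discussion, here is a minimal sketch of what attaching an agent to a running JVM looks like with the JDK's `com.sun.tools.attach` API (module `jdk.attach`, so a full JDK is assumed). The class name and argument handling are illustrative assumptions, and the agent JAR is assumed to declare an `Agent-Class` with an `agentmain` method in its manifest. Per JEP 451, on JDK 21 and later the target JVM prints a warning when an agent is loaded this way unless it was started with `-XX:+EnableDynamicAgentLoading`, and a future release is expected to disallow such loading by default.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

public final class AttachAgent {
    // Attaches to the JVM with the given process id, loads the Java agent JAR
    // into it (its agentmain runs inside the target), then detaches.
    public static void attachAndLoad(String pid, String agentJar, String options) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJar, options);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // No target given: list the JVMs visible to the attach mechanism.
            for (VirtualMachineDescriptor d : VirtualMachine.list()) {
                System.out.println(d.id() + "\t" + d.displayName());
            }
            return;
        }
        attachAndLoad(args[0], args[1], args.length > 2 ? args[2] : null);
    }
}
```

Run with a target process id and an agent JAR path (both hypothetical here), e.g. `java AttachAgent 12345 /path/to/agent.jar`; with no arguments it only lists candidate JVMs.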
> 
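
The `JAVA_TOOL_OPTIONS` conflict described in item 5 comes down to overwriting versus appending. A minimal sketch of the append-based merge an injector (for example, a Kubernetes mutating webhook) would need to perform follows; the class name and agent path are hypothetical, and splitting on whitespace ignores any quoting inside the variable, which is a simplifying assumption.

```java
public final class ToolOptions {
    // Returns a JAVA_TOOL_OPTIONS value that keeps whatever was already set and
    // appends `addition`, rather than overwriting the existing value.
    public static String merge(String existing, String addition) {
        if (existing == null || existing.isBlank()) {
            return addition;
        }
        if (containsOption(existing, addition)) {
            return existing; // already present: avoid adding it twice
        }
        return existing.strip() + " " + addition;
    }

    // Naive whitespace tokenization; quoted options are not handled.
    private static boolean containsOption(String options, String option) {
        for (String token : options.strip().split("\\s+")) {
            if (token.equals(option)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String existing = System.getenv("JAVA_TOOL_OPTIONS");
        // Hypothetical agent path, for illustration only.
        System.out.println(merge(existing, "-javaagent:/opt/agent/agent.jar"));
    }
}
```

The JVM reads `JAVA_TOOL_OPTIONS` only at startup, so the merged value still has to be placed in the container's environment before the JVM starts.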
> 
> On 12/05/2023 19:28, Ron Pressler wrote:
>> (Moving to the appropriate mailing lists for the discussion of this JEP)
>> 
>> We want reports of common uses of dynamically loaded agents for serviceability and difficulties setting a flag. Our judgment will sway if we learn that the use of dynamically loaded agents for serviceability is very common and that setting a command line flag is onerous. Such reports of “I use dynamically loaded agents for X and it’s hard for me to set a flag because Y” should be made here, i.e. jigsaw-dev at openjdk.org, serviceability-dev at openjdk.org.
>> 
>> Saying “I don’t like this (because I can think of cases where it may inconvenience me a little)” is not a report of a problem. A JDK feature that is disliked by only 1% of users will still be disliked by tens of thousands of people, and pretty much every JDK feature or lack of a feature is disliked by some Java developers; some features even inconvenience some minority of users. By physical necessity we sometimes inconvenience some users because users have contradictory requirements. What we’re trying to estimate is just *how much* of an inconvenience will be caused by feature X or the lack of X when integrated over the entire ecosystem.
>> 
>>  — Ron
>> 
>>> On 12 May 2023, at 12:37, Jack Shirazi <jacks at fasterj.com> wrote:
>>> 
>>> Thanks, this is going in circles. You want reports, I'm fine with that, I will provide a report. But my one report is not going to be sufficient to move your judgement. So I'll ask once again where should further such reports go, and at what point does your judgement sway?
>>> 
>>> 
>>> On 12/05/2023 16:46, Ron Pressler wrote:
>>>> Let’s start with you describing the particular use-cases of dynamically loaded agents that you’re concerned about and why you think a command-line flag to enable the functionality is onerous. In other words, describe the nature and severity of a *problem*. Remember that the goal of JDK maintainers is to serve the ecosystem as a whole, which means accommodating the conflicting desires by different classes of users. Because different people’s requirements are sometimes in contradiction with one another, we need to make a judgment. As JEP 451 says, this judgment is based on the assumptions that: 1. The need for dynamically loaded agents is not very common, and 2. When needed, adding a flag is not onerous.
>>>> 
>>>> Stating you don’t like a policy that’s been discussed for roughly a decade and started to be put into effect five years ago is not enough. However, if you have questions regarding the informational JEP that attempts to summarise past discussions (https://openjdk.org/jeps/8305968) I’ll gladly try and answer them.
>>>> 
>>>> — Ron
>>>> 
>>>>> On 12 May 2023, at 10:05, Jack Shirazi <jacks at fasterj.com> wrote:
>>>>> 
>>>>> 
>>>>>> Integrity must be opt out, and cannot be opt in, and so opting in is not a solution that will give us integrity *by default*. See https://openjdk.org/jeps/8305968#Strong-Encapsulation-by-Default
>>>>> This is an opinion, not a statement of fact. It needs to be justified, not assumed. Integrity is a goal, and there is a balance between what is useful and what can be limited. For full integrity, don't use the JVM at all. I for one prefer to continue using it.
>>>>> 
>>>>>> The only information of relevance would be reports showing that dynamically loading agents are a commonly-needed functionality and that adding a command-line option to allow it is onerous.
>>>>> I'm fine with that. I'm reporting exactly that here. I encourage others interested in this to also report that. I'll mention it in my next newsletter; where do you want the reports sent? My readers won't want to sign up to this email list just to send a comment. At what point does the reporting mean the JEP is dropped?
>>>>> 
>>>>> 
>>>>> On 12/05/2023 14:44, Ron Pressler wrote:
>>>>>>> On 12 May 2023, at 05:26, Jack Shirazi <jacks at fasterj.com> wrote:
>>>>>>> 
>>>>>>> Thank you for your reply. This makes it clear that the JEP has a single specific tradeoff. So we have two capabilities at issue here:
>>>>>>> 
>>>>>>> A) Currently libraries can turn themselves into agents
>>>>>>> 
>>>>>>> B) Currently agents can remotely attach
>>>>>>> 
>>>>>>> The JEP has decided for the community that each of these is a bad thing and should be disabled by default (though each can be enabled by setting an option).
>>>>>> No, the JEP says:
>>>>>> 
>>>>>> "To assure integrity, we need stronger measures to prevent the misuse by libraries of dynamically loaded agents. Unfortunately, we have not found a simple and automatic way to distinguish between a serviceability tool that dynamically loads an agent and a library that dynamically loads an agent.”
>>>>>> 
>>>>>> The only problem is libraries, but because there’s no simple way to distinguish between the two, and because dynamically loaded agents are not needed in most serviceability uses, disabling them by default is reasonable. BTW, this was already decided in 2017 in JEP 261: https://openjdk.org/jeps/261
>>>>>> 
>>>>>> As the JEP also says, in the future we may be able to distinguish between tools and libraries via a more complex mechanism that could allow tools to load agents dynamically without the flag.
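
Capability (A) above, a library turning itself into an agent, works by attaching to its own process. Since JDK 9, HotSpot refuses such a self-attach unless the JVM was started with `-Djdk.attach.allowAttachSelf=true`, which is one reason libraries resort to the child-process approach mentioned earlier in the thread. The sketch below (hypothetical class name; assumes a full JDK with the `jdk.attach` module) simply tries a self-attach and reports whether it was allowed.

```java
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;
import java.io.IOException;

public final class SelfAttach {
    // Attempts to attach to the JVM this code is running in, the way a library
    // that wants to act as its own agent would. Returns true only if attaching
    // and then detaching both succeeded.
    public static boolean trySelfAttach() {
        String pid = Long.toString(ProcessHandle.current().pid());
        try {
            VirtualMachine vm = VirtualMachine.attach(pid);
            vm.detach();
            return true;
        } catch (AttachNotSupportedException | IOException e) {
            // Without -Djdk.attach.allowAttachSelf=true, the attach is rejected here.
            System.out.println("Self-attach rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Self-attach allowed: " + trySelfAttach());
    }
}
```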
>>>>>> 
>>>>>> 
>>>>>>> My involvement in community discussions over the years has been that no one complains about (A), it has not been used maliciously, and there is a small niche who use it. (B) is used quite a lot and enhances JVM serviceability with a capability that is a clear advantage over other runtimes. It seems a shame to eliminate that competitive advantage.
>>>>>> Malicious use is not a concern *at all*. What this JEP addresses is integrity by default. See https://openjdk.org/jeps/8305968
>>>>>> 
>>>>>>> The JEP clearly points out that anyone concerned by these can disable the ability with a simple command-line option, so there is a simple solution for this minority.
>>>>>> Integrity must be opt out, and cannot be opt in, and so opting in is not a solution that will give us integrity *by default*. See https://openjdk.org/jeps/8305968#Strong-Encapsulation-by-Default
>>>>>> 
>>>>>> 
>>>>>>> The fundamental error is really that the attaching agent is read-write rather than read-only. If we could change that, it would be ideal, but sadly I don't think that's easily doable.
>>>>>> Perhaps, but most uses of dynamically loaded agents (and nearly all uses of dynamically loaded *Java* agents) are for “write.” The most common use-case for “read-only” is dynamically attached advanced profilers that use JVM TI. The solution there, as the JEP says, is not to separate agent capabilities but to improve JFR’s capabilities — which do not require an agent at all — and JFR can obtain profiles far more efficiently than anything JVM TI could ever hope to achieve.
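
The JFR route described above needs no agent because JFR is built into the JDK. A minimal in-process sketch using the JFR event-streaming API (`jdk.jfr.consumer.RecordingStream`, available since JDK 14) follows: it enables the built-in `jdk.ExecutionSample` event and counts the top frame of each sample. The class name, the 10 ms sampling period, and the busy-loop workload are illustrative assumptions; because events are delivered asynchronously and the stream is closed right after the workload, the counts are best-effort.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordingStream;

public final class JfrSampler {
    // Hypothetical CPU-bound workload so the sampler has something to observe.
    private static volatile double sink;

    static void busyWork() {
        double acc = 0;
        for (int i = 1; i < 100_000; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc;
    }

    // Runs the workload repeatedly for about `duration` while JFR's built-in
    // execution sampler is active, counting samples per top-of-stack method.
    public static Map<String, Integer> sample(Duration duration, Runnable workload) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            rs.onEvent("jdk.ExecutionSample", event -> {
                var stack = event.getStackTrace();
                if (stack == null || stack.getFrames().isEmpty()) {
                    return;
                }
                RecordedFrame top = stack.getFrames().get(0);
                String method = top.getMethod().getType().getName() + "." + top.getMethod().getName();
                counts.merge(method, 1, Integer::sum);
            });
            rs.startAsync();
            long deadline = System.nanoTime() + duration.toNanos();
            while (System.nanoTime() < deadline) {
                workload.run();
            }
        } // Closing the stream stops the recording; undelivered samples may be dropped.
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sample(Duration.ofSeconds(3), JfrSampler::busyWork);
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```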
>>>>>> 
>>>>>>> I and many in the monitoring community believe this JEP is NOT an enhancement to the JDK. The proposers believe it is. Is there a mechanism other than this email discussion list to gain wider community feedback so we can ascertain if there is really a strong community preference either way?
>>>>>>> 
>>>>>> The only information of relevance would be reports showing that dynamically loading agents are a commonly-needed functionality and that adding a command-line option to allow it is onerous.
>>>>>> 
>>>>>> — Ron