New candidate JEP: 451: Prepare to Disallow the Dynamic Loading of Agents

Tue May 16 12:36:41 UTC 2023

Here I use "customer" to mean anyone who operates the JVM. Scenarios in
which this JEP (once it moves from deprecation warnings to full
enforcement) will adversely and onerously impact customers include:

1. Most obviously, customers who have running JVMs and decide they want
to add an observability agent, but who are unaware of the
-XX:+EnableDynamicAgentLoading JVM option (most operators are unaware
of the majority of JVM options). These customers would only be able to
add observability agents after:
1a. investigating why the attach is failing
1b. changing their command line
1c. waiting until they can restart
Today these customers can immediately gain APM data, and troubleshoot an
existing running JVM when it has a problem, by attaching an agent.
Changing the command line would make future invocations observable and
able to use an agent for troubleshooting, but the decision to apply a
troubleshooting agent is often a customer's first encounter with agent
technology. In other words, the first time they need it, they can't use
it. It's hardly ideal to insist that only sophisticated or experienced
customers can use agent troubleshooting technologies.

2. Many customers have a long process between proposing command-line
changes (including changes via environment variables) and having them
applied in production. Attaching an agent provides the capability
immediately, until that process is complete. The same long process also
means there is no prospect of switching on remote agent attachment
without following it. These customers currently accept agent attachment
before the command-line changes have been approved, because the
technology is proven and robust, and they typically use an agent from a
vendor that guarantees (under a support contract) that it is valid to
attach the agent to a production JVM.

3. Some customers run third-party JVM applications that they configure,
but they can only configure the application itself and any JVM
parameters that the third party has explicitly exposed. (This is very
common for bought-in business-processing applications.) Changes to the
JVM command line are rarely allowed in this scenario, as they violate
the contract. Typically, if the customer wants a change they make an
enhancement request, and even where these are prioritized they often
have to pay for the changes, so they are reluctant to make requests
that are not directly business-enhancing. Currently these customers can
attach APM agents with no issue, gaining excellent observability into
the request flow through the system to identify problems. These
customers will be disadvantaged.

4. In complete opposition to the whole thrust of the JEP, I've seen
customers who will not allow agents to be attached via the command line
but will allow them if started by a Java library. Go figure. It's not
logical, but that's our industry for you. This is the case for several
managed platforms (where the developer has no direct access to JVM
environment variables and arguments), and also where the person in
charge of the deployment delegates responsibility for agents to the dev
team, who need to do this programmatically. Perhaps these customers and
platforms will accept adding the enabling option to the command line;
perhaps not, or perhaps only after finding that they now have an agent
attachment issue. In any case it imposes additional process on
something that works well at the moment.

5. Injecting agents into containers running JVMs is a minefield.
Attaching via a script after startup is often easier. Changing the
command line involves setting the JAVA_TOOL_OPTIONS environment variable
before starting the container, but if the container already uses that
variable, the conflict usually causes one setting or the other to be
lost, so this approach doesn't work in that case.

6. For k8s, using JAVA_TOOL_OPTIONS is currently the preferred mechanism
(e.g. via mutating webhooks) and works well, so this JEP shouldn't
matter there. But there are cases where that doesn't work (e.g.
per-container conflicts as above, or where security roles restrict it),
and in those cases the only alternative is to connect to the pod and
attach the agent. This would require rebuilding images (with either the
-XX:+EnableDynamicAgentLoading option or -javaagent/-agentlib). Of
course the observability community will attempt to pre-empt the problem
by telling everyone to build their images with
-XX:+EnableDynamicAgentLoading, but that in itself points to the JEP
being an anti-pattern.
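
For readers less familiar with the mechanism these scenarios depend on,
here is a minimal sketch of dynamic attach using the JDK's Attach API
(the jdk.attach module, so it needs a full JDK rather than a JRE). The
class name AttachAgent is only for illustration, and the pid, agent JAR
path and options are placeholders. Under JEP 451, loading an agent this
way triggers a warning in the target JVM on JDK 21, and the JEP proposes
that a future release will refuse it unless the target was started with
-XX:+EnableDynamicAgentLoading.

```java
import com.sun.tools.attach.AgentInitializationException;
import com.sun.tools.attach.AgentLoadException;
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;

import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch: load a Java agent into an already-running JVM using the
// JDK Attach API. The pid, agent JAR path and options are placeholders
// supplied on the command line; AttachAgent is an illustrative name.
final class AttachAgent {

    // Lists JVMs on this host that the Attach API can see, as "pid name".
    static List<String> describeJvms() {
        return VirtualMachine.list().stream()
                .map(d -> d.id() + " " + d.displayName())
                .collect(Collectors.toList());
    }

    // Attaches to the target JVM, loads the agent (the agent's agentmain
    // method runs inside the target), then detaches. Throws if the target
    // cannot be attached to or the agent fails to load or initialize.
    static void attach(String pid, String agentJar, String options)
            throws AttachNotSupportedException, IOException,
                   AgentLoadException, AgentInitializationException {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJar, options); // options may be null
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: AttachAgent <pid> <agent.jar> [options]");
            describeJvms().forEach(System.out::println);
            return;
        }
        attach(args[0], args[1], args.length > 2 ? args[2] : null);
    }
}
```

Run with no arguments it lists the JVMs it can see; otherwise it takes
the target pid, the agent JAR and optional agent options. No change to
the target's command line is involved, which is exactly what the
scenarios above rely on.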

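On the JAVA_TOOL_OPTIONS conflict in point 5: avoiding the loss of one
setting means appending to the existing value rather than overwriting
it, which in practice needs something like a wrapper entrypoint baked
into the image. A rough sketch, assuming a hypothetical ToolOptions
helper used by such a wrapper; it splits on whitespace only and ignores
quoting, so it is not a general parser.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper-entrypoint helper: adds an agent flag to
// JAVA_TOOL_OPTIONS instead of overwriting whatever the image already set.
final class ToolOptions {

    // Appends addition to the existing value unless it is already present.
    // Naive: splits on whitespace and ignores any quoting.
    static String merge(String existing, String addition) {
        if (existing == null || existing.isBlank()) {
            return addition;
        }
        String trimmed = existing.trim();
        List<String> tokens = Arrays.asList(trimmed.split("\\s+"));
        return tokens.contains(addition) ? trimmed : trimmed + " " + addition;
    }

    // Starts the container's original command (e.g. java -jar app.jar) with
    // the merged JAVA_TOOL_OPTIONS; the child inherits stdin/stdout/stderr.
    static Process launch(List<String> command, String agentFlag) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command).inheritIO();
        Map<String, String> env = pb.environment();
        env.put("JAVA_TOOL_OPTIONS", merge(env.get("JAVA_TOOL_OPTIONS"), agentFlag));
        return pb.start();
    }
}
```

For example, merge("-Xmx512m", "-javaagent:/opt/agent.jar") returns
"-Xmx512m -javaagent:/opt/agent.jar", and merging a flag that is already
present leaves the value unchanged. Note that this still means changing
the image, which is the point being made above.
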
On 12/05/2023 19:28, Ron Pressler wrote:
> (Moving to the appropriate mailing lists for the discussion of this JEP)
>
> We want reports of common uses of dynamically loaded agents for serviceability and difficulties setting a flag. Our judgment will sway if we learn that the use of dynamically loaded agents for serviceability is very common and that setting a command line flag is onerous. Such reports of “I use dynamically loaded agents for X and it’s hard for me to set a flag because Y” should be made here, i.e. jigsaw-dev at openjdk.org, serviceability-dev at openjdk.org.
>
> Saying “I don’t like this (because I can think of cases where it may inconvenience me a little)” is not a report of a problem. A JDK feature that is disliked by only 1% of users will still be disliked by tens of thousands of people, and pretty much every JDK feature or lack of a feature is disliked by some Java developers; some features even inconvenience some minority of users. By physical necessity we sometimes inconvenience some users because users have contradictory requirements. What we’re trying to estimate is just *how much* of an inconvenience will be caused by feature X or the lack of X when integrated over the entire ecosystem.
>
>   — Ron
>
>> On 12 May 2023, at 12:37, Jack Shirazi <jacks at fasterj.com> wrote:
>>
>> Thanks, this is going in circles. You want reports, I'm fine with that, I will provide a report. But my one report is not going to be sufficient to move your judgement. So I'll ask once again where should further such reports go, and at what point does your judgement sway?
>>
>>
>> On 12/05/2023 16:46, Ron Pressler wrote:
>>> Let’s start with you describing the particular use-cases of dynamically loaded agents that you’re concerned about and why you think a command-line flag to enable the functionality is onerous. In other words, describe the nature and severity of a *problem*. Remember that the goal of JDK maintainers is to serve the ecosystem as a whole, which means accommodating the conflicting desires by different classes of users. Because different people’s requirements are sometimes in contradiction with one another, we need to make a judgment. As JEP 451 says, this judgment is based on the assumptions that: 1. The need for dynamically loaded agents is not very common, and 2. When needed, adding a flag is not onerous.
>>>
>>> Stating you don’t like a policy that’s been discussed for roughly a decade and started to be put into effect five years ago is not enough. However, if you have questions regarding the informational JEP that attempts to summarise past discussions (https://openjdk.org/jeps/8305968) I’ll gladly try and answer them.
>>>
>>> — Ron
>>>
>>>> On 12 May 2023, at 10:05, Jack Shirazi <jacks at fasterj.com> wrote:
>>>>
>>>>
>>>>> Integrity must be opt out, and cannot be opt in, and so opting in is not a solution that will give us integrity *by default*. See https://openjdk.org/jeps/8305968#Strong-Encapsulation-by-Default
>>>> This is an opinion, not a statement of fact. It needs to be justified, not assumed. Integrity is a goal, and there is a balance between what is useful and what can be limited. For full integrity, don't use the JVM at all. I for one prefer to continue using it.
>>>>
>>>>> The only information of relevance would be reports showing that dynamically loading agents are a commonly-needed functionality and that adding a command-line option to allow it is onerous.
>>>> I'm fine with that. I'm reporting exactly that here. I encourage others interested in this to also report that. I'll mention it in my next newsletter: where do you want the reports sent? My readers won't want to sign up to this email list just to send a comment. At what point does the reporting mean the JEP is dropped?
>>>>
>>>>
>>>> On 12/05/2023 14:44, Ron Pressler wrote:
>>>>>> On 12 May 2023, at 05:26, Jack Shirazi <jacks at fasterj.com> wrote:
>>>>>>
>>>>>> Thank you for your reply. This makes it clear that the JEP has a single specific tradeoff. So we have two capabilities at issue here:
>>>>>>
>>>>>> A) Currently libraries can turn themselves into agents
>>>>>>
>>>>>> B) Currently agents can remotely attach
>>>>>>
>>>>>> The JEP has decided for the community that each of these are a bad thing and should be disabled by default (though enableable by setting an option).
>>>>> No, the JEP says:
>>>>>
>>>>> "To assure integrity, we need stronger measures to prevent the misuse by libraries of dynamically loaded agents. Unfortunately, we have not found a simple and automatic way to distinguish between a serviceability tool that dynamically loads an agent and a library that dynamically loads an agent.”
>>>>>
>>>>> The only problem is libraries, but because there’s no simple way to distinguish between the two, and because dynamically loaded agents are not needed in most serviceability uses, disabling them by default is reasonable. BTW, this was already decided in 2017 in JEP 261: https://openjdk.org/jeps/261
>>>>>
>>>>> As the JEP also says, in the future we may be able to distinguish between tools and libraries via a more complex mechanism that could allow tools to load agents dynamically without the flag.
>>>>>
>>>>>
>>>>>> My involvement in community discussions over the years has been that no one complains about (A), it has not been used maliciously, and there is a small niche who use it. (B) is used quite a lot and enhances JVM serviceability with a capability that is a clear advantage over other runtimes. It seems a shame to eliminate that competitive advantage.
>>>>> Malicious use is not a concern *at all*. What this JEP addresses is integrity by default. See https://openjdk.org/jeps/8305968
>>>>>
>>>>>> The JEP clearly points out that anyone concerned by these can disable the ability with a simple command-line option, so there is a simple solution for this minority.
>>>>> Integrity must be opt out, and cannot be opt in, and so opting in is not a solution that will give us integrity *by default*. See https://openjdk.org/jeps/8305968#Strong-Encapsulation-by-Default
>>>>>
>>>>>
>>>>>> The fundamental error is really that the attaching agent is read-write rather than read-only. If we could change that, it would be ideal, but sadly I don't think that's easily doable.
>>>>> Perhaps, but most uses of dynamically loaded agents (and nearly all uses of dynamically loaded *Java* agents) are for “write.” The most common use-case for “read-only” is dynamically attached advanced profilers that use JVM TI. The solution there, as the JEP says, is not to separate agent capabilities but to improve JFR’s capabilities — which do not require an agent at all — and JFR can obtain profiles far more efficiently than anything JVM TI could ever hope to achieve.
>>>>>
>>>>>> I and many in the monitoring community believe this JEP is NOT an enhancement to the JDK. The proposers believe it is. Is there a mechanism other than this email discussion list to gain wider community feedback so we can ascertain if there is really a strong community preference either way?
>>>>>>
>>>>> The only information of relevance would be reports showing that dynamically loading agents are a commonly-needed functionality and that adding a command-line option to allow it is onerous.
>>>>>
>>>>> — Ron