#ReflectiveAccessByInstrumentationAgents

Wed Apr 20 11:40:38 UTC 2016

On 19/04/2016 18:00, Andrew Dinn wrote:
> :
> I have been building and testing on JDK9 Byteman using the EA program
> releases and have not seen anything break so far. However, that may just
> be because i) Byteman and the Byteman tests are only using runtime
> classes from the java base module and ii) app code used in tests is not
> modularized -- well, at least not using Jigsaw (see below).
It's good to hear that you are testing with the EA builds.

I think the main thing for Byteman, and this will be true of at least
some other agents too, is that it will likely need to be updated to
support instrumentation with modules. We have a reasonable compatibility
story for existing JVM TI and java agents but as soon as you get into
instrumenting code to statically or reflectively access code in other
modules then it may require the agent to arrange for this access to be
allowed.

To get started, I would suggest studying the documents,
presentations and recordings that we have linked from the Project Jigsaw
page. This should get you up to speed on the new concepts and how
accessibility works with modules. I think this has to be a prerequisite
to having a more detailed discussion on how agents doing BCI can work
with modules.

>
> My concern was based on statements in the requirements about limiting
> reflective access according to module context i.e. code in module M
> might lose the ability to access non-public members of classes belonging
> to module M'.
Core reflection has always been specified to do the same access checks
as the Java Language and bytecode. The only thing you lose is the
sledgehammer that is setAccessible, as it cannot be used to break into
non-exported packages. You can use it to get at non-public members in
your own module, you can use it to get at non-public members of types in
packages that others have exported to you, you just cannot use it to
break into the parts of the module that the module author has decided
not to export to you.
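
A small probe can make these cases concrete. This is only a sketch: the
names SetAccessibleProbe and canOpen are made up for illustration, and
it assumes a JDK 9 or later runtime. In the JDK 9 release the check is
keyed on whether the package is "open" to the caller, which is a later
refinement of the export-based rule described above.

```java
import java.lang.reflect.Field;

final class SetAccessibleProbe {
    // A non-public member that lives in our own module.
    private int ownSecret = 42;

    /** Returns true if setAccessible(true) was permitted on the named field. */
    static boolean canOpen(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            f.setAccessible(true);
            return true;
        } catch (NoSuchFieldException | RuntimeException e) {
            // A refused request surfaces as an unchecked exception
            // (InaccessibleObjectException on JDK 9 and later).
            return false;
        }
    }
}
```

Calling canOpen(SetAccessibleProbe.class, "ownSecret") exercises the
"own module" case. Calling canOpen(String.class, "value") exercises the
other side; whether that is permitted depends on the JDK release and on
command-line options such as --add-opens, so no result is claimed here.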

> Byteman relies on reflective access to get/put non-public
> fields and to invoke non-public methods. I noticed in the Jigsaw code
> base that setEnabled was checking the module of the caller. So, I wanted
> to be sure that setEnabled would still work (at the very least would
> still work so long as my Java agent code was making the call).
This is where the agent author needs to be creative. It may, for 
example, inject bytecode into the victim module to invoke addExports and 
give the burglar access.
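
In source form, the injected call might look roughly like the sketch
below. It assumes the java.lang.Module API as released in JDK 9;
ExportSketch and exportPackageOf are hypothetical names. The important
detail is that addExports only succeeds for a named module when the
caller is code in that same module, which is why the call has to be
injected into the victim rather than made from the agent.

```java
final class ExportSketch {
    /**
     * Exports the package of `anchor` to `other`. Intended to run from
     * code that belongs to the same module as `anchor`; for a named
     * module, a caller in any other module is rejected.
     */
    static Module exportPackageOf(Class<?> anchor, Module other) {
        Module self = anchor.getModule();
        return self.addExports(anchor.getPackageName(), other);
    }
}
```

On the class path everything is in an unnamed module, which already
exports all of its packages, so the call changes nothing there.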

> :
>
> Firstly, if the code injected into C.m includes reference to a JDK
> runtime class C' in module M' and C' is not exported then can a
> classloader lookup from the classloader of C fail? If so then this is
> going to be a legacy compatibility problem.
There are no changes to visibility and so C' is visible; Class.forName
should work as before, for example.

That said, if C' is in a package that is not exported to C then code in 
C cannot access it. This is not specific to JDK modules, M' is any module.
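
The difference between visibility and accessibility can be seen with a
few lines of reflection. This is a sketch assuming the JDK 9 release
API (Class.getModule, Module.isExported); VisibilityProbe and describe
are invented names.

```java
final class VisibilityProbe {
    /** Reports whether a class can be found, and whether its package is exported. */
    static String describe(String className) {
        try {
            // Visibility: can the class loader find the class at all?
            Class<?> c = Class.forName(className, false,
                                       VisibilityProbe.class.getClassLoader());
            // Accessibility starts here: isExported(pn) asks whether the
            // package is exported unconditionally by its module.
            boolean exported = c.getModule().isExported(c.getPackageName());
            return "visible=true exported=" + exported;
        } catch (ClassNotFoundException e) {
            return "visible=false";
        }
    }
}
```

Passing the name of a class in a JDK-internal package shows the
interesting case, where the lookup succeeds but the package is not
exported. The exact result depends on the JDK build and on any
--add-exports options in effect.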

>
> Secondly, assume the reference to C' can be resolved allowing Byteman
> agent code to obtain a Field instance fi' for a non-public  field f' or
> a Method instance mi' for a non-public method m' of C'. Will a call to
> setEnabled on fi'/mi' be rejected because the members in question belong
> to a module M' which Byteman agent classes do not belong to? It was this
> possibility that was hinted at in the requirements and which set alarm
> bells ringing.
setAccessible should fail. If Byteman is injecting code into C that uses
core reflection to access a member of C' then it will need to arrange
for M' to export the package containing C' to C.

>
> Finally, will accesses/invocations of fi'/mi' from other code possibly be
> rejected because the fields in question belong to a module M' which
> accessing does not belong to?
>
> The last question is relevant when Byteman executes injected code by
> compiling it to bytecode rather than interpreting it. The generated
> bytecode is attached to a dynamically generated class which means that
> it cannot use get/put or invoke bytecodes to access private members. To
> resolve this the bytecode is given access to the necessary instance fi'
> or mi' when it needs to access/invoke non-public members. So, the
> question is what happens when this dynamically generated class does not
> belong to module M'? Will the Jigsaw JVM reject the reflective access
> from this bytecode?
If setAccessible(true) has already been successfully called on the
Method or Field then handing it around is somewhat dangerous as it can
be used to invoke the method or access the field without an access
check. However, if the generated code is being passed a Method or Field
where setAccessible(true) has not been called then there will be an
access check.
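
A sketch of the hand-off, with made-up class names (Vault, Unlocker,
GeneratedCode) standing in for the target class, the agent and the
dynamically generated class:

```java
import java.lang.reflect.Field;

final class Vault {
    private final String token;
    Vault(String token) { this.token = token; }
}

final class Unlocker {
    /** Plays the agent: suppresses access checks once, then hands the Field out. */
    static Field unlockedToken() {
        try {
            Field f = Vault.class.getDeclaredField("token");
            f.setAccessible(true);
            return f;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class GeneratedCode {
    /** Stands in for generated bytecode: it only uses the Field it is given. */
    static Object read(Field f, Object target) {
        try {
            // No access check happens here if setAccessible(true) was
            // already called on this Field object.
            return f.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

GeneratedCode never calls setAccessible itself; the permission travels
with the Field object, which is why passing such objects around needs
care.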

> :
>
> There is already an extension to Byteman which supports the notion of
> module imports for JBoss Modules. In that case injected code may import
> JBoss modules M1, M2 etc. Types mentioned in the injected code are
> resolved using a composite classloader built by delegating to the
> classloader for C and then the classloaders C1, C2 etc for M1, M2 etc.
> Types are resolved firstly by looking them up in the target method's
> classloader C then failing that by looking up in the classloaders C1, C2.
>
> JBoss Modules provides a dynamic API to access and delegate to these
> classloaders. Is there going to be some sort of equivalent for modules?
> If so what restrictions will be placed on use of that API? I'd like to
> be able to deal with any legacy reference failures by providing a Jigsaw
> module imports extension but I don't know whether that is going to be
> possible without details of what API can be used to allow composite
> classloaders to delegate class lookups into modules.
For the most part, this is a non-issue with the changes in JDK 9 because
we have not changed visibility. However, it is possible to instantiate
groups of modules where each module is defined to its own class loader.
Really advanced users can also define modules to their class loaders as
they see fit.

If I read your paragraphs correctly then the instrumentation involves 
injecting code that can only work if the class loader delegation is 
augmented, is that right? There aren't hooks or other means in the API 
to do this. To be honest, it sounds like a hazard that would need strong 
use-cases.

> Is this a Jigsaw extension to the java.lang.instrument API you are
> talking about? If so can you point me at the code and/or javadoc?
The EA downloads have a link to the docs:
   http://download.java.net/java/jdk9/docs/api/index.html

There is a new section on "Instrumenting code in modules" in 
java.lang.instrument. The Instrumentation class defines a new method 
addModuleReads to update a module to read another. This is not 
interesting when you are injecting code that uses core reflection but 
will be important when you inject bytecode with static references to 
types in other modules. You'll need it if you inject code that uses
method handles too.
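
For orientation, here is a sketch of what the agent side might look
like. addModuleReads is the method name in the EA builds; the JDK 9
release instead exposes Instrumentation.redefineModule, which covers
reads, exports and opens in one call, and that is what the sketch
assumes. ModuleAccessSketch, packageTo and grant are invented names.

```java
import java.lang.instrument.Instrumentation;
import java.util.Map;
import java.util.Set;

final class ModuleAccessSketch {
    /** Builds the "package name -> target modules" map that redefineModule takes. */
    static Map<String, Set<Module>> packageTo(String pkg, Module target) {
        return Map.of(pkg, Set.of(target));
    }

    /**
     * Lets code injected into `from` use `pkg` of module `to`:
     * first `from` is updated to read `to`, then `to` opens `pkg` to `from`.
     */
    static void grant(Instrumentation inst, Module from, Module to, String pkg) {
        // Add the read edge needed for static references and method handles.
        inst.redefineModule(from, Set.of(to), Map.of(), Map.of(), Set.of(), Map.of());
        // Open the package so that deep reflection from `from` is permitted.
        inst.redefineModule(to, Set.of(), Map.of(), packageTo(pkg, from), Set.of(), Map.of());
    }
}
```

An agent would call grant from premain or agentmain with the
Instrumentation instance it is handed. Whether a given module can be
updated at all can be checked with Instrumentation.isModifiableModule.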

The updated JVM TI and JNI docs are here:
   http://download.java.net/java/jdk9/docs/platform/jvmti/jvmti.html
   http://download.java.net/java/jdk9/docs/technotes/guides/jni/spec/jniTOC.html

The JVM TI spec has a section "Bytecode Instrumentation of code in
modules". JNI has a new set of functions that are documented under
"Module Operations".

>
> When you say "it can instrument code in A to have A reflectively read
> B." it sounds like you are saying that my existing use of Members is
> going to continue to work. Is that what you mean? Or am I being too
> optimistic? :-)
If the injected code is using core reflection then you won't need to use
the API to reflectively add read edges. This does not mean you won't
need to inject code to reflectively export packages though.

> :
> I don't follow what you are actually suggesting here. In what sense
> would this 'export' the package. If you mean that I need to transform B
> in order to use it from A then what type of transformation would I need
> to apply to a class B in order to allow A to access a private method m?
> Would this involve adding a new method which was public? one that would
> call the old one?
>
> That's not an option since Byteman cannot make structural changes to
> bytecode. It has to be able to transform classes which already exist
> when the agent is loaded (it is a retransformer) and also needs to be
> able to remove injected code and restore the status quo (e.g. for
> testing different changes need to be present from one test to the next).
> So, any requirement to change structure is not a solution.
I didn't suggest schema changes although load time instrumentation does 
give you the opportunity to add initializers which could be useful to 
some of the issues you might encounter.

> :
> I thought I asked for a get out for /JVMTI Java agents/ i.e. for /Java/
> code loaded by the -javaagent command line option or the VM_Attach API
> rather than for JVMTI Native agents. The former is certainly what I was
> interested in.
>
> Anyway, I think I have outlined what the problems are above. What I was
> asking for was some way of bypassing these problems when calls were made
> from agent code i.e. allowing reflective accesses to non-public Members
> to proceed if it was known that the access was somehow sanctioned by
> Java agent code.
>
> So, if it turns out that usage of a Member from any invoking context is
> constrained merely according to whether setAccessible(true) has been
> successfully executed then could you ensure that the check as to whether
> setAccessible should succeed or fail can identify that the caller
> belongs to agent code and if so make it succeed.
>
> Alternatively, if an access from the bytecode for method C.m of a Member
> of class C' (either get, put or invoke) is constrained according to some
> relation between the modules of C and C' in question then can you ensure
> that when C belongs to agent code the access always succeeds?
>
> The latter would not be enough to deal with potential restrictions on
> the current generated bytecode but I can probably ensure that the
> generated code calls into Byteman code to do the reflective access.
There is lots that we can comment on here but I think it would be better 
to spend a bit of time coming up to speed on modules and encapsulation. 
I think then we can continue at least part of this thread as you work 
through how to update Byteman to arrange for the intended access to work.

-Alan