#ReflectiveAccessByInstrumentationAgents

Tue Apr 19 17:00:54 UTC 2016

Hi Alan,

Thanks very much for kicking off this discussion.

On 19/04/16 12:35, Alan Bateman wrote:
> 
> This is a follow-up to Andrew Dinn's mail to jpms-spec-comments [1] on
> the topic of instrumentation. Andrew confirmed that he is okay to move
> follow-up discussion here.

And I am on the list so no need to include me specially in the CC.

Responses to your questions are included inline below. Sorry for the
length of the reply -- it's really quite complicated (the first draft of
this reply was twice as long :-)

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul
Argiry (US)

> Andrew - I think it would be useful if you could set the scene and
> describe some of the scenarios where the Byteman instrumentation can
> potentially fail at runtime. I get the impression from the mail that you
> weren't seeing any issues at the time but this may have been early
> experimentation.

I have been building and testing Byteman on JDK9 using the EA program
releases and have not seen anything break so far. However, that may just
be because i) Byteman and the Byteman tests are only using runtime
classes from the java base module and ii) app code used in tests is not
modularized -- well, at least not using Jigsaw (see below).

My concern was based on statements in the requirements about limiting
reflective access according to module context i.e. code in module M
might lose the ability to access non-public members of classes belonging
to module M'. Byteman relies on reflective access to get/put non-public
fields and to invoke non-public methods. I noticed in the Jigsaw code
base that setAccessible was checking the module of the caller. So, I wanted
to be sure that setAccessible would still work (at the very least would
still work so long as my Java agent code was making the call).
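
For illustration, here is a minimal sketch of the kind of reflective access in question, using the standard java.lang.reflect calls. It is not Byteman's actual code: the class names (ReflectiveAccessSketch, SketchTarget) and the helper methods are hypothetical. The setAccessible(true) call is the step whose caller-module check is the concern.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical helper, not Byteman code: reads a non-public field and
// invokes a non-public, no-argument method via core reflection.
final class ReflectiveAccessSketch {

    static Object readField(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true); // the call whose module check is at issue
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object invoke(Object target, String name) {
        try {
            Method m = target.getClass().getDeclaredMethod(name);
            m.setAccessible(true); // same check applies to methods
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchTarget t = new SketchTarget();
        System.out.println(readField(t, "counter"));
        System.out.println(invoke(t, "bump"));
    }
}

// Hypothetical target standing in for an application or runtime class.
final class SketchTarget {
    private int counter = 41;

    private int bump() {
        return ++counter;
    }
}
```

Here both classes live in the same unnamed module, so the access succeeds; the open question in this mail is what happens when the target belongs to a different module.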

> A starting question might be whether Byteman can potentially inject code
> into module A with calls into non-exported types in module B? Does this
> arise with fault injection?

I think there is potential for this now that a module system is in
place.

For example, code injected into method C.m might currently try to read a
static field f' belonging to some other class C'. In the past when C and
C' were both in the JDK runtime or when C was some app class and C' was
part of the JDK runtime it was always possible to resolve C' from the
classloader of C and, if f' was public, access C'.f' directly or, when
it was non-public, use a Field instance to access it reflectively. If C
and C' are now in different modules or C is in an unmodularized app jar
and C' in a module then I see several problems which may arise.
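
The pre-module behaviour described above can be sketched roughly as follows. This is an illustration only, assuming plain classloader lookup and core reflection; StaticFieldLookupSketch and OtherClassSketch are hypothetical names, with OtherClassSketch playing the role of C'.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper, not Byteman code: resolves C' by name from the
// classloader of C and reads a static field f' of C'.
final class StaticFieldLookupSketch {

    static Object readStatic(Class<?> from, String className, String fieldName) {
        try {
            // Resolve C' through the loader of C (a null loader means bootstrap).
            Class<?> other = Class.forName(className, false, from.getClassLoader());
            Field f = other.getDeclaredField(fieldName);
            if (!Modifier.isPublic(f.getModifiers())) {
                // Non-public field: access has to be enabled reflectively.
                f.setAccessible(true);
            }
            return f.get(null); // static field, so no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = OtherClassSketch.class.getName();
        System.out.println(readStatic(StaticFieldLookupSketch.class, name, "visible"));
        System.out.println(readStatic(StaticFieldLookupSketch.class, name, "hidden"));
    }
}

// Hypothetical stand-in for C'.
final class OtherClassSketch {
    public static int visible = 1;
    private static int hidden = 2;
}
```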

Firstly, if the code injected into C.m includes reference to a JDK
runtime class C' in module M' and C' is not exported then can a
classloader lookup from the classloader of C fail? If so then this is
going to be a legacy compatibility problem.

Secondly, assume the reference to C' can be resolved allowing Byteman
agent code to obtain a Field instance fi' for a non-public field f' or
a Method instance mi' for a non-public method m' of C'. Will a call to
setAccessible on fi'/mi' be rejected because the members in question belong
to a module M' which Byteman agent classes do not belong to? It was this
possibility that was hinted at in the requirements and which set alarm
bells ringing.

Thirdly, assume setAccessible does succeed. Will accesses/invocations of
fi'/mi' from Byteman agent code be rejected because the members in
question belong to a module M' which Byteman agent classes do not belong to?

Finally, will accesses/invocations of fi'/mi' from other code possibly be
rejected because the members in question belong to a module M' which
the accessing code does not belong to?

The last question is relevant when Byteman executes injected code by
compiling it to bytecode rather than interpreting it. The generated
bytecode is attached to a dynamically generated class which means that
it cannot use get/put or invoke bytecodes to access private members. To
resolve this the bytecode is given access to the necessary instance fi'
or mi' when it needs to access/invoke non-public members. So, the
question is what happens when this dynamically generated class does not
belong to module M'? Will the Jigsaw JVM reject the reflective access
from this bytecode?

> I may be reading too much into this but I would assume Byteman is
> already doing some checking to ensure that the B types are visible from
> A. If you add support for modules to Byteman then the scenario means
> thinking about accessibility too. For the access to succeed then it
> means that (1) module A reads module B, and (2) that B exports the
> package to A.

Byteman does indeed ensure that types are visible by resolving all type
references to B (C' in my example above) via the classloader of A (C in
the above example).
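
The two conditions in the quoted text, (1) A reads B and (2) B exports the package to A, can also be checked programmatically. A minimal sketch, assuming the java.lang.Module API as it later shipped in JDK 9 (the early-access builds under discussion may differ); ModuleAccessCheckSketch and its method are hypothetical names and not part of Byteman:

```java
// Hypothetical helper: checks the two conditions needed for code in
// module A to use a type from a given package of module B.
final class ModuleAccessCheckSketch {

    static boolean readsAndExports(Module a, Module b, String packageName) {
        boolean reads = a.canRead(b);                    // condition (1)
        boolean exported = b.isExported(packageName, a); // condition (2)
        return reads && exported;
    }

    public static void main(String[] args) {
        Module self = ModuleAccessCheckSketch.class.getModule();
        Module base = String.class.getModule(); // java.base
        System.out.println(readsAndExports(self, base, "java.lang"));
        System.out.println(readsAndExports(self, base, "jdk.internal.misc"));
    }
}
```

Note that this only covers static accessibility of public types; it says nothing about whether setAccessible on a non-public member will succeed, which is the question raised earlier in this mail.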

There is already an extension to Byteman which supports the notion of
module imports for JBoss Modules. In that case injected code may import
JBoss modules M1, M2 etc. Types mentioned in the injected code are
resolved using a composite classloader built by delegating to the
classloader for C and then the classloaders C1, C2 etc for M1, M2 etc.
Types are resolved firstly by looking them up in the target method's
classloader C then failing that by looking up in the classloaders C1, C2.
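
The lookup order just described can be sketched with plain java.lang.ClassLoader delegation. This is an assumption-laden illustration, not the actual Byteman extension and not the JBoss Modules API: CompositeClassLoaderSketch and resolveOrNull are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical composite loader: tries the target method's classloader
// first, then each imported module's classloader in order.
final class CompositeClassLoaderSketch extends ClassLoader {
    private final List<ClassLoader> imports;

    CompositeClassLoaderSketch(ClassLoader target, List<ClassLoader> imports) {
        super(target); // standard parent delegation consults the target loader first
        this.imports = new ArrayList<>(imports);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Only reached when the target loader could not resolve the name.
        for (ClassLoader loader : imports) {
            try {
                return loader.loadClass(name);
            } catch (ClassNotFoundException ignored) {
                // fall through to the next imported loader
            }
        }
        throw new ClassNotFoundException(name);
    }

    // Convenience wrapper that returns null instead of throwing.
    Class<?> resolveOrNull(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = CompositeClassLoaderSketch.class.getClassLoader();
        // A null target means the bootstrap loader; the app loader acts as an import.
        CompositeClassLoaderSketch composite =
                new CompositeClassLoaderSketch(null, Arrays.asList(app));
        System.out.println(composite.resolveOrNull("java.lang.String"));
        System.out.println(composite.resolveOrNull(CompositeClassLoaderSketch.class.getName()));
    }
}
```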

JBoss Modules provides a dynamic API to access and delegate to these
classloaders. Is there going to be some sort of equivalent for modules?
If so what restrictions will be placed on use of that API? I'd like to
be able to deal with any legacy reference failures by providing a Jigsaw
module imports extension but I don't know whether that is going to be
possible without details of what API can be used to allow composite
classloaders to delegate class lookups into modules.

> For (1) then there are APIs in both java.lang.instrument and JNI and so
> a JVM TI or java Byteman can do this. Alternatively it can instrument
> code in A to have A reflectively read B.

Is this a Jigsaw extension to the java.lang.instrument API you are
talking about? If so can you point me at the code and/or javadoc?

When you say "it can instrument code in A to have A reflectively read
B." it sounds like you are saying that my existing use of Members is
going to continue to work. Is that what you mean? Or am I being too
optimistic? :-)

> For (2) then there aren't API to break encapsulation but Byteman
> could instrument code in B to use the reflective APIs to export the
> package to A. This is runtime equivalent of the -XaddExports command
> line option.

I don't follow what you are actually suggesting here. In what sense
would this 'export' the package? If you mean that I need to transform B
in order to use it from A then what type of transformation would I need
to apply to a class B in order to allow A to access a private method m?
Would this involve adding a new public method, one that would call the
old one?

That's not an option since Byteman cannot make structural changes to
bytecode. It has to be able to transform classes which already exist
when the agent is loaded (it is a retransformer) and also needs to be
able to remove injected code and restore the status quo (e.g. for
testing, where different changes need to be present from one test to the
next).
So, any requirement to change structure is not a solution.

> Your mail mentions setAccessible which suggests it might be actually be
> injecting code that uses core reflection rather than static references
> to types in B. There is a mention of accessing private members in your
> mail so maybe it uses core reflection for this scenario because such
> access is not allowed with bytecode???  If this is the case then you'll
> still need (2).

Yes, mostly Byteman operates using reflection because for the most part
it executes injected code by interpreting it. However, as I mentioned
above it sometimes executes the injected code as bytecode. This is
needed when code is injected into hot methods. It allows the optimizer
to inline and then optimize the injected code. I think I already asked
the pertinent question above. Will the Jigsaw JVM reject the reflective
access from this bytecode because the Member being accessed belongs to a
class in a different module?

> Another question is about your comment on "get-out for JVM TI agent
> code".  JVM TI agents are native agents and so are using JNI where there
> isn't any access checks. I don't expect there are issues there but it
> would be good to clarify what you may have meant at the time.

I thought I asked for a get out for /JVMTI Java agents/ i.e. for /Java/
code loaded by the -javaagent command line option or the VM_Attach API
rather than for JVMTI Native agents. The former is certainly what I was
interested in.

Anyway, I think I have outlined what the problems are above. What I was
asking for was some way of bypassing these problems when calls were made
from agent code i.e. allowing reflective accesses to non-public Members
to proceed if it was known that the access was somehow sanctioned by
Java agent code.

So, if it turns out that usage of a Member from any invoking context is
constrained merely according to whether setAccessible(true) has been
successfully executed then could you ensure that the check as to whether
setAccessible should succeed or fail can identify that the caller
belongs to agent code and, if so, make it succeed?

Alternatively, if an access from the bytecode for method C.m of a Member
of class C' (either get, put or invoke) is constrained according to some
relation between the modules of C and C' in question then can you ensure
that when C belongs to agent code the access always succeeds?

The latter would not be enough to deal with potential restrictions on
the current generated bytecode but I can probably ensure that the
generated code calls into Byteman code to do the reflective access.
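
To make that last point concrete, here is a rough sketch of generated code routing its reflective access through agent code. All names (AgentReflectionHelper, GeneratedRuleSketch, HiddenStateSketch) are hypothetical and this is not Byteman's actual helper API; the generated class is written by hand here to stand in for dynamically generated bytecode.

```java
import java.lang.reflect.Field;

// Hypothetical agent-side helper: every reflective operation is performed
// by agent code, so the caller seen by any access check is the agent.
final class AgentReflectionHelper {

    static Field open(Class<?> owner, String fieldName) {
        try {
            Field handle = owner.getDeclaredField(fieldName);
            handle.setAccessible(true); // done once, by agent code
            return handle;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object get(Field handle, Object target) {
        try {
            return handle.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    static void put(Field handle, Object target, Object value) {
        try {
            handle.set(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Field handle = open(HiddenStateSketch.class, "value");
        HiddenStateSketch state = new HiddenStateSketch();
        GeneratedRuleSketch rule = new GeneratedRuleSketch(handle);
        System.out.println(rule.execute(state));
        put(handle, state, "changed");
        System.out.println(rule.execute(state));
    }
}

// Stand-in for a dynamically generated rule class: it holds the Field
// handle but never touches it directly, calling into the agent helper.
final class GeneratedRuleSketch {
    private final Field handle;

    GeneratedRuleSketch(Field handle) {
        this.handle = handle;
    }

    Object execute(Object target) {
        return AgentReflectionHelper.get(handle, target);
    }
}

// Hypothetical class with a private member, standing in for C'.
final class HiddenStateSketch {
    private String value = "initial";
}
```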