Bytecode transformation investigation

Tue Aug 2 20:30:38 UTC 2022

When Mark kicked off the project, he wrote about the "spectrum of
constraints" enabling optimizations that are weaker than those of the
closed world constraint, but more broadly applicable.  In line with
that, I've been doing some investigation into bytecode
transformations.

While bytecode transformations are strictly less powerful than AOT,
they provide a way to simplify the program we're running based on
information available at build / deploy time.  They allow us to move
(some) dynamic behaviour from one phase (runtime) to an earlier one -
behaviour such as reflective operations, runtime class generation,
optional paths, etc. can be simplified at the bytecode level based on
information the author (or deployer) of the software knows without
having to discover it at runtime.

Great!  But there's always a catch.  And the primary catch here is
that bytecode transformation can result in user-visible changes.
Before we go too far down the path of developing transformations, we
should determine which user-visible changes are legitimate and where
the lines need to be drawn.

jlink experiment:
-----------------
As a starting point, I prototyped using jlink to transform Lambda
expressions to use pre-generated classes rather than runtime generated
ones.

Lambda expressions
* encode the lambda body as a private method in the defining class
* use an invokedynamic instruction to dynamically pick the strategy
for creating the lambda instances at runtime, and
* encode a "recipe" combining MethodHandle, MethodType, Class and int
arguments passed to the LambdaMetafactory to actually generate the
required class and create the lambda instance.
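The first bullet can be observed with plain reflection.  The sketch
below is illustrative only and is not part of the prototype: the class
name LambdaShape and the helper lambdaBodies() are made up for the
example, and the "lambda$" name prefix is a compiler implementation
detail (javac and ecj both use it today) rather than anything the
specification guarantees.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not taken from the prototype.
class LambdaShape {
    // The compiler moves this lambda's body into a synthetic method on LambdaShape.
    static final Runnable GREETER = () -> System.out.println("hello");

    // Collects the synthetic methods that hold lambda bodies.
    // Assumption: the compiler names them with a "lambda$" prefix.
    static List<Method> lambdaBodies() {
        List<Method> found = new ArrayList<>();
        for (Method m : LambdaShape.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                found.add(m);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (Method m : lambdaBodies()) {
            // Prints the modifiers and the compiler-chosen name of each lambda body.
            System.out.println(Modifier.toString(m.getModifiers()) + " " + m.getName());
        }
        GREETER.run();
    }
}
```

The private modifier on these methods is what forces the nest
membership requirement for any class that wants to call them.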

None of the code outside the LambdaMetafactory (LMF) cares how the
lambda is implemented as long as it meets the contract by implementing
the correct interfaces and by calling the private implementation
method.

I modified the LMF internals to allow a jlink plugin to pre-generate
the lambda classes [0], but doing so produces user-visible behaviour
changes:

1) Lambda classes are no longer hidden anonymous classes.
The LMF loads the implementation class as a hidden, anonymous class.
This means Class.forName() can't find the class,
Class::isHidden() [1] returns true, and the class is specially
named [2].
With the pre-generated class, Class.forName can find the class, it is
no longer hidden as it is loaded using normal class loading, and the
name is a normal class name. [3]
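A small probe makes the runtime-generated side of this difference
concrete.  This is a sketch under assumptions: it needs JDK 15 or later
(Class::isHidden does not exist before that), and the names
HiddenLambdaCheck, lambdaClass() and findableByName() are invented for
the example.  It only shows today's LMF behaviour; running the same
probe against a pre-generated class would be expected to report the
opposite answers.

```java
// Hypothetical probe class; not taken from the prototype.
class HiddenLambdaCheck {
    // Returns the class the LMF spins at runtime for a trivial lambda.
    static Class<?> lambdaClass() {
        Runnable r = () -> { };
        return r.getClass();
    }

    // Reports whether a by-name lookup through the class's own loader succeeds.
    static boolean findableByName(Class<?> c) {
        try {
            Class.forName(c.getName(), false, c.getClassLoader());
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // A hidden class name contains '/', so it is not a loadable binary name.
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?> c = lambdaClass();
        System.out.println("name:     " + c.getName());
        System.out.println("isHidden: " + c.isHidden());
        System.out.println("findable: " + findableByName(c));
    }
}
```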

2) The pre-generated class must be a nest peer of the defining class.
Since Lambda implementation methods are private on the class that
defines them, the pre-generated lambda class must be a nest peer of
the defining class in order to call them.
Calls to Class::getNestHost on the lambda class may result in different
answers between the two strategies.  The nest host will also now
include the pre-generated classes in its list of nest members for the
pre-generated case.
Users can observe this difference with the Class::getNestHost and
Class::getNestMembers calls.
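To get a feel for what users could observe, the sketch below compares
a runtime lambda class with a hand-written nested class that calls the
same private method.  The nested class is only a stand-in I am assuming
for illustration; it is not what the jlink plugin emits, and the names
NestCheck, Pregenerated and listedAsNestMember() are invented.  It
assumes JDK 11 or later, where javac compiles nested classes as
nestmates.

```java
import java.util.Arrays;

// Hypothetical example class; not taken from the prototype.
class NestCheck {
    // Stands in for a private lambda implementation method.
    private static void body() { }

    // Hand-written stand-in for a pre-generated lambda class (an assumption,
    // not the prototype's output). Calling body() is legal only because this
    // class is a nestmate of NestCheck.
    static final class Pregenerated implements Runnable {
        @Override
        public void run() {
            body();
        }
    }

    // Returns the class the LMF spins at runtime for an equivalent lambda.
    static Class<?> runtimeLambdaClass() {
        Runnable r = () -> body();
        return r.getClass();
    }

    // Reports whether 'c' appears in the host's list of nest members.
    static boolean listedAsNestMember(Class<?> host, Class<?> c) {
        return Arrays.asList(host.getNestMembers()).contains(c);
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { runtimeLambdaClass(), Pregenerated.class }) {
            System.out.println(c.getName());
            System.out.println("  nest host:        " + c.getNestHost().getName());
            System.out.println("  listed by host:   " + listedAsNestMember(NestCheck.class, c));
        }
    }
}
```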

3) Stacktraces
Classes generated by the LMF at runtime are hidden from stack traces
by default.  The pre-generated classes are visible.
Users will be able to observe this with the StackWalker class and may
notice the difference in any tools they use to process stack traces.
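The StackWalker view of this can be sketched as follows.  The example
is an illustration, not prototype code: FrameCheck and
framesSeenFromLambda() are invented names, and it relies on the
existing StackWalker.Option.SHOW_HIDDEN_FRAMES option to reveal the
frame that a default walker filters out.  With a pre-generated class,
the expectation is that the extra frame would show up in the default
walk as well.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical example class; not taken from the prototype.
class FrameCheck {
    // Walks the stack from inside a lambda body and returns the class name
    // of every frame the given walker is willing to report.
    static List<String> framesSeenFromLambda(StackWalker walker) {
        Supplier<List<String>> s = () -> walker.walk(frames ->
                frames.map(StackWalker.StackFrame::getClassName)
                      .collect(Collectors.toList()));
        return s.get();
    }

    public static void main(String[] args) {
        StackWalker plain = StackWalker.getInstance();
        StackWalker withHidden =
                StackWalker.getInstance(StackWalker.Option.SHOW_HIDDEN_FRAMES);

        System.out.println("default walker:");
        framesSeenFromLambda(plain).forEach(n -> System.out.println("  " + n));

        System.out.println("SHOW_HIDDEN_FRAMES walker:");
        framesSeenFromLambda(withHidden).forEach(n -> System.out.println("  " + n));
    }
}
```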

This is the initial set of user-visible changes I've run across in
this experiment.  There are likely other corner cases that I haven't
hit yet, and other experiments will reveal other user-visible
differences.

The key question out of this effort is whether these kinds of
user-visible differences are "acceptable".  Where do we draw the line
and how do we inform users of these differences?

--Dan

[0] https://github.com/DanHeidinga/jdk-sandbox/pull/1/files (prototype code)
[1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Class.html#isHidden()
[2] ex.mod.Example$$Lambda$23/0x0000000800c019f0
[3] ex.mod.Example$$Lambda$4