Pluggable Annotation Processing: problems from the IDE (IntelliJ) perspective

Fri Jan 29 12:40:54 UTC 2021

  Thanks, Joe

I would like to add some comments regarding our usage of the Compiler API.
We invoke javac programmatically via JavaCompiler.getTask().call().
This allows us to customize the FileManager used by the compiler in order
to integrate javac more closely with our build. This approach also lets us
use the Eclipse batch compiler "ecj" via the same Compiler API, because ecj
implements it too. We do not subclass the compiler or alter the way it
works, but we use the Compiler API abilities to track and control
input/output files (via the FileManager) and get AST-level information
about references (via the TaskListener) that cannot be recovered from the
bytecode. All dependency tracking for the incremental compilation is done
outside the compiler. This approach allows us to use any compiler that
implements the Compiler API. The primary source of dependency information
is the bytecode: we process it with ASM and thus obtain most of the
information related to class structure and references. The fact that we get
dependency data from the bytecode allows us to remain relatively
compiler-independent, since we only rely on the standard bytecode format.
However, we have to use the Compiler API to get information about imports
and inlined compile-time constants directly from javac, as this data cannot
be obtained from the bytecode. This is, of course, a javac-specific part,
but it does not affect the general approach to collecting dependency data.
So we do not rely on the compiler's built-in incremental capabilities, but
rather manage and process dependency data separately.
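
To make the setup described above concrete, here is a minimal sketch of
running javac through JavaCompiler.getTask().call() with a customized file
manager. It is not our actual build code: the class name
SiblingRecordingFileManager, the map it fills and the compile() helper are
hypothetical and exist only for illustration. It assumes a JDK is available
(ToolProvider.getSystemJavaCompiler() returns null otherwise).

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: records which source file ("sibling") each output belongs to.
class SiblingRecordingFileManager extends ForwardingJavaFileManager<StandardJavaFileManager> {
    // Output class name -> URI of the sibling source file, or null if none was passed.
    final Map<String, URI> sourceOf = new LinkedHashMap<>();

    SiblingRecordingFileManager(StandardJavaFileManager delegate) {
        super(delegate);
    }

    @Override
    public JavaFileObject getJavaFileForOutput(Location location, String className,
                                               JavaFileObject.Kind kind, FileObject sibling)
            throws IOException {
        sourceOf.put(className, sibling == null ? null : sibling.toUri());
        return super.getJavaFileForOutput(location, className, kind, sibling);
    }

    // Compiles one source file into outputDir and returns the recorded mapping.
    static Map<String, URI> compile(Path sourceFile, Path outputDir) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler; a JDK is required");
        }
        try (StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null)) {
            std.setLocation(StandardLocation.CLASS_OUTPUT, List.of(outputDir.toFile()));
            SiblingRecordingFileManager fm = new SiblingRecordingFileManager(std);
            Iterable<? extends JavaFileObject> units = std.getJavaFileObjects(sourceFile.toFile());
            boolean ok = compiler.getTask(null, fm, null, null, null, units).call();
            if (!ok) {
                throw new IllegalStateException("compilation failed");
            }
            return fm.sourceOf;
        }
    }
}
```

For a class compiled from an ordinary source file, the recorded value is the
URI of that source file. For output created on behalf of an annotation
processor, the value is expected to be null, which is the gap discussed in
this thread.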
Because of that, it is critical for the build to have access to any data
that helps keep dependency information complete. One such piece of data is
the mapping between source and output files. For Java sources we get it
from the "sibling" argument passed to the FileManager, but for anything
produced by an AnnotationProcessor we can only rely on the
originatingElements arguments.
We have not found any way to access this data other than by wrapping the
Filer object with our own wrapper that intercepts these values. But in
order to install the wrapper over the Filer, we have to wrap the
ProcessingEnvironment and Processor objects too. As you mention, these
interfaces were designed to be wrappable, so we use this possibility to get
access to the data that is otherwise dropped by javac.
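
The wrapping chain can be sketched roughly as follows. This is a simplified
illustration, not our real implementation: the names RecordingFiler,
WrappedEnvironment and WrappedProcessor and the "origins" map are
hypothetical. Only the standard javax.annotation.processing interfaces are
used.

```java
import javax.annotation.processing.*;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.*;
import javax.lang.model.util.Elements;
import javax.lang.model.util.Types;
import javax.tools.FileObject;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import java.io.IOException;
import java.util.*;

// Hypothetical wrapper: remembers originatingElements, then delegates to the real Filer.
class RecordingFiler implements Filer {
    private final Filer delegate;
    final Map<String, List<Element>> origins = new LinkedHashMap<>();

    RecordingFiler(Filer delegate) { this.delegate = delegate; }

    private void record(String name, Element[] originating) {
        // Per the Filer javadoc, the originating elements may be elided or null.
        origins.put(name, originating == null ? List.of() : Arrays.asList(originating));
    }
    public JavaFileObject createSourceFile(CharSequence name, Element... originating) throws IOException {
        record(name.toString(), originating);
        return delegate.createSourceFile(name, originating);
    }
    public JavaFileObject createClassFile(CharSequence name, Element... originating) throws IOException {
        record(name.toString(), originating);
        return delegate.createClassFile(name, originating);
    }
    public FileObject createResource(JavaFileManager.Location location, CharSequence moduleAndPkg,
                                     CharSequence relativeName, Element... originating) throws IOException {
        record(moduleAndPkg + "/" + relativeName, originating);
        return delegate.createResource(location, moduleAndPkg, relativeName, originating);
    }
    public FileObject getResource(JavaFileManager.Location location, CharSequence moduleAndPkg,
                                  CharSequence relativeName) throws IOException {
        return delegate.getResource(location, moduleAndPkg, relativeName);
    }
}

// Hypothetical wrapper: hands out the RecordingFiler instead of the original Filer.
// Real code should also delegate default methods such as isPreviewEnabled().
class WrappedEnvironment implements ProcessingEnvironment {
    private final ProcessingEnvironment delegate;
    final RecordingFiler filer;

    WrappedEnvironment(ProcessingEnvironment delegate) {
        this.delegate = delegate;
        this.filer = new RecordingFiler(delegate.getFiler());
    }
    public Map<String, String> getOptions() { return delegate.getOptions(); }
    public Messager getMessager() { return delegate.getMessager(); }
    public Filer getFiler() { return filer; }
    public Elements getElementUtils() { return delegate.getElementUtils(); }
    public Types getTypeUtils() { return delegate.getTypeUtils(); }
    public SourceVersion getSourceVersion() { return delegate.getSourceVersion(); }
    public Locale getLocale() { return delegate.getLocale(); }
}

// Hypothetical wrapper: its only job is to pass the wrapped environment to init().
class WrappedProcessor implements Processor {
    private final Processor delegate;

    WrappedProcessor(Processor delegate) { this.delegate = delegate; }

    public Set<String> getSupportedOptions() { return delegate.getSupportedOptions(); }
    public Set<String> getSupportedAnnotationTypes() { return delegate.getSupportedAnnotationTypes(); }
    public SourceVersion getSupportedSourceVersion() { return delegate.getSupportedSourceVersion(); }
    public void init(ProcessingEnvironment env) { delegate.init(new WrappedEnvironment(env)); }
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        return delegate.process(annotations, roundEnv);
    }
    public Iterable<? extends Completion> getCompletions(Element element, AnnotationMirror annotation,
                                                         ExecutableElement member, String userText) {
        return delegate.getCompletions(element, annotation, member, userText);
    }
}
```

Instances of such a wrapped processor would then be handed to the compiler
with JavaCompiler.CompilationTask.setProcessors(), which is why the processor
discovery has to be done by the build itself.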
If there were an alternative way to access originatingElements, we would
gladly skip the wrapping, as it complicates the code and brings us to
another problem: the way third-party authors write their
AnnotationProcessors. Sure, if processor code makes restrictive assumptions
about ProcessingEnvironment instances, the code should be fixed on the
processor's side. But if a processor wants to use the Tree API, it has
little choice, because Trees.instance() is itself written on the assumption
that it is passed a certain implementation of the ProcessingEnvironment
object. Despite the wrappable design of the AP interfaces, javac's own
internal implementation (the Trees API) contradicts this approach. So the
best thing we have come up with is to explain to the AP author how to
"unwrap" the wrapped ProcessingEnvironment object so that the Tree API
facade can be properly instantiated.
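
On the processor side, the unwrapping could look roughly like the sketch
below. The DelegatingEnvironment interface and the TreesAccess helper are
hypothetical and are not the hook we actually expose; they only illustrate
the idea that a processor strips any wrappers before calling
Trees.instance(ProcessingEnvironment).

```java
import com.sun.source.util.Trees;
import javax.annotation.processing.ProcessingEnvironment;

// Hypothetical contract that a wrapping build tool could implement on its wrapper.
interface DelegatingEnvironment {
    ProcessingEnvironment delegate();
}

// Hypothetical helper for annotation processor authors.
final class TreesAccess {
    private TreesAccess() {}

    // Peels off wrappers until the innermost environment is reached.
    static ProcessingEnvironment unwrap(ProcessingEnvironment env) {
        while (env instanceof DelegatingEnvironment) {
            env = ((DelegatingEnvironment) env).delegate();
        }
        return env;
    }

    // Trees.instance() rejects environments it does not recognize,
    // so it must be given the unwrapped, compiler-provided object.
    static Trees treesFor(ProcessingEnvironment env) {
        return Trees.instance(unwrap(env));
    }
}
```

A processor would call TreesAccess.treesFor(processingEnv) in its init()
method instead of calling Trees.instance(processingEnv) directly, while still
using the wrapped environment (and its Filer) for everything else.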

The goal of Anna's post was to draw attention to these problems. I would
also like to suggest the following changes, which would simplify life for
both us and AP authors:

1. Do not ignore originatingElements on the Filer implementation level but
provide a way to pass this data further to FileManager or to some dedicated
listener that can be registered on a CompilerTask, for example. This would
allow build environments collecting dependency data to get access to it.
2. If interface wrapping is a supported approach, changes could be made to
Trees.instance(ProcessingEnvironment) to make it aware of this. For
example, there can be an "official" and standard way to unwrap the wrapped
environment object so that the facade object could be reliably instantiated.

Regards,
   Eugene

On Wed, Jan 27, 2021 at 11:30 PM Joe Darcy <joe.darcy at oracle.com> wrote:

> Hi Anna,
>
> Thanks for the comments. For context, can you describe in more detail how
> IntelliJ is using javac and any other Java compilers? For example, is javac
> being invoked programmatically, or phases of it being subclassed, is a
> different compiler infrastructure used for incremental IDE usage versus
> generation of class files?
>
> For background, when JSR 269 was being developed, besides javac there was
> an independent implementation of the API being done in Eclipse. The Eclipse
> implementation was in the context of that IDE and was a successor to the
> earlier Eclipse implementation of the apt API. Eclipse provided incremental
> running of annotation processors in response to updated files, etc. My
> understanding is the apt implementation in Eclipse had more complete
> dependency tracking, but the JSR 269 API provides fuller mechanisms to
> implement an incremental re-running policy.
>
> As you note, internally javac currently drops the originating elements
> information as it is not used in a batch compilation context.
>
> The interfaces for the environment objects, Filer, etc. were designed to
> be wrappable, but that is problematic if users do instanceof tests or rely
> on other implementation-class functionality.
>
> During JSR 269, the utility of having a standard AST API was recognized,
> but it was technically infeasible to have an AST-level API that worked well
> across two compilers that didn't share the same code base, javac and ecj in
> particular.
>
> Thanks,
>
> -Joe
> On 1/26/2021 4:05 AM, Anna Kozlova wrote:
>
> Hi all,
>
> As we recently saw the `JSR 269 - Pluggable Annotation Processing -
> Maintenance Review ballot', we would like to share our troubles with this
> API from the IDE perspective. This isn't connected to the latest changes in
> the JSR, so I was advised to post our thoughts here.
>
> *Initial setup:*
> IntelliJ IDEA allows incremental compilation, which means that only
> changed code and its dependencies are recompiled instead of the whole
> project (workspace)/module (project). The task is complicated in itself,
> but when people use Annotation Processors it sometimes becomes impossible,
> even though, it seems, we could get all the information we need to build
> the source-to-output relation and thus enable incremental compilation.
>
> *Problem description:*
> When a processor generates Java source code, bytecode or a resource
> file, it uses "create*" methods from the javax.annotation.processing.Filer
> interface. Every "create*" method has a vararg parameter
> "originatingElements", whose values are supposed to be "type or package or
> module elements causally associated with the creation of this file, may be
> elided or null". Those elements are supposed to be used by the processing
> environment to register and track dependencies between generated classes
> and existing code elements (classes, methods, fields) used by the processor
> to produce the generated code. The default Filer implementation in javac
> simply ignores this data. Internally, javac calls JavaFileManager to
> actually create and store the generated data, but, unfortunately, the
> originatingElements passed by processors are already lost. When generating
> bytecode, javac uses the javax.tools.JavaFileManager.getJavaFileForOutput()
> method and passes the "sibling" argument, pointing to the corresponding
> source file. This information is used by our build system to register
> source->output dependencies and facilitate incremental
> compilation. However, if data generation is initiated by an annotation
> processor, the "sibling" parameter is always null. One could expect it to
> point to a source file object containing originatingElements passed by the
> AP.
> (Another problem is that there are multiple originating elements
> potentially corresponding to multiple source files, but there is only a
> single "sibling" reference in the getJavaFileForOutput() method.) Because
> originatingElements are ignored, our build system cannot track dependencies
> between source files and AP-generated code. So we must always assume the
> worst-case scenario and recompile the whole module or project whenever we
> detect that generated code is affected.
>
> Without this information the detection itself is not as reliable as it
> could have been. So if a project heavily relies on AP code generation, we
> cannot provide the best incremental compilation experience because of lack
> of data.
>
> *Current solution:*
>
> The javadoc for create* methods in the Filer interface suggests that
>
> "This information may be used in an incremental environment to determine
> the need to rerun processors or remove generated files. Non-incremental
> environments may ignore the originating element information."
>
> In order to get access to originatingElements, our build system has to
> provide its own implementation of the Filer interface. We do this by
> wrapping the original Filer implementation with a wrapper that registers
> originatingElements and delegates the call to the original Filer
> implementation. This is a non-trivial task in its own right, because the
> API provides no direct way either to access the originatingElements or to
> register a custom Filer implementation. Just to wrap the original Filer
> implementation, we have to re-implement the AP discovery logic, and then
> use the JavaCompiler.CompilationTask.setProcessors() method to explicitly
> configure processor objects. Every Processor object is in turn wrapped
> with our "wrapper", whose only purpose is to make sure the AP gets a
> wrapped Filer and not the original one.
>
> *Problems with the current solution:*
>
> This approach generally works, but it leads to additional problems on the
> annotation processor side. Unfortunately, many popular processors assume
> that passed objects like ProcessingEnvironment and Filer have certain
> implementations. The processor code may heavily rely on this assumption,
> e.g. cast the passed object to implementation or use instanceof checks on
> it. So if a processor gets a wrapped ProcessingEnvironment, it fails to
> execute further. As a result, the user's project just stops compiling.
>
> Another problem caused by this wrapping approach is javac's so-called
> Tree API. An annotation processor may use the Tree API for its internal
> logic. The only way for the processor to obtain a reference to this API
> facade is to call com.sun.source.util.Trees.instance(ProcessingEnvironment).
> If the processor passes the wrapped ProcessingEnvironment object to the
> Trees.instance() method, it won't work, because its implementation
> internally uses an instanceof check itself!
>
> Our current approach is to detect such situations and provide the AP
> developer with hints in the error message. In order to make the processor
> work in our incremental environment, the AP developer has to write
> additional code that "unwraps" the passed ProcessingEnvironment object and
> uses the unwrapped object to initialize the Tree API. Such a situation, of
> course, is far from ideal: AP developers should not write code to please
> an IDE.
>
> *What can improve the situation:*
> So it would be great if the following problems were addressed in the API:
> there should be a direct way to access originatingElements. Ideally,
> accessing this data should not require AP developers to change their code.
> If code changes are inevitable on the AP side, those changes should be
> possible without making assumptions about interface implementations. There
> should be a standard way to get access to the Tree API as well, or, even
> better, the original API should be extended to make implementation of
> complex processing logic possible without the semi-closed Tree API.
>
> Thanks,
> Anna
>
>