Bug in attribute writing

Brian Goetz brian.goetz at oracle.com
Mon May 22 14:32:24 UTC 2023



On 5/22/2023 10:09 AM, - wrote:
> For writing, I don't think unknown attributes are ever created when
> they are presented in the first place; so we can assume "support
> unknown attributes" is always true.

Not quite sure what you're saying here?

>
> A tricky part of unknown attribute handling is that
> AttributedElement::attributes() return value may contain null if
> unknown attributes aren't supported. This should be explicitly
> specified.

Or perhaps unknown attributes should be removed from the list and the 
size adjusted, as if the attribute were not present.  Having a null in 
this list likely helps no one.
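
For what it's worth, a caller can get the "as if not present" view today
by filtering the list itself.  A minimal sketch, assuming the
java.lang.classfile package names (AttributedElement, UnknownAttribute)
and that attributes() may surface null or unknown entries as described
above:

    import java.lang.classfile.Attribute;
    import java.lang.classfile.AttributedElement;
    import java.lang.classfile.attribute.UnknownAttribute;
    import java.util.List;
    import java.util.Objects;

    class AttributeFiltering {
        // Hypothetical helper: an element's attributes with unknown
        // attributes (and any null placeholders) removed, as if absent.
        static List<Attribute<?>> knownAttributes(AttributedElement element) {
            return element.attributes().stream()
                    .filter(Objects::nonNull)                      // tolerate a null slot
                    .filter(a -> !(a instanceof UnknownAttribute)) // drop unmapped attributes
                    .toList();
        }
    }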

> The bound type annotation attributes indeed may not properly be
> rewritten when its dependencies (exception handlers, labels, parameter
> indices, etc.) are updated. Should users be advised to track these
> changes to attributes on their own?

Sigh, type annotations are the bane of my existence.  There are really 
no good answers here.

Most attributes contain indexes into the CP, but express little other 
dependency on the structure of the classfile.  As a result, we can use 
"is there a shared CP" as a proxy for "is it OK to write an attribute 
from one classfile to another".  This is true for all the standard 
attributes except the type annotation ones, which contain indexes into 
the code array, exception table, signature attribute ("mth bound of nth 
type parameter"), etc.  Tracking whether a transform would undermine 
these indexes is probably nearly impossible, and even if it were possible, the 
cost would be orders of magnitude out of line.  So we have a few bad 
choices:

  - silently drop TA attributes on the floor when adapting;
  - faithfully write presented TA attributes even though they might be 
invalid;
  - insist the user unpack and repack the attribute.

The "unpack" option is just make-work that won't do much to improve the 
accuracy of the result; much of what's in the TA attributes is of 
interest to static compilers, not bytecode processors.  Silently 
dropping the attribute seems rude.  So the least-bad alternative is to 
just believe the user that this attribute is valid enough for use at 
runtime.

I'd like a fourth option, perhaps involving some sort of classfile 
processing option that gives us better guidance, but so far we are not 
sure what is best here.
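
In the meantime, a caller who knows its transform invalidates 
type-annotation targets can make that choice explicit and strip the 
attributes itself.  A rough sketch, assuming ClassTransform.dropping and 
the standard attribute models (this covers only class-level attributes; 
the same idea applies at the method, field, and code levels):

    import java.lang.classfile.ClassTransform;
    import java.lang.classfile.attribute.RuntimeInvisibleTypeAnnotationsAttribute;
    import java.lang.classfile.attribute.RuntimeVisibleTypeAnnotationsAttribute;

    class DropTypeAnnotations {
        // Drops class-level RVTA/RITA attributes; compose analogous
        // method/field/code transforms to strip them everywhere.
        static final ClassTransform DROP_CLASS_LEVEL_TA = ClassTransform.dropping(
                e -> e instanceof RuntimeVisibleTypeAnnotationsAttribute
                  || e instanceof RuntimeInvisibleTypeAnnotationsAttribute);
    }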

> I think we can alleviate such dependencies by exposing pointers in the
> model: for instance, we will use Label instead of int for startPc for
> LocalVariableTable; the catch table index in type annotation targets
> can refer to ExceptionCatch. Then, in the writeDirect checks, we
> propagate checks to the components: if the dependencies of the Labels
> or ExceptionCatchs are the same (much like how a constant pool is
> still the same), we will perform direct write; otherwise we perform a
> clean write.

There are two kinds of references that are virtualized by the library: 
CP indexes and offsets (labels).  The underlying goal is that a 
classfile element from one classfile can be correctly interpreted in 
another.  For CP indexes, if the two classfiles share a CP, then CP 
indexes are already good; if not, they contain enough information to be 
copied at write time.  For offsets, the library exposes them as Labels 
whose meaning is relative to the containing Code attribute / 
CodeBuilder, so that the same label can be correctly interpreted in 
both contexts.  We do not virtualize anything else.
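
To make the label case concrete: exception-table entries are already 
modeled with Labels, so a handler can be rebuilt in a transform and the 
library recomputes the concrete offsets at write time (what is not 
virtualized is a TA target's index *into* that table).  A minimal 
sketch, assuming the ExceptionCatch accessors and the exceptionCatch 
overload taking an optional catch type:

    import java.lang.classfile.CodeTransform;
    import java.lang.classfile.instruction.ExceptionCatch;

    class HandlerPassThrough {
        // The Labels in the rebuilt handler are interpreted relative to
        // the CodeBuilder being written into, not as fixed offsets.
        static final CodeTransform PASS_HANDLERS = (cb, element) -> {
            if (element instanceof ExceptionCatch ec)
                cb.exceptionCatch(ec.tryStart(), ec.tryEnd(), ec.handler(), ec.catchType());
            else
                cb.with(element);
        };
    }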

We could try to do the same for the exception table; this is worth 
exploring, since we've definitely encountered bugs related to the table 
being reordered before.  But for TA attributes, this eventually runs 
into the absurd, because they contain all sorts of exotic pointers like 
"mth bound of nth type parameter".

Further, virtualizing more quantities yields diminishing returns, 
because we want to minimize the number of "unpack/repack" operations 
when doing a trivial transform.  We can't avoid unpacking for labels, 
but if we had to do it for everything else, the costs would likely 
spiral out of control.

Zooming out, there's a real lesson here: the type annotation attributes 
violated an implicit design principle for attributes, and now everyone 
downstream is paying the cost.  But I am diffident about imposing more 
of these costs on every user of every bytecode-processing framework for 
the sake of incrementally better TA fidelity.

Where we are now is a compromise; do you see a better compromise?

>
> Chen Liang
>
> On Thu, May 18, 2023 at 9:34 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>> I just noticed that in BoundAttribute.BoundUnknownAttribute:
>>
>>          private void checkWriteSupported(Function<ConstantPool, Boolean> condition) {
>>              if (!condition.apply(classReader))
>>                  throw new UnsupportedOperationException("Write of unknown attribute " + attributeName() + " not supported to alien constant pool");
>>          }
>>
>> we should at least be checking the option "support unknown attributes" before we say OK.
>>
>> Overall, I am not sure we have a fully consistent story for unsupported attributes here.  The basic problem with attributes for which we don't have an AttributeMapper is that we can't be sure about validity.  We don't know where the CP indexes are (though if we are sharing CPs, this is OK), and we don't know what other data might be off.
>>
>> (We also have a bigger problem with the type annotations attributes, which are known to have all sorts of non-CP offsets (nth bound of mth type variable, nth exception, indexes into code array, etc) which could easily be thrown off by transformation and there is no practical way to detect whether the original RVTA is still valid.)
