Attribute safety

Mon Jul 31 14:00:24 UTC 2023

> I like the idea. It makes sense to simplify handling of custom 
> attributes for some common situations.
>
> As the proposal adds a method to AtributeMapper identifying “brittle” 
> attributes, it still implies existence of custom attribute mapper for 
> each custom attribute.
>

Right now, there are two choices for modeling attributes:

  - No attribute mapper.  Here, we will treat it as an unknown 
attribute, and use the option for unknown attribute handling to 
determine whether to preserve or drop the attribute.

  - Attribute mapper present.  Here, we currently assume that if there 
is an attribute mapper, we can pass the attribute through uninterpreted 
during transformation if the constant pool is shared, and we lift the 
attribute to the object form and re-render to bytes it if the constant 
pool is not shared.

We've tried to make it easy to write attribute mappers, to encourage 
people to do so.  The implicit assumption in the attribute mapper design 
currently is that the only thing that might be environmentally sensitive 
is the constant pool.  I think this is the assumption we want to 
refine.  (Secondarily, the explode-and-rewrite trick can also tolerate 
labels moving, because labels are handled through a level of indirection.)

Thinking some more about how to model this, a single bit is not good 
enough.  So I propose:

     enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT }

(the names here are bad.)

Where:

  - STATELESS means the attribute contains only pure data, such as 
timestamps, and can always be bulk-copied.
  - CP_REFS means that the attribute contains only pure data and CP 
refs, so can be bulk-copied when CP sharing is in effect, and 
exploded/rewritten when CP sharing is not in effect
  - LABELS means that the attribute may contain labels, so should always 
be exploded/rewritten
  - HAZMAT means the attribute may contain indexes into structured not 
managed by the library (type variable lists, etc) and so we consult the 
"toxic attributes" option to determine whether to preserve or drop it

Most JVMS attributes are CP_REF.  Some like Deprecated and CompilationID 
are STATELESS.  The TA attributes are HAZMAT.  The local variable table 
attributes are LABELS.

So the new API surface is:

  - an enum for the attribute's environmental coupling
  - an accessor on AttributeMapper for that enum
  - an option for what to do with HAZMAT attributes (which should 
probably be merged with the option for UKNOWN attributes)

If stateless attributes were common, we might try to make life easier 
for attribute mapper writers by making the read/write methods optional 
for such attributes, but they are pretty uncommon so I think this is not 
worth it.

> Current attributes can be split into following categories :
>
>  1. Self-contained attributes (no dependency on CP nor Code offsets).
>     Such attributes can be safely transformed in any situation and
>     their payload is just copy/pasted.
>  2. Attributes with references to constant pool. Such attributes can
>     be safely transformed when the CP is shared, however require
>     custom handling (cloning of CP entries) during write into a class
>     with new CP.
>  3. Attributes with references to bytecode offsets (Code attributes).
>     Payload of such attributes can be safely copy/pasted only when the
>     Code is untouched. Otherwise they require custom translation into
>     labeled model during read and back to offsets during write. These
>     attribute most probably also use constant pool.
>
> I would suggest an alternative proposal to provide various custom 
> attribute mapper factories, mainly to simplify handling of category #1 
> and #2 of custom attributes.
>
> That solution would not require to add any indication methods to the 
> mappers nor global switches. Each custom mapper (composed by user) 
> will respond to the actual situation accordingly.
>
> For category #1 there might be a single factory getting attribute name 
> and returning attribute mapper.
>
> For category #2 there might be more options:
>
>   * A factory producing mapper which throws on write when CP is not shared
>   * Or a factory producing mapper simplifying CP entries clone and
>     re-mapping on write when CP is not shared (it might be implemented
>     even the way the user function identify offsets of CP indexes
>     inside the payload and mapper does all the job with CP entries
>     re-mapping).
>
> For category #3 we may also provide some mapper factories, as we will 
> better know specific use cases.
>
> Thanks,
>
> Adam
>
> *From: *classfile-api-dev <classfile-api-dev-retn at openjdk.org> on 
> behalf of Brian Goetz <brian.goetz at oracle.com>
> *Date: *Thursday, 27 July 2023 23:02
> *To: *classfile-api-dev at openjdk.org <classfile-api-dev at openjdk.org>
> *Subject: *Attribute safety
>
> We currently divide attributes into two buckets: those for which an 
> attribute mapper exists, and those for which one doesn't.  The latter 
> are represented with `UnknownAttribute`.  There is also an Option to 
> determine whether unknown attributes should be discarded when reading 
> or writing a classfile.  The main reason to be cautious about unknown 
> attributes is that we cannot guarantee their integrity during 
> transformation if there are any other changes to the classfile, 
> because we don't know what their raw contents represent.
>
> The library leans heavily on constant pool sharing to optimize 
> transformation.  The default behavior when transforming a classfile is 
> to keep the original constant pool as the initial part of the new 
> constant pool.  If constant pool sharing is enabled in this way, 
> attributes that contain only pure data and/or constant pool offsets 
> can be bulk-copied during transformation rather than parsing and 
> regenerating them.
>
> Most of the known attributes meet this criteria -- that they contain 
> only pure data and/or constant pool offsets.  However, there are a 
> cluster of attributes that are more problematic: the type annotation 
> attributes.  These may contain offsets into the bytecode table, 
> exception table, list of type variables, bounds of type variables, and 
> many other structures that may be perturbed during transformation.  
> This leaves us with some bad choices:
>
>  - Try to track if anything the attribute indexes into has been 
> changed.  (The cost and benefit here are out of balance by multiple 
> orders of magnitude here.)
>  - Copy the attribute and hope it is good enough.  Much of the fine 
> structure of RVTA and friends are not actually used at runtime, so 
> this may be OK.
>  - Drop the attribute during transformation and hope that's OK.
>
> (There are also middle grounds, such as trying to detect whether the 
> entity with the attribute (method, field, etc) has been modified.  
> This is lighter-weight that trying to track if the attribute has been 
> invalidated, but this is already a significant task.)
>
> I haven't been happy with any of the options, but I have a proposal 
> for incrementally improving it:
>
>  - Add a method to AttributeMapper for to indicate whether or not the 
> attribute contains only pure data and/or constant pool offsets.  
> (Almost all the attributes defined in JVMS meet this restriction; only 
> the type annotation attributes do not.)  For purposes of this mail, 
> call the ones that do not the "brittle" attributes.
>
>  - Add an option to determine what to do with brittle attributes under 
> transformation: drop them, retain them, fail.
>
> This way, nonstandard brittle attributes can be marked as such as 
> well, and get the same treatment as the known brittle attributes.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20230731/91a85731/attachment-0001.htm>