Attribute safety

Wed Aug 2 17:12:49 UTC 2023

Some thoughts on refining the AttributeStability enum.

Since we don't want to have to support combinations (labels and CP 
refs), this enum should be linearly ordered, where STATELESS < CP < 
LABELS < HAZMAT < UNKNOWN.  I believe the semantics so far follow this, 
but we should verify and the docs should reflect this so people don't 
ask "what if it has both X and Y".

The names of the enum constants need some work :)

I am wondering about what is the practical difference between HAZMAT and 
UNKNOWN.  Since you have to ask the attribute mapper for the stability 
level, and "unknown attribute" has historically been "no attribute 
mapper available", is this difference worth reflecting?  If so, what 
does it mean?  Are we looking to inflate a synthetic attribute mapper 
for each observed unknown attribute, whose functionality is partial?

Similarly, the AttributesProcessingOption should reflect a linear 
progression of pickiness about attributes:

  - pass all attributes
  - block unknown, but pass hazmat
  - block unknown and hazmat

I think this is a matter of documenting that this sequence is monotonic.

A related question is where we want to do the dropping.  We could drop 
attributes either on the read side (never even present it to the user) 
or the write side.  Currently we drop on read, but I think this may not 
be ideal.

Since readers should always be prepared to deal with attributes they 
don't recognize, dropping unknown attributes on reading seems 
premature.  THe only reason to drop an attribute preemptively is if 
processing it will trigger unnecessary work. For attributes on classes, 
fields, and methods, we do very little work already; we record the 
offset, name, and length of the attribute, and wrap this with an object 
that only looks at the bytes if you ask more questions of it.  (Given 
the structure of the classfile format, its hard to imagine doing less 
work, though I think we could be even lazier about inflating a String 
for the UTF8 of the name.)

Where we do a lot of work for attributes is on the attributes in the 
Code attribute -- line number tables, local variable tables, etc.  But 
we have an option for suppressing these separately.

For HAZMAT attributes such as RVTA, the validity is subtle -- the RVTA 
on a method can be invalidated if anything in the method changes 
(including other method attributes).  But if we don't even explode the 
method (because the user just passed it through as a ClassElement), we 
wouldn't want to explode it just to remove the RVTA.  Which means that 
the AttributeProcessingOption is not a guarantee that such attributes 
would be dropped, just permission to do so if the going gets tough.

So I think what we need to discuss a little further is:
  - Better names in the AttributeStability enum
  - When we should drop attributes according to the 
AttributeProcessingOption

On 8/1/2023 5:07 AM, Adam Sotona wrote:
>
> FYI: I’ve created JDK-8313452 
> <https://bugs.openjdk.org/browse/JDK-8313452> and draft PR 15101 
> <https://github.com/openjdk/jdk/pull/15101> “Improve Classfile API 
> attributes handling safety”**
>
> (the PR is targeted after JDK-8312491 
> <https://bugs.openjdk.org/browse/JDK-8312491> / PR 14968 
> <https://git.openjdk.org/jdk/pull/14968> is integrated due to Javadoc 
> update and conflicts).
>
> The custom attribute mapper simplification has lower priority and can 
> be discussed independently.
>
> Thanks,
>
> Adam
>
> *From: *Brian Goetz <brian.goetz at oracle.com>
> *Date: *Monday, 31 July 2023 16:00
> *To: *Adam Sotona <adam.sotona at oracle.com>, 
> classfile-api-dev at openjdk.org <classfile-api-dev at openjdk.org>
> *Subject: *Re: Attribute safety
>
>
>
>     I like the idea. It makes sense to simplify handling of custom
>     attributes for some common situations.
>
>     As the proposal adds a method to AtributeMapper identifying
>     “brittle” attributes, it still implies existence of custom
>     attribute mapper for each custom attribute.
>
>
> Right now, there are two choices for modeling attributes:
>
>  - No attribute mapper.  Here, we will treat it as an unknown 
> attribute, and use the option for unknown attribute handling to 
> determine whether to preserve or drop the attribute.
>
>  - Attribute mapper present.  Here, we currently assume that if there 
> is an attribute mapper, we can pass the attribute through 
> uninterpreted during transformation if the constant pool is shared, 
> and we lift the attribute to the object form and re-render to bytes it 
> if the constant pool is not shared.
>
> We've tried to make it easy to write attribute mappers, to encourage 
> people to do so.  The implicit assumption in the attribute mapper 
> design currently is that the only thing that might be environmentally 
> sensitive is the constant pool.  I think this is the assumption we 
> want to refine.  (Secondarily, the explode-and-rewrite trick can also 
> tolerate labels moving, because labels are handled through a level of 
> indirection.)
>
> Thinking some more about how to model this, a single bit is not good 
> enough.  So I propose:
>
>     enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT }
>
> (the names here are bad.)
>
> Where:
>
>  - STATELESS means the attribute contains only pure data, such as 
> timestamps, and can always be bulk-copied.
>  - CP_REFS means that the attribute contains only pure data and CP 
> refs, so can be bulk-copied when CP sharing is in effect, and 
> exploded/rewritten when CP sharing is not in effect
>  - LABELS means that the attribute may contain labels, so should 
> always be exploded/rewritten
>  - HAZMAT means the attribute may contain indexes into structured not 
> managed by the library (type variable lists, etc) and so we consult 
> the "toxic attributes" option to determine whether to preserve or drop it
>
> Most JVMS attributes are CP_REF.  Some like Deprecated and 
> CompilationID are STATELESS.  The TA attributes are HAZMAT.  The local 
> variable table attributes are LABELS.
>
> So the new API surface is:
>
>  - an enum for the attribute's environmental coupling
>  - an accessor on AttributeMapper for that enum
>  - an option for what to do with HAZMAT attributes (which should 
> probably be merged with the option for UKNOWN attributes)
>
> If stateless attributes were common, we might try to make life easier 
> for attribute mapper writers by making the read/write methods optional 
> for such attributes, but they are pretty uncommon so I think this is 
> not worth it.
>
>
>
>
>
>
>     Current attributes can be split into following categories :
>
>      1. Self-contained attributes (no dependency on CP nor Code
>         offsets). Such attributes can be safely transformed in any
>         situation and their payload is just copy/pasted.
>      2. Attributes with references to constant pool. Such attributes
>         can be safely transformed when the CP is shared, however
>         require custom handling (cloning of CP entries) during write
>         into a class with new CP.
>      3. Attributes with references to bytecode offsets (Code
>         attributes). Payload of such attributes can be safely
>         copy/pasted only when the Code is untouched. Otherwise they
>         require custom translation into labeled model during read and
>         back to offsets during write. These attribute most probably
>         also use constant pool.
>
>     I would suggest an alternative proposal to provide various custom
>     attribute mapper factories, mainly to simplify handling of
>     category #1 and #2 of custom attributes.
>
>     That solution would not require to add any indication methods to
>     the mappers nor global switches. Each custom mapper (composed by
>     user) will respond to the actual situation accordingly.
>
>     For category #1 there might be a single factory getting attribute
>     name and returning attribute mapper.
>
>     For category #2 there might be more options:
>
>      1. A factory producing mapper which throws on write when CP is
>         not shared
>      2. Or a factory producing mapper simplifying CP entries clone and
>         re-mapping on write when CP is not shared (it might be
>         implemented even the way the user function identify offsets of
>         CP indexes inside the payload and mapper does all the job with
>         CP entries re-mapping).
>
>     For category #3 we may also provide some mapper factories, as we
>     will better know specific use cases.
>
>     Thanks,
>
>     Adam
>
>     *From: *classfile-api-dev <classfile-api-dev-retn at openjdk.org>
>     <mailto:classfile-api-dev-retn at openjdk.org> on behalf of Brian
>     Goetz <brian.goetz at oracle.com> <mailto:brian.goetz at oracle.com>
>     *Date: *Thursday, 27 July 2023 23:02
>     *To: *classfile-api-dev at openjdk.org
>     <classfile-api-dev at openjdk.org> <mailto:classfile-api-dev at openjdk.org>
>     *Subject: *Attribute safety
>
>     We currently divide attributes into two buckets: those for which
>     an attribute mapper exists, and those for which one doesn't. The
>     latter are represented with `UnknownAttribute`.  There is also an
>     Option to determine whether unknown attributes should be discarded
>     when reading or writing a classfile. The main reason to be
>     cautious about unknown attributes is that we cannot guarantee
>     their integrity during transformation if there are any other
>     changes to the classfile, because we don't know what their raw
>     contents represent.
>
>     The library leans heavily on constant pool sharing to optimize
>     transformation.  The default behavior when transforming a
>     classfile is to keep the original constant pool as the initial
>     part of the new constant pool.  If constant pool sharing is
>     enabled in this way, attributes that contain only pure data and/or
>     constant pool offsets can be bulk-copied during transformation
>     rather than parsing and regenerating them.
>
>     Most of the known attributes meet this criteria -- that they
>     contain only pure data and/or constant pool offsets.  However,
>     there are a cluster of attributes that are more problematic: the
>     type annotation attributes.  These may contain offsets into the
>     bytecode table, exception table, list of type variables, bounds of
>     type variables, and many other structures that may be perturbed
>     during transformation.  This leaves us with some bad choices:
>
>      - Try to track if anything the attribute indexes into has been
>     changed.  (The cost and benefit here are out of balance by
>     multiple orders of magnitude here.)
>      - Copy the attribute and hope it is good enough. Much of the fine
>     structure of RVTA and friends are not actually used at runtime, so
>     this may be OK.
>      - Drop the attribute during transformation and hope that's OK.
>
>     (There are also middle grounds, such as trying to detect whether
>     the entity with the attribute (method, field, etc) has been
>     modified.  This is lighter-weight that trying to track if the
>     attribute has been invalidated, but this is already a significant
>     task.)
>
>     I haven't been happy with any of the options, but I have a
>     proposal for incrementally improving it:
>
>      - Add a method to AttributeMapper for to indicate whether or not
>     the attribute contains only pure data and/or constant pool
>     offsets.  (Almost all the attributes defined in JVMS meet this
>     restriction; only the type annotation attributes do not.)  For
>     purposes of this mail, call the ones that do not the "brittle"
>     attributes.
>
>      - Add an option to determine what to do with brittle attributes
>     under transformation: drop them, retain them, fail.
>
>     This way, nonstandard brittle attributes can be marked as such as
>     well, and get the same treatment as the known brittle attributes.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20230802/e4e5328d/attachment-0001.htm>