From brian.goetz at oracle.com Thu Jul 27 21:02:40 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 27 Jul 2023 17:02:40 -0400
Subject: Attribute safety
Message-ID: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>

We currently divide attributes into two buckets: those for which an attribute mapper exists, and those for which one doesn't. The latter are represented with `UnknownAttribute`. There is also an Option to determine whether unknown attributes should be discarded when reading or writing a classfile. The main reason to be cautious about unknown attributes is that we cannot guarantee their integrity during transformation if there are any other changes to the classfile, because we don't know what their raw contents represent.

The library leans heavily on constant pool sharing to optimize transformation. The default behavior when transforming a classfile is to keep the original constant pool as the initial part of the new constant pool. If constant pool sharing is enabled in this way, attributes that contain only pure data and/or constant pool offsets can be bulk-copied during transformation rather than parsed and regenerated.

Most of the known attributes meet this criterion -- that they contain only pure data and/or constant pool offsets. However, there is a cluster of attributes that are more problematic: the type annotation attributes. These may contain offsets into the bytecode table, exception table, list of type variables, bounds of type variables, and many other structures that may be perturbed during transformation. This leaves us with some bad choices:

 - Try to track if anything the attribute indexes into has been changed. (The cost and benefit are out of balance by multiple orders of magnitude here.)
 - Copy the attribute and hope it is good enough. Much of the fine structure of RVTA and friends is not actually used at runtime, so this may be OK.
 - Drop the attribute during transformation and hope that's OK.

(There are also middle grounds, such as trying to detect whether the entity with the attribute (method, field, etc.) has been modified. This is lighter-weight than trying to track whether the attribute has been invalidated, but it is already a significant task.)

I haven't been happy with any of the options, but I have a proposal for incrementally improving it:

 - Add a method to AttributeMapper to indicate whether or not the attribute contains only pure data and/or constant pool offsets. (Almost all the attributes defined in JVMS meet this restriction; only the type annotation attributes do not.) For purposes of this mail, call the ones that do not the "brittle" attributes.

 - Add an option to determine what to do with brittle attributes under transformation: drop them, retain them, fail.

This way, nonstandard brittle attributes can be marked as such as well, and get the same treatment as the known brittle attributes.

From maurizio.cimadamore at oracle.com Fri Jul 28 00:04:49 2023
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 28 Jul 2023 01:04:49 +0100
Subject: Attribute safety
In-Reply-To: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
Message-ID: <3afccb21-d44b-4575-cf20-30736d329620@oracle.com>

> Most of the known attributes meet this criterion -- that they contain
> only pure data and/or constant pool offsets. However, there is a
> cluster of attributes that are more problematic: the type annotation
> attributes.

Don't other attributes have this issue too? E.g. LocalVariableTable etc.?

> These may contain offsets into the bytecode table, exception table,
> list of type variables, bounds of type variables, and many other
> structures that may be perturbed during transformation.
> This leaves us with some bad choices:
>
>  - Try to track if anything the attribute indexes into has been
> changed. (The cost and benefit are out of balance by multiple
> orders of magnitude here.)

The problem with this approach is that you need a semantic description of what the attribute is in order to understand whether its contents moved or not. Pack200 allowed this: it had a little layout language to describe custom classfile attributes, so that the pack200 tool could compress even attributes it didn't know about from the start. Pretty cool, but also hard to use, and it probably adds complexity.

>  - Copy the attribute and hope it is good enough. Much of the fine
> structure of RVTA and friends is not actually used at runtime, so
> this may be OK.
>  - Drop the attribute during transformation and hope that's OK.

Maybe an option to select which one of the last two behaviors you want?

> (There are also middle grounds, such as trying to detect whether the
> entity with the attribute (method, field, etc.) has been modified.
> This is lighter-weight than trying to track whether the attribute has
> been invalidated, but it is already a significant task.)
>
> I haven't been happy with any of the options, but I have a proposal
> for incrementally improving it:
>
>  - Add a method to AttributeMapper to indicate whether or not the
> attribute contains only pure data and/or constant pool offsets.
> (Almost all the attributes defined in JVMS meet this restriction; only
> the type annotation attributes do not.) For purposes of this mail,
> call the ones that do not the "brittle" attributes.
>
>  - Add an option to determine what to do with brittle attributes under
> transformation: drop them, retain them, fail.
>
> This way, nonstandard brittle attributes can be marked as such as
> well, and get the same treatment as the known brittle attributes.

It is a little awkward that the user feels like PC references are being adjusted in some attributes (exception tables, local variable related attributes), but not for TAs. Other than that, looks good.

Maurizio

From brian.goetz at oracle.com Fri Jul 28 12:46:00 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 28 Jul 2023 08:46:00 -0400
Subject: Attribute safety
In-Reply-To: <3afccb21-d44b-4575-cf20-30736d329620@oracle.com>
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com> <3afccb21-d44b-4575-cf20-30736d329620@oracle.com>
Message-ID: <0cfe9b98-fa89-967b-dfee-8bdf51a8c535@oracle.com>

On 7/27/2023 8:04 PM, Maurizio Cimadamore wrote:
>> Most of the known attributes meet this criterion -- that they contain
>> only pure data and/or constant pool offsets. However, there is a
>> cluster of attributes that are more problematic: the type annotation
>> attributes.
> Don't other attributes have this issue too? E.g. LocalVariableTable etc.?

I was incomplete in my characterization. LVT / LNT and the other sub-attributes of the Code attribute have offsets into the code array, but these are mapped to Label and so can tolerate adaptation as well. It is really the TA attributes that have "random crazy dependencies", though one can imagine nonstandard attributes also having random crazy dependencies.

>> These may contain offsets into the bytecode table, exception table,
>> list of type variables, bounds of type variables, and many other
>> structures that may be perturbed during transformation. This leaves
>> us with some bad choices:
>>
>>  - Try to track if anything the attribute indexes into has been
>> changed. (The cost and benefit are out of balance by multiple
>> orders of magnitude here.)
> The problem with this approach is that you need a semantic description
> of what the attribute is in order to understand whether
> its contents moved or not.
> Pack200 allowed this: it had a little
> layout language to describe custom classfile attributes, so that the
> pack200 tool could compress even attributes it didn't know about from
> the start. Pretty cool, but also hard to use, and it probably adds
> complexity.

Right, what I'm aiming at is something more like "can tolerate binary teleportation across classfiles if the CP is stable." This is a middle ground that describes the vast majority of attributes.

> It is a little awkward that the user feels like PC references are
> being adjusted in some attributes (exception tables, local variable
> related attributes), but not for TAs. Other than that, looks good.

Yeah, the reality is that PC references are such a tiny part of the brittleness of TA attributes that incurring lots of runtime cost for that one part seems entirely wasted.

From maurizio.cimadamore at oracle.com Fri Jul 28 13:18:13 2023
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 28 Jul 2023 14:18:13 +0100
Subject: Attribute safety
In-Reply-To: <0cfe9b98-fa89-967b-dfee-8bdf51a8c535@oracle.com>
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com> <3afccb21-d44b-4575-cf20-30736d329620@oracle.com> <0cfe9b98-fa89-967b-dfee-8bdf51a8c535@oracle.com>
Message-ID: <2913afc6-9300-4f63-0653-7f59e05ed0ce@oracle.com>

>> It is a little awkward that the user feels like PC references are
>> being adjusted in some attributes (exception tables, local variable
>> related attributes), but not for TAs. Other than that, looks good.
>
> Yeah, the reality is that PC references are such a tiny part of the
> brittleness of TA attributes that incurring lots of runtime cost for
> that one part seems entirely wasted.

That said, we could refine your notion of "non-brittle attribute" to also include "it refers to bytecodes via Label".

Maurizio
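
To make the "refers to bytecodes via Label" point concrete, here is a minimal sketch of why label indirection tolerates code changes. It is an illustration only: `LabelSketch`, `Label`, `Entry`, `insertCode` and `render` are hypothetical names, not the library's API, and the real library resolves labels when the code is written rather than by mutating them in place.

```java
import java.util.List;

final class LabelSketch {
    /** A mutable position in the code array; a stand-in for a real Label (hypothetical). */
    static final class Label {
        int bci;
        Label(int bci) { this.bci = bci; }
    }

    /** A line-number-like entry that points at code through a Label, not a raw offset. */
    record Entry(Label start, int lineNumber) {}

    /** Simulates inserting `length` bytes of code at `at`: labels at or after `at` move. */
    static void insertCode(List<Label> labels, int at, int length) {
        for (Label l : labels) {
            if (l.bci >= at) {
                l.bci += length;
            }
        }
    }

    /** Re-renders entries to (start_pc, line_number) pairs from the current label positions. */
    static int[][] render(List<Entry> entries) {
        int[][] out = new int[entries.size()][];
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            out[i] = new int[] { e.start().bci, e.lineNumber() };
        }
        return out;
    }
}
```

An entry that stored the raw offset instead of a label would keep pointing at the old position after `insertCode`, which is the brittleness under discussion for attributes the library does not model this way.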

From adam.sotona at oracle.com Sat Jul 29 08:02:47 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Sat, 29 Jul 2023 08:02:47 +0000
Subject: Attribute safety
In-Reply-To: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
Message-ID:

I like the idea. It makes sense to simplify handling of custom attributes for some common situations.

As the proposal adds a method to AttributeMapper identifying "brittle" attributes, it still implies the existence of a custom attribute mapper for each custom attribute.

Current attributes can be split into the following categories:

 1. Self-contained attributes (no dependency on CP or Code offsets). Such attributes can be safely transformed in any situation, and their payload is just copy/pasted.
 2. Attributes with references to the constant pool. Such attributes can be safely transformed when the CP is shared, but require custom handling (cloning of CP entries) when written into a class with a new CP.
 3. Attributes with references to bytecode offsets (Code attributes). The payload of such attributes can be safely copy/pasted only when the Code is untouched. Otherwise they require custom translation into the labeled model during read and back to offsets during write. These attributes most probably also use the constant pool.

I would suggest an alternative proposal: provide various custom attribute mapper factories, mainly to simplify handling of categories #1 and #2 of custom attributes.

That solution would not require adding any indication methods to the mappers, nor any global switches. Each custom mapper (composed by the user) would respond to the actual situation accordingly.

For category #1 there might be a single factory taking an attribute name and returning an attribute mapper.

For category #2 there might be more options:

 * A factory producing a mapper which throws on write when the CP is not shared
 * Or a factory producing a mapper that simplifies cloning and re-mapping of CP entries on write when the CP is not shared (it might even be implemented so that a user function identifies the offsets of CP indexes inside the payload and the mapper does all the work of re-mapping the CP entries).

For category #3 we may also provide some mapper factories, once we better understand the specific use cases.

Thanks,
Adam
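
The second category #2 option, where a user function names the offsets of CP indexes inside the payload and the mapper re-maps them, could look roughly like the sketch below. Everything here is an assumption for illustration: `CpRemapSketch`, `remap`, the `cpIndexOffsets` array and the `oldToNew` map are hypothetical and not part of the library, and the sketch assumes every CP reference is a big-endian u2 as in the classfile format.

```java
import java.util.Map;

final class CpRemapSketch {
    /**
     * Returns a copy of the payload in which the u2 constant pool index at each
     * given offset is replaced by its counterpart in the new constant pool.
     */
    static byte[] remap(byte[] payload, int[] cpIndexOffsets, Map<Integer, Integer> oldToNew) {
        byte[] out = payload.clone();
        for (int off : cpIndexOffsets) {
            // Classfile integers are big-endian: high byte first.
            int oldIndex = ((payload[off] & 0xFF) << 8) | (payload[off + 1] & 0xFF);
            Integer mapped = oldToNew.get(oldIndex);
            if (mapped == null) {
                throw new IllegalArgumentException("no mapping for CP index " + oldIndex);
            }
            int newIndex = mapped;
            out[off] = (byte) (newIndex >>> 8);
            out[off + 1] = (byte) newIndex;
        }
        return out;
    }
}
```

With a shared constant pool the mapping is the identity and the payload can be copied unchanged; the re-mapping only matters when the attribute is written into a class with a new pool.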

From adam.sotona at oracle.com Mon Jul 31 13:19:41 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Mon, 31 Jul 2023 13:19:41 +0000
Subject: Attribute safety
Message-ID:

I've added some factories below, which may simplify handling of some common custom attribute cases, for example:

    Classfile.of(Classfile.AttributeMapperOption.of(attrName -> switch (attrName.stringValue()) {
        case "MyCustomBinaryContentAttribute" ->
            selfContainedCustomAttribute(attrName.stringValue());
        case "MyCustomAttributeWithCPReference" ->
            singleConstantPoolEntryCustomAttribute(attrName.stringValue());
        default -> null;
    }));

For category #1 there might be a single factory taking an attribute name and returning an attribute mapper.

    static <T extends CustomAttribute<T>> AttributeMapper<T> selfContainedCustomAttribute(
            String attributeName,
            BiFunction<AttributeMapper<T>, byte[], T> attributeFactory,
            Function<T, byte[]> contentAccessor) {
        return new AttributeMapper<T>() {
            @Override
            public String name() {
                return attributeName;
            }

            @Override
            public T readAttribute(AttributedElement enclosing, ClassReader cf, int pos) {
                // The u4 attribute length immediately precedes the payload at pos.
                return attributeFactory.apply(this, cf.readBytes(pos, cf.readInt(pos - 4)));
            }

            @Override
            public void writeAttribute(BufWriter buf, T attr) {
                buf.writeBytes(contentAccessor.apply(attr));
            }
        };
    }

    static AttributeMapper<?> selfContainedCustomAttribute(String attributeName) {
        // Local attribute type that simply carries the raw payload bytes.
        class SelfContainedAttribute extends CustomAttribute<SelfContainedAttribute> {
            final byte[] content;

            public SelfContainedAttribute(AttributeMapper<SelfContainedAttribute> mapper, byte[] content) {
                super(mapper);
                this.content = content;
            }
        }
        return selfContainedCustomAttribute(attributeName, SelfContainedAttribute::new, a -> a.content);
    }

For category #2 there might be more options:

 * A factory producing a mapper which throws on write when the CP is not shared

   This is the default behavior of unknown attributes, so no user action is needed.

 * Or a factory producing a mapper that simplifies cloning and re-mapping of CP entries on write when the CP is not shared:

    static <T extends CustomAttribute<T>> AttributeMapper<T> singleConstantPoolEntryCustomAttribute(
            String attributeName,
            BiFunction<AttributeMapper<T>, PoolEntry, T> attributeFactory,
            Function<T, PoolEntry> entryAccessor) {
        return new AttributeMapper<T>() {
            @Override
            public String name() {
                return attributeName;
            }

            @Override
            public T readAttribute(AttributedElement enclosing, ClassReader cf, int pos) {
                // The payload is a single constant pool index.
                return attributeFactory.apply(this, cf.readEntryOrNull(pos));
            }

            @Override
            public void writeAttribute(BufWriter buf, T attr) {
                buf.writeIndexOrZero(entryAccessor.apply(attr));
            }
        };
    }

    static AttributeMapper<?> singleConstantPoolEntryCustomAttribute(String attributeName) {
        // Local attribute type that carries the single referenced pool entry.
        class SingleEntryAttribute extends CustomAttribute<SingleEntryAttribute> {
            final PoolEntry entry;

            public SingleEntryAttribute(AttributeMapper<SingleEntryAttribute> mapper, PoolEntry entry) {
                super(mapper);
                this.entry = entry;
            }
        }
        return singleConstantPoolEntryCustomAttribute(attributeName, SingleEntryAttribute::new, a -> a.entry);
    }

From brian.goetz at oracle.com Mon Jul 31 14:00:24 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 31 Jul 2023 10:00:24 -0400
Subject: Attribute safety
In-Reply-To:
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
Message-ID: <7ded73f4-71fe-1cd6-e69f-c5eac985cf41@oracle.com>

> I like the idea. It makes sense to simplify handling of custom
> attributes for some common situations.
>
> As the proposal adds a method to AttributeMapper identifying "brittle"
> attributes, it still implies the existence of a custom attribute mapper
> for each custom attribute.

Right now, there are two choices for modeling attributes:

 - No attribute mapper. Here, we will treat it as an unknown attribute, and use the option for unknown attribute handling to determine whether to preserve or drop the attribute.

 - Attribute mapper present.
   Here, we currently assume that if there is an attribute mapper, we can pass the attribute through uninterpreted during transformation if the constant pool is shared, and we lift the attribute to the object form and re-render it to bytes if the constant pool is not shared.

We've tried to make it easy to write attribute mappers, to encourage people to do so. The implicit assumption in the attribute mapper design currently is that the only thing that might be environmentally sensitive is the constant pool. I think this is the assumption we want to refine. (Secondarily, the explode-and-rewrite trick can also tolerate labels moving, because labels are handled through a level of indirection.)

Thinking some more about how to model this, a single bit is not good enough. So I propose:

    enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT }

(The names here are bad.) Where:

 - STATELESS means the attribute contains only pure data, such as timestamps, and can always be bulk-copied.
 - CP_REFS means that the attribute contains only pure data and CP refs, so it can be bulk-copied when CP sharing is in effect, and exploded/rewritten when CP sharing is not in effect.
 - LABELS means that the attribute may contain labels, so it should always be exploded/rewritten.
 - HAZMAT means the attribute may contain indexes into structures not managed by the library (type variable lists, etc.), and so we consult the "toxic attributes" option to determine whether to preserve or drop it.

Most JVMS attributes are CP_REFS. Some, like Deprecated and CompilationID, are STATELESS. The TA attributes are HAZMAT. The local variable table attributes are LABELS.
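
The four levels above amount to a small decision table, which can be sketched as follows. This is an illustration under stated assumptions, not the library's API: `StabilitySketch`, `HazmatPolicy`, `Action` and `decide` are hypothetical names, the FAIL policy is carried over from the drop/retain/fail option in the first message of this thread, and treating "retain" as a bulk copy is an assumption.

```java
final class StabilitySketch {
    enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT }

    /** Hypothetical stand-in for the "toxic attributes" option. */
    enum HazmatPolicy { DROP, RETAIN, FAIL }

    /** What a transformation would do with one attribute. */
    enum Action { BULK_COPY, EXPLODE_AND_REWRITE, DROP }

    static Action decide(AttributeStability stability, boolean cpShared, HazmatPolicy policy) {
        return switch (stability) {
            // Pure data: safe to copy regardless of the environment.
            case STATELESS -> Action.BULK_COPY;
            // CP indexes stay valid only if the original pool is shared.
            case CP_REFS -> cpShared ? Action.BULK_COPY : Action.EXPLODE_AND_REWRITE;
            // Labels must be resolved again against the new code array.
            case LABELS -> Action.EXPLODE_AND_REWRITE;
            // The library cannot validate these, so the user-chosen policy decides.
            case HAZMAT -> switch (policy) {
                case DROP -> Action.DROP;
                case RETAIN -> Action.BULK_COPY;
                case FAIL -> throw new IllegalStateException("brittle attribute under transformation");
            };
        };
    }
}
```

For example, `decide(AttributeStability.CP_REFS, false, HazmatPolicy.DROP)` yields `EXPLODE_AND_REWRITE`, matching the description of CP_REFS when constant pool sharing is not in effect.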

So the new API surface is:

 - an enum for the attribute's environmental coupling
 - an accessor on AttributeMapper for that enum
 - an option for what to do with HAZMAT attributes (which should probably be merged with the option for UNKNOWN attributes)

If stateless attributes were common, we might try to make life easier for attribute mapper writers by making the read/write methods optional for such attributes, but they are pretty uncommon, so I think this is not worth it.