From adam.sotona at oracle.com  Tue Aug  1 09:07:44 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 1 Aug 2023 09:07:44 +0000
Subject: Attribute safety
In-Reply-To: <7ded73f4-71fe-1cd6-e69f-c5eac985cf41@oracle.com>
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
	<7ded73f4-71fe-1cd6-e69f-c5eac985cf41@oracle.com>
Message-ID: 

FYI: I've created JDK-8313452 and draft PR 15101 "Improve Classfile API attributes handling safety" (the PR is targeted after JDK-8312491 / PR 14968 is integrated, due to Javadoc updates and conflicts).

The custom attribute mapper simplification has lower priority and can be discussed independently.

Thanks,
Adam

From: Brian Goetz
Date: Monday, 31 July 2023 16:00
To: Adam Sotona, classfile-api-dev at openjdk.org
Subject: Re: Attribute safety

> I like the idea. It makes sense to simplify handling of custom attributes for some common situations.
>
> As the proposal adds a method to AttributeMapper identifying "brittle" attributes, it still implies the existence of a custom attribute mapper for each custom attribute.

Right now, there are two choices for modeling attributes:

- No attribute mapper. Here, we will treat it as an unknown attribute, and use the option for unknown attribute handling to determine whether to preserve or drop the attribute.

- Attribute mapper present. Here, we currently assume that if there is an attribute mapper, we can pass the attribute through uninterpreted during transformation if the constant pool is shared, and we lift the attribute to the object form and re-render it to bytes if the constant pool is not shared.

We've tried to make it easy to write attribute mappers, to encourage people to do so. The implicit assumption in the attribute mapper design currently is that the only thing that might be environmentally sensitive is the constant pool. I think this is the assumption we want to refine.
(Secondarily, the explode-and-rewrite trick can also tolerate labels moving, because labels are handled through a level of indirection.)

Thinking some more about how to model this, a single bit is not good enough. So I propose:

    enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT }

(the names here are bad.) Where:

- STATELESS means the attribute contains only pure data, such as timestamps, and can always be bulk-copied.
- CP_REFS means that the attribute contains only pure data and CP refs, so it can be bulk-copied when CP sharing is in effect, and exploded/rewritten when CP sharing is not in effect.
- LABELS means that the attribute may contain labels, so it should always be exploded/rewritten.
- HAZMAT means the attribute may contain indexes into structures not managed by the library (type variable lists, etc.), so we consult the "toxic attributes" option to determine whether to preserve or drop it.

Most JVMS attributes are CP_REFS. Some, like Deprecated and CompilationID, are STATELESS. The TA attributes are HAZMAT. The local variable table attributes are LABELS.

So the new API surface is:

- an enum for the attribute's environmental coupling
- an accessor on AttributeMapper for that enum
- an option for what to do with HAZMAT attributes (which should probably be merged with the option for UNKNOWN attributes)

If stateless attributes were common, we might try to make life easier for attribute mapper writers by making the read/write methods optional for such attributes, but they are pretty uncommon, so I think this is not worth it.

Current attributes can be split into the following categories:

1. Self-contained attributes (no dependency on CP or Code offsets). Such attributes can be safely transformed in any situation and their payload is just copy/pasted.
2. Attributes with references to the constant pool. Such attributes can be safely transformed when the CP is shared, but require custom handling (cloning of CP entries) when written into a class with a new CP.
3.
Attributes with references to bytecode offsets (Code attributes). The payload of such attributes can be safely copy/pasted only when the Code is untouched. Otherwise they require custom translation into the labeled model during read and back to offsets during write. These attributes most probably also use the constant pool.

I would suggest an alternative proposal: provide various custom attribute mapper factories, mainly to simplify handling of categories #1 and #2 of custom attributes.

That solution would not require adding any indication methods to the mappers, nor global switches. Each custom mapper (composed by the user) would respond to the actual situation accordingly.

For category #1 there might be a single factory taking an attribute name and returning an attribute mapper.

For category #2 there might be more options:

1. A factory producing a mapper which throws on write when the CP is not shared.
2. A factory producing a mapper that simplifies cloning and re-mapping of CP entries on write when the CP is not shared (it might even be implemented so that a user function identifies the offsets of CP indexes inside the payload and the mapper does all the CP entry re-mapping).

For category #3 we may also provide some mapper factories, once we better understand the specific use cases.

Thanks,
Adam

From: classfile-api-dev on behalf of Brian Goetz
Date: Thursday, 27 July 2023 23:02
To: classfile-api-dev at openjdk.org
Subject: Attribute safety

We currently divide attributes into two buckets: those for which an attribute mapper exists, and those for which one doesn't. The latter are represented with `UnknownAttribute`. There is also an Option to determine whether unknown attributes should be discarded when reading or writing a classfile. The main reason to be cautious about unknown attributes is that we cannot guarantee their integrity during transformation if there are any other changes to the classfile, because we don't know what their raw contents represent.
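The two-bucket model just described can be sketched with minimal placeholder types. Everything here (the registry, the option, the type names) is illustrative of the idea under discussion, not the library's actual API:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the two-bucket model: attributes with a registered mapper are
// modeled; anything else is wrapped as an "unknown" attribute carrying only
// its name and raw payload, and is preserved or dropped per an option.
// All names here are placeholders, not the real Classfile API.
public class TwoBuckets {
    interface AttributeMapper { String name(); }
    record UnknownAttribute(String name, byte[] payload) { }

    enum UnknownAttributesOption { PASS, DROP }

    // Returns the modeled form for known attributes, or an UnknownAttribute
    // wrapper (or nothing, if the option says to drop it) for unknown ones.
    static Optional<Object> model(Map<String, AttributeMapper> registry,
                                  String name, byte[] payload,
                                  UnknownAttributesOption option) {
        AttributeMapper mapper = registry.get(name);
        if (mapper != null) return Optional.of(mapper);          // known bucket
        if (option == UnknownAttributesOption.DROP) return Optional.empty();
        return Optional.of(new UnknownAttribute(name, payload)); // unknown bucket
    }

    public static void main(String[] args) {
        Map<String, AttributeMapper> registry = Map.of("Deprecated", () -> "Deprecated");
        byte[] raw = {1, 2, 3};
        System.out.println(model(registry, "Deprecated", raw,
                UnknownAttributesOption.DROP).isPresent());        // true: known
        System.out.println(model(registry, "MyToolMetadata", raw,
                UnknownAttributesOption.DROP).isPresent());        // false: dropped
        System.out.println(model(registry, "MyToolMetadata", raw,
                UnknownAttributesOption.PASS).orElseThrow()
                instanceof UnknownAttribute);                      // true: wrapped
    }
}
```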
The library leans heavily on constant pool sharing to optimize transformation. The default behavior when transforming a classfile is to keep the original constant pool as the initial part of the new constant pool. If constant pool sharing is enabled in this way, attributes that contain only pure data and/or constant pool offsets can be bulk-copied during transformation rather than parsed and regenerated.

Most of the known attributes meet this criterion -- they contain only pure data and/or constant pool offsets. However, there is a cluster of attributes that are more problematic: the type annotation attributes. These may contain offsets into the bytecode table, the exception table, lists of type variables, bounds of type variables, and many other structures that may be perturbed during transformation. This leaves us with some bad choices:

- Try to track whether anything the attribute indexes into has been changed. (The cost and benefit here are out of balance by multiple orders of magnitude.)
- Copy the attribute and hope it is good enough. Much of the fine structure of RVTA and friends is not actually used at runtime, so this may be OK.
- Drop the attribute during transformation and hope that's OK.

(There are also middle grounds, such as trying to detect whether the entity with the attribute (method, field, etc.) has been modified. This is lighter-weight than trying to track whether the attribute has been invalidated, but it is still a significant task.)

I haven't been happy with any of the options, but I have a proposal for incrementally improving it:

- Add a method to AttributeMapper to indicate whether or not the attribute contains only pure data and/or constant pool offsets. (Almost all the attributes defined in JVMS meet this restriction; only the type annotation attributes do not.) For purposes of this mail, call the ones that do not the "brittle" attributes.
- Add an option to determine what to do with brittle attributes under transformation: drop them, retain them, or fail.

This way, nonstandard brittle attributes can be marked as such as well, and get the same treatment as the known brittle attributes.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com  Wed Aug  2 17:12:49 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 2 Aug 2023 13:12:49 -0400
Subject: Attribute safety
In-Reply-To: 
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
	<7ded73f4-71fe-1cd6-e69f-c5eac985cf41@oracle.com>
Message-ID: 

Some thoughts on refining the AttributeStability enum.

Since we don't want to have to support combinations (labels and CP refs), this enum should be linearly ordered, where STATELESS < CP_REFS < LABELS < HAZMAT < UNKNOWN. I believe the semantics so far follow this, but we should verify it, and the docs should reflect it, so people don't ask "what if it has both X and Y".

The names of the enum constants need some work :)

I am wondering what the practical difference is between HAZMAT and UNKNOWN. Since you have to ask the attribute mapper for the stability level, and "unknown attribute" has historically meant "no attribute mapper available", is this difference worth reflecting? If so, what does it mean? Are we looking to inflate a synthetic attribute mapper for each observed unknown attribute, whose functionality is partial?

Similarly, the AttributesProcessingOption should reflect a linear progression of pickiness about attributes:

- pass all attributes
- block unknown, but pass hazmat
- block unknown and hazmat

I think this is a matter of documenting that this sequence is monotonic.

A related question is where we want to do the dropping. We could drop attributes either on the read side (never even present it to the user) or the write side. Currently we drop on read, but I think this may not be ideal.
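The monotonic progression of pickiness could fall out of the enum's declaration order, with each option level acting as a simple threshold. A sketch under assumed names (this is not the final API):

```java
// Sketch: a linearly ordered stability enum, where a processing option is
// just a threshold -- attributes at or above the threshold are dropped.
// Names are placeholders taken from this discussion, not final API.
public class StabilityOrdering {
    enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT, UNKNOWN }

    // "Block unknown, but pass hazmat" becomes: drop anything >= UNKNOWN.
    static boolean shouldDrop(AttributeStability s, AttributeStability threshold) {
        return s.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        var passHazmat = AttributeStability.UNKNOWN;   // block unknown only
        var blockHazmat = AttributeStability.HAZMAT;   // block hazmat and unknown

        System.out.println(shouldDrop(AttributeStability.HAZMAT, passHazmat));   // false
        System.out.println(shouldDrop(AttributeStability.UNKNOWN, passHazmat));  // true
        System.out.println(shouldDrop(AttributeStability.HAZMAT, blockHazmat));  // true
        System.out.println(shouldDrop(AttributeStability.CP_REFS, blockHazmat)); // false
    }
}
```

Because the enum is linearly ordered, each stricter option level drops a superset of what the previous level drops, which is exactly the monotonicity to document.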
Since readers should always be prepared to deal with attributes they don't recognize, dropping unknown attributes on reading seems premature. The only reason to drop an attribute preemptively is if processing it will trigger unnecessary work.

For attributes on classes, fields, and methods, we do very little work already; we record the offset, name, and length of the attribute, and wrap this with an object that only looks at the bytes if you ask more questions of it. (Given the structure of the classfile format, it's hard to imagine doing less work, though I think we could be even lazier about inflating a String for the UTF8 of the name.)

Where we do a lot of work for attributes is on the attributes in the Code attribute -- line number tables, local variable tables, etc. But we have an option for suppressing these separately.

For HAZMAT attributes such as RVTA, the validity is subtle -- the RVTA on a method can be invalidated if anything in the method changes (including other method attributes). But if we don't even explode the method (because the user just passed it through as a ClassElement), we wouldn't want to explode it just to remove the RVTA. Which means that the AttributesProcessingOption is not a guarantee that such attributes will be dropped, just permission to do so if the going gets tough.

So I think what we need to discuss a little further is:

- Better names in the AttributeStability enum
- When we should drop attributes according to the AttributesProcessingOption

On 8/1/2023 5:07 AM, Adam Sotona wrote:
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adam.sotona at oracle.com  Thu Aug  3 08:49:03 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Thu, 3 Aug 2023 08:49:03 +0000
Subject: Attribute safety
In-Reply-To: 
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
	<7ded73f4-71fe-1cd6-e69f-c5eac985cf41@oracle.com>
Message-ID: 

From: Brian Goetz

> I am wondering what the practical difference is between HAZMAT and UNKNOWN.

Maybe there is no difference; all unknown attributes are HAZMAT, so we may drop UNKNOWN.

> A related question is where we want to do the dropping. We could drop attributes either on the read side (never even present it to the user) or the write side. Currently we drop on read, but I think this may not be ideal.

I think we should drop on both sides, according to the actual context. When the user parses a classfile with the option set to drop, the attributes are not expected to pass in. However, when the context for parsing allows the attributes and the context for transformation doesn't, they should pass in, but not pass out.

> AttributesProcessingOption is not a guarantee that such attributes would be dropped, just permission to do so if the going gets tough.

Yes, this is a common problem also for DebugElementsOption, LineNumbersOption, and DeadCodeOption. For example, filtering debug attributes out of code requires expanding the code with a no-op transformation down to the instructions; otherwise the option has no effect.

> So I think what we need to discuss a little further is:
> - Better names in the AttributeStability enum

I like the HAZMAT :)

> - When we should drop attributes according to the AttributesProcessingOption

I think on both sides, according to the actual Classfile context.
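The "drop on both sides according to context" idea can be sketched as the same keep/drop predicate consulted once when parsing (so filtered attributes never reach the user) and once when writing (so they never reach the output). The types below are simplified stand-ins, not the library's model:

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch: the same keep/drop predicate applied on the read side and the
// write side. Attribute is a stand-in for the library's attribute model.
public class TwoSidedFiltering {
    record Attribute(String name) { }

    static List<Attribute> parse(List<Attribute> raw, Predicate<Attribute> keep) {
        // Read side: filtered attributes are never presented to the user.
        return raw.stream().filter(keep).toList();
    }

    static List<Attribute> write(List<Attribute> model, Predicate<Attribute> keep) {
        // Write side: filtered attributes are never emitted.
        return model.stream().filter(keep).toList();
    }

    public static void main(String[] args) {
        Predicate<Attribute> dropHazmat =
                a -> !a.name().startsWith("RuntimeVisibleType");
        var raw = List.of(new Attribute("Deprecated"),
                          new Attribute("RuntimeVisibleTypeAnnotations"));

        // Parsing context allows everything, transforming context does not:
        // the attribute passes in but does not pass out.
        var model = parse(raw, a -> true);
        var out = write(model, dropHazmat);
        System.out.println(model.size() + " in, " + out.size() + " out"); // 2 in, 1 out
    }
}
```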
On 8/1/2023 5:07 AM, Adam Sotona wrote:
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com  Thu Aug  3 13:58:18 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 3 Aug 2023 09:58:18 -0400
Subject: Attribute safety
In-Reply-To: 
References: <44eea6b4-f00e-154c-d3fd-3675a63934c9@oracle.com>
	<7ded73f4-71fe-1cd6-e69f-c5eac985cf41@oracle.com>
Message-ID: <8f3ad1e2-1b64-9cc1-7398-10fad83d1786@oracle.com>

On 8/3/2023 4:49 AM, Adam Sotona wrote:
>> I am wondering what the practical difference is between HAZMAT and UNKNOWN.
>
> Maybe there is no difference; all unknown attributes are HAZMAT, so we may drop UNKNOWN.

There is definitely a difference; the question is whether it is possible for an attribute to be "unknown" if it has a mapper. On the one hand I kind of like including it here, because then we can treat all attributes equally (they all have a mapper) rather than treating "unknown" as a special case, but then we have to make the additional (small) change of inflating a do-nothing mapper for every unknown attribute we come across. Which I think is OK.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adam.sotona at oracle.com  Mon Aug  7 10:46:32 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Mon, 7 Aug 2023 10:46:32 +0000
Subject: Attribute safety
Message-ID: 

That makes perfect sense; attribute safety is more exactly attribute transformation safety. I agree that the introduction of special read/write filters (in the form of context options) is confusing and non-systematic.

When we focus on the implementation of attribute transformation safety, I think the safety switch is less a global context option and more an immediate feature of the individual transformation (a filtering feature). If we implement it as a global context option, we would have to insert a filtering layer before (on the read side) or after (on the write side) each transformation.
I think it would be pretty much the same as filtering on read/write, except that it would affect transformations only (so maybe even more confusing). Classfile::transform would then behave differently than its expanded form using Classfile::build.

However, if we implement attribute transformation safety as specific transformations (doing the filtering job), it should work in harmony with the rest of the API. For example, in addition to ClassTransform.ACCEPT_ALL we can add ClassTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES (dropping UNKNOWN) and ClassTransform.ACCEPT_ALL_SAFE_ATTRIBUTES (dropping HAZMAT).

As an interesting expansion of the ClassfileTransform features, we can provide factories such as ClassfileTransform::droppingAll(Predicate filter), where the "All" (or "Deep" or similar) suffix should indicate forced expansion of the whole tree, so the filter is really applied on all levels and a filtered element never appears in the target class. It can be used to implement the global filtering transformations.

I propose to add the following set of filtering transformations:

* ClassTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
* ClassTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
* FieldTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
* FieldTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
* MethodTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
* MethodTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
* CodeTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
* CodeTransform.ACCEPT_ALL_SAFE_ATTRIBUTES

Thanks,
Adam

From: Brian Goetz
Date: Saturday, 5 August 2023 3:01
To: Adam Sotona, classfile-api-dev at openjdk.org
Subject: Re: Attribute safety

OK, I thought about this some more while sitting in the sauna ...

I think the locus of attribute safety is not reading or writing, but transforming. If I am just going to read a classfile, there is no need to drop anything; if I find an attribute I don't recognize, I'll just skip over it and keep going -- that's how attributes are designed to work. No need to drop anything on read, ever.
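The ACCEPT_ALL_*_ATTRIBUTES constants proposed above can be sketched as simple element predicates keyed off a stability level. ClassElement, AttributeStability, and the stability assignments below are placeholders based on this thread, not the final API:

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch: filtering "transforms" as predicates over class elements, in the
// spirit of the proposed ACCEPT_ALL_SAFE_ATTRIBUTES constants. The element
// and stability types are stand-ins for the API under discussion.
public class FilteringTransforms {
    enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT, UNKNOWN }

    record ClassElement(String name, AttributeStability stability) { }

    // Passes everything through unchanged, like ClassTransform.ACCEPT_ALL.
    static final Predicate<ClassElement> ACCEPT_ALL = e -> true;

    // Drops UNKNOWN attributes, like the proposed ACCEPT_ALL_KNOWN_ATTRIBUTES.
    static final Predicate<ClassElement> ACCEPT_ALL_KNOWN_ATTRIBUTES =
            e -> e.stability() != AttributeStability.UNKNOWN;

    // Drops HAZMAT and UNKNOWN, like the proposed ACCEPT_ALL_SAFE_ATTRIBUTES.
    static final Predicate<ClassElement> ACCEPT_ALL_SAFE_ATTRIBUTES =
            e -> e.stability().compareTo(AttributeStability.HAZMAT) < 0;

    public static void main(String[] args) {
        var elements = List.of(
                new ClassElement("Deprecated", AttributeStability.STATELESS),
                new ClassElement("RuntimeVisibleTypeAnnotations", AttributeStability.HAZMAT),
                new ClassElement("MyToolMetadata", AttributeStability.UNKNOWN));

        System.out.println(elements.stream().filter(ACCEPT_ALL).count());                  // 3
        System.out.println(elements.stream().filter(ACCEPT_ALL_KNOWN_ATTRIBUTES).count()); // 2
        System.out.println(elements.stream().filter(ACCEPT_ALL_SAFE_ATTRIBUTES).count());  // 1
    }
}
```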
When the library finds an unknown attribute, it wraps it in an UnknownAttribute element, whose understanding of the attribute is limited to its name, size, and payload byte[]. Nothing so dangerous here that the user needs protection.

Similarly, if a user is _writing_ a classfile, again, we should trust them that the classfile they are putting together is sensible. We shouldn't second-guess with "oh, that's a type annotations attribute, those are so brittle, please sign here."

Where there is potentially a problem is when we are _transforming_ a classfile, because for a HAZMAT or UNKNOWN attribute, we can't guarantee its integrity if we've changed anything else about the classfile (including reordering the constant pool). So the "what do we do with brittle attributes" question applies only to transformation, where we are taking an attribute from one classfile (a bound attribute) and writing it to another. This is where the user can shoot themselves in the foot, because they might change something else about the classfile and subtly (or not subtly) undermine the integrity of the attribute they don't understand. And this is why we want to classify attributes according to their sensitivity to environmental change:

- A stateless attribute is sensitive to no environmental changes. A transform can always safely bulk-copy the attribute directly.

- An attribute with CP dependencies is sensitive to restructuring of the constant pool (no CP sharing), but the mapper contains enough information to survive CP restructuring. A transform can safely bulk-copy the attribute directly if the CP is shared between the original and new classfile, and can otherwise safely copy the attribute by inflating and deflating it via the readAttribute/writeAttribute behavior of the mapper.

- An attribute with label dependencies is sensitive to changes to the contents or structure of the bytecode array.
A transform can safely bulk-copy the attribute directly if the code array is unchanged, and can otherwise safely copy the attribute by inflating and deflating it via the readAttribute/writeAttribute behavior of the mapper. However, there are currently no attributes that have label dependencies only and are not already treated specially by the classfile API, so this category may not be that interesting.

- An attribute with unpredictable dependencies is sensitive to any change to the contents of the entity of which it is an attribute. It can be safely bulk-copied if nothing else in that entity has changed, but otherwise there is no safe way to copy it.

- An unknown attribute is sensitive to all of the above, and so takes on the union of the copying risks of all of the above.

So I think the Option we want governs what to do with various attributes when _transforming_ a CompoundElement in which they appear. The problematic cases are those with unpredictable dependencies, and unknowns. So I think the options we want are:

- When transforming, always keep HAZMAT and UNKNOWN attributes; for safety, lift and lower HAZMAT attributes.
- When transforming, keep HAZMAT attributes (lifting and lowering), but always drop UNKNOWN attributes.
- When transforming, always drop HAZMAT and UNKNOWN attributes.

-------------- next part --------------
An HTML attachment was scrubbed...
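The stability classification and copying rules above can be sketched as a small decision helper. This is a hypothetical illustration only: the enum constants mirror the proposed AttributeStability names, but the canBulkCopy helper and its parameters are assumptions for exposition, not the actual Classfile API surface.

```java
public class AttributeStabilitySketch {
    // Stability categories from the proposal; UNKNOWN stands in for
    // attributes that have no mapper at all.
    enum AttributeStability { STATELESS, CP_REFS, LABELS, HAZMAT, UNKNOWN }

    /**
     * Decide whether a bound attribute may be bulk-copied during a
     * transform, given its stability and the transform's environment.
     * HAZMAT and UNKNOWN always return false here: a separate policy
     * option decides whether they are kept or dropped.
     */
    static boolean canBulkCopy(AttributeStability stability,
                               boolean cpShared, boolean codeUnchanged) {
        return switch (stability) {
            case STATELESS -> true;           // pure data: always safe to copy bytes
            case CP_REFS   -> cpShared;       // safe only while the constant pool is shared
            case LABELS    -> codeUnchanged;  // safe only if the code array is intact
            case HAZMAT, UNKNOWN -> false;    // never bulk-copy; keep/drop per option
        };
    }

    public static void main(String[] args) {
        // e.g. Deprecated would be STATELESS, most JVMS attributes CP_REFS
        System.out.println(canBulkCopy(AttributeStability.STATELESS, false, false)); // true
        System.out.println(canBulkCopy(AttributeStability.CP_REFS, false, false));   // false
    }
}
```

With this shape, a transform bulk-copies STATELESS attributes unconditionally, bulk-copies CP_REFS attributes only under CP sharing, inflates/deflates otherwise, and routes HAZMAT and UNKNOWN attributes to the keep/drop policy option.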
URL: 

From adam.sotona at oracle.com Fri Aug 11 09:10:04 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Fri, 11 Aug 2023 09:10:04 +0000
Subject: Classfile API ConstantPool::entryCount and ConstantPool::entryByIndex confusion
Message-ID: 

Hi,
I've noticed confusion in understanding (and inconsistency in the implementations) of two ConstantPool methods:

/**
 * {@return the entry at the specified index}
 *
 * @param index the index within the pool of the desired entry
 */
PoolEntry entryByIndex(int index);

/**
 * {@return the number of entries in the constant pool}
 */
int entryCount();

The intuitive understanding of these methods is that a user can iterate over the entries, incrementing the index by one up to entryCount(), and get an entry for each index in that range. In reality, however, the methods reflect JVMS §4.1:

constant_pool_count
The value of the constant_pool_count item is equal to the number of entries in the constant_pool table plus one. A constant_pool index is considered valid if it is greater than zero and less than constant_pool_count, with the exception for constants of type long and double noted in §4.4.5.

The following user code causes more or less confusion:

for (int i = 0; i < cp.entryCount(); i++) cp.entryByIndex(i);

* Fails immediately with ConstantPoolException at index 0.

for (int i = 1; i < cp.entryCount(); i++) cp.entryByIndex(i);

* May fail for constant pools containing long or double entries (double-slot entries); however, it may not fail if the tag at the invalid offset imitates a valid entry (this is a bug), or it may return null when the SplitConstantPool implementation is involved (an inconsistency between implementations).
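To make the double-slot hazard concrete, here is a self-contained sketch with a stand-in for the constant pool. The Entry record, the slot layout, and the exception type are illustrative assumptions, not the real ConstantPool API:

```java
import java.util.ArrayList;
import java.util.List;

public class CpIterationSketch {
    // Illustrative stand-in for a pool entry; width() is 2 for long/double.
    record Entry(String tag, int width) { }

    // Simulated pool slots: index 0 is never valid per JVMS §4.1, and the
    // Long entry at index 2 also occupies the phantom slot at index 3.
    static final Entry[] SLOTS = {
        null,
        new Entry("Utf8", 1),
        new Entry("Long", 2),
        null,
        new Entry("Class", 1),
    };

    static int entryCount() { return SLOTS.length; }  // constant_pool_count semantics

    static Entry entryByIndex(int i) {
        if (i <= 0 || i >= SLOTS.length || SLOTS[i] == null)
            throw new IllegalStateException("invalid CP index " + i); // ConstantPoolException stand-in
        return SLOTS[i];
    }

    /** Width-aware iteration: the only loop shape that visits each entry exactly once. */
    static List<String> tags() {
        List<String> tags = new ArrayList<>();
        for (int i = 1; i < entryCount(); i += entryByIndex(i).width())
            tags.add(entryByIndex(i).tag());
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tags()); // [Utf8, Long, Class]
        // A naive i++ loop steps onto the phantom slot at index 3 and fails:
        try {
            for (int i = 1; i < entryCount(); i++) entryByIndex(i);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // invalid CP index 3
        }
    }
}
```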
So the only valid (though not very intuitive) iteration over all entries should look like this:

for (int i = 1; i < cp.entryCount(); i += cp.entryByIndex(i).width())

I propose the following changes to ConstantPool:

* Fix all implementations of PoolEntry ConstantPool::entryByIndex(int) to always throw ConstantPoolException when the index is invalid, or change the method signature to Optional ConstantPool::entryByIndex(int), and explain it in the Javadoc
* Rename ConstantPool::entryCount to slotsCount or size or width and explain it in the Javadoc
* Make ConstantPool extend Iterable so a user does not need to understand CP internals to iterate over its entries

Thanks,
Adam

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adam.sotona at oracle.com Fri Aug 11 10:02:39 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Fri, 11 Aug 2023 10:02:39 +0000
Subject: Classfile API improve chaining of end handlers of transformations
Message-ID: 

Hi,
I have found one transformation pattern to be frequent: dropping and adding an element, or more generally, adding an end handler to an existing transformation.

One common example: replacing a class element with a new one (whether or not it is present in the source classfile) can be implemented as:

ClassTransform.dropping(e -> e instanceof WhateverElement).andThen(ClassTransform.endHandler(clb -> clb.with(newWhateverElement)));

However, this pattern composes a heavyweight chain of transformations (just to add extra building code at the end). I propose to allow lightweight chaining of end handlers onto existing transformations. For example, a ClassTransform::endHandler instance method may produce a copy of the transformation with an additional end handler attached. The example above may then look like:

ClassTransform.dropping(e -> e instanceof WhateverElement).endHandler(clb -> clb.with(newWhateverElement));

Uses of the existing static factory method ClassTransform.endHandler(...) can then be refactored into ClassTransform.ACCEPT_ALL.endHandler(...).

What do you think?

Thanks,
Adam

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Fri Aug 11 14:41:06 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 11 Aug 2023 14:41:06 +0000
Subject: Classfile API ConstantPool::entryCount and ConstantPool::entryByIndex confusion
In-Reply-To: 
References: 
Message-ID: 

These seem reasonable. My preference is for throwing over returning an Optional. Size is probably the least problematic name.

Sent from my iPad

On Aug 11, 2023, at 2:10 AM, Adam Sotona wrote:

Hi,
I've noticed confusion in understanding (and inconsistency in the implementations) of two ConstantPool methods:

/**
 * {@return the entry at the specified index}
 *
 * @param index the index within the pool of the desired entry
 */
PoolEntry entryByIndex(int index);

/**
 * {@return the number of entries in the constant pool}
 */
int entryCount();

The intuitive understanding of these methods is that a user can iterate over the entries, incrementing the index by one up to entryCount(), and get an entry for each index in that range. In reality, however, the methods reflect JVMS §4.1:

constant_pool_count
The value of the constant_pool_count item is equal to the number of entries in the constant_pool table plus one. A constant_pool index is considered valid if it is greater than zero and less than constant_pool_count, with the exception for constants of type long and double noted in §4.4.5.
The following user code causes more or less confusion:

for (int i = 0; i < cp.entryCount(); i++) cp.entryByIndex(i);

* Fails immediately with ConstantPoolException at index 0.

for (int i = 1; i < cp.entryCount(); i++) cp.entryByIndex(i);

* May fail for constant pools containing long or double entries (double-slot entries); however, it may not fail if the tag at the invalid offset imitates a valid entry (this is a bug), or it may return null when the SplitConstantPool implementation is involved (an inconsistency between implementations).

So the only valid (though not very intuitive) iteration over all entries should look like this:

for (int i = 1; i < cp.entryCount(); i += cp.entryByIndex(i).width())

I propose the following changes to ConstantPool:

* Fix all implementations of PoolEntry ConstantPool::entryByIndex(int) to always throw ConstantPoolException when the index is invalid, or change the method signature to Optional ConstantPool::entryByIndex(int), and explain it in the Javadoc
* Rename ConstantPool::entryCount to slotsCount or size or width and explain it in the Javadoc
* Make ConstantPool extend Iterable so a user does not need to understand CP internals to iterate over its entries

Thanks,
Adam

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Wed Aug 23 17:26:50 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 23 Aug 2023 13:26:50 -0400
Subject: Attribute safety
In-Reply-To: 
References: 
Message-ID: <744677dd-82aa-1330-8baa-01ed7fc1ed60@oracle.com>

Just a further thought on this: we can further focus our lens on _bound_ attributes, because these are the ones that have come from another classfile. If the user creates a RVAA during a transform, we should assume that is fine, just as we do with writing.

On 8/7/2023 6:46 AM, Adam Sotona wrote:
> That makes perfect sense: attribute safety is, more precisely, attribute transformation safety.
> I agree that introducing special read/write filters (in the form of context options) is confusing and unsystematic.
>
> When we focus on the implementation of attribute transformation safety, I think the safety switch is less a global context option than a feature of the individual transformation (a filtering feature).
>
> If we implement it as a global context option, we would have to insert a filtering layer before (on the read side) or after (on the write side) each transformation. I think it would be pretty much the same as filtering on read/write, except that it would affect only transformations (so maybe even more confusing). Classfile::transform would then behave differently than its expanded form using Classfile::build.
>
> However, if we implement attribute transformation safety as specific transformations (doing the filtering job), it should work in harmony with the rest of the API.
>
> For example, in addition to ClassTransform.ACCEPT_ALL we can add ClassTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES (dropping UNKNOWN) and ClassTransform.ACCEPT_ALL_SAFE_ATTRIBUTES (dropping HAZMAT).
>
> As an interesting expansion of the ClassfileTransform features, we can provide factories such as ClassfileTransform::droppingAll(Predicate filter), where the "All" (or "Deep" or similar) suffix indicates forced expansion of the whole tree, so the filter is really applied on all levels and a filtered element never appears in the target class. This can be used to implement the global filtering transformations.
> I propose to add the following set of filtering transformations:
>
> * ClassTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
> * ClassTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
> * FieldTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
> * FieldTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
> * MethodTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
> * MethodTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
> * CodeTransform.ACCEPT_ALL_KNOWN_ATTRIBUTES
> * CodeTransform.ACCEPT_ALL_SAFE_ATTRIBUTES
>
> Thanks,
> Adam
>
> *From: *Brian Goetz
> *Date: *Saturday, 5 August 2023 3:01
> *To: *Adam Sotona , classfile-api-dev at openjdk.org
> *Subject: *Re: Attribute safety
>
> OK, I thought about this some more while sitting in the sauna ...
>
> I think the locus of attribute safety is not reading or writing, but transforming. If I am just going to read a classfile, there is no need to drop anything; if I find an attribute I don't recognize, I'll just skip over it and keep going -- that's how attributes are designed to work. No need to drop anything on read, ever. When the library finds an unknown attribute, it wraps it in an UnknownAttribute element, whose understanding of the attribute is limited to its name, size, and payload byte[]. Nothing so dangerous here that the user needs protection.
>
> Similarly, if a user is _writing_ a classfile, again, we should trust them that the classfile they are putting together is sensible. We shouldn't second-guess with "oh, that's a type annotations attribute, those are so brittle, please sign here."
>
> Where there is potentially a problem is when we are _transforming_ a classfile, because for a HAZMAT or UNKNOWN attribute, we can't guarantee its integrity if we've changed anything else about the classfile (including reordering the constant pool). So the "what do we do with brittle attributes" question applies only to transformation, where we are taking an attribute from one classfile (a bound attribute) and writing it to another.
> This is where the user can shoot themselves in the foot, because they might change something else about the classfile and subtly (or not subtly) undermine the integrity of the attribute they don't understand. And this is why we want to classify attributes according to their sensitivity to environmental change:
>
> - A stateless attribute is sensitive to no environmental changes. A transform can always safely bulk-copy the attribute directly.
>
> - An attribute with CP dependencies is sensitive to restructuring of the constant pool (no CP sharing), but the mapper contains enough information to survive CP restructuring. A transform can safely bulk-copy the attribute directly if the CP is shared between the original and new classfile, and can otherwise safely copy the attribute by inflating and deflating it via the readAttribute/writeAttribute behavior of the mapper.
>
> - An attribute with label dependencies is sensitive to changes to the contents or structure of the bytecode array. A transform can safely bulk-copy the attribute directly if the code array is unchanged, and can otherwise safely copy the attribute by inflating and deflating it via the readAttribute/writeAttribute behavior of the mapper. However, there are currently no attributes that have label dependencies only and are not already treated specially by the classfile API, so this category may not be that interesting.
>
> - An attribute with unpredictable dependencies is sensitive to any change to the contents of the entity of which it is an attribute. It can be safely bulk-copied if nothing else in that entity has changed, but otherwise there is no safe way to copy it.
>
> - An unknown attribute is sensitive to all of the above, and so takes on the union of the copying risks of all of the above.
>
> So I think the Option we want governs what to do with various attributes when _transforming_ a CompoundElement in which they appear.
> The problematic cases are those with unpredictable dependencies, and unknowns. So I think the options we want are:
>
> - When transforming, always keep HAZMAT and UNKNOWN attributes; for safety, lift and lower HAZMAT attributes.
> - When transforming, keep HAZMAT attributes (lifting and lowering), but always drop UNKNOWN attributes.
> - When transforming, always drop HAZMAT and UNKNOWN attributes.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From david.lloyd at redhat.com Mon Aug 28 16:50:35 2023
From: david.lloyd at redhat.com (David Lloyd)
Date: Mon, 28 Aug 2023 11:50:35 -0500
Subject: Minor issue with stack map generator and generics
Message-ID: 

I've been doing more experimenting with this API and ran across a minor issue. The way the stack map generator API is presently structured, it seems difficult to "get-or-generate".

For example, I thought to find the stack map for a method, or otherwise generate it if it did not exist, so my code looked something like this (structured for readability):

MethodModel mm = ....;
Optional optAttr = mm.findAttribute(Attributes.STACK_MAP_TABLE);
StackMapAttribute sma = optAttr.orElseGet(() -> new StackMapGenerator(...).stackMapTableAttribute());

But this fails because `StackMapGenerator.stackMapTableAttribute()` returns an `Attribute` instead of just `StackMapAttribute`. This method returns an anonymous subclass of `AdHocAttribute`; could it instead be changed to an inner class which also implements `StackMapAttribute`? I think this might be more correct as well, because the type argument of `Attribute` seems like it was intended to be a self-type, and if so, these direct anonymous subclasses (and there are a few of them) seem to violate that intention.

-- 
- DML (he/him)

-------------- next part --------------
An HTML attachment was scrubbed...
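The "self type" idiom David refers to can be shown with a toy hierarchy. The interface names below echo the discussion but are purely illustrative, not the real Classfile API:

```java
public class SelfTypeSketch {
    // Self-type pattern: the type parameter is intended to be the
    // implementing type itself.
    interface Attribute<A extends Attribute<A>> {
        String attributeName();
    }

    // Conforming use: the concrete attribute supplies itself as the argument.
    interface StackMapAttribute extends Attribute<StackMapAttribute> { }

    // A factory typed as Attribute<StackMapAttribute> rather than
    // StackMapAttribute loses the concrete type; a caller holding an
    // Optional<StackMapAttribute> needs a supplier of StackMapAttribute,
    // not of Attribute<StackMapAttribute>, so orElseGet fails to type-check.
    static Attribute<StackMapAttribute> widened(StackMapAttribute a) {
        return a;
    }

    public static void main(String[] args) {
        StackMapAttribute sma = () -> "StackMapTable";
        Attribute<StackMapAttribute> w = widened(sma);
        System.out.println(w.attributeName());
        // StackMapAttribute narrowed = widened(sma); // does not compile
    }
}
```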
URL: 

From adam.sotona at oracle.com Tue Aug 29 13:08:58 2023
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 29 Aug 2023 13:08:58 +0000
Subject: Minor issue with stack map generator and generics
In-Reply-To: 
References: 
Message-ID: 

Unfortunately, jdk.internal.classfile.impl.StackMapGenerator is not exposed in the Classfile API and is not designed to be called by users.

The right way to drive stack map generation is through Classfile.StackMapsOption.

The default StackMapsOption.STACK_MAPS_WHEN_REQUIRED option handles the majority of use cases with maximum performance:

* transformations of valid classes keep the original stack maps when the code is unchanged
* stack maps are generated for new methods
* however, it does not fix invalid class files at the input of the transformation

The StackMapsOption.GENERATE_STACK_MAPS option forces stack map generation, for cases like:

* stack maps should be generated even when not mandated by the JVMS
* source classes of the transformation are missing stack maps and the Classfile API is used to fix them

There are several specific aspects of stack map generation:

1. The absence of a StackMapAttribute is a valid state for many methods, so that information alone is not enough to trigger the generator.
2. StackMapGenerator is invoked internally at the last stage of code generation, when the bytecode array is fully constructed.

What specific use case are you trying to solve?

Thanks,
Adam

From: classfile-api-dev on behalf of David Lloyd
Date: Monday, 28 August 2023 18:51
To: classfile-api-dev at openjdk.org
Subject: Minor issue with stack map generator and generics

I've been doing more experimenting with this API and ran across a minor issue. The way the stack map generator API is presently structured, it seems difficult to "get-or-generate".
For example, I thought to find the stack map for a method, or otherwise generate it if it did not exist, so my code looked something like this (structured for readability):

MethodModel mm = ....;
Optional optAttr = mm.findAttribute(Attributes.STACK_MAP_TABLE);
StackMapAttribute sma = optAttr.orElseGet(() -> new StackMapGenerator(...).stackMapTableAttribute());

But this fails because `StackMapGenerator.stackMapTableAttribute()` returns an `Attribute` instead of just `StackMapAttribute`. This method returns an anonymous subclass of `AdHocAttribute`; could it instead be changed to an inner class which also implements `StackMapAttribute`? I think this might be more correct as well, because the type argument of `Attribute` seems like it was intended to be a self-type, and if so, these direct anonymous subclasses (and there are a few of them) seem to violate that intention.

-- 
- DML (he/him)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From david.lloyd at redhat.com Tue Aug 29 14:23:12 2023
From: david.lloyd at redhat.com (David Lloyd)
Date: Tue, 29 Aug 2023 09:23:12 -0500
Subject: Minor issue with stack map generator and generics
In-Reply-To: 
References: 
Message-ID: 

OK, that makes sense; I can use the option. I still think the generics usage isn't correct, but if all usages of the ad-hoc attributes are internal, then it's not a problem for users, and if the implementers are happy with it then it's fine.

On Tue, Aug 29, 2023 at 8:27 AM Adam Sotona wrote:

> Unfortunately, jdk.internal.classfile.impl.StackMapGenerator is not exposed in the Classfile API and is not designed to be called by users.
>
> The right way to drive stack map generation is through Classfile.StackMapsOption.
> The default StackMapsOption.STACK_MAPS_WHEN_REQUIRED option handles the majority of use cases with maximum performance:
>
> - transformations of valid classes keep the original stack maps when the code is unchanged
> - stack maps are generated for new methods
> - however, it does not fix invalid class files at the input of the transformation
>
> The StackMapsOption.GENERATE_STACK_MAPS option forces stack map generation, for cases like:
>
> - stack maps should be generated even when not mandated by the JVMS
> - source classes of the transformation are missing stack maps and the Classfile API is used to fix them
>
> There are several specific aspects of stack map generation:
>
> 1. The absence of a StackMapAttribute is a valid state for many methods, so that information alone is not enough to trigger the generator.
> 2. StackMapGenerator is invoked internally at the last stage of code generation, when the bytecode array is fully constructed.
>
> What specific use case are you trying to solve?
>
> Thanks,
> Adam
>
> *From: *classfile-api-dev on behalf of David Lloyd
> *Date: *Monday, 28 August 2023 18:51
> *To: *classfile-api-dev at openjdk.org
> *Subject: *Minor issue with stack map generator and generics
>
> I've been doing more experimenting with this API and ran across a minor issue. The way the stack map generator API is presently structured, it seems difficult to "get-or-generate".
>
> For example, I thought to find the stack map for a method, or otherwise generate it if it did not exist, so my code looked something like this (structured for readability):
>
> MethodModel mm = ....;
> Optional optAttr = mm.findAttribute(Attributes.STACK_MAP_TABLE);
> StackMapAttribute sma = optAttr.orElseGet(() -> new StackMapGenerator(...).stackMapTableAttribute());
>
> But this fails because `StackMapGenerator.stackMapTableAttribute()` returns an `Attribute` instead of just `StackMapAttribute`.
> This method returns an anonymous subclass of `AdHocAttribute`; could it instead be changed to an inner class which also implements `StackMapAttribute`? I think this might be more correct as well because the type argument of `Attribute` seems like it was intended to be a self-type, and if so, these direct anonymous subclasses (and there are a few of them) seem to violate that intention.
>
> --
> - DML (he/him)

-- 
- DML (he/him)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
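The option-driven behaviour Adam describes in this thread can be summarized in a small sketch. The enum constant names mirror Classfile.StackMapsOption, but the decision logic is a simplified assumption distilled from the description above, not the library's implementation:

```java
public class StackMapsOptionSketch {
    // The two options discussed in the thread; names mirror the API,
    // the semantics below are a simplification.
    enum StackMapsOption { STACK_MAPS_WHEN_REQUIRED, GENERATE_STACK_MAPS }

    /** What happens to a method's stack maps during a transformation. */
    static String action(StackMapsOption opt, boolean codeChanged) {
        return switch (opt) {
            // Force generation, e.g. to fix source classes missing stack maps.
            case GENERATE_STACK_MAPS -> "generate";
            // Default: keep the original maps when the code is unchanged
            // (which also means invalid input is not fixed), generate for
            // new or changed methods.
            case STACK_MAPS_WHEN_REQUIRED -> codeChanged ? "generate" : "keep original";
        };
    }

    public static void main(String[] args) {
        System.out.println(action(StackMapsOption.STACK_MAPS_WHEN_REQUIRED, false)); // keep original
        System.out.println(action(StackMapsOption.GENERATE_STACK_MAPS, false));      // generate
    }
}
```

This also shows why a user-driven "get-or-generate" helper is unnecessary: the choice is made per method at the end of code generation, once the bytecode array is fully constructed.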