From ebruneton at free.fr  Thu Nov  8 16:17:57 2012
From: ebruneton at free.fr (ebruneton at free.fr)
Date: Fri, 9 Nov 2012 01:17:57 +0100 (CET)
Subject: [type-annos-observers] Improving the format of type annotation attributes
In-Reply-To: <20121107.103608.795384539929347270.mernst@cs.washington.edu>
Message-ID: <857197387.246847726.1352420277650.JavaMail.root@zimbra73-e12.priv.proxad.net>

>> Thanks Mike. Could you post the URLs to the PDF and HTML versions here
>> please? The links in the changelog are broken.
>
>That is a working copy of the changelog (it's a snapshot of the current
>version of the repository). The links will work when the changelog is
>posted to the website as part of a release.
>
>I'm attaching the current version of the spec, in PDF and HTML form.
>People can always generate these from the source in the Mercurial
>repository, but I can also send it around.
>
>                    Thanks,
>
>                    -Mike

Thanks for the update! I haven't looked at it in detail yet, but I noticed
that the inner_type_path examples seem to be in the wrong order, or at least
not in the order I would have expected (i.e. from outside to inside).
For instance, for

@H O1.@E O2<@F S,@G T>.@D O3.
@A Nested<@B U, @C V>

I would expect this:

@H  empty path
@E  inner type path { empty path }
@F  inner type path { type argument path { 0, empty path } }
@G  inner type path { type argument path { 1, empty path } }
@D  inner type path { inner type path { empty path } }
@A  inner type path { inner type path { inner type path { empty path } } }
@B  inner type path { inner type path { inner type path { type argument path { 0, empty path } } } }
@C  inner type path { inner type path { inner type path { type argument path { 1, empty path } } } }

Eric

From wdietl at gmail.com  Mon Nov 12 11:36:29 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Mon, 12 Nov 2012 11:36:29 -0800
Subject: Comments on the JSR 308 specification
Message-ID:

Dear JSR 308 expert group and fellow implementers,

please find some comments about the specification (as of November 9th) below.

page 16:
- The values for "path_type" are not specified. Something like Fig. 1
might be overkill, but I don't just want to assume that they are 0 - 4.

- I'm not quite sure I see the advantage of the nested structure.
Couldn't we use an array instead? We are simply specifying a sequence
of steps to follow, not a complicated tree.

  struct type_path_entry {
    u1 path_type;
    union {
      // empty for most, only type_argument_path has info
      u1 type_argument_index;
    }
  }

  struct type_path {
    u1 path_length;
    type_path_entry [ path_length ];
  }

The path_length could be used instead of or in addition to ending each
path with empty_path - like we can have either C-style null-terminated
strings or Pascal-style strings with length :-)
I would prefer using path_length without empty_path.

Then instead of:

  type_argument_path { 1, type_argument_path { 0, empty path } }

we would have:

  length: 2;
  elements: type_argument_path, 1, type_argument_path, 0;

I think the resulting sequence of bytes would be the same (if
path_length is left off), but I would find the presentation as array a
lot simpler.
Am I missing what the advantage of this nested presentation is? If so,
I might be implementing it wrong :-)


- On a related note, I find the representation of nested types weird. It
basically is "go one step up in the nesting". Similarly,
type_argument_path could be a "go one step right in the type
arguments" operation.
I find the solution for type arguments nicer and would suggest that we use:

  union {
    // empty for most
    u1 type_argument_index;
    u1 outer_index;
  }

The location for @M in the first example was:

  inner_type_path { inner_type_path { inner_type_path { empty_path } } }

and would now simply be:

  length: 1;
  elements: inner_type_path, 3;

This is similar to the old way of counting, just ignoring any
array/type argument confusion and just counting the nesting. 0 is the
main modifier and left off, 1 is the first enclosing, etc. This way
the index corresponds to the number of "inner_type_path" elements in
the complicated/current way. We could use 0-based instead.


page 14:
- Section 3.3.6: I'm a bit amazed that the "throws_type_index" is a
u2, but the "method_parameter_index" is a u1. Can there really be so
many more exceptions than parameters?

- I note that annotations in the signature are stored in the
"method_info structure", whereas annotations in method bodies are
stored in "a Code attribute".
Is this an intentional difference? Or is this something that wasn't
updated yet? If it's intentional, I would find a heads-up useful,
maybe together with Fig. 1 that discusses the different categories.


page 3:
- I would put "for casts" and "for type tests" together. I would
mention that there are no runtime checks for these.

- In "for constructor invocation results":
In
  myVar . new @Tainted NestedClass
add "()" at the end.

- Point 4 already talks about receivers, before point 5 introduced the
concept and syntax for that. Maybe the order should be switched?

page 5:
- The last sentence of point 5 is the first to mention TYPE_USE.
Would it make sense to introduce the new ElementType constants earlier?

- Point 6 could mention that it uses ElementType.TYPE_PARAMETER

page 10:
- "Annotations that target instructions are _are_ those..."

- In "How Java SE 7 stores annotations" we are reminded that there are
both Runtime[In]VisibleParameterAnnotations and
Runtime[In]VisibleAnnotations.
Would it help to highlight that in JSR 308 we do not add
Runtime[In]VisibleParameterTypeAnnotations and instead store such
annotations with the method?

page 11:
- The last paragraph of Section 3.1 should also have a reference to
Section 3.4 and give a similar overview as for the other sections.

page 12:
- Last sentence of Section 3.3 refers to generic type arguments and
arrays only. It should have the complete list with nested types, etc.

- Copy & paste mistake: Section 3.3.1 mentions
"type_parameter_bound_target" and "bound".

page 13:
- Section 3.3.2 could also mention that the index is 0-based, like
earlier and later subsections do (redundantly, I agree).


Best regards,
cu, WMD.

-- 
http://www.google.com/profiles/wdietl

From wdietl at gmail.com  Tue Nov 13 18:49:33 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Tue, 13 Nov 2012 18:49:33 -0800
Subject: [type-annos-observers] Comments on the Nov 7 specification
In-Reply-To: <50A2FFFE.1040706@oracle.com>
References: <50A2FFFE.1040706@oracle.com>
Message-ID:

Alex, experts,

how about:

  struct type_path_entry {
    u1 type_path_kind;
      // 0: annotation is deeper in this array type
      // 1: annotation is deeper in this nested type
      // 2: annotation is on the bound of this wildcard type arg
      // 3: annotation is on the i'th type arg of this parameterized type
    u1 additional_index;
      // 0: ignore me
      // non-0: the 1st, 2nd, etc array index, nested type, or type arg
      //        of this parameterized type
  }

That is, we use the additional_index (better name?) as argument for
arrays, nested types, and type arguments.
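
As a rough illustration of the entry layout just described, the Java sketch
below writes a path as a u1 length followed by (kind, index) byte pairs, and
merges consecutive array or nested-type steps into a single entry whose index
counts the levels. The names TypePathSketch, collapse and encode are made up
for this example, and reading additional_index as a level count for arrays
and nested types is an assumption about the proposal, not something the
specification states.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed array encoding; not part of the JSR 308 spec.
final class TypePathSketch {
    // Kind values follow the comments in the struct above.
    static final int ARRAY = 0, NESTED = 1, WILDCARD = 2, TYPE_ARG = 3;

    // Each step is {kind, index}. Runs of ARRAY or NESTED steps are merged
    // into one entry by summing their level counts (assumed interpretation).
    static List<int[]> collapse(List<int[]> steps) {
        List<int[]> out = new ArrayList<>();
        for (int[] s : steps) {
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            boolean countable = s[0] == ARRAY || s[0] == NESTED;
            if (last != null && countable && last[0] == s[0]) {
                last[1] += s[1];
            } else {
                out.add(new int[] {s[0], s[1]}); // copy so the input is not modified
            }
        }
        return out;
    }

    // Serializes as: u1 path_length, then u1 type_path_kind and
    // u1 additional_index for every entry.
    static byte[] encode(List<int[]> entries) {
        if (entries.size() > 255) {
            throw new IllegalArgumentException("path_length is a u1");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(entries.size());
        for (int[] e : entries) {
            bytes.write(e[0]);
            bytes.write(e[1]);
        }
        return bytes.toByteArray();
    }
}
```

Under these assumptions, three single array steps collapse to one entry, so
encode(collapse(...)) yields the three bytes 1, 0, 3.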
Instead of

  array_type_path { array_type_path {array_type_path {}}}

one can then write:

  length: 1
  elements: array_type_path, 3.

This applies in addition to the comment I already had about nested
types, where I also think it would be easier to use a flag and an
argument instead of repeating a flag multiple times.
This would make handling of arrays, nested types, and type arguments
nicely uniform.
I like re-using one field instead of having a context-dependent union.

About encoding everything within the flag: this will prevent any
future extension of the type_path_kind.
Also, how many type parameters can a class/method have? Are 253
possible values enough? (Similarly for the nesting-depth of arrays and
types.)

cu, WMD.


On Tue, Nov 13, 2012 at 6:20 PM, Alex Buckley wrote:
> Experts,
>
> Werner Dietl sent the comments below about the spec in Mike's mail
> "Improving the format of type annotation attributes" of 11/7/12.
>
> Most are useful clarifications to the spec, but one comment proposes an
> array rather than a tree to represent the hierarchical location of a type
> annotation in a compound type. Each level in the tree becomes the succeeding
> array entry.
>
> The proposal has a union with a context-sensitive single member that would
> break pack200, so here is a slightly modified version:
>
> struct type_path {
>   u1 path_length;
>   type_path_entry path[path_length];
> }
>
> struct type_path_entry {
>   u1 type_path_kind;
>     // 0: annotation is deeper in this array type
>     // 1: annotation is deeper in this nested type
>     // 2: annotation is on the bound of this wildcard type arg
>     // 3: annotation is on the i'th type arg of this parameterized type
>   u1 type_argument_index;
>     // 0: ignore me
>     // non-0: the 1st, 2nd, etc type arg of this parameterized type
> }
>
> I think this is fine. A further improvement would be to drop the
> type_argument_index item and encode the i'th type arg into the
> type_path_kind item.
> Namely, type_path_kind values >=3 represent the
> type_path_kind-2'th type arg of the current parameterized type. This trick
> is used in the stack_map_frame structure (JVMS 4.7.4).
>
> Alex
>
> **********
>
> page 16:
> - The values for "path_type" are not specified. Something like Fig. 1
> might be overkill, but I don't just want to assume that they are 0 - 4.
>
> - I'm not quite sure I see the advantage of the nested structure.
> Couldn't we use an array instead? We are simply specifying a sequence
> of steps to follow, not a complicated tree.
>
>   struct type_path_entry {
>     u1 path_type;
>     union {
>       // empty for most, only type_argument_path has info
>       u1 type_argument_index;
>     }
>   }
>
>   struct type_path {
>     u1 path_length;
>     type_path_entry [ path_length ];
>   }
>
> The path_length could be used instead of or in addition to ending each
> path with empty_path - like we can have either C-style null-terminated
> strings or Pascal-style strings with length
> I would prefer using path_length without empty_path.
>
> Then instead of:
>
>   type_argument_path { 1, type_argument_path { 0, empty path } }
>
> we would have:
>
>   length: 2;
>   elements: type_argument_path, 1, type_argument_path, 0;
>
> I think the resulting sequence of bytes would be the same (if
> path_length is left off), but I would find the presentation as array a
> lot simpler.
> Am I missing what the advantage of this nested presentation is? If so,
> I might be implementing it wrong
>
>
> - On a related note, I find the representation of nested types weird. It
> basically is "go one step up in the nesting". Similarly,
> type_argument_path could be a "go one step right in the type
> arguments" operation.
> I find the solution for type arguments nicer and would suggest that we use:
>
>   union {
>     // empty for most
>     u1 type_argument_index;
>     u1 outer_index;
>   }
>
> The location for @M in the first example was:
>
>   inner_type_path { inner_type_path { inner_type_path { empty_path } } }
>
> and would now simply be:
>
>   length: 1;
>   elements: inner_type_path, 3;
>
> This is similar to the old way of counting, just ignoring any
> array/type argument confusion and just counting the nesting. 0 is the
> main modifier and left off, 1 is the first enclosing, etc. This way
> the index corresponds to the number of "inner_type_path" elements in
> the complicated/current way. We could use 0-based instead.
>
>
> page 14:
> - Section 3.3.6: I'm a bit amazed that the "throws_type_index" is a
> u2, but the "method_parameter_index" is a u1. Can there really be so
> many more exceptions than parameters?
>
> - I note that annotations in the signature are stored in the
> "method_info structure", whereas annotations in method bodies are
> stored in "a Code attribute".
> Is this an intentional difference? Or is this something that wasn't
> updated yet? If it's intentional, I would find a heads-up useful,
> maybe together with Fig. 1 that discusses the different categories.
>
>
> page 3:
> - I would put "for casts" and "for type tests" together. I would
> mention that there are no runtime checks for these.
>
> - In "for constructor invocation results":
> In
>   myVar . new @Tainted NestedClass
> add "()" at the end.
>
> - Point 4 already talks about receivers, before point 5 introduced the
> concept and syntax for that. Maybe the order should be switched?
>
> page 5:
> - The last sentence of point 5 is the first to mention TYPE_USE. Would
> it make sense to introduce the new ElementType constants earlier?
>
> - Point 6 could mention that it uses ElementType.TYPE_PARAMETER
>
> page 10:
> - "Annotations that target instructions are _are_ those..."
>
> - In "How Java SE 7 stores annotations" we are reminded that there are
> both Runtime[In]VisibleParameterAnnotations and
> Runtime[In]VisibleAnnotations.
> Would it help to highlight that in JSR 308 we do not add
> Runtime[In]VisibleParameterTypeAnnotations and instead store such
> annotations with the method?
>
> page 11:
> - The last paragraph of Section 3.1 should also have a reference to
> Section 3.4 and give a similar overview as for the other sections.
>
> page 12:
> - Last sentence of Section 3.3 refers to generic type arguments and
> arrays only. It should have the complete list with nested types, etc.
>
> - Copy & paste mistake: Section 3.3.1 mentions
> "type_parameter_bound_target" and "bound".
>
> page 13:
> - Section 3.3.2 could also mention that the index is 0-based, like
> earlier and later subsections do (redundantly, I agree).
>
> **********

-- 
http://www.google.com/profiles/wdietl

From ebruneton at free.fr  Fri Nov 16 09:55:23 2012
From: ebruneton at free.fr (Eric Bruneton)
Date: Fri, 16 Nov 2012 18:55:23 +0100
Subject: [type-annos-observers] Comments on the Nov 7 specification
Message-ID: <50A67E0B.2010108@free.fr>

>Werner has proposed (see below) an optimization so that multiple
>"levels" of the same kind of compound type can be "jumped" in one go.
>That is, rather than a type_path_entry for each and every successive
>type constructor ([] for array types, . for nested types, < for type
>arguments, and ? ... for wildcard bounds),

From an ASM point of view, both using an array instead of nested
structures and optimizing multiple levels of the same kind are fine. The
important point is that all paths should be "from outside to inside" (e.g.
for nested types, the path should go from outer to inner, not the other
way around as proposed in the Nov 7 specification).

Eric

From ebruneton at free.fr  Sat Nov 24 08:28:47 2012
From: ebruneton at free.fr (Eric Bruneton)
Date: Sat, 24 Nov 2012 17:28:47 +0100
Subject: [type-annos-observers] Comments on the Nov 7 specification
In-Reply-To: <20121120.123716.298465860904600964.mernst@cs.washington.edu>
References: <50A67E0B.2010108@free.fr>
	<20121120.123716.298465860904600964.mernst@cs.washington.edu>
Message-ID: <50B0F5BF.20006@free.fr>

20/11/2012 21:37, Michael Ernst wrote:
> Eric Bruneton said:
>
>> From an ASM point of view,
>> ...
>> The important
>> point is that all paths should be "from outside to inside" (e.g. for nested
>> types, the path should go from outer to inner, not the other way around as
>> proposed in the Nov 7 specification).
>
> Eric, I'm willing to make this change, but I would like to be able to give
> a more specific justification for the design choice. Can you explain the
> rationale, or what difference it makes to ASM? Is the reason that the
> identifiers appear left-to-right in the class file and you want to process
> them in that order, or is it something else?

Right, it's basically that. For instance, consider the problem of
extracting the part of a type signature in the class file format (e.g.
Ljava.util.Map<+Ljava.lang.String;Ljava.util.List;>;) that corresponds to
a given type_path. With 'outside to inside' paths, this can be done with a
simple recursive function, like

extract(signature,path)
- if empty path return signature
- otherwise parse the first signature element from the left, check
  conformity with first path element, and call recursively with tail of
  signature and tail of path.

With 'inside to outside' paths, you have to parse the three inner types in
the signature to realize that
"inner_path{inner_path{inner_path{empty_path}}}", for instance, was in fact
the first parsed type (then you either have to save state during parsing,
or do another parsing pass).

I'm not sure I'm clear, but I hope you get the idea.
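
The recursive walk outlined above might look roughly like the Java sketch
below. It is only an illustration under several assumptions: the kind
numbers 0, 2 and 3 are borrowed from the struct proposed earlier in this
thread, the type argument index is taken as 0-based, nested-type steps are
left out, signatures use the slash-separated class file form, and the names
SignatureWalk, skipType and extract are hypothetical. It is not ASM's code.

```java
// Hypothetical sketch of an outside-to-inside walk over a type signature.
final class SignatureWalk {
    static final int ARRAY = 0, WILDCARD = 2, TYPE_ARG = 3;

    // Returns the index just past the type (or type argument) starting at i.
    static int skipType(String s, int i) {
        char c = s.charAt(i);
        if (c == '[' || c == '+' || c == '-') {
            return skipType(s, i + 1);
        }
        if (c == 'L' || c == 'T') {
            int depth = 0;
            while (true) {
                char d = s.charAt(i++);
                if (d == '<') depth++;
                else if (d == '>') depth--;
                else if (d == ';' && depth == 0) return i;
            }
        }
        return i + 1; // primitive type or '*' wildcard
    }

    // Each path element is {kind, index}; consumes the path left to right.
    static String extract(String sig, int[][] path, int step) {
        if (step == path.length) {
            return sig; // empty path: the whole remaining signature
        }
        int kind = path[step][0], index = path[step][1];
        char first = sig.charAt(0);
        String rest;
        if (kind == ARRAY && first == '[') {
            rest = sig.substring(1);
        } else if (kind == WILDCARD && (first == '+' || first == '-')) {
            rest = sig.substring(1);
        } else if (kind == TYPE_ARG && first == 'L' && sig.indexOf('<') > 0) {
            int pos = sig.indexOf('<') + 1;
            for (int k = 0; k < index && sig.charAt(pos) != '>'; k++) {
                pos = skipType(sig, pos);
            }
            if (sig.charAt(pos) == '>') {
                throw new IllegalArgumentException("no type argument " + index);
            }
            rest = sig.substring(pos, skipType(sig, pos));
        } else {
            throw new IllegalArgumentException("step " + step + " does not match " + sig);
        }
        return extract(rest, path, step + 1);
    }
}
```

Each call only looks at the leftmost element of what remains, so no state
has to be kept between steps, which is the property argued for above.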

Eric

From wdietl at gmail.com  Mon Nov 26 10:22:36 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Mon, 26 Nov 2012 10:22:36 -0800
Subject: [type-annos-observers] Comments on the Nov 7 specification
In-Reply-To: <50B0F5BF.20006@free.fr>
References: <50A67E0B.2010108@free.fr>
	<20121120.123716.298465860904600964.mernst@cs.washington.edu>
	<50B0F5BF.20006@free.fr>
Message-ID:

Eric, all,

I see your point, but want to give two reasons why "inside-out" makes
more sense to me - one reason being the meaning of the types and the
other being the AST representation. I think the AST should be
considered as much as the bytecode representation.


1. Meaning of the nested types:

For a nested type, we can basically think of a type parameterized by
the outer type. This type parameter is what the programmer can access
with Outer.this.

So for the type:

  @A Outer. @B Middle. @C Inner

we can think of the generic type:

  @C Inner< @B Middle< @A Outer > >

For the generic type, the locations would be:

  @A: 3(1), 3(1)
  @B: 3(1)
  @C: -

Similarly, with the "inside-out" approach, we would have:

  @A: 1(0), 1(0)
  @B: 1(0)
  @C: -

With the "outside-in" approach, we would have:

  @A: -
  @B: 1(0)
  @C: 1(0), 1(0)

That is, "inside-out" follows the logical structure of the generic
type correspondence, whereas "outside-in" breaks that order.


2. AST structure:

Even if we disregard this correspondence to the generic type as too
academic, I think the AST structure makes "inside-out" preferable.

For the type:

  Outer.Middle.Inner

we roughly build the following AST:

  VARIABLE
    id: f
    MEMBER_SELECT
      expr: Outer.Middle
      id: Inner
      MEMBER_SELECT
        expr: Outer
        id: Middle
        IDENTIFIER Outer

That is, "Inner" is the root of the type, and "Outer.Middle" is the
receiver of the field select, and so on.

Similarly, for the type

  @A Outer. @B Middle. @C Inner

we roughly build the following AST:

  VARIABLE
    id: g
    MODIFIERS
      ANNOTATION
        IDENTIFIER A
    ANNOTATED_TYPE
      ANNOTATION
        IDENTIFIER C
      MEMBER_SELECT
        expr: Outer. @B() Middle
        id: Inner
        ANNOTATED_TYPE
          ANNOTATION
            IDENTIFIER B
          MEMBER_SELECT
            expr: Outer
            id: Middle
            IDENTIFIER Outer

That is, type "@C Inner" is at the root of the tree and "Outer. @B
Middle" is the receiver expression.

In this AST representation, determining the nesting position is easier
if we can simply count how deep we descend in the tree to reach a
certain type.

I would be interested in hearing from people that use other ASTs
whether they have a similar issue.

Thanks,
cu, WMD.

-- 
http://www.google.com/profiles/wdietl

From wdietl at gmail.com  Tue Nov 27 01:18:28 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Tue, 27 Nov 2012 01:18:28 -0800
Subject: Annotations on exception parameters
Message-ID:

In "3.3.8 Exception parameters" the JSR 308 design document from Nov.
7 2012 states that annotations on an exception parameter (e.g.
... catch (@A Exception e) ...) are stored as an exception table index.

I'm wondering whether for uniformity this could be stored like a local
variable or resource variable, where we store the information for the
variables explicitly.
Unifying this aspect would simplify both the specification and its
implementation.

Thoughts?
cu, WMD.

-- 
http://www.google.com/profiles/wdietl