From ebruneton at free.fr  Thu Nov  8 16:17:57 2012
From: ebruneton at free.fr (ebruneton at free.fr)
Date: Fri, 9 Nov 2012 01:17:57 +0100 (CET)
Subject: [type-annos-observers] Improving the format of type annotation
	attributes
In-Reply-To: <20121107.103608.795384539929347270.mernst@cs.washington.edu>
Message-ID: <857197387.246847726.1352420277650.JavaMail.root@zimbra73-e12.priv.proxad.net>

>> Thanks Mike. Could you post the URLs to the PDF and HTML versions here
>> please? The links in the changelog are broken.
>
>That is a working copy of the changelog (it's a snapshot of the current
>version of the repository).  The links will work when the changelog is
>posted to the website as part of a release.
>
>I'm attaching the current version of the spec, in PDF and HTML form.
>People can always generate these from the source in the Mercurial
>repository, but I can also send it around.
>
>                     Thanks,
>
>                    -Mike

Thanks for the update! I haven't looked at it in detail yet, but I noticed that the inner_type_path examples seem to be in the wrong order, or at least not in the order I would have expected (i.e. from outside to inside). For instance, for

@H O1.@E O2<@F S, @G T>.@D O3.@A Nested<@B U, @C V>

I would expect this:

@H empty path
@E inner type path { empty path }
@F inner type path { type argument path { 0, empty path } }
@G inner type path { type argument path { 1, empty path } }
@D inner type path { inner type path { empty path } }
@A inner type path { inner type path { inner type path { empty path } } }
@B inner type path { inner type path { inner type path { type argument path { 0, empty path } } } }
@C inner type path { inner type path { inner type path { type argument path { 1, empty path } } } }
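
The listing above can be reproduced mechanically. The following is only a sketch of the "outside to inside" ordering, not code from the specification or from ASM: the tuple model of a nested type and the names EXAMPLE, outside_in_paths and render are all made up for illustration.

```python
# Hypothetical model: a nested type is a list of segments, outermost first.
# Each segment is (annotation, name, [(arg_annotation, arg_name), ...]).
EXAMPLE = [
    ("@H", "O1", []),
    ("@E", "O2", [("@F", "S"), ("@G", "T")]),
    ("@D", "O3", []),
    ("@A", "Nested", [("@B", "U"), ("@C", "V")]),
]

def outside_in_paths(segments):
    """Map each annotation to its path, with steps listed outermost first."""
    paths = {}
    for depth, (annotation, _name, type_args) in enumerate(segments):
        prefix = ["inner"] * depth          # one step per nesting level crossed
        paths[annotation] = list(prefix)
        for index, (arg_annotation, _arg_name) in enumerate(type_args):
            paths[arg_annotation] = prefix + [("type_argument", index)]
    return paths

def render(path):
    """Format a path in the nested notation used in the listing above."""
    text = "empty path"
    for step in reversed(path):             # wrap from the innermost step outwards
        if step == "inner":
            text = "inner type path { " + text + " }"
        else:
            text = "type argument path { %d, %s }" % (step[1], text)
    return text

for annotation, path in outside_in_paths(EXAMPLE).items():
    print(annotation, render(path))
```

Each path is simply the list of steps taken while reading the type from left to right, which is the property I would like the format to have.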

Eric


From wdietl at gmail.com  Mon Nov 12 11:36:29 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Mon, 12 Nov 2012 11:36:29 -0800
Subject: Comments on the JSR 308 specification
Message-ID: <CAJYRO=4yYhqznY+u5Qg5c2cD5FgU+q_1Q4WCEB8_wSFEwyOY3w@mail.gmail.com>

Dear JSR 308 expert group and fellow implementers,

please find some comments about the specification (as of November 9th) below.

page 16:
- The values for "path_type" are not specified. Something like Fig. 1
might be overkill, but I don't just want to assume that they are 0 - 4.

- I'm not quite sure I see the advantage of the nested structure.
Couldn't we use an array instead? We are simply specifying a sequence
of steps to follow, not a complicated tree.

struct type_path_entry {
  u1 path_type;
  union {
    // empty for most, only type_argument_path has info
    u1 type_argument_index;
  }
}

struct type_path {
  u1 path_length;
  type_path_entry [ path_length ];
}

The path_length could be used instead of or in addition to ending each
path with empty_path - like we can have either C-style null-terminated
strings or Pascal-style strings with length :-)
I would prefer using path_length without empty_path.

Then instead of:

type_argument_path { 1, type_argument_path { 0, empty path } }

we would have:

length: 2;
elements: type_argument_path, 1, type_argument_path, 0;

I think the resulting sequence of bytes would be the same (if
path_length is left off), but I would find the presentation as array a
lot simpler.
Am I missing what the advantage of this nested presentation is? If so,
I might be implementing it wrong :-)
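
To make the byte-level comparison concrete, here is a small sketch of the two layouts. The numeric tag values are assumptions (the draft does not specify them, as noted above), and the function names are invented for this illustration.

```python
# Assumed tag values: the draft does not give numbers for path_type.
EMPTY_PATH = 0
TYPE_ARGUMENT_PATH = 3

def encode_terminated(steps):
    """Nested form flattened to bytes: each step in order, then an empty_path tag."""
    out = bytearray()
    for path_type, *operands in steps:
        out.append(path_type)
        out.extend(operands)
    out.append(EMPTY_PATH)
    return bytes(out)

def encode_length_prefixed(steps):
    """Array form: a u1 path_length, then each entry, with no terminator."""
    out = bytearray([len(steps)])
    for path_type, *operands in steps:
        out.append(path_type)
        out.extend(operands)
    return bytes(out)

# type_argument_path { 1, type_argument_path { 0, empty path } }
steps = [(TYPE_ARGUMENT_PATH, 1), (TYPE_ARGUMENT_PATH, 0)]
print(list(encode_terminated(steps)))
print(list(encode_length_prefixed(steps)))
```

Under these assumptions the entry bytes are identical in both forms; the only difference is a trailing terminator in one and a leading length in the other.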


- On a related note, I find the representation of nested types weird. It
basically is "go one step up in the nesting". Similarly,
type_argument_path could be a "go one step right in the type
arguments" operation.
I find the solution for type arguments nicer and would suggest that we use:

  union {
    // empty for most
    u1 type_argument_index;
    u1 outer_index;
  }

The location for @M in the first example was:

inner_type_path { inner_type_path { inner_type_path { empty_path } } }

and would now simply be:

length: 1;
elements: inner_type_path, 3;

This is similar to the old way of counting, just ignoring any
array/type argument confusion and just counting the nesting. 0 is the
main modifier and left off, 1 is the first enclosing, etc. This way
the index corresponds to the number of "inner_type_path" elements in
the complicated/current way. We could use 0-based instead.


page 14:
- Section 3.3.6: I'm a bit amazed that the "throws_type_index" is a
u2, but the "method_parameter_index" is a u1. Can there really be so
many more exceptions than parameters?

- I note that annotations in the signature are stored in the
"method_info structure", whereas annotations in method bodies are
stored in "a Code attribute".
Is this an intentional difference? Or is this something that wasn't
updated yet? If it's intentional, I would find a heads-up useful,
maybe together with Fig. 1 that discusses the different categories.


page 3:
- I would put "for casts" and "for type tests" together. I would
mention that there are no runtime checks for these.

- In "for constructor invocation results":
In
  myVar . new @Tainted NestedClass
add "()" at the end.

- Point 4 already talks about receivers, before point 5 introduced the
concept and syntax for that. Maybe the order should be switched?

page 5:
- The last sentence of point 5 is the first to mention TYPE_USE. Would
it make sense to introduce the new ElementType constants earlier?

- Point 6 could mention that it uses ElementType.TYPE_PARAMETER

page 10:
- "Annotations that target instructions are _are_ those..."

- In "How Java SE 7 stores annotations" we are reminded that there are
both Runtime[In]VisibleParameterAnnotations and Runtime[In]VisibleAnnotations.
Would it help to highlight that in JSR 308 we do not add
Runtime[In]VisibleParameterTypeAnnotations and instead store such
annotations with the method?

page 11:
- The last paragraph of Section 3.1 should also have a reference to
Section 3.4 and give a similar overview as for the other sections.

page 12:
- Last sentence of Section 3.3 refers to generic type arguments and
arrays only. It should have the complete list with nested types, etc.

- Copy & paste mistake: Section 3.3.1 mentions
"type_parameter_bound_target" and "bound".

page 13:
- Section 3.3.2 could also mention that the index is 0-based, like
earlier and later subsections do (redundantly, I agree).

Best regards,
cu, WMD.

-- 
http://www.google.com/profiles/wdietl

From wdietl at gmail.com  Tue Nov 13 18:49:33 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Tue, 13 Nov 2012 18:49:33 -0800
Subject: [type-annos-observers] Comments on the Nov 7 specification
In-Reply-To: <50A2FFFE.1040706@oracle.com>
References: <50A2FFFE.1040706@oracle.com>
Message-ID: <CAJYRO=6RyQ0+7=gK8EafGdoU1yKbEbwWf0AqEVoQMw0+cX_LGg@mail.gmail.com>

Alex, experts,

how about:

struct type_path_entry {
  u1 type_path_kind;
    // 0: annotation is deeper in this array type
    // 1: annotation is deeper in this nested type
    // 2: annotation is on the bound of this wildcard type arg
    // 3: annotation is on the i'th type arg of this parameterized type
  u1 additional_index;
    // 0: ignore me
    // non-0: the 1st, 2nd, etc array index, nested type, or type arg
    //        of this parameterized type
}

That is, we use the additional_index (better name?) as argument for
arrays, nested types, and type arguments.

Instead of

array_type_path { array_type_path {array_type_path {}}}

one can then write:

length: 1
elements: array_type_path, 3.

This applies in addition to the comment I already had about nested
types, where I also think it would be easier to use a flag and an
argument instead of repeating a flag multiple times.
This would make handling of arrays, nested types, and type arguments
nicely uniform.
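
A rough sketch of what "a flag and an argument" means for arrays and nested types: consecutive steps of the same kind are merged into one entry whose additional_index is the repetition count. The function name collapse is hypothetical, and the type argument kind (where the field is an index, not a count) is deliberately left out.

```python
ARRAY, NESTED = 0, 1   # kinds 0 and 1 from the struct comments above

def collapse(kinds):
    """Merge consecutive equal kinds into (kind, count) entries."""
    entries = []
    for kind in kinds:
        if entries and entries[-1][0] == kind:
            entries[-1] = (kind, entries[-1][1] + 1)   # extend the current run
        else:
            entries.append((kind, 1))                  # start a new run
    return entries

# array_type_path { array_type_path { array_type_path {} } }
print(collapse([ARRAY, ARRAY, ARRAY]))
```

A real encoder would also have to split runs longer than 255, since the count has to fit in a u1.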

I like re-using one field instead of having a context-dependent union.

About encoding everything within the flag: this will prevent any
future extension of the type_path_kind.
Also, how many type parameters can a class/method have? Are 253
possible values enough? (Similarly for the nesting-depth of arrays and
types.)
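
For reference, the arithmetic behind the 253 figure, assuming the numbering from Alex's message quoted below (kinds 0 to 2 keep their meaning, and a kind k >= 3 stands for the (k-2)'th type argument). The helper names are made up for this sketch.

```python
FIRST_TYPE_ARG_KIND = 3   # kinds 0..2 keep their meanings from the struct above

def kind_for_type_arg(position):
    """Encode the 1-based type argument position directly in type_path_kind."""
    kind = position + FIRST_TYPE_ARG_KIND - 1
    if position < 1 or kind > 0xFF:
        raise ValueError("type argument position does not fit in a u1 kind")
    return kind

def type_arg_for_kind(kind):
    """Inverse mapping: kind - 2 is the 1-based type argument position."""
    if not FIRST_TYPE_ARG_KIND <= kind <= 0xFF:
        raise ValueError("not a type argument kind")
    return kind - 2

# Number of u1 values left over for type arguments.
print(0xFF - FIRST_TYPE_ARG_KIND + 1)
```

Any new type_path_kind added later would have to be taken out of this range, which is the extension problem mentioned above.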

cu, WMD.


On Tue, Nov 13, 2012 at 6:20 PM, Alex Buckley <alex.buckley at oracle.com> wrote:
> Experts,
>
> Werner Dietl sent the comments below about the spec in Mike's mail
> "Improving the format of type annotation attributes" of 11/7/12.
>
> Most are useful clarifications to the spec, but one comment proposes an
> array rather than a tree to represent the hierarchical location of a type
> annotation in a compound type. Each level in the tree becomes the succeeding
> array entry.
>
> The proposal has a union with a context-sensitive single member that would
> break pack200, so here is a slightly modified version:
>
> struct type_path {
>   u1              path_length;
>   type_path_entry path[path_length];
> }
>
> struct type_path_entry {
>   u1 type_path_kind;
>     // 0: annotation is deeper in this array type
>     // 1: annotation is deeper in this nested type
>     // 2: annotation is on the bound of this wildcard type arg
>     // 3: annotation is on the i'th type arg of this parameterized type
>   u1 type_argument_index;
>     // 0: ignore me
>     // non-0: the 1st, 2nd, etc type arg of this parameterized type
> }
>
> I think this is fine. A further improvement would be to drop the
> type_argument_index item and encode the i'th type arg into the
> type_path_kind item. Namely, type_path_kind values >=3 represent the
> type_path_kind-2'th type arg of the current parameterized type. This trick
> is used in the stack_map_frame structure (JVMS 4.7.4).
>
> Alex
>
> **********
>
> page 16:
> - The values for "path_type" are not specified. Something like Fig. 1
> might be overkill, but I don't just want to assume that they are 0 - 4.
>
> - I'm not quite sure I see the advantage of the nested structure.
> Couldn't we use an array instead? We are simply specifying a sequence
> of steps to follow, not a complicated tree.
>
> struct type_path_entry {
>   u1 path_type;
>   union {
>     // empty for most, only type_argument_path has info
>     u1 type_argument_index;
>   }
> }
>
> struct type_path {
>   u1 path_length;
>   type_path_entry [ path_length ];
> }
>
> The path_length could be used instead of or in addition to ending each
> path with empty_path - like we can have either C-style null-terminated
> strings or Pascal-style strings with length
> I would prefer using path_length without empty_path.
>
> Then instead of:
>
> type_argument_path { 1, type_argument_path { 0, empty path } }
>
> we would have:
>
> length: 2;
> elements: type_argument_path, 1, type_argument_path, 0;
>
> I think the resulting sequence of bytes would be the same (if
> path_length is left off), but I would find the presentation as array a
> lot simpler.
> Am I missing what the advantage of this nested presentation is? If so,
> I might be implementing it wrong
>
>
> - On a related note, I find the representation of nested types weird. It
> basically is "go one step up in the nesting". Similarly,
> type_argument_path could be a "go one step right in the type
> arguments" operation.
> I find the solution for type arguments nicer and would suggest that we use:
>
>   union {
>     // empty for most
>     u1 type_argument_index;
>     u1 outer_index;
>   }
>
> The location for @M in the first example was:
>
> inner_type_path { inner_type_path { inner_type_path { empty_path } } }
>
> and would now simply be:
>
> length: 1;
> elements: inner_type_path, 3;
>
> This is similar to the old way of counting, just ignoring any
> array/type argument confusion and just counting the nesting. 0 is the
> main modifier and left off, 1 is the first enclosing, etc. This way
> the index corresponds to the number of "inner_type_path" elements in
> the complicated/current way. We could use 0-based instead.
>
>
> page 14:
> - Section 3.3.6: I'm a bit amazed that the "throws_type_index" is a
> u2, but the "method_parameter_index" is a u1. Can there really be so
> many more exceptions than parameters?
>
> - I note that annotations in the signature are stored in the
> "method_info structure", whereas annotations in method bodies are
> stored in "a Code attribute".
> Is this an intentional difference? Or is this something that wasn't
> updated yet? If it's intentional, I would find a heads-up useful,
> maybe together with Fig. 1 that discusses the different categories.
>
>
> page 3:
> - I would put "for casts" and "for type tests" together. I would
> mention that there are no runtime checks for these.
>
> - In "for constructor invocation results":
> In
>   myVar . new @Tainted NestedClass
> add "()" at the end.
>
> - Point 4 already talks about receivers, before point 5 introduced the
> concept and syntax for that. Maybe the order should be switched?
>
> page 5:
> - The last sentence of point 5 is the first to mention TYPE_USE. Would
> it make sense to introduce the new ElementType constants earlier?
>
> - Point 6 could mention that it uses ElementType.TYPE_PARAMETER
>
> page 10:
> - "Annotations that target instructions are _are_ those..."
>
> - In "How Java SE 7 stores annotations" we are reminded that there are
> both Runtime[In]VisibleParameterAnnotations and
> Runtime[In]VisibleAnnotations.
> Would it help to highlight that in JSR 308 we do not add
> Runtime[In]VisibleParameterTypeAnnotations and instead store such
> annotations with the method?
>
> page 11:
> - The last paragraph of Section 3.1 should also have a reference to
> Section 3.4 and give a similar overview as for the other sections.
>
> page 12:
> - Last sentence of Section 3.3 refers to generic type arguments and
> arrays only. It should have the complete list with nested types, etc.
>
> - Copy & paste mistake: Section 3.3.1 mentions
> "type_parameter_bound_target" and "bound".
>
> page 13:
> - Section 3.3.2 could also mention that the index is 0-based, like
> earlier and later subsections do (redundantly, I agree).
>
> **********


-- 
http://www.google.com/profiles/wdietl

From ebruneton at free.fr  Fri Nov 16 09:55:23 2012
From: ebruneton at free.fr (Eric Bruneton)
Date: Fri, 16 Nov 2012 18:55:23 +0100
Subject: [type-annos-observers] Comments on the Nov 7 specification
Message-ID: <50A67E0B.2010108@free.fr>

 >Werner has proposed (see below) an optimization so that multiple 
 >"levels" of the same kind of compound type can be "jumped" in one go. 
 >That is, rather than a type_path_entry for each and every successive 
 >type constructor ([] for array types, . for nested types, < for type 
 >arguments, and ? ... for wildcard bounds),

From an ASM point of view, both using an array instead of nested 
structures and optimizing multiple levels of the same kind are fine. The 
important point is that all paths should be "from outside to inside" 
(e.g. for nested types, the path should go from outer to inner, not the 
other way around as proposed in the Nov 7 specification).

Eric

From ebruneton at free.fr  Sat Nov 24 08:28:47 2012
From: ebruneton at free.fr (Eric Bruneton)
Date: Sat, 24 Nov 2012 17:28:47 +0100
Subject: [type-annos-observers] Comments on the Nov 7 specification
In-Reply-To: <20121120.123716.298465860904600964.mernst@cs.washington.edu>
References: <50A67E0B.2010108@free.fr>
	<20121120.123716.298465860904600964.mernst@cs.washington.edu>
Message-ID: <50B0F5BF.20006@free.fr>

20/11/2012 21:37, Michael Ernst wrote:
> Eric Bruneton said:
>
>>  From an ASM point of view,
>> ...
>> The important
>> point is that all paths should be "from outside to inside" (e.g. for nested
>> types, the path should go from outer to inner, not the other way around as
>> proposed in the Nov 7 specification).
>
> Eric, I'm willing to make this change, but I would like to be able to give
> a more specific justification for the design choice.  Can you explain the
> rationale, or what difference it makes to ASM?  Is the reason that the
> identifiers appear left-to-right in the class file and you want to process
> them in that order, or is it something else?

Right, it's basically that. For instance, consider the problem of 
extracting the part of a type signature in the class file format (e.g. 
Ljava/util/Map<+Ljava/lang/String;Ljava/util/List<Ljava/lang/Object;>;>;) that 
corresponds to a given type_path. With 'outside to inside' paths, this 
can be done with a simple recursive function, like
extract(signature,path)
- if empty path return signature
- otherwise parse the first signature element from the left, check 
conformity with first path element, and call recursively with tail of 
signature and tail of path.
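
A minimal sketch of such a function follows. It is not ASM code: the path representation (tuples such as ("type_arg", 1)) and all function names are invented here, nested types are left out to keep it short, and the example signature is written with the '/' separators that class files use.

```python
def skip_type(sig, i):
    """Return the index just past the type signature that starts at sig[i]."""
    while sig[i] == "[":                 # array dimensions
        i += 1
    if sig[i] in "LT":                   # class type or type variable, ends at ';'
        depth = 0
        while not (sig[i] == ";" and depth == 0):
            if sig[i] == "<":
                depth += 1
            elif sig[i] == ">":
                depth -= 1
            i += 1
    return i + 1                         # also covers one-letter base types

def skip_arg(sig, i):
    """Return the index just past the type argument that starts at sig[i]."""
    if sig[i] == "*":                    # unbounded wildcard
        return i + 1
    if sig[i] in "+-":                   # bounded wildcard prefix
        i += 1
    return skip_type(sig, i)

def type_args(sig):
    """Split the type arguments of the class type sig, scanning left to right."""
    args, i = [], sig.index("<") + 1
    while sig[i] != ">":
        end = skip_arg(sig, i)
        args.append(sig[i:end])
        i = end
    return args

def extract(sig, path):
    """Return the part of sig selected by an outside-to-inside path."""
    if not path:
        return sig
    step, rest = path[0], path[1:]
    if step in (("array",), ("wildcard",)):
        expected = "[" if step == ("array",) else "+-"
        if sig[0] not in expected:
            raise ValueError("path does not match signature")
        return extract(sig[1:], rest)
    kind, index = step                   # remaining case: ("type_arg", index)
    if kind != "type_arg" or sig[0] != "L":
        raise ValueError("path does not match signature")
    return extract(type_args(sig)[index], rest)

SIG = "Ljava/util/Map<+Ljava/lang/String;Ljava/util/List<Ljava/lang/Object;>;>;"
print(extract(SIG, [("type_arg", 1), ("type_arg", 0)]))
```

Every step consumes the head of the signature and the head of the path together, so no state has to be kept between steps.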

With 'inside to outside' paths, you have to parse the three inner types 
in the signature to realize that 
"inner_path{inner_path{inner_path{empty_path}}}", for instance, was in 
fact the first parsed type (then you either have to save state during 
parsing, or do another parsing pass).

I'm not sure I'm clear, but I hope you get the idea.

Eric

From wdietl at gmail.com  Mon Nov 26 10:22:36 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Mon, 26 Nov 2012 10:22:36 -0800
Subject: [type-annos-observers] Comments on the Nov 7 specification
In-Reply-To: <50B0F5BF.20006@free.fr>
References: <50A67E0B.2010108@free.fr>
	<20121120.123716.298465860904600964.mernst@cs.washington.edu>
	<50B0F5BF.20006@free.fr>
Message-ID: <CAJYRO=5SvdxF8yNYUW7q3Zbd2zA_Qujd6a5r7uDfPnv-KLF9+A@mail.gmail.com>

Eric, all,

I see your point, but want to give two reasons why "inside-out" makes
more sense to me - one reason being the meaning of the types and the
other being the AST representation. I think the AST should be
considered as much as the bytecode representation.

1. Meaning of the nested types:
For a nested type, we can basically think of a type parameterized by
the outer type. This type parameter is what the programmer can access with
Outer.this.
So for the type:

@A Outer. @B Middle. @C Inner

we can think of the generic type:

@C Inner< @B Middle< @A Outer > >

For the generic type, the locations would be:

@A: 3(1), 3(1)
@B: 3(1)
@C: -

Similarly, with the "inside-out" approach, we would have:

@A: 1(0), 1(0)
@B: 1(0)
@C: -

With the "outside-in" approach, we would have:

@A: -
@B: 1(0)
@C: 1(0), 1(0)

That is, "inside-out" follows the logical structure of the generic
type correspondence, whereas "outside-in" breaks that order.
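
The two listings can be generated with a few lines of code. This is only an illustration of the two counting directions; the function name nested_paths and the string "1(0)" for a single nested-type step are taken from, or made up for, this message and are not part of any API.

```python
def nested_paths(annotations, outside_in):
    """Compute the path of each annotation in a nested type.

    annotations are listed outermost first, e.g. ["@A", "@B", "@C"] for
    @A Outer. @B Middle. @C Inner. Each step is written "1(0)" as above.
    """
    count = len(annotations)
    paths = {}
    for position, annotation in enumerate(annotations):
        # outside-in counts levels from the outermost type,
        # inside-out counts levels from the innermost type
        steps = position if outside_in else count - 1 - position
        paths[annotation] = ["1(0)"] * steps
    return paths

print(nested_paths(["@A", "@B", "@C"], outside_in=False))
print(nested_paths(["@A", "@B", "@C"], outside_in=True))
```

With outside_in=False the result has the same shape as the locations for the generic type Inner<Middle<Outer>>, which is the correspondence I am arguing for.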


2. AST structure:
Even if we disregard this correspondence to the generic type as too
academic, I think the AST structure makes "inside-out" preferable.
For the type:

Outer.Middle.Inner

we roughly build the following AST:

    VARIABLE id: f
      MEMBER_SELECT expr: Outer.Middle id: Inner
        MEMBER_SELECT expr: Outer id: Middle
          IDENTIFIER Outer

That is, "Inner" is the root of the type, and "Outer.Middle" is the
receiver of the field select, and so on.
Similarly, for the type

@A Outer. @B Middle. @C Inner

we roughly build the following AST:

    VARIABLE id: g
      MODIFIERS
        ANNOTATION
          IDENTIFIER A
      ANNOTATED_TYPE
        ANNOTATION
          IDENTIFIER C
        MEMBER_SELECT expr: Outer. @B() Middle id: Inner
          ANNOTATED_TYPE
            ANNOTATION
              IDENTIFIER B
            MEMBER_SELECT expr: Outer id: Middle
              IDENTIFIER Outer

That is, type "@C Inner" is at the root of the tree and "Outer. @B
Middle" is the receiver expression.

In this AST representation, determining the nesting position is easier
if we can simply count how deep we descend in the tree to reach a
certain type.
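
As a rough sketch of that counting, assuming a simplified stand-in for the tree above (the TypeNode class is hypothetical and much simpler than the real compiler AST; annotations are ignored):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TypeNode:
    """Hypothetical stand-in for a MEMBER_SELECT or IDENTIFIER node."""
    name: str
    enclosing: Optional["TypeNode"] = None   # the 'expr' child, None for an IDENTIFIER

def inside_out_depth(root, target):
    """Count how many MEMBER_SELECT levels are descended to reach target."""
    depth, node = 0, root
    while node is not None:
        if node.name == target:
            return depth
        node, depth = node.enclosing, depth + 1
    raise ValueError(target + " is not part of this type")

# Outer.Middle.Inner, with Inner at the root of the tree
tree = TypeNode("Inner", TypeNode("Middle", TypeNode("Outer")))
for name in ("Inner", "Middle", "Outer"):
    print(name, inside_out_depth(tree, name))
```

The depth found this way is directly the number of "inside-out" steps; for "outside-in" one would first have to know the total nesting depth.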

I would be interested in hearing from people that use other ASTs
whether they have a similar issue.

Thanks,
cu, WMD.



-- 
http://www.google.com/profiles/wdietl

From wdietl at gmail.com  Tue Nov 27 01:18:28 2012
From: wdietl at gmail.com (Werner Dietl)
Date: Tue, 27 Nov 2012 01:18:28 -0800
Subject: Annotations on exception parameters
Message-ID: <CAJYRO=7m9k+5kkz7UeX1U-nHLeioxO7BpBCJiUZq=bDbaJHLFQ@mail.gmail.com>

In "3.3.8 Exception parameters" the JSR 308 design document from Nov.
7 2012 states that annotations on an exception parameter (e.g. ...
catch (@A Exception e) ...) are stored as an exception table index.

I'm wondering whether for uniformity this could be stored like a local
variable or resource variable, where we store the information for the
variables explicitly.
Unifying this aspect would simplify both the specification and its
implementation.
Thoughts?

cu, WMD.

-- 
http://www.google.com/profiles/wdietl