diamond operator & implementation strategies (v3)

Neal Gafter neal at gafter.com
Fri Aug 21 12:12:56 PDT 2009


A significant benefit of the "complex" approach is that it interacts nicely
with possible future support for inference in argument positions.  The
"simple" approach cannot be extended in a straightforward way to interact
nicely with inference in argument positions.  If this is indeed the end of
the line for the evolution of the Java programming language, then this
particular choice doesn't matter much.  On the other hand, if future
evolution is likely, then the approach that supports it is to be
preferred.

On Fri, Aug 21, 2009 at 11:07 AM, Maurizio Cimadamore <Maurizio.Cimadamore at sun.com> wrote:

> Hi
> We have spent some time playing with the diamond operator [1, 6] - the
> results of this work are two prototypes: one implementing the 'simple'
> strategy discussed in my earlier post [2, 3], the other implementing a
> more sophisticated approach proposed by Neal [4, 5] (in the remainder of
> this mail I'll refer to the latter as the 'complex' approach).
>
> The goal of this work has been to evaluate the impact of both strategies
> on existing code bases - in particular, we were interested in the
> fraction of constructor calls that could take advantage of the diamond
> operator, and in whether the two implementation strategies led to a
> significant difference w.r.t. the applicability of the new operator.
>
> We have also spent some time running benchmarks to evaluate both
> implementation strategies. The results produced by the two algorithms
> were compared to check whether the inferred type was still applicable
> at the given site. More formally, let S be the
> generic type specified by the new expression in the pre-diamond code and
> T the type inferred by the diamond inference algorithm for that
> expression. There are 3 possible cases:
>
>    a) T == S
>    b) T != S, but T and S are interchangeable given the call site (e.g.
> because T is a subtype of S)
>    c) T != S, with T and S not interchangeable
>
> Cases a) and b) mean success, while c) means failure (the algorithm
> yields a result which is not equivalent to the explicit type specified
> by the programmer). To evaluate the fitness of a given implementation
> strategy, we have instrumented the two prototypes to emit
> detailed info about each new expression encountered by the compiler. In
> particular, for any given new expression whose target type is a generic
> type, both inference schemes have been applied and their results
> compared to the original generic type specified in the source code. The
> info generated by the two prototypes can be grouped into 3 categories
> (roughly corresponding to the classification above):
>
> *) "redundant" - this denotes a situation where generic type annotations
> can safely be removed from a new expression (see case (a,b) above). In
> order to provide more detailed info, We have splitted this category into
> two sub-categories:
>
> -) "redundant.normal" - this denotes a new expression appearing in an
> assignment context, more specifically where the type inferred by the
> diamond operator is a subtype of the expected type
>
> -) "redundant.no.context" - this denotes a new expression appearing in a
> place lacking an assignment context, more specifically where the
> inferred type is a subtype of the original generic type
>
> *) "not.redundant" - this denotes a situation where there is no
> guarantee that removing generic type annotations from a new expression
> would yield to a sound expression (see case (c) above). In this case the
> inference algorithm exploited by the diamond operator has inferred a
> type T which is incompatible with both the expected type PT and the
> original type G (the one including generic type annotations). Note: it
> is possible that, by chance, the type inferred by the diamond operator
> would type-check where it is used (e.g. because that type is the type of
> an actual argument in a method call, where the corresponding formal is
> Object).
>
> *) "failure" - this denotes a failure of the inference algorithm. In
> this case it is definitely not possible to omit the generic type
> annotations, as the type inferred by the diamond operator is not valid
> (e.g. violates some constraints specified by the type-variables'
> declared bounds).
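>
> To make these categories concrete, here is a small made-up sketch (the
> helper method and the EnumBox class are hypothetical, and the diamond
> lines are shown as comments, since they assume the prototype compilers):
>
>    import java.util.ArrayList;
>    import java.util.List;
>    import java.util.concurrent.TimeUnit;
>
>    class Categories {
>        static void process(List<String> l) { }       // hypothetical helper
>
>        static class EnumBox<T extends Enum<T>> { }   // hypothetical class
>
>        void demo() {
>            // "redundant": an assignment context is available, so the
>            // explicit type arguments could be replaced by <>:
>            //     List<String> names = new ArrayList<>();
>            List<String> names = new ArrayList<String>();
>
>            // "not.redundant": in argument position there is no
>            // assignment context, so the inferred type may be
>            // ArrayList<Object>, incompatible with both the expected
>            // type PT = List<String> and the original type
>            // G = ArrayList<String>:
>            //     process(new ArrayList<>());
>            process(new ArrayList<String>());
>
>            // "failure": nothing here constrains T, and the recursive
>            // bound T extends Enum<T> cannot be satisfied by a default
>            // choice such as Object, so inference yields no valid type:
>            //     Object box = new EnumBox<>();
>            Object box = new EnumBox<TimeUnit>();
>        }
>    }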
>
> Here's a brief summary of the statistics regarding the applicability of
> the diamond operator in three systems (see [7] for more detailed info):
>
> *** OpenJDK ***
>
> Overview:
>
> new expressions: 104138
>   new expressions w/ generic type: 5076 (4.87%)
>
> Results:
>
> redundant types (overall):  simple 4409 (86.86%) / complex 4533 (89.30%)
>   redundant (assignment): simple 4353 (85.76%) / complex 4352 (85.74%)
>   redundant (other contexts): simple 56 (1.10%) / complex 181 (3.57%)
> non redundant: simple 662 (13.04%) / complex 536 (10.56%)
> failures: simple 4 (0.08%) / complex 7 (0.14%)
>
> *** Tomcat ***
>
> Overview:
>
> new expressions: 6048
>   new expressions w/ generic type: 153 (2.53%)
>
> Results:
>
> redundant types (overall):  simple 148 (96.73%) / complex 148 (96.73%)
>   redundant (assignment): simple 148 (96.73%) / complex 148 (96.73%)
>   redundant (other contexts): simple 0 (0%) / complex 0 (0%)
> non redundant: simple 5 (3.27%) / complex 5 (3.27%)
> failures: simple 0 (0%) / complex 0 (0%)
>
> *** NetBeans ***
>
> Overview:
>
> new expressions: 94768
>   new expressions w/ generic type: 12010 (12.67%)
>
> Results:
>
> redundant types (overall):  simple 10670 (88.84%) / complex 11085 (92.30%)
>   redundant (assignment): simple 10628 (88.49%) / complex 10610 (88.34%)
>   redundant (other contexts): simple 42 (0.35%) / complex 475 (3.96%)
> non redundant: simple 1338 (11.14%) / complex 907 (7.55%)
> failures: simple 2 (0.02%) / complex 18 (0.15%)
>
>
>
> The numbers are relatively stable across all benchmarks - to summarize,
> around 90% of all new expressions whose target type is generic can be
> simplified using diamond (regardless of the chosen implementation
> strategy). The remaining 10% correspond to situations in which some type
> has been inferred - but there is no guarantee that the code would
> type-check (e.g. because the inferred type is not a subtype of the type
> in the original code).
>
> Neither approach looks noticeably better than the other; the simple
> approach is weaker when it comes to inferring types in argument
> position - on the other hand, the additional number of types inferred
> by the complex approach is relatively low, and the complex approach is
> often less stable - that is, there are more cases in which complex
> inference fails without inferring any type (this problem could, in
> principle, be fixed by improving standard javac type inference, but
> such a task is outside the scope of Project Coin).
>
> The area in which the complex approach is stronger is where a class has
> recursive bounds; in such cases there's almost no way for the simple
> approach to infer a correct type, while the complex approach can
> usually rely on the types of the actual arguments to infer the right
> type (obviously this doesn't apply if the constructor is a no-arg
> constructor). Here's an example (extracted from OpenJDK):
>
> from DefaultMXBeanMappingFactory.java:
>
>    private static <T extends Enum<T>> MXBeanMapping
>            makeEnumMapping(Class<?> enumClass, Class<T> fake) {
>        return new EnumMapping<T>(Util.<Class<T>>cast(enumClass));
>        // would work with complex but not with simple
>    }
>
> [...]
>
>    private static final class EnumMapping<T extends Enum<T>> [...]
>
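> A minimal self-contained sketch of the same situation (all names here
> are made up, and the diamond line is a comment, since it assumes the
> complex prototype):
>
>    interface Mapping { }
>
>    class EnumLikeMapping<T extends Enum<T>> implements Mapping {
>        EnumLikeMapping(Class<T> type) { }
>    }
>
>    class RecursiveBoundDemo {
>        static <T extends Enum<T>> Mapping make(Class<T> type) {
>            // the target type Mapping says nothing about T, so the
>            // simple approach would have to default T, violating the
>            // recursive bound; the complex approach can read T off the
>            // actual argument of type Class<T>:
>            //     return new EnumLikeMapping<>(type);   // complex only
>            return new EnumLikeMapping<T>(type);
>        }
>    }
>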
>
> On the other hand there are situations in which the complex approach
> infers a type which is too specific and thus incompatible with the
> LHS; because of that, the number of failures in the complex approach is
> slightly higher. Consider the following snippet (extracted from
> OpenJDK):
>
> from Snapshot.java:
>
>    private SoftReference<Vector> finalizablesCache;
>
>    [...]
>
>    Vector<JavaHeapObject> finalizables = new Vector<JavaHeapObject>();
>
>    [...]
>
>    finalizablesCache = new SoftReference<Vector>(finalizables);
>    // would work with simple but not with complex
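>    // (with complex, the argument's type Vector<JavaHeapObject> would
>    //  drive inference towards SoftReference<Vector<JavaHeapObject>>,
>    //  which, by invariance of generic types, is not assignable to the
>    //  declared type SoftReference<Vector>)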
>
>    [...]
>
>
> As a final note, collection classes are clearly the 'killer app' for
> the diamond operator. Moreover, the number of raw constructor calls -
> those whose target type is a raw collection - is relatively high across
> all benchmarks [7] (the 'most generified source' award goes to
> NetBeans).
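>
> The typical shape of such a site, before and after (a generic
> illustration, not taken from the benchmarks):
>
>    import java.util.HashMap;
>    import java.util.Map;
>
>    class KillerApp {
>        // before: the type arguments are spelled out twice
>        Map<String, Map<String, Integer>> before =
>                new HashMap<String, Map<String, Integer>>();
>
>        // after, with the proposed operator (commented out, since it
>        // assumes the prototype compilers):
>        //     Map<String, Map<String, Integer>> after = new HashMap<>();
>    }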
>
>
>
> From our analysis, we conclude that the simple approach should be
> preferred, as the extra complexity required by the complex approach
> does not entirely pay off. As said above, this is because the scope of
> javac's method type inference is limited (e.g. no return type inference
> is applied when a generic method call occurs in argument position).
> Because of that, the complex approach is able to deliver only a small
> improvement over a simpler, more straightforward approach. Moreover,
> our initial concerns about code refactoring seem to be confirmed by the
> benchmark numbers - across all benchmarks, the simple approach gives
> better results in straight assignment contexts (except for the Tomcat
> benchmark, where the two prototypes give the same results). Improving
> the behavior of the complex approach in assignment contexts would
> require some non-trivial changes to javac's method type inference (e.g.
> performing a single round of type inference where all constraints -
> formals and return type - are considered at once). Again, such changes
> are risky, and definitely outside the scope of Project Coin.
>
> Here's a brief outline of the JLS changes that would be required in
> order to support the simple approach:
>
> 15.9 (grammar)
>
> This production:
>
> "ClassInstanceCreationExpression:
>         Primary . new TypeArgumentsopt Identifier TypeArgumentsopt
> ( ArgumentListopt ) ClassBodyopt"
>
> should be changed as follows:
>
> "ClassInstanceCreationExpression:
>         Primary . new TypeArgumentsopt Identifier
> TypeArgumentsOrDiamondopt ( ArgumentListopt ) ClassBodyopt
>
> TypeArgumentsOrDiamond:
>         TypeArguments
>         <>"
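>
> For instance, under the amended production a qualified instance
> creation could use diamond as well (an illustration of mine, not part
> of the proposed text; the diamond line is a comment):
>
>    class Outer<T> {
>        class Inner<U> {
>            Inner(U u) { }
>        }
>
>        static void demo(Outer<String> outer) {
>            // accepted today:
>            Outer<String>.Inner<Integer> i = outer.new Inner<Integer>(42);
>            // accepted under the amended grammar:
>            //     Outer<String>.Inner<Integer> j = outer.new Inner<>(42);
>        }
>    }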
>
>
> Moreover, section 15.9.1 (Determining the Class being Instantiated)
> should be adjusted so that the following paragraph is added:
>
> "If the type denoted by ClassOrInterfaceType has an empty type argument
> list (diamond operator), then the following steps must be applied in
> order to infer the actual type of the class being instantiated:
>
> *) Let C be the name of the (generic) class being instantiated, and X1,
> X2 ... Xn the formal type parameters declared by class C. Moreover, if
> the instance creation expression occurs in a context where it will be
> subject to assignment conversion (§5.2) to a type S, then let G be
> C<X1, X2 ... Xn>.
>
> *) Consider the following initial set of constraints:
>
>     * the constraint S >> G
>     * additional constraints Bi >> Xi, where Bi is the declared bound
> of Xi,
>
> *) The above constraints are used to infer constraints on the type
> arguments using the algorithm of §15.12.2.7. Any equality
> constraints are resolved, and then, for each remaining constraint of
> the form Xi <: Uk, Xi is inferred to be glb(U1, ..., Uk)
> (§5.1.10).
>
> Any remaining type variables that have not yet been inferred are then
> inferred to have type Object."
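>
> As a worked example of these steps (my own illustration, not part of
> the proposed text), consider:
>
>    List<String> ls = new ArrayList<>();
>
> Here C = ArrayList, X1 = E (with declared bound B1 = Object),
> S = List<String> and G = ArrayList<E>. The initial constraint S >> G,
> i.e. List<String> >> ArrayList<E>, is reduced by §15.12.2.7 to the
> equality constraint E = String, while Object >> E is trivially
> satisfied; the expression is therefore treated as
> new ArrayList<String>().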
>
> [1]
> http://mail.openjdk.java.net/pipermail/coin-dev/2009-February/000009.html
> [2] http://mail.openjdk.java.net/pipermail/coin-dev/2009-May/001720.html
> [3] http://cr.openjdk.java.net/~mcimadamore/6840638/webrev.0/
> [4] http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000075.html
> [5] http://cr.openjdk.java.net/~mcimadamore/6840638/webrev.1/
> [6] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6840638
> [7] http://blogs.sun.com/mcimadamore/resource/diamonds_statistics.pdf
>
> Maurizio


