Project Lambda: Java Language Specification draft

Osvaldo Doederlein opinali at gmail.com
Mon Jan 25 05:13:26 PST 2010


Hi,

2010/1/24 Reinier Zwitserloot <reinier at zwitserloot.com>

> You're worried about the performance impact of 1 object allocation that is
> almost guaranteed to be short-lived and may even be entirely eliminated by
> the hotspot compiler?
>

It's not just "1 object allocation". It may be a dozen allocations if I'm
building a tree of objects, each node requiring its own builder. It may be
a few thousand, if I'm doing that inside some method that happens to be
called from a loop. These *gratuitous* inefficiencies always find a way to
bite you in the butt. The result is often a balkanization of the language,
as performance-critical code avoids some or all higher-level features. (I'm
not talking about extreme/niche things, like game engines that preallocate
all objects at startup. I'm talking about much more mundane stuff, like XML
parsers that partially avoid or duplicate APIs like java.lang.String.)
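
To make the allocation count concrete, here's a rough sketch of what I mean
(Node and its Builder are made-up names, not any real API - the point is
just that every node built pays for one throwaway Builder on top of the
Node itself):

import java.util.ArrayList;
import java.util.List;

class Node {
    final String value;
    final List<Node> children;

    Node(String value, List<Node> children) {
        this.value = value;
        this.children = children;
    }

    static Builder builder() { return new Builder(); }  // one extra allocation per node

    static class Builder {
        private String value;
        private final List<Node> children = new ArrayList<Node>();
        Builder value(String v) { this.value = v; return this; }
        Builder child(Node c) { children.add(c); return this; }
        Node build() { return new Node(value, children); }
    }

    public static void main(String[] args) {
        // 4 nodes, 4 throwaway Builders; build a deep tree inside a loop
        // and it's thousands of extra allocations.
        Node tree = builder().value("root")
                .child(builder().value("left").build())
                .child(builder().value("right")
                        .child(builder().value("leaf").build())
                        .build())
                .build();
        System.out.println(tree.children.size()); // prints 2
    }
}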


> Basing language decisions on that kind of backward thinking is a very very
> bad idea.
> Your argument also makes literally no sense at all, you should read your
> own tripe before you post it. You're complaining about the performance of
> *GENERATED* code such as parsers. WTF?
>

That was just one easy example, granted not a good one - right now, good
parser generators avoid even the overhead of array initialization, with
hideous tricks like encoding numbers into strings (which can be stuffed
into the constant pool).
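
For the curious, here's a minimal sketch of the general trick (not the
output of any particular generator): the table data is packed into a String
literal, which lives in the constant pool and is decoded once, instead of
being built by thousands of array-store bytecodes in <clinit>:

class PackedTable {
    // Real generators pack much larger tables, often split across several
    // String constants; this tiny literal is just for illustration.
    private static final String PACKED = "\0\1\2\3\4\5\6\7";

    static int[] unpack(String s) {
        int[] table = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            table[i] = s.charAt(i);  // each char encodes one table entry
        }
        return table;
    }

    static final int[] TABLE = unpack(PACKED);

    public static void main(String[] args) {
        System.out.println(TABLE[5]); // prints 5
    }
}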


> I thought basing language decisions on fear of performance impact for a
> short-lived singular object instantiation was as worse as it was going to
> get, but you've outdone yourself in the span of a single post: You're now
> basing decisions of language design on making life easy for code generators.
> You've missed April 1st by a few months, mate.
>

And you are missing much more, if your POV is to just ignore such low-level
efficiency issues whenever they make things any harder. See, I'm NOT
proposing to twist the language design around such things - if a given,
very useful feature really demands extra allocation, or more complex method
dispatch, or whatever, so be it. But when the design space allows choices
that impact performance, we should obviously consider that factor as
something important.

Case in point: Java 5's enhanced-for, which ALWAYS uses an Iterator object,
even for collections like ArrayList (see the RandomAccess interface; it was
created WITH THAT PURPOSE, btw). So I'd expect javac to generate code that
uses size() and get(int) when the iteration target's static type includes
List & RandomAccess; but it doesn't. I have complained about this on a
couple of occasions and I'm still waiting for a justification. Unless
somebody provides such a justification (perhaps there is one - I just don't
know it), this is plain wrong. Performance-critical code continues to be
written in a lower-level style, dodging facilities like enhanced-for,
because programmers learn that javac generates unnecessarily inefficient
code.
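
To be concrete, here's a sketch of the desugaring I'd expect when the
static type already proves the target is both List and RandomAccess (this
is what javac *could* emit, not what it emits today):

import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;

class EnhancedForDesugaring {
    // What we write today - javac always emits an Iterator here:
    static <T extends List<String> & RandomAccess> int sugared(T list) {
        int total = 0;
        for (String s : list) total += s.length();
        return total;
    }

    // What javac could emit for this static type - no Iterator allocation:
    static <T extends List<String> & RandomAccess> int desugared(T list) {
        int total = 0;
        for (int i = 0, n = list.size(); i < n; i++) total += list.get(i).length();
        return total;
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<String>();
        list.add("foo");
        list.add("ba");
        System.out.println(sugared(list) + " " + desugared(list)); // 5 5
    }
}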

And don't come at me with cheap talk of "it may be eliminated by HotSpot"
(EA / scalar replacement). This optimization was not even on the horizon
many years ago, when JDK 5 was released. It's not available yet - the first
Sun HotSpot release to do it will be the one shipping with JDK 7, and even
then it will be a Server-only optimization. We won't have such goodness in
Client VMs for some years more. We won't have it in the more constrained
JavaME runtimes for many years more. We don't want bogus allocations in
RTSJ VMs either. And I'm not interested in how fast my code may run circa
2015. And btw, another technology that's still far from silver-bullet
status is GC. Allocating objects as if it costs nothing often costs a lot,
due to cache pollution alone - even with perfect GC behavior. This trend is
only getting worse as computing platforms evolve toward increasingly deep
memory hierarchies. More sophisticated GCs usually come with some tradeoff
- read barriers, larger heap sizes to get the same job done, etc.

You accuse me of "backward thinking", but these lists are discussing a
(pretty minor and conservative - even with the current lambdas proposal)
update of the Java programming language. You see, Java is not Groovy or
Scala or Ruby or Clojure or even JavaFX Script. Java is a language that was
designed back in 1995, with some important performance tradeoffs - like
primitive types, simple vtable-based dispatch, etc. - and like it or not,
these tradeoffs had a BIG share in the success of the Java language and the
Java platform - the competition at the time was C/C++ and MFC, not Ruby on
Rails. Today, Java's ability to double as a systems programming language is
still critical, because people are building EVERYTHING in Java, not just
application-level code. We hack protocol stacks, media codecs, web servers,
crypto libraries, imaging, every kind of middleware, and tons of similar
stuff, all written in pure Java code. Just look at the sources of your
typical JavaEE server: it contains all these things - and it works like a
charm (at least in the server space, where the startup and memory overheads
of all this "pureness" are not a big issue).

High-level features are often a good opportunity to get more performance,
not less. This is possible when the source compiler is able to exploit
extra semantic knowledge and perform sophisticated transformations on the
code. For example, a JavaFX Script "for" loop (whose body has a non-Void
result) is conceptually a generator, which always returns a sequence with
the values produced in each iteration. But in practice, the compiler only
does this when necessary. If the for's value is not assigned or used
anywhere, that sequence is not generated at all. If the for's value is
inserted into another sequence being built in the outer scope, the compiler
generates code that just adds each generated element directly to the outer
sequence. There are also other optimizations that use special sequences for
unboxed primitive types, or temporary mutable sequences for bursts of
updates to conceptually-immutable sequences. All these optimizations are
possible and transparent because the language hides their effects from the
programmer. This is nothing new; many languages (notably functional ones)
have been using such tricks for decades. I would like to see the Java
language, and the javac compiler, evolve to enable such optimizations in
our collections - that would blend wonderfully with frameworks that make
intense use of lambdas, because the app code is often one big stream of
temporary collections and function objects (i.e. take this List, filter it
with some lambda X to produce another temp List, then apply another lambda
Y that picks/produces a key for each object to get a Map, etc.). Here,
first-class collection support may be necessary to produce optimal code
(e.g., perform a single loop over the initial List, applying the
inlined/combined code from X and Y and populating the final Map). If we
don't do that eventually, Java will soon be losing benchmarks to the likes
of Clojure - which doesn't make any sense, given that we pay - and must
continue to pay forever - the significant cost of Java's lower-level design
choices.
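
To illustrate the kind of fusion I mean, here's a sketch with made-up
Predicate/Mapper SAM types standing in for whatever shape the
lambda-friendly collections API ends up taking:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FusionSketch {
    // Hypothetical SAM types; with lambdas, call sites shrink to one line.
    interface Predicate<T> { boolean test(T t); }
    interface Mapper<T, K> { K map(T t); }

    // Naive composition: a temporary List plus two passes over the data.
    static <T, K> Map<K, T> naive(List<T> input, Predicate<T> x, Mapper<T, K> y) {
        List<T> temp = new ArrayList<T>();
        for (T t : input) if (x.test(t)) temp.add(t);
        Map<K, T> result = new HashMap<K, T>();
        for (T t : temp) result.put(y.map(t), t);
        return result;
    }

    // What a fusing compiler could produce: one pass, no temporary List.
    static <T, K> Map<K, T> fused(List<T> input, Predicate<T> x, Mapper<T, K> y) {
        Map<K, T> result = new HashMap<K, T>();
        for (T t : input) if (x.test(t)) result.put(y.map(t), t);
        return result;
    }

    public static void main(String[] args) {
        List<String> in = new ArrayList<String>();
        in.add("a"); in.add("bb"); in.add("ccc");
        Predicate<String> longOnes = new Predicate<String>() {
            public boolean test(String s) { return s.length() > 1; }
        };
        Mapper<String, Integer> byLength = new Mapper<String, Integer>() {
            public Integer map(String s) { return s.length(); }
        };
        // Both produce the entries for "bb" and "ccc", keyed by length.
        System.out.println(fused(in, longOnes, byLength));
    }
}

The win isn't the helper method itself; it's that a compiler (or an API)
that sees the whole pipeline never materializes the temporary List at all.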

The lambdas proposal is another fine example of this kind of opportunity.
Lambdas have great potential to be FASTER than the existing, lower-level
feature they replace (inner classes), basically for two reasons:
1) The compiler will have the choice of not allocating any object, and of
producing less code bloat than inner classes, notably because the lambda
syntax doesn't carry the dreaded "new" keyword. If a lambda doesn't capture
any enclosing state, it can be allocated statically, as a singleton (see
the sketch below). If a lambda is simple enough and is invoked locally (in
the same method where it's defined), or invoked from a method that's
trivial to inline (like a static helper method / control abstraction), the
lambda itself can be inlined.
2) Lambdas can be implemented with MethodHandles (see Rémi's "Christmas
Gift"), which potentially produces faster and less bulky code, and lets
transformations like currying be "accelerated" by the JIT's support for
MethodHandle.
Now, these optimizations are only possible if the spec is written with
sufficient care to not make them dangerous, e.g. by exposing to application
code some implementation detail that would change if certain optimizations
are applied (and I agree that optimizations in general should be
non-normative; their existence should not cause portability or linking/RRBC
compatibility issues).
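
Here's a sketch of what I mean in point 1, expressed in terms of today's
inner-class translation (the names, and the strawman lambda syntax in the
comment, are mine - not anything the spec mandates):

class LambdaSingletonSketch {
    interface IntOp { int apply(int x); }

    // Source, in strawman syntax:  IntOp twice = #(int x)(x * 2);
    // Since the body captures no enclosing state, the compiler can emit a
    // single shared instance instead of a fresh allocation per evaluation:
    private static final IntOp TWICE = new IntOp() {
        public int apply(int x) { return x * 2; }
    };

    static IntOp twice() { return TWICE; }  // no per-call allocation

    public static void main(String[] args) {
        System.out.println(twice().apply(21)); // prints 42
    }
}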

I realize that these ideas go against the traditional javac stance of not
doing even the easiest optimizations, but maybe it's time to revisit that
decision. It was OK in Java 1.0, when the language was close to a 1-to-1
mapping to the bytecode spec, so there wasn't much opportunity for
optimization (other than trivial "bytecode quickening" opts). But the
language is changing; we are adding features such as lambdas and special
syntax for collections - features that enable important optimizations
(sometimes producing MASSIVE performance gains) which are possible, and
even relatively easy, to do at source compilation time, but very difficult
and expensive to do at JIT compilation time (especially if the JIT must
first pattern-match very convoluted, non-standard, javac-generated
bytecode). Even if the Sun javac team has no resources/time to perform said
optimizations in time for JDK7-FCS, it's good enough to have a spec that
makes them possible; then we can wait for 7uXX updates, or competing
compilers like ECJ, or special "bytecode optimization" programs like
ProGuard.

A+
Osvaldo


>
> --Reinier Zwitserloot
>
>
>
> On Sun, Jan 24, 2010 at 10:53 PM, Osvaldo Pinali Doederlein <
> opinali at gmail.com> wrote:
>
>> Em 24/01/2010 18:59, Mark Thornton escreveu:
>> > Osvaldo Pinali Doederlein wrote:
>> >>
>> >> The collection literals proposal is already underwhelming IMHO,
>> >> because it boils down to sugar over the Collections.unmodifiableXxx()
>> >> APIs. We
>> > I partly agree. For lists and sets something like this
>> >
>> > public CollectionLiterals {
>> > <T> public static List<T> list(T... e) {...}
>> > }
>> >
>> > used with static import is almost as brief. Maps though are more
>> > annoying to initialise. The best I can manage with existing Java
>>
>> These idioms (like Builder / fluent interfaces) are stuff I won't touch
>> with a 10-foot pole. You are forced to allocate at least one extra
>> object (a Builder), so there's extra overhead unless you rely on
>> optimizations like escape analysis + scalar replacement. I don't like
>> the language/library design of "let's ignore performance, make a mess,
>> and pray our advanced JIT compiler will clean it up". This often fails,
>> remarkably for client-side code that must run on the less featured VMs
>> like HotSpot Client; startup/warmup time is another issue even for the
>> top JITs. On top of that, even a good fluent interface is pathetic wrt
>> readability if compared to proper language-level syntax. I prefer to
>> completely ignore this technique/trend and just write a full page of
>> add() or put() calls. </rant - don't take it personally>
>>
>> Tuples could be even worse, because now we're allocating one temp object
>> per entry; now your runtime performance will certainly suffer miserably
>> if these tuples are not optimized out (granted, that's more likely if
>> tuples are added as "lightweight" headerless objects, which is something
>> MLVM people are researching).
>>
>> It can be argued that the performance of literal collections is not very
>> important because such collections are typically very small - people
>> don't populate a million-element Map with literal code, right? This is a
>> good general assumption, but there are important exceptions like
>> machine-generated code (e.g. parsers) which often contains enormous
>> datasets encoded as initialized variables. This remembers me of another
>> RFE that I never lose an opportunity to remember: the classfile format's
>> poor constant pool - you cannot encode a populated array, or an object
>> that allows compile-time construction (e.g. "new Point(0,0)" -
>> constructor only assigns to fields and is called with compile-time
>> literals). Such initializations are supported by VERY bulky bytecode,
>> that's initialized at class-init time or construction time, when ideally
>> they could just be memcpy'd from the constant pool (or even mmapped,
>> with something like the CDS).
>>
>> A+
>> Osvaldo
>>
>> >
>> > public CollectionLiterals {
>> > public static <K,V> MapBuilder<K,V> mapOf(K key, V value) {...}
>> > }
>> >
>> > public static interface MapBuilder<K, V> {
>> >        MapBuilder<K,V> and(K key, V value);
>> >        Map<K,V> create();
>> > }
>> >
>> > Giving
>> >
>> > mapOf(a,b).and(c,d) ... .create();
>> >
>> > However I have managed a StackOverflow in javac with code like this :-(.
>> >
>> > Now if there was a short way of creating tuples so that
>> >
>> > public static <K,V> Map<K,V> mapOf(Map.Entry<K,V> ... entries) {}
>> >
>> > could be used like
>> >
>> > mapOf((a,b), (c,d), ...)
>> >
>> > preferably without so many parentheses.
>> >
>> >
>> >> concrete implementations for these collections. While higher-level
>> >> langs may contain multiple implementations of some collection and
>> >> automatically pick or change the optimal impl for each situation (even
>> > Easier to do with functional style where changing the implementation,
>> > when a value or mapping is added or removed, presents no problem.
>> >
>> >
>> > Mark
>> >
>>
>>
>>
>


