Project Lambda: Java Language Specification draft

Osvaldo Doederlein opinali at gmail.com
Mon Jan 25 06:19:08 PST 2010


2010/1/24 Per Bothner <per at bothner.com>

> On 01/24/2010 01:53 PM, Osvaldo Pinali Doederlein wrote:
> > It can be argued that the performance of literal collections is not very
> > important because such collections are typically very small - people
> > don't populate a million-element Map with literal code, right? This is a
> > good general assumption, but there are important exceptions like
> > machine-generated code (e.g. parsers) which often contains enormous
> > datasets encoded as initialized variables.
>
> Luckily, the performance of large literals isn't a problem on the Java
> platform, because you can't write/generate large literals, thanks to the
> limitations of the class file format.
>
> :-(
>
>
It's a problem at least for those literals that are as large as the current
format allows... I remember noticing some JDK7 commits that optimize the
loading time of certain APIs (Unicode encoders?), by refactoring
initialization of large static datasets into resource files or strings. Even
with some extra parsing/decoding effort, the result was faster loading.
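
To illustrate the kind of refactoring meant here, below is a minimal sketch,
not the actual JDK7 change. The class name PackedTable, the field names and
the five table values are all made up for the example. The idea: an int[]
initializer is compiled into per-element bytecode inside the static
initializer, while a string literal is a single constant pool entry that is
decoded with a small loop at class-initialization time.

```java
import java.util.Arrays;

// Hypothetical example: names and data are invented for illustration.
public final class PackedTable {

    // Each char of the literal carries one 16-bit table entry
    // (0x64=100, 0xC8=200, 0x12C=300, 0x190=400, 0x1F4=500).
    // Avoid \u000a, \u000d, \u0022 and \u005c here: Unicode escapes are
    // translated before the literal is parsed, so they would break it.
    private static final String PACKED = "\u0064\u00c8\u012c\u0190\u01f4";

    // Decoded once, when the class is initialized.
    public static final int[] TABLE = decode(PACKED);

    // Expands the packed string into an int[]; one char per entry.
    public static int[] decode(String packed) {
        int[] table = new int[packed.length()];
        for (int i = 0; i < table.length; i++) {
            table[i] = packed.charAt(i);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(TABLE)); // [100, 200, 300, 400, 500]
    }
}
```

A single string constant is itself limited to 65535 bytes in its encoded
form, so a really large table would need several literals or, as in the
commits mentioned above, a resource file.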

Unfortunately Java has always suffered from binary formats that were designed
without any concern for loading time or sharing among several processes. The
ZIP envelope, even without compression, is as bad as you can get to organize
a bunch of related classes. The classfile format is justified by
portability, verification etc., and it's generally OK but it could be
better; besides a better constant pool it should support multiple classes in
the same file. This would buy us a big reduction in the size of JAR files
(much less redundancy in constant pool entries), and allow optimized linkage
(no symbol resolution) between classes of that same file. JAR should be replaced
by a good binary format that's optimally designed for quick location of all
objects (without extra cruft in manifest files), with standard unified
data/code/linking/debug-info sections like native formats, etc. The Pack200
format mostly removes the massive redundancy of JAR files; with a better
classfile format, we could approach Pack200's efficiency with just tgz
compression for downloadable JARs - and, for installed JARs, have
significantly smaller files without any compression or other tricks.

If you look at the I/O patterns of Java cold-startup, with utilities like
Sysinternals Process Monitor on Windows or DTrace on Solaris, it's just sad:
the VM performs a huge number of tiny reads - 30 bytes here, 40 bytes there,
thousands upon thousands of times. CDS (Class Data Sharing) covers roughly
half of the core libraries, but not app code, frameworks, containers, etc.
Java applets
(with or without JavaFX) are not yet sufficiently competitive in loading
time; Flash, and even the more similar Silverlight, are still noticeably
better, even after all the improvements from 6u10 to 6u18. Sun is working
very hard to fix this problem, which is critical to their plans with JavaFX
and JavaStore. JDK7 with Jigsaw will make another (hopefully big) leap
forward, with a much better format for deployed modules, perhaps even
ahead-of-time compilation (JIT caching). In other words, they are paying
a heavy price -- years of engineering effort since the initial 6uN project,
and the risk of missing narrow time-to-market windows -- to undo the mistake
made years back, when they didn't consider it important to design a robust,
optimized deployment format (.NET has included this in its Assemblies design
since the first version). So far the only public info is for the (deployable)
module files, no info yet about installed formats (this doesn't really need
a public spec because it's implementation-specific just like CDS - but it
will suck if important platforms, say MacOSX, don't get it ported, or
develop something similar). And no sign of enhancements to the classfiles
that still live inside modules (at least in the deployable form), so that
seems like yet another missed opportunity (I know, I know, too many RFEs too
little time/resources for 7fcs...).

A+
Osvaldo



More information about the coin-dev mailing list