FINAL PROPOSAL: An automatic process to shrink and optimize the OpenJDK run-time libraries

Eric Lafortune lafortune at users.sourceforge.net
Wed Feb 6 11:06:08 PST 2008


In the spirit of this project's goals, here is my short

                               FINAL PROPOSAL

An automatic process to shrink and optimize the OpenJDK run-time libraries

                               Eric Lafortune

1. Summary
----------

The renewed enthusiasm for Java on the desktop and on the web has pushed the
size of the Java run-time libraries into the limelight again. This project
aims to create an automatic process to shrink and optimize the OpenJDK
run-time libraries. Processing the entire set of run-time classes is far from
trivial, mainly because of introspection and interaction with native code.
However, early experiments show that it is feasible, and that such a process
can achieve a significant reduction of the library sizes.
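To illustrate why introspection complicates the process, consider the minimal Java sketch below (the class ReflectionDemo and the property demo.impl are hypothetical). A shrinker starts from a set of entry points and keeps only what the bytecode visibly references. The ArrayList created in direct() is such a reference. The class instantiated in byName(), however, is only known through a string obtained at run time. Unless the configuration explicitly keeps that class, a shrinker may consider it unused and remove it, after which Class.forName fails with a ClassNotFoundException. Simple cases with a constant string argument can often be detected automatically, but names that are computed or read at run time cannot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: contrasts a direct reference with a lookup by name.
public class ReflectionDemo {

    // Direct reference: the bytecode names ArrayList explicitly,
    // so static reachability analysis sees that it is needed.
    static List<String> direct() {
        return new ArrayList<>();
    }

    // Indirect reference: the class name only exists as a String value.
    // A shrinker cannot tell which class this will be, so it must be told.
    static Object byName(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(direct().getClass().getName());
        // The implementation is chosen through a system property at run time.
        String name = System.getProperty("demo.impl", "java.util.LinkedList");
        // Prints java.util.LinkedList unless -Ddemo.impl=... is given.
        System.out.println(byName(name).getClass().getName());
    }
}
```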

The tool of choice in this process is ProGuard, a free shrinker, optimizer,
obfuscator, and preverifier for Java bytecode. ProGuard is already widely used
by developers of commercial software and of software for constrained devices.
As the developer of ProGuard, I have the required expertise to make this
project a success. Experience gained in this project will benefit the
ProGuard project as well.

2. Goals
--------

The goals of this project are twofold:

1) Create an automatic process for shrinking, optimizing, and obfuscating the
    OpenJDK run-time libraries. The obvious result of this process is a set of
    smaller libraries that continue to offer the same functionality with the
    same API.
    Smaller library sizes are beneficial for all Java users, with smaller code
    archives, faster download times, and smaller memory footprints. Notably,
    projects that build on the OpenJDK run-time libraries and that target
    constrained devices can apply the results directly.

2) More generally, gain experience in the automatic processing of the OpenJDK
    run-time libraries. The run-time libraries illustrate many common and less
    common code constructs. Notably, they contain many types and
    implementations of introspection. This knowledge may further the
    development of ProGuard and supporting tools, extending detection
    techniques and optimization strategies. It will also provide additional
    insights into the structure of the OpenJDK run-time classes.

3. Previous work
----------------

In the context of the size of the Java run-time environment, the Java Kernel
project by Ethan Nicholas at Sun has been receiving a lot of positive interest
recently. It mitigates the potentially long download time of the JRE by
partitioning the libraries into sets that are downloaded individually and
as needed.

This proposed project attacks the heart of the problem: the size of the
run-time libraries. The idea of automatically optimizing the library sizes
is not new. Yet, somewhat surprisingly, there are no reports of successful
attempts. The only experimental results I know of are those in the results
section of the ProGuard website. ProGuard is a free Java class file shrinker, optimizer,
obfuscator, and preverifier, available under the terms of the GPL:

     http://proguard.sourceforge.net/

In this experiment, ProGuard processed the Java 6 run-time libraries. I
composed the configuration myself, based on debug output, instrumentation,
trial and error, custom tools, experience, and some old-fashioned hacking.
At more than 1500 lines, its length gives an indication of the complexity of
the problem. The combined shrinking, optimization, and obfuscation reduces the
total library size by an impressive 66% (from 53 MB to 18 MB). The resulting
run-time environment is still perfectly capable of running ProGuard and the
ProGuard GUI, for instance. However, the configuration can be expected to be
incomplete. As a result, ProGuard is undoubtedly removing classes, fields,
and methods that other applications require. Further investigation and work
are therefore required.
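One way such a configuration might be extended is sketched below, under stated assumptions: the names of classes that the run-time environment loads by name during a test run (for instance, logged by an instrumented Class.forName) are collected and turned into ProGuard -keep options. KeepRuleGenerator is a hypothetical helper, not part of ProGuard, and com.example.ServiceImpl is a made-up class name. The helper only emits the standard form `-keep class <name> { <init>(); }`, which keeps a class together with its no-argument constructor, the constructor that a reflective newInstance call typically needs.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: converts observed class names into ProGuard -keep options.
public class KeepRuleGenerator {

    static List<String> keepRules(Collection<String> observedClassNames) {
        // A sorted set removes duplicates and gives a stable, easily diffed order.
        TreeSet<String> unique = new TreeSet<>();
        for (String name : observedClassNames) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {   // skip blank log lines
                unique.add(trimmed);
            }
        }
        // Keep each class and its no-argument constructor.
        return unique.stream()
                .map(name -> "-keep class " + name + " { <init>(); }")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example input with a duplicate and a blank entry.
        List<String> observed = List.of(
                "java.util.LinkedList", "com.example.ServiceImpl",
                "java.util.LinkedList", " ");
        keepRules(observed).forEach(System.out::println);
    }
}
```

Running main prints the option for com.example.ServiceImpl followed by the one for java.util.LinkedList. Sorting the output keeps successive versions of a generated configuration easy to compare.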

4. Approach
-----------

The plan consists of a number of steps, iterated when necessary:

1) Starting point:
    As mentioned in the previous section, an internal, experimental ProGuard
    configuration for processing the Java 6 run-time classes already exists.
    The project will start from this configuration.

2) Development:
    Since the OpenJDK has been released under the GPL, it is now possible to
    review and to instrument the source code of the run-time environment.
    The project will make full use of these possibilities to extend the
    initial ProGuard configuration.

3) Testing:
    At the same time, Sun's Java Compatibility Kit (JCK) is becoming available under
    an open license. This license encourages testing run-time environments that
    are derived from the OpenJDK. This is a perfect match for this project, so
    the JCK will be used for testing the processed run-time libraries. Once the
    processed run-time libraries pass the tests of the JCK, they can be
    confidently used as compact drop-in replacements for the original set of
    libraries.

4) Evaluation:
    The results will then be summarized, providing test results, statistics, and
    configurations. The most interesting statistic should be the final size
    of the processed libraries. The configurations will allow to reproduce all
    results.
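The size statistic from the evaluation step could be computed with a small tool along the following lines. This is a minimal sketch, assuming that the original and the processed libraries are stored as .jar files under two separate directories; LibrarySizeReport is a hypothetical name, not an existing tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical tool: compares the total .jar size of two library directories.
public class LibrarySizeReport {

    // Total size in bytes of all regular files ending in ".jar" under dir,
    // including subdirectories.
    static long totalJarBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    // Reduction as a percentage of the original size.
    static double reductionPercent(long originalBytes, long processedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("original size must be positive");
        }
        return 100.0 * (originalBytes - processedBytes) / originalBytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LibrarySizeReport <original-dir> <processed-dir>");
            System.exit(1);
        }
        long original = totalJarBytes(Paths.get(args[0]));
        long processed = totalJarBytes(Paths.get(args[1]));
        System.out.printf("original: %d bytes, processed: %d bytes, reduction: %.1f%%%n",
                original, processed, reductionPercent(original, processed));
    }
}
```

As a sanity check, reductionPercent(53, 18) evaluates to about 66.04, matching the 66% reduction from 53 MB to 18 MB reported for the Java 6 experiment. Counting bytes on disk only measures archive size; the memory footprint would need a separate measurement.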

5. Deliverables
---------------

The success of the project will be measured by its final deliverables:

1) Any custom tools and procedures that were developed to generate the
    processing configurations.

2) The ProGuard configurations that successfully shrink and optimize the
    OpenJDK run-time libraries.

3) For reference, the actual processed OpenJDK run-time libraries that are
    generated using the above configurations.

4) The test results of the Java Compatibility Kit that demonstrate the
    conformance of the processed OpenJDK run-time libraries.

5) A final report that evaluates the results and provides additional
    statistics.

6. Developer
------------

Eric Lafortune received a PhD in computer science from the Katholieke
Universiteit Leuven, Belgium. He then worked as a postdoc at Cornell
University's Program of Computer Graphics. He is currently working at Luciad,
a company that develops high-performance GIS software in Java. He has been
developing and maintaining ProGuard in his spare time since 2002.

7. Prize
--------

If this project is awarded, the associated prize will go to the work of
Sister Jeanne Devos:
     http://zusterjeannedevos.org/JDenglish/index%20JDengels.html


