ModuleFileParser reasoning

Chris Hegarty chris.hegarty at oracle.com
Fri Jun 22 08:24:48 PDT 2012


I realize I sent out a webrev [1] for this proposed change without
giving any prior context on the decision making that led to a pull
parser API. So, here it is...

We have use cases that require a module file to be read without
decompressing or extracting one or more sections, that support
destinations other than extraction to disk, and that terminate quickly
once you have what you want (a sketch of such a consumer follows the
list below).

  * Signer tool
    The signer tool needs to extract and validate the module file hashes.
    For performance reasons it is desirable to skip decompression of
    any section content.
  * IDEs
    An IDE will need to be able to easily extract the class files for
    analysis, possibly into memory. Again, for performance reasons it is
    desirable to skip decompressing any other section content, and also
    finish processing directly after the class files have been extracted.
  * Installer
    Extracts module file content and installs it into the module
    library, separating the internal library format from the parsing
    of the module file.
  * Tools for listing contents and/or displaying module information.
    It is desirable to support a tool for listing the contents of a
    module file, say 'jpkg contents <module file>', and showing
    the module info. Again, it is desirable to skip decompression of
    any section content.
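
To make this concrete, below is a minimal sketch of the kind of
consumer loop a pull API enables. All names here (ModuleFileParser,
Event, sectionName, rawStream, decompressedStream) are illustrative
assumptions and do not necessarily match the webrev:

    import java.io.InputStream;

    // Hypothetical pull-parser API, for illustration only. The parser
    // exposes a cursor of events and leaves decompression and early
    // termination entirely to the caller.
    interface ModuleFileParser {
        enum Event { MODULE_START, SECTION_START, SECTION_END, MODULE_END }
        boolean hasNext();
        Event next();
        String sectionName();              // valid at SECTION_START
        InputStream rawStream();           // compressed bytes, no inflation cost
        InputStream decompressedStream();  // inflates on demand
    }

    // Example consumer (the IDE use case): pull events until the class
    // file section is reached, decompress only that section, and stop.
    class ClassesExtractor {
        static InputStream extractClasses(ModuleFileParser parser) {
            while (parser.hasNext()) {
                if (parser.next() == ModuleFileParser.Event.SECTION_START
                        && "classes".equals(parser.sectionName()))
                    return parser.decompressedStream();  // other sections stay raw
            }
            return null;  // no class file section present
        }
    }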

The same interface for reading a jmod file can potentially be used
with corresponding implementations that wrap a modular jar or a module
already installed into a library. The former enables uniform
installation into a library; the latter enables easy extraction when
hooked up to a module file writer (which is also what the signer needs
to do when signing).
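
As a rough illustration of that decoupling, an installer written
against parser events need not care which container format sits behind
them. This reuses the hypothetical ModuleFileParser interface sketched
above; ModuleFileWriter is likewise an invented name:

    import java.io.IOException;
    import java.io.OutputStream;

    // Invented writer-side counterpart, for illustration only.
    interface ModuleFileWriter {
        void startSection(String name) throws IOException;
        OutputStream sectionOutput() throws IOException;
        void endSection() throws IOException;
    }

    class Installer {
        // Copies sections from any parser implementation (jmod file,
        // wrapped modular jar, or module already in a library) to any
        // writer, passing raw section bytes straight through without
        // an inflate/deflate round trip.
        static void copy(ModuleFileParser in, ModuleFileWriter out)
                throws IOException {
            while (in.hasNext()) {
                if (in.next() != ModuleFileParser.Event.SECTION_START)
                    continue;
                out.startSection(in.sectionName());
                in.rawStream().transferTo(out.sectionOutput());
                out.endSection();
            }
        }
    }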

Given that, it is clear we need to separate the parsing of the
module file from the other operations. There are two obvious streaming
parser choices, pull or push. Pull was chosen mainly for the simplicity
of the API, and for placing complete control in the hands of the
developer. While scoping, here are some of the reasons that led to
this conclusion:

  * Supporting both raw and decompressed section content in a push
    parser is a little confusing. The callback could always be supplied
    with the raw data, but then it forces the developer to deal with
    decompressing.
  * A content handler interface could return a value to indicate
    that further processing of the module file is not required, but
    we would still need the ability to retrieve/verify the section hash.
    It doesn't seem a good fit to build a validating parser on top of
    a non-validating one.
  * Some compression formats can be time consuming to extract, so
    handing off raw section data to be extracted by the caller seems
    reasonable in some cases. Again, this doesn't fit well with push.
  * Simple tests that iterate over section and subsection content,
    validate the module file, and extract a single section using a
    pull parser show that such code is readable and easily understood.
  * While you can layer a push parser on top of a pull parser, it is
    much more difficult to do the opposite.
  * Anonymous inner classes can be a PITA for developers. While
    lambda will improve this for SAM types, the handler for a push
    model may have more than one method, and implementations will
    likely have to mutate external state (see the push sketch below).

Along with the usual comments that are made when comparing SAX and StAX ;-)
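
For contrast, here is a sketch of the push alternative argued against
above (all names invented): the handler signals early termination
through its return value, receives only raw content, and has to
smuggle results out through mutable state:

    import java.io.InputStream;

    // Invented push-style handler, illustrating the points above.
    interface ModuleFileHandler {
        /** Returns false to request that parsing stop early. */
        boolean section(String name, InputStream rawContent);
    }

    class PushExample {
        static void parse(InputStream moduleFile, ModuleFileHandler handler) {
            throw new UnsupportedOperationException("internals elided; sketch only");
        }

        // The IDE use case again, push style; compare with the pull
        // loop shown earlier.
        static InputStream extractClasses(InputStream moduleFile) {
            InputStream[] result = new InputStream[1];   // mutable holder for callback
            parse(moduleFile, new ModuleFileHandler() {  // anonymous inner class
                @Override
                public boolean section(String name, InputStream rawContent) {
                    if (!"classes".equals(name))
                        return true;                     // keep parsing
                    result[0] = rawContent;              // raw: caller must still inflate
                    return false;                        // request early termination
                }
            });
            return result[0];
        }
    }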

Thanks,
-Chris.

[1] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2012-June/002794.html


