ModuleFileParser reasoning
Chris Hegarty
chris.hegarty at oracle.com
Fri Jun 22 08:24:48 PDT 2012
I realize I sent out a webrev [1] for this proposed change without
giving any previous context about the decision making to arrive at a
pull parser API. So, here it is...
We have use cases that require a module file to be read without
decompressing or extracting one or more sections, that need to support
more than on-disk extraction, and that need to terminate quickly once
you have what you want.
* Signer tool
The signer tool needs to extract and validate the module file hashes.
For performance reasons it is desirable to skip decompression of
any section content.
* IDEs
An IDE will need to be able to easily extract the class files for
analysis, possibly into memory. Again, for performance reasons it is
desirable to skip decompressing any other section content, and also to
stop processing as soon as the class files have been extracted.
* Installer
Extract module file content and install it into the module library.
Separating out the internal library format from the parsing of
the module file.
* Tools for listing contents and/or displaying module information
It is desirable to support a tool for listing the contents of a
module file, say 'jpkg contents <module file>', and showing
the module info. Again, it is desirable to skip decompression of
any section content.
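As a toy illustration of why skipping decompression matters for the
signer case: a section hash can be computed over the raw (still
compressed) section bytes, so no inflate step is needed just to verify
it. Everything below is a self-contained sketch, not the actual jmod
format or signer code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.zip.DeflaterOutputStream;

public class RawHashSketch {
    public static void main(String[] args) throws Exception {
        // Pretend "section content" that the module file stores compressed.
        byte[] content = "class-file bytes...".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream def = new DeflaterOutputStream(buf)) {
            def.write(content);
        }
        byte[] rawSection = buf.toByteArray();

        // The signer can hash the raw section bytes directly --
        // decompression is never required just to check the hash.
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(rawSection);
        System.out.println("raw section hashed, " + hash.length + " byte digest");
    }
}
```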
The same interface for reading a jmod file can potentially be used with
corresponding implementations for wrapping a modular jar and also
wrapping a module installed into a library. The former enables uniform
installation into a library; the latter enables easy extraction when
hooked up to a module file writer (which is also what the signer needs
to do when signing).
Given that, it is clear we need to separate out the parsing of the
module file from the other operations. There are two obvious streaming
parser choices, pull or push. Pull was chosen mainly for simplicity of
the API, and placing complete control in the hands of the developer.
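To make the pull style concrete, a StAX-like loop over module file
sections might look like the sketch below. All names here (the events,
SectionParser, the section names) are illustrative assumptions, not the
API in the webrev; the point is that the caller drives iteration and
can stop as soon as it has what it wants.

```java
import java.util.Iterator;
import java.util.List;

public class PullSketch {
    // Events a pull parser could report, in document order.
    enum Event { START_FILE, START_SECTION, SECTION_CONTENT, END_SECTION, END_FILE }

    // A minimal pull-parser shape: the caller asks for the next event.
    interface SectionParser {
        boolean hasNext();
        Event next();
        String sectionName();   // valid after a START_SECTION event
    }

    // Toy in-memory implementation over a fixed event sequence.
    static SectionParser parserOver(List<Event> events, List<String> names) {
        Iterator<Event> it = events.iterator();
        Iterator<String> nm = names.iterator();
        return new SectionParser() {
            String current;
            public boolean hasNext() { return it.hasNext(); }
            public Event next() {
                Event e = it.next();
                if (e == Event.START_SECTION) current = nm.next();
                return e;
            }
            public String sectionName() { return current; }
        };
    }

    public static void main(String[] args) {
        SectionParser p = parserOver(
            List.of(Event.START_FILE, Event.START_SECTION, Event.SECTION_CONTENT,
                    Event.END_SECTION, Event.START_SECTION, Event.SECTION_CONTENT,
                    Event.END_SECTION, Event.END_FILE),
            List.of("MODULE_INFO", "CLASSES"));

        // Pull loop: stop as soon as the CLASSES section is reached.
        String found = null;
        while (p.hasNext()) {
            if (p.next() == Event.START_SECTION
                    && p.sectionName().equals("CLASSES")) {
                found = p.sectionName();
                break;  // early termination: no further parsing needed
            }
        }
        System.out.println("found=" + found);
    }
}
```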
While scoping, here are some of the reasons that led to this conclusion:
* Supporting both raw and decompressed section content in a push
  parser is a little confusing. The callback could always be supplied
  with the raw data, but then it forces the developer to deal with
  decompression.
* A content handler interface could return a value to indicate
  that further processing of the module file is not required, but
  we would still need the ability to retrieve/verify the section hash.
  It doesn't seem a good fit to build a validating parser on top of a
  non-validating one.
* Some compression formats can be time-consuming to extract; handing
  off raw section data for later extraction seems reasonable in some
  cases. Again, this doesn't fit well with push.
* Simple tests using a pull parser to iterate over section and
  subsection content, validate the module file, and extract a single
  section show that the resulting code is readable and easily understood.
* While you can possibly layer a push parser over a pull, it is
much more difficult to do the opposite.
* Anonymous inner classes can be a PITA for developers. While lambdas
  will improve this for SAM-type classes, the handler classes for any
  push model may have more than one method, and implementations will
  likely mutate state.
Along with the usual comments that are made when comparing SAX and StAX ;-)
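On the point that a push parser can be layered over a pull parser but
not easily the reverse: given a pull parser, a SAX-style callback layer
is a small loop, as the self-contained sketch below shows. The types
are illustrative stand-ins, not the proposed API; going the other way
(pull over push) would need threads or continuations to invert control.

```java
import java.util.Iterator;
import java.util.List;

public class PushOverPull {
    enum Event { START_SECTION, END_SECTION }

    // Stand-in pull parser over a fixed sequence of section names.
    interface PullParser {
        boolean hasNext();
        Event next();
        String sectionName();
    }

    // SAX-style push handler.
    interface SectionHandler { void startSection(String name); }

    // Driving a push handler from a pull parser is a trivial loop.
    static void drive(PullParser p, SectionHandler h) {
        while (p.hasNext())
            if (p.next() == Event.START_SECTION)
                h.startSection(p.sectionName());
    }

    // Emits START_SECTION/END_SECTION pairs for each name in order.
    static PullParser over(List<String> names) {
        Iterator<String> it = names.iterator();
        return new PullParser() {
            String current;
            boolean open;
            public boolean hasNext() { return open || it.hasNext(); }
            public Event next() {
                if (open) { open = false; return Event.END_SECTION; }
                current = it.next();
                open = true;
                return Event.START_SECTION;
            }
            public String sectionName() { return current; }
        };
    }

    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        drive(over(List.of("MODULE_INFO", "CLASSES")), seen::append);
        System.out.println(seen);
    }
}
```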
Thanks,
-Chris.
[1] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2012-June/002794.html