Where do empty compilation units belong?
Alex Buckley
alex.buckley at oracle.com
Mon Dec 3 23:50:55 UTC 2018
Thanks to Jon and Jay's testing, we can make the following statement:

Compilers ignore a source file that is physically empty (zero length) or
logically empty (contains only whitespace and/or comments).

(I confirmed this by tweaking Test.java so that the empty case was not
`""` but rather `" /* Comment */ "`. javac still accepted/ignored it.)

In other words, compilers do not observe an ordinary compilation unit if
it has no package, import, or type declarations -- a.k.a. a "vacant"
ordinary compilation unit.
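
As a rough illustration of "logically empty", here is a minimal sketch of
a check that a source text holds only whitespace and comments. The class
and method names are hypothetical, this is not how javac or ecj decide
the question, and it assumes Unicode escapes have not been used to spell
comment delimiters or whitespace.

```java
class VacantSourceCheck {
    /**
     * Returns true if the text is zero length or contains only white space
     * (space, tab, form feed, line terminators) and comments.
     * Simplification (assumption): Unicode escapes are not translated first.
     */
    static boolean isLogicallyEmpty(String src) {
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r') {
                i++; // white space
            } else if (src.startsWith("//", i)) {
                // End-of-line comment: skip up to the line terminator.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                // Traditional comment: skip past the closing delimiter.
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    return false; // unterminated comment, not a clean empty file
                }
                i = end + 2;
            } else {
                return false; // anything else would be a token
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLogicallyEmpty(""));                       // true
        System.out.println(isLogicallyEmpty(" /* Comment */ "));        // true
        System.out.println(isLogicallyEmpty("import java.util.List;")); // false
    }
}
```

The two `true` cases correspond to the physically empty and logically
empty inputs described above.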

We know this to be true because if compilers did observe a vacant
ordinary compilation unit, then the lack of a package declaration would
cause an error when the empty source file is in a modular location; but
no such error is given.

Compilers are free to take this not-observable stance, per 7.3: "The
host system determines which compilation units are observable". It would
be possible to mandate that the host system MUST NOT observe a vacant
ordinary compilation unit, but such a mandate would probably have
unintended consequences. It would also be possible to define a vacant
ordinary compilation unit out of existence, by tweaking 7.3's grammar as
proposed in the quoted mail below, but again, beware unintended
consequences.

What the JLS should do is affirm the compilers' decision to
"accept/ignore" a vacant ordinary compilation unit, by clarifying that a
vacant ordinary compilation unit is exempt from the "part of an unnamed
package" rule in 7.4.2. I have filed spec bug JDK-8214743 with this
wording for the rule: "An ordinary compilation unit that has no package
declaration, but has at least one other kind of declaration, is part of
an unnamed package."

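
To make the effect of that wording concrete, here is a small sketch that
walks the eight combinations of the three parts of an ordinary
compilation unit and classifies each one. The class, enum, and method
names are hypothetical; the classification only restates the rule quoted
above plus the "vacant" definition, and does not model what javac or ecj
actually do with each shape.

```java
class CompilationUnitShapes {
    enum Membership { VACANT, UNNAMED_PACKAGE, NAMED_PACKAGE }

    /** Classifies a compilation unit by which of its three parts are present. */
    static Membership classify(boolean hasPackage, boolean hasImports, boolean hasTypes) {
        if (hasPackage) {
            return Membership.NAMED_PACKAGE;
        }
        if (hasImports || hasTypes) {
            // No package declaration, but at least one other kind of declaration.
            return Membership.UNNAMED_PACKAGE;
        }
        // No declarations at all: exempt from the unnamed-package rule.
        return Membership.VACANT;
    }

    public static void main(String[] args) {
        boolean[] flags = { false, true };
        for (boolean p : flags) {
            for (boolean i : flags) {
                for (boolean t : flags) {
                    System.out.printf("package=%-5b imports=%-5b types=%-5b -> %s%n",
                            p, i, t, classify(p, i, t));
                }
            }
        }
    }
}
```

Only the all-false row comes out as vacant; an import-only unit still
lands in the unnamed package.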
Alex

P.S. In the course of examining 7.3's grammar, I realized that
OrdinaryCompilationUnit is not congruent with how 2.1 defines a
production in a context-free grammar as having "a sequence of one or
more nonterminal and terminal symbols as its right-hand side."

2.1's definition is intended to apply _after_ interpretation of 2.4's
grammar notation. For example, the production `A: [B]` is really two
productions, `A: ` and `A: B`. The first has zero symbols as its RHS, so
the grammar is not context-free -- parsing of an A is possible at any
time, based on considerations other than the terminals in hand.

Similarly, the production `C: {D}` is really an infinite number of
productions `C: ` and `C: D` and `C: D D` and `C: D D D` etc.
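
The expansion described above can be sketched in a few lines. This is
only an illustration of the notation: the class and method names are
hypothetical, and the repetition case is cut off at a caller-chosen bound
because the real set of productions is infinite.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GrammarNotationExpansion {
    /** Renders one production; an empty right-hand side is shown as "LHS:". */
    private static String production(String lhs, String rhs) {
        return rhs.isEmpty() ? lhs + ":" : lhs + ": " + rhs;
    }

    /** `A: [B]` stands for two productions: one with no symbols, one with B. */
    static List<String> expandOptional(String lhs, String symbol) {
        return List.of(production(lhs, ""), production(lhs, symbol));
    }

    /** `C: {D}` stands for zero or more D; lists the first maxRepeats + 1 productions. */
    static List<String> expandRepetition(String lhs, String symbol, int maxRepeats) {
        List<String> out = new ArrayList<>();
        for (int k = 0; k <= maxRepeats; k++) {
            String rhs = String.join(" ", Collections.nCopies(k, symbol));
            out.add(production(lhs, rhs));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expandOptional("A", "B"));      // [A:, A: B]
        System.out.println(expandRepetition("C", "D", 3)); // [C:, C: D, C: D D, C: D D D]
    }
}
```

In both lists the first entry is the production with zero symbols on its
right-hand side.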

OrdinaryCompilationUnit is significant for being the only production in
the JLS to allow zero symbols and thus _not_ be context-free. Compilers
provide the context when they lex an empty source file and decide not to
observe an ordinary compilation unit therein.

There's nothing good to be done here. We aren't going to change the
longstanding OrdinaryCompilationUnit production after all, and I don't
want to complicate 2.1 by special-casing its zero-symbols RHS.

On 12/3/2018 8:29 AM, Jayaprakash Artanareeswaran wrote:
> Thanks for the test file, Jon. Last week Stephan and I had a discussion
> and agreed with the specified behavior and made some changes to our
> compiler.
>
> I can also confirm that both the compilers behave the same way for all
> the scenarios included in the test file.
>
> Regards,
> Jay
>
> ------------------------------------------------------------------------
> *From:* Jonathan Gibbons <jonathan.gibbons at oracle.com>
> *Sent:* Monday, November 26, 2018 11:22 PM
> *To:* Alex Buckley; Jayaprakash Artanareeswaran;
> jigsaw-dev at openjdk.java.net; compiler-dev
> *Subject:* Re: Where do empty compilation units belong?
>
>
>
> On 11/26/2018 01:44 PM, Alex Buckley wrote:
>> // Adding compiler-dev since the parsing of files into compilation
>> units is not a Jigsaw issue.
>>
>> On 11/20/2018 9:14 PM, Jayaprakash Artanareeswaran wrote:
>>> "jigsaw-dev" <jigsaw-dev-bounces at openjdk.java.net> wrote on
>>> 21/11/2018 01:56:42 AM:
>>> > Jon points out that `OrdinaryCompilationUnit` will match an empty
>>> > stream of tokens (I dislike the syntax-driven optionality here, but
>>> > it's longstanding) so the file D.java could be regarded as a
>>> > compilation unit with no package declaration, no import
>>> > declarations, and no type declarations.
>>> >
>>> > Per JLS 7.4.2, such a compilation unit is in an unnamed package, and
>>> > must be associated with an unnamed module.
>>> >
>>> > I would prefer 7.4.2 to say only that a compilation unit with no
>>> > package declarations _and at least one type declaration_ is in an
>>> > unnamed package (and must be associated with an unnamed module; 7.3
>>> > should enumerate that possibility). A compilation unit with no
>>> > package declarations _and no type declarations_ would be deemed
>>> > unobservable by 7.3, and all these questions about what to do with
>>> > empty files would disappear.
>>>
>>> That would be perfect and make things unambiguous. But for now, the
>>> paragraph above is good enough for me.
>>
>> Unfortunately, import declarations can have side effects (compile-time
>> errors) so to be sure that the "no package or type decl ===
>> unobservable" rule is suitable for a file containing just an import
>> decl, we would have to do a case analysis of how javac and ecj handle
>> the eight combinations of the three parts allowed in an ordinary
>> compilation unit. That's overkill for the situation involving empty
>> files that keeps coming up and that I really want to clarify. I don't
>> think anyone loves that an ordinary compilation unit matches the empty
>> stream, so let's define away that scenario. As Jon said, an empty file
>> doesn't present anything to be checked; there is no compilation unit
>> there, so let's be unambiguous about that.
>>
>> We can rule out the empty stream in 7.3 with grammar or with
>> semantics. Usually a semantic description is clearest (gives everyone
>> the proper terminology and concepts) but in this case we don't want
>> the description to wrestle with "consists of one, two, or three parts"
>> when the grammar allows zero. So, a new grammatical description is
>> appropriate, and straightforward:
>>
>> OrdinaryCompilationUnit:
>>   PackageDeclaration {ImportDeclaration} {TypeDeclaration}
>>   ImportDeclaration {ImportDeclaration} {TypeDeclaration}
>>   TypeDeclaration {TypeDeclaration}
>>
>> The "three parts, each of which is optional" description is still
>> accurate. The package decl part is optional (as long as you have the
>> import decls part and/or the type decls part); the import decls part
>> is optional (as long as you have either the package decl part or ...)
>> ... you get the picture.
>>
>> I would leave 7.4.2 alone; an ordinary compilation unit with no
>> package or type decls but with import decls is part of the unnamed
>> package (and thus unnamed module) as before, and compilers can handle
>> that, I think.
>>
>> Any comments?
>>
>> Alex
>
> That seems good to me.
>
> To summarize the javac behavior ...
>
> * javac accepts/ignores an empty file
> * javac treats import-only compilation units as in the unnamed
>   package, which is not allowed in a named module
> * javac enforces file naming constraints when declaring a public class
> * javac uses file naming constraints when looking on the (module)
>   source path for a file for a class
>
>
> Attached is a toy class to generate combinations of package, import and
> type declarations. You can use the source-launcher feature to run it.
>
> -- Jon
>