Snippet specification feedback

Thu Mar 30 11:40:35 UTC 2023

Hello, Pavel!

Thank you for the detailed answer.

> > 1. Attribute value syntax. The spec says "An attribute value may be an
> > identifier, unsigned integer, or enclosed in either single or double quote
> > characters; no escape characters are supported". I assume that {@snippet
> > class=pkg.Class} is a malformed tag, as pkg.Class is not an identifier and
> > not quoted. Nevertheless, it's parsed by the javadoc tool and displayed as
> > if it were {@snippet class="pkg.Class"}. Similarly, {@snippet
> > file=pkg/Class.java} also works, while according to the spec it should not.
> > Is it an implementation problem (the implementation is more permissive than
> > the spec requires) or a spec problem? If such a tag should be accepted,
> > could you specify exactly which non-identifier symbols are allowed in
> > unquoted values?
>
> That seems like a pure specification issue. I think the Standard Doclet
> Specification (spec) switches between Java, HTML and maybe some other type
> of identifiers too freely, without proper indication. You can get a hint of
> that when the snippet section suddenly starts talking about _simple_
> identifiers: as far as I know, Java does not categorise identifiers.
> _Names_ can be qualified and simple, identifiers cannot [^1][^2].
>

So which behavior is correct? Should a qualified name like pkg.Class be
accepted without quotes? How should this part of the specification be read
instead?
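To make the question concrete, here is a minimal sketch of the strict reading
of the quoted sentence. It assumes that "identifier" means a Java identifier
(as tested by Character.isJavaIdentifierStart/isJavaIdentifierPart, without
excluding keywords) and that "unsigned integer" means a run of ASCII digits.
The class StrictAttributeValue is hypothetical and unrelated to the javadoc
tool's own parser. Under this reading both pkg.Class and pkg/Class.java are
rejected, yet the tool accepts them.

```java
// Hypothetical checker for the strict reading of the spec's attribute-value rule:
// an identifier, an unsigned integer, or a quoted string with no escapes.
public final class StrictAttributeValue {

    private StrictAttributeValue() {}

    public static boolean isValid(String value) {
        if (value.isEmpty()) {
            return false;
        }
        return isIdentifier(value) || isUnsignedInteger(value) || isQuoted(value);
    }

    // Assumption: "identifier" means a Java identifier (keywords are not excluded).
    static boolean isIdentifier(String s) {
        if (!Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumption: "unsigned integer" means one or more ASCII digits.
    static boolean isUnsignedInteger(String s) {
        return s.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    // Enclosed in matching single or double quotes; since there are no escapes,
    // the enclosing quote character cannot appear inside the value.
    static boolean isQuoted(String s) {
        if (s.length() < 2) {
            return false;
        }
        char q = s.charAt(0);
        if ((q != '"' && q != '\'') || s.charAt(s.length() - 1) != q) {
            return false;
        }
        return s.substring(1, s.length() - 1).indexOf(q) < 0;
    }

    public static void main(String[] args) {
        String[] samples = {"Class", "42", "\"pkg.Class\"", "pkg.Class", "pkg/Class.java"};
        for (String v : samples) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```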

> > 2. "region" attribute. It's not explicitly specified that such an
> > attribute exists. One can only guess from existing samples and the javadoc
> > tool implementation. Only "id", "lang", "class", and "file" attributes are
> > mentioned in the specification. It would be nice to specify the "region"
> > attribute as well.
>
> Not sure what you mean here. There's (i) a subsection on regions and also
> (ii) individual snippet tag subsections that mention this attribute.
>

The subsection on regions describes how to declare a region inside the
snippet. It doesn't say a word about how exactly to select the region to be
rendered, and neither do the individual snippet tag subsections. I checked
all 34 occurrences of the word 'region' in the document but haven't found
anything about a "region" attribute of the @snippet tag. Note that the
argument of a markup tag (such as @start) is not the same thing as an
attribute of the @snippet tag, as @snippet is not a markup tag. The closest
thing I see is:

> The file for an external snippet may contain multiple regions, to be
> referenced in different snippet tags, appearing in different parts of the
> documentation.

But it's never specified how exactly I should reference one of multiple
regions. It's never said that the syntax is {@snippet
region=<region_name>}. Please correct me if I'm wrong.
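For comparison, this is roughly what I would expect selecting a region to
mean, written as a hypothetical sketch: the @snippet tag names the region via
region=<name>, and an @end without arguments closes the innermost open region
(which is how the r0/r1 example below behaves). The class RegionSelector, the
restriction to // comments, and the \w+ region-name syntax are my
assumptions, not the spec's.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical region selector: returns the lines between "@start region=NAME"
// and its matching "@end". Nested markup lines are kept as they are.
public final class RegionSelector {

    private static final Pattern START =
            Pattern.compile("\\s*//\\s*@start\\s+region=(\\w+)\\s*");
    private static final Pattern END = Pattern.compile("\\s*//\\s*@end\\s*");

    private RegionSelector() {}

    public static List<String> select(List<String> lines, String name) {
        Deque<String> open = new ArrayDeque<>(); // names of the currently open regions
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            Matcher start = START.matcher(line);
            if (start.matches()) {
                open.push(start.group(1));
                if (!inside && start.group(1).equals(name)) {
                    inside = true;
                    continue; // the opening markup line is not part of the region
                }
            } else if (END.matcher(line).matches()) {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("@end without matching @start");
                }
                // An @end without arguments closes the innermost open region.
                if (open.pop().equals(name) && inside) {
                    return result;
                }
            }
            if (inside) {
                result.add(line);
            }
        }
        throw new IllegalArgumentException("region not found or not closed: " + name);
    }

    public static void main(String[] args) {
        List<String> file = List.of(
                "class Test {",
                "// @start region=r0",
                "// @start region=r1",
                "void one() {}",
                "// @end",
                "void two() {}",
                "// @end",
                "void three() {}",
                "}");
        System.out.println(select(file, "r0"));
        System.out.println(select(file, "r1"));
    }
}
```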

> > 4. Markup tags placement. It's specified: "They are placed in //
> > comments (or the equivalent in other languages or formats)". To me, it's
> > quite a vague statement. Obviously, the parser cannot understand every
> > single existing file format in the world, so it may have no idea how
> > comments are represented in the target file format. Moreover, the target
> > file format may have no formal specification. For example, if we have an
> > external snippet with a .txt extension, which kind of comment prefix should
> > we use to define regions? I tried
> >
> > outside
> > # @start region="hello"
> > mytest
> > # @end
> >
> > The javadoc tool fails to find the region in this case. But I can argue
> > that # represents a comment line in my text files. I strongly feel that
> > this part should be specified more precisely: either list all possible
> > preceding symbols, or provide another exact description of which
> > preceding characters are recognized as a comment start. Should the parser
> > behavior actually depend on the language (specified by the 'lang'
> > attribute or the file extension)?
>
> If I recall correctly, initially, we didn't want to allow authors to
> choose the EOL-comment marker. Instead, the marker was and is inferred from
> the type (the lang attribute) of the snippet. I cannot remember the
> rationale behind it. It might be because we felt it was "too much too
> soon", or it might have had something to do with the fact that EOL comments
> aren't simple: for example, in .properties, # or ! means a comment line
> [^3], not an end-of-line comment.
>
> Naturally, snippets whose lang attribute has value "java" or "properties"
> assume such markers. An inline snippet whose lang attribute is unspecified
> uses //.
>

By the way, it's not specified that "properties" is a valid language name
that should be recognized (or is that actually at the implementation's
discretion?). Which other languages should be recognized by default?

Interestingly, if I create the following properties file, the region is
properly recognized by the javadoc tool:

a=b\
# @start region=x
c=d
# @end

But in fact the first # line is not a comment: because of the trailing
backslash on the previous line, it's a continuation of the value of the key
a. This is understandable, as we cannot expect the javadoc tool to actually
lex every snippet file. But it also illustrates that the javadoc tool's
notion of a 'comment' differs from that of the target language. And it
really raises the question of special support for some file types (# is
recognized in *.properties but not in *.txt), which is an unspecified
implementation detail.
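The java.util.Properties parser confirms that reading. This small check (the
class name is only illustrative) loads the file above: the # line after the
backslash becomes part of the value of a, and only the final # @end is
treated as a comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesContinuationDemo {

    // The properties file from above, line by line.
    static final String SNIPPET_FILE = String.join("\n",
            "a=b\\",
            "# @start region=x",
            "c=d",
            "# @end");

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(SNIPPET_FILE);
        // The trailing backslash joins the "# @start" line to the value of "a".
        System.out.println(props.getProperty("a")); // b# @start region=x
        System.out.println(props.getProperty("c")); // d
        System.out.println(props.size());           // 2
    }
}
```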

> Eventually, someone will need to parse an external or hybrid snippet that
> does not use any of those. We should carefully think about it; I'm not ready
> to propose anything at this time.
>

Yes, exactly. I would just specify a fixed list of recognized prefixes,
like #, !, //, maybe 'rem', or something else. The same prefixes should work
in any file, regardless of the extension or the specified language. This
would be simple. If you put markup inside something that is not a valid
comment in the target language (e.g., using # in Java), it's up to you.
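A minimal sketch of what I mean, assuming an example prefix list of //, #, !
and rem (neither the list nor the class MarkupLineRecognizer exists in the
javadoc tool): the same check is applied to every line of every file,
whatever the lang attribute or extension. For brevity it only recognizes
standalone markup lines, not markup that follows code on the same line.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical recognizer: the same comment prefixes are accepted in every file,
// regardless of the lang attribute or the file extension.
public final class MarkupLineRecognizer {

    // Example prefix list; "rem" includes a trailing space so that e.g. "remove" does not match.
    private static final List<String> PREFIXES = List.of("//", "#", "!", "rem ");

    private MarkupLineRecognizer() {}

    // Returns the markup text (starting with '@') if the line is a standalone markup comment.
    public static Optional<String> markupOf(String line) {
        String trimmed = line.stripLeading();
        for (String prefix : PREFIXES) {
            if (trimmed.startsWith(prefix)) {
                String rest = trimmed.substring(prefix.length()).strip();
                if (rest.startsWith("@")) {
                    return Optional.of(rest);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(markupOf("# @start region=\"hello\"")); // Optional[@start region="hello"]
        System.out.println(markupOf("    // @end"));             // Optional[@end]
        System.out.println(markupOf("mytest"));                   // Optional.empty
    }
}
```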

>
> > 5. Markup tag arguments format. It's not completely specified. There is
> > a sample `@start region=name`, which implies that the "name=value" format
> > is used for arguments, but it's completely unclear which characters are
> > allowed, whether quoting is supported, whether there are any escape
> > characters, etc. This is especially important, as arguments may contain
> > regular expressions, which are known to contain non-trivial characters.
> > One may guess that markup tag arguments are formatted exactly like
> > snippet tag attributes, but it would be nice to specify this explicitly.
>
> Generally, what the snippet parser wants to avoid is ambiguities related
> to these symbols: ", ', }. Aside from those and the Unicode escapes [^4],
> there are no escapes in snippets and only one special character combination
> to avoid in inline snippets, */, which wouldn't be an issue if doc comments
> were hosted in // instead of /* ... */ comments; but that's a discussion
> for another day.
>

This doesn't clarify much; it even adds confusion. You've mentioned Unicode
escapes. I'm OK with them in inline snippets, but are they supported in
external snippets (either Java or non-Java)? Can I use @start
region="multi word region name"? Nothing is said in the spec, but it looks
like this works. What if I want to use a double quotation mark inside a
multi-word value? Is it possible?
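Here is one possible reading, as a hypothetical sketch only: markup arguments
follow the same rules as snippet tag attributes, i.e., name=value pairs where
a value is either unquoted (up to the next whitespace) or enclosed in
matching single or double quotes with no escapes. The class MarkupArguments
is made up for this example. Under this reading a multi-word value works, a
double quote can only appear inside a single-quoted value (and vice versa),
and a value that contains both kinds of quotes cannot be written at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for markup arguments such as: region="multi word region name"
public final class MarkupArguments {

    private MarkupArguments() {}

    public static Map<String, String> parse(String args) {
        Map<String, String> result = new LinkedHashMap<>();
        int i = 0;
        int n = args.length();
        while (i < n) {
            // Skip whitespace between arguments.
            while (i < n && Character.isWhitespace(args.charAt(i))) i++;
            if (i == n) break;
            int nameStart = i;
            while (i < n && args.charAt(i) != '=' && !Character.isWhitespace(args.charAt(i))) i++;
            String name = args.substring(nameStart, i);
            if (i == n || args.charAt(i) != '=') {
                result.put(name, ""); // argument without a value
                continue;
            }
            i++; // skip '='
            String value;
            if (i < n && (args.charAt(i) == '"' || args.charAt(i) == '\'')) {
                char quote = args.charAt(i);
                // No escapes: the first matching quote character ends the value.
                int close = args.indexOf(quote, i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated quote in: " + args);
                }
                value = args.substring(i + 1, close);
                i = close + 1;
            } else {
                int valueStart = i;
                while (i < n && !Character.isWhitespace(args.charAt(i))) i++;
                value = args.substring(valueStart, i);
            }
            result.put(name, value);
        }
        return result;
    }
}
```

For example, parse("region=\"multi word region name\"") yields a single entry
mapping region to multi word region name.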

> > 6. Whitespace rendering. While it's said that "Markup comments do not
> > appear in the generated output", the spec does not say anything about
> > preceding whitespace. E.g., consider the following snippet:
> >
> > /**
> >  * {@snippet lang=Java :
> >  * System.out.println(1);
> >  *     // @replace substring=2 replacement=3:
> >  * System.out.println(2);
> >  * }
> >  */
> >
> > We exclude the // comment from the rendering. However, there are four
> > spaces before it. Should they be rendered? The javadoc tool does not
> > render them. It would be nice to specify this behavior.
>
> It spawned (an internal?) discussion just before the feature was
> integrated. Early experiments suggested that authors do expect standalone
> markup to disappear without a trace. So not only should the markup comment
> and any preceding whitespace go away, but the freed-up empty line should
> too.
>

In fact, the current implementation looks somewhat inconsistent, but it's
unclear whether it's correct or not, as we have no specification. For
example, consider the external snippet:

class Test {
// @start region=r0
// @start region=r1
void one() {}
// @end
void two() {}
// @end
void three() {}
}

I want to render the region r0. I write {@snippet file="Test.java"
region=r0} and get this in the rendered HTML:

void one() {}
void two() {}

There is no empty line between one and two. OK, the freed-up lines were
stripped. But if I use a hybrid snippet and want to keep the markup out of
the inline version, the javadoc tool requires me to write this:

/**
* {@snippet file="Test.java" region=r0:
*
*
* void one() {}
*
* void two() {}
* }
*/

So, in this case empty lines are not stripped (not even the line that
starts the r0 region). Moreover, the rendering is now spoiled, as it
includes the empty lines as well. It looks like the inline version is
preferred over the external one for rendering. This is not specified, and
the two versions may produce different results.
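For reference, here is a minimal sketch of the rule Pavel described, applied
to the raw lines of region r0 above (nested markup still in place): a
trailing markup comment is removed together with the whitespace before it,
and a line that contained only markup disappears entirely. The // prefix,
the regular expression, and the class MarkupStripper are assumptions of this
sketch. It yields exactly the two lines rendered for the external snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: remove markup comments "without a trace".
public final class MarkupStripper {

    // A "// @tag ..." comment up to the end of the line, plus the whitespace before it.
    private static final Pattern TRAILING_MARKUP = Pattern.compile("\\s*//\\s*@\\w+.*");

    private MarkupStripper() {}

    public static List<String> strip(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String stripped = TRAILING_MARKUP.matcher(line).replaceFirst("");
            boolean hadMarkup = !stripped.equals(line);
            if (hadMarkup && stripped.isBlank()) {
                continue; // standalone markup: drop the freed-up line as well
            }
            out.add(stripped); // lines without markup (including blank ones) are kept
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> regionR0 = List.of(
                "// @start region=r1",
                "void one() {}",
                "// @end",
                "void two() {}");
        strip(regionR0).forEach(System.out::println);
    }
}
```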

>
> > 7. Common indentation. It looks like the common indentation is stripped
> > from the rendered snippet, similarly to text blocks. Is it an
> > implementation detail, or should it be specified?
>
> This is intentional and was spelled out in the JEP, but somehow went
> missing from the spec. This behavior is to facilitate pasting snippet
> content into a documentation comment without the need to reindent that
> content afterwards, which might be painful in some code editors.
>

Well, I was not entirely correct. This is spelled out in the 'Inline
snippets' subsection:

> Surrounding whitespace is stripped from the content using
> String::stripIndent
> <https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/String.html#stripIndent()>.

However, the same rule is applied to external snippets. Moreover, stripping
is de facto applied after the region is selected: the input to stripIndent
is the contents of the selected region, not the contents of the whole
snippet, for both external and inline snippets. So there's definitely room
to improve the specification. It's also not specified whether the stripping
is applied before or after replacements. It looks like before. Example:

/**
* {@snippet :
* void test() { // @replace substring="void" replacement=" "
* System.out.println("Hello");
* }
* }
*/

The rendered version is indented by two spaces. So one cannot say that
stripping is applied either before or after all markup processing. De
facto, it's applied in between: after extracting the region but before
applying everything else. This is not specified.
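Both ordering questions can be reproduced with String::stripIndent directly
(Java 17 or later). The sketch below uses made-up inputs rather than the
examples above, and String.replace merely stands in for @replace. It shows
that (a) stripping the selected region and stripping the whole file give
different results, and (b) stripping before and after a replacement give
different results.

```java
public class StripIndentOrder {

    // Strip common indentation first, then replace "void" with a single space.
    static String stripThenReplace(String text) {
        return text.stripIndent().replace("void", " ");
    }

    // Replace first, then strip common indentation.
    static String replaceThenStrip(String text) {
        return text.replace("void", " ").stripIndent();
    }

    public static void main(String[] args) {
        // (a) Region lines are indented inside a class body.
        String wholeFile = "class Test {\n    void one() {}\n    void two() {}\n}";
        String region = "    void one() {}\n    void two() {}";
        // The class line has no indentation, so stripping the whole file removes nothing.
        System.out.println(wholeFile.stripIndent().lines().toList());
        // Stripping only the region removes the common four spaces.
        System.out.println(region.stripIndent().lines().toList()); // [void one() {}, void two() {}]

        // (b) Replace "void" with a single space on every line.
        String text = "void a() {}\nvoid b() {}";
        System.out.println(stripThenReplace(text).lines().toList()); // each line keeps two leading spaces
        System.out.println(replaceThenStrip(text).lines().toList()); // [a() {}, b() {}]
    }
}
```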

There are more tricky questions, like the order in which several
replacements or highlightings are applied, but it's probably OK to leave
this part unspecified...

With best regards,
Tagir Valeev.