RFR: JDK-8298405: Support Markdown in Documentation Comments [v4]
Jonathan Gibbons
jjg at openjdk.org
Wed Nov 15 00:26:41 UTC 2023
On Wed, 15 Nov 2023 00:09:17 GMT, Jonathan Gibbons <jjg at openjdk.org> wrote:
>> test/langtools/tools/javac/doctree/FirstSentenceTest.java line 423:
>>
>>> 421: DocComment[DOC_COMMENT, pos:0
>>> 422: firstSentence: 1
>>> 423: RawText[MARKDOWN, pos:0, abc.|def.]
>>
>> It took me a while to understand why this test expects the first sentence to include the second line:
>>
>> ///abc.
>> ///def.
>> void simpleMarkdown() { }
>>
>> It's not because it's Markdown or the new `///` comment syntax. It's because the breakiterator thinks that `abc.\ndef.` is one sentence.
>>
>> Now, in this same file, on [line 123](https://github.com/openjdk/jdk/blob/128363bf3b57dfa05b3807271b47851733c1afb9/test/langtools/tools/javac/doctree/FirstSentenceTest.java#L119-L142), there's this test:
>>
>> /**
>> * abc def ghi.
>> * jkl mno pqr
>> */
>> void dot_newline() { }
>>
>> If you compare its expectation to that of simpleMarkdown(), you'll see that the first sentence consists of the first line only:
>>
>> BREAK_ITERATOR
>> DocComment[DOC_COMMENT, pos:1
>> firstSentence: 1
>> Text[TEXT, pos:1, abc_def_ghi.]
>> body: 1
>> Text[TEXT, pos:15, jkl_mno_pqr]
>> block tags: empty
>> ]
>>
>> So, why does it seem to work differently for `///` and `/** */` comments? It turns out that a whitespace character immediately after newline is important for breakiterator to do its thing:
>> ```
>> abc def ghi.\n jkl mno pqr
>> ^
>>
>> The problem with the simpleMarkdown test is that we cannot just indent the comment. That indentation will be deemed incidental whitespace and, thus, will removed. For example, suppose we do this:
>>
>> /// abc.
>> /// def.
>> void simpleMarkdown() { }
>>
>> The breakiterator will still see `abc.\ndef.`. Of course, we could do this:
>>
>> ///abc.
>> /// def.
>> void simpleMarkdown() { }
>>
>> But it would be a different comment.
>>
>> There's another test in this PR, [TestMarkdown.testFirstSentence](https://github.com/openjdk/jdk/blob/128363bf3b57dfa05b3807271b47851733c1afb9/test/langtools/jdk/javadoc/doclet/testMarkdown/TestMarkdown.java#L68-L83). There, the first sentence consists only of the first line. This is likely because the second line starts with a capital letter.
>
> Good and interesting catch.
>
> As a general matter, there are two scenarios to consider: using a system-provided `BreakIterator` and using the fallback "simple iterator" in DocTreeMaker.getSentenceBreak.
>
> The latter is just a period followed by a single whitespace, or a paragraph break if there is no period.
>
> The system-provided `BreakIterator` is locale specific. I cannot easily find a public spec of the sentence breaker for the US locale, but experiments indicate it wants one of
>
> * period, single whitespace, uppercase letter
> * period, multiple whitespace
>
> Arguably, the test is lazy/broken for using all lower case, which is an atypical form for sentences and use of a sentence breaker. I'll modify at least the new parts of the test (using Markdown comments.). We can separately decide whether to likewise modify the original parts (using traditional comments) or whether to do that as a separate `noreg-self` bug fix for this test.
After modifying the test to capitalize the pseudo-sentences, the test again passes, and the output with the break iterator is now the same as the default simple iterator. Yay.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16388#discussion_r1393474011
More information about the compiler-dev
mailing list