RFR: 8187521: In some corner cases the javadoc tool can reuse id attribute

Sat Oct 7 02:18:07 UTC 2017

Updated webrev, with comment added:
http://cr.openjdk.java.net/~jjg/8187521/webrev.02

-- Jon

On 09/29/2017 01:34 PM, Kumar Srinivasan wrote:
> Jon,
>
> Looks good with a caveat, suggest a comment:
> TestDocEncoding.java
>
> +        checkOutput("stylesheet.css", true,
> +                "body {\n"
> +                + "    background-color:#ffffff;");
> +
> +        charset = Charset.forName("UTF-8");   <--- this line needs an explanation comment.
>
> I don't need to see another iteration.
>
> Kumar
>
>
>
>> Updated patch to address a test failure on Windows.
>>
>> The test failure was specifically caused by a defensive check on the 
>> default platform encoding.
>> The fix for that test failure is to enhance JavadocTester to 
>> determine the file encoding used to
>> write files, and to use that when reading files in order to check 
>> their content.
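
To illustrate why the reader has to use the same encoding as the writer, here is a minimal sketch. It is not the JavadocTester code from the webrev; the class `EncodingAwareReader` and its methods are hypothetical names made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not the real JavadocTester API.
class EncodingAwareReader {

    // Write text to a file, encoding it with the given charset.
    static void writeFile(Path file, String text, Charset charset) {
        try {
            Files.write(file, text.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read a file back, decoding its bytes with the given charset.
    static String readFile(Path file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write with one charset, read with another, and return what was read.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                writeFile(tmp, text, writeCharset);
                return readFile(tmp, readCharset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // Same charset for writing and reading: the content survives.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // Mismatched charsets: the non-ASCII character is corrupted.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

A content check that reads with the platform default encoding would behave like the second call whenever the default differs from the encoding used to write the files.
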
>>
>> That enhancement to JavadocTester triggered the need to update 
>> another test, TestDocEncoding.java,
>> which can now do a better job because of the better support for using 
>> the correct doc encoding.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8187521
>> Webrev: http://cr.openjdk.java.net/~jjg/8187521/webrev.01/index.html
>>
>> -- Jon
>>
>> On 09/26/2017 03:21 PM, Jonathan Gibbons wrote:
>>> Please review this fix regarding the handling of ID values generated 
>>> by the standard javadoc doclet.
>>>
>>> The root cause of the specific issue is a corner case in Java. A 
>>> class may contain a method
>>> with the same name as the enclosing class, and therefore the same 
>>> name as any constructor of that class.
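
For readers unfamiliar with the corner case, a minimal example follows. The class name `Clash` is hypothetical and is not taken from the bug report.

```java
// Hypothetical class name, chosen only to illustrate the corner case.
class Clash {

    // A constructor: it has no return type and must be named after the class.
    Clash() {
    }

    // Legal Java: an ordinary method (it has a return type) whose name
    // happens to be the same as that of the enclosing class.
    String Clash() {
        return "method";
    }

    public static void main(String[] args) {
        // Both members would be described as "Clash()" if an ID were built
        // from the simple name and the parameter list alone.
        System.out.println(new Clash().Clash());
    }
}
```

Because the constructor and the method have the same name and the same (empty) parameter list, an ID derived from name plus parameters cannot tell them apart.
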
>>>
>>> The fix is to use a different name for the constructor, one 
>>> that cannot clash with
>>> any method name, with `<init>` being the obvious choice.
>>>
>>> The fix, along with some other changes, applies only when 
>>> generating HTML5 docs.
>>> There are other problems with IDs/anchor names when using HTML 4.01, 
>>> which are all
>>> better addressed by using HTML5, leaving support for HTML 4.01 
>>> unchanged at this point.
>>>
>>> In HTML5, there are no restrictions on the individual characters in 
>>> an ID other than to prohibit
>>> whitespace. Therefore, there is no longer any need to use the 
>>> javadoc-specific encoding
>>> using `:` and `-` to encode otherwise invalid characters. Thus, the 
>>> ID for a member of the class
>>> is just the signature of the member: the name for a field, the name 
>>> and parameter-type list
>>> for a method, and `<init>` and parameter-type list for a constructor.
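
The ID scheme described above can be sketched roughly as follows. This is not the doclet's code: the class `MemberIds`, its method names, and the exact formatting of the parameter list (comma-separated, no spaces) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of signature-based IDs; not the doclet implementation.
class MemberIds {

    // A field is identified by its name alone.
    static String fieldId(String name) {
        return name;
    }

    // A method is identified by its name and parameter-type list.
    // Assumption: types are joined with "," and wrapped in parentheses.
    static String methodId(String name, List<String> paramTypes) {
        return name + "(" + String.join(",", paramTypes) + ")";
    }

    // A constructor uses "<init>" in place of the class name, so it can
    // never collide with a method, since "<init>" is not a legal identifier.
    static String constructorId(List<String> paramTypes) {
        return methodId("<init>", paramTypes);
    }

    public static void main(String[] args) {
        List<String> none = Arrays.asList();
        System.out.println(methodId("Clash", none));
        System.out.println(constructorId(none));
        System.out.println(methodId("main", Arrays.asList("java.lang.String[]")));
    }
}
```

With this scheme, a method named after its class and the no-argument constructor of that class get different IDs.
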
>>>
>>> Finally, because this opens up the possibility of square brackets 
>>> appearing in a signature
>>> (where previously "[]" was encoded as ":A"), the encoding of URLs 
>>> which have fragments
>>> containing [] had to be improved, by using the standard URL 
>>> %-encoding for these characters.
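
As a rough illustration of the %-encoding step, the sketch below encodes only the two bracket characters mentioned above. The class `FragmentEncoder` and its methods are hypothetical names; the actual change may handle more characters or be structured differently.

```java
// Hypothetical sketch; encodes only '[' and ']' in a URL fragment.
class FragmentEncoder {

    // Replace square brackets with their standard percent-encoded forms.
    static String encodeBrackets(String fragment) {
        return fragment.replace("[", "%5B").replace("]", "%5D");
    }

    // Build a link target from a page name and a member ID.
    static String href(String page, String id) {
        return page + "#" + encodeBrackets(id);
    }

    public static void main(String[] args) {
        // The ID itself keeps the brackets; only the URL form is encoded.
        System.out.println(href("Main.html", "main(java.lang.String[])"));
    }
}
```

The `id` attribute in the generated page keeps the literal `[]`, while links pointing at it carry `%5B%5D` in the fragment.
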
>>>
>>> The tests are updated, with additional tests being added for members 
>>> with non-ASCII
>>> identifiers, to verify that IDs are defined and referenced correctly.
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8187521
>>> Webrev: http://cr.openjdk.java.net/~jjg/8187521/webrev.00/index.html
>>>
>>> -- Jon
>>
>