RFR: 8187521: In some corner cases the javadoc tool can reuse id attribute

Fri Sep 29 20:34:17 UTC 2017

Jon,

Looks good with a caveat, suggest a comment:
TestDocEncoding.java

+        checkOutput("stylesheet.css", true,
+                "body {\n"
+                + "    background-color:#ffffff;");
+
+        charset = Charset.forName("UTF-8");   <--- this line needs an explanation comment.

I don't need to see another iteration.

Kumar

> Updated patch to address a test failure on Windows.
>
> The test failure was specifically caused by a defensive check on the 
> default platform encoding.
> The fix for that test failure is to enhance JavadocTester to determine 
> the file encoding used to
> write files, and to use that when reading files in order to check 
> their content.
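
A minimal sketch of why the read side has to use the same encoding as the write side. This is not the JavadocTester code; the class `EncodingRoundTrip` and its method are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of JavadocTester.
class EncodingRoundTrip {
    // Writes text to a temporary file using writeCharset, then reads the
    // file back and decodes the bytes using readCharset.
    static String writeThenRead(String text, Charset writeCharset, Charset readCharset) {
        try {
            Path file = Files.createTempFile("doc", ".html");
            try {
                Files.write(file, text.getBytes(writeCharset));
                return new String(Files.readAllBytes(file), readCharset);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading with the charset that was used for writing returns the original text; reading non-ASCII content with a different charset (for example a platform default such as ISO-8859-1 against a UTF-8 file) does not, so a content check against the expected string would fail.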
>
> That enhancement to JavadocTester triggered the need to update another 
> test, TestDocEncoding.java,
> which can now do a better job because of the better support for using 
> the correct doc encoding.
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8187521
> Webrev: http://cr.openjdk.java.net/~jjg/8187521/webrev.01/index.html
>
> -- Jon
>
> On 09/26/2017 03:21 PM, Jonathan Gibbons wrote:
>> Please review this fix regarding the handling of ID values generated 
>> by the standard javadoc doclet.
>>
>> The root cause of the specific issue is a corner case in Java. A 
>> class may contain a method
>> with the same name as the enclosing class, and any corresponding 
>> constructor.
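
For illustration, a small example of the corner case described above. The class `Widget` is a hypothetical name; the point is only that Java accepts a method whose name and parameter list match those of a constructor.

```java
// Hypothetical class illustrating the corner case.
class Widget {
    private final String label;

    // Constructor: declared without a return type.
    Widget(String label) {
        this.label = label;
    }

    // Legal method with the same simple name as the class and the same
    // parameter list as the constructor; it is a method because it
    // declares a return type.
    String Widget(String suffix) {
        return label + suffix;
    }
}

class NameClashDemo {
    public static void main(String[] args) {
        Widget w = new Widget("a");         // invokes the constructor
        System.out.println(w.Widget("b"));  // invokes the method
    }
}
```

If an ID is built from the member name and parameter types alone, both members here would yield `Widget(java.lang.String)`.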
>>
>> The fix is to use a name for the constructor that cannot clash with 
>> any method name, with `<init>` being the obvious choice.
>>
>> The fix, along with some other changes, applies only when 
>> generating HTML5 docs.
>> There are other problems with IDs/anchor names when using HTML 4.01, 
>> which are all
>> better addressed by using HTML5, leaving support for HTML 4.01 
>> unchanged at this point.
>>
>> In HTML5, there are no restrictions on the individual characters in 
>> an ID other than to prohibit
>> whitespace. Therefore, there is no longer any need to use the 
>> javadoc-specific encoding
>> using `:` and `-` to encode otherwise invalid characters. Thus, the 
>> ID for a member of the class
>> is just the signature of the member: the name for a field, the name 
>> and parameter-type list
>> for a method, and `<init>` and parameter-type list for a constructor.
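
A rough sketch of the ID scheme described above, not the doclet's actual code. The class `MemberIds`, its method names, the use of canonical type names, and the comma separator without spaces are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not taken from the javadoc sources.
class MemberIds {
    // Formats a parameter-type list such as "(java.lang.String[],int)".
    // Assumption: canonical type names joined by "," with no whitespace,
    // since whitespace is not permitted in an HTML5 id.
    private static String params(Class<?>... types) {
        return Arrays.stream(types)
                .map(Class::getCanonicalName)
                .collect(Collectors.joining(",", "(", ")"));
    }

    // Field: just the name.
    static String fieldId(String name) {
        return name;
    }

    // Method: name followed by the parameter-type list.
    static String methodId(String name, Class<?>... paramTypes) {
        return name + params(paramTypes);
    }

    // Constructor: "<init>" followed by the parameter-type list, so it
    // cannot collide with a method named like the enclosing class.
    static String constructorId(Class<?>... paramTypes) {
        return "<init>" + params(paramTypes);
    }
}
```

Under these assumptions, `MemberIds.methodId("Widget", String.class)` gives `Widget(java.lang.String)` while `MemberIds.constructorId(String.class)` gives `<init>(java.lang.String)`, so the two IDs no longer clash.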
>>
>> Finally, because this opens up the possibility of square brackets 
>> appearing in a signature
>> (where previously "[]" was encoded as ":A"), the encoding of URLs 
>> which have fragments
>> containing [] had to be improved, by using the standard URL 
>> %-encoding for these characters.
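
A minimal sketch of the %-encoding step for square brackets in a fragment. The class `FragmentEncoder` is a hypothetical name, and this handles only `[` and `]`; the actual patch may encode other characters as well.

```java
// Hypothetical helper, not taken from the javadoc sources.
class FragmentEncoder {
    // Replaces "[" and "]" with their standard percent-encoded forms
    // (%5B and %5D) and leaves every other character untouched.
    static String encodeBrackets(String fragment) {
        return fragment.replace("[", "%5B").replace("]", "%5D");
    }

    public static void main(String[] args) {
        // The id stays "main(java.lang.String[])"; only the link to it is encoded.
        System.out.println("Foo.html#" + encodeBrackets("main(java.lang.String[])"));
    }
}
```

The `id` attribute in the page keeps the literal brackets; only the `href` that refers to it carries the encoded form.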
>>
>> The tests are updated, with additional tests being added for members 
>> with non-ASCII
>> identifiers, to verify that IDs are defined and referenced correctly.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8187521
>> Webrev: http://cr.openjdk.java.net/~jjg/8187521/webrev.00/index.html
>>
>> -- Jon
>