RFR [15] 8239487: Better links generation for system properties found in HTML files

Tue Mar 10 02:11:15 UTC 2020

Meta comment:
This review is getting out of hand; I think it was a mistake for me to 
have ordered my comments according to your phases, and I apologize; I 
won't do it again.

---

Ther's some naming stuff and minor feedback, but I think you're getting 
close.

-- Jon

On 03/09/2020 10:48 AM, Pavel Rappo wrote:
> Jon, the replies are inline and the updated webrev is here:
>
>    http://cr.openjdk.java.net/~prappo/8239487/webrev.01/
>
>> On 7 Mar 2020, at 01:53, Jonathan Gibbons <jonathan.gibbons at oracle.com> wrote:
>>
>> phase-1
>>
>> In AbstractIndexWriter,
>> the use of "anyOf" with a singleton list seems a bit weird.
> I've changed methods' names to "ofCategories" and "containsOfCategories".
> I didn't change them to just "of" and "contains" because of the possibility
> of having a richer selection of those in the near future. For instance,
> ofModule(...), ofPackage(...), havingFirstLetter(...), etc.

"containsOfCategories" doesn't make grammatical sense. Maybe you should 
go back to "anyOf" naming, which allows for the possibility of 
"allOf".   It's just that both containsAnyOf and containsAllOf collapse 
to just 'contains' for a single argument.

>> In SearchIndexItem:
>> The word "index" is heavily overloaded in javadoc. Would it be better in this case to use "INDEX_TAG"? (Just asking).
>> If nothing else, add doc-comments on the members of Category.
> I've added comments instead of renaming that constant.
OK, at least for now. ( I may change my mind in the future :-) )

>
>> Folding "boolean systemProperty" into the Category seems like a good idea.
>>
>> In SearchIndexItems:   ugh for having to put the apiNote after the scary comment. Nothing we can do, except maybe one day think about removing the scary comments, which have maybe outlived their usefulness. Maybe we could limit them to package-info.java fies.
>>
>>   120      * <p> The returned stream consists of all items {@code i} for which there's
>> It's often considered better to avoid contractions (like "there's") in formal writing.
>>
>>   120      * <p> The returned stream consists of all items {@code i} for which there's
>>   121      * a category {@code c}, from the specified categories, such that
>>   122      * {@code i.getCategory().equals(c)}. The stream may be empty, if there are
>>   123      * no such items.
>> The commas are questionable around "from the specified categories". Don't use commas when the enclosed text is important to the understanding of the sentence.
> Grammar has been fixed. Thanks.
>
>> `concat` seems overkill.  For an internal method, I'd simplify the anyOfCategories, and either use overloads or a single varargs and just verify the length is not zero.  By inspection, it seems to only ever use 1 or 2 args. Use overloads!
> I still think this method has a value, and I do hope to reuse it really soon.
> The reason is, I hold compile-time checks in high regard.

It's not the argument pattern (first, rest) as much as the 
implementation. I'm unconvinced of the need to detect duplicates, and 
concat is the wrong name if you do want to suppress duplicates.

>
> Quick search [*] through the codebase shows there are other places where this
> pattern is used. Many of those ultimately push arguments down to either of
>
>      java.util.EnumSet.of(E first, E... rest)
>      java.nio.file.FileSystem.getPath(String first, String... more)
>
> with a couple of exceptions. For example, these methods delegate to neither
> of the above:
>
>      java.net.http.WebSocket.Builder.subprotocols(String mostPreferred,
>                                                   String... lesserPreferred)
>      com.sun.tools.javac.util.List.of(A x1, A x2, A x3, A... rest)
>
> Among the things I found are these:
>
>      javax.tools.StandardJavaFileManager.PathFactory.getPath
>      (ahem) jdk.javadoc.internal.doclets.formats.html.markup.HtmlTree.UL
>      
> If you think of alternatives like overloads, as you suggested, or plain vararg,
> they have their own problems. Both will contain roughly the same code addressing
> duplicates and nulls, albeit a bit more simple. Simple, but not trivial. If so,
> we might as well provide this code once. It's not a performance-sensitive
> place by any means.
>
> For overloads like
>
>      ofCategories(Category category)
>      ofCategories(Category firstCategory, Category secondCategory)
>      ofCategories(Category firstCategory, Category secondCategory, Category thirdCategory)
>      ...
>      
> the former method also has a naming problem: should "category" be plural or singular?
>
> After I posted the initial webrev, I realized that there's a potential problem
> (but not yet a bug) in that `concat` method of mine. Namely, ordering. Set.of()
> is ordering-unfriendly by design. So a stream from that set would be of an
> unknown encounter order. I fixed in a straightforward way:
>
>      @SafeVarargs
>      @SuppressWarnings("varargs")
>      private static <T> Stream<T> concat(T required, T... optional) {
>          Set<T> set = new LinkedHashSet<>(optional.length + 1, 1.0f);
>          set.add(Objects.requireNonNull(required));
>          for (T t : optional) {
>              if (!set.add(Objects.requireNonNull(t)))
>                  throw new IllegalArgumentException();
>          }
>          return set.stream();
>      }
>
>> phase-2
>>
>> AbstractIndexWriter
>> line 511,512: horrible non-standard formatting: was this suggested by an IDE?
> No, it was done by hand. Default IDE's style is not to blame. Fixed.
>
>> keyCharacter: when is this method called with an empty string?  It looks like it is related to the SearchIndexItem having en empty label ... perhaps we should check/fix the problem there.
>> Be careful of using Character.toUpperCase/toLowerCase ... they are locale-dependent.
>> That being said, maybe we do want the locale-dependent version here, which is unusual.
> One more of these things can be found here:
>
>      jdk.javadoc.internal.doclets.toolkit.util.IndexBuilder#keyCharacter
>
> This is all preexisting code. That said, we should definitely investigate it.
> Do you want me to do it in the context of this task though?
No
>
>> While it is reasonable to get the first character of a Java identifier, I wonder if it is reasonable
>> to get the first character of an arbitrary string, i.e. from an {@index} tag.
>>
>> various: it's not wrong, but I personally don't like the style of importing static methods, like groupingBy and toList. Keep it if you want, but if I were writing the code, I would use Collectors.methodName
> I share your dislike in this case. I guess I'm still trying to cram code into
> 80-character-long lines. Old habits die hard.
The convention for this code is more like 100-characters or so. The days 
of punched cards and video display units with 80x24 character screens 
are long gone.

>
>> phase-3
>>
>> HtmlDocletWriter: this class is one instance per page ... do you really mean to put it here, as compared to Configuration (more global) or some more specific kind of page?  (I see it moved in phase 4!)
>>
>> SystemPropertiesWriter:80
>> I don't know if this is the right or wrong place for the check, but for reference I would compare with other "optional" pages, like Constant Values and Serialized Form.
> If in doubt, at least be consistent with what there is at the moment. That makes
> sense. As far as I understand you are pointing me to the "builders" infrastructure,
> right?
>
> SystemPropertyWriter has never had a builder. If we are to use it, may I suggest
> we address this in a follow-up issue?

Uugh, builders, uugh.  That's a big can of worms that we don't want to 
open right now.

>
>> SystemPropertiesWriter:155
>> I like the intention, but I would have thought the link factory would have returned a link in code font. I guess I would check other places where links are generated.
> Jon, I found
>
>      jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter#plainOrCode
>
> and a family of overloaded createLink methods consuming some enigmatic
>
>      @param strong     whether to wrap the {@code label} in a SPAN element
>
> Is there anything in particular I should be looking for?
I guess I was thinking to go look at other places where references to 
java elements are created. If you chase down code starting from 
AbstractMemberWriter.Signature, via methods like 'addParam', you end up 
at HtmlDocletWriter.getLink, which you can call with a suitably 
configured LinkInfoInfo.
>
>> TagletWriterImpl:457
>> The comment is unclear: does "those" refer back to DocFileELement?
> I've refined that comment.
>
>> I'm also curious why you think using identity equals and hashCode is inefficient.
> It's not that those methods are inefficient, it's that this *cache* of
> DocletElement instances *might* be inefficient. The worst-case scenario is where
> titles for the same HTML document represented by different DocletElement objects
> are repeatedly calculated and stored. The comment is just about that. I was lazy
> and didn't investigate where those instances come from, I also saw that those
> types do not override equals & hashCode, hence this comment.
OK, so I guess you're concerned about space-inefficient, not 
time-inefficient
because identity .equals and .hashCode are fast ;-)
>
>> Utils:  uuugh ... I keep trying to get stuff *out* of the Utils dumping ground, and here you are adding to it. This is OK for now, but there may be a different utility class waiting to be written for processing DocTrees. This is not the only method to be working on DocTrees (although other work may be elsewhere, down in jdk.compiler module.)
> Please propose a more appropriate place.
It's OK for now.

>
>> test: the new convention is to generate files on the fly ... which will be even easier when we have text blocks. The ToolBox library class has code to help write out smal files like these ... and using text strings avoids the heavy copyright overhead.
> Not that I'm opposing it, just asking: are there any other benefits or using
> on-the-fly file generation beyond simply avoiding "the heavy copyright overhead"
> in this case?

In all cases where I've had to create such sets of files, I find it's 
generally easier to visualize the overall pattern of the files when they 
are co-located.

>
> Both of the said techniques have their advantages and disadvantages. On-the-fly
> generation is convenient for example, when the output is small or structurally
> flat, or when it is parametric. Here it is neither. Pre-baked files can benefit
> from IDE support, which is quite convenient. I wouldn't like to give it up this
> lightly.
OK.
>
>> phase-4
>>
>> In SystemPropertiesWriter, there is no need to use a WeakHashMap compared to a standard HashMap.
> Right. Using WeakHashMap is more of a declaration: it's cache, the whole cache,
> and nothing but the cache. I like this extra bit of information, but I can swap
> it to HashMap if you want.
It's just an unnecessary overhead to be creating all those weak references.
But, your point is taken.
>
>> Elements created by javac, which is the dominant collection of elements, will be permanent until the end of the javadoc run. And moreover, SystemPropertiesWriter should have a short lifetime, and no elements will be garbage-collectible while it is alive.
> Right. (For the sake of the argument, those DocletElement instances are
> created by javadoc.)
>
>> Line 172: I see the comment moved here from TagletWriterImpl. Same comments apply that I made for phase-3.
>>
>> In SearchIndexItem, if I understand it correctly, it seems strange to add a field of type DocletElement that is specific to certain types of search index item.  And, there is a cognizant disparity between the name of the type (DocletElement) and the name of the method (getElement). Generally, I would expect a method named `getElement` to return an Element.
> Understood and fixed.
>
>> It seems like all other serach-index items are associated with n element anyway!
> Not sure what can be done here. The signature of `getElement` doesn't require
> any argument or `int` value for that matter, which could be mistaken for
> some sort of index/position such as in List.get(int).
I'm not sure I understand what you're saying here. I was trying to say 
that since all
search index items seems to be abstractly associate with some element 
representing
the "target" of the search item, then it was reasonable to have a 
general getter/setter
for the associate element, and that seems to be what you've done, 
although maybe not
so far as to set it in all cases for all SearchIndexItems.

>
>    ***
>
> What about feedback to the user in case no <title> in the document is found?
> Do you have any suggestions?
Which user? the one running javadoc, or the one reading the generated files.
The base name of the file (e.g. net-properties.html)
>
> -Pavel
>
> -------------------------------------------------------------------------------
> [*] You can repeat that search using Java-flavored regex:
>
>      (\w+)\s+\w+\s*,\s*\1\s*\.\.\.
>
> There should be more civilized ways of doing that :-) I tried to use the "Search
> Structurally" feature of my IDE, but couldn't make it work. The problem is that
> it gives me a lot of noise because I haven't found a way to make it tell `[]`
> and varargs apart.
>
> One has to be careful when looking for the "(T required, T optional...)"
> pattern. Not all the results of the above search have the required semantics.
> Some happen to use the same type for semantically different entities. Usually
> it's a name and aliases, a base and extensions, a value and args, a delimiter
> and elements, an actual and a list of expected, etc. We are specifically
> interested in method signatures where these parameters are used to pass a
> *uniform* collection of things, but are nevertheless separate, solely for the
> sake of friendlier APIs.
>