RFR [15] 8239487: Better links generation for system properties found in HTML files

Pavel Rappo pavel.rappo at oracle.com
Mon Mar 9 17:48:40 UTC 2020


Jon, the replies are inline and the updated webrev is here:

  http://cr.openjdk.java.net/~prappo/8239487/webrev.01/

> On 7 Mar 2020, at 01:53, Jonathan Gibbons <jonathan.gibbons at oracle.com> wrote:
> 
> phase-1
> 
> In AbstractIndexWriter, 
> the use of "anyOf" with a singleton list seems a bit weird.

I've changed methods' names to "ofCategories" and "containsOfCategories".
I didn't change them to just "of" and "contains" because of the possibility
of having a richer selection of those in the near future. For instance,
ofModule(...), ofPackage(...), havingFirstLetter(...), etc.

> In SearchIndexItem:
> The word "index" is heavily overloaded in javadoc. Would it be better in this case to use "INDEX_TAG"? (Just asking).
> If nothing else, add doc-comments on the members of Category.

I've added comments instead of renaming that constant.

> Folding "boolean systemProperty" into the Category seems like a good idea.
> 
> In SearchIndexItems:   ugh for having to put the apiNote after the scary comment. Nothing we can do, except maybe one day think about removing the scary comments, which have maybe outlived their usefulness. Maybe we could limit them to package-info.java fies.
> 
>  120      * <p> The returned stream consists of all items {@code i} for which there's
> It's often considered better to avoid contractions (like "there's") in formal writing.
> 
>  120      * <p> The returned stream consists of all items {@code i} for which there's
>  121      * a category {@code c}, from the specified categories, such that
>  122      * {@code i.getCategory().equals(c)}. The stream may be empty, if there are
>  123      * no such items.
> The commas are questionable around "from the specified categories". Don't use commas when the enclosed text is important to the understanding of the sentence.

Grammar has been fixed. Thanks.

> `concat` seems overkill.  For an internal method, I'd simplify the anyOfCategories, and either use overloads or a single varargs and just verify the length is not zero.  By inspection, it seems to only ever use 1 or 2 args. Use overloads!

I still think this method has a value, and I do hope to reuse it really soon.
The reason is, I hold compile-time checks in high regard.

Quick search [*] through the codebase shows there are other places where this
pattern is used. Many of those ultimately push arguments down to either of

    java.util.EnumSet.of(E first, E... rest)
    java.nio.file.FileSystem.getPath(String first, String... more)

with a couple of exceptions. For example, these methods delegate to neither
of the above:

    java.net.http.WebSocket.Builder.subprotocols(String mostPreferred,
                                                 String... lesserPreferred)
    com.sun.tools.javac.util.List.of(A x1, A x2, A x3, A... rest)

Among the things I found are these:

    javax.tools.StandardJavaFileManager.PathFactory.getPath
    (ahem) jdk.javadoc.internal.doclets.formats.html.markup.HtmlTree.UL
    
If you think of alternatives like overloads, as you suggested, or plain vararg,
they have their own problems. Both will contain roughly the same code addressing
duplicates and nulls, albeit a bit more simple. Simple, but not trivial. If so,
we might as well provide this code once. It's not a performance-sensitive
place by any means.

For overloads like

    ofCategories(Category category)
    ofCategories(Category firstCategory, Category secondCategory)
    ofCategories(Category firstCategory, Category secondCategory, Category thirdCategory)
    ...
    
the former method also has a naming problem: should "category" be plural or singular?

After I posted the initial webrev, I realized that there's a potential problem
(but not yet a bug) in that `concat` method of mine. Namely, ordering. Set.of()
is ordering-unfriendly by design. So a stream from that set would be of an
unknown encounter order. I fixed in a straightforward way:

    @SafeVarargs
    @SuppressWarnings("varargs")
    private static <T> Stream<T> concat(T required, T... optional) {
        Set<T> set = new LinkedHashSet<>(optional.length + 1, 1.0f);
        set.add(Objects.requireNonNull(required));
        for (T t : optional) {
            if (!set.add(Objects.requireNonNull(t)))
                throw new IllegalArgumentException();
        }
        return set.stream();
    }

> phase-2
> 
> AbstractIndexWriter
> line 511,512: horrible non-standard formatting: was this suggested by an IDE?

No, it was done by hand. Default IDE's style is not to blame. Fixed.

> keyCharacter: when is this method called with an empty string?  It looks like it is related to the SearchIndexItem having en empty label ... perhaps we should check/fix the problem there.
> Be careful of using Character.toUpperCase/toLowerCase ... they are locale-dependent.
> That being said, maybe we do want the locale-dependent version here, which is unusual.

One more of these things can be found here:

    jdk.javadoc.internal.doclets.toolkit.util.IndexBuilder#keyCharacter

This is all preexisting code. That said, we should definitely investigate it.
Do you want me to do it in the context of this task though?

> While it is reasonable to get the first character of a Java identifier, I wonder if it is reasonable
> to get the first character of an arbitrary string, i.e. from an {@index} tag.
> 
> various: it's not wrong, but I personally don't like the style of importing static methods, like groupingBy and toList. Keep it if you want, but if I were writing the code, I would use Collectors.methodName

I share your dislike in this case. I guess I'm still trying to cram code into
80-character-long lines. Old habits die hard.

> phase-3
> 
> HtmlDocletWriter: this class is one instance per page ... do you really mean to put it here, as compared to Configuration (more global) or some more specific kind of page?  (I see it moved in phase 4!)
> 
> SystemPropertiesWriter:80
> I don't know if this is the right or wrong place for the check, but for reference I would compare with other "optional" pages, like Constant Values and Serialized Form.

If in doubt, at least be consistent with what there is at the moment. That makes
sense. As far as I understand you are pointing me to the "builders" infrastructure,
right?

SystemPropertyWriter has never had a builder. If we are to use it, may I suggest
we address this in a follow-up issue?

> SystemPropertiesWriter:155
> I like the intention, but I would have thought the link factory would have returned a link in code font. I guess I would check other places where links are generated.

Jon, I found

    jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter#plainOrCode

and a family of overloaded createLink methods consuming some enigmatic

    @param strong     whether to wrap the {@code label} in a SPAN element

Is there anything in particular I should be looking for?

> TagletWriterImpl:457
> The comment is unclear: does "those" refer back to DocFileELement?  

I've refined that comment. 

> I'm also curious why you think using identity equals and hashCode is inefficient.

It's not that those methods are inefficient, it's that this *cache* of
DocletElement instances *might* be inefficient. The worst-case scenario is where
titles for the same HTML document represented by different DocletElement objects
are repeatedly calculated and stored. The comment is just about that. I was lazy
and didn't investigate where those instances come from, I also saw that those
types do not override equals & hashCode, hence this comment.

> Utils:  uuugh ... I keep trying to get stuff *out* of the Utils dumping ground, and here you are adding to it. This is OK for now, but there may be a different utility class waiting to be written for processing DocTrees. This is not the only method to be working on DocTrees (although other work may be elsewhere, down in jdk.compiler module.)

Please propose a more appropriate place.

> test: the new convention is to generate files on the fly ... which will be even easier when we have text blocks. The ToolBox library class has code to help write out smal files like these ... and using text strings avoids the heavy copyright overhead.

Not that I'm opposing it, just asking: are there any other benefits or using
on-the-fly file generation beyond simply avoiding "the heavy copyright overhead"
in this case?

Both of the said techniques have their advantages and disadvantages. On-the-fly
generation is convenient for example, when the output is small or structurally
flat, or when it is parametric. Here it is neither. Pre-baked files can benefit
from IDE support, which is quite convenient. I wouldn't like to give it up this
lightly.

> phase-4
> 
> In SystemPropertiesWriter, there is no need to use a WeakHashMap compared to a standard HashMap.

Right. Using WeakHashMap is more of a declaration: it's cache, the whole cache,
and nothing but the cache. I like this extra bit of information, but I can swap
it to HashMap if you want.

> Elements created by javac, which is the dominant collection of elements, will be permanent until the end of the javadoc run. And moreover, SystemPropertiesWriter should have a short lifetime, and no elements will be garbage-collectible while it is alive.

Right. (For the sake of the argument, those DocletElement instances are
created by javadoc.)

> Line 172: I see the comment moved here from TagletWriterImpl. Same comments apply that I made for phase-3.
> 
> In SearchIndexItem, if I understand it correctly, it seems strange to add a field of type DocletElement that is specific to certain types of search index item.  And, there is a cognizant disparity between the name of the type (DocletElement) and the name of the method (getElement). Generally, I would expect a method named `getElement` to return an Element.

Understood and fixed.

> It seems like all other serach-index items are associated with n element anyway!

Not sure what can be done here. The signature of `getElement` doesn't require
any argument or `int` value for that matter, which could be mistaken for
some sort of index/position such as in List.get(int).

  ***

What about feedback to the user in case no <title> in the document is found?
Do you have any suggestions?

-Pavel

-------------------------------------------------------------------------------
[*] You can repeat that search using Java-flavored regex:

    (\w+)\s+\w+\s*,\s*\1\s*\.\.\.

There should be more civilized ways of doing that :-) I tried to use the "Search
Structurally" feature of my IDE, but couldn't make it work. The problem is that
it gives me a lot of noise because I haven't found a way to make it tell `[]`
and varargs apart.

One has to be careful when looking for the "(T required, T optional...)"
pattern. Not all the results of the above search have the required semantics.
Some happen to use the same type for semantically different entities. Usually
it's a name and aliases, a base and extensions, a value and args, a delimiter
and elements, an actual and a list of expected, etc. We are specifically
interested in method signatures where these parameters are used to pass a
*uniform* collection of things, but are nevertheless separate, solely for the
sake of friendlier APIs.



More information about the javadoc-dev mailing list