RFR [15] 8239487: Better links generation for system properties found in HTML files

Tue Mar 10 16:02:42 UTC 2020

Oops. Here is a link to the forgotten updated webrev:

  http://cr.openjdk.java.net/~prappo/8239487/webrev.02/

> On 10 Mar 2020, at 15:45, Pavel Rappo <pavel.rappo at oracle.com> wrote:
> 
> The replies are inline.
> 
>> On 10 Mar 2020, at 02:11, Jonathan Gibbons <jonathan.gibbons at oracle.com> wrote:
>> 
>> Meta comment:
>> This review is getting out of hand; I think it was a mistake for me to have ordered my comments according to your phases, and I apologize; I won't do it again.
> 
> Sorry about that.
> 
>> ---
>> 
>> Ther's some naming stuff and minor feedback, but I think you're getting close.
>> 
>> -- Jon
>> 
>> 
>> On 03/09/2020 10:48 AM, Pavel Rappo wrote:
>>> Jon, the replies are inline and the updated webrev is here:
>>> 
>>>  http://cr.openjdk.java.net/~prappo/8239487/webrev.01/
>>> 
>>>> On 7 Mar 2020, at 01:53, Jonathan Gibbons <jonathan.gibbons at oracle.com> wrote:
>>>> 
>>>> phase-1
>>>> 
>>>> In AbstractIndexWriter,
>>>> the use of "anyOf" with a singleton list seems a bit weird.
>>> I've changed methods' names to "ofCategories" and "containsOfCategories".
>>> I didn't change them to just "of" and "contains" because of the possibility
>>> of having a richer selection of those in the near future. For instance,
>>> ofModule(...), ofPackage(...), havingFirstLetter(...), etc.
>> 
>> "containsOfCategories" doesn't make grammatical sense. Maybe you should go back to "anyOf" naming, which allows for the possibility of "allOf".  
> 
> I know that naming is hard. On the one hand, all code, both internal and external,
> should be of the same good quality. On the other hand, it's an internal API
> inaccessible from the outside. How much effort do we want to put into it?
> 
> Here's another perspective. On the one hand, the name should be a good mnemonic
> and ideally should be sufficient to see what's going on without having to reach
> for documentation. On the other hand, it is an idealistic and probably
> unattainable goal. Sigh.
> 
> I think that the "contains" method should have "any" in its name. The reason is
> that I can easily imagine reading the code and being puzzled by "Does this mean
> that true is returned when ALL of the listed categories or ANY of them are
> found?" Methods that return Stream<SearchIndexItem> are different in that
> respect. One glance at Category resolves any ambiguities. Indeed, an item may be
> of only one category. Thus, no item can be of many categories, thus the implicit
> semantics is "any/or" rather than "all/and".
> 
>> It's just that both containsAnyOf and containsAllOf collapse to just 'contains' for a single argument.
> 
> I can only say that this is not a unique case. java.util.concurrent.CompletableFuture's
> allOf and anyOf come to mind.
> 
>>>> In SearchIndexItem:
>>>> The word "index" is heavily overloaded in javadoc. Would it be better in this case to use "INDEX_TAG"? (Just asking).
>>>> If nothing else, add doc-comments on the members of Category.
>>> I've added comments instead of renaming that constant.
>> OK, at least for now. ( I may change my mind in the future :-) )
>> 
>>> 
>>>> Folding "boolean systemProperty" into the Category seems like a good idea.
>>>> 
>>>> In SearchIndexItems:   ugh for having to put the apiNote after the scary comment. Nothing we can do, except maybe one day think about removing the scary comments, which have maybe outlived their usefulness. Maybe we could limit them to package-info.java fies.
>>>> 
>>>> 120      * <p> The returned stream consists of all items {@code i} for which there's
>>>> It's often considered better to avoid contractions (like "there's") in formal writing.
>>>> 
>>>> 120      * <p> The returned stream consists of all items {@code i} for which there's
>>>> 121      * a category {@code c}, from the specified categories, such that
>>>> 122      * {@code i.getCategory().equals(c)}. The stream may be empty, if there are
>>>> 123      * no such items.
>>>> The commas are questionable around "from the specified categories". Don't use commas when the enclosed text is important to the understanding of the sentence.
>>> Grammar has been fixed. Thanks.
>>> 
>>>> `concat` seems overkill.  For an internal method, I'd simplify the anyOfCategories, and either use overloads or a single varargs and just verify the length is not zero.  By inspection, it seems to only ever use 1 or 2 args. Use overloads!
>>> I still think this method has a value, and I do hope to reuse it really soon.
>>> The reason is, I hold compile-time checks in high regard.
>> 
>> It's not the argument pattern (first, rest) as much as the implementation. I'm unconvinced of the need to detect duplicates, and concat is the wrong name if you do want to suppress duplicates.
> 
> I see. I don't have a strong opinion about nulls and duplicates. However, I'm
> convinced that those extra runtime checks have their value.
> 
> That said, I don't want this thread to rathole into a discussion that has been
> done many times in many places. It really is not that essential in this context.
> So I hope you'll welcome this "skimmed milk" version of the "concat" functionality:
> 
>    /**
>     * Returns a concatenated stream of elements.
>     *
>     * <p> The elements of the returned stream are encountered in the following order:
>     * {@code first, remaining[0], remaining[1], ..., remaining[remaining.length - 1]}.
>     *
>     * @param first
>     *         the first element
>     * @param remaining
>     *         the remaining elements, if any
>     * @param <T>
>     *         the type of elements
>     *
>     * @return the stream of elements
>     *
>     * @throws NullPointerException
>     *         if {@code remaining} is {@code null}
>     */
>    private static <T> Stream<T> concatenatedStreamOf(T first, T[] remaining) {
>        return Stream.concat(Stream.of(first), Stream.of(remaining));
>    }
> 
> Just bare-bones functionality of concatenation for internal consumption.
> Duplicates and nulls are dealt with on the client side, if required.
> 
>>> 
>>> Quick search [*] through the codebase shows there are other places where this
>>> pattern is used. Many of those ultimately push arguments down to either of
>>> 
>>>    java.util.EnumSet.of(E first, E... rest)
>>>    java.nio.file.FileSystem.getPath(String first, String... more)
>>> 
>>> with a couple of exceptions. For example, these methods delegate to neither
>>> of the above:
>>> 
>>>    java.net.http.WebSocket.Builder.subprotocols(String mostPreferred,
>>>                                                 String... lesserPreferred)
>>>    com.sun.tools.javac.util.List.of(A x1, A x2, A x3, A... rest)
>>> 
>>> Among the things I found are these:
>>> 
>>>    javax.tools.StandardJavaFileManager.PathFactory.getPath
>>>    (ahem) jdk.javadoc.internal.doclets.formats.html.markup.HtmlTree.UL
>>>    If you think of alternatives like overloads, as you suggested, or plain vararg,
>>> they have their own problems. Both will contain roughly the same code addressing
>>> duplicates and nulls, albeit a bit more simple. Simple, but not trivial. If so,
>>> we might as well provide this code once. It's not a performance-sensitive
>>> place by any means.
>>> 
>>> For overloads like
>>> 
>>>    ofCategories(Category category)
>>>    ofCategories(Category firstCategory, Category secondCategory)
>>>    ofCategories(Category firstCategory, Category secondCategory, Category thirdCategory)
>>>    ...
>>>    the former method also has a naming problem: should "category" be plural or singular?
>>> 
>>> After I posted the initial webrev, I realized that there's a potential problem
>>> (but not yet a bug) in that `concat` method of mine. Namely, ordering. Set.of()
>>> is ordering-unfriendly by design. So a stream from that set would be of an
>>> unknown encounter order. I fixed in a straightforward way:
>>> 
>>>    @SafeVarargs
>>>    @SuppressWarnings("varargs")
>>>    private static <T> Stream<T> concat(T required, T... optional) {
>>>        Set<T> set = new LinkedHashSet<>(optional.length + 1, 1.0f);
>>>        set.add(Objects.requireNonNull(required));
>>>        for (T t : optional) {
>>>            if (!set.add(Objects.requireNonNull(t)))
>>>                throw new IllegalArgumentException();
>>>        }
>>>        return set.stream();
>>>    }
>>> 
>>>> phase-2
>>>> 
>>>> AbstractIndexWriter
>>>> line 511,512: horrible non-standard formatting: was this suggested by an IDE?
>>> No, it was done by hand. Default IDE's style is not to blame. Fixed.
>>> 
>>>> keyCharacter: when is this method called with an empty string?  It looks like it is related to the SearchIndexItem having en empty label ... perhaps we should check/fix the problem there.
>>>> Be careful of using Character.toUpperCase/toLowerCase ... they are locale-dependent.
>>>> That being said, maybe we do want the locale-dependent version here, which is unusual.
>>> One more of these things can be found here:
>>> 
>>>    jdk.javadoc.internal.doclets.toolkit.util.IndexBuilder#keyCharacter
>>> 
>>> This is all preexisting code. That said, we should definitely investigate it.
>>> Do you want me to do it in the context of this task though?
>> No
>>> 
>>>> While it is reasonable to get the first character of a Java identifier, I wonder if it is reasonable
>>>> to get the first character of an arbitrary string, i.e. from an {@index} tag.
>>>> 
>>>> various: it's not wrong, but I personally don't like the style of importing static methods, like groupingBy and toList. Keep it if you want, but if I were writing the code, I would use Collectors.methodName
>>> I share your dislike in this case. I guess I'm still trying to cram code into
>>> 80-character-long lines. Old habits die hard.
>> The convention for this code is more like 100-characters or so. The days of punched cards and video display units with 80x24 character screens are long gone.
> 
> ok, from now on I'll try to use that extra 25% for the benefit of the jdk.javadoc codebase.
> 
>>> 
>>>> phase-3
>>>> 
>>>> HtmlDocletWriter: this class is one instance per page ... do you really mean to put it here, as compared to Configuration (more global) or some more specific kind of page? (I see it moved in phase 4!)
>>>> 
>>>> SystemPropertiesWriter:80
>>>> I don't know if this is the right or wrong place for the check, but for reference I would compare with other "optional" pages, like Constant Values and Serialized Form.
>>> If in doubt, at least be consistent with what there is at the moment. That makes
>>> sense. As far as I understand you are pointing me to the "builders" infrastructure,
>>> right?
>>> 
>>> SystemPropertyWriter has never had a builder. If we are to use it, may I suggest
>>> we address this in a follow-up issue?
>> 
>> Uugh, builders, uugh.  That's a big can of worms that we don't want to open right now.
> 
> Right. I have put an extra comment there, if you don't mind.
> 
>>> 
>>>> SystemPropertiesWriter:155
>>>> I like the intention, but I would have thought the link factory would have returned a link in code font. I guess I would check other places where links are generated.
>>> Jon, I found
>>> 
>>>    jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter#plainOrCode
>>> 
>>> and a family of overloaded createLink methods consuming some enigmatic
>>> 
>>>    @param strong     whether to wrap the {@code label} in a SPAN element
>>> 
>>> Is there anything in particular I should be looking for?
>> I guess I was thinking to go look at other places where references to java elements are created. If you chase down code starting from AbstractMemberWriter.Signature, via methods like 'addParam', you end up at HtmlDocletWriter.getLink, which you can call with a suitably configured LinkInfoInfo.
> 
> I'm looking into it and will update you soon. Thanks.
> 
>>>> TagletWriterImpl:457
>>>> The comment is unclear: does "those" refer back to DocFileELement?
>>> I've refined that comment.
>>> 
>>>> I'm also curious why you think using identity equals and hashCode is inefficient.
>>> It's not that those methods are inefficient, it's that this *cache* of
>>> DocletElement instances *might* be inefficient. The worst-case scenario is where
>>> titles for the same HTML document represented by different DocletElement objects
>>> are repeatedly calculated and stored. The comment is just about that. I was lazy
>>> and didn't investigate where those instances come from, I also saw that those
>>> types do not override equals & hashCode, hence this comment.
>> OK, so I guess you're concerned about space-inefficient, not time-inefficient
>> because identity .equals and .hashCode are fast ;-)
> 
> Well, kind of. It's when the cache behaves as if it weren't there and the title
> is parsed out of HTML each and every time.
> 
>>>> Utils:  uuugh ... I keep trying to get stuff *out* of the Utils dumping ground, and here you are adding to it. This is OK for now, but there may be a different utility class waiting to be written for processing DocTrees. This is not the only method to be working on DocTrees (although other work may be elsewhere, down in jdk.compiler module.)
>>> Please propose a more appropriate place.
>> It's OK for now.
>> 
>>> 
>>>> test: the new convention is to generate files on the fly ... which will be even easier when we have text blocks. The ToolBox library class has code to help write out smal files like these ... and using text strings avoids the heavy copyright overhead.
>>> Not that I'm opposing it, just asking: are there any other benefits or using
>>> on-the-fly file generation beyond simply avoiding "the heavy copyright overhead"
>>> in this case?
>> 
>> In all cases where I've had to create such sets of files, I find it's generally easier to visualize the overall pattern of the files when they are co-located.
>> 
>>> 
>>> Both of the said techniques have their advantages and disadvantages. On-the-fly
>>> generation is convenient for example, when the output is small or structurally
>>> flat, or when it is parametric. Here it is neither. Pre-baked files can benefit
>>> from IDE support, which is quite convenient. I wouldn't like to give it up this
>>> lightly.
>> OK.
>>> 
>>>> phase-4
>>>> 
>>>> In SystemPropertiesWriter, there is no need to use a WeakHashMap compared to a standard HashMap.
>>> Right. Using WeakHashMap is more of a declaration: it's cache, the whole cache,
>>> and nothing but the cache. I like this extra bit of information, but I can swap
>>> it to HashMap if you want.
>> It's just an unnecessary overhead to be creating all those weak references.
>> But, your point is taken.
>>> 
>>>> Elements created by javac, which is the dominant collection of elements, will be permanent until the end of the javadoc run. And moreover, SystemPropertiesWriter should have a short lifetime, and no elements will be garbage-collectible while it is alive.
>>> Right. (For the sake of the argument, those DocletElement instances are
>>> created by javadoc.)
>>> 
>>>> Line 172: I see the comment moved here from TagletWriterImpl. Same comments apply that I made for phase-3.
>>>> 
>>>> In SearchIndexItem, if I understand it correctly, it seems strange to add a field of type DocletElement that is specific to certain types of search index item.  And, there is a cognizant disparity between the name of the type (DocletElement) and the name of the method (getElement). Generally, I would expect a method named `getElement` to return an Element.
>>> Understood and fixed.
>>> 
>>>> It seems like all other serach-index items are associated with n element anyway!
>>> Not sure what can be done here. The signature of `getElement` doesn't require
>>> any argument or `int` value for that matter, which could be mistaken for
>>> some sort of index/position such as in List.get(int).
>> I'm not sure I understand what you're saying here. I was trying to say that since all
>> search index items seems to be abstractly associate with some element representing
>> the "target" of the search item, then it was reasonable to have a general getter/setter
>> for the associate element, and that seems to be what you've done, although maybe not
>> so far as to set it in all cases for all SearchIndexItems.
> 
> I see.
> 
>>>  ***
>>> 
>>> What about feedback to the user in case no <title> in the document is found?
>>> Do you have any suggestions?
>> Which user? the one running javadoc, or the one reading the generated files.
> 
> Bob the javadoc-builder. That is, the one running the javadoc tool.
> 
>> The base name of the file (e.g. net-properties.html)
> 
> This is how the fallback is performed in this change.
> 
> -Pavel