RFR: 8248001: javadoc generates invalid HTML pages whose ftp:// links are broken

Masanori Yano myano at openjdk.java.net
Fri Aug 27 08:26:24 UTC 2021


On Tue, 24 Aug 2021 09:59:59 GMT, Hannes Wallnöfer <hannesw at openjdk.org> wrote:

>> That said a stricter regexp (unless I'm mistaken) could be: `^[a-zA-Z][a-zA-Z0-9+-.]*:.+$`
>> [ from RFC 2396:     scheme        = alpha *( alpha | digit | "+" | "-" | "." ) ]
>
> I would normally opt for a generic regexp-based solution such as proposed by @dfuch, but there is a security aspect to this as well (e.g. script invocation), so I'd go with the more conservative approach here to just add `ftp:` protocol to the list.
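A minimal sketch contrasting the two approaches in the quoted exchange: the stricter regex built from the RFC 2396 `scheme` production, and a conservative allow-list that simply gains `ftp:`. The class name `LinkSchemeCheck`, its methods, and the allow-list contents are hypothetical and are not taken from the javadoc sources. In the character class, `-` is placed last so it is a literal hyphen. Written as `+-.`, it would form a range from `+` to `.` that also admits `,`.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class LinkSchemeCheck {
    // RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." ).
    // '-' is last in the class so it is a literal hyphen, not a '+'..'.' range.
    static final Pattern STRICT_SCHEME =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.-]*:.+$");

    // Hypothetical conservative allow-list, with "ftp:" added.
    static final List<String> ALLOWED_PREFIXES =
            List.of("http:", "https:", "mailto:", "ftp:");

    static boolean matchesStrictRegex(String url) {
        return STRICT_SCHEME.matcher(url).matches();
    }

    static boolean isAllowListed(String url) {
        // Schemes are case-insensitive, so compare in lower case.
        String lower = url.toLowerCase(Locale.ROOT);
        return ALLOWED_PREFIXES.stream().anyMatch(lower::startsWith);
    }

    public static void main(String[] args) {
        for (String url : List.of("ftp://ftp.example.org/pub/",
                                  "javascript:alert(1)", "Foo.html")) {
            System.out.printf("%-28s regex=%b allowList=%b%n",
                    url, matchesStrictRegex(url), isAllowListed(url));
        }
    }
}
```

The generic regex accepts `javascript:alert(1)` just as readily as `ftp://...`, while the allow-list rejects it. That difference is the security trade-off raised in the reply above.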

I derived the regex `^[^:/?#]+:.+$` from the description in Appendix B of RFC 2396:

B. Parsing a URI Reference with a Regular Expression

   The following line is the regular expression for breaking-down a URI
   reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
  ...
   Therefore, we can determine the value of the four components and fragment as
  ...
      scheme    = $2
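To illustrate the derivation above, here is a small sketch that applies the RFC 2396 Appendix B regex with `java.util.regex`, reads the scheme from group 2, and checks the same inputs against `^[^:/?#]+:.+$`. That regex is essentially group 2 followed by `:` and a non-empty remainder. The class and method names are hypothetical and are not part of the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc2396Scheme {
    // Regex from RFC 2396, Appendix B; the '\?' is written "\\?" in a Java literal.
    static final Pattern URI_REF = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // The regex proposed in this PR: a non-empty scheme, ':', then at least one character.
    static final Pattern HAS_SCHEME = Pattern.compile("^[^:/?#]+:.+$");

    // Returns the scheme (group 2), or null when the reference has no scheme.
    static String scheme(String uriRef) {
        Matcher m = URI_REF.matcher(uriRef);
        return m.matches() ? m.group(2) : null;
    }

    static boolean hasScheme(String uriRef) {
        return HAS_SCHEME.matcher(uriRef).matches();
    }

    public static void main(String[] args) {
        String[] refs = {
            "ftp://ftp.example.org/pub/file.txt", "git://example.org/repo.git",
            "../api/Foo.html#bar", "ftp:"
        };
        for (String ref : refs) {
            System.out.println(ref + " -> scheme=" + scheme(ref)
                    + ", hasScheme=" + hasScheme(ref));
        }
    }
}
```

A relative link such as `../api/Foo.html#bar` has no scheme, because `/` is excluded from `[^:/?#]`. The bare `ftp:` does yield scheme `ftp`, but `hasScheme` returns false because `.+` requires something after the colon.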

I agree that adding `ftp:` is better from a security standpoint. However, besides `ftp`, schemes such as `javascript` and `git` may also be specified, so it is difficult to cover every commonly used scheme with a fixed list.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5198

