RFR: 8248001: javadoc generates invalid HTML pages whose ftp:// links are broken

Daniel Fuchs dfuchs at openjdk.java.net
Fri Aug 27 09:15:25 UTC 2021


On Fri, 27 Aug 2021 08:23:17 GMT, Masanori Yano <myano at openjdk.org> wrote:

>> I would normally opt for a generic regexp-based solution such as the one proposed by @dfuch, but there is a security aspect to this as well (e.g. script invocation), so I'd go with the more conservative approach here of just adding the `ftp:` protocol to the list.
>
> I derived the regex `^[^:/?#]+:.+$` from the description in RFC 2396:
> 
> B. Parsing a URI Reference with a Regular Expression
> 
>    The following line is the regular expression for breaking-down a URI
>    reference into its components.
> 
>       ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>        12            3  4          5       6  7        8 9
>   ...
>    Therefore, we can determine the value of the four components and fragment as
>   ...
>       scheme    = $2
> 
> I agree that adding `ftp:` is better from the viewpoint of security. However, besides ftp, schemes such as javascript and git may also appear, so it's difficult to cover all commonly used schemes.
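For comparison, a minimal Java sketch of the candidates discussed above: the RFC 2396 Appendix B regex used to pull out the scheme (`$2`, i.e. group 2), the proposed `^[^:/?#]+:.+$` test, and a conservative allowlist. The class and method names and the contents of the allowlist are hypothetical, for illustration only; this is not javadoc's actual code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeCheckSketch {
    // RFC 2396, Appendix B: generic decomposition of a URI reference.
    // Group 2 holds the scheme without the trailing ':'.
    static final Pattern RFC2396_B = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Looser check proposed in the thread: a non-empty prefix, a ':', then anything.
    static final Pattern HAS_SCHEME = Pattern.compile("^[^:/?#]+:.+$");

    // Conservative alternative: an explicit allowlist (hypothetical contents).
    static final Set<String> ALLOWED = Set.of("http", "https", "ftp", "mailto");

    // Returns the scheme ($2), or null when the reference has no scheme.
    static String scheme(String ref) {
        Matcher m = RFC2396_B.matcher(ref);
        return m.matches() ? m.group(2) : null;
    }

    static boolean isAbsoluteByRegex(String ref) {
        return HAS_SCHEME.matcher(ref).matches();
    }

    static boolean isAbsoluteByAllowlist(String ref) {
        String s = scheme(ref);
        return s != null && ALLOWED.contains(s.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String ref : List.of("ftp://ftp.example.org/pub/file.txt",
                                  "javascript:alert(1)",
                                  "package-summary.html#details")) {
            System.out.printf("%-36s scheme=%-10s regex=%-5b allowlist=%b%n",
                    ref, scheme(ref), isAbsoluteByRegex(ref), isAbsoluteByAllowlist(ref));
        }
    }
}
```

Both checks accept the `ftp:` link and reject the relative `package-summary.html#details`; only the regex also accepts `javascript:alert(1)`, which is the script-invocation concern raised earlier in the thread.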

That regexp will correctly break the URI into its different components, but it doesn't guarantee that each of the components is syntactically correct, as further syntax restrictions may apply to each of the components.
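To illustrate, a minimal Java sketch assuming the scheme rule from RFC 2396 section 3.1 (`scheme = alpha *( alpha | digit | "+" | "-" | "." )`): the loose regex accepts references whose scheme part is malformed, while a scheme-specific check or `java.net.URI` (which parses per RFC 2396) rejects them. Class and method names are hypothetical, and only the scheme component is validated here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.regex.Pattern;

public class SchemeSyntaxSketch {
    // Loose check from the thread: anything without ':', '/', '?' or '#', then ':'.
    static final Pattern HAS_SCHEME = Pattern.compile("^[^:/?#]+:.+$");

    // RFC 2396, section 3.1: scheme = alpha *( alpha | digit | "+" | "-" | "." )
    static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*$");

    // Validates only the scheme; other components would need their own rules.
    static boolean schemeIsWellFormed(String ref) {
        int colon = ref.indexOf(':');
        return colon > 0 && SCHEME.matcher(ref.substring(0, colon)).matches();
    }

    // java.net.URI throws URISyntaxException on malformed components.
    static boolean parsesAsUri(String ref) {
        try {
            new URI(ref);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String ref : List.of("ftp://ftp.example.org/pub/file.txt",
                                  "my scheme:foo",
                                  "1ftp:foo")) {
            System.out.printf("%-36s loose=%-5b scheme=%-5b java.net.URI=%b%n",
                    ref, HAS_SCHEME.matcher(ref).matches(),
                    schemeIsWellFormed(ref), parsesAsUri(ref));
        }
    }
}
```

Both `my scheme:foo` and `1ftp:foo` match the loose regex, yet neither has a syntactically valid scheme.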

-------------

PR: https://git.openjdk.java.net/jdk/pull/5198


More information about the javadoc-dev mailing list