RFR: 8315767: InetAddress: constructing objects from BSD literal addresses [v7]
Michael McMahon
michaelm at openjdk.org
Thu Apr 18 10:26:57 UTC 2024
On Wed, 17 Apr 2024 22:23:41 GMT, Sergey Chernyshev <schernyshev at openjdk.org> wrote:
>> There are two distinct approaches to parsing IPv4 literal addresses. One is the Java baseline "strict" syntax (all-decimal d.d.d.d form family), the other is the "loose" syntax of RFC 6943 section 3.1.1 [1] (POSIX `inet_addr` allowing octal and hexadecimal forms [2]). The goal of this PR is to provide an interface for constructing InetAddress objects from literal addresses in POSIX form, for applications that need to mimic the behavior of `inet_addr` used by standard network utilities such as netcat/curl/wget and the majority of web browsers. At present, there is no way to construct an `InetAddress` object from such literal addresses, because the existing APIs such as `InetAddress.getByName()`, `InetAddress#ofLiteral()` and `Inet4Address#ofLiteral()` will consume an address and successfully parse it as decimal, regardless of the octal prefix. Hence, the resulting object will point to a different IP address.
>>
>> Historically, `InetAddress.getByName()/.getAllByName()` were the only way to convert a literal address into an InetAddress object. The `getAllByName()` API relies on POSIX `getaddrinfo` / `inet_addr`, which parse IP address segments with `strtoul` (accepting octal and hexadecimal bases).
>>
>> The fallback to `getaddrinfo` is undesirable as it may end up with network queries (blocking mode), if `inet_addr` rejects the input literal address. The Java standard explicitly says that
>>
>> "If a literal IP address is supplied, only the validity of the address format is checked."
>>
>> @AlekseiEfimov contributed JDK-8272215 [3], which adds new `.ofLiteral()` factory methods to the `InetAddress` classes. Although the new API is not affected by the `getaddrinfo` fallback issue, it is not sufficient for an application that needs to interoperate with external tooling that follows the POSIX standard. In the current state, `InetAddress#ofLiteral()` and `Inet4Address#ofLiteral()` will consume the input literal address and (regardless of the octal prefix) parse it as decimal numbers. Hence, it's not possible to reliably construct an `InetAddress` object from a literal address in POSIX form that would point to the desired host.
>>
>> It is proposed to extend the factory methods with `Inet4Address#ofPosixLiteral()`, which allows parsing literal IPv4 addresses in "loose" syntax, compatible with the POSIX `inet_addr` API. The implementation is based on the `.isBsdParsableV4()` method added along with JDK-8277608 [4]. The changes in the original algorithm are as follows:
>>
>> - `IPAddressUtil#parseB...
>
> Sergey Chernyshev has updated the pull request incrementally with one additional commit since the last revision:
>
> addressed more review comments
The sentence at line 73 (of Inet4Address) isn't correct any more.
"These forms support parts specified in decimal format only."
"Forms" here refers to the number of components in the address, not to the methods used to parse the address. The new method also supports the multiple "forms" of an address.
I think it might be best to have a new section in the class doc
"Parsing of literal addresses"
which lists the methods that parse as decimal only, and the new method which parses using the "loose" syntax. Then the existing snippet showing examples of parsing as decimal only can be shown. The syntax for loose parsing should remain in the method definition imo.
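
To illustrate the distinction such a section would need to draw, here is a minimal sketch of the two parsing modes applied to the same address "forms". It is not the JDK implementation: the class `LooseIpv4Sketch` and its methods are hypothetical names made up for this example, and the radix rules are my reading of the `strtoul`-style behaviour described in the PR (a `0x` prefix means hexadecimal, a leading `0` means octal). The strict mode here simply treats every part as decimal, which is how the PR description characterises the existing methods.

```java
// Hypothetical sketch only; not the code under review and not a JDK API.
final class LooseIpv4Sketch {

    // Parses one dot-separated part. In loose mode the radix is chosen the
    // way strtoul(..., 0) chooses it; in strict mode it is always decimal.
    static long parsePart(String part, boolean loose) {
        int radix = 10;
        String digits = part;
        if (loose && (part.startsWith("0x") || part.startsWith("0X"))) {
            radix = 16;
            digits = part.substring(2);
        } else if (loose && part.length() > 1 && part.startsWith("0")) {
            radix = 8;
            digits = part.substring(1);
        }
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("bad part: " + part);
        }
        for (char c : digits.toCharArray()) {
            // Reject signs, non-ASCII digits and digits outside the radix.
            if (c > 127 || Character.digit(c, radix) < 0) {
                throw new IllegalArgumentException("bad digit in part: " + part);
            }
        }
        return Long.parseLong(digits, radix); // throws on overflow
    }

    // Accepts the one- to four-part forms; the last part fills the
    // remaining low-order bytes, every earlier part is a single byte.
    static long toValue(String literal, boolean loose) {
        String[] parts = literal.split("\\.", -1);
        if (parts.length > 4) {
            throw new IllegalArgumentException("too many parts: " + literal);
        }
        long value = 0;
        for (int i = 0; i < parts.length; i++) {
            long p = parsePart(parts[i], loose);
            boolean last = (i == parts.length - 1);
            long max = last ? (0xFFFFFFFFL >>> (8 * i)) : 0xFFL;
            if (p > max) {
                throw new IllegalArgumentException("part out of range: " + parts[i]);
            }
            value |= last ? p : (p << (24 - 8 * i));
        }
        return value;
    }

    static String dotted(long v) {
        return ((v >>> 24) & 0xFF) + "." + ((v >>> 16) & 0xFF) + "."
                + ((v >>> 8) & 0xFF) + "." + (v & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(dotted(toValue("0177.0.0.1", false))); // 177.0.0.1
        System.out.println(dotted(toValue("0177.0.0.1", true)));  // 127.0.0.1
        System.out.println(dotted(toValue("0x7f.1", true)));      // 127.0.0.1
    }
}
```

The point for the class doc is that both modes accept the same set of forms (one to four parts); only the interpretation of each part differs, so the same literal can map to two different hosts.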
-------------
PR Review: https://git.openjdk.org/jdk/pull/18493#pullrequestreview-2008458196