RFR [7u6]: 7166896: DocumentBuilder.parse(String uri) is not IPv6 enabled. It throws MalformedURLException
Paul Sandoz
paul.sandoz at oracle.com
Wed Jun 27 06:51:46 UTC 2012
Hi,
On Jun 26, 2012, at 11:59 PM, Joe Wang wrote:
> Hi Paul,
>
> That method was contributed by engineers from Korea and was intended to handle paths that contain international characters, which is how it got its name. It was added as an extra processing step. Outside of that scenario, we'd want to skip it and go back to letting URL handle the input, whether the system id contains a space, '[', etc.
>
Your fix will fail if there is an IPv6 encoded address for the host part and there are non-ASCII characters present in, for example, the path part.
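To make that failure mode concrete, here is a minimal sketch. The `patchedEscape` method is a hypothetical stand-in for the patched escapeNonUSAscii, not the JDK code: it returns all-ASCII input untouched and otherwise escapes non-ASCII bytes plus a small illustrative subset of ASCII characters ('[', ']' and space). The address and file names are made up for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PatchedEscapeDemo {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical model of the patched method: skip escaping entirely when
    // the input is pure ASCII, otherwise escape non-ASCII bytes AND a few
    // ASCII characters (illustrative subset, not the real escaping table).
    static String patchedEscape(String s) {
        boolean allAscii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 128) {
                allAscii = false;
                break;
            }
        }
        if (allAscii) {
            return s;
        }
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (b < 0 || b == '[' || b == ']' || b == ' ') {
                int ch = b & 0xff; // make the byte positive before indexing
                out.append('%').append(HEX[ch >> 4]).append(HEX[ch & 0xf]);
            } else {
                out.append((char) b);
            }
        }
        return out.toString();
    }

    // True if java.net.URL can parse the string (no network access involved).
    @SuppressWarnings("deprecation")
    static boolean urlAccepts(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ascii = "http://[fe80::1a03:73ff:fead:f7b0]/note.xml";
        String mixed = "http://[fe80::1a03:73ff:fead:f7b0]/caf\u00e9.xml";
        System.out.println(urlAccepts(patchedEscape(ascii))); // all ASCII: returned as is
        System.out.println(patchedEscape(mixed));             // brackets get escaped too
        System.out.println(urlAccepts(patchedEscape(mixed)));
    }
}
```

With the all-ASCII input the brackets survive and URL parses the literal address. With the non-ASCII path the full escaping runs, the brackets become %5B and %5D, and URL rejects the result.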
If the intent is to *never* percent encode ASCII characters you should change the following (and JavaDoc) to be consistent:
    // for each byte
    for (i = 0; i < len; i++) {
        b = bytes[i];
        // for non-ascii character: make it positive, then escape
        if (b < 0) {
            ch = b + 256;
            buffer.append('%');
            buffer.append(gHexChs[ch >> 4]);
            buffer.append(gHexChs[ch & 0xf]);
        }
        else if (b != '%' && b != '#' && gNeedEscaping[b]) { // <--- remove this block
            buffer.append('%');
            buffer.append(gAfterEscaping1[b]);
            buffer.append(gAfterEscaping2[b]);
        }
        else {
            buffer.append((char)b);
        }
    }
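For illustration, a self-contained sketch of what the loop might look like with that block removed. The class name, the `escapeNonAscii` method name, the local `HEX` table and the choice of UTF-8 are assumptions made for the example; this is not the actual JDK code.

```java
import java.nio.charset.StandardCharsets;

class NonAsciiOnlyEscaper {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes only non-ASCII bytes; every ASCII character, including
    // '[', ']' and space, is copied through unchanged.
    static String escapeNonAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // UTF-8 is an assumption
        StringBuilder buffer = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            if (b < 0) {
                // non-ASCII byte: make it positive, then escape
                int ch = b + 256;
                buffer.append('%');
                buffer.append(HEX[ch >> 4]);
                buffer.append(HEX[ch & 0xf]);
            } else {
                buffer.append((char) b);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // prints http://[fe80::1]/caf%C3%A9 menu.xml
        System.out.println(escapeNonAscii("http://[fe80::1]/caf\u00e9 menu.xml"));
    }
}
```

The brackets of the IPv6 literal are left intact, so the host part stays recognizable, while the non-ASCII path characters are still encoded. The space is also left alone, which is exactly the "never percent encode ASCII" behaviour the JavaDoc would then have to describe.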
Thankfully java.net.URL is much more forgiving (a polite way of saying buggy!) than java.net.URI and accepts unencoded reserved ASCII characters in an otherwise percent-encoded URI string.
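A small sketch of the difference, using made-up example.com URLs. Neither constructor touches the network; both only parse the string.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlVersusUri {
    // True if java.net.URL can be constructed from the string.
    @SuppressWarnings("deprecation")
    static boolean urlAccepts(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // True if java.net.URI can be constructed from the string.
    static boolean uriAccepts(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String withSpace = "http://example.com/my file.xml";
        String withBracket = "http://example.com/a[1].xml";
        System.out.println(urlAccepts(withSpace) + " " + uriAccepts(withSpace));     // true false
        System.out.println(urlAccepts(withBracket) + " " + uriAccepts(withBracket)); // true false
    }
}
```

URL takes both strings as they are, while URI rejects the unencoded space and the brackets in the path.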
Something does not smell right here. Arguably the system ID should be a correctly encoded URI to begin with; otherwise an error should result.
Paul.
> -Joe
>
> On 6/25/2012 9:13 AM, Paul Sandoz wrote:
>> Hi Joe,
>>
>> What happens if there is a space character or other characters in the string that should be encoded?
>>
>> http://greenbytes.de/tech/webdav/rfc2396.html#rfc.section.2.4.3
>>
>> I suspect "escapeNonUSAscii" is slightly misleading; it should really be called something like "escapeCharactersInUriString".
>>
>> Note that '[' and ']' are not valid URI characters outside of an IPv6 encoded address.
>>
>> Paul.
>>
>> On Jun 23, 2012, at 1:09 AM, Joe Wang wrote:
>>
>>> Hi,
>>>
>>> This is a patch to fix the IPv6 issue.
>>>
>>> In a previous patch to fix an issue with system ids containing international characters, an extra character-escaping step was added so that any URL passed to the parser goes through the method escapeNonUSAscii before it is used to construct a URL.
>>>
>>> However, literal IPv6 addresses are enclosed in square brackets. Once escapeNonUSAscii has encoded the brackets, URL no longer recognizes the address and throws a java.net.MalformedURLException. An address such as http://[fe80::la03:73ff:fead:f7b0]/note.xml is encoded as http://%5Bfe80::la03:73ff:fead:f7b0%5D/note.xml, resulting in java.net.MalformedURLException: For input string: ":la03:73ff:fead:f7b0%5D".
>>>
>>> This patch skips the encoding process and returns the input as is if there are no non-ASCII characters.
>>>
>>> webrev: http://cr.openjdk.java.net/~joehw/7u6/7166896/webrev/
>>>
>>> Please review.
>>>
>>> Thanks,
>>> Joe