Request for comments: Bug 6306820

Tue May 29 01:57:58 PDT 2007

Richard Kennard wrote:
> Michael,
>
> >> 1) java.net.URL is discouraged... I would agree with Alan on this.
>
> Fair enough: I shall remove those methods.
>
> Can you confirm you want the naming convention changed to Url? It's 
> just that everything else in the package uses uppercase URL (for 
> legacy reasons, I'm sure). Note that the class is called 
> URLEncodedQueryString because it models a 'www-form-urlencoded' query 
> string, not because of the java.net.URL class.
>
Url does seem to fit the new conventions better. It is also more 
readable in my opinion.
> > What if a string to be parsed uses ';' as separator, but contains
> > '&' chars embedded within it, which are not to be interpreted as
> > separators?
>
> When parsing, ALL separators are recognised. So if a string contains a 
> mix of ';' and '&' both will be recognised. You do not specify the 
> separator to use at parsing time - only at toString() time.
>
So this means that an '&' embedded in a parameter value cannot be
preserved when parsing (it will always be treated as a separator), even
though the same '&' is handled correctly when added through one of the
add-parameter methods (in that case it gets encoded as %26). This sounds
wrong to me. I'm not saying we shouldn't allow the lenient parsing you
describe above, just that it should somehow be possible to do a round
trip: construct a query piece by piece, output it as a string, and later
parse that string back into the same query object.
All that's needed is an additional parse() method that takes the
separator char as a parameter.
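A minimal sketch of what such an extra parse() might look like, assuming Java 10+ and UTF-8 throughout. `QuerySketch`, its nested `Separator` enum and the method names are hypothetical stand-ins, not the API under review. It contrasts the lenient parse described above (every ';' and '&' splits) with a parse that is told which separator to use, and checks the round trip through toString().

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch only; not the URLEncodedQueryString API. */
final class QuerySketch {

    /** Hypothetical stand-in for the (shortened) Separator type. */
    enum Separator {
        AMPERSAND("&"), SEMICOLON(";");

        final String text;

        Separator(String text) { this.text = text; }
    }

    private final List<Map.Entry<String, String>> params = new ArrayList<>();

    QuerySketch add(String name, String value) {
        params.add(Map.entry(name, value));
        return this;
    }

    List<Map.Entry<String, String>> params() {
        return params;
    }

    /** Lenient parse: both '&' and ';' split parameters. */
    static QuerySketch parse(String query) {
        return parseWith(query, "[&;]");
    }

    /** Strict parse: only the given separator splits parameters. */
    static QuerySketch parse(String query, Separator separator) {
        return parseWith(query, Pattern.quote(separator.text));
    }

    private static QuerySketch parseWith(String query, String regex) {
        QuerySketch q = new QuerySketch();
        for (String pair : query.split(regex)) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            q.add(URLDecoder.decode(name, StandardCharsets.UTF_8),
                  URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return q;
    }

    /** Encodes as UTF-8, so a literal '&' or ';' in a value becomes %26 or %3B. */
    String toString(Separator separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < params.size(); i++) {
            if (i > 0) {
                sb.append(separator.text);
            }
            Map.Entry<String, String> p = params.get(i);
            sb.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

Given `a=x&y;b=z`, where ';' was meant as the only separator, the lenient parse yields three parameters (`a`, `y`, `b`), while `parse(raw, Separator.SEMICOLON)` yields two, with `a` mapped to `x&y`. A query built with `add("a", "x&y")` prints as `a=x%26y;b=z` and parses back to the same parameters.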

BTW, I also meant to suggest shortening the ParameterSeparator name to
just Separator.
> > Should we have the possibility to specify the character set, perhaps
> > in the toString() method? In my experience, in some parts of the
> > world, particularly Asia, other character sets are often used for web
> > applications.
>
> Earlier versions of URIBuilder did this, but either Alan or yourself 
> thought it complicated matters too much. The HTML spec's 
> recommendation is UTF-8...
>
>    http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars
>
> ...note that this only applies to URIs - it is quite a separate issue
> from what character set is used on the HTML page.
>
Yes, I suppose that is consistent with the URI spec as well. But the
apidocs should state explicitly that UTF-8 is used, to avoid any doubt.
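A sketch of how the apidocs might spell this out, assuming Java 10+ (for the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode`). The class and method names are hypothetical; the point is the Javadoc wording and the fixed use of UTF-8.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating the documentation, not the proposed API. */
final class Utf8QueryEncoding {

    private Utf8QueryEncoding() {}

    /**
     * Encodes a query-string name or value using the
     * application/x-www-form-urlencoded rules.
     *
     * <p>Non-ASCII characters are always encoded as UTF-8 bytes, following
     * the HTML 4.0 recommendation for non-ASCII characters in URIs,
     * regardless of the character set of the page containing the link.
     */
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Decodes a name or value produced by {@link #encode}, again as UTF-8. */
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

For example, the two-character Japanese string "\u65e5\u672c" encodes to `%E6%97%A5%E6%9C%AC` (three UTF-8 bytes per character), and a space becomes `+`.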

Thanks
Michael