Code review request: 6961765: Double byte characters corrupted in DN for LDAP referrals

Tue Mar 6 11:55:11 UTC 2012

Your fix looks fine.

On 03/06/12 08:32 AM, Weijun Wang wrote:
> Hi Vinnie
>
> This bug is about using UrlUtil.decode() to decode a URL that is not
> fully encoded, i.e. one that contains raw non-ASCII characters.
>
> The webrev is at
>
> http://cr.openjdk.java.net/~weijun/6961765/webrev.00/
>
> It simply delegates the call to URLDecoder.decode().
>
> The LDAP URL spec (RFC 4516, section 2.1) allows only <reserved>,
> <unreserved>, and <pct-encoded> chars, which do not include general
> non-ASCII Unicode. So strictly speaking the user input in the bug report
> is illegal, but since it is already a valid URL/URI in Java, we can be
> more lenient.
>
> In fact, the javadoc of URLDecoder [1] also only allows these
> characters, but at the same time it says --
>
> There are two possible ways in which this decoder could deal with
> illegal strings. It could either leave illegal characters alone or
> it could throw an IllegalArgumentException. Which approach the
> decoder takes is left to the implementation.
>
> The Oracle implementation of the class currently "leaves illegal
> characters alone". In this sense, UrlUtil is worse than URLDecoder: it
> neither leaves them alone nor throws an exception.
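
To illustrate the "leave illegal characters alone" behaviour mentioned above, here is a minimal sketch. The class and method names (LenientDecodeDemo, decodeDn) are made up for illustration and are not part of the webrev. It only relies on URLDecoder.decode(String, String) passing characters other than '%' and '+' through unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the JDK or the webrev.
final class LenientDecodeDemo {

    // Decodes a DN with URLDecoder, wrapping the checked exception
    // so callers do not have to declare it.
    static String decodeDn(String dn) {
        try {
            return URLDecoder.decode(dn, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw (unencoded) hiragana mixed with a percent-encoded comma.
        String mixed = "uid=\u3070\u3073\u3076%2C\u3079\u307C\u307E";
        // The raw characters are passed through; only %2C is decoded.
        System.out.println(decodeDn(mixed));
    }
}
```
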
>
> To be more correct, I think we can update URLDecoder so that it leaves
> Unicode in the "other" category (non-control, non-whitespace, non-ASCII
> Unicode chars, as described in URI's spec) unchanged, and throws an
> exception otherwise (that is, for non-ASCII chars that are control or
> space). But I'll leave that to another RFE.
>
> Thanks
> Max
>
>
> -------- Original Message --------
> *Change Request ID*: 6961765
> *Synopsis*: Double byte characters corrupted in DN for LDAP referrals
>
>
> === *Description*
> ============================================================
> SYNOPSIS
> --------
> Double byte characters corrupted in DN for LDAP referrals
>
> OPERATING SYSTEM
> ----------------
> All
>
> FULL JDK VERSION
> ----------------
> All
>
> DESCRIPTION
> -----------
>
> If the DN component of an LDAP URL contains double byte characters, it
> is corrupted by com.sun.jndi.toolkit.url.UrlUtil.decode(). This
> corruption leads to application level failures.
>
> Consider the following scenario:
>
> 1. Application connects to an LDAP server and searches for the string
> uid=???,??? (where ??? are double byte characters)
>
> 2. JNDI code receives a referral, for example:
> ldap://www.test.com/uid=???,???,ou=people,ou=test,ou=test,o=test
>
> 3. The referral is then parsed to split the hostname, port number and
> the DN element of the URI via
> com.sun.jndi.ldap.LdapURL.parsePathAndQuery()
>
> 4. The DN element is decoded using
> com.sun.jndi.toolkit.url.UrlUtil.decode()
>
> 5. This method expects the characters to be ASCII. If the characters
> are non-ASCII, as in our example, then those characters are not
> converted properly.
>
> 6. This corrupted DN is then passed to the LDAP server, resulting in an
> unexpected failure.
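
Step 5 above can be sketched as follows. This is an assumption about the failure mode, not the actual UrlUtil source: a hypothetical decoder (naiveDecode) that copies every unescaped char into a byte array with a narrowing cast, which is harmless for ASCII but drops the high bits of anything else.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an ASCII-only decoder; NOT the JDK's UrlUtil.
final class NarrowingDecodeSketch {

    static String naiveDecode(String s) {
        byte[] bytes = new byte[s.length()];
        int j = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 3 > s.length()) {
                    throw new IllegalArgumentException("Truncated escape: " + s);
                }
                // Two hex digits after '%' form one byte.
                bytes[j++] = (byte) Integer.parseInt(s.substring(i + 1, i + 3), 16);
                i += 2;
            } else {
                // Assumes ASCII: the cast keeps only the low 8 bits of the char.
                bytes[j++] = (byte) c;
            }
        }
        return new String(bytes, 0, j, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+3070, U+3073, U+3076 are narrowed to 0x70, 0x73, 0x76.
        System.out.println(naiveDecode("uid=\u3070\u3073\u3076"));
        // A properly percent-encoded U+3070 survives.
        System.out.println(naiveDecode("uid=%E3%81%B0").equals("uid=\u3070"));
    }
}
```

With this sketch the raw input "uid=\u3070\u3073\u3076" comes back as "uid=psv", which is the kind of silently corrupted DN the report describes.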
>
> TESTCASE
> --------
> This testcase does not represent normal application code. It highlights
> the problem by calling into com.sun.* internal classes directly. This
> allows the problem to be demonstrated without setting up an LDAP server.
>
> import java.net.URI;
> import java.net.URLDecoder;
> import com.sun.jndi.ldap.LdapURL;
>
> public class LdapURLTest {
>     public static void main(String[] args) throws Exception {
>         String testString =
>             "ldap://www.test.com/uid=\u3070\u3073\u3076,\u3079\u307C\u307E,ou=test,ou=test,ou=test,o=test";
>
>         LdapURL ldURL = new LdapURL(testString);
>         System.out.println(" LDAP URL String: " + testString);
>         System.out.println(" decoded DN: " + ldURL.getDN());
>
>         // suggested fix demonstration
>         String DN;
>         String path = new URI(testString).getPath();
>
>         DN = path.startsWith("/") ? path.substring(1) : path;
>         String proposedDN = URLDecoder.decode(DN, "UTF8");
>
>         System.out.println("\nDN from proposed fix: " + proposedDN);
>     }
> }
>
> SUGGESTED FIX
> -------------
> Use java.net.URLDecoder rather than com.sun.jndi.toolkit.url.UrlUtil to
> conduct the URL decoding in parsePathAndQuery().
>
> Specifically, change the line that decodes the DN element in
> com.sun.jndi.ldap.LdapURL.parsePathAndQuery() from:
>
>     DN = path.startsWith("/") ? path.substring(1) : path;
>     if (DN.length() > 0) {
> -->     DN = UrlUtil.decode(DN, "UTF8"); <--
>     }
>
> to:
>
>     DN = path.startsWith("/") ? path.substring(1) : path;
>     if (DN.length() > 0) {
> -->     DN = URLDecoder.decode(DN, "UTF8"); <--
>     }
>
>
> === *Evaluation*
> =============================================================
> The URL in the testcase has an invalid encoding. Its Unicode characters
> must be encoded in UTF-8. For example,
>
> \u3070 -> UTF-8 bytes E3 81 B0 -> %E3%81%B0
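
The encoding the evaluation asks for can be sketched like this: take the UTF-8 bytes of each non-ASCII character and write each byte as %XX. The class and helper names (DnEscapeDemo, escapeNonAscii, decode) are hypothetical and for illustration only. ASCII characters are copied as-is, so this is not a complete RFC 4516 encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class DnEscapeDemo {

    // Percent-encodes the UTF-8 bytes of every non-ASCII code point.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Decodes with URLDecoder, wrapping the checked exception.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = escapeNonAscii("uid=\u3070");
        System.out.println(encoded);
        // Round trip back to the original string.
        System.out.println(decode(encoded).equals("uid=\u3070"));
    }
}
```
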
>