Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Tue Jan 8 20:04:01 PST 2013

On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>
> Also, I'm not sure how hot this method is in practice but allocating 
> StringBuilder seems a bit heavy (maybe it gets partially escape 
> analyzed out though).  Can you instead loop over all the chars in the 
> string and build up the hash code as you go along? If you see a % then 
> you handle next 2 chars specially, like you do now.  Or are you trying 
> to take advantage of String intrinsic support in the JIT? I guess if 
> perf is a concern you can write a micro benchmark comparing the 
> approaches ...
>
That did occur to me, but I guess we have to be consistent with the 
value that people have already been using, and that means I have
to duplicate the code in String.hashCode() (that is what the original 
implementation was calling) - I was trying to avoid that. Also,
String.hashCode() uses its internal char[] - whereas charAt() will 
involve several additional bound checks - but
using toCharArray() may be better. Let me take another look at this, and 
get back with another webrev.

> Sent from my phone
>
> On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vitalyd at gmail.com 
> <mailto:vitalyd at gmail.com>> wrote:
>
>     Hi Kurchi,
>
>     In the hash method, I suggest you move handling of strings with %
>     into a separate method to keep the hash method small for common
>     case (no %).  Otherwise, there's a chance this method won't get
>     inlined anymore due to its (newly increased) size.
>
- Yep, will do.

>     Also, I realize toLower does the same thing, but why does toUpper
>     return an int whereas it's really a char? Might be cleaner to
>     declare return type as char and do the cast inside the method as
>     needed.
>
- I followed the format of toLower(). But I agree this way it will be 
cleaner.

Thanks a lot,
Kurchi


>     Thanks
>
>     Sent from my phone
>
>     On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
>     <kurchi.subhra.hazra at oracle.com
>     <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>
>         Hi,
>
>             According to RFC 3986[1], hexadecimal digits encoded by a
>         '%' should be case-insensitive, for example,%A2 and %a2 should be
>         considered equal. Although, URI.equals() does take this into
>         consideration, the implementation of URI.hashCode() does not and
>         returns different hashcodes for two URIs that are similar in
>         all respects except for the case of the percent-encoded
>         hexadecimal digits.
>         This fix attempts to construct a normalized string from the
>         string representing a component before calculating its hashCode.
>         I converted to upper case for the normalization(and not lower
>         case) as required by [1].
>
>             For testing the fix, I added an additional test scenario
>         to an existing test (jdk/test/java/net/URI/Test.java). While I
>         was there, I also made
>         minor changes to the test so that it does not produce rawtype
>         and other lint warnings.
>
>         Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>         Webrev: http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>         <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
>
>         URI.compareTo() still suffers from the same problem - I am not
>         sure if it should be dealt with as a separate bug.
>
>         Thanks,
>         Kurchi
>
>         [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/net-dev/attachments/20130108/3de15e09/attachment.html