Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Tue Jan 15 08:53:03 PST 2013

Looks good to me too. Thanks Kurchi.

-Chris.

On 01/15/2013 01:25 PM, Vitaly Davidovich wrote:
> Looks good.  You can move ch, i, and index declarations into the for
> loop scope, but it's fine as-is.
>
> Thanks
>
> Sent from my phone
>
> On Jan 14, 2013 3:05 PM, "Kurchi Hazra" <kurchi.subhra.hazra at oracle.com
> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>
>     Thank you all for your comments. Here is an updated webrev where
>     URI.hashCode() calculates the hashCode itself
>     instead of relying on String.hashCode():
>
>     http://cr.openjdk.java.net/~khazra/7171415/webrev.01/
>
>     Thanks,
>     Kurchi
>
>     On 08.01.2013 20:29, Vitaly Davidovich wrote:
>>
>>     Yes, I also thought about range checks not being eliminated if
>>     using charAt() but I guess that depends on how smart the JIT is -
>>     if charAt is inlined there's technically enough info there for the
>>     compiler to see that range checks aren't needed.  Whether that
>>     happens or not I haven't checked. toCharArray brings us back to
>>     having allocations unless, again, EA helps out.  I think a
>>     microbenchmark would help here (along with verbose GC logging) to
>>     see which is better if this is a concern.
>>
>>     Why do you say you need to duplicate String.hashCode to be
>>     consistent with what people are using already? As long as the hash
>>     quality is at least as good as today (or not significantly worse)
>>     shouldn't you be able to change the impl? If someone's relying on
>>     specific value for some input then their code is broken.  Besides,
>>     doing toUpper will change the hash for URIs with % anyway.
>>     Perhaps I misunderstood your point though ...
>>
>>     Vitaly
>>
>>     Sent from my phone
>>
>>     On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra"
>>     <kurchi.subhra.hazra at oracle.com
>>     <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>>
>>         On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>>>
>>>         Also, I'm not sure how hot this method is in practice but
>>>         allocating StringBuilder seems a bit heavy (maybe it gets
>>>         partially escape analyzed out though). Can you instead loop
>>>         over all the chars in the string and build up the hash code
>>>         as you go along? If you see a % then you handle next 2 chars
>>>         specially, like you do now.  Or are you trying to take
>>>         advantage of String intrinsic support in the JIT? I guess if
>>>         perf is a concern you can write a micro benchmark comparing
>>>         the approaches ...
>>>
>>         That did occur to me, but I guess we have to be consistent
>>         with the value that people have already been using, and that
>>         means I have
>>         to duplicate the code in String.hashCode() (that is what the
>>         original implementation was calling) - I was trying to avoid
>>         that. Also,
>>         String.hashCode() uses its internal char[] - whereas charAt()
>>         will involve several additional bound checks - but
>>         using toCharArray() may be better. Let me take another look at
>>         this, and get back with another webrev.
>>
>>>         Sent from my phone
>>>
>>>         On Jan 8, 2013 9:45 PM, "Vitaly Davidovich"
>>>         <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
>>>
>>>             Hi Kurchi,
>>>
>>>             In the hash method, I suggest you move handling of
>>>             strings with % into a separate method to keep the hash
>>>             method small for common case (no %). Otherwise, there's a
>>>             chance this method won't get inlined anymore due to its
>>>             (newly increased) size.
>>>
>>         - Yep, will do.
>>
>>>             Also, I realize toLower does the same thing, but why does
>>>             toUpper return an int whereas it's really a char? Might
>>>             be cleaner to declare return type as char and do the cast
>>>             inside the method as needed.
>>>
>>         - I followed the format of toLower(). But I agree this way it
>>         will be cleaner.
>>
>>         Thanks a lot,
>>         Kurchi
>>
>>
>>
>>>             Thanks
>>>
>>>             Sent from my phone
>>>
>>>             On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
>>>             <kurchi.subhra.hazra at oracle.com
>>>             <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>>>
>>>                 Hi,
>>>
>>>                     According to RFC 3986[1], hexadecimal digits
>>>                 encoded by a '%' should be case-insensitive, for
>>>                 example,%A2 and %a2 should be
>>>                 considered equal. Although, URI.equals() does take
>>>                 this into consideration, the implementation of
>>>                 URI.hashCode() does not and
>>>                 returns different hashcodes for two URIs that are
>>>                 similar in all respects except for the case of the
>>>                 percent-encoded hexadecimal digits.
>>>                 This fix attempts to construct a normalized string
>>>                 from the string representing a component before
>>>                 calculating its hashCode.
>>>                 I converted to upper case for the normalization(and
>>>                 not lower case) as required by [1].
>>>
>>>                     For testing the fix, I added an additional test
>>>                 scenario to an existing test
>>>                 (jdk/test/java/net/URI/Test.java). While I was there,
>>>                 I also made
>>>                 minor changes to the test so that it does not produce
>>>                 rawtype and other lint warnings.
>>>
>>>                 Bug:
>>>                 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>>>                 Webrev:
>>>                 http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>>>                 <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
>>>
>>>                 URI.compareTo() still suffers from the same problem -
>>>                 I am not sure if it should be dealt with as a
>>>                 separate bug.
>>>
>>>                 Thanks,
>>>                 Kurchi
>>>
>>>                 [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>>>
>>
>
>     --
>     -Kurchi
>