Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Mon Jan 14 12:04:59 PST 2013

Thank you all for your comments. Here is an updated webrev where 
URI.hashCode() calculates the hashCode itself
instead of relying on String.hashCode():

http://cr.openjdk.java.net/~khazra/7171415/webrev.01/

Thanks,
Kurchi

On 08.01.2013 20:29, Vitaly Davidovich wrote:
>
> Yes, I also thought about range checks not being eliminated if using 
> charAt() but I guess that depends on how smart the JIT is - if charAt 
> is inlined there's technically enough info there for the compiler to 
> see that range checks aren't needed.  Whether that happens or not I 
> haven't checked.  toCharArray brings us back to having allocations 
> unless, again, EA helps out.  I think a microbenchmark would help here 
> (along with verbose GC logging) to see which is better if this is a 
> concern.
>
> Why do you say you need to duplicate String.hashCode to be consistent 
> with what people are using already? As long as the hash quality is at 
> least as good as today (or not significantly worse) shouldn't you be 
> able to change the impl? If someone's relying on specific value for 
> some input then their code is broken.  Besides, doing toUpper will 
> change the hash for URIs with % anyway.  Perhaps I misunderstood your 
> point though ...
>
> Vitaly
>
> Sent from my phone
>
> On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra" 
> <kurchi.subhra.hazra at oracle.com 
> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>
>     On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>>
>>     Also, I'm not sure how hot this method is in practice but
>>     allocating StringBuilder seems a bit heavy (maybe it gets
>>     partially escape analyzed out though).  Can you instead loop over
>>     all the chars in the string and build up the hash code as you go
>>     along? If you see a % then you handle next 2 chars specially,
>>     like you do now.  Or are you trying to take advantage of String
>>     intrinsic support in the JIT? I guess if perf is a concern you
>>     can write a micro benchmark comparing the approaches ...
>>
>     That did occur to me, but I guess we have to be consistent with
>     the value that people have already been using, and that means I have
>     to duplicate the code in String.hashCode() (that is what the
>     original implementation was calling) - I was trying to avoid that.
>     Also,
>     String.hashCode() uses its internal char[] - whereas charAt() will
>     involve several additional bound checks - but
>     using toCharArray() may be better. Let me take another look at
>     this, and get back with another webrev.
>
>>     Sent from my phone
>>
>>     On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vitalyd at gmail.com
>>     <mailto:vitalyd at gmail.com>> wrote:
>>
>>         Hi Kurchi,
>>
>>         In the hash method, I suggest you move handling of strings
>>         with % into a separate method to keep the hash method small
>>         for common case (no %).  Otherwise, there's a chance this
>>         method won't get inlined anymore due to its (newly increased)
>>         size.
>>
>     - Yep, will do.
>
>>         Also, I realize toLower does the same thing, but why does
>>         toUpper return an int whereas it's really a char? Might be
>>         cleaner to declare return type as char and do the cast inside
>>         the method as needed.
>>
>     - I followed the format of toLower(). But I agree this way it will
>     be cleaner.
>
>     Thanks a lot,
>     Kurchi
>
>
>
>>         Thanks
>>
>>         Sent from my phone
>>
>>         On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
>>         <kurchi.subhra.hazra at oracle.com
>>         <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>>
>>             Hi,
>>
>>                 According to RFC 3986[1], hexadecimal digits encoded
>>             by a '%' should be case-insensitive, for example,%A2 and
>>             %a2 should be
>>             considered equal. Although, URI.equals() does take this
>>             into consideration, the implementation of URI.hashCode()
>>             does not and
>>             returns different hashcodes for two URIs that are similar
>>             in all respects except for the case of the
>>             percent-encoded hexadecimal digits.
>>             This fix attempts to construct a normalized string from
>>             the string representing a component before calculating
>>             its hashCode.
>>             I converted to upper case for the normalization(and not
>>             lower case) as required by [1].
>>
>>                 For testing the fix, I added an additional test
>>             scenario to an existing test
>>             (jdk/test/java/net/URI/Test.java). While I was there, I
>>             also made
>>             minor changes to the test so that it does not produce
>>             rawtype and other lint warnings.
>>
>>             Bug:
>>             http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>>             Webrev:
>>             http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>>             <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
>>
>>             URI.compareTo() still suffers from the same problem - I
>>             am not sure if it should be dealt with as a separate bug.
>>
>>             Thanks,
>>             Kurchi
>>
>>             [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>>
>

-- 
-Kurchi

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/net-dev/attachments/20130114/cbdb4021/attachment.html