Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs
Kurchi Hazra
kurchi.subhra.hazra at oracle.com
Mon Jan 14 12:04:59 PST 2013
Thank you all for your comments. Here is an updated webrev where
URI.hashCode() calculates the hashCode itself
instead of relying on String.hashCode():
http://cr.openjdk.java.net/~khazra/7171415/webrev.01/
Thanks,
Kurchi
On 08.01.2013 20:29, Vitaly Davidovich wrote:
>
> Yes, I also thought about range checks not being eliminated if using
> charAt() but I guess that depends on how smart the JIT is - if charAt
> is inlined there's technically enough info there for the compiler to
> see that range checks aren't needed. Whether that happens or not I
> haven't checked. toCharArray brings us back to having allocations
> unless, again, EA helps out. I think a microbenchmark would help here
> (along with verbose GC logging) to see which is better if this is a
> concern.
>
> Why do you say you need to duplicate String.hashCode to be consistent
> with what people are using already? As long as the hash quality is at
> least as good as today (or not significantly worse) shouldn't you be
> able to change the impl? If someone's relying on specific value for
> some input then their code is broken. Besides, doing toUpper will
> change the hash for URIs with % anyway. Perhaps I misunderstood your
> point though ...
>
> Vitaly
>
> Sent from my phone
>
> On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra"
> <kurchi.subhra.hazra at oracle.com
> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>
> On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>>
>> Also, I'm not sure how hot this method is in practice but
>> allocating StringBuilder seems a bit heavy (maybe it gets
>> partially escape analyzed out though). Can you instead loop over
>> all the chars in the string and build up the hash code as you go
>> along? If you see a % then you handle next 2 chars specially,
>> like you do now. Or are you trying to take advantage of String
>> intrinsic support in the JIT? I guess if perf is a concern you
>> can write a micro benchmark comparing the approaches ...
>>
> That did occur to me, but I guess we have to be consistent with
> the value that people have already been using, and that means I have
> to duplicate the code in String.hashCode() (that is what the
> original implementation was calling) - I was trying to avoid that.
> Also,
> String.hashCode() uses its internal char[] - whereas charAt() will
> involve several additional bound checks - but
> using toCharArray() may be better. Let me take another look at
> this, and get back with another webrev.
>
>> Sent from my phone
>>
>> On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vitalyd at gmail.com
>> <mailto:vitalyd at gmail.com>> wrote:
>>
>> Hi Kurchi,
>>
>> In the hash method, I suggest you move handling of strings
>> with % into a separate method to keep the hash method small
>> for common case (no %). Otherwise, there's a chance this
>> method won't get inlined anymore due to its (newly increased)
>> size.
>>
> - Yep, will do.
>
>> Also, I realize toLower does the same thing, but why does
>> toUpper return an int whereas it's really a char? Might be
>> cleaner to declare return type as char and do the cast inside
>> the method as needed.
>>
> - I followed the format of toLower(). But I agree this way it will
> be cleaner.
>
> Thanks a lot,
> Kurchi
>
>
>
>> Thanks
>>
>> Sent from my phone
>>
>> On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
>> <kurchi.subhra.hazra at oracle.com
>> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>>
>> Hi,
>>
>> According to RFC 3986[1], hexadecimal digits encoded
>> by a '%' should be case-insensitive, for example,%A2 and
>> %a2 should be
>> considered equal. Although, URI.equals() does take this
>> into consideration, the implementation of URI.hashCode()
>> does not and
>> returns different hashcodes for two URIs that are similar
>> in all respects except for the case of the
>> percent-encoded hexadecimal digits.
>> This fix attempts to construct a normalized string from
>> the string representing a component before calculating
>> its hashCode.
>> I converted to upper case for the normalization(and not
>> lower case) as required by [1].
>>
>> For testing the fix, I added an additional test
>> scenario to an existing test
>> (jdk/test/java/net/URI/Test.java). While I was there, I
>> also made
>> minor changes to the test so that it does not produce
>> rawtype and other lint warnings.
>>
>> Bug:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>> Webrev:
>> http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>> <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
>>
>> URI.compareTo() still suffers from the same problem - I
>> am not sure if it should be dealt with as a separate bug.
>>
>> Thanks,
>> Kurchi
>>
>> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>>
>
--
-Kurchi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/net-dev/attachments/20130114/cbdb4021/attachment.html
More information about the net-dev
mailing list