Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs
Chris Hegarty
chris.hegarty at oracle.com
Tue Jan 15 08:53:03 PST 2013
Looks good to me too. Thanks Kurchi.
-Chris.
On 01/15/2013 01:25 PM, Vitaly Davidovich wrote:
> Looks good. You can move ch, i, and index declarations into the for
> loop scope, but it's fine as-is.
>
> Thanks
>
> Sent from my phone
>
> On Jan 14, 2013 3:05 PM, "Kurchi Hazra" <kurchi.subhra.hazra at oracle.com
> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>
> Thank you all for your comments. Here is an updated webrev where
> URI.hashCode() calculates the hashCode itself
> instead of relying on String.hashCode():
>
> http://cr.openjdk.java.net/~khazra/7171415/webrev.01/
>
> Thanks,
> Kurchi
>
> On 08.01.2013 20:29, Vitaly Davidovich wrote:
>>
>> Yes, I also thought about range checks not being eliminated if
>> using charAt() but I guess that depends on how smart the JIT is -
>> if charAt is inlined there's technically enough info there for the
>> compiler to see that range checks aren't needed. Whether that
>> happens or not I haven't checked. toCharArray brings us back to
>> having allocations unless, again, EA helps out. I think a
>> microbenchmark would help here (along with verbose GC logging) to
>> see which is better if this is a concern.
>>
>> Why do you say you need to duplicate String.hashCode to be
>> consistent with what people are using already? As long as the hash
>> quality is at least as good as today (or not significantly worse)
>> shouldn't you be able to change the impl? If someone's relying on
>> specific value for some input then their code is broken. Besides,
>> doing toUpper will change the hash for URIs with % anyway.
>> Perhaps I misunderstood your point though ...
>>
>> Vitaly
>>
>> Sent from my phone
>>
>> On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra"
>> <kurchi.subhra.hazra at oracle.com
>> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>>
>> On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>>>
>>> Also, I'm not sure how hot this method is in practice but
>>> allocating StringBuilder seems a bit heavy (maybe it gets
>>> partially escape analyzed out though). Can you instead loop
>>> over all the chars in the string and build up the hash code
>>> as you go along? If you see a % then you handle next 2 chars
>>> specially, like you do now. Or are you trying to take
>>> advantage of String intrinsic support in the JIT? I guess if
>>> perf is a concern you can write a micro benchmark comparing
>>> the approaches ...
>>>
>> That did occur to me, but I guess we have to be consistent
>> with the value that people have already been using, and that
>> means I have
>> to duplicate the code in String.hashCode() (that is what the
>> original implementation was calling) - I was trying to avoid
>> that. Also,
>> String.hashCode() uses its internal char[] - whereas charAt()
>> will involve several additional bound checks - but
>> using toCharArray() may be better. Let me take another look at
>> this, and get back with another webrev.
>>
>>> Sent from my phone
>>>
>>> On Jan 8, 2013 9:45 PM, "Vitaly Davidovich"
>>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
>>>
>>> Hi Kurchi,
>>>
>>> In the hash method, I suggest you move handling of
>>> strings with % into a separate method to keep the hash
>>> method small for common case (no %). Otherwise, there's a
>>> chance this method won't get inlined anymore due to its
>>> (newly increased) size.
>>>
>> - Yep, will do.
>>
>>> Also, I realize toLower does the same thing, but why does
>>> toUpper return an int whereas it's really a char? Might
>>> be cleaner to declare return type as char and do the cast
>>> inside the method as needed.
>>>
>> - I followed the format of toLower(). But I agree this way it
>> will be cleaner.
>>
>> Thanks a lot,
>> Kurchi
>>
>>
>>
>>> Thanks
>>>
>>> Sent from my phone
>>>
>>> On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
>>> <kurchi.subhra.hazra at oracle.com
>>> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>>>
>>> Hi,
>>>
>>> According to RFC 3986[1], hexadecimal digits
>>> encoded by a '%' should be case-insensitive, for
>>> example,%A2 and %a2 should be
>>> considered equal. Although, URI.equals() does take
>>> this into consideration, the implementation of
>>> URI.hashCode() does not and
>>> returns different hashcodes for two URIs that are
>>> similar in all respects except for the case of the
>>> percent-encoded hexadecimal digits.
>>> This fix attempts to construct a normalized string
>>> from the string representing a component before
>>> calculating its hashCode.
>>> I converted to upper case for the normalization(and
>>> not lower case) as required by [1].
>>>
>>> For testing the fix, I added an additional test
>>> scenario to an existing test
>>> (jdk/test/java/net/URI/Test.java). While I was there,
>>> I also made
>>> minor changes to the test so that it does not produce
>>> rawtype and other lint warnings.
>>>
>>> Bug:
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>>> Webrev:
>>> http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>>> <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
>>>
>>> URI.compareTo() still suffers from the same problem -
>>> I am not sure if it should be dealt with as a
>>> separate bug.
>>>
>>> Thanks,
>>> Kurchi
>>>
>>> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>>>
>>
>
> --
> -Kurchi
>
More information about the net-dev
mailing list