Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs
Kurchi Subhra Hazra
kurchi.subhra.hazra at oracle.com
Tue Jan 8 20:04:01 PST 2013
On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>
> Also, I'm not sure how hot this method is in practice but allocating
> StringBuilder seems a bit heavy (maybe it gets partially escape
> analyzed out though). Can you instead loop over all the chars in the
> string and build up the hash code as you go along? If you see a % then
> you handle next 2 chars specially, like you do now. Or are you trying
> to take advantage of String intrinsic support in the JIT? I guess if
> perf is a concern you can write a micro benchmark comparing the
> approaches ...
>
That did occur to me, but I guess we have to be consistent with the
value that people have already been using, and that means I have
to duplicate the code in String.hashCode() (that is what the original
implementation was calling) - I was trying to avoid that. Also,
String.hashCode() uses its internal char[] - whereas charAt() will
involve several additional bound checks - but
using toCharArray() may be better. Let me take another look at this, and
get back with another webrev.
> Sent from my phone
>
> On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vitalyd at gmail.com
> <mailto:vitalyd at gmail.com>> wrote:
>
> Hi Kurchi,
>
> In the hash method, I suggest you move handling of strings with %
> into a separate method to keep the hash method small for common
> case (no %). Otherwise, there's a chance this method won't get
> inlined anymore due to its (newly increased) size.
>
- Yep, will do.
> Also, I realize toLower does the same thing, but why does toUpper
> return an int whereas it's really a char? Might be cleaner to
> declare return type as char and do the cast inside the method as
> needed.
>
- I followed the format of toLower(). But I agree this way it will be
cleaner.
Thanks a lot,
Kurchi
> Thanks
>
> Sent from my phone
>
> On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
> <kurchi.subhra.hazra at oracle.com
> <mailto:kurchi.subhra.hazra at oracle.com>> wrote:
>
> Hi,
>
> According to RFC 3986[1], hexadecimal digits encoded by a
> '%' should be case-insensitive, for example,%A2 and %a2 should be
> considered equal. Although, URI.equals() does take this into
> consideration, the implementation of URI.hashCode() does not and
> returns different hashcodes for two URIs that are similar in
> all respects except for the case of the percent-encoded
> hexadecimal digits.
> This fix attempts to construct a normalized string from the
> string representing a component before calculating its hashCode.
> I converted to upper case for the normalization(and not lower
> case) as required by [1].
>
> For testing the fix, I added an additional test scenario
> to an existing test (jdk/test/java/net/URI/Test.java). While I
> was there, I also made
> minor changes to the test so that it does not produce rawtype
> and other lint warnings.
>
> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
> Webrev: http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
> <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
>
> URI.compareTo() still suffers from the same problem - I am not
> sure if it should be dealt with as a separate bug.
>
> Thanks,
> Kurchi
>
> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/net-dev/attachments/20130108/3de15e09/attachment.html
More information about the net-dev
mailing list