Strings in Switch

Wed Dec 9 09:58:14 PST 2009

Alternative (correction):
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count) {
                if (hash == 0)
                    hash = -1;        // mark 1st invokation of equals()
                else if (anotherString.hash == 0)
                    anotherString.hash == -1; // mark 1st invokation "
                // on 2nd invocation now first try hash code comparision:
                else if (hash() != anotherString.hash())
                    return false;
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0)
                    if (v1[i++] != v2[j++])
                        return false;
                return true;

            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0 || ++h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }


-Ulf


Am 09.12.2009 13:50, Ulf Zibis schrieb:
> +1
>
> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 
> 'default:'
>
> Additionally String#equals(Object object) could be optimized to benefit 
> from the hash codes:
>     int equalByHashThreshold = 2;
>
>     public boolean equals(Object anObject) {
>         if (this == anObject) {
>             return true;
>         }
>         if (anObject instanceof String) {
>             String anotherString = (String)anObject;
>             int n = count;
>             if (n == anotherString.count &&
>                     (equalByHashThreshold == 0 || --equalByHashThreshold 
> == 0) &&
>                     (anotherString.equalByHashThreshold == 0 || 
> --anotherString.equalByHashThreshold == 0) &&
>                     hash() == anotherString.hash()) {
>                 char v1[] = value;
>                 char v2[] = anotherString.value;
>                 int i = offset;
>                 int j = anotherString.offset;
>                 while (n-- != 0) {
>                     if (v1[i++] != v2[j++])
>                         return false;
>                 }
>                 return true;
>             }
>         }
>         return false;
>     }
>
>     public int hashCode() {
>         int h = hash;
>         if (h == 0) {
>             int off = offset;
>             char val[] = value;
>             int len = count;
>
>             for (int i = 0; i < len; i++) {
>                 h = 31*h + val[off++];
>             }
>             hash = h;
>             equalByHashThreshold = 0;
>         }
>         return h;
>     }
>
> Alternative:
>     public boolean equals(Object anObject) {
>         if (this == anObject) {
>             return true;
>         }
>         if (anObject instanceof String) {
>             String anotherString = (String)anObject;
>             int n = count;
>             if (n == anotherString.count &&
>                     hash != 0 && anotherString.hash != 0 &&
>                     hash() == anotherString.hash()) {
>                 hash = -1;
>                 anotherString.hash = -1;
>                 char v1[] = value;
>                 char v2[] = anotherString.value;
>                 int i = offset;
>                 int j = anotherString.offset;
>                 while (n-- != 0) {
>                     if (v1[i++] != v2[j++])
>                         return false;
>                 }
>                 return true;
>             }
>         }
>         return false;
>     }
>
>     public int hashCode() {
>         int h = hash;
>         if (h == 0 || --h == 0) {
>             int off = offset;
>             char val[] = value;
>             int len = count;
>
>             for (int i = 0; i < len; i++) {
>                 h = 31*h + val[off++];
>             }
>             hash = h;
>         }
>         return h;
>     }
>
>
> -Ulf
>
>
> Am 09.12.2009 11:53, Reinier Zwitserloot schrieb:
>   
>> As I understand it, switch-in-strings is handled during the "lower" phase of
>> javac, which must desugar the string switch into legal java code.
>>
>> This makes a series of if/elseif cases actually impossible, due to switch's
>> unique behaviour in regards to fall-through.... I think. Let's try this out.
>> If we have:
>>
>> switch(someString) {
>>     case "Hello1":
>>        m1();
>>     default:
>>     case "Hello2":
>>        m2();
>>        break;
>>     case "Hello3":
>>        m3();
>> }
>>
>> how should this translate to a series of if statements, in a way that is
>> easier than the current nested double switch scenario? I don't really see a
>> way.
>>
>> There's a compromise, where the original string-to-integer conversion is
>> done with a series of ifs instead of a switch on hashCode. I don't really
>> care about removing the dependency on string's hashCode, but if this is
>> simpler, than, by all means. Until there's proof otherwise, I side with
>> Fredrik that a switch on hashCodes is not going to have a measurable
>> performance impact. As an example, the above would desugar to (with optional
>> switch on string's length during string-to-number conversion omitted. That
>> may actually be a good idea; it's straight forward and does have an obvious
>> performance benefit):
>>
>> int $unique;
>> if ("Hello1".equals(someString)) $unique = 0;
>> else if ("Hello2".equals(someString)) $unique = 1;
>> else if ("Hello3".equals(someString)) $unique = 2;
>> else $unique = 3;
>>
>> switch ($unique) {
>>     case 0:
>>        m1();
>>     case 3:
>>     case 1:
>>        m2();
>>        break;
>>     case 2:
>>        m3();
>> }
>>
>>
>> It avoids dependency on string hashcode (which, for the record, I do not
>> think needs to be avoided), and it's straightforward and simple for all
>> possible forms of string-in-switch that I can think of.
>>
>> --Reinier Zwitserloot
>>
>>
>>
>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
>> fredrik.ohrstrom at oracle.com> wrote:
>>
>>   
>>     
>>> This discussion reeks of premature optimization.... A tableswitch on
>>> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
>>> of compares anyway (at least on the x86 platform). If the tableswitch
>>> happens on a sequence of relatively consecutive  numbers, then the JVM
>>> can create a jump table. But for hashcodes, no way!
>>>
>>> Therefore a sequence of compares that work with the interned string
>>> pointers will be faster. If interning is slow (and/or wastes memory)
>>> then a sequence of tailored compares that work directly on the
>>> characters will be the fastest. For example:
>>>
>>> switch (s) {
>>>  case "Hello World"  : .... break;
>>>  case "Hello Wooot" : .... break;
>>>  default: ....
>>> }
>>>
>>> Could, for example, be compiled into the pseudo-c code:
>>>
>>> if (s.length == 11) {
>>>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>>> goto done; }
>>>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>>> goto done; }
>>> }
>>> /*default*/
>>>  ....
>>> done:
>>>
>>> Now should javac do this advanced analysis? No! Javac should only
>>> generate straight forward string compares and jumps that is a relatively
>>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>>> can do the advanced optimizations if and when the code is actually
>>> determined to be a hot spot.
>>>
>>> //Fredrik
>>>
>>> Paul Benedict skrev:
>>>     
>>>       
>>>> Jon,
>>>>
>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>>> <Jonathan.Gibbons at sun.com> wrote:
>>>>
>>>>       
>>>>         
>>>>> If hell were to freeze over, and String.hashCode were to change in JDK
>>>>>         
>>>>>           
>>> n, n
>>>     
>>>       
>>>>>> =8, then javac could emit different code for Strings in switch,
>>>>>>           
>>>>>>             
>>> depending
>>>     
>>>       
>>>>> on the value of -target.
>>>>>
>>>>>         
>>>>>           
>>>> Regarding the state of hell, I don't think a compiler implementation
>>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>> knows what happens 10+ years from now), then class files using the
>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>
>>>> To safely emit hash results into byte code, I think you obviously need
>>>> to go the extra stretch and make a ruling on the algorithm never
>>>> changing. Isn't that just simply called being responsible?
>>>>
>>>> Paul
>>>>
>>>>
>>>>       
>>>>         
>>>     
>>>       
>>   
>>     
>
>
>