Strings in Switch

Wed Dec 9 04:50:03 PST 2009

+1

... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 
'default:'

Additionally String#equals(Object object) could be optimized to benefit 
from the hash codes:
    int equalByHashThreshold = 2;

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count &&
                    (equalByHashThreshold == 0 || --equalByHashThreshold 
== 0) &&
                    (anotherString.equalByHashThreshold == 0 || 
--anotherString.equalByHashThreshold == 0) &&
                    hash() == anotherString.hash()) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
            equalByHashThreshold = 0;
        }
        return h;
    }

Alternative:
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count &&
                    hash != 0 && anotherString.hash != 0 &&
                    hash() == anotherString.hash()) {
                hash = -1;
                anotherString.hash = -1;
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0 || --h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

-Ulf

Am 09.12.2009 11:53, Reinier Zwitserloot schrieb:
> As I understand it, switch-in-strings is handled during the "lower" phase of
> javac, which must desugar the string switch into legal java code.
>
> This makes a series of if/elseif cases actually impossible, due to switch's
> unique behaviour in regards to fall-through.... I think. Let's try this out.
> If we have:
>
> switch(someString) {
>     case "Hello1":
>        m1();
>     default:
>     case "Hello2":
>        m2();
>        break;
>     case "Hello3":
>        m3();
> }
>
> how should this translate to a series of if statements, in a way that is
> easier than the current nested double switch scenario? I don't really see a
> way.
>
> There's a compromise, where the original string-to-integer conversion is
> done with a series of ifs instead of a switch on hashCode. I don't really
> care about removing the dependency on string's hashCode, but if this is
> simpler, than, by all means. Until there's proof otherwise, I side with
> Fredrik that a switch on hashCodes is not going to have a measurable
> performance impact. As an example, the above would desugar to (with optional
> switch on string's length during string-to-number conversion omitted. That
> may actually be a good idea; it's straight forward and does have an obvious
> performance benefit):
>
> int $unique;
> if ("Hello1".equals(someString)) $unique = 0;
> else if ("Hello2".equals(someString)) $unique = 1;
> else if ("Hello3".equals(someString)) $unique = 2;
> else $unique = 3;
>
> switch ($unique) {
>     case 0:
>        m1();
>     case 3:
>     case 1:
>        m2();
>        break;
>     case 2:
>        m3();
> }
>
>
> It avoids dependency on string hashcode (which, for the record, I do not
> think needs to be avoided), and it's straightforward and simple for all
> possible forms of string-in-switch that I can think of.
>
> --Reinier Zwitserloot
>
>
>
> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
> fredrik.ohrstrom at oracle.com> wrote:
>
>   
>> This discussion reeks of premature optimization.... A tableswitch on
>> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
>> of compares anyway (at least on the x86 platform). If the tableswitch
>> happens on a sequence of relatively consecutive  numbers, then the JVM
>> can create a jump table. But for hashcodes, no way!
>>
>> Therefore a sequence of compares that work with the interned string
>> pointers will be faster. If interning is slow (and/or wastes memory)
>> then a sequence of tailored compares that work directly on the
>> characters will be the fastest. For example:
>>
>> switch (s) {
>>  case "Hello World"  : .... break;
>>  case "Hello Wooot" : .... break;
>>  default: ....
>> }
>>
>> Could, for example, be compiled into the pseudo-c code:
>>
>> if (s.length == 11) {
>>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>> goto done; }
>>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>> goto done; }
>> }
>> /*default*/
>>  ....
>> done:
>>
>> Now should javac do this advanced analysis? No! Javac should only
>> generate straight forward string compares and jumps that is a relatively
>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>> can do the advanced optimizations if and when the code is actually
>> determined to be a hot spot.
>>
>> //Fredrik
>>
>> Paul Benedict skrev:
>>     
>>> Jon,
>>>
>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>> <Jonathan.Gibbons at sun.com> wrote:
>>>
>>>       
>>>> If hell were to freeze over, and String.hashCode were to change in JDK
>>>>         
>> n, n
>>     
>>>>> =8, then javac could emit different code for Strings in switch,
>>>>>           
>> depending
>>     
>>>> on the value of -target.
>>>>
>>>>         
>>> Regarding the state of hell, I don't think a compiler implementation
>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>> knows what happens 10+ years from now), then class files using the
>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>> longer match the runtime hashes and execution would be unpredictable.
>>>
>>> To safely emit hash results into byte code, I think you obviously need
>>> to go the extra stretch and make a ruling on the algorithm never
>>> changing. Isn't that just simply called being responsible?
>>>
>>> Paul
>>>
>>>
>>>       
>>
>>     
>
>