Strings in Switch

Wed Dec 9 07:52:23 PST 2009

Ulf Zibis skrev:
> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 
> 'default:'
Yes, but I think the example highlighted the intricacies of the switch 
syntax. :)
> Additionally String#equals(Object object) could be optimized to 
> benefit from the hash codes:
Maintaining up to date hashcodes for every string will essentially force 
you to access every strings twice.
There are not enough equals operations on the strings to make up for 
this precalculation cost. Especially since a compare can be so much more 
efficient than a hashcode calculation. Besides, a lot of strings are 
already interned which means that you can always start by checking identity.

//Fredrik
>    int equalByHashThreshold = 2;
>
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String anotherString = (String)anObject;
>            int n = count;
>            if (n == anotherString.count &&
>                    (equalByHashThreshold == 0 || 
> --equalByHashThreshold == 0) &&
>                    (anotherString.equalByHashThreshold == 0 || 
> --anotherString.equalByHashThreshold == 0) &&
>                    hash() == anotherString.hash()) {
>                char v1[] = value;
>                char v2[] = anotherString.value;
>                int i = offset;
>                int j = anotherString.offset;
>                while (n-- != 0) {
>                    if (v1[i++] != v2[j++])
>                        return false;
>                }
>                return true;
>            }
>        }
>        return false;
>    }
>
>    public int hashCode() {
>        int h = hash;
>        if (h == 0) {
>            int off = offset;
>            char val[] = value;
>            int len = count;
>
>            for (int i = 0; i < len; i++) {
>                h = 31*h + val[off++];
>            }
>            hash = h;
>            equalByHashThreshold = 0;
>        }
>        return h;
>    }
>
> Alternative:
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String anotherString = (String)anObject;
>            int n = count;
>            if (n == anotherString.count &&
>                    hash != 0 && anotherString.hash != 0 &&
>                    hash() == anotherString.hash()) {
>                hash = -1;
>                anotherString.hash = -1;
>                char v1[] = value;
>                char v2[] = anotherString.value;
>                int i = offset;
>                int j = anotherString.offset;
>                while (n-- != 0) {
>                    if (v1[i++] != v2[j++])
>                        return false;
>                }
>                return true;
>            }
>        }
>        return false;
>    }
>
>    public int hashCode() {
>        int h = hash;
>        if (h == 0 || --h == 0) {
>            int off = offset;
>            char val[] = value;
>            int len = count;
>
>            for (int i = 0; i < len; i++) {
>                h = 31*h + val[off++];
>            }
>            hash = h;
>        }
>        return h;
>    }
>
>
> -Ulf
>
>
> Am 09.12.2009 11:53, Reinier Zwitserloot schrieb:
>> As I understand it, switch-in-strings is handled during the "lower" 
>> phase of
>> javac, which must desugar the string switch into legal java code.
>>
>> This makes a series of if/elseif cases actually impossible, due to 
>> switch's
>> unique behaviour in regards to fall-through.... I think. Let's try 
>> this out.
>> If we have:
>>
>> switch(someString) {
>>     case "Hello1":
>>        m1();
>>     default:
>>     case "Hello2":
>>        m2();
>>        break;
>>     case "Hello3":
>>        m3();
>> }
>>
>> how should this translate to a series of if statements, in a way that is
>> easier than the current nested double switch scenario? I don't really 
>> see a
>> way.
>>
>> There's a compromise, where the original string-to-integer conversion is
>> done with a series of ifs instead of a switch on hashCode. I don't 
>> really
>> care about removing the dependency on string's hashCode, but if this is
>> simpler, than, by all means. Until there's proof otherwise, I side with
>> Fredrik that a switch on hashCodes is not going to have a measurable
>> performance impact. As an example, the above would desugar to (with 
>> optional
>> switch on string's length during string-to-number conversion omitted. 
>> That
>> may actually be a good idea; it's straight forward and does have an 
>> obvious
>> performance benefit):
>>
>> int $unique;
>> if ("Hello1".equals(someString)) $unique = 0;
>> else if ("Hello2".equals(someString)) $unique = 1;
>> else if ("Hello3".equals(someString)) $unique = 2;
>> else $unique = 3;
>>
>> switch ($unique) {
>>     case 0:
>>        m1();
>>     case 3:
>>     case 1:
>>        m2();
>>        break;
>>     case 2:
>>        m3();
>> }
>>
>>
>> It avoids dependency on string hashcode (which, for the record, I do not
>> think needs to be avoided), and it's straightforward and simple for all
>> possible forms of string-in-switch that I can think of.
>>
>> --Reinier Zwitserloot
>>
>>
>>
>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
>> fredrik.ohrstrom at oracle.com> wrote:
>>
>>  
>>> This discussion reeks of premature optimization.... A tableswitch on
>>> arbitrary large numbers (aka hashcodes) must be compiled into a 
>>> sequence
>>> of compares anyway (at least on the x86 platform). If the tableswitch
>>> happens on a sequence of relatively consecutive  numbers, then the JVM
>>> can create a jump table. But for hashcodes, no way!
>>>
>>> Therefore a sequence of compares that work with the interned string
>>> pointers will be faster. If interning is slow (and/or wastes memory)
>>> then a sequence of tailored compares that work directly on the
>>> characters will be the fastest. For example:
>>>
>>> switch (s) {
>>>  case "Hello World"  : .... break;
>>>  case "Hello Wooot" : .... break;
>>>  default: ....
>>> }
>>>
>>> Could, for example, be compiled into the pseudo-c code:
>>>
>>> if (s.length == 11) {
>>>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>>> goto done; }
>>>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>>> goto done; }
>>> }
>>> /*default*/
>>>  ....
>>> done:
>>>
>>> Now should javac do this advanced analysis? No! Javac should only
>>> generate straight forward string compares and jumps that is a 
>>> relatively
>>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>>> can do the advanced optimizations if and when the code is actually
>>> determined to be a hot spot.
>>>
>>> //Fredrik
>>>
>>> Paul Benedict skrev:
>>>    
>>>> Jon,
>>>>
>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>>> <Jonathan.Gibbons at sun.com> wrote:
>>>>
>>>>      
>>>>> If hell were to freeze over, and String.hashCode were to change in 
>>>>> JDK
>>>>>         
>>> n, n
>>>    
>>>>>> =8, then javac could emit different code for Strings in switch,
>>>>>>           
>>> depending
>>>    
>>>>> on the value of -target.
>>>>>
>>>>>         
>>>> Regarding the state of hell, I don't think a compiler implementation
>>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>> knows what happens 10+ years from now), then class files using the
>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>
>>>> To safely emit hash results into byte code, I think you obviously need
>>>> to go the extra stretch and make a ruling on the algorithm never
>>>> changing. Isn't that just simply called being responsible?
>>>>
>>>> Paul
>>>>
>>>>
>>>>       
>>>
>>>     
>>
>>   
>