Strings in Switch

Wed Dec 9 09:03:25 PST 2009

Le 09/12/2009 16:52, Fredrik Öhrström a écrit :
> Ulf Zibis skrev:
>    
>> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by
>> 'default:'
>>      
> Yes, but I think the example highlighted the intricacies of the switch
> syntax. :)
>    
>> Additionally String#equals(Object object) could be optimized to
>> benefit from the hash codes:
>>      
> Maintaining up to date hashcodes for every string will essentially force
> you to access every strings twice.
> There are not enough equals operations on the strings to make up for
> this precalculation cost. Especially since a compare can be so much more
> efficient than a hashcode calculation. Besides, a lot of strings are
> already interned which means that you can always start by checking identity.
>    

Hi Fredrik,
Don't forget that String.hashCode() are only calculated once
(in openjdk implementation), String is an non-mutable object.

Rémi

> //Fredrik
>    

>>     int equalByHashThreshold = 2;
>>
>>     public boolean equals(Object anObject) {
>>         if (this == anObject) {
>>             return true;
>>         }
>>         if (anObject instanceof String) {
>>             String anotherString = (String)anObject;
>>             int n = count;
>>             if (n == anotherString.count&&
>>                     (equalByHashThreshold == 0 ||
>> --equalByHashThreshold == 0)&&
>>                     (anotherString.equalByHashThreshold == 0 ||
>> --anotherString.equalByHashThreshold == 0)&&
>>                     hash() == anotherString.hash()) {
>>                 char v1[] = value;
>>                 char v2[] = anotherString.value;
>>                 int i = offset;
>>                 int j = anotherString.offset;
>>                 while (n-- != 0) {
>>                     if (v1[i++] != v2[j++])
>>                         return false;
>>                 }
>>                 return true;
>>             }
>>         }
>>         return false;
>>     }
>>
>>     public int hashCode() {
>>         int h = hash;
>>         if (h == 0) {
>>             int off = offset;
>>             char val[] = value;
>>             int len = count;
>>
>>             for (int i = 0; i<  len; i++) {
>>                 h = 31*h + val[off++];
>>             }
>>             hash = h;
>>             equalByHashThreshold = 0;
>>         }
>>         return h;
>>     }
>>
>> Alternative:
>>     public boolean equals(Object anObject) {
>>         if (this == anObject) {
>>             return true;
>>         }
>>         if (anObject instanceof String) {
>>             String anotherString = (String)anObject;
>>             int n = count;
>>             if (n == anotherString.count&&
>>                     hash != 0&&  anotherString.hash != 0&&
>>                     hash() == anotherString.hash()) {
>>                 hash = -1;
>>                 anotherString.hash = -1;
>>                 char v1[] = value;
>>                 char v2[] = anotherString.value;
>>                 int i = offset;
>>                 int j = anotherString.offset;
>>                 while (n-- != 0) {
>>                     if (v1[i++] != v2[j++])
>>                         return false;
>>                 }
>>                 return true;
>>             }
>>         }
>>         return false;
>>     }
>>
>>     public int hashCode() {
>>         int h = hash;
>>         if (h == 0 || --h == 0) {
>>             int off = offset;
>>             char val[] = value;
>>             int len = count;
>>
>>             for (int i = 0; i<  len; i++) {
>>                 h = 31*h + val[off++];
>>             }
>>             hash = h;
>>         }
>>         return h;
>>     }
>>
>>
>> -Ulf
>>
>>
>> Am 09.12.2009 11:53, Reinier Zwitserloot schrieb:
>>      
>>> As I understand it, switch-in-strings is handled during the "lower"
>>> phase of
>>> javac, which must desugar the string switch into legal java code.
>>>
>>> This makes a series of if/elseif cases actually impossible, due to
>>> switch's
>>> unique behaviour in regards to fall-through.... I think. Let's try
>>> this out.
>>> If we have:
>>>
>>> switch(someString) {
>>>      case "Hello1":
>>>         m1();
>>>      default:
>>>      case "Hello2":
>>>         m2();
>>>         break;
>>>      case "Hello3":
>>>         m3();
>>> }
>>>
>>> how should this translate to a series of if statements, in a way that is
>>> easier than the current nested double switch scenario? I don't really
>>> see a
>>> way.
>>>
>>> There's a compromise, where the original string-to-integer conversion is
>>> done with a series of ifs instead of a switch on hashCode. I don't
>>> really
>>> care about removing the dependency on string's hashCode, but if this is
>>> simpler, than, by all means. Until there's proof otherwise, I side with
>>> Fredrik that a switch on hashCodes is not going to have a measurable
>>> performance impact. As an example, the above would desugar to (with
>>> optional
>>> switch on string's length during string-to-number conversion omitted.
>>> That
>>> may actually be a good idea; it's straight forward and does have an
>>> obvious
>>> performance benefit):
>>>
>>> int $unique;
>>> if ("Hello1".equals(someString)) $unique = 0;
>>> else if ("Hello2".equals(someString)) $unique = 1;
>>> else if ("Hello3".equals(someString)) $unique = 2;
>>> else $unique = 3;
>>>
>>> switch ($unique) {
>>>      case 0:
>>>         m1();
>>>      case 3:
>>>      case 1:
>>>         m2();
>>>         break;
>>>      case 2:
>>>         m3();
>>> }
>>>
>>>
>>> It avoids dependency on string hashcode (which, for the record, I do not
>>> think needs to be avoided), and it's straightforward and simple for all
>>> possible forms of string-in-switch that I can think of.
>>>
>>> --Reinier Zwitserloot
>>>
>>>
>>>
>>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström<
>>> fredrik.ohrstrom at oracle.com>  wrote:
>>>
>>>
>>>        
>>>> This discussion reeks of premature optimization.... A tableswitch on
>>>> arbitrary large numbers (aka hashcodes) must be compiled into a
>>>> sequence
>>>> of compares anyway (at least on the x86 platform). If the tableswitch
>>>> happens on a sequence of relatively consecutive  numbers, then the JVM
>>>> can create a jump table. But for hashcodes, no way!
>>>>
>>>> Therefore a sequence of compares that work with the interned string
>>>> pointers will be faster. If interning is slow (and/or wastes memory)
>>>> then a sequence of tailored compares that work directly on the
>>>> characters will be the fastest. For example:
>>>>
>>>> switch (s) {
>>>>   case "Hello World"  : .... break;
>>>>   case "Hello Wooot" : .... break;
>>>>   default: ....
>>>> }
>>>>
>>>> Could, for example, be compiled into the pseudo-c code:
>>>>
>>>> if (s.length == 11) {
>>>>   if (s.chars[8] == L'r'&&  !wcscmp(s.chars,  L"Hello World")) { ...;
>>>> goto done; }
>>>>   if (s.chars[8] == L'o'&&  !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>>>> goto done; }
>>>> }
>>>> /*default*/
>>>>   ....
>>>> done:
>>>>
>>>> Now should javac do this advanced analysis? No! Javac should only
>>>> generate straight forward string compares and jumps that is a
>>>> relatively
>>>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>>>> can do the advanced optimizations if and when the code is actually
>>>> determined to be a hot spot.
>>>>
>>>> //Fredrik
>>>>
>>>> Paul Benedict skrev:
>>>>
>>>>          
>>>>> Jon,
>>>>>
>>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>>>> <Jonathan.Gibbons at sun.com>  wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> If hell were to freeze over, and String.hashCode were to change in
>>>>>> JDK
>>>>>>
>>>>>>              
>>>> n, n
>>>>
>>>>          
>>>>>>> =8, then javac could emit different code for Strings in switch,
>>>>>>>
>>>>>>>                
>>>> depending
>>>>
>>>>          
>>>>>> on the value of -target.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Regarding the state of hell, I don't think a compiler implementation
>>>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>>> knows what happens 10+ years from now), then class files using the
>>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>>
>>>>> To safely emit hash results into byte code, I think you obviously need
>>>>> to go the extra stretch and make a ruling on the algorithm never
>>>>> changing. Isn't that just simply called being responsible?
>>>>>
>>>>> Paul
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>
>>>>          
>>>
>>>        
>>      
>
>