Strings in Switch

Wed Dec 9 07:34:38 PST 2009

Reinier Zwitserloot skrev:
> int $unique;
> if ("Hello1".equals(someString)) $unique = 0;
> else if ("Hello2".equals(someString)) $unique = 1;
> else if ("Hello3".equals(someString)) $unique = 2;
> else $unique = 3;
>
> switch ($unique) {
>     case 0:
>        m1();
>     case 3:
>     case 1:
>        m2();
>        break;
>     case 2:
>        m3();
> }
>
>
> It avoids dependency on string hashcode (which, for the record, I do 
> not think needs to be avoided), and it's straightforward and simple 
> for all possible forms of string-in-switch that I can think of.
Yes, this is much better. This is much easier to understand and the 
pattern is trivial to catch in the JVM. There are  many opportunities 
for the compiler (even without strings-in-switch awareness) to optimize 
this sequence of compares and it avoids forcing a full calculation of a 
hash code that has to traverse the full string. Remember that a string 
compare can terminate early, a hashcode calculation cannot. Also a 
string compare works on as large blocks as possible per iteration 
(8bytes in 64 bit machines, even 16byte blocks with SSE2). If the JVM 
decides that it would be beneficial to use its own internal hashcodes to 
optimize the code, then it can do so.

//Fredrik
> --Reinier Zwitserloot
>
>
>
> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström 
> <fredrik.ohrstrom at oracle.com <mailto:fredrik.ohrstrom at oracle.com>> wrote:
>
>     This discussion reeks of premature optimization.... A tableswitch on
>     arbitrary large numbers (aka hashcodes) must be compiled into a
>     sequence
>     of compares anyway (at least on the x86 platform). If the tableswitch
>     happens on a sequence of relatively consecutive  numbers, then the JVM
>     can create a jump table. But for hashcodes, no way!
>
>     Therefore a sequence of compares that work with the interned string
>     pointers will be faster. If interning is slow (and/or wastes memory)
>     then a sequence of tailored compares that work directly on the
>     characters will be the fastest. For example:
>
>     switch (s) {
>      case "Hello World"  : .... break;
>      case "Hello Wooot" : .... break;
>      default: ....
>     }
>
>     Could, for example, be compiled into the pseudo-c code:
>
>     if (s.length == 11) {
>      if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>     goto done; }
>      if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>     goto done; }
>     }
>     /*default*/
>      ....
>     done:
>
>     Now should javac do this advanced analysis? No! Javac should only
>     generate straight forward string compares and jumps that is a
>     relatively
>     easy pattern for the JVM to recognize as a string switch. Then the JVM
>     can do the advanced optimizations if and when the code is actually
>     determined to be a hot spot.
>
>     //Fredrik
>
>     Paul Benedict skrev:
>     > Jon,
>     >
>     > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>     > <Jonathan.Gibbons at sun.com <mailto:Jonathan.Gibbons at sun.com>> wrote:
>     >
>     >> If hell were to freeze over, and String.hashCode were to change
>     in JDK n, n
>     >>
>     >>> =8, then javac could emit different code for Strings in
>     switch, depending
>     >>>
>     >> on the value of -target.
>     >>
>     >
>     > Regarding the state of hell, I don't think a compiler implementation
>     > should ever rely on such a gamble. The implication is obvious:
>     if JDK
>     > N makes a change (by Oracle, by some future owner of OpenJDK -- who
>     > knows what happens 10+ years from now), then class files using the
>     > OpenJDK de-sugaring would break. The emitted hash results would no
>     > longer match the runtime hashes and execution would be
>     unpredictable.
>     >
>     > To safely emit hash results into byte code, I think you
>     obviously need
>     > to go the extra stretch and make a ruling on the algorithm never
>     > changing. Isn't that just simply called being responsible?
>     >
>     > Paul
>     >
>     >
>
>
>