Strings in Switch

Reinier Zwitserloot reinier at zwitserloot.com
Wed Dec 9 02:53:34 PST 2009


As I understand it, switch-in-strings is handled during the "lower" phase of
javac, which must desugar the string switch into legal java code.

This makes a series of if/elseif cases actually impossible, due to switch's
unique behaviour in regards to fall-through.... I think. Let's try this out.
If we have:

switch(someString) {
    case "Hello1":
       m1();
    default:
    case "Hello2":
       m2();
       break;
    case "Hello3":
       m3();
}

how should this translate to a series of if statements, in a way that is
easier than the current nested double switch scenario? I don't really see a
way.

There's a compromise, where the original string-to-integer conversion is
done with a series of ifs instead of a switch on hashCode. I don't really
care about removing the dependency on string's hashCode, but if this is
simpler, than, by all means. Until there's proof otherwise, I side with
Fredrik that a switch on hashCodes is not going to have a measurable
performance impact. As an example, the above would desugar to (with optional
switch on string's length during string-to-number conversion omitted. That
may actually be a good idea; it's straight forward and does have an obvious
performance benefit):

int $unique;
if ("Hello1".equals(someString)) $unique = 0;
else if ("Hello2".equals(someString)) $unique = 1;
else if ("Hello3".equals(someString)) $unique = 2;
else $unique = 3;

switch ($unique) {
    case 0:
       m1();
    case 3:
    case 1:
       m2();
       break;
    case 2:
       m3();
}


It avoids dependency on string hashcode (which, for the record, I do not
think needs to be avoided), and it's straightforward and simple for all
possible forms of string-in-switch that I can think of.

--Reinier Zwitserloot



On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
fredrik.ohrstrom at oracle.com> wrote:

> This discussion reeks of premature optimization.... A tableswitch on
> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
> of compares anyway (at least on the x86 platform). If the tableswitch
> happens on a sequence of relatively consecutive  numbers, then the JVM
> can create a jump table. But for hashcodes, no way!
>
> Therefore a sequence of compares that work with the interned string
> pointers will be faster. If interning is slow (and/or wastes memory)
> then a sequence of tailored compares that work directly on the
> characters will be the fastest. For example:
>
> switch (s) {
>  case "Hello World"  : .... break;
>  case "Hello Wooot" : .... break;
>  default: ....
> }
>
> Could, for example, be compiled into the pseudo-c code:
>
> if (s.length == 11) {
>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
> goto done; }
>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
> goto done; }
> }
> /*default*/
>  ....
> done:
>
> Now should javac do this advanced analysis? No! Javac should only
> generate straight forward string compares and jumps that is a relatively
> easy pattern for the JVM to recognize as a string switch. Then the JVM
> can do the advanced optimizations if and when the code is actually
> determined to be a hot spot.
>
> //Fredrik
>
> Paul Benedict skrev:
> > Jon,
> >
> > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
> > <Jonathan.Gibbons at sun.com> wrote:
> >
> >> If hell were to freeze over, and String.hashCode were to change in JDK
> n, n
> >>
> >>> =8, then javac could emit different code for Strings in switch,
> depending
> >>>
> >> on the value of -target.
> >>
> >
> > Regarding the state of hell, I don't think a compiler implementation
> > should ever rely on such a gamble. The implication is obvious: if JDK
> > N makes a change (by Oracle, by some future owner of OpenJDK -- who
> > knows what happens 10+ years from now), then class files using the
> > OpenJDK de-sugaring would break. The emitted hash results would no
> > longer match the runtime hashes and execution would be unpredictable.
> >
> > To safely emit hash results into byte code, I think you obviously need
> > to go the extra stretch and make a ruling on the algorithm never
> > changing. Isn't that just simply called being responsible?
> >
> > Paul
> >
> >
>
>
>



More information about the coin-dev mailing list