Strings in Switch
Rémi Forax
forax at univ-mlv.fr
Wed Dec 9 09:03:25 PST 2009
Le 09/12/2009 16:52, Fredrik Öhrström a écrit :
> Ulf Zibis skrev:
>
>> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by
>> 'default:'
>>
> Yes, but I think the example highlighted the intricacies of the switch
> syntax. :)
>
>> Additionally String#equals(Object object) could be optimized to
>> benefit from the hash codes:
>>
> Maintaining up to date hashcodes for every string will essentially force
> you to access every strings twice.
> There are not enough equals operations on the strings to make up for
> this precalculation cost. Especially since a compare can be so much more
> efficient than a hashcode calculation. Besides, a lot of strings are
> already interned which means that you can always start by checking identity.
>
Hi Fredrik,
Don't forget that String.hashCode() are only calculated once
(in openjdk implementation), String is an non-mutable object.
Rémi
> //Fredrik
>
>> int equalByHashThreshold = 2;
>>
>> public boolean equals(Object anObject) {
>> if (this == anObject) {
>> return true;
>> }
>> if (anObject instanceof String) {
>> String anotherString = (String)anObject;
>> int n = count;
>> if (n == anotherString.count&&
>> (equalByHashThreshold == 0 ||
>> --equalByHashThreshold == 0)&&
>> (anotherString.equalByHashThreshold == 0 ||
>> --anotherString.equalByHashThreshold == 0)&&
>> hash() == anotherString.hash()) {
>> char v1[] = value;
>> char v2[] = anotherString.value;
>> int i = offset;
>> int j = anotherString.offset;
>> while (n-- != 0) {
>> if (v1[i++] != v2[j++])
>> return false;
>> }
>> return true;
>> }
>> }
>> return false;
>> }
>>
>> public int hashCode() {
>> int h = hash;
>> if (h == 0) {
>> int off = offset;
>> char val[] = value;
>> int len = count;
>>
>> for (int i = 0; i< len; i++) {
>> h = 31*h + val[off++];
>> }
>> hash = h;
>> equalByHashThreshold = 0;
>> }
>> return h;
>> }
>>
>> Alternative:
>> public boolean equals(Object anObject) {
>> if (this == anObject) {
>> return true;
>> }
>> if (anObject instanceof String) {
>> String anotherString = (String)anObject;
>> int n = count;
>> if (n == anotherString.count&&
>> hash != 0&& anotherString.hash != 0&&
>> hash() == anotherString.hash()) {
>> hash = -1;
>> anotherString.hash = -1;
>> char v1[] = value;
>> char v2[] = anotherString.value;
>> int i = offset;
>> int j = anotherString.offset;
>> while (n-- != 0) {
>> if (v1[i++] != v2[j++])
>> return false;
>> }
>> return true;
>> }
>> }
>> return false;
>> }
>>
>> public int hashCode() {
>> int h = hash;
>> if (h == 0 || --h == 0) {
>> int off = offset;
>> char val[] = value;
>> int len = count;
>>
>> for (int i = 0; i< len; i++) {
>> h = 31*h + val[off++];
>> }
>> hash = h;
>> }
>> return h;
>> }
>>
>>
>> -Ulf
>>
>>
>> Am 09.12.2009 11:53, Reinier Zwitserloot schrieb:
>>
>>> As I understand it, switch-in-strings is handled during the "lower"
>>> phase of
>>> javac, which must desugar the string switch into legal java code.
>>>
>>> This makes a series of if/elseif cases actually impossible, due to
>>> switch's
>>> unique behaviour in regards to fall-through.... I think. Let's try
>>> this out.
>>> If we have:
>>>
>>> switch(someString) {
>>> case "Hello1":
>>> m1();
>>> default:
>>> case "Hello2":
>>> m2();
>>> break;
>>> case "Hello3":
>>> m3();
>>> }
>>>
>>> how should this translate to a series of if statements, in a way that is
>>> easier than the current nested double switch scenario? I don't really
>>> see a
>>> way.
>>>
>>> There's a compromise, where the original string-to-integer conversion is
>>> done with a series of ifs instead of a switch on hashCode. I don't
>>> really
>>> care about removing the dependency on string's hashCode, but if this is
>>> simpler, than, by all means. Until there's proof otherwise, I side with
>>> Fredrik that a switch on hashCodes is not going to have a measurable
>>> performance impact. As an example, the above would desugar to (with
>>> optional
>>> switch on string's length during string-to-number conversion omitted.
>>> That
>>> may actually be a good idea; it's straight forward and does have an
>>> obvious
>>> performance benefit):
>>>
>>> int $unique;
>>> if ("Hello1".equals(someString)) $unique = 0;
>>> else if ("Hello2".equals(someString)) $unique = 1;
>>> else if ("Hello3".equals(someString)) $unique = 2;
>>> else $unique = 3;
>>>
>>> switch ($unique) {
>>> case 0:
>>> m1();
>>> case 3:
>>> case 1:
>>> m2();
>>> break;
>>> case 2:
>>> m3();
>>> }
>>>
>>>
>>> It avoids dependency on string hashcode (which, for the record, I do not
>>> think needs to be avoided), and it's straightforward and simple for all
>>> possible forms of string-in-switch that I can think of.
>>>
>>> --Reinier Zwitserloot
>>>
>>>
>>>
>>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström<
>>> fredrik.ohrstrom at oracle.com> wrote:
>>>
>>>
>>>
>>>> This discussion reeks of premature optimization.... A tableswitch on
>>>> arbitrary large numbers (aka hashcodes) must be compiled into a
>>>> sequence
>>>> of compares anyway (at least on the x86 platform). If the tableswitch
>>>> happens on a sequence of relatively consecutive numbers, then the JVM
>>>> can create a jump table. But for hashcodes, no way!
>>>>
>>>> Therefore a sequence of compares that work with the interned string
>>>> pointers will be faster. If interning is slow (and/or wastes memory)
>>>> then a sequence of tailored compares that work directly on the
>>>> characters will be the fastest. For example:
>>>>
>>>> switch (s) {
>>>> case "Hello World" : .... break;
>>>> case "Hello Wooot" : .... break;
>>>> default: ....
>>>> }
>>>>
>>>> Could, for example, be compiled into the pseudo-c code:
>>>>
>>>> if (s.length == 11) {
>>>> if (s.chars[8] == L'r'&& !wcscmp(s.chars, L"Hello World")) { ...;
>>>> goto done; }
>>>> if (s.chars[8] == L'o'&& !wcscmp(s.chars, L"Hello Wooot")) { ...;
>>>> goto done; }
>>>> }
>>>> /*default*/
>>>> ....
>>>> done:
>>>>
>>>> Now should javac do this advanced analysis? No! Javac should only
>>>> generate straight forward string compares and jumps that is a
>>>> relatively
>>>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>>>> can do the advanced optimizations if and when the code is actually
>>>> determined to be a hot spot.
>>>>
>>>> //Fredrik
>>>>
>>>> Paul Benedict skrev:
>>>>
>>>>
>>>>> Jon,
>>>>>
>>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>>>> <Jonathan.Gibbons at sun.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> If hell were to freeze over, and String.hashCode were to change in
>>>>>> JDK
>>>>>>
>>>>>>
>>>> n, n
>>>>
>>>>
>>>>>>> =8, then javac could emit different code for Strings in switch,
>>>>>>>
>>>>>>>
>>>> depending
>>>>
>>>>
>>>>>> on the value of -target.
>>>>>>
>>>>>>
>>>>>>
>>>>> Regarding the state of hell, I don't think a compiler implementation
>>>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>>> knows what happens 10+ years from now), then class files using the
>>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>>
>>>>> To safely emit hash results into byte code, I think you obviously need
>>>>> to go the extra stretch and make a ruling on the algorithm never
>>>>> changing. Isn't that just simply called being responsible?
>>>>>
>>>>> Paul
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
More information about the coin-dev
mailing list