From Joe.Darcy at Sun.COM  Tue Dec  1 15:09:44 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Tue, 01 Dec 2009 15:09:44 -0800
Subject: Bug reports
In-Reply-To:
References:
Message-ID: <4B15A238.7060200@sun.com>

Hi Mark.

Catching up on email after Devoxx and Thanksgiving,

Mark Mahieu wrote:
> Hi Joe,
>
> Is there any particular process you'd prefer Coin bug reports to go through, or should they just be filed at the usual place?
>

Hmm... First, thanks for asking :-)

In general, bugs.sun.com or any other existing channel for filing bugs on JDK 7 is suitable for filing bugs on Coin features too, of course.

For coin-dev participants, sending bugs to the list is reasonable as long as the general bug information is provided:

    synopsis/summary
    version info (which build and OS)
    description
    how to reproduce
    expected result vs actual result
    error messages
    source code causing the problem
    workaround (if available)

The submitted problems should generally be actual bugs in the implementation or specification, such as "In JDK 7 build N, javac crashes on this use of the diamond operator..." or "The new literal grammar excludes numbers with one digit" as opposed to "Instead of the diamond operator, var declarations should be allowed."

-Joe

From ted at tedneward.com  Thu Dec  3 02:37:33 2009
From: ted at tedneward.com (Ted Neward)
Date: Thu, 3 Dec 2009 02:37:33 -0800
Subject: Post-Devoxx Project Coin update, closures off-topic for coin-dev
In-Reply-To: <4B13EF40.2090408@sun.com>
References: <4B13EF40.2090408@sun.com>
Message-ID: <071a01ca7404$a1983b00$e4c8b100$@com>

Joe--

Is the mailing list for Project Lambda up and running yet?

Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com

> -----Original Message-----
> From: coin-dev-bounces at openjdk.java.net [mailto:coin-dev-
> bounces at openjdk.java.net] On Behalf Of Joseph D.
Darcy > Sent: Monday, November 30, 2009 8:14 AM > To: coin-dev at openjdk.java.net > Subject: Post-Devoxx Project Coin update, closures off-topic for coin- > dev > > Hello. > > As has been announced recently at Devoxx and covered in various places, > including previous threads on this list, Mark Reinhold made several > announcements about JDK 7 at this year's Devoxx: > > 1) JDK 7 will have a form of closures. > 2) The JDK 7 schedule is being extended to roughly fall 2010. > > On the first announcement, the coin-dev list is not the appropriate > forum to discuss closures in Java. Closures are hereby decreed as > off-topic for coin-dev. > > Mark's blog entry "Closures for Java" > (http://blogs.sun.com/mr/entry/closures) invites those with an informed > opinion to participate in the current discussion; watch Mark's blog for > news about creation of a new list or project, etc., to host this > closures effort. > > On the second announcement, while the JDK 7 schedule has been extended, > many of the current final five (or so) Project Coin features have not > yet been fully implemented, specified, and tested. Therefore, there > will *not* be a general reassessment of Project Coin feature selection > or another call for proposals in JDK 7. The final five (or so) > proposals remain selected for inclusion in JDK 7 and work will continue > to complete those features. However, given its technical merit and the > possibility of providing useful infrastructure for ARM, improved > exception handling is now being reconsidered for inclusion in JDK 7. > No > other "for further consideration" proposal is under reconsideration. 
> > -Joe From ted at tedneward.com Thu Dec 3 02:34:02 2009 From: ted at tedneward.com (Ted Neward) Date: Thu, 3 Dec 2009 02:34:02 -0800 Subject: ARM syntax and new keywords In-Reply-To: <15e8b9d20911261248y14c6c723re7c5b00784815a2d@mail.gmail.com> References: <200911261830.20223.david.goodenough@linkchoose.co.uk> <15e8b9d20911261248y14c6c723re7c5b00784815a2d@mail.gmail.com> Message-ID: <071901ca7404$24db1830$6e914890$@com> Actually, the exotic identifiers syntax has a loophole that probably should be closed: class #"java\\lang\\Object" // compiles into /java/lang/Object { } Ted Neward Java, .NET, XML Services Consulting, Teaching, Speaking, Writing http://www.tedneward.com > -----Original Message----- > From: coin-dev-bounces at openjdk.java.net [mailto:coin-dev- > bounces at openjdk.java.net] On Behalf Of Neal Gafter > Sent: Thursday, November 26, 2009 12:49 PM > To: David Goodenough > Cc: coin-dev at openjdk.java.net > Subject: Re: ARM syntax and new keywords > > On Thu, Nov 26, 2009 at 10:30 AM, David Goodenough < > david.goodenough at linkchoose.co.uk> wrote: > > > On Thursday 26 November 2009, Neal Gafter wrote: > > > Now that jdk7 will include a syntax for using names that are > otherwise > > > keywords (the syntax is #"name"), the backward-compatibility > breakage of > > > adding a new keyword is much less severe. Also, context-sensitive > > keywords > > > are a true-and-tried technique, just not used yet in Java. > > > > > I hope that the # operator will be available for both methods and > fields. > > This was discussed early on in the Coin process, in my lightweight > > properties > > proposal. It is really the only bit of that proposal that is needed, > the > > rest > > can be done in other ways. > > > > This has nothing to do with closures (there is a separate email list > for > that anyway). The #"name" syntax was added to give a way of using > "exotic" > identifiers that would otherwise be disallowed. 
For example, if you > have a > method pre jdk7 > > int foobar(int x) { return x + 1; } > > and then we add a keyword "foobar" to the language in jdk7, you can > still > use the identifier "foobar" by modifying the source as following > > int #"foobar"(int x) { return x + 1; } > > This is even possible for package names, though a bit awkward > > package #"foobar"; > > Cheers, > Neal From Joe.Darcy at Sun.COM Thu Dec 3 13:02:35 2009 From: Joe.Darcy at Sun.COM (Joe Darcy) Date: Thu, 03 Dec 2009 13:02:35 -0800 Subject: Post-Devoxx Project Coin update, closures off-topic for coin-dev In-Reply-To: <071a01ca7404$a1983b00$e4c8b100$@com> References: <4B13EF40.2090408@sun.com> <071a01ca7404$a1983b00$e4c8b100$@com> Message-ID: <4B18276B.2090305@sun.com> Ted Neward wrote: > Joe-- > > Is the mailing list for Project Lambda up and running yet? > No; the vote to create the project is still being conducted on compiler-dev. -Joe From pbenedict at apache.org Sat Dec 5 17:35:05 2009 From: pbenedict at apache.org (Paul Benedict) Date: Sat, 5 Dec 2009 19:35:05 -0600 Subject: Strings in Switch Message-ID: Joe, I reviewed the check-in and read a presentation about the implementation. I get that it translates into two switch statements, but I think there are cases where the duality can be eliminated. If the first switch statement produces no hash collisions, I don't think the second switch statement is necessary. Thoughts? Paul From reinier at zwitserloot.com Sun Dec 6 19:38:02 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Mon, 7 Dec 2009 04:38:02 +0100 Subject: Strings in Switch In-Reply-To: References: Message-ID: <560fb5ed0912061938t26f338b2i64419eecd363cdb9@mail.gmail.com> Paul, I don't think there are any cases that are going to occur in real life. The problem is false positives. 
Let's say I write:

String x = getSomeString();
switch (x) {
    case "hello": doX(); break;
    case "world": doY(); break;
}

'hello' and 'world' have different hashes, so it seems we can just desugar this to:

switch(x.hashCode()) {
    case "hello".hashCode(): doX(); break;
    case "world".hashCode(): doY(); break;
}

but that's not the right desugaring, because if I return some string that ISN'T "hello" but so happens to hashCode to the same hashCode as "hello", then we just broke the program.

There's no way to know that x.hashCode() couldn't possibly collide, unless x is a compile time literal. However, that seems to be a rather rare academic case; case expressions already need to be compile time constants, so this would mean we have a switch statement comprised of 100% compile time literals. Something like:

switch ("hello") {
    case "hello": ....
    case "world": ....
}

Other than debug code and some half-hearted attempt at macroing, this isn't ever going to occur in java code. I don't think it's a good idea to burden either the JLS or javac with a special case for this scenario; the dual switch one will handle this just fine.

Of course, if there's some other code form which could be computed without possible collision issues in a single switch statement, please show it to us so we can come up with a strategy for detecting this situation.

--Reinier Zwitserloot

On Sun, Dec 6, 2009 at 2:35 AM, Paul Benedict wrote:
> Joe,
>
> I reviewed the check-in and read a presentation about the
> implementation. I get that it translates into two switch statements,
> but I think there are cases where the duality can be eliminated. If
> the first switch statement produces no hash collisions, I don't think
> the second switch statement is necessary. Thoughts?
> > Paul > > From neal at gafter.com Sun Dec 6 20:38:33 2009 From: neal at gafter.com (Neal Gafter) Date: Sun, 6 Dec 2009 20:38:33 -0800 Subject: Strings in Switch In-Reply-To: <560fb5ed0912061938t26f338b2i64419eecd363cdb9@mail.gmail.com> References: <560fb5ed0912061938t26f338b2i64419eecd363cdb9@mail.gmail.com> Message-ID: <15e8b9d20912062038r22d009bao22454f0b7eb30cd8@mail.gmail.com> On Sun, Dec 6, 2009 at 7:38 PM, Reinier Zwitserloot wrote: > 'hello' and 'world' have different hashes, so it seems we can just desugar > this to: > > switch(x.hashCode()) { > case "hello".hashCode(): doX(); break; > case "world".hashCode(): doY(); break; > } > > > but that's not the right desugaring, because if I return some string that > ISNT "hello" but so happens to hashCode to the same hashCode as "hello", > then we just broke the program. > I think Paul is imagining compiler-generated if statements inside the cases to check string.equals(x, "hello") etc. Cheers, Neal From Joe.Darcy at Sun.COM Sun Dec 6 22:07:59 2009 From: Joe.Darcy at Sun.COM (Joseph D. Darcy) Date: Sun, 06 Dec 2009 22:07:59 -0800 Subject: Strings in Switch In-Reply-To: References: Message-ID: <4B1C9BBF.7010704@sun.com> Paul Benedict wrote: > Joe, > > I reviewed the check-in and read a presentation about the > implementation. I get that it translates into two switch statements, > but I think there are cases where the duality can be eliminated. If > the first switch statement produces no hash collisions, I don't think > the second switch statement is necessary. Thoughts? > > There are many possible desugarings of strings in switch into one or more switch statements and other existing code structures, some of which are alluded to the comments of the current strings in switch implementation. If there is no "bad" control flow in the original strings in switch (no fall throughs, etc.) 
and no collisions with the chosen hash function, then yes, the code
    case "foo":
can be replaced with
    case "foo".hashCode():
        if ("foo".equals(...)) {...}
if care is taken to implement the semantics of any default alternative that is present.

However, for the initial strings in switch implementation in javac, we chose to pursue a single general-purpose strings in switch translation that should always provide at least reasonable performance, since it results in less compiler code to test for what is currently a low duty-cycle code structure.

If special cases of strings in switch turn out to have high duty-cycles, that would justify additional engineering to support different implementations tailored to different code inputs.

-Joe

From lk at teamten.com  Sun Dec  6 22:14:08 2009
From: lk at teamten.com (Lawrence Kesteloot)
Date: Sun, 6 Dec 2009 22:14:08 -0800
Subject: Strings in Switch
In-Reply-To: <4B1C9BBF.7010704@sun.com>
References: <4B1C9BBF.7010704@sun.com>
Message-ID: <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>

Now that we plan to have closures, do we still need strings-in-switch? Won't a string-to-function map be about as fast (though maybe less convenient)? I don't know what the use cases are for strings-in-switch, but the feature already felt a bit low-benefit to me, and seems even more so now with closures.

Lawrence

On Sun, Dec 6, 2009 at 10:07 PM, Joseph D. Darcy wrote:
> Paul Benedict wrote:
>> Joe,
>>
>> I reviewed the check-in and read a presentation about the
>> implementation. I get that it translates into two switch statements,
>> but I think there are cases where the duality can be eliminated. If
>> the first switch statement produces no hash collisions, I don't think
>> the second switch statement is necessary. Thoughts?
>>
>>
>
> There are many possible desugarings of strings in switch into one or
> more switch statements and other existing code structures, some of which
> are alluded to the comments of the current strings in switch implementation.
>
> If there is no "bad" control flow in the original strings in switch (no
> fall throughs, etc.) and no collisions with the chosen hash function,
> then yes, the code
>     case "foo":
> can be replaced with
>     case "foo".hashCode():
>         if ("foo".equals(...)) {...}
> if care is taken to implement the semantics of any default alternative
> that is present.
>
> However, for the initial strings in switch implementation in javac, we
> chose to pursue a single general-purpose strings in switch translation
> that should always provide at least reasonable performance since it
> results in less compiler code to test for what is currently a low
> duty-cycle code structure.
>
> If special cases of strings in switch turn out to have high duty-cycles,
> that would justify additional engineering to support different
> implementations tailored to different code inputs.
>
> -Joe
>
>
>

From reinier at zwitserloot.com  Sun Dec  6 22:17:11 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Mon, 7 Dec 2009 07:17:11 +0100
Subject: Strings in Switch
In-Reply-To: <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>
References: <4B1C9BBF.7010704@sun.com> <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>
Message-ID: <560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>

Let's not go closure crazy. The fact that switch does NOT support strings today is a silly triviality that catches out many beginning java programmers.

A lot of the hard work on this proposal has already been done, so abandoning it now does not seem like a good idea.

Also, Mark Reinhold's plan for closures does not include transparency, which would make a closure-based function map much inferior to strings in switch, which is of course transparent.
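
To make the comparison concrete, here is a minimal sketch of the two dispatch styles being weighed. Everything in it is an assumption for illustration: the class name, the command strings, and in particular the lambda syntax, which is the form that later shipped in Java 8 rather than anything specified in this thread. On one common reading of "transparency", the point is that a `return` inside a switch arm leaves the enclosing method, while a handler stored in a map can only hand a value back to its caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical example: the same dispatch written as a string switch
// and as a string-to-function map.
class MapDispatchSketch {
    // The map has to be built and kept somewhere before it can be used.
    private static final Map<String, Supplier<String>> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("start", () -> "starting");
        HANDLERS.put("stop", () -> "stopping");
    }

    static String viaSwitch(String command) {
        switch (command) {
            case "start": return "starting"; // returns from viaSwitch itself
            case "stop":  return "stopping";
            default:      return "unknown";
        }
    }

    static String viaMap(String command) {
        Supplier<String> handler = HANDLERS.get(command);
        // The handler cannot return from viaMap or break out of a loop here;
        // it can only produce a value that this method then returns.
        return handler == null ? "unknown" : handler.get();
    }

    public static void main(String[] args) {
        for (String c : new String[] {"start", "stop", "pause"}) {
            System.out.println(c + ": " + viaSwitch(c) + " / " + viaMap(c));
        }
    }
}
```

Both methods give the same answers for these inputs; the difference is in what the body of each arm is allowed to do.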
--Reinier Zwitserloot On Mon, Dec 7, 2009 at 7:14 AM, Lawrence Kesteloot wrote: > Now that we plan to have closures, do we still need strings-in-switch? > Won't a string-to-function map be about as fast (though maybe less > convenient)? I don't know what the use cases are for > strings-in-switch, but the feature already felt a bit low-benefit to > me, and seems even more so now with closures. > > Lawrence > > > On Sun, Dec 6, 2009 at 10:07 PM, Joseph D. Darcy > wrote: > > Paul Benedict wrote: > >> Joe, > >> > >> I reviewed the check-in and read a presentation about the > >> implementation. I get that it translates into two switch statements, > >> but I think there are cases where the duality can be eliminated. If > >> the first switch statement produces no hash collisions, I don't think > >> the second switch statement is necessary. Thoughts? > >> > >> > > > > There are many possible desugarings of strings in switch into one or > > more switch statements and other existing code structures, some of which > > are alluded to the comments of the current strings in switch > implementation. > > > > If there is no "bad" control flow in the original strings in switch (no > > fall throughs, etc.) and no collisions with the chosen hash function, > > then yes the the code > > case "foo": > > can be replaced with > > case "foo".hashCode(): > > if ("foo".equals(...")) {...} > > if care is taken to implement the semantics of any default alternative > > that is present. > > > > However, for the initial strings in switch implementation in javac, we > > choose to pursue a single general-purpose strings in switch translation > > that should always provide at least reasonable performance since it > > results in less compiler code to test for what is currently a low > > duty-cycle code structure. 
> >
> > If special cases of strings in switch turn out to have high duty-cycles,
> > that would justify additional engineering to support different
> > implementations tailored to different code inputs.
> >
> > -Joe
> >
> >
> >
>
>

From lk at teamten.com  Sun Dec  6 22:38:30 2009
From: lk at teamten.com (Lawrence Kesteloot)
Date: Sun, 6 Dec 2009 22:38:30 -0800
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>
References: <4B1C9BBF.7010704@sun.com> <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com> <560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>
Message-ID: <997cab100912062238t106db1c6r95fe6b09ca2b02be@mail.gmail.com>

On Sun, Dec 6, 2009 at 10:17 PM, Reinier Zwitserloot wrote:
> Let's not go closure crazy.

It's interesting that one of the arguments for closures is that it makes language features less necessary, but now suggesting that this feature might be less necessary is "going closure crazy".

> The fact that switch does NOT support strings
> today is a silly triviality that catches out many beginning java
> programmers.

A triviality? Now every Java compiler has to support this, every IDE, people have to learn it, people have to remember that it won't work if their code might one day have to be compiled by JDK 6, etc. I don't think anything in something as large as Java is a silly triviality. Every feature has a non-trivial cost.

> A lot of the hard work on this proposal has already been done, so abandoning
> it now does not seem like a good idea.

That's 100% irrelevant. You don't add a feature to a language just because the work has been done. If it's, on balance, not a worthwhile feature, then we remove it, sorry to those who spent time on it. Put another way, if project coin would not today accept this feature in light of closures, then it should pull it out.

> Also, Mark Reinhold's plan for closures does not include transparency, which > would make a closure-based function map much inferior to strings in switch > which is of course transparent. Good point. Lawrence From Jonathan.Gibbons at Sun.COM Mon Dec 7 10:03:50 2009 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Mon, 07 Dec 2009 10:03:50 -0800 Subject: Strings in Switch In-Reply-To: References: Message-ID: <4B1D4386.6080907@sun.com> Paul Benedict wrote: > Joe, > > I reviewed the check-in and read a presentation about the > implementation. I get that it translates into two switch statements, > but I think there are cases where the duality can be eliminated. If > the first switch statement produces no hash collisions, I don't think > the second switch statement is necessary. Thoughts? > > Paul > > Paul, Don't forget you have to take care to handle all the other strings in the world that might be passed into the first switch. It's not just a matter of looking at the strings explicitly listed in the case labels, but of all the other strings that might be handled in the default case. -- Jon From reinier at zwitserloot.com Mon Dec 7 08:35:29 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Mon, 7 Dec 2009 17:35:29 +0100 Subject: Strings in Switch In-Reply-To: <997cab100912062238t106db1c6r95fe6b09ca2b02be@mail.gmail.com> References: <4B1C9BBF.7010704@sun.com> <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com> <560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com> <997cab100912062238t106db1c6r95fe6b09ca2b02be@mail.gmail.com> Message-ID: <560fb5ed0912070835q7e62a20bv63daf25ab86f3336@mail.gmail.com> inline. On Mon, Dec 7, 2009 at 7:38 AM, Lawrence Kesteloot wrote: > It's interesting that one of the arguments for closures is that it > makes language feature less necessary, but now suggesting that this > feature might be less necessary is "going closure crazy". > > Please don't respond to emails until you read all of them. 
I explained why your idea is going closure crazy in the next paragraph. > A triviality? > You misunderstand my words. The fact that switch DOES work on ints, but does NOT work on Strings, _THAT_ is trivia. You have to know. It's not obvious. It seems like an arbitrary restriction. Doing away with this restriction is good. In fact, for the next coin, I wouldn't be opposed to enabling longs in switch as well, for the same reason. Consistency amongst the concept of compile time literals. All primitives, and Strings, can come in 'literal' form. This is relevant in a number of places, including compile-time inlining of constants. However, for switch, there are 3 exceptions: booleans, longs, and Strings. Booleans are irrelevant for obvious reasons. Strings are now getting added. Which leaves just the longs. Also, "people have to learn it"? No. There's not a soul on this earth who knows what java switch statements are that is going to be confused by strings in switch. People WOULD have to learn to use a library and/or pattern based around a Map, though. That's 100% irrelevant. You don't add a feature to a language just > because the work has been done. There's a limited budget to spend on improving java. We've already spent a lot of it here. It got through coin, a lot of peer review, and it hasn't run into serious opposition until you brought it up. Of course it's relevant. > > Also, Mark Reinhold's plan for closures does not include transparency, > which > > would make a closure-based function map much inferior to strings in > switch > > which is of course transparent. > > Good point. > > Yes, it is. 

From pbenedict at apache.org  Mon Dec  7 14:43:26 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Mon, 7 Dec 2009 16:43:26 -0600
Subject: Strings in Switch
In-Reply-To: <4B1D4386.6080907@sun.com>
References: <4B1D4386.6080907@sun.com>
Message-ID:

I see that the syntactic sugar assumes that the compiler and the runtime environment both use the same string hashing algorithm. As noted, the algorithm has never changed since at least JDK 1.2. Even if a change is unlikely, I don't feel comfortable with this assumption -- I do not have an alternative to propose either -- but I thought it was worth voicing.

Regardless, I see this as a pure detail of one possible implementation. Another implementation may not choose to use hash codes at all. Am I correct, or am I wrong and the JLS change will mandate the use of hashCode for switching?

Paul

From jorge.ortiz at gmail.com  Mon Dec  7 15:14:21 2009
From: jorge.ortiz at gmail.com (Jorge Ortiz)
Date: Mon, 7 Dec 2009 15:14:21 -0800
Subject: Strings in Switch
In-Reply-To:
References: <4B1D4386.6080907@sun.com>
Message-ID: <22a410d00912071514k3c18aanb9ec023b31f43e3d@mail.gmail.com>

The equivalent in Scala of "Strings in Switch" is pattern matching on strings. In Scala this is implemented as nested if-else statements. This is probably necessary because Scala's pattern matching has some additional features (like matching only if a conditional is met, or matching on extractors) that probably wouldn't mesh well with optimizations like hashCode.

That said, in my 2.5 years using Scala I've never once heard anyone complain about the performance of pattern matching on strings. It'd be interesting to see some benchmarks, but I'd guess that the difference in performance between the equality approach and the hashCode approach is unnoticeable unless you're matching on either really, really long strings or a very, very large number of strings. Neither of these scenarios is likely to be true for switch statements.
--j On Mon, Dec 7, 2009 at 2:43 PM, Paul Benedict wrote: > I see that syntactic sugar assumes that the compiler and the runtime > environment both use the same string hashing algorithm. As noted, the > algorithm has never changed since at least JDK 1.2. Even if unlikely, > I don't feel comfortable with this assumption - I do not have an > alternative to propose either -- but I thought it was worth voicing. > > Regardless, I see this as a pure detail of one possible > implementation. Another implementation may not choose to use hash > codes at all. Am I correct, or am I wrong and the JLS change will > mandate the use of hashCode for switching? > > Paul > > From Joe.Darcy at Sun.COM Mon Dec 7 16:38:05 2009 From: Joe.Darcy at Sun.COM (Joseph D. Darcy) Date: Mon, 07 Dec 2009 16:38:05 -0800 Subject: Strings in Switch In-Reply-To: References: <4B1D4386.6080907@sun.com> Message-ID: <4B1D9FED.7090406@sun.com> Paul Benedict wrote: > I see that syntactic sugar assumes that the compiler and the runtime > environment both use the same string hashing algorithm. As noted, the > algorithm has never changed since at least JDK 1.2. Even if unlikely, > I don't feel comfortable with this assumption - I do not have an > alternative to propose either -- but I thought it was worth voicing. > This assumption is explicitly called out in comment in the implementation; we are aware of the potential problem and are making a different judgment on the comfortability of the implementation strategy. > Regardless, I see this as a pure detail of one possible > implementation. Another implementation may not choose to use hash > codes at all. Am I correct, or am I wrong and the JLS change will > mandate the use of hashCode for switching? > The JLS will be completely silent on the implementation technique. The entire strings in switch spec change is adding "String, " to the list of valid types of expressions that can be switched on. 
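
For readers following the thread, the general-purpose two-switch translation discussed in these messages can be approximated by hand as below. This is a sketch under stated assumptions, not javac's actual output: the class and method names are hypothetical, and the hash values are written as literals because case labels must be constants. The first switch maps the string to a small index (using equals to reject false positives), and the second switch carries the original case bodies.

```java
// Hand-written approximation of a two-switch desugaring of:
//   switch (s) { case "Aa": ...; case "BB": ...; case "hello": ...; default: ... }
class TwoSwitchSketch {
    static String dispatch(String s) {
        int index = -1; // -1 means "no case label matched"

        // First switch: narrow by hash code, then confirm with equals().
        switch (s.hashCode()) {
            case 2112: // both "Aa".hashCode() and "BB".hashCode()
                if (s.equals("Aa")) index = 0;
                else if (s.equals("BB")) index = 1;
                break;
            case 99162322: // "hello".hashCode()
                if (s.equals("hello")) index = 2;
                break;
        }

        // Second switch: ordinary int switch holding the original bodies,
        // so fall-through and default keep their usual meaning.
        switch (index) {
            case 0:  return "matched Aa";
            case 1:  return "matched BB";
            case 2:  return "matched hello";
            default: return "default";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Aa", "BB", "hello", "C#", "other"}) {
            System.out.println(s + " -> " + dispatch(s));
        }
    }
}
```

The input "C#" also hashes to 2112 without being one of the case labels, which is the kind of string the equals() checks exist to send to the default branch.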
-Joe From pbenedict at apache.org Mon Dec 7 18:00:05 2009 From: pbenedict at apache.org (Paul Benedict) Date: Mon, 7 Dec 2009 20:00:05 -0600 Subject: Strings in Switch In-Reply-To: <4B1D9FED.7090406@sun.com> References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> Message-ID: Joe, On Mon, Dec 7, 2009 at 6:38 PM, Joseph D. Darcy wrote: > we are aware of the potential problem and are making a different judgment on > the comfortability of the implementation strategy. Have you thought about calculating the hash code, not as part of the compiler's emitted bytecode, but when the class is loaded? Maybe it is possible to desugar the code into a static { } so the compiler's environment is taken out of the equation. However, this would mean your double-switch would no longer be usable since case labels must be constants, but there are no constant restrictions regarding if/else chains. Another possible strategy is to export the current String hashing algorithm into some public method and make the JLS rely on that method. Eh, I don't like it, but it's a theoretical option. Paul From reinier at zwitserloot.com Mon Dec 7 19:34:41 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Tue, 8 Dec 2009 04:34:41 +0100 Subject: Strings in Switch In-Reply-To: References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> Message-ID: <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> String.hashCode()'s exact algorithm is codified in the official javadoc. It is therefore canon. Thus, changing String.hashCode breaks backwards compatibility. Java has never broken backwards compatibility in such a core feature. Hell freezes over before hashCode() will change comes to mind. If Strings are ever going to get a different hashCode algorithm, I expect it will be an internal affair, with special-casing code in e.g. HashMap to use the more efficient one, leaving the public-facing hashCode() intact, lest tons of existing code that relies on string hashCodes breaks. 

I'm not for or against any particular implementation of strings-in-switch, just making an observation.

--Reinier Zwitserloot
Need to receive donations via the web? Check https://tipit.to/

On Tue, Dec 8, 2009 at 3:00 AM, Paul Benedict wrote:
> Joe,
>
> On Mon, Dec 7, 2009 at 6:38 PM, Joseph D. Darcy wrote:
> > we are aware of the potential problem and are making a different judgment on
> > the comfortability of the implementation strategy.
>
> Have you thought about calculating the hash code, not as part of the
> compiler's emitted bytecode, but when the class is loaded? Maybe it is
> possible to desugar the code into a static { } so the compiler's
> environment is taken out of the equation. However, this would mean
> your double-switch would no longer be usable since case labels must be
> constants, but there are no constant restrictions regarding if/else
> chains.
>
> Another possible strategy is to export the current String hashing
> algorithm into some public method and make the JLS rely on that
> method. Eh, I don't like it, but it's a theoretical option.
>
> Paul
>
>

From Joe.Darcy at Sun.COM  Mon Dec  7 20:06:19 2009
From: Joe.Darcy at Sun.COM (Joe Darcy)
Date: Mon, 07 Dec 2009 20:06:19 -0800
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
Message-ID: <4B1DD0BB.7010709@sun.com>

Reinier Zwitserloot wrote:
> String.hashCode()'s exact algorithm is codified in the official javadoc. It
> is therefore canon. Thus, changing String.hashCode breaks backwards
> compatibility. Java has never broken backwards compatibility in such a core
> feature. Hell freezes over before hashCode() will change comes to mind.
>

Back in the dawn of time, the JLS also contained the javadoc of the platform classes. JLSv1 had a hashing algorithm for string that only sampled 8 or 9 characters of the string!
The actual javadoc had evolved to specify the current algorithm, which is a function of all the characters. When the irresistible force of the platform javadoc met the immovable object of the JLS, in this case the javadoc won and became the canonical specification (and the platform javadoc was quite sensibly removed from the JLS as of JLSv2).

Such discrepancies and changes were long ago in a Java platform far, far away. It is vanishingly unlikely that String.hashCode will change again in the SE platform because the "behavioral compatibility" impact would be too large; see

"JDK Release Types and Compatibility Regions"
http://blogs.sun.com/darcy/entry/release_types_compatibility_regions

> If Strings are ever going to get a different hashCode algorithm, I expect it
> will be an internal affair, with special-casing code in e.g. HashMap to use
> the more efficient one, leaving the public-facing hashCode() intact, lest
> tons of existing code that relies on string hashCodes breaks.
>

As I understand it, some sophisticated collection implementations like ConcurrentHashMap already have internal re-hashing logic to cope with poor-quality hashCode implementations.

The hashing algorithm of String.hashCode is certainly not wonderful and by default I'm against specifying the hashing algorithm of a class. However, given the distinguished role of String, I don't foresee its hashing algorithm changing and I believe it is reasonable for strings in switch to rely on that algorithm being used.
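
As a concrete illustration of the algorithm being relied on, the sketch below recomputes a string's hash by hand from the formula given in the String javadoc, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, and prints it next to the library result. The class and method names are hypothetical and exist only for this example.

```java
// Sketch: recompute String.hashCode() from the documented formula.
class StringHashSketch {
    // Hypothetical helper, not a JDK method.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule; int overflow wraps around silently.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "hello", "world", "Aa", "BB"}) {
            System.out.println("\"" + s + "\": manual=" + manualHash(s)
                    + " library=" + s.hashCode());
        }
    }
}
```

A compiler that bakes hash values into a class file is in effect running this computation at compile time and trusting the runtime to agree with it.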
-Joe From jjb at google.com Tue Dec 8 00:49:28 2009 From: jjb at google.com (Joshua Bloch) Date: Tue, 8 Dec 2009 00:49:28 -0800 Subject: Strings in Switch In-Reply-To: <4B1DD0BB.7010709@sun.com> References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> Message-ID: <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> Joe, A few very minor clarifications: On Mon, Dec 7, 2009 at 8:06 PM, Joe Darcy wrote: > > Back in the dawn of time, the JLS also contained the javadoc of the > platform classes. JLSv1 had a hashing algorithm for string that that > only sampled 8 or 9 characters of the string! The actual javadoc had > evolved to specify the current algorithm, which is a function of all the > characters. When the irresistible force the platform javadoc met the > immovable object of the JLS, in this case the javadoc won Actually the spec for the String hash function in JLS1e was subtly broken, and unimplementable. The implemented hash function (which the spec was meant to describe) was awful. I used the unimplementable spec as justification for changing the spec and the implementation. In chaos, there is opportunity. Such discrepancies and changes were long ago in a Java platform far, far > away. It is vanishingly unlikely that String.hashCode will change again > in the SE platform because the "behavioral compatibility" impact would > be too large; see > > "JDK Release Types and Compatibility Regions" > http://blogs.sun.com/darcy/entry/release_types_compatibility_regions I am in complete agreement here. > As I understand it, some sophisticated collection implementations like > ConcurrentHashMap already have internal re-hashing logic to cope with > poor-quality hashCode implementations. > In fact, the lowly HashMap has had a secondary "defensive" hash function ever since 1.4. 
Josh From pbenedict at apache.org Tue Dec 8 06:39:54 2009 From: pbenedict at apache.org (Paul Benedict) Date: Tue, 8 Dec 2009 08:39:54 -0600 Subject: Strings in Switch In-Reply-To: <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> Message-ID: Joe, > Such discrepancies and changes were long ago in a Java platform far, far > away. It is vanishingly unlikely that String.hashCode will change again > in the SE platform because the "behavioral compatibility" impact would > be too large; see I agree the change may be unlikely, but why bet your compiler on it? Since you are encoding the result of the hash **in the class file**, I think it is necessary to ensure it *never* changes. Do remedies exist? Paul From mthornton at optrak.co.uk Tue Dec 8 06:52:22 2009 From: mthornton at optrak.co.uk (Mark Thornton) Date: Tue, 08 Dec 2009 14:52:22 +0000 Subject: Strings in Switch In-Reply-To: References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> Message-ID: <4B1E6826.4010103@optrak.co.uk> Paul Benedict wrote: > Joe, > > >> Such discrepancies and changes were long ago in a Java platform far, far >> away. It is vanishingly unlikely that String.hashCode will change again >> in the SE platform because the "behavioral compatibility" impact would >> be too large; see >> > > I agree the change may be unlikely, but why bet your compiler on it? > Since you are encoding the result of the hash **in the class file**, I > think it is necessary to ensure it *never* changes. Do remedies exist? > > > Add a note in the JavaDoc to the effect that string switches depend on the hashcode algorithm not changing.
Anyone changing the algorithm in spite of such a note could expect serious grief (shot at dawn)! Mark Thornton From reinier at zwitserloot.com Tue Dec 8 08:23:29 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Tue, 8 Dec 2009 17:23:29 +0100 Subject: Strings in Switch In-Reply-To: <4B1E6826.4010103@optrak.co.uk> References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E6826.4010103@optrak.co.uk> Message-ID: <560fb5ed0912080823t52ee3187yb088ab9b5b381540@mail.gmail.com> A note in String's javadoc works for me. It doesn't necessarily have to mention strings in switch. All it really needs to mention is that the algorithm cannot, ever, change, period. --Reinier Zwitserloot On Tue, Dec 8, 2009 at 3:52 PM, Mark Thornton wrote: > Paul Benedict wrote: > > Joe, > > > > > >> Such discrepancies and changes were long ago in a Java platform far, far > >> away. It is vanishingly unlikely that String.hashCode will change again > >> in the SE platform because the "behavioral compatibility" impact would > >> be too large; see > >> > > > > I agree the change may be unlikely, but why bet your compiler on it? > > Since you are encoding the result of the hash **in the class file**, I > > think it is necessary to ensure it *never* changes. Do remedies exist? > > > > > > > Add a note in the JavaDoc to the effect that string switches depend on > the hashcode algorithm not changing. Anyone changing the algorithm in > spite of such a note could expect serious grief (shot at dawn)!
> > Mark Thornton > > > > From Jonathan.Gibbons at Sun.COM Tue Dec 8 08:47:55 2009 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Tue, 08 Dec 2009 08:47:55 -0800 Subject: Strings in Switch In-Reply-To: References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> Message-ID: <4B1E833B.1040200@sun.com> Paul Benedict wrote: > Joe, > > >> Such discrepancies and changes were long ago in a Java platform far, far >> away. It is vanishingly unlikely that String.hashCode will change again >> in the SE platform because the "behavioral compatibility" impact would >> be too large; see >> > > I agree the change may be unlikely, but why bet your compiler on it? > Since you are encoding the result of the hash **in the class file**, I > think it is necessary to ensure it *never* changes. Do remedies exist? > > Paul > > If hell were to freeze over, and String.hashCode were to change in JDK n, n >=8, then javac could emit different code for Strings in switch, depending on the value of -target. -- Jon From pbenedict at apache.org Tue Dec 8 09:00:26 2009 From: pbenedict at apache.org (Paul Benedict) Date: Tue, 8 Dec 2009 11:00:26 -0600 Subject: Strings in Switch In-Reply-To: <4B1E833B.1040200@sun.com> References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> Message-ID: Jon, On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons wrote: > If hell were to freeze over, and String.hashCode were to change in JDK n, n >>=8, then javac could emit different code for Strings in switch, depending > on the value of -target. Regarding the state of hell, I don't think a compiler implementation should ever rely on such a gamble. 
The implication is obvious: if JDK N makes a change (by Oracle, by some future owner of OpenJDK -- who knows what happens 10+ years from now), then class files using the OpenJDK de-sugaring would break. The emitted hash results would no longer match the runtime hashes and execution would be unpredictable. To safely emit hash results into byte code, I think you obviously need to go the extra stretch and make a ruling on the algorithm never changing. Isn't that just simply called being responsible? Paul From fredrik.ohrstrom at oracle.com Wed Dec 9 02:42:04 2009 From: fredrik.ohrstrom at oracle.com (Fredrik Öhrström) Date: Wed, 09 Dec 2009 11:42:04 +0100 Subject: Strings in Switch In-Reply-To: References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> Message-ID: <4B1F7EFC.2030601@oracle.com> This discussion reeks of premature optimization.... A tableswitch on arbitrarily large numbers (aka hashcodes) must be compiled into a sequence of compares anyway (at least on the x86 platform). If the tableswitch happens on a sequence of relatively consecutive numbers, then the JVM can create a jump table. But for hashcodes, no way! Therefore a sequence of compares that work with the interned string pointers will be faster. If interning is slow (and/or wastes memory) then a sequence of tailored compares that work directly on the characters will be the fastest. For example: switch (s) { case "Hello World" : .... break; case "Hello Wooot" : .... break; default: .... } Could, for example, be compiled into the pseudo-C code: if (s.length == 11) { if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; goto done; } if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; goto done; } } /*default*/ .... done: Now should javac do this advanced analysis? No!
Javac should only generate straightforward string compares and jumps, which form a relatively easy pattern for the JVM to recognize as a string switch. Then the JVM can do the advanced optimizations if and when the code is actually determined to be a hot spot. //Fredrik Paul Benedict wrote: > Jon, > > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons > wrote: > >> If hell were to freeze over, and String.hashCode were to change in JDK n, n >= 8, >> then javac could emit different code for Strings in switch, depending >> on the value of -target. >> > > Regarding the state of hell, I don't think a compiler implementation > should ever rely on such a gamble. The implication is obvious: if JDK > N makes a change (by Oracle, by some future owner of OpenJDK -- who > knows what happens 10+ years from now), then class files using the > OpenJDK de-sugaring would break. The emitted hash results would no > longer match the runtime hashes and execution would be unpredictable. > > To safely emit hash results into byte code, I think you obviously need > to go the extra stretch and make a ruling on the algorithm never > changing. Isn't that just simply called being responsible? > > Paul > > From reinier at zwitserloot.com Wed Dec 9 02:53:34 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Wed, 9 Dec 2009 11:53:34 +0100 Subject: Strings in Switch In-Reply-To: <4B1F7EFC.2030601@oracle.com> References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> Message-ID: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> As I understand it, switch-in-strings is handled during the "lower" phase of javac, which must desugar the string switch into legal Java code.
This makes a series of if/elseif cases actually impossible, due to switch's unique behaviour in regards to fall-through.... I think. Let's try this out. If we have: switch(someString) { case "Hello1": m1(); default: case "Hello2": m2(); break; case "Hello3": m3(); } how should this translate to a series of if statements, in a way that is easier than the current nested double switch scenario? I don't really see a way. There's a compromise, where the original string-to-integer conversion is done with a series of ifs instead of a switch on hashCode. I don't really care about removing the dependency on string's hashCode, but if this is simpler, than, by all means. Until there's proof otherwise, I side with Fredrik that a switch on hashCodes is not going to have a measurable performance impact. As an example, the above would desugar to (with optional switch on string's length during string-to-number conversion omitted. That may actually be a good idea; it's straight forward and does have an obvious performance benefit): int $unique; if ("Hello1".equals(someString)) $unique = 0; else if ("Hello2".equals(someString)) $unique = 1; else if ("Hello3".equals(someString)) $unique = 2; else $unique = 3; switch ($unique) { case 0: m1(); case 3: case 1: m2(); break; case 2: m3(); } It avoids dependency on string hashcode (which, for the record, I do not think needs to be avoided), and it's straightforward and simple for all possible forms of string-in-switch that I can think of. --Reinier Zwitserloot On Wed, Dec 9, 2009 at 11:42 AM, Fredrik ?hrstr?m < fredrik.ohrstrom at oracle.com> wrote: > This discussion reeks of premature optimization.... A tableswitch on > arbitrary large numbers (aka hashcodes) must be compiled into a sequence > of compares anyway (at least on the x86 platform). If the tableswitch > happens on a sequence of relatively consecutive numbers, then the JVM > can create a jump table. But for hashcodes, no way! 
> > Therefore a sequence of compares that work with the interned string > pointers will be faster. If interning is slow (and/or wastes memory) > then a sequence of tailored compares that work directly on the > characters will be the fastest. For example: > > switch (s) { > case "Hello World" : .... break; > case "Hello Wooot" : .... break; > default: .... > } > > Could, for example, be compiled into the pseudo-c code: > > if (s.length == 11) { > if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; > goto done; } > if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; > goto done; } > } > /*default*/ > .... > done: > > Now should javac do this advanced analysis? No! Javac should only > generate straight forward string compares and jumps that is a relatively > easy pattern for the JVM to recognize as a string switch. Then the JVM > can do the advanced optimizations if and when the code is actually > determined to be a hot spot. > > //Fredrik > > Paul Benedict skrev: > > Jon, > > > > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons > > wrote: > > > >> If hell were to freeze over, and String.hashCode were to change in JDK > n, n > >> > >>> =8, then javac could emit different code for Strings in switch, > depending > >>> > >> on the value of -target. > >> > > > > Regarding the state of hell, I don't think a compiler implementation > > should ever rely on such a gamble. The implication is obvious: if JDK > > N makes a change (by Oracle, by some future owner of OpenJDK -- who > > knows what happens 10+ years from now), then class files using the > > OpenJDK de-sugaring would break. The emitted hash results would no > > longer match the runtime hashes and execution would be unpredictable. > > > > To safely emit hash results into byte code, I think you obviously need > > to go the extra stretch and make a ruling on the algorithm never > > changing. Isn't that just simply called being responsible? 
> > > > Paul > > > > > > > From Ulf.Zibis at gmx.de Wed Dec 9 04:50:03 2009 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 09 Dec 2009 13:50:03 +0100 Subject: Strings in Switch In-Reply-To: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> Message-ID: <4B1F9CFB.5060904@gmx.de> +1 ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 'default:' Additionally String#equals(Object object) could be optimized to benefit from the hash codes: int equalByHashThreshold = 2; public boolean equals(Object anObject) { if (this == anObject) { return true; } if (anObject instanceof String) { String anotherString = (String)anObject; int n = count; if (n == anotherString.count && (equalByHashThreshold == 0 || --equalByHashThreshold == 0) && (anotherString.equalByHashThreshold == 0 || --anotherString.equalByHashThreshold == 0) && hash() == anotherString.hash()) { char v1[] = value; char v2[] = anotherString.value; int i = offset; int j = anotherString.offset; while (n-- != 0) { if (v1[i++] != v2[j++]) return false; } return true; } } return false; } public int hashCode() { int h = hash; if (h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; equalByHashThreshold = 0; } return h; } Alternative: public boolean equals(Object anObject) { if (this == anObject) { return true; } if (anObject instanceof String) { String anotherString = (String)anObject; int n = count; if (n == anotherString.count && hash != 0 && anotherString.hash != 0 && hash() == anotherString.hash()) { hash = -1; anotherString.hash = -1; char v1[] = value; char v2[] = anotherString.value; int i = offset; int j = 
anotherString.offset; while (n-- != 0) { if (v1[i++] != v2[j++]) return false; } return true; } } return false; } public int hashCode() { int h = hash; if (h == 0 || --h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; } return h; } -Ulf Am 09.12.2009 11:53, Reinier Zwitserloot schrieb: > As I understand it, switch-in-strings is handled during the "lower" phase of > javac, which must desugar the string switch into legal java code. > > This makes a series of if/elseif cases actually impossible, due to switch's > unique behaviour in regards to fall-through.... I think. Let's try this out. > If we have: > > switch(someString) { > case "Hello1": > m1(); > default: > case "Hello2": > m2(); > break; > case "Hello3": > m3(); > } > > how should this translate to a series of if statements, in a way that is > easier than the current nested double switch scenario? I don't really see a > way. > > There's a compromise, where the original string-to-integer conversion is > done with a series of ifs instead of a switch on hashCode. I don't really > care about removing the dependency on string's hashCode, but if this is > simpler, than, by all means. Until there's proof otherwise, I side with > Fredrik that a switch on hashCodes is not going to have a measurable > performance impact. As an example, the above would desugar to (with optional > switch on string's length during string-to-number conversion omitted. 
That > may actually be a good idea; it's straight forward and does have an obvious > performance benefit): > > int $unique; > if ("Hello1".equals(someString)) $unique = 0; > else if ("Hello2".equals(someString)) $unique = 1; > else if ("Hello3".equals(someString)) $unique = 2; > else $unique = 3; > > switch ($unique) { > case 0: > m1(); > case 3: > case 1: > m2(); > break; > case 2: > m3(); > } > > > It avoids dependency on string hashcode (which, for the record, I do not > think needs to be avoided), and it's straightforward and simple for all > possible forms of string-in-switch that I can think of. > > --Reinier Zwitserloot > > > > On Wed, Dec 9, 2009 at 11:42 AM, Fredrik ?hrstr?m < > fredrik.ohrstrom at oracle.com> wrote: > > >> This discussion reeks of premature optimization.... A tableswitch on >> arbitrary large numbers (aka hashcodes) must be compiled into a sequence >> of compares anyway (at least on the x86 platform). If the tableswitch >> happens on a sequence of relatively consecutive numbers, then the JVM >> can create a jump table. But for hashcodes, no way! >> >> Therefore a sequence of compares that work with the interned string >> pointers will be faster. If interning is slow (and/or wastes memory) >> then a sequence of tailored compares that work directly on the >> characters will be the fastest. For example: >> >> switch (s) { >> case "Hello World" : .... break; >> case "Hello Wooot" : .... break; >> default: .... >> } >> >> Could, for example, be compiled into the pseudo-c code: >> >> if (s.length == 11) { >> if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; >> goto done; } >> if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; >> goto done; } >> } >> /*default*/ >> .... >> done: >> >> Now should javac do this advanced analysis? No! Javac should only >> generate straight forward string compares and jumps that is a relatively >> easy pattern for the JVM to recognize as a string switch. 
Then the JVM >> can do the advanced optimizations if and when the code is actually >> determined to be a hot spot. >> >> //Fredrik >> >> Paul Benedict skrev: >> >>> Jon, >>> >>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons >>> wrote: >>> >>> >>>> If hell were to freeze over, and String.hashCode were to change in JDK >>>> >> n, n >> >>>>> =8, then javac could emit different code for Strings in switch, >>>>> >> depending >> >>>> on the value of -target. >>>> >>>> >>> Regarding the state of hell, I don't think a compiler implementation >>> should ever rely on such a gamble. The implication is obvious: if JDK >>> N makes a change (by Oracle, by some future owner of OpenJDK -- who >>> knows what happens 10+ years from now), then class files using the >>> OpenJDK de-sugaring would break. The emitted hash results would no >>> longer match the runtime hashes and execution would be unpredictable. >>> >>> To safely emit hash results into byte code, I think you obviously need >>> to go the extra stretch and make a ruling on the algorithm never >>> changing. Isn't that just simply called being responsible? 
>>> >>> Paul >>> >>> >>> >> >> > > From fredrik.ohrstrom at oracle.com Wed Dec 9 07:34:38 2009 From: fredrik.ohrstrom at oracle.com (=?UTF-8?B?RnJlZHJpayDDlmhyc3Ryw7Zt?=) Date: Wed, 09 Dec 2009 16:34:38 +0100 Subject: Strings in Switch In-Reply-To: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> Message-ID: <4B1FC38E.2010702@oracle.com> Reinier Zwitserloot skrev: > int $unique; > if ("Hello1".equals(someString)) $unique = 0; > else if ("Hello2".equals(someString)) $unique = 1; > else if ("Hello3".equals(someString)) $unique = 2; > else $unique = 3; > > switch ($unique) { > case 0: > m1(); > case 3: > case 1: > m2(); > break; > case 2: > m3(); > } > > > It avoids dependency on string hashcode (which, for the record, I do > not think needs to be avoided), and it's straightforward and simple > for all possible forms of string-in-switch that I can think of. Yes, this is much better. This is much easier to understand and the pattern is trivial to catch in the JVM. There are many opportunities for the compiler (even without strings-in-switch awareness) to optimize this sequence of compares and it avoids forcing a full calculation of a hash code that has to traverse the full string. Remember that a string compare can terminate early, a hashcode calculation cannot. Also a string compare works on as large blocks as possible per iteration (8bytes in 64 bit machines, even 16byte blocks with SSE2). If the JVM decides that it would be beneficial to use its own internal hashcodes to optimize the code, then it can do so. 
//Fredrik > --Reinier Zwitserloot > > > > On Wed, Dec 9, 2009 at 11:42 AM, Fredrik ?hrstr?m > > wrote: > > This discussion reeks of premature optimization.... A tableswitch on > arbitrary large numbers (aka hashcodes) must be compiled into a > sequence > of compares anyway (at least on the x86 platform). If the tableswitch > happens on a sequence of relatively consecutive numbers, then the JVM > can create a jump table. But for hashcodes, no way! > > Therefore a sequence of compares that work with the interned string > pointers will be faster. If interning is slow (and/or wastes memory) > then a sequence of tailored compares that work directly on the > characters will be the fastest. For example: > > switch (s) { > case "Hello World" : .... break; > case "Hello Wooot" : .... break; > default: .... > } > > Could, for example, be compiled into the pseudo-c code: > > if (s.length == 11) { > if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; > goto done; } > if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; > goto done; } > } > /*default*/ > .... > done: > > Now should javac do this advanced analysis? No! Javac should only > generate straight forward string compares and jumps that is a > relatively > easy pattern for the JVM to recognize as a string switch. Then the JVM > can do the advanced optimizations if and when the code is actually > determined to be a hot spot. > > //Fredrik > > Paul Benedict skrev: > > Jon, > > > > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons > > > wrote: > > > >> If hell were to freeze over, and String.hashCode were to change > in JDK n, n > >> > >>> =8, then javac could emit different code for Strings in > switch, depending > >>> > >> on the value of -target. > >> > > > > Regarding the state of hell, I don't think a compiler implementation > > should ever rely on such a gamble. 
The implication is obvious: > if JDK > > N makes a change (by Oracle, by some future owner of OpenJDK -- who > > knows what happens 10+ years from now), then class files using the > > OpenJDK de-sugaring would break. The emitted hash results would no > > longer match the runtime hashes and execution would be > unpredictable. > > > > To safely emit hash results into byte code, I think you > obviously need > > to go the extra stretch and make a ruling on the algorithm never > > changing. Isn't that just simply called being responsible? > > > > Paul > > > > > > > From markmahieu at googlemail.com Wed Dec 9 07:37:04 2009 From: markmahieu at googlemail.com (Mark Mahieu) Date: Wed, 9 Dec 2009 15:37:04 +0000 Subject: Strings in Switch In-Reply-To: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> Message-ID: On 9 Dec 2009, at 10:53, Reinier Zwitserloot wrote: > As I understand it, switch-in-strings is handled during the "lower" phase of > javac, which must desugar the string switch into legal java code. Hmm, that's not quite how I understand it. I picture the "lower" phase as a bridge between the side of javac which is concerned with the Java Language spec (parsing, flow analysis etc), and the side which deals with the VM spec (ie. bytecode generation). Its existence means that neither side need be unnecessarily complicated by details of the other. So, its input is syntax trees which are valid as far as the language spec is concerned, and its output is a simpler set of trees which can be used by the "gen" phase to produce valid JVM classes - but that 'simpler' set is not necessarily an exact subset of the trees used by earlier phases; ie. 
the output need not be directly representable as valid Java *language* code (synthetics and some uses of "let" expressions for example). > As an example, the above would desugar to (with optional > switch on string's length during string-to-number conversion omitted. That > may actually be a good idea; it's straight forward and does have an obvious > performance benefit): > > int $unique; > if ("Hello1".equals(someString)) $unique = 0; > else if ("Hello2".equals(someString)) $unique = 1; > else if ("Hello3".equals(someString)) $unique = 2; > else $unique = 3; I'm afraid a translation along these lines is likely to entice end users to attempt premature optimisation by messing with the order of the cases. But I still don't see the problem with what Joe proposed (months ago) and implemented. Regards, Mark From fredrik.ohrstrom at oracle.com Wed Dec 9 07:52:23 2009 From: fredrik.ohrstrom at oracle.com (=?UTF-8?B?RnJlZHJpayDDlmhyc3Ryw7Zt?=) Date: Wed, 09 Dec 2009 16:52:23 +0100 Subject: Strings in Switch In-Reply-To: <4B1F9CFB.5060904@gmx.de> References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1F9CFB.5060904@gmx.de> Message-ID: <4B1FC7B7.5070108@oracle.com> Ulf Zibis skrev: > ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by > 'default:' Yes, but I think the example highlighted the intricacies of the switch syntax. :) > Additionally String#equals(Object object) could be optimized to > benefit from the hash codes: Maintaining up to date hashcodes for every string will essentially force you to access every strings twice. There are not enough equals operations on the strings to make up for this precalculation cost. Especially since a compare can be so much more efficient than a hashcode calculation. 
Besides, a lot of strings are already interned which means that you can always start by checking identity. //Fredrik > int equalByHashThreshold = 2; > > public boolean equals(Object anObject) { > if (this == anObject) { > return true; > } > if (anObject instanceof String) { > String anotherString = (String)anObject; > int n = count; > if (n == anotherString.count && > (equalByHashThreshold == 0 || > --equalByHashThreshold == 0) && > (anotherString.equalByHashThreshold == 0 || > --anotherString.equalByHashThreshold == 0) && > hash() == anotherString.hash()) { > char v1[] = value; > char v2[] = anotherString.value; > int i = offset; > int j = anotherString.offset; > while (n-- != 0) { > if (v1[i++] != v2[j++]) > return false; > } > return true; > } > } > return false; > } > > public int hashCode() { > int h = hash; > if (h == 0) { > int off = offset; > char val[] = value; > int len = count; > > for (int i = 0; i < len; i++) { > h = 31*h + val[off++]; > } > hash = h; > equalByHashThreshold = 0; > } > return h; > } > > Alternative: > public boolean equals(Object anObject) { > if (this == anObject) { > return true; > } > if (anObject instanceof String) { > String anotherString = (String)anObject; > int n = count; > if (n == anotherString.count && > hash != 0 && anotherString.hash != 0 && > hash() == anotherString.hash()) { > hash = -1; > anotherString.hash = -1; > char v1[] = value; > char v2[] = anotherString.value; > int i = offset; > int j = anotherString.offset; > while (n-- != 0) { > if (v1[i++] != v2[j++]) > return false; > } > return true; > } > } > return false; > } > > public int hashCode() { > int h = hash; > if (h == 0 || --h == 0) { > int off = offset; > char val[] = value; > int len = count; > > for (int i = 0; i < len; i++) { > h = 31*h + val[off++]; > } > hash = h; > } > return h; > } > > > -Ulf > > > Am 09.12.2009 11:53, Reinier Zwitserloot schrieb: >> As I understand it, switch-in-strings is handled during the "lower" >> phase of >> javac, which must 
desugar the string switch into legal java code. >> >> This makes a series of if/elseif cases actually impossible, due to >> switch's >> unique behaviour in regards to fall-through.... I think. Let's try >> this out. >> If we have: >> >> switch(someString) { >> case "Hello1": >> m1(); >> default: >> case "Hello2": >> m2(); >> break; >> case "Hello3": >> m3(); >> } >> >> how should this translate to a series of if statements, in a way that is >> easier than the current nested double switch scenario? I don't really >> see a >> way. >> >> There's a compromise, where the original string-to-integer conversion is >> done with a series of ifs instead of a switch on hashCode. I don't >> really >> care about removing the dependency on string's hashCode, but if this is >> simpler, than, by all means. Until there's proof otherwise, I side with >> Fredrik that a switch on hashCodes is not going to have a measurable >> performance impact. As an example, the above would desugar to (with >> optional >> switch on string's length during string-to-number conversion omitted. >> That >> may actually be a good idea; it's straight forward and does have an >> obvious >> performance benefit): >> >> int $unique; >> if ("Hello1".equals(someString)) $unique = 0; >> else if ("Hello2".equals(someString)) $unique = 1; >> else if ("Hello3".equals(someString)) $unique = 2; >> else $unique = 3; >> >> switch ($unique) { >> case 0: >> m1(); >> case 3: >> case 1: >> m2(); >> break; >> case 2: >> m3(); >> } >> >> >> It avoids dependency on string hashcode (which, for the record, I do not >> think needs to be avoided), and it's straightforward and simple for all >> possible forms of string-in-switch that I can think of. >> >> --Reinier Zwitserloot >> >> >> >> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik ?hrstr?m < >> fredrik.ohrstrom at oracle.com> wrote: >> >> >>> This discussion reeks of premature optimization.... 
A tableswitch on >>> arbitrary large numbers (aka hashcodes) must be compiled into a >>> sequence >>> of compares anyway (at least on the x86 platform). If the tableswitch >>> happens on a sequence of relatively consecutive numbers, then the JVM >>> can create a jump table. But for hashcodes, no way! >>> >>> Therefore a sequence of compares that work with the interned string >>> pointers will be faster. If interning is slow (and/or wastes memory) >>> then a sequence of tailored compares that work directly on the >>> characters will be the fastest. For example: >>> >>> switch (s) { >>> case "Hello World" : .... break; >>> case "Hello Wooot" : .... break; >>> default: .... >>> } >>> >>> Could, for example, be compiled into the pseudo-c code: >>> >>> if (s.length == 11) { >>> if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; >>> goto done; } >>> if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; >>> goto done; } >>> } >>> /*default*/ >>> .... >>> done: >>> >>> Now should javac do this advanced analysis? No! Javac should only >>> generate straight forward string compares and jumps that is a >>> relatively >>> easy pattern for the JVM to recognize as a string switch. Then the JVM >>> can do the advanced optimizations if and when the code is actually >>> determined to be a hot spot. >>> >>> //Fredrik >>> >>> Paul Benedict skrev: >>> >>>> Jon, >>>> >>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons >>>> wrote: >>>> >>>> >>>>> If hell were to freeze over, and String.hashCode were to change in >>>>> JDK >>>>> >>> n, n >>> >>>>>> =8, then javac could emit different code for Strings in switch, >>>>>> >>> depending >>> >>>>> on the value of -target. >>>>> >>>>> >>>> Regarding the state of hell, I don't think a compiler implementation >>>> should ever rely on such a gamble. 
The implication is obvious: if JDK
>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>> knows what happens 10+ years from now), then class files using the
>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>
>>>> To safely emit hash results into byte code, I think you obviously need
>>>> to go the extra stretch and make a ruling on the algorithm never
>>>> changing. Isn't that just simply called being responsible?
>>>>
>>>> Paul

From per at bothner.com Wed Dec 9 08:37:14 2009
From: per at bothner.com (Per Bothner)
Date: Wed, 09 Dec 2009 08:37:14 -0800
Subject: Strings in Switch
In-Reply-To: <4B1F7EFC.2030601@oracle.com>
References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com>
Message-ID: <4B1FD23A.1050404@bothner.com>

On 12/09/2009 02:42 AM, Fredrik Öhrström wrote:
> A tableswitch on
> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
> of compares anyway (at least on the x86 platform). If the tableswitch
> happens on a sequence of relatively consecutive numbers, then the JVM
> can create a jump table. But for hashcodes, no way!

A tableswitch on arbitrary large numbers can be compiled to use binary
search in a sorted array, which should be fairly efficient.
(That is why the tableswitch entries are required to be sorted.)
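
As a rough illustration of the binary-search idea above, here is a minimal
sketch in plain Java. It is not how any particular JVM compiles a switch;
the class and method names are hypothetical, and it assumes the case labels
have distinct hash codes. For reference, in the JVM specification the
instruction whose keys must be sorted is lookupswitch, while tableswitch is
indexed by a contiguous range of values.

```java
import java.util.Arrays;

// Hypothetical sketch: sparse switch keys searched in O(log n) steps.
final class SparseSwitchSketch {

    /** Returns the hash codes of the case labels in increasing order. */
    static int[] sortedKeys(String... labels) {
        int[] keys = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            keys[i] = labels[i].hashCode();
        }
        Arrays.sort(keys); // sorted keys are what make binary search possible
        return keys;
    }

    /**
     * Returns the position of key in sortedKeys, or -1 when no case matches
     * (the "default" branch). Assumes the keys are distinct.
     */
    static int lookup(int[] sortedKeys, int key) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? pos : -1;
    }
}
```

A caller would still have to confirm the match with String.equals, since
two different strings can share a hash code.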
--
	--Per Bothner
per at bothner.com   http://per.bothner.com/

From per at bothner.com Wed Dec 9 08:42:37 2009
From: per at bothner.com (Per Bothner)
Date: Wed, 09 Dec 2009 08:42:37 -0800
Subject: Strings in Switch
In-Reply-To: <4B1FC7B7.5070108@oracle.com>
References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1F9CFB.5060904@gmx.de> <4B1FC7B7.5070108@oracle.com>
Message-ID: <4B1FD37D.9020002@bothner.com>

On 12/09/2009 07:52 AM, Fredrik Öhrström wrote:
> Especially since a compare can be so much more
> efficient than a hashcode calculation.

I'd be surprised if there is a noticeable difference on modern
desktop-class processors: either way you have to do the memory
reads, and that's what costs - computation is close to free.
--
	--Per Bothner
per at bothner.com   http://per.bothner.com/

From forax at univ-mlv.fr Wed Dec 9 09:03:25 2009
From: forax at univ-mlv.fr (Rémi Forax)
Date: Wed, 09 Dec 2009 18:03:25 +0100
Subject: Strings in Switch
In-Reply-To: <4B1FC7B7.5070108@oracle.com>
References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1F9CFB.5060904@gmx.de> <4B1FC7B7.5070108@oracle.com>
Message-ID: <4B1FD85D.8060701@univ-mlv.fr>

On 09/12/2009 16:52, Fredrik Öhrström wrote:
> Ulf Zibis wrote:
>
>> ... but isn't 'case "Hello2":' superfluous? I guess it's covered by
>> 'default:'
>>
> Yes, but I think the example highlighted the intricacies of the switch
> syntax.
:) > >> Additionally String#equals(Object object) could be optimized to >> benefit from the hash codes: >> > Maintaining up to date hashcodes for every string will essentially force > you to access every strings twice. > There are not enough equals operations on the strings to make up for > this precalculation cost. Especially since a compare can be so much more > efficient than a hashcode calculation. Besides, a lot of strings are > already interned which means that you can always start by checking identity. > Hi Fredrik, Don't forget that String.hashCode() are only calculated once (in openjdk implementation), String is an non-mutable object. R?mi > //Fredrik > >> int equalByHashThreshold = 2; >> >> public boolean equals(Object anObject) { >> if (this == anObject) { >> return true; >> } >> if (anObject instanceof String) { >> String anotherString = (String)anObject; >> int n = count; >> if (n == anotherString.count&& >> (equalByHashThreshold == 0 || >> --equalByHashThreshold == 0)&& >> (anotherString.equalByHashThreshold == 0 || >> --anotherString.equalByHashThreshold == 0)&& >> hash() == anotherString.hash()) { >> char v1[] = value; >> char v2[] = anotherString.value; >> int i = offset; >> int j = anotherString.offset; >> while (n-- != 0) { >> if (v1[i++] != v2[j++]) >> return false; >> } >> return true; >> } >> } >> return false; >> } >> >> public int hashCode() { >> int h = hash; >> if (h == 0) { >> int off = offset; >> char val[] = value; >> int len = count; >> >> for (int i = 0; i< len; i++) { >> h = 31*h + val[off++]; >> } >> hash = h; >> equalByHashThreshold = 0; >> } >> return h; >> } >> >> Alternative: >> public boolean equals(Object anObject) { >> if (this == anObject) { >> return true; >> } >> if (anObject instanceof String) { >> String anotherString = (String)anObject; >> int n = count; >> if (n == anotherString.count&& >> hash != 0&& anotherString.hash != 0&& >> hash() == anotherString.hash()) { >> hash = -1; >> anotherString.hash = -1; >> char v1[] = 
value; >> char v2[] = anotherString.value; >> int i = offset; >> int j = anotherString.offset; >> while (n-- != 0) { >> if (v1[i++] != v2[j++]) >> return false; >> } >> return true; >> } >> } >> return false; >> } >> >> public int hashCode() { >> int h = hash; >> if (h == 0 || --h == 0) { >> int off = offset; >> char val[] = value; >> int len = count; >> >> for (int i = 0; i< len; i++) { >> h = 31*h + val[off++]; >> } >> hash = h; >> } >> return h; >> } >> >> >> -Ulf >> >> >> Am 09.12.2009 11:53, Reinier Zwitserloot schrieb: >> >>> As I understand it, switch-in-strings is handled during the "lower" >>> phase of >>> javac, which must desugar the string switch into legal java code. >>> >>> This makes a series of if/elseif cases actually impossible, due to >>> switch's >>> unique behaviour in regards to fall-through.... I think. Let's try >>> this out. >>> If we have: >>> >>> switch(someString) { >>> case "Hello1": >>> m1(); >>> default: >>> case "Hello2": >>> m2(); >>> break; >>> case "Hello3": >>> m3(); >>> } >>> >>> how should this translate to a series of if statements, in a way that is >>> easier than the current nested double switch scenario? I don't really >>> see a >>> way. >>> >>> There's a compromise, where the original string-to-integer conversion is >>> done with a series of ifs instead of a switch on hashCode. I don't >>> really >>> care about removing the dependency on string's hashCode, but if this is >>> simpler, than, by all means. Until there's proof otherwise, I side with >>> Fredrik that a switch on hashCodes is not going to have a measurable >>> performance impact. As an example, the above would desugar to (with >>> optional >>> switch on string's length during string-to-number conversion omitted. 
>>> That >>> may actually be a good idea; it's straight forward and does have an >>> obvious >>> performance benefit): >>> >>> int $unique; >>> if ("Hello1".equals(someString)) $unique = 0; >>> else if ("Hello2".equals(someString)) $unique = 1; >>> else if ("Hello3".equals(someString)) $unique = 2; >>> else $unique = 3; >>> >>> switch ($unique) { >>> case 0: >>> m1(); >>> case 3: >>> case 1: >>> m2(); >>> break; >>> case 2: >>> m3(); >>> } >>> >>> >>> It avoids dependency on string hashcode (which, for the record, I do not >>> think needs to be avoided), and it's straightforward and simple for all >>> possible forms of string-in-switch that I can think of. >>> >>> --Reinier Zwitserloot >>> >>> >>> >>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik ?hrstr?m< >>> fredrik.ohrstrom at oracle.com> wrote: >>> >>> >>> >>>> This discussion reeks of premature optimization.... A tableswitch on >>>> arbitrary large numbers (aka hashcodes) must be compiled into a >>>> sequence >>>> of compares anyway (at least on the x86 platform). If the tableswitch >>>> happens on a sequence of relatively consecutive numbers, then the JVM >>>> can create a jump table. But for hashcodes, no way! >>>> >>>> Therefore a sequence of compares that work with the interned string >>>> pointers will be faster. If interning is slow (and/or wastes memory) >>>> then a sequence of tailored compares that work directly on the >>>> characters will be the fastest. For example: >>>> >>>> switch (s) { >>>> case "Hello World" : .... break; >>>> case "Hello Wooot" : .... break; >>>> default: .... >>>> } >>>> >>>> Could, for example, be compiled into the pseudo-c code: >>>> >>>> if (s.length == 11) { >>>> if (s.chars[8] == L'r'&& !wcscmp(s.chars, L"Hello World")) { ...; >>>> goto done; } >>>> if (s.chars[8] == L'o'&& !wcscmp(s.chars, L"Hello Wooot")) { ...; >>>> goto done; } >>>> } >>>> /*default*/ >>>> .... >>>> done: >>>> >>>> Now should javac do this advanced analysis? No! 
Javac should only >>>> generate straight forward string compares and jumps that is a >>>> relatively >>>> easy pattern for the JVM to recognize as a string switch. Then the JVM >>>> can do the advanced optimizations if and when the code is actually >>>> determined to be a hot spot. >>>> >>>> //Fredrik >>>> >>>> Paul Benedict skrev: >>>> >>>> >>>>> Jon, >>>>> >>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons >>>>> wrote: >>>>> >>>>> >>>>> >>>>>> If hell were to freeze over, and String.hashCode were to change in >>>>>> JDK >>>>>> >>>>>> >>>> n, n >>>> >>>> >>>>>>> =8, then javac could emit different code for Strings in switch, >>>>>>> >>>>>>> >>>> depending >>>> >>>> >>>>>> on the value of -target. >>>>>> >>>>>> >>>>>> >>>>> Regarding the state of hell, I don't think a compiler implementation >>>>> should ever rely on such a gamble. The implication is obvious: if JDK >>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who >>>>> knows what happens 10+ years from now), then class files using the >>>>> OpenJDK de-sugaring would break. The emitted hash results would no >>>>> longer match the runtime hashes and execution would be unpredictable. >>>>> >>>>> To safely emit hash results into byte code, I think you obviously need >>>>> to go the extra stretch and make a ruling on the algorithm never >>>>> changing. Isn't that just simply called being responsible? 
>>>>>
>>>>> Paul

From Ulf.Zibis at gmx.de Wed Dec 9 09:58:14 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 09 Dec 2009 18:58:14 +0100
Subject: Strings in Switch
In-Reply-To: <4B1F9CFB.5060904@gmx.de>
References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1F9CFB.5060904@gmx.de>
Message-ID: <4B1FE536.2060208@gmx.de>

Alternative (correction):

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count) {
                if (hash == 0)
                    hash = -1; // mark 1st invocation of equals()
                else if (anotherString.hash == 0)
                    anotherString.hash = -1; // mark 1st invocation "
                // on 2nd invocation now first try hash code comparison:
                else if (hash() != anotherString.hash())
                    return false;
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0)
                    if (v1[i++] != v2[j++])
                        return false;
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0 || ++h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;
            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

-Ulf


On 09.12.2009 13:50, Ulf Zibis wrote:
> +1
>
> ... but isn't 'case "Hello2":' superfluous ?
I guess it's covered by > 'default:' > > Additionally String#equals(Object object) could be optimized to benefit > from the hash codes: > int equalByHashThreshold = 2; > > public boolean equals(Object anObject) { > if (this == anObject) { > return true; > } > if (anObject instanceof String) { > String anotherString = (String)anObject; > int n = count; > if (n == anotherString.count && > (equalByHashThreshold == 0 || --equalByHashThreshold > == 0) && > (anotherString.equalByHashThreshold == 0 || > --anotherString.equalByHashThreshold == 0) && > hash() == anotherString.hash()) { > char v1[] = value; > char v2[] = anotherString.value; > int i = offset; > int j = anotherString.offset; > while (n-- != 0) { > if (v1[i++] != v2[j++]) > return false; > } > return true; > } > } > return false; > } > > public int hashCode() { > int h = hash; > if (h == 0) { > int off = offset; > char val[] = value; > int len = count; > > for (int i = 0; i < len; i++) { > h = 31*h + val[off++]; > } > hash = h; > equalByHashThreshold = 0; > } > return h; > } > > Alternative: > public boolean equals(Object anObject) { > if (this == anObject) { > return true; > } > if (anObject instanceof String) { > String anotherString = (String)anObject; > int n = count; > if (n == anotherString.count && > hash != 0 && anotherString.hash != 0 && > hash() == anotherString.hash()) { > hash = -1; > anotherString.hash = -1; > char v1[] = value; > char v2[] = anotherString.value; > int i = offset; > int j = anotherString.offset; > while (n-- != 0) { > if (v1[i++] != v2[j++]) > return false; > } > return true; > } > } > return false; > } > > public int hashCode() { > int h = hash; > if (h == 0 || --h == 0) { > int off = offset; > char val[] = value; > int len = count; > > for (int i = 0; i < len; i++) { > h = 31*h + val[off++]; > } > hash = h; > } > return h; > } > > > -Ulf > > > Am 09.12.2009 11:53, Reinier Zwitserloot schrieb: > >> As I understand it, switch-in-strings is handled during the "lower" phase of >> 
javac, which must desugar the string switch into legal java code. >> >> This makes a series of if/elseif cases actually impossible, due to switch's >> unique behaviour in regards to fall-through.... I think. Let's try this out. >> If we have: >> >> switch(someString) { >> case "Hello1": >> m1(); >> default: >> case "Hello2": >> m2(); >> break; >> case "Hello3": >> m3(); >> } >> >> how should this translate to a series of if statements, in a way that is >> easier than the current nested double switch scenario? I don't really see a >> way. >> >> There's a compromise, where the original string-to-integer conversion is >> done with a series of ifs instead of a switch on hashCode. I don't really >> care about removing the dependency on string's hashCode, but if this is >> simpler, than, by all means. Until there's proof otherwise, I side with >> Fredrik that a switch on hashCodes is not going to have a measurable >> performance impact. As an example, the above would desugar to (with optional >> switch on string's length during string-to-number conversion omitted. That >> may actually be a good idea; it's straight forward and does have an obvious >> performance benefit): >> >> int $unique; >> if ("Hello1".equals(someString)) $unique = 0; >> else if ("Hello2".equals(someString)) $unique = 1; >> else if ("Hello3".equals(someString)) $unique = 2; >> else $unique = 3; >> >> switch ($unique) { >> case 0: >> m1(); >> case 3: >> case 1: >> m2(); >> break; >> case 2: >> m3(); >> } >> >> >> It avoids dependency on string hashcode (which, for the record, I do not >> think needs to be avoided), and it's straightforward and simple for all >> possible forms of string-in-switch that I can think of. >> >> --Reinier Zwitserloot >> >> >> >> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik ?hrstr?m < >> fredrik.ohrstrom at oracle.com> wrote: >> >> >> >>> This discussion reeks of premature optimization.... 
A tableswitch on >>> arbitrary large numbers (aka hashcodes) must be compiled into a sequence >>> of compares anyway (at least on the x86 platform). If the tableswitch >>> happens on a sequence of relatively consecutive numbers, then the JVM >>> can create a jump table. But for hashcodes, no way! >>> >>> Therefore a sequence of compares that work with the interned string >>> pointers will be faster. If interning is slow (and/or wastes memory) >>> then a sequence of tailored compares that work directly on the >>> characters will be the fastest. For example: >>> >>> switch (s) { >>> case "Hello World" : .... break; >>> case "Hello Wooot" : .... break; >>> default: .... >>> } >>> >>> Could, for example, be compiled into the pseudo-c code: >>> >>> if (s.length == 11) { >>> if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; >>> goto done; } >>> if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; >>> goto done; } >>> } >>> /*default*/ >>> .... >>> done: >>> >>> Now should javac do this advanced analysis? No! Javac should only >>> generate straight forward string compares and jumps that is a relatively >>> easy pattern for the JVM to recognize as a string switch. Then the JVM >>> can do the advanced optimizations if and when the code is actually >>> determined to be a hot spot. >>> >>> //Fredrik >>> >>> Paul Benedict skrev: >>> >>> >>>> Jon, >>>> >>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons >>>> wrote: >>>> >>>> >>>> >>>>> If hell were to freeze over, and String.hashCode were to change in JDK >>>>> >>>>> >>> n, n >>> >>> >>>>>> =8, then javac could emit different code for Strings in switch, >>>>>> >>>>>> >>> depending >>> >>> >>>>> on the value of -target. >>>>> >>>>> >>>>> >>>> Regarding the state of hell, I don't think a compiler implementation >>>> should ever rely on such a gamble. 
The implication is obvious: if JDK >>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who >>>> knows what happens 10+ years from now), then class files using the >>>> OpenJDK de-sugaring would break. The emitted hash results would no >>>> longer match the runtime hashes and execution would be unpredictable. >>>> >>>> To safely emit hash results into byte code, I think you obviously need >>>> to go the extra stretch and make a ruling on the algorithm never >>>> changing. Isn't that just simply called being responsible? >>>> >>>> Paul >>>> >>>> >>>> >>>> >>> >>> >> >> > > > From Ulf.Zibis at gmx.de Wed Dec 9 10:29:47 2009 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 09 Dec 2009 19:29:47 +0100 Subject: Strings in Switch In-Reply-To: <15e8b9d20912091005h157044b5heb8d45419db29d59@mail.gmail.com> References: <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1F9CFB.5060904@gmx.de> <4B1FE536.2060208@gmx.de> <15e8b9d20912091005h157044b5heb8d45419db29d59@mail.gmail.com> Message-ID: <4B1FEC9B.5020509@gmx.de> Yes, but ONLY if hash code is not calculated yet AND equals() is invoked 1st time. I assume, that all interned Strings e.g. constants yet have the hash value precalculated. This is to avoid (1) the little more expensive hash code computation if a short living string is only equated once, and (2) an additional field in each string object to save a kind of threshold marker. The win of adding hash code compare into String#equals() would come to account, if a string would be equated often. -Ulf Am 09.12.2009 19:05, Neal Gafter schrieb: > Do you really want to set the has code of every string in the world to -1? > > On Wed, Dec 9, 2009 at 9:58 AM, Ulf Zibis > wrote: > > public boolean equals(Object anObject) { ... 
>
>         if (hash == 0)
>             hash = -1; // mark 1st invocation of equals()
>         else if (anotherString.hash == 0)
>             anotherString.hash = -1; // mark 1st invocation "
>         ... }
>
>     public int hashCode() {
>         int h = hash;
>         if (h == 0 || ++h == 0) { ...
>         }
>         return h;
>     }
>

From Joe.Darcy at Sun.COM Wed Dec 9 10:30:35 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Wed, 09 Dec 2009 10:30:35 -0800
Subject: Strings in Switch
In-Reply-To: <4B1F7EFC.2030601@oracle.com>
References: <4B1D4386.6080907@sun.com> <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com>
Message-ID: <4B1FECCB.6070408@sun.com>

Fredrik Öhrström wrote:
> This discussion reeks of premature optimization.... A tableswitch on
> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
> of compares anyway (at least on the x86 platform). If the tableswitch
> happens on a sequence of relatively consecutive numbers, then the JVM
> can create a jump table. But for hashcodes, no way!
>

Implementing strings in switch has always been a fertile topic for
discussion!

The purpose of the synthesized initial switch in the two-switch strings
in switch implementation is to create a dense contiguous set of integral
jump targets for the second switch that are easy to digest for a JVM.

When designing this strings in switch implementation, various factors
came into play, including minimizing the worst-case behavior in terms of
the number of character comparisons. Currently the string being switched
on is expected to be traversed at most twice, once to compute the hash
code and again to be compared at the hash site. (If there are hash
collisions, multiple compares could occur at the hash site.)
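
The two-switch scheme described above can be sketched by hand in ordinary
Java, using the "Hello1"/"Hello2"/"Hello3" example from earlier in this
thread. This is only an illustration under stated assumptions, not javac's
actual output: the class name, the method name, and the use of -1 for "no
label matched" are hypothetical, and the calls to m1(), m2() and m3() are
replaced by appending their names to a trace so the control flow can be
observed. The constants in the first switch are the String.hashCode()
values of the three labels.

```java
// Hypothetical hand-desugared string switch; not generated by javac.
final class TwoSwitchSketch {

    /** Returns the names of the methods the original switch would call. */
    static String dispatch(String s) {
        StringBuilder calls = new StringBuilder();

        // First switch: sparse hash codes, each confirmed with equals(),
        // mapped to a dense index.
        int index = -1; // -1 stands for "no label matched"
        switch (s.hashCode()) {
            case -2137068097: // "Hello1".hashCode()
                if (s.equals("Hello1")) index = 0;
                break;
            case -2137068096: // "Hello2".hashCode()
                if (s.equals("Hello2")) index = 1;
                break;
            case -2137068095: // "Hello3".hashCode()
                if (s.equals("Hello3")) index = 2;
                break;
        }

        // Second switch: dense labels, same layout and fall-through
        // behaviour as the original switch on the string.
        switch (index) {
            case 0:
                calls.append("m1 ");
                // falls through, as in the original example
            default:
            case 1:
                calls.append("m2 ");
                break;
            case 2:
                calls.append("m3 ");
        }
        return calls.toString().trim();
    }
}
```

With this layout the input string is hashed once and compared with equals()
at most once, unless two labels collide on the same hash code.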
For an if-equals-else-if-equals chain, the expected number of character
comparisons will likely be higher since, when the string being switched
on is present as a target, on average it would be compared with about
half the target strings.

Depending on the fraction of strings that have hash codes precomputed,
the fraction of switched on strings that are or are not in the target
list, and various other properties of the strings being switched on and
the strings in the target set, different strings in switch
implementations can be driven to have pathological behavior.

That said, I believe the current strings in switch implementation is
correct and should have acceptable performance. I'd be willing to
investigate re-engineering the strings in switch implementation once:

1) A greater number of the Coin features are implemented, specified, and
tested.

2) There is some usage of strings in switch to guide the implementation
strategy.

-Joe

From Joe.Darcy at Sun.COM Wed Dec 9 10:35:29 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Wed, 09 Dec 2009 10:35:29 -0800
Subject: Strings in Switch
In-Reply-To:
References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
Message-ID: <4B1FEDF1.7060606@sun.com>

Mark Mahieu wrote:
> On 9 Dec 2009, at 10:53, Reinier Zwitserloot wrote:
>
>
>> As I understand it, switch-in-strings is handled during the "lower" phase of
>> javac, which must desugar the string switch into legal java code.
>>
>
> Hmm, that's not quite how I understand it.
>
> I picture the "lower" phase as a bridge between the side of javac which is concerned with the Java Language spec (parsing, flow analysis etc), and the side which deals with the VM spec (ie. bytecode generation). Its existence means that neither side need be unnecessarily complicated by details of the other.
>
> So, its input is syntax trees which are valid as far as the language spec is concerned, and its output is a simpler set of trees which can be used by the "gen" phase to produce valid JVM classes - but that 'simpler' set is not necessarily an exact subset of the trees used by earlier phases; ie. the output need not be directly representable as valid Java *language* code (synthetics and some uses of "let" expressions for example).
>

Mark,

Your description of Lower is correct.

-Joe

From Ulf.Zibis at gmx.de Wed Dec 9 11:17:01 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 09 Dec 2009 20:17:01 +0100
Subject: Strings in Switch
In-Reply-To: <4B1FC38E.2010702@oracle.com>
References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1FC38E.2010702@oracle.com>
Message-ID: <4B1FF7AD.50502@gmx.de>

On 09.12.2009 16:34, Fredrik Öhrström wrote:
> .... to optimize
> this sequence of compares and it avoids forcing a full calculation of a
> hash code that has to traverse the full string. Remember that a string
> compare can terminate early, a hashcode calculation cannot. Also a
> string compare works on as large blocks as possible per iteration
> (8bytes in 64 bit machines, even 16byte blocks with SSE2).

Good point. If a compare on a string is rare, and especially if its
total length is not trivial, the hash code computation should be more
expensive, but after some repeated compares on the same string, the
hashcode algorithm would win.
Here an enhanced String#equals() implementation, which values the length of every invoked compare on characters: int equalByHashThreshold = count; public boolean equals(Object anObject) { if (this == anObject) { return true; } if (anObject instanceof String) { String anotherString = (String)anObject; int n = count; if (n == anotherString.count && (equalByHashThreshold > 0 || hash() == anotherString.hash())) { char v1[] = value; char v2[] = anotherString.value; int i = offset; int j = anotherString.offset; while (n-- != 0) if (v1[i++] != v2[j++]) { if (equalByHashThreshold > 0) equalByHashThreshold -= (count - n); return false; } return true; } } return false; } public int hashCode() { int h = hash; if (h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; equalByHashThreshold = 0; } return h; } -Ulf From r.spilker at gmail.com Thu Dec 10 00:53:20 2009 From: r.spilker at gmail.com (Roel Spilker) Date: Thu, 10 Dec 2009 09:53:20 +0100 Subject: Strings in Switch In-Reply-To: <4B1FC38E.2010702@oracle.com> References: <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1FC38E.2010702@oracle.com> Message-ID: Frederik Two things: - We're talking switch statements on String literals here. I don't expect people to use long Strings here. - Performance would only be a problem if you would execute the code very often. Luckily, the hashCode of String is cached. So that should take care of a potential performance hazard. Roel On Wed, Dec 9, 2009 at 4:34 PM, Fredrik ?hrstr?m < fredrik.ohrstrom at oracle.com> wrote: > Yes, this is much better. This is much easier to understand and the > pattern is trivial to catch in the JVM. 
There are many opportunities > for the compiler (even without strings-in-switch awareness) to optimize > this sequence of compares and it avoids forcing a full calculation of a > hash code that has to traverse the full string. Remember that a string > compare can terminate early, a hashcode calculation cannot. Also a > string compare works on as large blocks as possible per iteration > (8bytes in 64 bit machines, even 16byte blocks with SSE2). If the JVM > decides that it would be beneficial to use its own internal hashcodes to > optimize the code, then it can do so. > > From tball at google.com Thu Dec 10 07:48:18 2009 From: tball at google.com (Tom Ball) Date: Thu, 10 Dec 2009 07:48:18 -0800 Subject: Strings in Switch In-Reply-To: References: <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> Message-ID: On Wed, Dec 9, 2009 at 7:37 AM, Mark Mahieu wrote: > > On 9 Dec 2009, at 10:53, Reinier Zwitserloot wrote: > > > As I understand it, switch-in-strings is handled during the "lower" phase > of > > javac, which must desugar the string switch into legal java code. > > Hmm, that's not quite how I understand it. > > I picture the "lower" phase as a bridge between the side of javac which is > concerned with the Java Language spec (parsing, flow analysis etc), and the > side which deals with the VM spec (ie. bytecode generation). Its existence > means that neither side need be unnecessarily complicated by details of the > other. > > So, its input is syntax trees which are valid as far as the language spec > is concerned, and its output is a simpler set of trees which can be used by > the "gen" phase to produce valid JVM classes - but that 'simpler' set is not > necessarily an exact subset of the trees used by earlier phases; ie. 
the > output need not be directly representable as valid Java *language* code > (synthetics and some uses of "let" expressions for example). > That's exactly right. It's common when drafting language change specs to show the JVM output as simplified Java, but that's because the writer and reviewers generally know Java much better than JVM byte-code and can thus more easily spot mistakes. It's never been a JVM requirement, however. Josh and I ran into this with the ARM spec, where he was wrestling with what sort of synthetic super type might be needed for some corner cases. It turns out that no new types were needed, as the previous compiler phases had already verified the code was typesafe. So all the code needed to do (from the JVM requirements) was to directly use the resource variable, without it needing to be recast or incur any risk of a runtime type exception. Tom From pbenedict at apache.org Wed Dec 16 10:29:09 2009 From: pbenedict at apache.org (Paul Benedict) Date: Wed, 16 Dec 2009 12:29:09 -0600 Subject: Strings in Switch .. and classes Message-ID: It occurred to me that if hashCode() really is an acceptable and fool-proof (yet to be convinced) way to implement strings in switch, this potentially opens a further enhancement to switch on Class. Perhaps in JDK 8, we could convert many instanceof checks into label cases. Class c = o.getClass(); if (c instanceof String) { .. } else if (c instanceof Integer) { ..} else if (c instanceof Date) { .. } Can be de-sugared into the new String switch: switch (object.getClass().getName()) { case "java.lang.String": case "java.lang.Integer": case "java.util.Date": } } PS: Since instanceof evaluates null to false, as like any other switch statement, an if-check should still be done in front lest an NPE occurs. Paul From Thomas.Hawtin at Sun.COM Wed Dec 16 10:37:05 2009 From: Thomas.Hawtin at Sun.COM (Tom Hawtin) Date: Wed, 16 Dec 2009 18:37:05 +0000 Subject: Strings in Switch .. 
and classes
In-Reply-To:
References:
Message-ID: <4B2928D1.5080008@sun.com>

Paul Benedict wrote:

> Class c = o.getClass();
> if (c instanceof String) { .. }

> Can be de-sugared into the new String switch:
>
> switch (object.getClass().getName()) {
>   case "java.lang.String":

Class names are not unique.

Tom Hawtin

From pbenedict at apache.org Wed Dec 16 10:51:39 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Wed, 16 Dec 2009 12:51:39 -0600
Subject: Strings in Switch .. and classes
In-Reply-To: <4B2928D1.5080008@sun.com>
References: <4B2928D1.5080008@sun.com>
Message-ID:

Tom,

On Wed, Dec 16, 2009 at 12:37 PM, Tom Hawtin wrote:
> Paul Benedict wrote:
>
>> Class c = o.getClass();
>> if (c instanceof String) { .. }
>
>> Can be de-sugared into the new String switch:
>>
>> switch (object.getClass().getName()) {
>>   case "java.lang.String":
>
> Class names are not unique.
>

Can you expound on this some more? I am surprised, so I want to hear
more about it. I thought these two are equivalent, no?

boolean x = c instanceof String;
boolean y = c.getName().equals("java.lang.String");

Paul

From mthornton at optrak.co.uk Wed Dec 16 11:28:06 2009
From: mthornton at optrak.co.uk (Mark Thornton)
Date: Wed, 16 Dec 2009 19:28:06 +0000
Subject: Strings in Switch .. and classes
In-Reply-To:
References: <4B2928D1.5080008@sun.com>
Message-ID: <4B2934C6.5030202@optrak.co.uk>

Paul Benedict wrote:
> Tom,
>
> On Wed, Dec 16, 2009 at 12:37 PM, Tom Hawtin wrote:
>
>> Paul Benedict wrote:
>>
>>
>>> Class c = o.getClass();
>>> if (c instanceof String) { .. }
>>>
>>> Can be de-sugared into the new String switch:
>>>
>>> switch (object.getClass().getName()) {
>>>   case "java.lang.String":
>>>
>> Class names are not unique.
>>
>>
>
> Can you expound on this some more? I am surprised, so I want to hear
> more about it. I thought these two are equivalent, no?
> > boolean x = c instanceof String; > boolean y = c.getName().equals("java.lang.String"); > > Paul > > Classes of the same name can be loaded by different ClassLoader's, the full identity of a Class is a pair of (ClassLoader, classname). This shouldn't happen with classes like String which are part of the base platform. Mark Thornton From Jonathan.Gibbons at Sun.COM Wed Dec 16 12:17:16 2009 From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons) Date: Wed, 16 Dec 2009 12:17:16 -0800 Subject: Strings in Switch .. and classes In-Reply-To: References: Message-ID: <4B29404C.1080609@sun.com> Paul Benedict wrote: > It occurred to me that if hashCode() really is an acceptable and > fool-proof (yet to be convinced) way to implement strings in switch, > this potentially opens a further enhancement to switch on Class. > Perhaps in JDK 8, we could convert many instanceof checks into label > cases. > > Class c = o.getClass(); > if (c instanceof String) { .. } > else if (c instanceof Integer) { ..} > else if (c instanceof Date) { .. } > > Can be de-sugared into the new String switch: > > switch (object.getClass().getName()) { > case "java.lang.String": > case "java.lang.Integer": > case "java.util.Date": > } > } > > PS: Since instanceof evaluates null to false, as like any other switch > statement, an if-check should still be done in front lest an NPE > occurs. > > Paul > > Setting aside issues of class identity, the proposed desugaring does not take the use of subtypes into account. -- Jon From pbenedict at apache.org Wed Dec 16 12:25:15 2009 From: pbenedict at apache.org (Paul Benedict) Date: Wed, 16 Dec 2009 14:25:15 -0600 Subject: Strings in Switch .. and classes In-Reply-To: <4B29404C.1080609@sun.com> References: <4B29404C.1080609@sun.com> Message-ID: Jon, you are correct. My email has some glaring holes. After all, it was just a quick thought, but definitely needs more coherence. Thanks. 
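The subtype objection raised above can be seen in a small sketch. This is only an illustration under stated assumptions: the class `ClassNameSwitchSketch` and the subtype `Stamp` are hypothetical names invented here, the instanceof checks are applied to the object itself (presumably what the original chain intended), and the string switch needs a compiler that supports strings in switch. The class-loader objection is not shown, since it would need two loaders defining a class of the same name.

```java
import java.util.Date;

class ClassNameSwitchSketch {
    // Hypothetical subtype of Date, present only to show the subtype problem.
    static class Stamp extends Date {
        private static final long serialVersionUID = 1L;
    }

    // The instanceof chain: matches subtypes and treats null as "no match".
    static String byInstanceof(Object o) {
        if (o instanceof String) return "string";
        if (o instanceof Integer) return "integer";
        if (o instanceof Date) return "date";
        return "other";
    }

    // The proposed desugaring: switch on the name of the runtime class.
    static String byClassName(Object o) {
        if (o == null) return "other"; // explicit check, otherwise getClass() throws NPE
        switch (o.getClass().getName()) {
            case "java.lang.String": return "string";
            case "java.lang.Integer": return "integer";
            case "java.util.Date": return "date";
            default: return "other";
        }
    }
}
```

For a `Stamp` instance the two methods disagree: the instanceof chain reports a date, while the name-based switch falls through to the default because the runtime class name is not "java.util.Date".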
Paul On Wed, Dec 16, 2009 at 2:17 PM, Jonathan Gibbons wrote: > Paul Benedict wrote: >> >> It occurred to me that if hashCode() really is an acceptable and >> fool-proof (yet to be convinced) way to implement strings in switch, >> this potentially opens a further enhancement to switch on Class. >> Perhaps in JDK 8, we could convert many instanceof checks into label >> cases. >> >> Class c = o.getClass(); >> if (c instanceof String) { .. } >> else if (c instanceof Integer) { ..} >> else if (c instanceof Date) { .. } >> >> Can be de-sugared into the new String switch: >> >> switch (object.getClass().getName()) { >> case "java.lang.String": >> case "java.lang.Integer": >> case "java.util.Date": >> } >> } >> >> PS: Since instanceof evaluates null to false, as like any other switch >> statement, an if-check should still be done in front lest an NPE >> occurs. >> >> Paul >> >> > > > Setting aside issues of class identity, the proposed desugaring does not > take the use of subtypes into account. > > -- Jon > From matthew at matthewadams.me Wed Dec 16 23:16:26 2009 From: matthew at matthewadams.me (Matthew Adams) Date: Wed, 16 Dec 2009 23:16:26 -0800 Subject: Too late compile-time type checked reflection syntax sugar proposal? Message-ID: <1ba389ce0912162316t6d2d026jd21685f9f3cd13ba@mail.gmail.com> Hi all, Do the powers that be consider it too late to consider a new proposal for JDK 7? Here it is in ultrashort form, albeit not completely thought through. I'm proposing it just to kick it around. Proposal: Compile-time type-checked and existence-checked reflection syntax Description: Introduce a new, double-dot operator ".." to act as syntax sugar for accessing reflection information with type & existence checking at compile time. Concept: The double-dot operator, meaning "get metamodel artifact", allows for much more concise reflective access to things you know about at build time but must use reflection for some reason. Trust me, it happens plenty. The choice of ".."
for the operator was that first, ".." doesn't introduce a new keyword, and second, in filesystems, ".." usually means "go up a level", which is essentially what we're doing: going up a level from model to metamodel. Looking at the examples, you can see how much less code it is compared to the reflection-based equivalent, plus if it's typesafe, you get fewer errors when you're depending on type safety -- that is, at least you knew at compile time that things were all good. It still doesn't mean anything at runtime, and you could get NoSuchMethodException, etc.

Examples:

1. Get the Field object for the field named "bar" in class Foo:

    Field bar = Foo..bar;
    // current way
    Field bar = Foo.class.getDeclaredField("bar");

2. Get the Method object for the method with signature "myMethod(int a, String y)" defined on class Goo:

    Method m = Goo..myMethod(int,String); // note scope & return type don't matter
    // current way
    Method m = Goo.class.getDeclaredMethod("myMethod", new Class[] { int.class, String.class });

3. Get the Class object for the class Snafu. This is an interesting case that offers backward compatibility:

    Class c = Snafu..class; // exactly the same as Snafu.class, the ".." operator's inspiration!!

4. Get the @Foo annotation on the Bar class:

    Annotation foo = Bar..@Foo;
    // current way
    Annotation foo = Bar.class.getAnnotation(Foo.class);

5. Get the @Foo annotation on the field named "blah" in the class Gorp:

    Annotation foo = Gorp..blah..@Foo;
    // current way
    Annotation foo = Gorp.class.getDeclaredField("blah").getAnnotation(Foo.class);

6. Get the @Foo annotation on the second parameter of the method "start(int x, @Foo int y, int z)" defined in class Startable:

    Annotation foo = Startable..start(int,int..@Foo,int);
    // current way -- no error checking
    Annotation[] anns = Startable.class.getMethod("start", new Class[] { int.class, int.class, int.class }).getParameterAnnotations()[1];
    Annotation foo = null;
    for (Annotation ann : anns) {
        // annotationType(), not getClass(): annotation instances are proxies
        if (ann.annotationType().equals(Foo.class)) {
            foo = ann;
            break; // got it
        }
    }
    // foo is either null or a reference to the @Foo annotation instance on the second parameter of the method

7. Get all of the @Foo annotations on all of the parameters of the method "start(@Foo int x, int y, @Foo int z)" defined in class Startable:

    Annotation[] foo = Startable..start(int..@Foo,int..@Foo,int..@Foo); // returns an array with the first @Foo, null, then the last @Foo
    // current way left as an exercise to the reader :)

8. Get the @Foo annotation on the "@Foo start(int x, int y, int z)" method defined in class Startable:

    Annotation foo = Startable..start(int,int,int)..@Foo;
    // current way
    Annotation foo = Startable.class.getDeclaredMethod("start", new Class[] { int.class, int.class, int.class }).getAnnotation(Foo.class);

Motivation: The double-dot operator would allow for compile-time type-checked reflective operations, like those in the persistence APIs. For example, in JPA:

    @Entity
    public class Department {
        @OneToMany(mappedBy = "department") // note string
        Set employees;
        //...
    }

becomes

    @Entity
    public class Department {
        @OneToMany(mappedBy = Employee..department) // checked at compile time
        Set employees;
        //...
    }

It also is beneficial in many other areas. Use your imagination! I can't think of many more (it's late), but Criteria queries come to mind...

WDYT?

-matthew

--
mailto:matthew at matthewadams.me
skype:matthewadams12
yahoo:matthewadams
aol:matthewadams12
google-talk:matthewadams12 at gmail.com
msn:matthew at matthewadams.me
http://matthewadams.me
http://www.linkedin.com/in/matthewadams

From Joe.Darcy at Sun.COM Wed Dec 16 23:29:24 2009 From: Joe.Darcy at Sun.COM (Joseph D.
Darcy) Date: Wed, 16 Dec 2009 23:29:24 -0800 Subject: Too late compile-time type checked reflection syntax sugar proposal? In-Reply-To: <1ba389ce0912162316t6d2d026jd21685f9f3cd13ba@mail.gmail.com> References: <1ba389ce0912162316t6d2d026jd21685f9f3cd13ba@mail.gmail.com> Message-ID: <4B29DDD4.6080802@sun.com> Matthew Adams wrote: > Hi all, > > Do the powers that be consider it too late to consider a new proposal > for JDK 7? Yes. Regards, -Joe From Ulf.Zibis at gmx.de Thu Dec 17 02:01:44 2009 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 17 Dec 2009 11:01:44 +0100 Subject: Strings in Switch .. and classes In-Reply-To: References: Message-ID: <4B2A0188.802@gmx.de> See also threads: switch (...) instanceof feature --- 2009-03-30 Extend switch .. case statement for Object types and simple expressions --- 2009-03-30 Strings in switch --- 2009-03-30 Extend switch .. case statement for all types and simple expressions (update) --- 2009-03-31 JCK feedback on "Strings in Switch" proposal --- 2009-05-24 -Ulf Paul Benedict schrieb: > It occurred to me that if hashCode() really is an acceptable and > fool-proof (yet to be convinced) way to implement strings in switch, > this potentially opens a further enhancement to switch on Class. > Perhaps in JDK 8, we could convert many instanceof checks into label > cases. > > Class c = o.getClass(); > if (c instanceof String) { .. } > else if (c instanceof Integer) { ..} > else if (c instanceof Date) { .. } > > Can be de-sugared into the new String switch: > > switch (object.getClass().getName()) { > case "java.lang.String": > case "java.lang.Integer": > case "java.util.Date": > } > } > > PS: Since instanceof evaluates null to false, as like any other switch > statement, an if-check should still be done in front lest an NPE > occurs. 
> > Paul > > From jimmyuniversal at yahoo.com Thu Dec 17 14:14:48 2009 From: jimmyuniversal at yahoo.com (James Arlow) Date: Thu, 17 Dec 2009 14:14:48 -0800 (PST) Subject: Benefit from computing String Hash at compile time? Message-ID: <201502.54372.qm@web57708.mail.re3.yahoo.com> I'm not exactly up to date, but even reading entries from December, there are talks about computing the string hash at compile time. This seems like a bad idea to me when looking towards future compatibility. For the sake of posterity, it would make more sense to store the strings as literals in the class file, and then compute the hash during the class loading process. The amount of processing at run-time would be negligible, and it would eliminate the possibility of errors creeping up from an "improved" or non-standard hash function. While the "improved" case seems unlikely, it would prevent whole sections of code from breaking simply because a third party JVM introduced an accidental error into the hash process. I think everyone can agree its best to not risk cutting off open options unless there is a critical performance penalty that would be addressed by doing so, so for a function that is called once per switch option and only when the class is loaded, I think its safe to forget about compile time hashes altogether. If people are really worried about performance, then the best option would be to offer two ways to compile the class, one with string literals, for compatibility, and one with Java standard hash values, for performance. From opinali at gmail.com Fri Dec 18 09:38:41 2009 From: opinali at gmail.com (Osvaldo Doederlein) Date: Fri, 18 Dec 2009 15:38:41 -0200 Subject: Benefit from computing String Hash at compile time? 
In-Reply-To: <201502.54372.qm@web57708.mail.re3.yahoo.com> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> Message-ID:

I believe the String hashcode computation could be performed eagerly, piggy-backing on the (already significantly complex) copying and/or decoding code used by its constructors. For every character, first to last, produced/added to the this.value array, we also update the hashcode: h = 31*h + character => trivial enough to be a basically "free" addition to the existing loops.

Some details:

- For large strings, we gain a lot because we don't have to visit all the characters again (possibly when they are no longer in the CPU cache) when the hashcode is first computed (perhaps much later than construction).
- The current algorithm reuses the hashcode 0 to mean "not computed", so it will recompute everything again for strings that just happen to produce 0 with the current formula. Eager computation avoids this risk, however small.
- Some constructors (notably for substring and cloning) rely on Arrays.copyOfRange(), whose implementation is more efficient than any Java loop (I guess it's a HotSpot intrinsic with optimization for alignment etc.). In that case, using an explicit loop so we can smuggle the hashcode calculation inside it will probably have a measurable disadvantage. But this disadvantage is only for construction (and then only for large strings); for strings that are ever hashed, the net saving will always be still positive.
- Eager computation allows hash to be declared final, which may have some performance benefit, e.g. for caching in registers.
- Eager computation allows hashCode() to be a trivial getter without any branch.
- The hashcode function can be factored into a private static method, e.g. int incHash(int currhash), so this tiny algorithm need not be repeated in a dozen constructors; that method will be trivial to inline so there's no cost either for compiled code.
- Admittedly, for interpreted code there are higher disadvantages in the constructor; but then, I expect most String constructors to appear as the first methods to be optimized by the JIT - they are just BURNING "hot". - If we have eager computation, I think it's not worthy caching the hashcode of literal Strings in the Constant Pool; this requires changing the CP spec and the classfiles will be 4 bytes bigger for every String literal - a lot of extra bytes considering how many Strings we typically have (including all Strings from CP symbols). Still, javac could use the "well-known" hashcode for special needs like strings-in-switch; other optimizations could be used more aggressively (precomputed hashtables for huge static symbol tables...). Let's face it, the String hashcode algorithm has changed in the early days, but it will never change again. A+ Osvaldo 2009/12/17 James Arlow > I'm not exactly up to date, but even reading entries from December, there > are talks about computing the string hash at compile time. This seems like > a bad idea to me when looking towards future compatibility. > > For the sake of posterity, it would make more sense to store the strings as > literals in the class file, and then compute the hash during the class > loading process. The amount of processing at run-time would be negligible, > and it would eliminate the possibility of errors creeping up from an > "improved" or non-standard hash function. > > While the "improved" case seems unlikely, it would prevent whole sections > of code from breaking simply because a third party JVM introduced an > accidental error into the hash process. > > I think everyone can agree its best to not risk cutting off open options > unless there is a critical performance penalty that would be addressed by > doing so, so for a function that is called once per switch option and only > when the class is loaded, I think its safe to forget about compile time > hashes altogether. 
> > If people are really worried about performance, then the best option would > be to offer two ways to compile the class, one with string literals, for > compatibility, and one with Java standard hash values, for performance. > > > > > > From pbenedict at apache.org Fri Dec 18 14:29:01 2009 From: pbenedict at apache.org (Paul Benedict) Date: Fri, 18 Dec 2009 16:29:01 -0600 Subject: Benefit from computing String Hash at compile time? Message-ID: James, I concur with your thoughts. It's a risky decision to embed the hash code into the class file. I can't imagine any other JDK implementation would attempt this, but perhaps Sun can bet on some things that others cannot. Regardless, some prominent people disagreed, but I don't think it changes reality. Either the hash code should forever be made what it is -- and why couldn't that be done? -- or have an alternate implementation. I really like your idea of storing the Strings in the class file and computing their hash when the class loads. Paul From abies at adres.pl Fri Dec 18 15:01:43 2009 From: abies at adres.pl (Artur Biesiadowski) Date: Sat, 19 Dec 2009 00:01:43 +0100 Subject: Benefit from computing String Hash at compile time? In-Reply-To: References: <201502.54372.qm@web57708.mail.re3.yahoo.com> Message-ID: <4B2C09D7.10405@adres.pl> Osvaldo Doederlein wrote: > - Some constructors (remarkably for substring and cloning) rely on > Arrays.copyOfRange(), which implementation is more efficient than any Java > loop (I guess it's a HotSpot intrinsic with optimization for alignment > etc.). In that case, using an explicit loop so we can smuggle the hashcode > calculation inside it, will probably have a measurable disadvantage. But > this disadvantage is only for construction (and then only for large > strings); for strings that are ever hashed, the net saving will always be > still positive. Especially in case of substring, optimized private constructor is used, which just does 3 assignments. 
With your idea, it would have to iterate over all elements. This is quite common operation. I wonder if there is anything (some Hotspot intrinsic?) preventing quick hack on java.lang.String, putting it in bootclasspath/a and measuring time of javac few thousands source files, reindexing huge lucene data and maybe hsql on some test database. It should at least give a rough figure if it changes the speed in any measurable way. Regards, Artur Biesiadowski From Ulf.Zibis at gmx.de Fri Dec 18 17:09:34 2009 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sat, 19 Dec 2009 02:09:34 +0100 Subject: Benefit from computing String Hash at compile time? In-Reply-To: References: Message-ID: <4B2C27CE.1080504@gmx.de> +1 Am 18.12.2009 23:29, Paul Benedict schrieb: > James, > > I concur with your thoughts. It's a risky decision to embed the hash > code into the class file. I can't imagine any other JDK implementation > would attempt this, but perhaps Sun can bet on some things that others > cannot. Regardless, some prominent people disagreed, but I don't think > it changes reality. Either the hash code should forever be made what > it is -- and why couldn't that be done? -- or have an alternate > implementation. I really like your idea of storing the Strings in the > class file and computing their hash when the class loads. > > Paul > > > From pbenedict at apache.org Fri Dec 18 17:42:27 2009 From: pbenedict at apache.org (Paul Benedict) Date: Fri, 18 Dec 2009 19:42:27 -0600 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> Message-ID: Reinier, Thank you for your reply. On Fri, Dec 18, 2009 at 5:04 PM, Reinier Zwitserloot wrote: > String.hashCode() has _already_ been defined as unchanging and set in stone. > We could do so again, if it assuages recently stated fears, though I'm not > sure what this would accomplish. 
It's right here: > http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode() I hope to make some things clear: My objection relies solely on the fact that it is not "set in stone". If I remember correctly, Joe had to do research if the API ever changed (not since at least 1.2). Neither Joe, Jonathan, and Josh (people well respected) have claimed what you are claiming. The highest assurance given is that it's "highly unlikely" and only if "hell freezes over". . Now I grant the fact it's highly unlikely. I buy off on that. The odds are hashCode() is not going to change. I also have no philosophical problems with emitting the value from String.hashCode() into class files. However, I believe the manufacturer of a JDK should have *absolute certainty* when making this decision. It's pretty clear to me this certainty is high, but not absolute. And since OpenJDK is made by Sun, the bearer of Java, if it is good for them, it's good for everyone. Follow the leader. Once this decision is made, I assert String.hashCode() will have to be "set in stone" but only because of Project Coin and Sun's influence, not the API. Paul From opinali at gmail.com Sat Dec 19 04:53:23 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Date: Sat, 19 Dec 2009 10:53:23 -0200 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <4B2C09D7.10405@adres.pl> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> <4B2C09D7.10405@adres.pl> Message-ID: <4B2CCCC3.1090401@gmail.com> Em 18/12/2009 21:01, Artur Biesiadowski escreveu: > Osvaldo Doederlein wrote: > >> - Some constructors (remarkably for substring and cloning) rely on >> Arrays.copyOfRange(), which implementation is more efficient than any Java >> loop (I guess it's a HotSpot intrinsic with optimization for alignment >> etc.). In that case, using an explicit loop so we can smuggle the hashcode >> calculation inside it, will probably have a measurable disadvantage. 
But >> this disadvantage is only for construction (and then only for large >> strings); for strings that are ever hashed, the net saving will always be >> still positive. >> > Especially in case of substring, optimized private constructor is used, > which just does 3 assignments. With your idea, it would have to iterate > over all elements. This is quite common operation. > The short answer: you are right, that's an important special case, remarkably in methods using several temporary strings (often substrings of some previous string) because temp strings are virtually never hashed; and the sharing of String.value is critical to String's immutable design. But this only means that eager computation of the hashcode is not always a good idea - so, perhaps we can do that eagerly in all/most constructors that create a new String.value; or more generally, in any constructor where this extra computation is proved to not produce any significant performance degradation. For other constructors, we just keep String.hash initialized with 0, so the current hashCode() is kept unchanged and will calculate the value if necessary. The long answer, I'm coding a prototype impl of this optimization in some constructors so I can benchmark this and see if it's worth the trouble. As usual it's better telling the code to do all the talking. > I wonder if there is anything (some Hotspot intrinsic?) preventing quick > hack on java.lang.String, putting it in bootclasspath/a and measuring > time of javac few thousands source files, reindexing huge lucene data > and maybe hsql on some test database. It should at least give a rough > figure if it changes the speed in any measurable way. > HotSpot is doing some intrinsic tricks for String/StringBuilder (IIRC) in recent JDK7 build, but I didn't check these changesets... but I don't think it would affect such benchmarking, if we don't change the data layout (fields). 
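The mixed strategy described above (eager hashing where a constructor already loops over the characters, lazy hashing where it only shares the array) might look roughly like the following. This is a toy sketch, not java.lang.String: the class `HashedText` and its members are hypothetical names, bounds checks are omitted, and the only thing taken from the thread is the 31*h + c recurrence and the use of 0 as the "not computed" marker.

```java
final class HashedText {
    private final char[] value;
    private final int offset;
    private final int count;
    private int hash; // 0 doubles as "not computed yet", as in the lazy scheme

    // Copying constructor: the hash is folded into the copy loop (eager).
    HashedText(char[] source) {
        char[] copy = new char[source.length];
        int h = 0;
        for (int i = 0; i < source.length; i++) {
            char c = source[i];
            copy[i] = c;
            h = 31 * h + c;
        }
        this.value = copy;
        this.offset = 0;
        this.count = copy.length;
        this.hash = h;
    }

    // Sharing constructor (substring-style): three assignments, hash stays lazy.
    private HashedText(char[] shared, int offset, int count) {
        this.value = shared;
        this.offset = offset;
        this.count = count;
    }

    // Shares the backing array instead of copying it; no bounds checks in this sketch.
    HashedText substring(int begin, int end) {
        return new HashedText(value, offset + begin, end - begin);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) { // not computed yet (or the rare text whose hash really is 0)
            for (int i = offset; i < offset + count; i++) {
                h = 31 * h + value[i];
            }
            hash = h;
        }
        return h;
    }
}
```

With this split, temporary substrings that are never hashed pay nothing extra, while texts built by copying get their hash during the copy loop they already run.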
A+ Osvaldo From opinali at gmail.com Sat Dec 19 05:24:37 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Date: Sat, 19 Dec 2009 11:24:37 -0200 Subject: Benefit from computing String Hash at compile time? In-Reply-To: References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> Message-ID: <4B2CD415.5070200@gmail.com> Em 18/12/2009 23:42, Paul Benedict escreveu: > On Fri, Dec 18, 2009 at 5:04 PM, Reinier Zwitserloot > wrote: > >> String.hashCode() has _already_ been defined as unchanging and set in stone. >> We could do so again, if it assuages recently stated fears, though I'm not >> sure what this would accomplish. It's right here: >> http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode() >> > I hope to make some things clear: > > My objection relies solely on the fact that it is not "set in stone". > If I remember correctly, Joe had to do research if the API ever > changed (not since at least 1.2). Neither Joe, Jonathan, and Josh > (people well respected) have claimed what you are claiming. The > highest assurance given is that it's "highly unlikely" and only if > "hell freezes over". . > > Now I grant the fact it's highly unlikely. I buy off on that. The odds > are hashCode() is not going to change. I also have no philosophical > problems with emitting the value from String.hashCode() into class > files. However, I believe the manufacturer of a JDK should have > *absolute certainty* when making this decision. It's pretty clear to > me this certainty is high, but not absolute. And since OpenJDK is made > by Sun, the bearer of Java, if it is good for them, it's good for > everyone. Follow the leader. Once this decision is made, I assert > String.hashCode() will have to be "set in stone" but only because of > Project Coin and Sun's influence, not the API. > The hashcode algorithm has changed only once, in 1.2, I checked it too. 
And yes, there's not formal guarantee yet that it won't change again; but all it takes is a single line of javadoc, stating that the algorithm - which is _already documented and contractual_ at least since 1.2 - won't ever change again. Even independent cleanroom implementations (I checked GNU Classpath), use the same algorithm. A+ Osvaldo From reinier at zwitserloot.com Sat Dec 19 08:43:07 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Sat, 19 Dec 2009 17:43:07 +0100 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <4B2CD415.5070200@gmail.com> References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> Message-ID: <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> I don't really understand the thread of conversation here. The fact that the algorithm is explained in the javadoc means that it is part of the java spec. There is no need to explain that it can't ever change; that notion is already inherent in the fact that the algorithm is explained in the javadoc. The mistake seems to be in what 'part of the java spec' means. It does not actually mean: Cannot possibly change. Like everything else in java that is frozen, pragmatic issues trump backwards compatibility. String.hashCode was changed in 1.2, as was explained earlier, because of a conflict in the JVM spec and the javadoc, as well as a really stupid algorithm in the JVM spec. Pragmatism won out. An analysis was made of the impact, and the analysis resulted in the decision to change it, even though that wasn't, technically, backwards compatible. That was then. At this point in time, it most definitely will not be changing ever again, regardless of whether string-in-switch has a dependency on String.hashCode's implementation. The javadoc of String should also not be bogged down with the implementation detail that string-on-switch is dependent on it. Implementation details have no place in javadoc. 
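For readers following the dependency being argued about, here is a minimal sketch of the hashing strategy for strings in switch, written by hand. It is an illustration only, not necessarily what javac emits: the class `StringSwitchSketch` and its methods are hypothetical names, and the case constants are simply the values the documented formula (s[0]*31^(n-1) + ... + s[n-1], in int arithmetic) produces for the chosen labels.

```java
class StringSwitchSketch {
    // The documented formula, computed independently of String.hashCode().
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Hand-written stand-in for:
    //   switch (s) { case "Aa": return 1; case "BB": return 2; case "abc": return 3; default: return 0; }
    // The constants are baked in at "compile time", which is exactly the
    // dependency on the hash algorithm discussed in this thread.
    static int classify(String s) {
        switch (s.hashCode()) {
            case 2112: // "Aa" and "BB" share this hash, so equals() decides
                if (s.equals("Aa")) return 1;
                if (s.equals("BB")) return 2;
                break;
            case 96354: // hash of "abc"
                if (s.equals("abc")) return 3;
                break;
        }
        return 0; // default branch
    }
}
```

If a runtime computed a different hash for "abc" than the constant stored in the class file, the lookup would silently fall through to the default branch; the equals() checks guard only against collisions, not against a changed algorithm.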
--Reinier Zwitserloot On Sat, Dec 19, 2009 at 2:24 PM, Osvaldo Pinali Doederlein < opinali at gmail.com> wrote: > Em 18/12/2009 23:42, Paul Benedict escreveu: > > On Fri, Dec 18, 2009 at 5:04 PM, Reinier Zwitserloot >> wrote: >> >> >>> String.hashCode() has _already_ been defined as unchanging and set in >>> stone. >>> We could do so again, if it assuages recently stated fears, though I'm >>> not >>> sure what this would accomplish. It's right here: >>> http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode() >>> >>> >> I hope to make some things clear: >> >> My objection relies solely on the fact that it is not "set in stone". >> If I remember correctly, Joe had to do research if the API ever >> changed (not since at least 1.2). Neither Joe, Jonathan, and Josh >> (people well respected) have claimed what you are claiming. The >> highest assurance given is that it's "highly unlikely" and only if >> "hell freezes over". . >> >> Now I grant the fact it's highly unlikely. I buy off on that. The odds >> are hashCode() is not going to change. I also have no philosophical >> problems with emitting the value from String.hashCode() into class >> files. However, I believe the manufacturer of a JDK should have >> *absolute certainty* when making this decision. It's pretty clear to >> me this certainty is high, but not absolute. And since OpenJDK is made >> by Sun, the bearer of Java, if it is good for them, it's good for >> everyone. Follow the leader. Once this decision is made, I assert >> String.hashCode() will have to be "set in stone" but only because of >> Project Coin and Sun's influence, not the API. >> >> > > The hashcode algorithm has changed only once, in 1.2, I checked it too. And > yes, there's not formal guarantee yet that it won't change again; but all it > takes is a single line of javadoc, stating that the algorithm - which is > _already documented and contractual_ at least since 1.2 - won't ever change > again. 
Even independent cleanroom implementations (I checked GNU Classpath), > use the same algorithm. > > A+ > Osvaldo > From pbenedict at apache.org Sat Dec 19 09:28:31 2009 From: pbenedict at apache.org (Paul Benedict) Date: Sat, 19 Dec 2009 11:28:31 -0600 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> Message-ID: Reinier, > There is no need to explain that it can't ever change; that notion is > already inherent in the fact that the algorithm is explained in the javadoc. > The mistake seems to be in what 'part of the java spec' means. It does not > actually mean: Cannot possibly change. The algorithm is explained. The documentation is good, isn't it? It is, however, the documentation is for that version of the Java platform. > The javadoc of String should also not be bogged down with the implementation > detail that string-on-switch is dependent on it. Implementation details have > no place in javadoc. I agree with you. No one has to reveal implementation details. All that is necessary is a note that the algorithm must not change from JDK version to JDK version. Moving on... If anyone at Sun is still listening (::grins::), I prefer to emit a static method that contains a duplicate of the hashCode() algorithm. Then, no one has to worry about JDK version upgrades and String.hashCode() is free for future tweaking. static int $switch_hashCode(String s) {... } switch ($switch_hashCode(s)) { ... } Paul From opinali at gmail.com Sat Dec 19 11:06:48 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Date: Sat, 19 Dec 2009 17:06:48 -0200 Subject: Benefit from computing String Hash at compile time? 
In-Reply-To: References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> Message-ID: <4B2D2448.4090505@gmail.com> Em 19/12/2009 15:28, Paul Benedict escreveu: > Moving on... > If anyone at Sun is still listening (::grins::), I prefer to emit a > static method that contains a duplicate of the hashCode() algorithm. > Then, no one has to worry about JDK version upgrades and > String.hashCode() is free for future tweaking. > > static int $switch_hashCode(String s) {... } > switch ($switch_hashCode(s)) { > ... > } > > Paul > This is wasteful, first because strings used in switch statements and also in hashed collections will be hashed twice; second (and much more important), every execution switch(str) needs to call your special hashcode function again for str, as this hashcode cannot be cached in str. This extra cost makes switch-on-string O(N) on the str.length, which makes the hashing compilation strategy pointless. A+ Osvaldo From pbenedict at apache.org Sat Dec 19 16:30:54 2009 From: pbenedict at apache.org (Paul Benedict) Date: Sat, 19 Dec 2009 18:30:54 -0600 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <4B2D2448.4090505@gmail.com> References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> <4B2D2448.4090505@gmail.com> Message-ID: > This is wasteful, first because strings used in switch statements and also > in hashed collections will be hashed twice; second (and much more > important), every execution switch(str) needs to call your special hashcode > function again for str, as this hashcode cannot be cached in str. This extra > cost makes switch-on-string O(N) on the str.length, which makes the hashing > compilation strategy pointless. You make a good point. 
As for being "hashed twice", that's simply the cost of removing the reliance on String.hashCode(). Well, couldn't $switch_hashCode() perform some caching for itself? Paul From reinier at zwitserloot.com Sun Dec 20 02:10:23 2009 From: reinier at zwitserloot.com (Reinier Zwitserloot) Date: Sun, 20 Dec 2009 11:10:23 +0100 Subject: Benefit from computing String Hash at compile time? In-Reply-To: References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> <4B2D2448.4090505@gmail.com> Message-ID: <560fb5ed0912200210s660cbbe6u867926b09f08e0c3@mail.gmail.com> Huh? All you need to do is this: add a method to java.lang.String with the signature: public synthetic int switchCode() { return hashCode(); } Why would this cause performance issues? --Reinier Zwitserloot On Sun, Dec 20, 2009 at 1:30 AM, Paul Benedict wrote: > > This is wasteful, first because strings used in switch statements and > also > > in hashed collections will be hashed twice; second (and much more > > important), every execution switch(str) needs to call your special > hashcode > > function again for str, as this hashcode cannot be cached in str. This > extra > > cost makes switch-on-string O(N) on the str.length, which makes the > hashing > > compilation strategy pointless. > > You make a good point. As for being "hashed twice", that's simply the > cost of removing the reliance on String.hashCode(). Well, couldn't > $switch_hashCode() perform some caching for itself? > > Paul > > From opinali at gmail.com Sun Dec 20 04:16:31 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Date: Sun, 20 Dec 2009 10:16:31 -0200 Subject: Benefit from computing String Hash at compile time? 
In-Reply-To: <560fb5ed0912200210s660cbbe6u867926b09f08e0c3@mail.gmail.com> References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> <4B2D2448.4090505@gmail.com> <560fb5ed0912200210s660cbbe6u867926b09f08e0c3@mail.gmail.com> Message-ID: <4B2E159F.2090604@gmail.com> On 20/12/2009 08:10, Reinier Zwitserloot wrote: > Huh? > > All you need to do is this: > > add a method to java.lang.String with the signature: > > public synthetic int switchCode() { > return hashCode(); > } > > > Why would this cause performance issues? This is a good idea, as long as 1) that implementation doesn't change. If it ever needs to change, we're back in hell, either having to recompute the switch-hash at every call, or wasting an extra int field in the String object to cache this secondary hashcode (that would be horribly wasteful because String is the single most popular object in most Java heaps, and only a miserably tiny fraction of all strings would ever be used in switches). 2) we don't mind tightly coupling java.lang.String (notably through a public method) to the switch statement, which is a concern for some posters. So I still think it is pointless. I say, just put in String.hashCode() "...and this algorithm is cast in stone, forever and ever until JDK +Infinity", use it directly from switch, and move on. A+ Osvaldo > > --Reinier Zwitserloot > > > On Sun, Dec 20, 2009 at 1:30 AM, Paul Benedict > wrote: > > > This is wasteful, first because strings used in switch > statements and also > > in hashed collections will be hashed twice; second (and much more > > important), every execution switch(str) needs to call your > special hashcode > > function again for str, as this hashcode cannot be cached in > str. This extra > > cost makes switch-on-string O(N) on the str.length, which makes > the hashing > > compilation strategy pointless. > > You make a good point.
As for being "hashed twice", that's simply the > cost of removing the reliance on String.hashCode(). Well, couldn't > $switch_hashCode() perform some caching for itself? > > Paul > > From opinali at gmail.com Sun Dec 20 09:50:10 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Date: Sun, 20 Dec 2009 15:50:10 -0200 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <4B2CCCC3.1090401@gmail.com> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> <4B2C09D7.10405@adres.pl> <4B2CCCC3.1090401@gmail.com> Message-ID: <4B2E63D2.2070407@gmail.com> Hi, > The long answer, I'm coding a prototype impl of this optimization in > some constructors so I can benchmark this and see if it's worth the > trouble. As usual it's better telling the code to do all the talking. Results of this benchmark, for JDK 7b78, creating a 500-char string:

new String(char value[]):
Normal = (server) 559ns, (client) 520ns; hashCode() = (server) 1046ns, (client) 1182ns
Eager hashing = (server) 1704ns, (client) 1609ns

This first test was certainly a disaster, but that was expected: this is one of the constructors that rely on native/intrinsic methods like copyOf(), copyOfRange(), arraycopy(). Even adding the hashing time to the construction time doesn't show any advantage for eager hashing.

new String(int[] codePoints, int offset, int count):
Normal = (server) 3277ns, (client) 3342ns
Eager hashing = (server) 3397ns, (client) 3610ns

This one is much better; we can see some performance degradation with the eager hashing (+3.66% for server and +8.01% for client), but that's a small overhead on already-slow constructors, and summing construction and hashing times shows a nice advantage (-27% for server, -25% for client). I expect the relative overhead of eager hashing to be even smaller for the 3 constructors that use StringCoding.decode(), but I didn't try these (much more code to hack).
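To make the term concrete: "eager hashing" in the measurements above means folding each character into the hash while it is copied into the new string's array, instead of bulk-copying now and hashing later on first use. The following is only a minimal sketch of that idea, under stated assumptions: EagerHashDemo, Copy, copyOnly and copyAndHash are hypothetical names invented for illustration, and this is not the patched java.lang.String code that was benchmarked.

```java
import java.util.Arrays;

/** Hypothetical demo class; not JDK code and not the benchmarked prototype. */
final class EagerHashDemo {

    /** Result holder: the copied characters plus the hash computed during the copy. */
    static final class Copy {
        final char[] value;
        final int hash;

        Copy(char[] value, int hash) {
            this.value = value;
            this.hash = hash;
        }
    }

    /** Lazy style: bulk copy only; a hash would be computed later, on demand. */
    static char[] copyOnly(char[] src) {
        return Arrays.copyOf(src, src.length);
    }

    /** Eager style: copy element by element and fold each char into the hash. */
    static Copy copyAndHash(char[] src) {
        char[] v = new char[src.length];
        int h = 0;
        for (int i = 0; i < src.length; i++) {
            // Same recurrence as the documented String.hashCode(): h = 31*h + c
            h = 31 * h + (v[i] = src[i]);
        }
        return new Copy(v, h);
    }

    public static void main(String[] args) {
        char[] src = "coin-dev".toCharArray();
        Copy c = copyAndHash(src);
        // The eagerly computed value agrees with the documented String hash.
        System.out.println(c.hash == new String(src).hashCode()); // prints true
    }
}
```

The sketch also shows why the simple constructor suffers: copyOnly can delegate to a bulk copy, while copyAndHash has to touch every element in a Java loop.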
I've used every optimization possible at source level - manual inlining, constant folding/propagation, caching fields in locals (sample code attached at the end) - to no avail. I believe one significant problem is that the hashing function is not friendly to optimization; something like str[0] ^ str[1] ^ str[2]... would be much better (i.e., any function that allows loop unrolling and SIMD tricks). But, as we've already discussed, String.hashCode()'s algorithm cannot be changed so it's pointless considering this. My conclusion is "Myth Busted". Eager hashing helps only the most complex constructors, which are very rarely used. Even for these constructors, there is a measurable (if small) cost for strings that are never hashed, which is probably not offset by the bigger, but still modest, gain for those that are eventually hashed. A+ Osvaldo

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself. Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        int newLength = off+size - off;
        v = new char[newLength];
        int h = 0;
        int end = Math.min(originalValue.length - off, newLength);
        // Copy and hash in the same pass (the eager-hashing experiment).
        for (int i = 0; i < end; ++i) {
            h = 31*h + (v[i] = originalValue[off + i]);
        }
        this.hash = h;
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

From mthornton at optrak.co.uk Sun Dec 20 11:25:47 2009 From: mthornton at optrak.co.uk (Mark Thornton) Date: Sun, 20 Dec 2009 19:25:47 +0000 Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2E63D2.2070407@gmail.com> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> <4B2C09D7.10405@adres.pl> <4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com> Message-ID: <4B2E7A3B.5010307@optrak.co.uk> Osvaldo Pinali Doederlein wrote: > attached in the end) - to no avail. I believe one significant problem is > that the hashing function is not friendly to optimization; something > like str[0] ^ str[1] ^ str[2]... would be much better (i.e., any > function that allows loop unrolling and SIMD tricks). But, as we've > already discussed, String.hashCode()'s algorithm cannot be changed so > it's pointless considering this. >

from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

= s[0]*31^(n-1) + s[2]*31^(n-3) + ... +
  s[1]*31^(n-2) + s[3]*31^(n-4) + ...

int hOdd = 0;
int hEven = 0;
final int mSquared = 31*31;

int ne = s.length() &~1;
for (int i=0; i<ne; i+=2) {
    hEven = mSquared*hEven+s.charAt(i);
    hOdd = mSquared*hOdd+s.charAt(i+1);
}
if (ne < s.length()) {
    hEven = mSquared*hEven+s.charAt(ne);
    return hEven+31*hOdd;
}
else {
    return 31*hEven+hOdd;
}

Other variations possible. For really long Strings, divide the string into M pieces where M is the number of processors available. Mark Thornton From opinali at gmail.com Sun Dec 20 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Subject: Benefit from computing String Hash at compile time? In-Reply-To: <4B2E7A3B.5010307@optrak.co.uk> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> <4B2C09D7.10405@adres.pl> <4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com> <4B2E7A3B.5010307@optrak.co.uk> Message-ID: <4B2E97F6.2040206@gmail.com> On 20/12/2009 17:25, Mark Thornton wrote: > Osvaldo Pinali Doederlein wrote: >> attached in the end) - to no avail. I believe one significant problem >> is that the hashing function is not friendly to optimization; >> something like str[0] ^ str[1] ^ str[2]... would be much better >> (i.e., any function that allows loop unrolling and SIMD tricks). But, >> as we've already discussed, String.hashCode()'s algorithm cannot be >> changed so it's pointless considering this. > > from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] > > = s[0]*31^(n-1) + s[2]*31^(n-3) + ... + > s[1]*31^(n-2) + s[3]*31^(n-4) + ... Good idea, I didn't consider this and I believe the JIT is not that smart. I made this change to the code, but with bad results again. For the simple String(char[]), only regressions (server: 1704->1749ns; client: 1609->1630ns).
The String(int[],int,int) constructor can't use the 2x-unrolled code because it must proceed one char at a time due to supplementary code points (I could work around this - but then, that would add even more code). As usual, it's dangerous to pollute very tight loops with extra variables and calculations; we risk losing performance due to factors like register allocation. The eager hashcode code was bad enough, and the more sophisticated version is even worse. Things are different when the code is really bound by the FSB; the benchmark allocated 500-char strings (roughly 1 KB) but this is too small: the new operator zero-fills the char[], so everything is in the L1 cache when it is initialized. Eager hashcode computation would probably look better for huge strings that don't fit in the caches, since the extra instructions could then be completely hidden behind cache misses, but this is an irrelevant scenario for String. A+ Osvaldo
> > >
> int hOdd = 0;
> int hEven = 0;
> final int mSquared = 31*31;
>
> int ne = s.length() &~1;
> for (int i=0; i<ne; i+=2) {
>     hEven = mSquared*hEven+s.charAt(i);
>     hOdd = mSquared*hOdd+s.charAt(i+1);
> }
> if (ne < s.length()) {
>     hEven = mSquared*hEven+s.charAt(ne);
>     return hEven+31*hOdd;
> }
> else {
>     return 31*hEven+hOdd;
> }
>
> Other variations possible. For really long Strings, divide the string > into M pieces where M is the number of processors available. > > Mark Thornton > > > From mthornton at optrak.co.uk Sun Dec 20 13:40:35 2009 From: mthornton at optrak.co.uk (Mark Thornton) Date: Sun, 20 Dec 2009 21:40:35 +0000 Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2E97F6.2040206@gmail.com> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> <4B2C09D7.10405@adres.pl> <4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com> <4B2E7A3B.5010307@optrak.co.uk> <4B2E97F6.2040206@gmail.com> Message-ID: <4B2E99D3.3030208@optrak.co.uk> Osvaldo Pinali Doederlein wrote: > Em 20/12/2009 17:25, Mark Thornton escreveu: >> Osvaldo Pinali Doederlein wrote: >>> attached in the end) - to no avail. I believe one significant >>> problem is that the hashing function is not friendly to >>> optimization; something like str[0] ^ str[1] ^ str[2]... would be >>> much better (i.e., any function that allows loop unrolling and SIMD >>> tricks). But, as we've already discussed, String.hashCode()'s >>> algorithm cannot be changed so it's pointless considering this. >> >> from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] >> >> = s[0]*31^(n-1) + s[2]*31^(n-3) + ... + >> s[1]*31^(n-2) + s[4]*31^(n-4) + ... > > Good idea, I didn't consider this and I believe the JIT is not that > smart. I did this change to the code, but with bad results again. For > the simple String(char[]), only regressions (server: 1704->1749ns; > client: 1609->1630ns). The String(int[],int,int) constructor can't use > the 2x-unrolled code because it must proceed one char at a time due to > supplementary code points (I could work around this - but then, that > would add even more code). > > As usual, it's dangerous to pollute very tight loops with extra > variables and calculations; we risk losing performance due to factors > like register allocation. The eager hashcode code was bad enough, and > the more sophisticated version is even worse. Things are different > when the code is really bound by the FSB; the benchmark allocated > 500-char strings (roughly 1Kb) but this is too small, the new operator > zero-fills the char[] so everything in in the L1 when it is > initialized. 
Eager hashcode computation would probably look better for > huge strings that don't fit in the caches, then the extra instructions > could be completely hidden behind cache misses, but this is an > irrelevant scenario for String. > For something important like String.hashCode the JIT doesn't have to be smart because we can ask the JVM authors to hand-code intrinsic alternatives. The fastest code is likely to depend on the hardware: x64 has more registers available, the number of integer multiplier units varies, etc. Regards, Mark Thornton From opinali at gmail.com Mon Dec 21 05:49:36 2009 From: opinali at gmail.com (Osvaldo Pinali Doederlein) Date: Mon, 21 Dec 2009 11:49:36 -0200 Subject: Benefit from computing String Hash at compile time? In-Reply-To: <4B2E99D3.3030208@optrak.co.uk> References: <201502.54372.qm@web57708.mail.re3.yahoo.com> <4B2C09D7.10405@adres.pl> <4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com> <4B2E7A3B.5010307@optrak.co.uk> <4B2E97F6.2040206@gmail.com> <4B2E99D3.3030208@optrak.co.uk> Message-ID: <4B2F7CF0.9@gmail.com> On 20/12/2009 19:40, Mark Thornton wrote: > Osvaldo Pinali Doederlein wrote: >> Good idea, I didn't consider this and I believe the JIT is not that >> smart. I did this change to the code, but with bad results again. For >> the simple String(char[]), only regressions (server: 1704->1749ns; >> client: 1609->1630ns). The String(int[],int,int) constructor can't >> use the 2x-unrolled code because it must proceed one char at a time >> due to supplementary code points (I could work around this - but >> then, that would add even more code). >> >> As usual, it's dangerous to pollute very tight loops with extra >> variables and calculations; we risk losing performance due to factors >> like register allocation. The eager hashcode code was bad enough, and >> the more sophisticated version is even worse.
Things are different >> when the code is really bound by the FSB; the benchmark allocated >> 500-char strings (roughly 1Kb) but this is too small, the new >> operator zero-fills the char[] so everything in in the L1 when it is >> initialized. Eager hashcode computation would probably look better >> for huge strings that don't fit in the caches, then the extra >> instructions could be completely hidden behind cache misses, but this >> is an irrelevant scenario for String. >> > For something important like String.hashCode the JIT doesn't have to > be smart because we can ask the JVM authors to handcode intrinsic > alternatives. The fastest code is likely to depend on the hardware --- > x64 has more registers available, the number of integer multiplier > units varies, etc. > Fair enough, these tests were on a Core2 Duo laptop (Windows 7 32-bit), so I repeated them on another box with Solaris amd64 (with -d64):

new String(char value[]):
Normal = (server) 475ns, (client) 474ns; hashCode() = (server) 891ns, (client) 890ns
Eager hashing (2x unrolled) = (server) 836ns, (client) 837ns

new String(int[] codePoints, int offset, int count):
Normal = (server) 2030ns, (client) 2028ns
Eager hashing (normal) = (server) 2623ns, (client) 2626ns

The results for the simple constructor were surprisingly better - in fact they were "too good", enough to raise some suspicion: faster than the isolated hashCode() cost (but not by much, and weird things often happen in optimization). The cost over the standard, non-eager-hashing constructors is still too high for popular constructors + strings that are never hashed. Still, this confirms the superiority of 64-bit; the native code for these simple constructors should leave enough spare registers that HotSpot could accommodate the extra hash calculation with much less impact.
But I'm actually surprised because I thought modern x86 CPUs, even (and remarkably) in 32-bit mode, would use tricks like virtual register windows so the tiny number of architectural registers wouldn't matter that much - it seems this doesn't work as well as advertised. Unfortunately, for the complex constructors there was no gain, even slightly worse (+29% for both server and client) although for precise comparison I'd need to repeat the 32-bit tests in this different system. A+ Osvaldo From Joe.Darcy at Sun.COM Mon Dec 21 19:32:56 2009 From: Joe.Darcy at Sun.COM (Joseph D. Darcy) Date: Mon, 21 Dec 2009 19:32:56 -0800 Subject: Benefit from computing String Hash at compile time? In-Reply-To: References: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com> <4B2CD415.5070200@gmail.com> <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com> Message-ID: <4B303DE8.5010103@sun.com> Paul Benedict wrote: > Reinier, > > >> There is no need to explain that it can't ever change; that notion is >> already inherent in the fact that the algorithm is explained in the javadoc. >> The mistake seems to be in what 'part of the java spec' means. It does not >> actually mean: Cannot possibly change. >> > > The algorithm is explained. The documentation is good, isn't it? It > is, however, the documentation is for that version of the Java > platform. > > >> The javadoc of String should also not be bogged down with the implementation >> detail that string-on-switch is dependent on it. Implementation details have >> no place in javadoc. >> > > I agree with you. No one has to reveal implementation details. All > that is necessary is a note that the algorithm must not change from > JDK version to JDK version. > > Moving on... > If anyone at Sun is still listening (::grins::), I prefer to emit a > static method that contains a duplicate of the hashCode() algorithm. > Then, no one has to worry about JDK version upgrades and > String.hashCode() is free for future tweaking.
> > static int $switch_hashCode(String s) {... } > switch ($switch_hashCode(s)) { > ... > } > > I'm still listening, but mostly on vacation until early 2010. I'm quite familiar with the compatibility policies used to evolve the JDK and I've written about those in my blog; e.g. "JDK Release Types and Compatibility Regions" http://blogs.sun.com/darcy/entry/release_types_compatibility_regions "Kinds of Compatibility: Source, Binary, and Behavioral" http://blogs.sun.com/darcy/entry/kinds_of_compatibility There is a vanishingly small chance changing the hash algorithm of string would ever be contemplated for Java SE; the behavioral compatibility risk would be too great given the ubiquitous use of the String class. As one of my professors was fond of saying, "making virtue of a necessity," since the string hashing algorithm effectively cannot be changed, the current strings in switch implementation assumes it will be stable. -Joe From Ulf.Zibis at gmx.de Tue Dec 22 05:47:07 2009 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 22 Dec 2009 14:47:07 +0100 Subject: Strings in Switch In-Reply-To: <4B1FF7AD.50502@gmx.de> References: <4B1D9FED.7090406@sun.com> <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> <4B1DD0BB.7010709@sun.com> <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> <4B1E833B.1040200@sun.com> <4B1F7EFC.2030601@oracle.com> <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> <4B1FC38E.2010702@oracle.com> <4B1FF7AD.50502@gmx.de> Message-ID: <4B30CDDB.20108@gmx.de> For more details see my bug report: 6912520 - String#equals(Object) should benefit from hash code -Ulf Am 09.12.2009 20:17, Ulf Zibis schrieb: > > If a compare on a string is rarely, and especially if it's > total length is not trivial, the hash code computation should be more > expensive, but after some repeated compairs on the same string, the > hashcode algorithm would win. 
> Here an enhanced String#equals() implementation, which values the length > of every invoked compare on characters: >
> int equalByHashThreshold = count;
>
> public boolean equals(Object anObject) {
>     if (this == anObject) {
>         return true;
>     }
>     if (anObject instanceof String) {
>         String anotherString = (String)anObject;
>         int n = count;
>         if (n == anotherString.count &&
>                 (equalByHashThreshold > 0 ||
>                  hash() == anotherString.hash())) {
>             char v1[] = value;
>             char v2[] = anotherString.value;
>             int i = offset;
>             int j = anotherString.offset;
>             while (n-- != 0)
>                 if (v1[i++] != v2[j++]) {
>                     if (equalByHashThreshold > 0)
>                         equalByHashThreshold -= (count - n);
>                     return false;
>                 }
>             return true;
>         }
>     }
>     return false;
> }
>
> public int hashCode() {
>     int h = hash;
>     if (h == 0) {
>         int off = offset;
>         char val[] = value;
>         int len = count;
>
>         for (int i = 0; i < len; i++) {
>             h = 31*h + val[off++];
>         }
>         hash = h;
>         equalByHashThreshold = 0;
>     }
>     return h;
> }
>
> > > > -Ulf > > > > > > > From info at frankcornelis.be Thu Dec 24 21:33:16 2009 From: info at frankcornelis.be (Frank Cornelis) Date: Fri, 25 Dec 2009 06:33:16 +0100 Subject: Field attribute Message-ID: <4B344E9C.1090304@frankcornelis.be> Hi, Here is an idea for a Java language extension. As a JBoss Seam user I frequently do something like:

@DataModel
private List entities;

@Factory("entities")
public void initEntities() {
    this.entities = ...
}

If I refactor this code and rename the entities field, my factory method won't work anymore as it has to refer to the entities field name. So my suggestion would be to introduce a syntax as follows:

@DataModel
private List entities;

@Factory(@this.entities.getName())
public void initEntities() {
    this.entities = ...
}

So the @this.entities gives you the Field object for the this.entities variable. Kind Regards, Frank. From Joe.Darcy at Sun.COM Sun Dec 27 08:10:52 2009 From: Joe.Darcy at Sun.COM (Joseph D.
Darcy) Date: Sun, 27 Dec 2009 08:10:52 -0800 Subject: Field attribute In-Reply-To: <4B344E9C.1090304@frankcornelis.be> References: <4B344E9C.1090304@frankcornelis.be> Message-ID: <4B37870C.1040905@sun.com> Frank Cornelis wrote: > Hi, > > > Here is an idea for a Java language extension. The Project Coin call for proposals phase ended many months ago and the proposal form is a detailed examination of the language change and its implications rather than a sketch of the idea. To have this kind of idea considered for a future language change, it should be submitted to http://bugreport.sun.com/bugreport/ Regards, -Joe Darcy From matthew at matthewadams.me Sun Dec 27 08:22:55 2009 From: matthew at matthewadams.me (Matthew Adams) Date: Sun, 27 Dec 2009 08:22:55 -0800 Subject: Field attribute In-Reply-To: <4B37870C.1040905@sun.com> References: <4B344E9C.1090304@frankcornelis.be> <4B37870C.1040905@sun.com> Message-ID: <1ba389ce0912270822u4e43ba5cvde141a0a4663dcfa@mail.gmail.com> Hi Frank, I made a similar proposal, but too late: http://mail.openjdk.java.net/pipermail/coin-dev/2009-December/002638.html I'll add a bug report myself. -matthew On Sun, Dec 27, 2009 at 8:10 AM, Joseph D. Darcy wrote: > Frank Cornelis wrote: >> Hi, >> >> >> Here is an idea for a Java language extension. > > The Project Coin call for proposals phase ended many months ago and the > proposal form is a detailed examination of the language change and its > implications rather than a sketch of the idea. 
> > To have this kind of idea considered for a future language change, it > should be submitted to > http://bugreport.sun.com/bugreport/ > > Regards, > > -Joe Darcy > > -- mailto:matthew at matthewadams.me skype:matthewadams12 yahoo:matthewadams aol:matthewadams12 google-talk:matthewadams12 at gmail.com msn:matthew at matthewadams.me http://matthewadams.me http://www.linkedin.com/in/matthewadams From matthew at matthewadams.me Sun Dec 27 08:30:41 2009 From: matthew at matthewadams.me (Matthew Adams) Date: Sun, 27 Dec 2009 08:30:41 -0800 Subject: Field attribute In-Reply-To: <1ba389ce0912270822u4e43ba5cvde141a0a4663dcfa@mail.gmail.com> References: <4B344E9C.1090304@frankcornelis.be> <4B37870C.1040905@sun.com> <1ba389ce0912270822u4e43ba5cvde141a0a4663dcfa@mail.gmail.com> Message-ID: <1ba389ce0912270830x2a0f321fmaf5252bd02bab090@mail.gmail.com> Bug report/enhancement request filed with Sun just now. -matthew On Sun, Dec 27, 2009 at 8:22 AM, Matthew Adams wrote: > Hi Frank, > > I made a similar proposal, but too late: > > http://mail.openjdk.java.net/pipermail/coin-dev/2009-December/002638.html > > I'll add a bug report myself. > > -matthew > > On Sun, Dec 27, 2009 at 8:10 AM, Joseph D. Darcy wrote: >> Frank Cornelis wrote: >>> Hi, >>> >>> >>> Here is an idea for a Java language extension. >> >> The Project Coin call for proposals phase ended many months ago and the >> proposal form is a detailed examination of the language change and its >> implications rather than a sketch of the idea. 
>> >> To have this kind of idea considered for a future language change, it >> should be submitted to >> http://bugreport.sun.com/bugreport/ >> >> Regards, >> >> -Joe Darcy >> >> > > > > -- > mailto:matthew at matthewadams.me > skype:matthewadams12 > yahoo:matthewadams > aol:matthewadams12 > google-talk:matthewadams12 at gmail.com > msn:matthew at matthewadams.me > http://matthewadams.me > http://www.linkedin.com/in/matthewadams > -- mailto:matthew at matthewadams.me skype:matthewadams12 yahoo:matthewadams aol:matthewadams12 google-talk:matthewadams12 at gmail.com msn:matthew at matthewadams.me http://matthewadams.me http://www.linkedin.com/in/matthewadams From david.goodenough at linkchoose.co.uk Sun Dec 27 08:58:51 2009 From: david.goodenough at linkchoose.co.uk (David Goodenough) Date: Sun, 27 Dec 2009 16:58:51 +0000 Subject: Field attribute In-Reply-To: <4B344E9C.1090304@frankcornelis.be> References: <4B344E9C.1090304@frankcornelis.be> Message-ID: <200912271658.52012.david.goodenough@linkchoose.co.uk> You might like to look at Lombok and my Beans extension to Lombok. Lombok can be found at http://projectlombok.org and my Beans extension can be found at http://dga.co.uk/lombokbeans. David On Friday 25 December 2009, Frank Cornelis wrote: > Hi, > > > Here is an idea for a Java language extension. As JBoss Seam user I > frequently do something like: > @DataModel > private List entities; > > @Factory("entities"); > public void initEntities() { > this.entities = ... > } > > If I refactor this code and rename the entities field, my factory method > won't work anymore as it has to refer to the entities field name. So my > suggestion would be to introduce a syntax as follows: > @DataModel > private List entities; > > @Factory(@this.entity.getName()) > public void initEntities() { > this.entities = ... > } > > So the @this.entities gives you the Field class of the this.entities > variable. > > > Kind Regards, > Frank. >
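For context on the refactoring problem discussed in this thread: absent a language-level field reference, one workaround in plain Java is to route both annotations through a single compile-time constant, so that renaming the field cannot silently break the link. The following is a minimal sketch under stated assumptions: the DataModel and Factory annotations below are hypothetical stand-ins declared locally (they are not the JBoss Seam annotations, whose attributes may differ), and EntityListBean, ENTITIES and namesAreConsistent are names invented for this illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bean; the name shared by both annotations lives in one constant. */
final class EntityListBean {
    // Annotation values must be compile-time constants, so a static final String works.
    static final String ENTITIES = "entities";

    @DataModel(ENTITIES)
    private List<String> items; // the field itself can now be renamed freely

    @Factory(ENTITIES)
    public void initEntities() {
        this.items = new ArrayList<>();
    }

    /** Returns true when every @Factory name matches some @DataModel name in the class. */
    static boolean namesAreConsistent(Class<?> type) {
        List<String> models = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            DataModel dm = f.getAnnotation(DataModel.class);
            if (dm != null) {
                models.add(dm.value());
            }
        }
        for (Method m : type.getDeclaredMethods()) {
            Factory fa = m.getAnnotation(Factory.class);
            if (fa != null && !models.contains(fa.value())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(namesAreConsistent(EntityListBean.class)); // prints true
    }
}

// Hypothetical stand-in annotations for illustration only; NOT the Seam API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DataModel {
    String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Factory {
    String value();
}
```

The constant does not give the compile-time field reference asked for in the proposal; it only narrows the number of places where the name is spelled out, and the reflective check is the kind of test one might run to catch a mismatch early.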