From Joe.Darcy at Sun.COM  Tue Dec  1 15:09:44 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Tue, 01 Dec 2009 15:09:44 -0800
Subject: Bug reports
In-Reply-To: <C02F0672-1309-41F7-A016-355502CCFD96@googlemail.com>
References: <C02F0672-1309-41F7-A016-355502CCFD96@googlemail.com>
Message-ID: <4B15A238.7060200@sun.com>

Hi Mark.

Catching up on email after Devoxx and Thanksgiving,

Mark Mahieu wrote:
> Hi Joe,
>
> Is there any particular process you'd prefer Coin bug reports to go through, or should they just be filed at the usual place?
>   

Hmm... First, thanks for asking :-)  In general, bugs.sun.com or any other
existing channel for filing bugs on JDK 7 is suitable for filing bugs on
Coin features too, of course.  For coin-dev participants, sending bugs to
the list is reasonable as long as the general bug information is provided:

synopsis/summary
version info (which build and OS)
description
how to reproduce
expected result vs actual result
error messages
source code causing the problem
workaround (if available)

The submitted problems should generally be actual bugs in the 
implementation or specification, such as "In JDK 7 build N, javac 
crashes on this use of the diamond operator..." or "The new literal 
grammar excludes numbers with one digit" as opposed to "Instead of the 
diamond operator, var declarations should be allowed."

-Joe


From ted at tedneward.com  Thu Dec  3 02:37:33 2009
From: ted at tedneward.com (Ted Neward)
Date: Thu, 3 Dec 2009 02:37:33 -0800
Subject: Post-Devoxx Project Coin update, closures off-topic for coin-dev
In-Reply-To: <4B13EF40.2090408@sun.com>
References: <4B13EF40.2090408@sun.com>
Message-ID: <071a01ca7404$a1983b00$e4c8b100$@com>

Joe--

Is the mailing list for Project Lambda up and running yet?

Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com

> -----Original Message-----
> From: coin-dev-bounces at openjdk.java.net [mailto:coin-dev-
> bounces at openjdk.java.net] On Behalf Of Joseph D. Darcy
> Sent: Monday, November 30, 2009 8:14 AM
> To: coin-dev at openjdk.java.net
> Subject: Post-Devoxx Project Coin update, closures off-topic for coin-
> dev
> 
> Hello.
> 
> As has been announced recently at Devoxx and covered in various places,
> including previous threads on this list, Mark Reinhold made several
> announcements about JDK 7 at this year's Devoxx:
> 
> 1) JDK 7 will have a form of closures.
> 2) The JDK 7 schedule is being extended to roughly fall 2010.
> 
> On the first announcement, the coin-dev list is not the appropriate
> forum to discuss closures in Java.  Closures are hereby decreed as
> off-topic for coin-dev.
> 
> Mark's blog entry "Closures for Java"
> (http://blogs.sun.com/mr/entry/closures) invites those with an informed
> opinion to participate in the current discussion; watch Mark's blog for
> news about creation of a new list or project, etc., to host this
> closures effort.
> 
> On the second announcement, while the JDK 7 schedule has been extended,
> many of the current final five (or so) Project Coin features have not
> yet been fully implemented, specified, and tested.  Therefore, there
> will *not* be a general reassessment of Project Coin feature selection
> or another call for proposals in JDK 7.  The final five (or so)
> proposals remain selected for inclusion in JDK 7 and work will continue
> to complete those features.  However, given its technical merit and the
> possibility of providing useful infrastructure for ARM, improved
> exception handling is now being reconsidered for inclusion in JDK 7.
> No
> other "for further consideration" proposal is under reconsideration.
> 
> -Joe


From ted at tedneward.com  Thu Dec  3 02:34:02 2009
From: ted at tedneward.com (Ted Neward)
Date: Thu, 3 Dec 2009 02:34:02 -0800
Subject: ARM syntax and new keywords
In-Reply-To: <15e8b9d20911261248y14c6c723re7c5b00784815a2d@mail.gmail.com>
References: <200911261830.20223.david.goodenough@linkchoose.co.uk>
	<15e8b9d20911261248y14c6c723re7c5b00784815a2d@mail.gmail.com>
Message-ID: <071901ca7404$24db1830$6e914890$@com>

Actually, the exotic identifiers syntax has a loophole that probably should
be closed:

class #"java\\lang\\Object" // compiles into <targetdir>/java/lang/Object
{
}

Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com

> -----Original Message-----
> From: coin-dev-bounces at openjdk.java.net [mailto:coin-dev-
> bounces at openjdk.java.net] On Behalf Of Neal Gafter
> Sent: Thursday, November 26, 2009 12:49 PM
> To: David Goodenough
> Cc: coin-dev at openjdk.java.net
> Subject: Re: ARM syntax and new keywords
> 
> On Thu, Nov 26, 2009 at 10:30 AM, David Goodenough <
> david.goodenough at linkchoose.co.uk> wrote:
> 
> > On Thursday 26 November 2009, Neal Gafter wrote:
> > > Now that jdk7 will include a syntax for using names that are
> otherwise
> > > keywords (the syntax is #"name"), the backward-compatibility
> breakage of
> > > adding a new keyword is much less severe.  Also, context-sensitive
> > keywords
> > > are a true-and-tried technique, just not used yet in Java.
> > >
> > I hope that the # operator will be available for both methods and
> fields.
> > This was discussed early on in the Coin process, in my lightweight
> > properties
> > proposal.  It is really the only bit of that proposal that is needed,
> the
> > rest
> > can be done in other ways.
> >
> 
> This has nothing to do with closures (there is a separate email list
> for
> that anyway).  The #"name" syntax was added to give a way of using
> "exotic"
> identifiers that would otherwise be disallowed.  For example, if you
> have a
> method pre jdk7
> 
>   int foobar(int x) { return x + 1; }
> 
> and then we add a keyword "foobar" to the language in jdk7, you can
> still
> use the identifier "foobar" by modifying the source as following
> 
>   int #"foobar"(int x) { return x + 1; }
> 
> This is even possible for package names, though a bit awkward
> 
>   package #"foobar";
> 
> Cheers,
> Neal


From Joe.Darcy at Sun.COM  Thu Dec  3 13:02:35 2009
From: Joe.Darcy at Sun.COM (Joe Darcy)
Date: Thu, 03 Dec 2009 13:02:35 -0800
Subject: Post-Devoxx Project Coin update, closures off-topic for coin-dev
In-Reply-To: <071a01ca7404$a1983b00$e4c8b100$@com>
References: <4B13EF40.2090408@sun.com> <071a01ca7404$a1983b00$e4c8b100$@com>
Message-ID: <4B18276B.2090305@sun.com>

Ted Neward wrote:
> Joe--
>
> Is the mailing list for Project Lambda up and running yet?
>   

No; the vote to create the project is still being conducted on compiler-dev.

-Joe


From pbenedict at apache.org  Sat Dec  5 17:35:05 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Sat, 5 Dec 2009 19:35:05 -0600
Subject: Strings in Switch
Message-ID: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>

Joe,

I reviewed the check-in and read a presentation about the
implementation. I get that it translates into two switch statements,
but I think there are cases where the duality can be eliminated. If
the first switch statement produces no hash collisions, I don't think
the second switch statement is necessary. Thoughts?

Paul


From reinier at zwitserloot.com  Sun Dec  6 19:38:02 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Mon, 7 Dec 2009 04:38:02 +0100
Subject: Strings in Switch
In-Reply-To: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
Message-ID: <560fb5ed0912061938t26f338b2i64419eecd363cdb9@mail.gmail.com>

Paul, I don't think there are any cases that are going to occur in real
life.

The problem is false positives. Let's say I write:

String x = getSomeString();
switch (x) {
    case "hello": doX(); break;
    case "world": doY(); break;
}

'hello' and 'world' have different hashes, so it seems we can just desugar
this to:

switch(x.hashCode()) {
   case "hello".hashCode(): doX(); break;
   case "world".hashCode(): doY(); break;
}


but that's not the right desugaring, because if I return some string that
ISN'T "hello" but happens to have the same hashCode as "hello",
then we just broke the program.
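
A minimal sketch of this false positive, assuming nothing beyond the
standard String.hashCode: "Aa" and "BB" both hash to 2112, so a switch that
dispatches on the hash alone treats "BB" as if it were "Aa". The class and
method names are hypothetical, and the case label is written as a literal
because a method call such as "Aa".hashCode() is not a constant expression.

```java
// Illustrative sketch only; names are hypothetical.
class HashOnlySwitchDemo {
    // Naive translation: dispatch on the hash code alone, no equals() check.
    static String hashOnly(String x) {
        switch (x.hashCode()) {
            case 2112: // "Aa".hashCode(), precomputed by hand
                return "matched Aa";
            default:
                return "default";
        }
    }

    // Safer translation: confirm the candidate with equals() inside the case.
    static String hashThenEquals(String x) {
        switch (x.hashCode()) {
            case 2112: // "Aa".hashCode(), precomputed by hand
                if ("Aa".equals(x)) {
                    return "matched Aa";
                }
                return "default";
            default:
                return "default";
        }
    }

    public static void main(String[] args) {
        System.out.println(hashOnly("BB"));       // wrongly takes the "Aa" branch
        System.out.println(hashThenEquals("BB")); // correctly takes the default
    }
}
```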

There's no way to know that x.hashCode() couldn't possibly collide, unless x
is a compile time literal. However, that seems to be a rather rare academic
case; case expressions already need to be compile time constants, so this
would mean we have a switch statement comprised of 100% compile time
literals. Something like:

switch ("hello") {
case "hello": ....
case "world": ....
}

Other than debug code and some half-hearted attempt at macroing, this isn't
ever going to occur in Java code. I don't think it's a good idea to burden
either the JLS or javac with a special case for this scenario; the dual
switch one will handle this just fine.

Of course, if there's some other code form which could be computed without
possible collision issues in a single switch statement, please show it to us
so we can come up with a strategy for detecting this situation.

--Reinier Zwitserloot


On Sun, Dec 6, 2009 at 2:35 AM, Paul Benedict <pbenedict at apache.org> wrote:

> Joe,
>
> I reviewed the check-in and read a presentation about the
> implementation. I get that it translates into two switch statements,
> but I think there are cases where the duality can be eliminated. If
> the first switch statement produces no hash collisions, I don't think
> the second switch statement is necessary. Thoughts?
>
> Paul
>
>


From neal at gafter.com  Sun Dec  6 20:38:33 2009
From: neal at gafter.com (Neal Gafter)
Date: Sun, 6 Dec 2009 20:38:33 -0800
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912061938t26f338b2i64419eecd363cdb9@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com> 
	<560fb5ed0912061938t26f338b2i64419eecd363cdb9@mail.gmail.com>
Message-ID: <15e8b9d20912062038r22d009bao22454f0b7eb30cd8@mail.gmail.com>

On Sun, Dec 6, 2009 at 7:38 PM, Reinier Zwitserloot <reinier at zwitserloot.com
> wrote:

> 'hello' and 'world' have different hashes, so it seems we can just desugar
> this to:
>
> switch(x.hashCode()) {
>   case "hello".hashCode(): doX(); break;
>   case "world".hashCode(): doY(); break;
> }
>
>
> but that's not the right desugaring, because if I return some string that
> ISNT "hello" but so happens to hashCode to the same hashCode as "hello",
> then we just broke the program.
>

I think Paul is imagining compiler-generated if statements inside the cases
to check string.equals(x, "hello") etc.

Cheers,
Neal


From Joe.Darcy at Sun.COM  Sun Dec  6 22:07:59 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Sun, 06 Dec 2009 22:07:59 -0800
Subject: Strings in Switch
In-Reply-To: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
Message-ID: <4B1C9BBF.7010704@sun.com>

Paul Benedict wrote:
> Joe,
>
> I reviewed the check-in and read a presentation about the
> implementation. I get that it translates into two switch statements,
> but I think there are cases where the duality can be eliminated. If
> the first switch statement produces no hash collisions, I don't think
> the second switch statement is necessary. Thoughts?
>
>   

There are many possible desugarings of strings in switch into one or 
more switch statements and other existing code structures, some of which 
are alluded to in the comments of the current strings in switch implementation.

If there is no "bad" control flow in the original strings in switch (no 
fall throughs, etc.) and no collisions with the chosen hash function, 
then yes, the code
  case "foo":
can be replaced with
  case "foo".hashCode():
      if ("foo".equals(...)) {...}
if care is taken to implement the semantics of any default alternative 
that is present.

However, for the initial strings in switch implementation in javac, we 
chose to pursue a single general-purpose strings in switch translation 
that should always provide at least reasonable performance since it 
results in less compiler code to test for what is currently a low 
duty-cycle code structure.

If special cases of strings in switch turn out to have high duty-cycles, 
that would justify additional engineering to support different 
implementations tailored to different code inputs.
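
As a rough illustration of a general-purpose two-switch translation, here
is a hand-written sketch; it is not actual javac output, and the class,
method, and variable names are hypothetical. The first switch maps the
string to the position of its case label (using equals() to resolve the
deliberate collision between "Aa" and "BB"), and the second switch keeps
the original case bodies, so fall-through and the default alternative
behave as in the source.

```java
// Hand-written sketch of a two-switch translation; names are hypothetical.
class TwoSwitchSketch {
    // Source form, as written with strings in switch.
    static String original(String s) {
        StringBuilder out = new StringBuilder();
        switch (s) {
            case "Aa":
                out.append("A"); // deliberately falls through
            case "BB":
                out.append("B");
                break;
            default:
                out.append("?");
        }
        return out.toString();
    }

    // Equivalent form using only int switches.
    static String desugared(String s) {
        StringBuilder out = new StringBuilder();
        int index = -1; // -1 means "no case label matched"
        // First switch: find which case label, if any, equals s.
        switch (s.hashCode()) {
            case 2112: // hash shared by "Aa" and "BB"
                if (s.equals("Aa")) {
                    index = 0;
                } else if (s.equals("BB")) {
                    index = 1;
                }
                break;
            default:
                break;
        }
        // Second switch: original bodies, original control flow.
        switch (index) {
            case 0:
                out.append("A"); // deliberately falls through
            case 1:
                out.append("B");
                break;
            default:
                out.append("?");
        }
        return out.toString();
    }
}
```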

-Joe


From lk at teamten.com  Sun Dec  6 22:14:08 2009
From: lk at teamten.com (Lawrence Kesteloot)
Date: Sun, 6 Dec 2009 22:14:08 -0800
Subject: Strings in Switch
In-Reply-To: <4B1C9BBF.7010704@sun.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com> 
	<4B1C9BBF.7010704@sun.com>
Message-ID: <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>

Now that we plan to have closures, do we still need strings-in-switch?
Won't a string-to-function map be about as fast (though maybe less
convenient)? I don't know what the use cases are for
strings-in-switch, but the feature already felt a bit low-benefit to
me, and seems even more so now with closures.
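
For comparison, a string-to-function map might look roughly like the sketch
below. It is written with the lambda syntax that Java adopted later, which
was not settled when this thread was written, so treat the syntax and the
use of java.util.function.Supplier as assumptions; the class and handler
names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of dispatching on a string through a map of functions.
class DispatchMapSketch {
    private static final Map<String, Supplier<String>> HANDLERS = new HashMap<>();

    static {
        HANDLERS.put("hello", () -> "X");
        HANDLERS.put("world", () -> "Y");
    }

    // Looks up the handler for the key; unknown keys take the default path.
    static String dispatch(String key) {
        Supplier<String> handler = HANDLERS.get(key);
        return handler != null ? handler.get() : "default";
    }
}
```

Unlike a case body, a handler in the map cannot break, continue, or return
on behalf of the method that performs the lookup.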

Lawrence


On Sun, Dec 6, 2009 at 10:07 PM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
> Paul Benedict wrote:
>> Joe,
>>
>> I reviewed the check-in and read a presentation about the
>> implementation. I get that it translates into two switch statements,
>> but I think there are cases where the duality can be eliminated. If
>> the first switch statement produces no hash collisions, I don't think
>> the second switch statement is necessary. Thoughts?
>>
>>
>
> There are many possible desugarings of strings in switch into one or
> more switch statements and other existing code structures, some of which
> are alluded to the comments of the current strings in switch implementation.
>
> If there is no "bad" control flow in the original strings in switch (no
> fall throughs, etc.) and no collisions with the chosen hash function,
> then yes the the code
>   case "foo":
> can be replaced with
>   case "foo".hashCode():
>       if ("foo".equals(...")) {...}
> if care is taken to implement the semantics of any default alternative
> that is present.
>
> However, for the initial strings in switch implementation in javac, we
> choose to pursue a single general-purpose strings in switch translation
> that should always provide at least reasonable performance since it
> results in less compiler code to test for what is currently a low
> duty-cycle code structure.
>
> If special cases of strings in switch turn out to have high duty-cycles,
> that would justify additional engineering to support different
> implementations tailored to different code inputs.
>
> -Joe
>
>
>


From reinier at zwitserloot.com  Sun Dec  6 22:17:11 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Mon, 7 Dec 2009 07:17:11 +0100
Subject: Strings in Switch
In-Reply-To: <997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1C9BBF.7010704@sun.com>
	<997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>
Message-ID: <560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>

Let's not go closure crazy. The fact that switch does NOT support strings
today is a silly triviality that catches out many beginning java
programmers.

A lot of the hard work on this proposal has already been done, so abandoning
it now does not seem like a good idea.

Also, Mark Reinhold's plan for closures does not include transparency, which
would make a closure-based function map much inferior to strings in switch
which is of course transparent.

--Reinier Zwitserloot


On Mon, Dec 7, 2009 at 7:14 AM, Lawrence Kesteloot <lk at teamten.com> wrote:

> Now that we plan to have closures, do we still need strings-in-switch?
> Won't a string-to-function map be about as fast (though maybe less
> convenient)? I don't know what the use cases are for
> strings-in-switch, but the feature already felt a bit low-benefit to
> me, and seems even more so now with closures.
>
> Lawrence
>
>
> On Sun, Dec 6, 2009 at 10:07 PM, Joseph D. Darcy <Joe.Darcy at sun.com>
> wrote:
> > Paul Benedict wrote:
> >> Joe,
> >>
> >> I reviewed the check-in and read a presentation about the
> >> implementation. I get that it translates into two switch statements,
> >> but I think there are cases where the duality can be eliminated. If
> >> the first switch statement produces no hash collisions, I don't think
> >> the second switch statement is necessary. Thoughts?
> >>
> >>
> >
> > There are many possible desugarings of strings in switch into one or
> > more switch statements and other existing code structures, some of which
> > are alluded to the comments of the current strings in switch
> implementation.
> >
> > If there is no "bad" control flow in the original strings in switch (no
> > fall throughs, etc.) and no collisions with the chosen hash function,
> > then yes the the code
> >  case "foo":
> > can be replaced with
> >  case "foo".hashCode():
> >      if ("foo".equals(...")) {...}
> > if care is taken to implement the semantics of any default alternative
> > that is present.
> >
> > However, for the initial strings in switch implementation in javac, we
> > choose to pursue a single general-purpose strings in switch translation
> > that should always provide at least reasonable performance since it
> > results in less compiler code to test for what is currently a low
> > duty-cycle code structure.
> >
> > If special cases of strings in switch turn out to have high duty-cycles,
> > that would justify additional engineering to support different
> > implementations tailored to different code inputs.
> >
> > -Joe
> >
> >
> >
>
>


From lk at teamten.com  Sun Dec  6 22:38:30 2009
From: lk at teamten.com (Lawrence Kesteloot)
Date: Sun, 6 Dec 2009 22:38:30 -0800
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com> 
	<4B1C9BBF.7010704@sun.com>
	<997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com> 
	<560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>
Message-ID: <997cab100912062238t106db1c6r95fe6b09ca2b02be@mail.gmail.com>

On Sun, Dec 6, 2009 at 10:17 PM, Reinier Zwitserloot
<reinier at zwitserloot.com> wrote:
> Let's no go closure crazy.

It's interesting that one of the arguments for closures is that it
makes language features less necessary, but now suggesting that this
feature might be less necessary is "going closure crazy".

> The fact that switch does NOT support strings
> today is a silly triviality that catches out many beginning java
> programmers.

A triviality? Now every Java compiler has to support this, every IDE,
people have to learn it, people have to remember that it won't work if
their code might one day have to be compiled by JDK 6, etc. I don't
think anything in something as large as Java is a silly triviality.
Every feature has a non-trivial cost.

> A lot of the hard work on this proposal has already been done, so abandoning
> it now does not seem like a good idea.

That's 100% irrelevant. You don't add a feature to a language just
because the work has been done. If it's, on balance, not a worthwhile
feature, then we remove it, sorry to those who spent time on it. Put
another way, if project coin would not today accept this feature in
light of closures, then it should pull it out.

> Also, Mark Reinhold's plan for closures does not include transparency, which
> would make a closure-based function map much inferior to strings in switch
> which is of course transparent.

Good point.

Lawrence


From Jonathan.Gibbons at Sun.COM  Mon Dec  7 10:03:50 2009
From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons)
Date: Mon, 07 Dec 2009 10:03:50 -0800
Subject: Strings in Switch
In-Reply-To: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
Message-ID: <4B1D4386.6080907@sun.com>

Paul Benedict wrote:
> Joe,
>
> I reviewed the check-in and read a presentation about the
> implementation. I get that it translates into two switch statements,
> but I think there are cases where the duality can be eliminated. If
> the first switch statement produces no hash collisions, I don't think
> the second switch statement is necessary. Thoughts?
>
> Paul
>
>   
Paul,

Don't forget you have to take care to handle all the other strings in 
the world that might be passed into the first switch. It's not just a 
matter of looking at the strings explicitly listed in the case labels, 
but of all the other strings that might be handled in the default case.

-- Jon


From reinier at zwitserloot.com  Mon Dec  7 08:35:29 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Mon, 7 Dec 2009 17:35:29 +0100
Subject: Strings in Switch
In-Reply-To: <997cab100912062238t106db1c6r95fe6b09ca2b02be@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1C9BBF.7010704@sun.com>
	<997cab100912062214n3a1effa8o9147d0acdaf8c42a@mail.gmail.com>
	<560fb5ed0912062217h30ecaca3qdb44c49c4efe164e@mail.gmail.com>
	<997cab100912062238t106db1c6r95fe6b09ca2b02be@mail.gmail.com>
Message-ID: <560fb5ed0912070835q7e62a20bv63daf25ab86f3336@mail.gmail.com>

inline.

On Mon, Dec 7, 2009 at 7:38 AM, Lawrence Kesteloot <lk at teamten.com> wrote:

> It's interesting that one of the arguments for closures is that it
> makes language feature less necessary, but now suggesting that this
> feature might be less necessary is "going closure crazy".
>
>
Please don't respond to emails until you read all of them. I explained why
your idea is going closure crazy in the next paragraph.


> A triviality?
>

You misunderstand my words. The fact that switch DOES work on ints, but does
NOT work on Strings, _THAT_ is trivia. You have to know. It's not obvious.
It seems like an arbitrary restriction. Doing away with this restriction is
good. In fact, for the next coin, I wouldn't be opposed to enabling longs in
switch as well, for the same reason. Consistency amongst the concept of
compile time literals. All primitives, and Strings, can come in 'literal'
form. This is relevant in a number of places, including compile-time
inlining of constants. However, for switch, there are 3 exceptions:
booleans, longs, and Strings. Booleans are irrelevant for obvious reasons.
Strings are now getting added. Which leaves just the longs.

Also, "people have to learn it"? No. There's not a soul on this earth who
knows what java switch statements are that is going to be confused by
strings in switch. People WOULD have to learn to use a library and/or
pattern based around a Map<String, #(String)void>, though.

> That's 100% irrelevant. You don't add a feature to a language just
> because the work has been done.


There's a limited budget to spend on improving java. We've already spent a
lot of it here. It got through coin, a lot of peer review, and it hasn't run
into serious opposition until you brought it up. Of course it's relevant.


> > Also, Mark Reinhold's plan for closures does not include transparency,
> which
> > would make a closure-based function map much inferior to strings in
> switch
> > which is of course transparent.
>
> Good point.
>
>
Yes, it is.


From pbenedict at apache.org  Mon Dec  7 14:43:26 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Mon, 7 Dec 2009 16:43:26 -0600
Subject: Strings in Switch
In-Reply-To: <4B1D4386.6080907@sun.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
Message-ID: <b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>

I see that the syntactic sugar assumes that the compiler and the runtime
environment both use the same string hashing algorithm. As noted, the
algorithm has never changed since at least JDK 1.2. Even if unlikely,
I don't feel comfortable with this assumption - I do not have an
alternative to propose either -- but I thought it was worth voicing.

Regardless, I see this as a pure detail of one possible
implementation. Another implementation may not choose to use hash
codes at all. Am I correct, or am I wrong and the JLS change will
mandate the use of hashCode for switching?

Paul


From jorge.ortiz at gmail.com  Mon Dec  7 15:14:21 2009
From: jorge.ortiz at gmail.com (Jorge Ortiz)
Date: Mon, 7 Dec 2009 15:14:21 -0800
Subject: Strings in Switch
In-Reply-To: <b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
Message-ID: <22a410d00912071514k3c18aanb9ec023b31f43e3d@mail.gmail.com>

The equivalent in Scala of "Strings in Switch" is pattern matching on
strings. In Scala this is implemented as nested if-else statements. This is
probably necessary because Scala's pattern matching has some additional
features (like matching only if a conditional is met, or matching on
extractors) that probably wouldn't mesh well with optimizations like
hashCode. That said, in my 2.5 years using Scala I've never once heard
anyone complain about the performance of pattern matching on strings. It'd
be interesting to see some benchmarks, but I'd guess that the difference in
performance between the equality approach and the hashCode approach is
unnoticeable unless you're matching on either really, really long strings or
a very, very large number of strings. Neither of these scenarios is likely
to be true for switch statements.
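
In Java terms, the equality approach amounts to something like the sketch
below: a chain of equals() tests with no dependence on hashCode. This is an
illustration only, not what the Scala compiler emits, and the names are
hypothetical.

```java
// Sketch of matching on strings with a plain equality chain.
class EqualsChainSketch {
    static String dispatch(String s) {
        // Each test costs one equals() call; the hash code is never consulted.
        if ("hello".equals(s)) {
            return "X";
        } else if ("world".equals(s)) {
            return "Y";
        } else {
            return "default";
        }
    }
}
```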

--j

On Mon, Dec 7, 2009 at 2:43 PM, Paul Benedict <pbenedict at apache.org> wrote:

> I see that syntactic sugar assumes that the compiler and the runtime
> environment both use the same string hashing algorithm. As noted, the
> algorithm has never changed since at least JDK 1.2. Even if unlikely,
> I don't feel comfortable with this assumption - I do not have an
> alternative to propose either -- but I thought it was worth voicing.
>
> Regardless, I see this as a pure detail of one possible
> implementation. Another implementation may not choose to use hash
> codes at all. Am I correct, or am I wrong and the JLS change will
> mandate the use of hashCode for switching?
>
> Paul
>
>


From Joe.Darcy at Sun.COM  Mon Dec  7 16:38:05 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Mon, 07 Dec 2009 16:38:05 -0800
Subject: Strings in Switch
In-Reply-To: <b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
Message-ID: <4B1D9FED.7090406@sun.com>

Paul Benedict wrote:
> I see that syntactic sugar assumes that the compiler and the runtime
> environment both use the same string hashing algorithm. As noted, the
> algorithm has never changed since at least JDK 1.2. Even if unlikely,
> I don't feel comfortable with this assumption - I do not have an
> alternative to propose either -- but I thought it was worth voicing.
>   

This assumption is explicitly called out in a comment in the 
implementation; we are aware of the potential problem and are making a 
different judgment on the comfortability of the implementation strategy.

> Regardless, I see this as a pure detail of one possible
> implementation. Another implementation may not choose to use hash
> codes at all. Am I correct, or am I wrong and the JLS change will
> mandate the use of hashCode for switching?
>   

The JLS will be completely silent on the implementation technique.  The 
entire strings in switch spec change is adding "String, " to the list of 
valid types of expressions that can be switched on.

-Joe


From pbenedict at apache.org  Mon Dec  7 18:00:05 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Mon, 7 Dec 2009 20:00:05 -0600
Subject: Strings in Switch
In-Reply-To: <4B1D9FED.7090406@sun.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
Message-ID: <b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>

Joe,

On Mon, Dec 7, 2009 at 6:38 PM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
> we are aware of the potential problem and are making a different judgment on
> the comfortability of the implementation strategy.

Have you thought about calculating the hash code, not as part of the
compiler's emitted bytecode, but when the class is loaded? Maybe it is
possible to desugar the code into a static { } so the compiler's
environment is taken out of the equation. However, this would mean
your double-switch would no longer be usable since case labels must be
constants, but there are no constant restrictions regarding if/else
chains.
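
A rough sketch of that idea, with hypothetical names: the hash codes are
computed in a static initializer, so they come from the runtime's String
implementation rather than the compiler's. Because the fields are assigned
from method calls they are not compile-time constants, which is why the
dispatch has to be an if/else chain instead of a switch.

```java
// Sketch: hashes computed at class initialization, dispatch via if/else.
class LoadTimeHashSketch {
    // Assigned in the static initializer, so not constant expressions
    // and not usable as case labels.
    private static final int HELLO_HASH;
    private static final int WORLD_HASH;

    static {
        HELLO_HASH = "hello".hashCode();
        WORLD_HASH = "world".hashCode();
    }

    static String dispatch(String s) {
        int h = s.hashCode();
        // The hash comparison is a cheap pre-filter; equals() confirms the match.
        if (h == HELLO_HASH && s.equals("hello")) {
            return "X";
        } else if (h == WORLD_HASH && s.equals("world")) {
            return "Y";
        }
        return "default";
    }
}
```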

Another possible strategy is to export the current String hashing
algorithm into some public method and make the JLS rely on that
method. Eh, I don't like it, but it's a theoretical option.

Paul


From reinier at zwitserloot.com  Mon Dec  7 19:34:41 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Tue, 8 Dec 2009 04:34:41 +0100
Subject: Strings in Switch
In-Reply-To: <b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
Message-ID: <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>

String.hashCode()'s exact algorithm is codified in the official javadoc. It
is therefore canon. Thus, changing String.hashCode breaks backwards
compatibility. Java has never broken backwards compatibility in such a core
feature. "Hell freezes over before hashCode() will change" comes to mind.

If Strings are ever going to get a different hashCode algorithm, I expect it
will be an internal affair, with special-casing code in e.g. HashMap to use
the more efficient one, leaving the public-facing hashCode() intact, lest
tons of existing code that relies on string hashCodes breaks.

I'm not for or against any particular implementation of strings-in-switch,
just making an observation.

--Reinier Zwitserloot

Need to receive donations via the web?
Check https://tipit.to/


On Tue, Dec 8, 2009 at 3:00 AM, Paul Benedict <pbenedict at apache.org> wrote:

> Joe,
>
> On Mon, Dec 7, 2009 at 6:38 PM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
> > we are aware of the potential problem and are making a different judgment
> on
> > the comfortability of the implementation strategy.
>
> Have you thought about calculating the hash code, not as part of the
> compiler's emitted bytecode, but when the class is loaded? Maybe it is
> possible to desugar the code into a static { } so the compiler's
> environment is taken out of the equation. However, this would mean
> your double-switch would no longer be usable since case labels must be
> constants, but there are no constant restrictions regarding if/else
> chains.
>
> Another possible strategy is to export the current String hashing
> algorithm into some public method and make the JLS rely on that
> method. Eh, I don't like it, but it's a theoretical option.
>
> Paul
>
>


From Joe.Darcy at Sun.COM  Mon Dec  7 20:06:19 2009
From: Joe.Darcy at Sun.COM (Joe Darcy)
Date: Mon, 07 Dec 2009 20:06:19 -0800
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
Message-ID: <4B1DD0BB.7010709@sun.com>

Reinier Zwitserloot wrote:
> String.hashCode()'s exact algorithm is codified in the official javadoc. It
> is therefore canon. Thus, changing String.hashCode breaks backwards
> compatibility. Java has never broken backwards compatibility in such a core
> feature. Hell freezes over before hashCode() will change comes to mind.
>   
Back in the dawn of time, the JLS also contained the javadoc of the 
platform classes.  JLSv1 had a hashing algorithm for string that 
only sampled 8 or 9 characters of the string!  The actual javadoc had 
evolved to specify the current algorithm, which is a function of all the 
characters.  When the irresistible force of the platform javadoc met the 
immovable object of the JLS, in this case the javadoc won and became the 
canonical specification (and the platform javadoc was quite sensibly 
removed from the JLS as of JLSv2).

Such discrepancies and changes were long ago in a Java platform far, far 
away.  It is vanishingly unlikely that String.hashCode will change again 
in the SE platform because the "behavioral compatibility" impact would 
be too large; see

"JDK Release Types and Compatibility Regions"
http://blogs.sun.com/darcy/entry/release_types_compatibility_regions

> If Strings are ever going to get a different hashCode algorithm, I expect it
> will be an internal affair, with special-casing code in e.g. HashMap to use
> the more efficient one, leaving the public-facing hashCode() intact, lest
> tons of existing code that relies on string hashCodes breaks.
>   

As I understand it, some sophisticated collection implementations like 
ConcurrentHashMap already have internal re-hashing logic to cope with 
poor-quality hashCode implementations.

The hashing algorithm of String.hashCode is certainly not wonderful and 
by default I'm against specifying the hashing algorithm of a class.  
However, given the distinguished role of String, I don't foresee its 
hashing algorithm changing and I believe it is reasonable for strings in 
switch to rely on that algorithm being used.
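
For readers following along, here is a minimal sketch of the algorithm the String.hashCode() javadoc specifies, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic. The class and method names (StringHashSketch, documentedHash) are made up for illustration and are not part of the JDK.

```java
// Sketch only: recomputes the hash described in the String.hashCode() javadoc
// and compares it with what the runtime returns.
class StringHashSketch {
    // Hypothetical helper, not a JDK method.
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // int arithmetic, wraps on overflow
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "Hello1", "Hello World"}) {
            boolean same = documentedHash(s) == s.hashCode();
            System.out.println("\"" + s + "\" -> " + documentedHash(s)
                    + " (matches hashCode(): " + same + ")");
        }
    }
}
```

Any value a compiler bakes into a class file for a case label would be computed this way at compile time, which is why the discussion turns on whether the runtime algorithm can ever differ.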

-Joe


From jjb at google.com  Tue Dec  8 00:49:28 2009
From: jjb at google.com (Joshua Bloch)
Date: Tue, 8 Dec 2009 00:49:28 -0800
Subject: Strings in Switch
In-Reply-To: <4B1DD0BB.7010709@sun.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
Message-ID: <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>

Joe,

A few very minor clarifications:

On Mon, Dec 7, 2009 at 8:06 PM, Joe Darcy <Joe.Darcy at sun.com> wrote:

>
> Back in the dawn of time, the JLS also contained the javadoc of the
> platform classes.  JLSv1 had a hashing algorithm for string that that
> only sampled 8 or 9 characters of the string!  The actual javadoc had
> evolved to specify the current algorithm, which is a function of all the
> characters.  When the irresistible force the platform javadoc met the
> immovable object of the JLS, in this case the javadoc won


Actually the spec for the String hash function in JLS1e was subtly broken,
and unimplementable.  The implemented hash function (which the spec was
meant to describe) was awful.  I used the unimplementable spec as
justification for changing the spec and the implementation.  In chaos, there
is opportunity.

Such discrepancies and changes were long ago in a Java platform far, far
> away.  It is vanishingly unlikely that String.hashCode will change again
> in the SE platform because the "behavioral compatibility" impact would
> be too large; see
>
> "JDK Release Types and Compatibility Regions"
> http://blogs.sun.com/darcy/entry/release_types_compatibility_regions


I am in complete agreement here.


> As I understand it, some sophisticated collection implementations like
> ConcurrentHashMap already have internal re-hashing logic to cope with
> poor-quality hashCode implementations.
>

In fact, the lowly HashMap has had a secondary "defensive" hash function
ever since 1.4.
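
As an illustration of what such a secondary hash buys, here is a small sketch. It uses the spreading step h ^ (h >>> 16) found in later HashMap versions (Java 8); the 1.4-era function mentioned above used a different mix, which is not reproduced here. The names SpreadSketch, spread and bucketIndex are hypothetical.

```java
// Sketch only: shows why a table that masks off the low bits of a hash
// benefits from folding the high bits down first.
class SpreadSketch {
    // Fold the upper 16 bits into the lower 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index into a power-of-two sized table, as hash tables commonly do.
    static int bucketIndex(int h, int tableSize) {
        return h & (tableSize - 1);
    }

    public static void main(String[] args) {
        int[] poorHashes = {0x10000, 0x20000, 0x30000}; // differ only in high bits
        for (int h : poorHashes) {
            System.out.println(Integer.toHexString(h)
                    + ": raw bucket " + bucketIndex(h, 16)
                    + ", spread bucket " + bucketIndex(spread(h), 16));
        }
    }
}
```

Without the spreading step all three sample values land in bucket 0 of a 16-slot table; with it they are separated.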

                   Josh


From pbenedict at apache.org  Tue Dec  8 06:39:54 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Tue, 8 Dec 2009 08:39:54 -0600
Subject: Strings in Switch
In-Reply-To: <17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
Message-ID: <b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>

Joe,

> Such discrepancies and changes were long ago in a Java platform far, far
> away.  It is vanishingly unlikely that String.hashCode will change again
> in the SE platform because the "behavioral compatibility" impact would
> be too large; see

I agree the change may be unlikely, but why bet your compiler on it?
Since you are encoding the result of the hash **in the class file**, I
think it is necessary to ensure it *never* changes. Do remedies exist?

Paul


From mthornton at optrak.co.uk  Tue Dec  8 06:52:22 2009
From: mthornton at optrak.co.uk (Mark Thornton)
Date: Tue, 08 Dec 2009 14:52:22 +0000
Subject: Strings in Switch
In-Reply-To: <b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D4386.6080907@sun.com>	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
Message-ID: <4B1E6826.4010103@optrak.co.uk>

Paul Benedict wrote:
> Joe,
>
>   
>> Such discrepancies and changes were long ago in a Java platform far, far
>> away.  It is vanishingly unlikely that String.hashCode will change again
>> in the SE platform because the "behavioral compatibility" impact would
>> be too large; see
>>     
>
> I agree the change may be unlikely, but why bet your compiler on it?
> Since you are encoding the result of the hash **in the class file**, I
> think it is necessary to ensure it *never* changes. Do remedies exist?
>
>
>   
Add a note in the JavaDoc to the effect that string switches depend on 
the hashcode algorithm not changing. Anyone changing the algorithm in 
spite of such a note could expect serious grief (shot at dawn)!

Mark Thornton


From reinier at zwitserloot.com  Tue Dec  8 08:23:29 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Tue, 8 Dec 2009 17:23:29 +0100
Subject: Strings in Switch
In-Reply-To: <4B1E6826.4010103@optrak.co.uk>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E6826.4010103@optrak.co.uk>
Message-ID: <560fb5ed0912080823t52ee3187yb088ab9b5b381540@mail.gmail.com>

A note in String's javadoc works for me. It doesn't necessarily have to
mention strings in switch. All it really needs to mention is that the
algorithm cannot, ever, change, period.

--Reinier Zwitserloot


On Tue, Dec 8, 2009 at 3:52 PM, Mark Thornton <mthornton at optrak.co.uk>wrote:

> Paul Benedict wrote:
> > Joe,
> >
> >
> >> Such discrepancies and changes were long ago in a Java platform far, far
> >> away.  It is vanishingly unlikely that String.hashCode will change again
> >> in the SE platform because the "behavioral compatibility" impact would
> >> be too large; see
> >>
> >
> > I agree the change may be unlikely, but why bet your compiler on it?
> > Since you are encoding the result of the hash **in the class file**, I
> > think it is necessary to ensure it *never* changes. Do remedies exist?
> >
> >
> >
> Add a note in the JavaDoc to the effect that string switches depend on
> the hashcode algorithm not changing. Anyone changing the algorithm in
> spite of such a note could expect serious grief (shot at dawn)!
>
> Mark Thornton
>
>
>
>


From Jonathan.Gibbons at Sun.COM  Tue Dec  8 08:47:55 2009
From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons)
Date: Tue, 08 Dec 2009 08:47:55 -0800
Subject: Strings in Switch
In-Reply-To: <b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
Message-ID: <4B1E833B.1040200@sun.com>

Paul Benedict wrote:
> Joe,
>
>   
>> Such discrepancies and changes were long ago in a Java platform far, far
>> away.  It is vanishingly unlikely that String.hashCode will change again
>> in the SE platform because the "behavioral compatibility" impact would
>> be too large; see
>>     
>
> I agree the change may be unlikely, but why bet your compiler on it?
> Since you are encoding the result of the hash **in the class file**, I
> think it is necessary to ensure it *never* changes. Do remedies exist?
>
> Paul
>
>   
If hell were to freeze over, and String.hashCode were to change in JDK 
n, n >=8, then javac could emit different code for Strings in switch, 
depending on the value of -target.

-- Jon


From pbenedict at apache.org  Tue Dec  8 09:00:26 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Tue, 8 Dec 2009 11:00:26 -0600
Subject: Strings in Switch
In-Reply-To: <4B1E833B.1040200@sun.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
Message-ID: <b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>

Jon,

On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
<Jonathan.Gibbons at sun.com> wrote:
> If hell were to freeze over, and String.hashCode were to change in JDK n,
> n >= 8, then javac could emit different code for Strings in switch,
> depending on the value of -target.

Regarding the state of hell, I don't think a compiler implementation
should ever rely on such a gamble. The implication is obvious: if JDK
N makes a change (by Oracle, by some future owner of OpenJDK -- who
knows what happens 10+ years from now), then class files using the
OpenJDK de-sugaring would break. The emitted hash results would no
longer match the runtime hashes and execution would be unpredictable.
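
To make the concern concrete, here is a hand-written sketch of roughly the double-switch shape discussed in this thread, with hash values baked in as constants. It is an illustration, not javac's actual output; the class name DoubleSwitchSketch and the strings "Aa", "BB" and "C" are chosen only because their hash codes (2112, 2112 and 67) are easy to check by hand.

```java
// Sketch only: a first switch on hashCode() narrows the candidates, an
// equals() check resolves collisions, and a second switch runs the case body.
class DoubleSwitchSketch {
    static String classify(String s) {
        int index = -1;                    // -1 means "no case label matched"
        switch (s.hashCode()) {            // relies on the documented algorithm
            case 2112:                     // "Aa" and "BB" collide on this value
                if (s.equals("Aa")) index = 0;
                else if (s.equals("BB")) index = 1;
                break;
            case 67:                       // "C".hashCode()
                if (s.equals("C")) index = 2;
                break;
        }
        switch (index) {
            case 0:  return "matched Aa";
            case 1:  return "matched BB";
            case 2:  return "matched C";
            default: return "default";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Aa", "BB", "C", "D"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

If the runtime ever computed a different hash for "Aa", the first switch would miss the 2112 label and every input would fall through to the default branch.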

To safely emit hash results into byte code, I think you obviously need
to go the extra stretch and make a ruling on the algorithm never
changing. Isn't that just simply called being responsible?

Paul


From fredrik.ohrstrom at oracle.com  Wed Dec  9 02:42:04 2009
From: fredrik.ohrstrom at oracle.com (=?ISO-8859-1?Q?Fredrik_=D6hrstr=F6m?=)
Date: Wed, 09 Dec 2009 11:42:04 +0100
Subject: Strings in Switch
In-Reply-To: <b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D4386.6080907@sun.com>	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
Message-ID: <4B1F7EFC.2030601@oracle.com>

This discussion reeks of premature optimization.... A tableswitch on
arbitrarily large numbers (aka hashcodes) must be compiled into a sequence
of compares anyway (at least on the x86 platform). If the tableswitch
happens on a sequence of relatively consecutive  numbers, then the JVM
can create a jump table. But for hashcodes, no way!

Therefore a sequence of compares that work with the interned string
pointers will be faster. If interning is slow (and/or wastes memory)
then a sequence of tailored compares that work directly on the
characters will be the fastest. For example:

switch (s) {
  case "Hello World"  : .... break;
  case "Hello Wooot" : .... break;
  default: ....
}

Could, for example, be compiled into the pseudo-c code:

if (s.length == 11) {
  if (s.chars[8] == L'r' && !wcscmp(s.chars, L"Hello World")) { ...; goto done; }
  if (s.chars[8] == L'o' && !wcscmp(s.chars, L"Hello Wooot")) { ...; goto done; }
}
/*default*/
  ....
done:

Now should javac do this advanced analysis? No! Javac should only
generate straightforward string compares and jumps that are a relatively
easy pattern for the JVM to recognize as a string switch. Then the JVM
can do the advanced optimizations if and when the code is actually
determined to be a hot spot.
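
For comparison, the same idea can be written as plain Java. This is a sketch under the assumption that the two case labels are exactly "Hello World" and "Hello Wooot"; the class and method names (TailoredCompareSketch, dispatch) and the returned labels are made up for illustration.

```java
// Sketch only: check the length first, then the one character position that
// tells the candidates apart, and only then confirm with a full comparison.
class TailoredCompareSketch {
    static String dispatch(String s) {
        if (s.length() == 11) {
            char c = s.charAt(8);          // index 8 is 'r' in one label, 'o' in the other
            if (c == 'r' && s.equals("Hello World")) return "world";
            if (c == 'o' && s.equals("Hello Wooot")) return "wooot";
        }
        return "default";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Hello World", "Hello Wooot", "Hello Wxrld", "Hi"}) {
            System.out.println(s + " -> " + dispatch(s));
        }
    }
}
```

No hash code is computed here, so nothing in the emitted code depends on the String.hashCode algorithm.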

//Fredrik

Paul Benedict skrev:
> Jon,
>
> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
> <Jonathan.Gibbons at sun.com> wrote:
>   
>> If hell were to freeze over, and String.hashCode were to change in JDK n,
>> n >= 8, then javac could emit different code for Strings in switch,
>> depending on the value of -target.
>>     
>
> Regarding the state of hell, I don't think a compiler implementation
> should ever rely on such a gamble. The implication is obvious: if JDK
> N makes a change (by Oracle, by some future owner of OpenJDK -- who
> knows what happens 10+ years from now), then class files using the
> OpenJDK de-sugaring would break. The emitted hash results would no
> longer match the runtime hashes and execution would be unpredictable.
>
> To safely emit hash results into byte code, I think you obviously need
> to go the extra stretch and make a ruling on the algorithm never
> changing. Isn't that just simply called being responsible?
>
> Paul
>
>   


From reinier at zwitserloot.com  Wed Dec  9 02:53:34 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Wed, 9 Dec 2009 11:53:34 +0100
Subject: Strings in Switch
In-Reply-To: <4B1F7EFC.2030601@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
Message-ID: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>

As I understand it, strings-in-switch is handled during the "lower" phase of
javac, which must desugar the string switch into legal java code.

This makes a series of if/elseif cases actually impossible, due to switch's
unique behaviour in regards to fall-through.... I think. Let's try this out.
If we have:

switch(someString) {
    case "Hello1":
       m1();
    default:
    case "Hello2":
       m2();
       break;
    case "Hello3":
       m3();
}

how should this translate to a series of if statements, in a way that is
easier than the current nested double switch scenario? I don't really see a
way.

There's a compromise, where the original string-to-integer conversion is
done with a series of ifs instead of a switch on hashCode. I don't really
care about removing the dependency on string's hashCode, but if this is
simpler, then, by all means. Until there's proof otherwise, I side with
Fredrik that a switch on hashCodes is not going to have a measurable
performance impact. As an example, the above would desugar to the code
below. (I've omitted an optional switch on the string's length during the
string-to-number conversion. That may actually be a good idea; it's
straightforward and does have an obvious performance benefit.)

int $unique;
if ("Hello1".equals(someString)) $unique = 0;
else if ("Hello2".equals(someString)) $unique = 1;
else if ("Hello3".equals(someString)) $unique = 2;
else $unique = 3;

switch ($unique) {
    case 0:
       m1();
    case 3:
    case 1:
       m2();
       break;
    case 2:
       m3();
}


It avoids dependency on string hashcode (which, for the record, I do not
think needs to be avoided), and it's straightforward and simple for all
possible forms of string-in-switch that I can think of.
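
A runnable sketch of this translation, for anyone who wants to check that fall-through and the mid-switch default survive it. It assumes a compiler that already accepts strings in switch (so the two forms can be compared), records calls in a trace string instead of invoking m1(), m2() and m3(), and uses a plain `unique` local in place of `$unique`. All names are hypothetical.

```java
// Sketch only: the if/else-if chain maps the string to a small int, and an
// ordinary int switch preserves the original case order and fall-through.
class IfChainSketch {
    static String viaIfChain(String someString) {
        StringBuilder trace = new StringBuilder();
        int unique;
        if ("Hello1".equals(someString)) unique = 0;
        else if ("Hello2".equals(someString)) unique = 1;
        else if ("Hello3".equals(someString)) unique = 2;
        else unique = 3;                   // stands in for "default"

        switch (unique) {
            case 0:
                trace.append("m1 ");       // falls through, as in the original
            case 3:
            case 1:
                trace.append("m2 ");
                break;
            case 2:
                trace.append("m3 ");
        }
        return trace.toString().trim();
    }

    // The original source form, for comparison.
    static String viaStringSwitch(String someString) {
        StringBuilder trace = new StringBuilder();
        switch (someString) {
            case "Hello1":
                trace.append("m1 ");
            default:
            case "Hello2":
                trace.append("m2 ");
                break;
            case "Hello3":
                trace.append("m3 ");
        }
        return trace.toString().trim();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Hello1", "Hello2", "Hello3", "other"}) {
            System.out.println(s + ": " + viaIfChain(s) + " / " + viaStringSwitch(s));
        }
    }
}
```

One difference worth noting: "Hello1".equals(null) is simply false, so the if chain sends a null argument to the default branch, whereas a switch on a null string throws NullPointerException. A real desugaring would need an explicit null check to keep that behaviour.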

--Reinier Zwitserloot


On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
fredrik.ohrstrom at oracle.com> wrote:

> This discussion reeks of premature optimization.... A tableswitch on
> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
> of compares anyway (at least on the x86 platform). If the tableswitch
> happens on a sequence of relatively consecutive  numbers, then the JVM
> can create a jump table. But for hashcodes, no way!
>
> Therefore a sequence of compares that work with the interned string
> pointers will be faster. If interning is slow (and/or wastes memory)
> then a sequence of tailored compares that work directly on the
> characters will be the fastest. For example:
>
> switch (s) {
>  case "Hello World"  : .... break;
>  case "Hello Wooot" : .... break;
>  default: ....
> }
>
> Could, for example, be compiled into the pseudo-c code:
>
> if (s.length == 11) {
>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
> goto done; }
>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
> goto done; }
> }
> /*default*/
>  ....
> done:
>
> Now should javac do this advanced analysis? No! Javac should only
> generate straight forward string compares and jumps that is a relatively
> easy pattern for the JVM to recognize as a string switch. Then the JVM
> can do the advanced optimizations if and when the code is actually
> determined to be a hot spot.
>
> //Fredrik
>
> Paul Benedict skrev:
> > Jon,
> >
> > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
> > <Jonathan.Gibbons at sun.com> wrote:
> >
> >> If hell were to freeze over, and String.hashCode were to change in
> >> JDK n, n >= 8, then javac could emit different code for Strings in
> >> switch, depending on the value of -target.
> >>
> >
> > Regarding the state of hell, I don't think a compiler implementation
> > should ever rely on such a gamble. The implication is obvious: if JDK
> > N makes a change (by Oracle, by some future owner of OpenJDK -- who
> > knows what happens 10+ years from now), then class files using the
> > OpenJDK de-sugaring would break. The emitted hash results would no
> > longer match the runtime hashes and execution would be unpredictable.
> >
> > To safely emit hash results into byte code, I think you obviously need
> > to go the extra stretch and make a ruling on the algorithm never
> > changing. Isn't that just simply called being responsible?
> >
> > Paul
> >
> >
>
>
>


From Ulf.Zibis at gmx.de  Wed Dec  9 04:50:03 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 09 Dec 2009 13:50:03 +0100
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
Message-ID: <4B1F9CFB.5060904@gmx.de>

+1

... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 
'default:'

Additionally String#equals(Object object) could be optimized to benefit 
from the hash codes:
    int equalByHashThreshold = 2;

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count &&
                    (equalByHashThreshold == 0 || --equalByHashThreshold 
== 0) &&
                    (anotherString.equalByHashThreshold == 0 || 
--anotherString.equalByHashThreshold == 0) &&
                    hash() == anotherString.hash()) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
            equalByHashThreshold = 0;
        }
        return h;
    }

Alternative:
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count &&
                    hash != 0 && anotherString.hash != 0 &&
                    hash() == anotherString.hash()) {
                hash = -1;
                anotherString.hash = -1;
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0 || --h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }
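
The underlying idea, using an already-cached hash as a cheap "definitely different" test before comparing characters, can be sketched outside java.lang.String. This is not a repair of the listings above (it drops the threshold bookkeeping entirely); it is a simpler variant on a hypothetical wrapper class, HashedText, and like String it treats a hash of 0 as "not computed yet".

```java
// Sketch only: equals() consults the hashes only when both sides have
// already cached one, so it never forces a hash computation itself.
final class HashedText {
    private final String text;
    private int hash;                      // 0 means "not computed yet"

    HashedText(String text) {
        this.text = text;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (int i = 0; i < text.length(); i++) {
                h = 31 * h + text.charAt(i);
            }
            hash = h;                      // cache for later calls
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashedText)) return false;
        HashedText other = (HashedText) o;
        if (text.length() != other.text.length()) return false;
        // Different cached hashes prove the contents differ.
        if (hash != 0 && other.hash != 0 && hash != other.hash) return false;
        // Equal or missing hashes prove nothing, so compare the characters.
        return text.equals(other.text);
    }
}
```

Equal hashes never short-circuit to true, since distinct strings can collide; only unequal hashes are conclusive.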


-Ulf


Am 09.12.2009 11:53, Reinier Zwitserloot schrieb:
> As I understand it, switch-in-strings is handled during the "lower" phase of
> javac, which must desugar the string switch into legal java code.
>
> This makes a series of if/elseif cases actually impossible, due to switch's
> unique behaviour in regards to fall-through.... I think. Let's try this out.
> If we have:
>
> switch(someString) {
>     case "Hello1":
>        m1();
>     default:
>     case "Hello2":
>        m2();
>        break;
>     case "Hello3":
>        m3();
> }
>
> how should this translate to a series of if statements, in a way that is
> easier than the current nested double switch scenario? I don't really see a
> way.
>
> There's a compromise, where the original string-to-integer conversion is
> done with a series of ifs instead of a switch on hashCode. I don't really
> care about removing the dependency on string's hashCode, but if this is
> simpler, than, by all means. Until there's proof otherwise, I side with
> Fredrik that a switch on hashCodes is not going to have a measurable
> performance impact. As an example, the above would desugar to (with optional
> switch on string's length during string-to-number conversion omitted. That
> may actually be a good idea; it's straight forward and does have an obvious
> performance benefit):
>
> int $unique;
> if ("Hello1".equals(someString)) $unique = 0;
> else if ("Hello2".equals(someString)) $unique = 1;
> else if ("Hello3".equals(someString)) $unique = 2;
> else $unique = 3;
>
> switch ($unique) {
>     case 0:
>        m1();
>     case 3:
>     case 1:
>        m2();
>        break;
>     case 2:
>        m3();
> }
>
>
> It avoids dependency on string hashcode (which, for the record, I do not
> think needs to be avoided), and it's straightforward and simple for all
> possible forms of string-in-switch that I can think of.
>
> --Reinier Zwitserloot
>
>
>
> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
> fredrik.ohrstrom at oracle.com> wrote:
>
>   
>> This discussion reeks of premature optimization.... A tableswitch on
>> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
>> of compares anyway (at least on the x86 platform). If the tableswitch
>> happens on a sequence of relatively consecutive  numbers, then the JVM
>> can create a jump table. But for hashcodes, no way!
>>
>> Therefore a sequence of compares that work with the interned string
>> pointers will be faster. If interning is slow (and/or wastes memory)
>> then a sequence of tailored compares that work directly on the
>> characters will be the fastest. For example:
>>
>> switch (s) {
>>  case "Hello World"  : .... break;
>>  case "Hello Wooot" : .... break;
>>  default: ....
>> }
>>
>> Could, for example, be compiled into the pseudo-c code:
>>
>> if (s.length == 11) {
>>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>> goto done; }
>>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>> goto done; }
>> }
>> /*default*/
>>  ....
>> done:
>>
>> Now should javac do this advanced analysis? No! Javac should only
>> generate straight forward string compares and jumps that is a relatively
>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>> can do the advanced optimizations if and when the code is actually
>> determined to be a hot spot.
>>
>> //Fredrik
>>
>> Paul Benedict skrev:
>>     
>>> Jon,
>>>
>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>> <Jonathan.Gibbons at sun.com> wrote:
>>>
>>>       
>>>> If hell were to freeze over, and String.hashCode were to change in
>>>> JDK n, n >= 8, then javac could emit different code for Strings in
>>>> switch, depending on the value of -target.
>>>>
>>>>         
>>> Regarding the state of hell, I don't think a compiler implementation
>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>> knows what happens 10+ years from now), then class files using the
>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>> longer match the runtime hashes and execution would be unpredictable.
>>>
>>> To safely emit hash results into byte code, I think you obviously need
>>> to go the extra stretch and make a ruling on the algorithm never
>>> changing. Isn't that just simply called being responsible?
>>>
>>> Paul
>>>
>>>
>>>       
>>
>>     
>
>   


From fredrik.ohrstrom at oracle.com  Wed Dec  9 07:34:38 2009
From: fredrik.ohrstrom at oracle.com (=?UTF-8?B?RnJlZHJpayDDlmhyc3Ryw7Zt?=)
Date: Wed, 09 Dec 2009 16:34:38 +0100
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	
	<4B1D9FED.7090406@sun.com>	
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	
	<4B1DD0BB.7010709@sun.com>	
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	
	<4B1E833B.1040200@sun.com>	
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>	
	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
Message-ID: <4B1FC38E.2010702@oracle.com>

Reinier Zwitserloot skrev:
> int $unique;
> if ("Hello1".equals(someString)) $unique = 0;
> else if ("Hello2".equals(someString)) $unique = 1;
> else if ("Hello3".equals(someString)) $unique = 2;
> else $unique = 3;
>
> switch ($unique) {
>     case 0:
>        m1();
>     case 3:
>     case 1:
>        m2();
>        break;
>     case 2:
>        m3();
> }
>
>
> It avoids dependency on string hashcode (which, for the record, I do 
> not think needs to be avoided), and it's straightforward and simple 
> for all possible forms of string-in-switch that I can think of.
Yes, this is much better. This is much easier to understand and the 
pattern is trivial to catch in the JVM. There are many opportunities 
for the compiler (even without strings-in-switch awareness) to optimize 
this sequence of compares, and it avoids forcing a full calculation of a 
hash code that has to traverse the full string. Remember that a string 
compare can terminate early; a hashcode calculation cannot. Also, a 
string compare works on as large blocks as possible per iteration 
(8 bytes on 64-bit machines, even 16-byte blocks with SSE2). If the JVM 
decides that it would be beneficial to use its own internal hashcodes to 
optimize the code, then it can do so.
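
A small sketch of the early-termination point, counting how many character positions a naive left-to-right comparison looks at before it can answer. The class and method names are hypothetical, and a real JVM compares in wider blocks as described above; this only illustrates the shape of the argument.

```java
// Sketch only: a comparison can stop at the first mismatch (or at a length
// mismatch), while an uncached hash computation visits every character.
class EarlyExitSketch {
    // Positions inspected by a simple character-by-character comparison.
    static int charsExaminedByCompare(String a, String b) {
        if (a.length() != b.length()) return 0;   // rejected on length alone
        int examined = 0;
        for (int i = 0; i < a.length(); i++) {
            examined++;
            if (a.charAt(i) != b.charAt(i)) break; // stop at first difference
        }
        return examined;
    }

    // Positions inspected when a hash has to be computed from scratch.
    static int charsExaminedByHash(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        String label = "Hello World";
        for (String s : new String[] {"Jello World", "Hello Wooot", "Hello"}) {
            System.out.println(s + ": compare looks at "
                    + charsExaminedByCompare(s, label) + ", hash looks at "
                    + charsExaminedByHash(s));
        }
    }
}
```

Note that String caches its hash after the first call, as the `hash` field in the listings earlier in the thread shows, so the full traversal is paid once per string instance rather than on every switch.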

//Fredrik
> --Reinier Zwitserloot
>
>
>
> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström 
> <fredrik.ohrstrom at oracle.com> wrote:
>
>     This discussion reeks of premature optimization.... A tableswitch on
>     arbitrary large numbers (aka hashcodes) must be compiled into a
>     sequence
>     of compares anyway (at least on the x86 platform). If the tableswitch
>     happens on a sequence of relatively consecutive  numbers, then the JVM
>     can create a jump table. But for hashcodes, no way!
>
>     Therefore a sequence of compares that work with the interned string
>     pointers will be faster. If interning is slow (and/or wastes memory)
>     then a sequence of tailored compares that work directly on the
>     characters will be the fastest. For example:
>
>     switch (s) {
>      case "Hello World"  : .... break;
>      case "Hello Wooot" : .... break;
>      default: ....
>     }
>
>     Could, for example, be compiled into the pseudo-c code:
>
>     if (s.length == 11) {
>      if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>     goto done; }
>      if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>     goto done; }
>     }
>     /*default*/
>      ....
>     done:
>
>     Now should javac do this advanced analysis? No! Javac should only
>     generate straight forward string compares and jumps that is a
>     relatively
>     easy pattern for the JVM to recognize as a string switch. Then the JVM
>     can do the advanced optimizations if and when the code is actually
>     determined to be a hot spot.
>
>     //Fredrik
>
>     Paul Benedict skrev:
>     > Jon,
>     >
>     > On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>     > <Jonathan.Gibbons at sun.com> wrote:
>     >
>     >> If hell were to freeze over, and String.hashCode were to change in
>     >> JDK n, n >= 8, then javac could emit different code for Strings in
>     >> switch, depending on the value of -target.
>     >>
>     >
>     > Regarding the state of hell, I don't think a compiler implementation
>     > should ever rely on such a gamble. The implication is obvious: if JDK
>     > N makes a change (by Oracle, by some future owner of OpenJDK -- who
>     > knows what happens 10+ years from now), then class files using the
>     > OpenJDK de-sugaring would break. The emitted hash results would no
>     > longer match the runtime hashes and execution would be unpredictable.
>     >
>     > To safely emit hash results into byte code, I think you obviously need
>     > to go the extra stretch and make a ruling on the algorithm never
>     > changing. Isn't that just simply called being responsible?
>     >
>     > Paul
>     >
>     >
>
>
>


From markmahieu at googlemail.com  Wed Dec  9 07:37:04 2009
From: markmahieu at googlemail.com (Mark Mahieu)
Date: Wed, 9 Dec 2009 15:37:04 +0000
Subject: Strings in Switch
In-Reply-To: <560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
Message-ID: <C12E4054-109E-41DD-A1A8-2F6C8CF19CBD@googlemail.com>


On 9 Dec 2009, at 10:53, Reinier Zwitserloot wrote:

> As I understand it, switch-in-strings is handled during the "lower" phase of
> javac, which must desugar the string switch into legal java code.

Hmm, that's not quite how I understand it.

I picture the "lower" phase as a bridge between the side of javac which is concerned with the Java Language Specification (parsing, flow analysis, etc.) and the side which deals with the VM spec (i.e. bytecode generation).  Its existence means that neither side need be unnecessarily complicated by details of the other.

So, its input is syntax trees which are valid as far as the language spec is concerned, and its output is a simpler set of trees which can be used by the "gen" phase to produce valid JVM classes. That 'simpler' set is not necessarily an exact subset of the trees used by earlier phases; i.e. the output need not be directly representable as valid Java *language* code (synthetics and some uses of "let" expressions, for example).

> As an example, the above would desugar to (with optional
> switch on string's length during string-to-number conversion omitted. That
> may actually be a good idea; it's straight forward and does have an obvious
> performance benefit):
> 
> int $unique;
> if ("Hello1".equals(someString)) $unique = 0;
> else if ("Hello2".equals(someString)) $unique = 1;
> else if ("Hello3".equals(someString)) $unique = 2;
> else $unique = 3;

I'm afraid a translation along these lines is likely to entice end users to attempt premature optimisation by messing with the order of the cases.
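
To make the translation under discussion concrete, here is a minimal sketch that runs the quoted if-chain version next to an ordinary string switch (which needs a compiler that supports the feature, i.e. JDK 7 or later). The class name IfChainSketch, the trace strings and the stand-ins for m1()/m2()/m3() are hypothetical and only serve the illustration; this is not javac output.

```java
public class IfChainSketch {
    // Reference behaviour: the string switch from the quoted example,
    // including the fall-through from "Hello1" into default/"Hello2".
    public static String viaSwitch(String s) {
        StringBuilder trace = new StringBuilder();
        switch (s) {
            case "Hello1":
                trace.append("m1 ");
            default:
            case "Hello2":
                trace.append("m2 ");
                break;
            case "Hello3":
                trace.append("m3 ");
        }
        return trace.toString();
    }

    // Two-step translation: map the string to a small int with an if-chain,
    // then switch on that int so the original fall-through is preserved.
    public static String viaIfChain(String s) {
        StringBuilder trace = new StringBuilder();
        int unique;
        if ("Hello1".equals(s)) unique = 0;
        else if ("Hello2".equals(s)) unique = 1;
        else if ("Hello3".equals(s)) unique = 2;
        else unique = 3; // stands for "default"
        switch (unique) {
            case 0:
                trace.append("m1 ");
            case 3:
            case 1:
                trace.append("m2 ");
                break;
            case 2:
                trace.append("m3 ");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Hello1", "Hello2", "Hello3", "other"}) {
            System.out.println(s + " -> " + viaSwitch(s) + "| " + viaIfChain(s));
        }
    }
}
```

One difference worth noting: a string switch throws NullPointerException when the selector is null, whereas the if-chain as written silently takes the default branch, so a faithful translation would need an explicit null check first.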

But I still don't see the problem with what Joe proposed (months ago) and implemented.
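
For comparison, a hash-based translation of the same example might look roughly like the sketch below, assuming the two-switch shape referred to elsewhere in this thread (a switch on hashCode() guarded by equals(), followed by a switch on a small index). The class name HashSwitchSketch and the trace strings are hypothetical; the case constants are the values String.hashCode() yields for these literals under its documented algorithm.

```java
public class HashSwitchSketch {
    public static String dispatch(String s) {
        StringBuilder trace = new StringBuilder();
        int index = -1; // -1 means "no case label matched", i.e. default

        // First switch: narrow down by hash code, then confirm with equals()
        // because different strings may share a hash code.
        // s.hashCode() throws NullPointerException for null, like a string switch.
        switch (s.hashCode()) {
            case -2137068097: // "Hello1".hashCode()
                if (s.equals("Hello1")) index = 0;
                break;
            case -2137068096: // "Hello2".hashCode()
                if (s.equals("Hello2")) index = 1;
                break;
            case -2137068095: // "Hello3".hashCode()
                if (s.equals("Hello3")) index = 2;
                break;
        }

        // Second switch: the original case bodies, with fall-through intact.
        switch (index) {
            case 0:
                trace.append("m1 ");
            default:
            case 1:
                trace.append("m2 ");
                break;
            case 2:
                trace.append("m3 ");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Hello1", "Hello2", "Hello3", "other"}) {
            System.out.println(s + " -> " + dispatch(s));
        }
    }
}
```

In this shape the order of the case labels in the source has no effect on how many comparisons are performed, which is the property the if-chain translation lacks.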


Regards,

Mark


From fredrik.ohrstrom at oracle.com  Wed Dec  9 07:52:23 2009
From: fredrik.ohrstrom at oracle.com (=?UTF-8?B?RnJlZHJpayDDlmhyc3Ryw7Zt?=)
Date: Wed, 09 Dec 2009 16:52:23 +0100
Subject: Strings in Switch
In-Reply-To: <4B1F9CFB.5060904@gmx.de>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
	<4B1F9CFB.5060904@gmx.de>
Message-ID: <4B1FC7B7.5070108@oracle.com>

Ulf Zibis wrote:
> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 
> 'default:'
Yes, but I think the example highlighted the intricacies of the switch 
syntax. :)
> Additionally String#equals(Object object) could be optimized to 
> benefit from the hash codes:
Maintaining up-to-date hashcodes for every string will essentially force 
you to access every string twice.
There are not enough equals operations on the strings to make up for 
this precalculation cost, especially since a compare can be so much more 
efficient than a hashcode calculation. Besides, a lot of strings are 
already interned, which means that you can always start by checking identity.

//Fredrik
>    int equalByHashThreshold = 2;
>
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String anotherString = (String)anObject;
>            int n = count;
>            if (n == anotherString.count &&
>                    (equalByHashThreshold == 0 ||
>                     --equalByHashThreshold == 0) &&
>                    (anotherString.equalByHashThreshold == 0 ||
>                     --anotherString.equalByHashThreshold == 0) &&
>                    hash() == anotherString.hash()) {
>                char v1[] = value;
>                char v2[] = anotherString.value;
>                int i = offset;
>                int j = anotherString.offset;
>                while (n-- != 0) {
>                    if (v1[i++] != v2[j++])
>                        return false;
>                }
>                return true;
>            }
>        }
>        return false;
>    }
>
>    public int hashCode() {
>        int h = hash;
>        if (h == 0) {
>            int off = offset;
>            char val[] = value;
>            int len = count;
>
>            for (int i = 0; i < len; i++) {
>                h = 31*h + val[off++];
>            }
>            hash = h;
>            equalByHashThreshold = 0;
>        }
>        return h;
>    }
>
> Alternative:
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String anotherString = (String)anObject;
>            int n = count;
>            if (n == anotherString.count &&
>                    hash != 0 && anotherString.hash != 0 &&
>                    hash() == anotherString.hash()) {
>                hash = -1;
>                anotherString.hash = -1;
>                char v1[] = value;
>                char v2[] = anotherString.value;
>                int i = offset;
>                int j = anotherString.offset;
>                while (n-- != 0) {
>                    if (v1[i++] != v2[j++])
>                        return false;
>                }
>                return true;
>            }
>        }
>        return false;
>    }
>
>    public int hashCode() {
>        int h = hash;
>        if (h == 0 || --h == 0) {
>            int off = offset;
>            char val[] = value;
>            int len = count;
>
>            for (int i = 0; i < len; i++) {
>                h = 31*h + val[off++];
>            }
>            hash = h;
>        }
>        return h;
>    }
>
>
> -Ulf
>
>
> On 09.12.2009 11:53, Reinier Zwitserloot wrote:
>> As I understand it, switch-in-strings is handled during the "lower" phase of
>> javac, which must desugar the string switch into legal java code.
>>
>> This makes a series of if/elseif cases actually impossible, due to switch's
>> unique behaviour in regards to fall-through.... I think. Let's try this out.
>> If we have:
>>
>> switch(someString) {
>>     case "Hello1":
>>        m1();
>>     default:
>>     case "Hello2":
>>        m2();
>>        break;
>>     case "Hello3":
>>        m3();
>> }
>>
>> how should this translate to a series of if statements, in a way that is
>> easier than the current nested double switch scenario? I don't really see a
>> way.
>>
>> There's a compromise, where the original string-to-integer conversion is
>> done with a series of ifs instead of a switch on hashCode. I don't really
>> care about removing the dependency on string's hashCode, but if this is
>> simpler, than, by all means. Until there's proof otherwise, I side with
>> Fredrik that a switch on hashCodes is not going to have a measurable
>> performance impact. As an example, the above would desugar to (with optional
>> switch on string's length during string-to-number conversion omitted. That
>> may actually be a good idea; it's straight forward and does have an obvious
>> performance benefit):
>>
>> int $unique;
>> if ("Hello1".equals(someString)) $unique = 0;
>> else if ("Hello2".equals(someString)) $unique = 1;
>> else if ("Hello3".equals(someString)) $unique = 2;
>> else $unique = 3;
>>
>> switch ($unique) {
>>     case 0:
>>        m1();
>>     case 3:
>>     case 1:
>>        m2();
>>        break;
>>     case 2:
>>        m3();
>> }
>>
>>
>> It avoids dependency on string hashcode (which, for the record, I do not
>> think needs to be avoided), and it's straightforward and simple for all
>> possible forms of string-in-switch that I can think of.
>>
>> --Reinier Zwitserloot
>>
>>
>>
>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
>> fredrik.ohrstrom at oracle.com> wrote:
>>
>>  
>>> This discussion reeks of premature optimization.... A tableswitch on
>>> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
>>> of compares anyway (at least on the x86 platform). If the tableswitch
>>> happens on a sequence of relatively consecutive  numbers, then the JVM
>>> can create a jump table. But for hashcodes, no way!
>>>
>>> Therefore a sequence of compares that work with the interned string
>>> pointers will be faster. If interning is slow (and/or wastes memory)
>>> then a sequence of tailored compares that work directly on the
>>> characters will be the fastest. For example:
>>>
>>> switch (s) {
>>>  case "Hello World"  : .... break;
>>>  case "Hello Wooot" : .... break;
>>>  default: ....
>>> }
>>>
>>> Could, for example, be compiled into the pseudo-c code:
>>>
>>> if (s.length == 11) {
>>>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...; goto done; }
>>>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...; goto done; }
>>> }
>>> /*default*/
>>>  ....
>>> done:
>>>
>>> Now should javac do this advanced analysis? No! Javac should only
>>> generate straight forward string compares and jumps that is a relatively
>>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>>> can do the advanced optimizations if and when the code is actually
>>> determined to be a hot spot.
>>>
>>> //Fredrik
>>>
>>> Paul Benedict wrote:
>>>    
>>>> Jon,
>>>>
>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>>> <Jonathan.Gibbons at sun.com> wrote:
>>>>
>>>>      
>>>>> If hell were to freeze over, and String.hashCode were to change in
>>>>> JDK n, n >= 8, then javac could emit different code for Strings in
>>>>> switch, depending on the value of -target.
>>>>>
>>>> Regarding the state of hell, I don't think a compiler implementation
>>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>> knows what happens 10+ years from now), then class files using the
>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>
>>>> To safely emit hash results into byte code, I think you obviously need
>>>> to go the extra stretch and make a ruling on the algorithm never
>>>> changing. Isn't that just simply called being responsible?
>>>>
>>>> Paul
>>>>
>>>>
>>>>       
>>>
>>>     
>>
>>   
>


From per at bothner.com  Wed Dec  9 08:37:14 2009
From: per at bothner.com (Per Bothner)
Date: Wed, 09 Dec 2009 08:37:14 -0800
Subject: Strings in Switch
In-Reply-To: <4B1F7EFC.2030601@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D4386.6080907@sun.com>	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
Message-ID: <4B1FD23A.1050404@bothner.com>

On 12/09/2009 02:42 AM, Fredrik Öhrström wrote:
> A tableswitch on
> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
> of compares anyway (at least on the x86 platform). If the tableswitch
> happens on a sequence of relatively consecutive  numbers, then the JVM
> can create a jump table. But for hashcodes, no way!

A switch on arbitrary large numbers (a lookupswitch in bytecode, rather
than a tableswitch) can be compiled to use binary search in a sorted
array, which should be fairly efficient.  (That is why the lookupswitch
entries are required to be sorted.)
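
The binary-search strategy described above can be sketched in plain Java as follows. This is only an illustration of the idea, not what any particular JVM does: the class name SortedKeyDispatch and its methods are hypothetical, and the sketch assumes that no two case labels share a hash code.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedKeyDispatch {
    private final int[] keys;      // hash codes of the labels, in ascending order
    private final String[] labels; // labels[i] is the label whose hash is keys[i]

    public SortedKeyDispatch(String... caseLabels) {
        labels = caseLabels.clone();
        // Sort the labels by hash code so the key array is sorted too.
        Arrays.sort(labels, Comparator.comparingInt(String::hashCode));
        keys = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            keys[i] = labels[i].hashCode();
        }
    }

    /** Returns the matching case label, or null when the default case applies. */
    public String match(String s) {
        // O(log n) lookup on the sorted keys, then one equals() to rule out
        // a different string that happens to have the same hash code.
        int i = Arrays.binarySearch(keys, s.hashCode());
        return (i >= 0 && labels[i].equals(s)) ? labels[i] : null;
    }

    public static void main(String[] args) {
        SortedKeyDispatch d = new SortedKeyDispatch("Hello World", "Hello Wooot");
        System.out.println(d.match("Hello World"));
        System.out.println(d.match("something else"));
    }
}
```

With n case labels this needs about log2(n) integer comparisons plus a single equals() call, instead of up to n string comparisons for a linear chain.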
-- 
	--Per Bothner
per at bothner.com   http://per.bothner.com/


From per at bothner.com  Wed Dec  9 08:42:37 2009
From: per at bothner.com (Per Bothner)
Date: Wed, 09 Dec 2009 08:42:37 -0800
Subject: Strings in Switch
In-Reply-To: <4B1FC7B7.5070108@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>	<4B1F7EFC.2030601@oracle.com>	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>	<4B1F9CFB.5060904@gmx.de>
	<4B1FC7B7.5070108@oracle.com>
Message-ID: <4B1FD37D.9020002@bothner.com>

On 12/09/2009 07:52 AM, Fredrik Öhrström wrote:
> Especially since a compare can be so much more
> efficient than a hashcode calculation.

I'd be surprised if there is a noticeable difference
on modern desktop-class processors: either way you
have to do the memory reads, and that's what costs;
computation is close to free.
-- 
	--Per Bothner
per at bothner.com   http://per.bothner.com/


From forax at univ-mlv.fr  Wed Dec  9 09:03:25 2009
From: forax at univ-mlv.fr (=?UTF-8?B?UsOpbWkgRm9yYXg=?=)
Date: Wed, 09 Dec 2009 18:03:25 +0100
Subject: Strings in Switch
In-Reply-To: <4B1FC7B7.5070108@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>	<4B1F7EFC.2030601@oracle.com>	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>	<4B1F9CFB.5060904@gmx.de>
	<4B1FC7B7.5070108@oracle.com>
Message-ID: <4B1FD85D.8060701@univ-mlv.fr>

On 09/12/2009 16:52, Fredrik Öhrström wrote:
> Ulf Zibis wrote:
>    
>> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by
>> 'default:'
>>      
> Yes, but I think the example highlighted the intricacies of the switch
> syntax. :)
>    
>> Additionally String#equals(Object object) could be optimized to
>> benefit from the hash codes:
>>      
> Maintaining up to date hashcodes for every string will essentially force
> you to access every strings twice.
> There are not enough equals operations on the strings to make up for
> this precalculation cost. Especially since a compare can be so much more
> efficient than a hashcode calculation. Besides, a lot of strings are
> already interned which means that you can always start by checking identity.
>    

Hi Fredrik,
Don't forget that String.hashCode() is only calculated once
(in the OpenJDK implementation); String is an immutable object.
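
The caching can be illustrated outside java.lang.String with a small immutable class. This is a minimal sketch under stated assumptions: the class CachedHashText and its computations() counter are hypothetical and exist only to make the "calculated once" behaviour observable; the hash formula is the documented String one (h = 31*h + c).

```java
import java.util.Arrays;

public final class CachedHashText {
    private final char[] value; // never modified after construction
    private int hash;           // 0 means "not computed yet"
    private int computations;   // instrumentation for this sketch only

    public CachedHashText(String s) {
        value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // safe to cache because the contents can never change
            computations++;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashText
                && Arrays.equals(value, ((CachedHashText) o).value);
    }

    /** How many times the character loop has actually run. */
    public int computations() {
        return computations;
    }

    public static void main(String[] args) {
        CachedHashText t = new CachedHashText("Hello");
        t.hashCode();
        t.hashCode();
        System.out.println(t.computations());
    }
}
```

A text whose hash code happens to be 0 is recomputed on every call in this scheme, since 0 doubles as the "not computed" marker.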

Rémi




From Ulf.Zibis at gmx.de  Wed Dec  9 09:58:14 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 09 Dec 2009 18:58:14 +0100
Subject: Strings in Switch
In-Reply-To: <4B1F9CFB.5060904@gmx.de>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>	<4B1D9FED.7090406@sun.com>	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>	<4B1DD0BB.7010709@sun.com>	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>	<4B1E833B.1040200@sun.com>	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>	<4B1F7EFC.2030601@oracle.com>	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
	<4B1F9CFB.5060904@gmx.de>
Message-ID: <4B1FE536.2060208@gmx.de>

Alternative (correction):
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count) {
                if (hash == 0)
                    hash = -1;        // mark 1st invocation of equals()
                else if (anotherString.hash == 0)
                    anotherString.hash = -1; // mark 1st invocation on the other string
                // from the 2nd invocation on, first try hash code comparison:
                else if (hashCode() != anotherString.hashCode())
                    return false;
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0)
                    if (v1[i++] != v2[j++])
                        return false;
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0 || h == -1) {  // not computed yet, or only marked by equals()
            h = 0;                // restart from 0 so the marker does not leak into the result
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }
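
The idea can also be tried outside java.lang.String. Below is a minimal standalone sketch, assuming a hypothetical wrapper class HashFirstText with the marker values UNSEEN (0) and MARKED (-1); it is an illustration of the scheme, not a patch to the JDK, and like the proposal it mutates a field inside equals() without any synchronization.

```java
import java.util.Arrays;

public final class HashFirstText {
    private static final int UNSEEN = 0;  // equals() has not been called yet
    private static final int MARKED = -1; // equals() was called once, hash not computed

    private final char[] value;
    private int hash = UNSEEN;            // UNSEEN, MARKED, or the cached hash code

    public HashFirstText(String s) {
        value = s.toCharArray();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashFirstText)) return false;
        HashFirstText other = (HashFirstText) o;
        if (value.length != other.value.length) return false;

        if (hash == UNSEEN) {
            hash = MARKED;                // first comparison: skip the hash cost
        } else if (other.hash == UNSEEN) {
            other.hash = MARKED;
        } else if (hashCode() != other.hashCode()) {
            return false;                 // later comparisons: cheap rejection by hash
        }
        return Arrays.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == UNSEEN || h == MARKED) {
            h = 0;                        // reset so the marker does not enter the result
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return h;
    }

    public static void main(String[] args) {
        HashFirstText a = new HashFirstText("Hello World");
        HashFirstText b = new HashFirstText("Hello Wooot");
        for (int i = 0; i < 3; i++) {
            System.out.println(a.equals(b)); // the third call is decided by hash codes
        }
    }
}
```

Whether this pays off depends on how often the same strings are compared repeatedly, which is the point disputed in the replies; the sketch only shows that the bookkeeping can be made consistent.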


-Ulf


On 09.12.2009 13:50, Ulf Zibis wrote:
> +1
>
> ... but isn't 'case "Hello2":' superfluous ? I guess it's covered by 
> 'default:'
>
> Additionally String#equals(Object object) could be optimized to benefit 
> from the hash codes:
>     int equalByHashThreshold = 2;
>
>     public boolean equals(Object anObject) {
>         if (this == anObject) {
>             return true;
>         }
>         if (anObject instanceof String) {
>             String anotherString = (String)anObject;
>             int n = count;
>             if (n == anotherString.count &&
>                     (equalByHashThreshold == 0 ||
>                      --equalByHashThreshold == 0) &&
>                     (anotherString.equalByHashThreshold == 0 ||
>                      --anotherString.equalByHashThreshold == 0) &&
>                     hash() == anotherString.hash()) {
>                 char v1[] = value;
>                 char v2[] = anotherString.value;
>                 int i = offset;
>                 int j = anotherString.offset;
>                 while (n-- != 0) {
>                     if (v1[i++] != v2[j++])
>                         return false;
>                 }
>                 return true;
>             }
>         }
>         return false;
>     }
>
>     public int hashCode() {
>         int h = hash;
>         if (h == 0) {
>             int off = offset;
>             char val[] = value;
>             int len = count;
>
>             for (int i = 0; i < len; i++) {
>                 h = 31*h + val[off++];
>             }
>             hash = h;
>             equalByHashThreshold = 0;
>         }
>         return h;
>     }
>
> Alternative:
>     public boolean equals(Object anObject) {
>         if (this == anObject) {
>             return true;
>         }
>         if (anObject instanceof String) {
>             String anotherString = (String)anObject;
>             int n = count;
>             if (n == anotherString.count &&
>                     hash != 0 && anotherString.hash != 0 &&
>                     hash() == anotherString.hash()) {
>                 hash = -1;
>                 anotherString.hash = -1;
>                 char v1[] = value;
>                 char v2[] = anotherString.value;
>                 int i = offset;
>                 int j = anotherString.offset;
>                 while (n-- != 0) {
>                     if (v1[i++] != v2[j++])
>                         return false;
>                 }
>                 return true;
>             }
>         }
>         return false;
>     }
>
>     public int hashCode() {
>         int h = hash;
>         if (h == 0 || --h == 0) {
>             int off = offset;
>             char val[] = value;
>             int len = count;
>
>             for (int i = 0; i < len; i++) {
>                 h = 31*h + val[off++];
>             }
>             hash = h;
>         }
>         return h;
>     }
>
>
> -Ulf
>
>
> On 09.12.2009 11:53, Reinier Zwitserloot wrote:
>   
>> As I understand it, switch-in-strings is handled during the "lower" phase of
>> javac, which must desugar the string switch into legal java code.
>>
>> This makes a series of if/elseif cases actually impossible, due to switch's
>> unique behaviour in regards to fall-through.... I think. Let's try this out.
>> If we have:
>>
>> switch(someString) {
>>     case "Hello1":
>>        m1();
>>     default:
>>     case "Hello2":
>>        m2();
>>        break;
>>     case "Hello3":
>>        m3();
>> }
>>
>> how should this translate to a series of if statements, in a way that is
>> easier than the current nested double switch scenario? I don't really see a
>> way.
>>
>> There's a compromise, where the original string-to-integer conversion is
>> done with a series of ifs instead of a switch on hashCode. I don't really
>> care about removing the dependency on string's hashCode, but if this is
>> simpler, than, by all means. Until there's proof otherwise, I side with
>> Fredrik that a switch on hashCodes is not going to have a measurable
>> performance impact. As an example, the above would desugar to (with optional
>> switch on string's length during string-to-number conversion omitted. That
>> may actually be a good idea; it's straight forward and does have an obvious
>> performance benefit):
>>
>> int $unique;
>> if ("Hello1".equals(someString)) $unique = 0;
>> else if ("Hello2".equals(someString)) $unique = 1;
>> else if ("Hello3".equals(someString)) $unique = 2;
>> else $unique = 3;
>>
>> switch ($unique) {
>>     case 0:
>>        m1();
>>     case 3:
>>     case 1:
>>        m2();
>>        break;
>>     case 2:
>>        m3();
>> }
>>
>>
>> It avoids dependency on string hashcode (which, for the record, I do not
>> think needs to be avoided), and it's straightforward and simple for all
>> possible forms of string-in-switch that I can think of.
>>
>> --Reinier Zwitserloot
>>
>>
>>
>> On Wed, Dec 9, 2009 at 11:42 AM, Fredrik Öhrström <
>> fredrik.ohrstrom at oracle.com> wrote:
>>
>>   
>>     
>>> This discussion reeks of premature optimization.... A tableswitch on
>>> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
>>> of compares anyway (at least on the x86 platform). If the tableswitch
>>> happens on a sequence of relatively consecutive  numbers, then the JVM
>>> can create a jump table. But for hashcodes, no way!
>>>
>>> Therefore a sequence of compares that work with the interned string
>>> pointers will be faster. If interning is slow (and/or wastes memory)
>>> then a sequence of tailored compares that work directly on the
>>> characters will be the fastest. For example:
>>>
>>> switch (s) {
>>>  case "Hello World"  : .... break;
>>>  case "Hello Wooot" : .... break;
>>>  default: ....
>>> }
>>>
>>> Could, for example, be compiled into the pseudo-c code:
>>>
>>> if (s.length == 11) {
>>>  if (s.chars[8] == L'r' && !wcscmp(s.chars,  L"Hello World")) { ...;
>>> goto done; }
>>>  if (s.chars[8] == L'o' && !wcscmp(s.chars,  L"Hello Wooot")) { ...;
>>> goto done; }
>>> }
>>> /*default*/
>>>  ....
>>> done:
>>>
>>> Now should javac do this advanced analysis? No! Javac should only
>>> generate straight forward string compares and jumps that is a relatively
>>> easy pattern for the JVM to recognize as a string switch. Then the JVM
>>> can do the advanced optimizations if and when the code is actually
>>> determined to be a hot spot.
>>>
>>> //Fredrik
>>>
>>> Paul Benedict wrote:
>>>     
>>>       
>>>> Jon,
>>>>
>>>> On Tue, Dec 8, 2009 at 10:47 AM, Jonathan Gibbons
>>>> <Jonathan.Gibbons at sun.com> wrote:
>>>>
>>>>       
>>>>         
>>>>> If hell were to freeze over, and String.hashCode were to change in JDK
>>>>> n, n >= 8, then javac could emit different code for Strings in switch,
>>>>> depending on the value of -target.
>>>>>
>>>>>         
>>>>>           
>>>> Regarding the state of hell, I don't think a compiler implementation
>>>> should ever rely on such a gamble. The implication is obvious: if JDK
>>>> N makes a change (by Oracle, by some future owner of OpenJDK -- who
>>>> knows what happens 10+ years from now), then class files using the
>>>> OpenJDK de-sugaring would break. The emitted hash results would no
>>>> longer match the runtime hashes and execution would be unpredictable.
>>>>
>>>> To safely emit hash results into byte code, I think you obviously need
>>>> to go the extra stretch and make a ruling on the algorithm never
>>>> changing. Isn't that just simply called being responsible?
>>>>
>>>> Paul
>>>>
>>>>
>>>>       
>>>>         
>>>     
>>>       
>>   
>>     
>
>
>   


From Ulf.Zibis at gmx.de  Wed Dec  9 10:29:47 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 09 Dec 2009 19:29:47 +0100
Subject: Strings in Switch
In-Reply-To: <15e8b9d20912091005h157044b5heb8d45419db29d59@mail.gmail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
	<4B1F9CFB.5060904@gmx.de> <4B1FE536.2060208@gmx.de>
	<15e8b9d20912091005h157044b5heb8d45419db29d59@mail.gmail.com>
Message-ID: <4B1FEC9B.5020509@gmx.de>

Yes, but ONLY if the hash code has not been calculated yet AND equals() is 
invoked for the first time.
I assume that all interned Strings, e.g. constants, already have the hash 
value precalculated.
This is to avoid (1) the slightly more expensive hash code computation if 
a short-lived string is only compared once, and (2) an additional field 
in each string object to save a kind of threshold marker.
The gain from adding a hash code comparison to String#equals() would pay 
off if a string is compared often.

-Ulf


On 09.12.2009 19:05, Neal Gafter wrote:
> Do you really want to set the hash code of every string in the world to -1?
>
> On Wed, Dec 9, 2009 at 9:58 AM, Ulf Zibis <Ulf.Zibis at gmx.de 
> <mailto:Ulf.Zibis at gmx.de>> wrote:
>
>        public boolean equals(Object anObject) { ...
>
>                    if (hash == 0)
>                        hash = -1;        // mark 1st invocation of equals()
>                    else if (anotherString.hash == 0)
>                        anotherString.hash = -1; // mark 1st invocation "
>        ... }
>
>        public int hashCode() {
>            int h = hash;
>            if (h == 0 || ++h == 0) { ...
>            }
>            return h;
>        }
>
>


From Joe.Darcy at Sun.COM  Wed Dec  9 10:30:35 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Wed, 09 Dec 2009 10:30:35 -0800
Subject: Strings in Switch
In-Reply-To: <4B1F7EFC.2030601@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D4386.6080907@sun.com>
	<b9e663070912071443r44391764v5d1da24206f770ba@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
Message-ID: <4B1FECCB.6070408@sun.com>

Fredrik Öhrström wrote:
> This discussion reeks of premature optimization.... A tableswitch on
> arbitrary large numbers (aka hashcodes) must be compiled into a sequence
> of compares anyway (at least on the x86 platform). If the tableswitch
> happens on a sequence of relatively consecutive  numbers, then the JVM
> can create a jump table. But for hashcodes, no way!
>   

Implementing strings in switch has always been a fertile topic for 
discussion!

The purpose of the synthesized initial switch in the two-switch strings 
in switch implementation is to create a dense contiguous set of integral 
jump targets for the second switch that are easy to digest for a JVM.

When designing this strings in switch implementation, various factors 
came into play including minimizing the worst-case behavior in terms of 
number of character comparisons.  Currently the string being switched 
on is expected to be traversed at most twice, once to compute the hash 
code and again to be compared at the hash site.  (If there are hash 
collisions, multiple compares could occur at the hash site.)
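
As a rough illustration of the two-switch strategy described above (a 
sketch only, not javac's actual output; the class name, labels, and 
return strings are made up for this example), the first switch maps the 
hash code to a dense index and the second switch dispatches on that 
index.  "Aa" and "BB" are used because they share the hash code 2112, 
which exercises the collision case:

```java
class TwoSwitchSketch {
    static String dispatch(String s) {
        int index = -1; // stays -1 when no label matches, which selects default

        // First switch: hash code -> dense index, confirmed with equals().
        switch (s.hashCode()) {
            case 2112: // "Aa".hashCode() and "BB".hashCode() collide
                if (s.equals("Aa")) index = 0;
                else if (s.equals("BB")) index = 1;
                break;
            case 67: // "C".hashCode()
                if (s.equals("C")) index = 2;
                break;
        }

        // Second switch: small contiguous case values, easy to turn into a jump table.
        switch (index) {
            case 0:  return "matched Aa";
            case 1:  return "matched BB";
            case 2:  return "matched C";
            default: return "default";
        }
    }
}
```

Only the colliding labels need more than one equals() call at the hash 
site; every other string is compared at most once.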

For an if-equals-else-if-equals chain, the expected number of 
character comparisons will likely be higher, since when the string 
being switched on is present as a target, on average it would be compared 
with about half the target strings.
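
To make that estimate concrete, here is a small sketch (the class, field, 
and method names are hypothetical) that counts equals() calls for a 
linear chain of comparisons.  It counts calls rather than characters, 
since String.equals() can itself stop early on a mismatch:

```java
class IfChainSketch {
    // Number of equals() calls made by the most recent lookup.
    static int comparisons;

    // Behaves like if (t0.equals(s)) ... else if (t1.equals(s)) ... in order.
    static int indexOf(String s, String... targets) {
        comparisons = 0;
        for (int i = 0; i < targets.length; i++) {
            comparisons++;
            if (targets[i].equals(s)) {
                return i;
            }
        }
        return -1; // corresponds to the default branch
    }
}
```

With three targets, a string that is present costs 1, 2, or 3 calls 
depending on its position, so 2 on average; a string that is absent 
always costs 3.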

Depending on the fraction of strings that have hash codes precomputed, 
the fraction of switched on strings that are or are not in the target 
list, and various other properties of the strings being switched on and 
the strings in the target set, different strings in switch 
implementations can be driven to have pathological behavior.

That said, I believe the current strings in switch implementation is 
correct and should have acceptable performance.

I'd be willing to investigate re-engineering the strings in switch 
implementation once:

1) A greater number of the Coin features are implemented, specified, and 
tested.
2) There is some usage of strings in switch to guide the implementation 
strategy.

-Joe


From Joe.Darcy at Sun.COM  Wed Dec  9 10:35:29 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Wed, 09 Dec 2009 10:35:29 -0800
Subject: Strings in Switch
In-Reply-To: <C12E4054-109E-41DD-A1A8-2F6C8CF19CBD@googlemail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<4B1D9FED.7090406@sun.com>
	<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
	<C12E4054-109E-41DD-A1A8-2F6C8CF19CBD@googlemail.com>
Message-ID: <4B1FEDF1.7060606@sun.com>

Mark Mahieu wrote:
> On 9 Dec 2009, at 10:53, Reinier Zwitserloot wrote:
>
>   
>> As I understand it, switch-in-strings is handled during the "lower" phase of
>> javac, which must desugar the string switch into legal java code.
>>     
>
> Hmm, that's not quite how I understand it.
>
> I picture the "lower" phase as a bridge between the side of javac which is concerned with the Java Language spec (parsing, flow analysis etc), and the side which deals with the VM spec (ie. bytecode generation).  Its existence means that neither side need be unnecessarily complicated by details of the other.
>
> So, its input is syntax trees which are valid as far as the language spec is concerned, and its output is a simpler set of trees which can be used by the "gen" phase to produce valid JVM classes - but that 'simpler' set is not necessarily an exact subset of the trees used by earlier phases; ie. the output need not be directly representable as valid Java *language* code (synthetics and some uses of "let" expressions for example).
>   

Mark,

Your description of Lower is correct.

-Joe


From Ulf.Zibis at gmx.de  Wed Dec  9 11:17:01 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 09 Dec 2009 20:17:01 +0100
Subject: Strings in Switch
In-Reply-To: <4B1FC38E.2010702@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>		<4B1D9FED.7090406@sun.com>		<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>		<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>		<4B1DD0BB.7010709@sun.com>		<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>		<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>		<4B1E833B.1040200@sun.com>		<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>		<4B1F7EFC.2030601@oracle.com>	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
	<4B1FC38E.2010702@oracle.com>
Message-ID: <4B1FF7AD.50502@gmx.de>

On 09.12.2009 16:34, Fredrik Öhrström wrote:
> .... to optimize
> this sequence of compares and it avoids forcing a full calculation of a 
> hash code that has to traverse the full string. Remember that a string 
> compare can terminate early, a hashcode calculation cannot. Also a 
> string compare works on as large blocks as possible per iteration 
> (8bytes in 64 bit machines, even 16byte blocks with SSE2). 

Good point. If a string is compared only rarely, and especially if its 
total length is not trivial, the hash code computation should be more 
expensive, but after some repeated compares on the same string, the 
hash code algorithm would win.
Here is an enhanced String#equals() implementation, which accounts for the 
number of characters examined by each compare:

    int equalByHashThreshold = count;

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count &&
                    (equalByHashThreshold > 0 ||
                    hash() == anotherString.hash())) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0)
                    if (v1[i++] != v2[j++]) {
                        if (equalByHashThreshold > 0)
                            equalByHashThreshold -= (count - n);
                        return false;
                    }
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
            equalByHashThreshold = 0;
        }
        return h;
    }


-Ulf


From r.spilker at gmail.com  Thu Dec 10 00:53:20 2009
From: r.spilker at gmail.com (Roel Spilker)
Date: Thu, 10 Dec 2009 09:53:20 +0100
Subject: Strings in Switch
In-Reply-To: <4B1FC38E.2010702@oracle.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>
	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>
	<4B1FC38E.2010702@oracle.com>
Message-ID: <b9add78d0912100053hb36e9f0ve3174519f368433@mail.gmail.com>

Fredrik,

Two things:
- We're talking switch statements on String literals here. I don't expect
people to use long Strings here.
- Performance would only be a problem if you would execute the code very
often. Luckily, the hashCode of String is cached. So that should take care
of a potential performance hazard.

Roel

On Wed, Dec 9, 2009 at 4:34 PM, Fredrik Öhrström <
fredrik.ohrstrom at oracle.com> wrote:

> Yes, this is much better. This is much easier to understand and the
> pattern is trivial to catch in the JVM. There are  many opportunities
> for the compiler (even without strings-in-switch awareness) to optimize
> this sequence of compares and it avoids forcing a full calculation of a
> hash code that has to traverse the full string. Remember that a string
> compare can terminate early, a hashcode calculation cannot. Also a
> string compare works on as large blocks as possible per iteration
> (8bytes in 64 bit machines, even 16byte blocks with SSE2). If the JVM
> decides that it would be beneficial to use its own internal hashcodes to
> optimize the code, then it can do so.
>
>


From tball at google.com  Thu Dec 10 07:48:18 2009
From: tball at google.com (Tom Ball)
Date: Thu, 10 Dec 2009 07:48:18 -0800
Subject: Strings in Switch
In-Reply-To: <C12E4054-109E-41DD-A1A8-2F6C8CF19CBD@googlemail.com>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com> 
	<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com> 
	<4B1DD0BB.7010709@sun.com>
	<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com> 
	<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com> 
	<4B1E833B.1040200@sun.com>
	<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com> 
	<4B1F7EFC.2030601@oracle.com>
	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com> 
	<C12E4054-109E-41DD-A1A8-2F6C8CF19CBD@googlemail.com>
Message-ID: <ecf3d2390912100748q15075863qcf316a9002a49d67@mail.gmail.com>

On Wed, Dec 9, 2009 at 7:37 AM, Mark Mahieu <markmahieu at googlemail.com> wrote:

>
> On 9 Dec 2009, at 10:53, Reinier Zwitserloot wrote:
>
> > As I understand it, switch-in-strings is handled during the "lower" phase of
> > javac, which must desugar the string switch into legal java code.
>
> Hmm, that's not quite how I understand it.
>
> I picture the "lower" phase as a bridge between the side of javac which is
> concerned with the Java Language spec (parsing, flow analysis etc), and the
> side which deals with the VM spec (ie. bytecode generation).  Its existence
> means that neither side need be unnecessarily complicated by details of the
> other.
>
> So, its input is syntax trees which are valid as far as the language spec
> is concerned, and its output is a simpler set of trees which can be used by
> the "gen" phase to produce valid JVM classes - but that 'simpler' set is not
> necessarily an exact subset of the trees used by earlier phases; ie. the
> output need not be directly representable as valid Java *language* code
> (synthetics and some uses of "let" expressions for example).
>

That's exactly right.  It's common when drafting language change specs to
show the JVM output as simplified Java, but that's because the writer and
reviewers generally know Java much better than JVM byte-code and can thus
more easily spot mistakes.  It's never been a JVM requirement, however.

Josh and I ran into this with the ARM spec, where he was wrestling with what
sort of synthetic super type might be needed for some corner cases.  It
turns out that no new types were needed, as the previous compiler phases had
already verified the code was typesafe.  So all the code needed to do (from
the JVM requirements) was to directly use the resource variable, without it
needing to be recast or incur any risk of a runtime type exception.

Tom


From pbenedict at apache.org  Wed Dec 16 10:29:09 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Wed, 16 Dec 2009 12:29:09 -0600
Subject: Strings in Switch .. and classes
Message-ID: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>

It occurred to me that if hashCode() really is an acceptable and
fool-proof (yet to be convinced) way to implement strings in switch,
this potentially opens a further enhancement to switch on Class.
Perhaps in JDK 8, we could convert many instanceof checks into label
cases.

Class c = o.getClass();
if (c instanceof String) {  .. }
else if (c instanceof Integer) { ..}
else if (c instanceof Date) { .. }

Can be de-sugared into the new String switch:

switch (object.getClass().getName()) {
  case "java.lang.String":
  case "java.lang.Integer":
  case "java.util.Date":
  }
}

PS: Since instanceof evaluates null to false, a null check should still
be done in front of the switch, as with any other switch statement, lest
an NPE occur.

Paul


From Thomas.Hawtin at Sun.COM  Wed Dec 16 10:37:05 2009
From: Thomas.Hawtin at Sun.COM (Tom Hawtin)
Date: Wed, 16 Dec 2009 18:37:05 +0000
Subject: Strings in Switch .. and classes
In-Reply-To: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
References: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
Message-ID: <4B2928D1.5080008@sun.com>

Paul Benedict wrote:

> Class c = o.getClass();
> if (c instanceof String) {  .. }

> Can be de-sugared into the new String switch:
> 
> switch (object.getClass().getName()) {
>   case "java.lang.String":

Class names are not unique.

Tom Hawtin


From pbenedict at apache.org  Wed Dec 16 10:51:39 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Wed, 16 Dec 2009 12:51:39 -0600
Subject: Strings in Switch .. and classes
In-Reply-To: <4B2928D1.5080008@sun.com>
References: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
	<4B2928D1.5080008@sun.com>
Message-ID: <b9e663070912161051u1d31c7b1hfe5ac2bd6a58e8fc@mail.gmail.com>

Tom,

On Wed, Dec 16, 2009 at 12:37 PM, Tom Hawtin <Thomas.Hawtin at sun.com> wrote:
> Paul Benedict wrote:
>
>> Class c = o.getClass();
>> if (c instanceof String) {  .. }
>
>> Can be de-sugared into the new String switch:
>>
>> switch (object.getClass().getName()) {
>>  case "java.lang.String":
>
> Class names are not unique.
>

Can you expound on this some more? I am surprised, so I want to hear
more about it. I thought these two are equivalent, no?

boolean x = c instanceof String;
boolean y = c.getName().equals("java.lang.String");

Paul


From mthornton at optrak.co.uk  Wed Dec 16 11:28:06 2009
From: mthornton at optrak.co.uk (Mark Thornton)
Date: Wed, 16 Dec 2009 19:28:06 +0000
Subject: Strings in Switch .. and classes
In-Reply-To: <b9e663070912161051u1d31c7b1hfe5ac2bd6a58e8fc@mail.gmail.com>
References: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>	<4B2928D1.5080008@sun.com>
	<b9e663070912161051u1d31c7b1hfe5ac2bd6a58e8fc@mail.gmail.com>
Message-ID: <4B2934C6.5030202@optrak.co.uk>

Paul Benedict wrote:
> Tom,
>
> On Wed, Dec 16, 2009 at 12:37 PM, Tom Hawtin <Thomas.Hawtin at sun.com> wrote:
>   
>> Paul Benedict wrote:
>>
>>     
>>> Class c = o.getClass();
>>> if (c instanceof String) {  .. }
>>>       
>>> Can be de-sugared into the new String switch:
>>>
>>> switch (object.getClass().getName()) {
>>>  case "java.lang.String":
>>>       
>> Class names are not unique.
>>
>>     
>
> Can you expound on this some more? I am surprised, so I want to hear
> more about it. I thought these two are equivalent, no?
>
> boolean x = c instanceof String;
> boolean y = c.getName().equals("java.lang.String");
>
> Paul
>
>   
Classes of the same name can be loaded by different ClassLoaders; the 
full identity of a Class is the pair (ClassLoader, class name). This 
shouldn't happen with classes like String which are part of the base 
platform.


Mark Thornton


From Jonathan.Gibbons at Sun.COM  Wed Dec 16 12:17:16 2009
From: Jonathan.Gibbons at Sun.COM (Jonathan Gibbons)
Date: Wed, 16 Dec 2009 12:17:16 -0800
Subject: Strings in Switch .. and classes
In-Reply-To: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
References: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
Message-ID: <4B29404C.1080609@sun.com>

Paul Benedict wrote:
> It occurred to me that if hashCode() really is an acceptable and
> fool-proof (yet to be convinced) way to implement strings in switch,
> this potentially opens a further enhancement to switch on Class.
> Perhaps in JDK 8, we could convert many instanceof checks into label
> cases.
>
> Class c = o.getClass();
> if (c instanceof String) {  .. }
> else if (c instanceof Integer) { ..}
> else if (c instanceof Date) { .. }
>
> Can be de-sugared into the new String switch:
>
> switch (object.getClass().getName()) {
>   case "java.lang.String":
>   case "java.lang.Integer":
>   case "java.util.Date":
>   }
> }
>
> PS: Since instanceof evaluates null to false, as like any other switch
> statement, an if-check should still be done in front lest an NPE
> occurs.
>
> Paul
>
>   


Setting aside issues of class identity, the proposed desugaring does not 
take the use of subtypes into account.
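
A small sketch of the subtype problem, assuming the intent of the quoted 
snippet was to test the object (o instanceof X) rather than its Class 
(class and method names here are made up, and Number/Integer stand in 
for the types in the original): instanceof accepts subtypes, while 
comparing getClass().getName() only matches the exact class.

```java
class SubtypeSketch {
    // What the original if-chain would test: true for Number and all subtypes.
    static boolean byInstanceof(Object o) {
        return o instanceof Number;
    }

    // What the proposed desugaring would test: true only for the exact class name.
    static boolean byName(Object o) {
        return o != null && o.getClass().getName().equals("java.lang.Number");
    }
}
```

For an Integer argument, byInstanceof() returns true but byName() 
returns false, because the runtime class name is java.lang.Integer.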

-- Jon


From pbenedict at apache.org  Wed Dec 16 12:25:15 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Wed, 16 Dec 2009 14:25:15 -0600
Subject: Strings in Switch .. and classes
In-Reply-To: <4B29404C.1080609@sun.com>
References: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
	<4B29404C.1080609@sun.com>
Message-ID: <b9e663070912161225m36db790ao5ef7dcd2118bee63@mail.gmail.com>

Jon, you are correct. My email has some glaring holes. After all, it
was just a quick thought, but definitely needs more coherence. Thanks.

Paul

On Wed, Dec 16, 2009 at 2:17 PM, Jonathan Gibbons
<Jonathan.Gibbons at sun.com> wrote:
> Paul Benedict wrote:
>>
>> It occurred to me that if hashCode() really is an acceptable and
>> fool-proof (yet to be convinced) way to implement strings in switch,
>> this potentially opens a further enhancement to switch on Class.
>> Perhaps in JDK 8, we could convert many instanceof checks into label
>> cases.
>>
>> Class c = o.getClass();
>> if (c instanceof String) {  .. }
>> else if (c instanceof Integer) { ..}
>> else if (c instanceof Date) { .. }
>>
>> Can be de-sugared into the new String switch:
>>
>> switch (object.getClass().getName()) {
>>   case "java.lang.String":
>>   case "java.lang.Integer":
>>   case "java.util.Date":
>>   }
>> }
>>
>> PS: Since instanceof evaluates null to false, as like any other switch
>> statement, an if-check should still be done in front lest an NPE
>> occurs.
>>
>> Paul
>>
>>
>
>
> Setting aside issues of class identity, the proposed desugaring does not
> take the use of subtypes into account.
>
> -- Jon
>


From matthew at matthewadams.me  Wed Dec 16 23:16:26 2009
From: matthew at matthewadams.me (Matthew Adams)
Date: Wed, 16 Dec 2009 23:16:26 -0800
Subject: Too late compile-time type checked reflection syntax sugar proposal?
Message-ID: <1ba389ce0912162316t6d2d026jd21685f9f3cd13ba@mail.gmail.com>

Hi all,

Do the powers that be consider it too late to consider a new proposal
for JDK 7?  Here it is in ultrashort form, albeit not completely
thought through.  I'm proposing it just to kick it around.

Proposal:
Compile-time type-checked and existence-checked reflection syntax

Description:
Introduce a new, double-dot operator ".." to act as syntax sugar for
accessing reflection information with type & existence checking at
compile time.

Concept:
The double-dot operator, meaning "get metamodel artifact", allows for
much more concise reflective access to things you know about at build
time but must use reflection for some reason.  Trust me, it happens
plenty.  The choice of ".." for the operator was that first, ".."
doesn't introduce a new keyword, and second, in filesystems, ".."
usually means "go up a level", which is essentially what we're doing:
going up a level from model to metamodel.  Looking at the examples,
you can see how much less code it is compared to the reflection-based
equivalent, plus if it's typesafe, you get fewer errors when you're
depending on type safety -- that is, at least you knew at compile time
that things were all good.  It still doesn't mean anything at runtime,
and you could get NoSuchMethodException, etc.

Examples:

1. Get the Field object for the field named "bar" in class Foo:
Field bar = Foo..bar;

// current way
Field bar = Foo.class.getDeclaredField("bar");

2. Get the Method object for the method with signature "myMethod(int
a, String y)" defined on class Goo:
Method m = Goo..myMethod(int,String);
// note scope & return type don't matter

// current way
Method m = Goo.class.getDeclaredMethod("myMethod", new
Class[] { int.class, String.class });

3. Get the Class object for the class Snafu.  This is an interesting
case that offers backward compatibility:
Class c = Snafu..class;
// exactly the same as Snafu.class, the ".." operator's inspiration!!

4. Get the @Foo annotation on the Bar class:
Annotation foo = Bar..@Foo;

// current way
Annotation foo = Bar.class.getAnnotation(Foo.class);

5. Get the @Foo annotation on the field named "blah" in the class Gorp:
Annotation foo = Gorp..blah..@Foo;

// current way
Annotation foo = Gorp.class.getDeclaredField("blah").getAnnotation(Foo.class);

6. Get the @Foo annotation on the second parameter of the method
"start(int x, @Foo int y, int z)" defined in class Startable:
Annotation foo = Startable..start(int,int..@Foo,int);

// current way -- no error checking
Annotation[] anns = Startable.class.getMethod("start", new Class[] {
int.class, int.class, int.class }).getParameterAnnotations()[1];
Annotation foo = null;
for (Annotation ann : anns) {
  if (ann.annotationType().equals(Foo.class)) {
    foo = ann;
    break; // got it
  }
}
// foo is either null or a reference to the @Foo annotation instance
on the second parameter of the method

7. Get all of the @Foo annotations on all of the parameters of the
method "start(@Foo int x, int y, @Foo int z)" defined in class
Startable:
Annotation[] foo = Startable..start(int..@Foo,int..@Foo,int..@Foo);
// returns an array with the first @Foo, null, then the last @Foo

// current way left as an exercise to the reader :)

8. Get the @Foo annotation on the "@Foo start(int x, int y, int z)"
method defined in class Startable:
Annotation foo = Startable..start(int,int,int)..@Foo;

// current way
Annotation foo = Startable.class.getDeclaredMethod("start", new
Class[] { int.class, int.class, int.class }).getAnnotation(Foo.class);
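
For reference, a couple of the "current way" snippets above can be
assembled into a compilable sketch.  Foo, Gorp, and the helper methods
are placeholders invented for this example; the annotation is given
RUNTIME retention because reflection cannot see it otherwise.  The
parameter lookup uses annotationType() rather than getClass(), since
annotation instances are typically proxy objects whose class is not
Foo.class.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class ReflectionSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.PARAMETER})
    @interface Foo {}

    static class Gorp {
        @Foo int blah;
        void start(int x, @Foo int y, int z) {}
    }

    // Example 5: the @Foo annotation on field "blah", or null if absent.
    static Foo fieldAnnotation() {
        try {
            Field blah = Gorp.class.getDeclaredField("blah");
            return blah.getAnnotation(Foo.class);
        } catch (NoSuchFieldException e) {
            // Only discovered at runtime; this is what the proposal wants to avoid.
            throw new IllegalStateException(e);
        }
    }

    // Example 6: the @Foo annotation on the parameter at the given index, or null.
    static Foo parameterAnnotation(int index) {
        try {
            Method start = Gorp.class.getDeclaredMethod(
                    "start", int.class, int.class, int.class);
            for (Annotation ann : start.getParameterAnnotations()[index]) {
                if (ann.annotationType().equals(Foo.class)) {
                    return (Foo) ann;
                }
            }
            return null;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A misspelled name in either string literal compiles fine and only fails
when the lookup runs.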


Motivation:

The double-dot operator would allow for compile-time type-checked
reflective operations, like those in the persistence APIs.  For
example, in JPA:

@Entity
public class Department {
  @OneToMany(mappedBy = "department") // note string
  Set<Employee> employees;
  //...
}

becomes

@Entity
public class Department {
  @OneToMany(mappedBy = Employee..department) // checked at compile time
  Set<Employee> employees;
  //...
}

It also is beneficial in many other areas.  Use your imagination!  I
can't think of many more (it's late), but Criteria queries come to
mind...

WDYT?

-matthew

-- 
mailto:matthew at matthewadams.me
skype:matthewadams12
yahoo:matthewadams
aol:matthewadams12
google-talk:matthewadams12 at gmail.com
msn:matthew at matthewadams.me
http://matthewadams.me
http://www.linkedin.com/in/matthewadams


From Joe.Darcy at Sun.COM  Wed Dec 16 23:29:24 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Wed, 16 Dec 2009 23:29:24 -0800
Subject: Too late compile-time type checked reflection syntax sugar
	proposal?
In-Reply-To: <1ba389ce0912162316t6d2d026jd21685f9f3cd13ba@mail.gmail.com>
References: <1ba389ce0912162316t6d2d026jd21685f9f3cd13ba@mail.gmail.com>
Message-ID: <4B29DDD4.6080802@sun.com>

Matthew Adams wrote:
> Hi all,
>
> Do the powers that be consider it too late to consider a new proposal
> for JDK 7?

Yes.

Regards,

-Joe


From Ulf.Zibis at gmx.de  Thu Dec 17 02:01:44 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Thu, 17 Dec 2009 11:01:44 +0100
Subject: Strings in Switch .. and classes
In-Reply-To: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
References: <b9e663070912161029p77027bb9j1834c078eeb2820e@mail.gmail.com>
Message-ID: <4B2A0188.802@gmx.de>

See also threads:
switch (...) instanceof feature --- 2009-03-30
Extend switch .. case statement for Object types and simple expressions 
--- 2009-03-30
Strings in switch --- 2009-03-30
Extend switch .. case statement for all types and simple expressions 
(update) --- 2009-03-31
JCK feedback on "Strings in Switch" proposal --- 2009-05-24

-Ulf


Paul Benedict wrote:
> It occurred to me that if hashCode() really is an acceptable and
> fool-proof (yet to be convinced) way to implement strings in switch,
> this potentially opens a further enhancement to switch on Class.
> Perhaps in JDK 8, we could convert many instanceof checks into label
> cases.
>
> Class c = o.getClass();
> if (c instanceof String) {  .. }
> else if (c instanceof Integer) { ..}
> else if (c instanceof Date) { .. }
>
> Can be de-sugared into the new String switch:
>
> switch (object.getClass().getName()) {
>   case "java.lang.String":
>   case "java.lang.Integer":
>   case "java.util.Date":
>   }
> }
>
> PS: Since instanceof evaluates null to false, as like any other switch
> statement, an if-check should still be done in front lest an NPE
> occurs.
>
> Paul
>
>   


From jimmyuniversal at yahoo.com  Thu Dec 17 14:14:48 2009
From: jimmyuniversal at yahoo.com (James Arlow)
Date: Thu, 17 Dec 2009 14:14:48 -0800 (PST)
Subject: Benefit from computing String Hash at compile time?
Message-ID: <201502.54372.qm@web57708.mail.re3.yahoo.com>

I'm not exactly up to date, but even reading entries from December, there are talks about computing the string hash at compile time.  This seems like a bad idea to me when looking towards future compatibility.

For the sake of posterity, it would make more sense to store the strings as literals in the class file, and then compute the hash during the class loading process.  The amount of processing at run-time would be negligible, and it would eliminate the possibility of errors creeping up from an "improved" or non-standard hash function.  

While the "improved" case seems unlikely, it would prevent whole sections of code from breaking simply because a third party JVM introduced an accidental error into the hash process. 

I think everyone can agree it's best not to risk cutting off open options unless there is a critical performance penalty that would be addressed by doing so. For a function that is called once per switch option and only when the class is loaded, I think it's safe to forget about compile-time hashes altogether.

If people are really worried about performance, then the best option would be to offer two ways to compile the class, one with string literals, for compatibility, and one with Java standard hash values, for performance.  
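
A minimal sketch of the load-time alternative, with hypothetical names throughout: the labels are kept as string literals and their hash codes are computed once in a static initializer, so the class file never embeds a hash value. This only illustrates the idea; it is not how javac implements strings in switch.

```java
class LoadTimeHashSketch {
    // Case labels stored as plain literals in the class file.
    private static final String[] LABELS = {"Hello1", "Hello2", "Hello3"};

    // Filled in when the class is initialized, using whatever
    // String.hashCode() the running JVM provides.
    private static final int[] HASHES = new int[LABELS.length];

    static {
        for (int i = 0; i < LABELS.length; i++) {
            HASHES[i] = LABELS[i].hashCode();
        }
    }

    // Returns the index of the matching label, or -1 for the default case.
    static int indexOf(String s) {
        int h = s.hashCode();
        for (int i = 0; i < LABELS.length; i++) {
            if (HASHES[i] == h && LABELS[i].equals(s)) {
                return i;
            }
        }
        return -1;
    }
}
```

Because both sides of the comparison use the same runtime hash function, a changed or non-standard String.hashCode() would not affect correctness here.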


From opinali at gmail.com  Fri Dec 18 09:38:41 2009
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Fri, 18 Dec 2009 15:38:41 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <201502.54372.qm@web57708.mail.re3.yahoo.com>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>
Message-ID: <fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>

I believe the String hashcode computation could be performed eagerly,
piggy-backing on the (already significantly complex) copying and/or decoding
code used by its constructors. For every character, first to last,
produced/added to the this.value array, we also update the hashcode: h =
31*h + character => trivial enough to be a basically "free" addition to the
existing loops.

Some details:
- For large strings, we gain a lot because we don't have to visit all the
characters again (possibly when they are no longer in the CPU cache) when
the hashcode is first computed (perhaps much later than construction).
- The current algorithm reuses the hashcode 0 to mean "not computed", so it
will recompute everything again for strings that just happen to produce 0
with the current formula. Eager computation avoids this risk, however small.
- Some constructors (notably for substring and cloning) rely on
Arrays.copyOfRange(), whose implementation is more efficient than any Java
loop (I guess it's a HotSpot intrinsic with optimization for alignment
etc.). In that case, using an explicit loop so we can smuggle the hashcode
calculation inside it will probably have a measurable disadvantage. But
this disadvantage is only for construction (and then only for large
strings); for strings that are ever hashed, the net saving will still
always be positive.
- Eager computation allows hash to be declared final, which may have some
performance benefit, e.g. for caching in registers.
- Eager computation allows hashCode() to be a trivial getter without any
branch.
- The hashcode function can be factored into a private static method, e.g.
int incHash(int currhash), so this tiny algorithm need not be repeated in a
dozen constructors; that method will be trivial to inline so there's no cost
either for compiled code.
- Admittedly, for interpreted code there are higher disadvantages in the
constructor; but then, I expect most String constructors to appear as the
first methods to be optimized by the JIT - they are just BURNING "hot".
- If we have eager computation, I think it's not worth caching the hashcode
of literal Strings in the Constant Pool; this requires changing the CP spec
and the classfiles will be 4 bytes bigger for every String literal - a lot
of extra bytes considering how many Strings we typically have (including all
Strings from CP symbols). Still, javac could use the "well-known" hashcode
for special needs like strings-in-switch; other optimizations could be used
more aggressively (precomputed hashtables for huge static symbol tables...).
Let's face it, the String hashcode algorithm has changed in the early days,
but it will never change again.
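
A minimal sketch of the eager idea, using a toy class rather than
java.lang.String itself. EagerText and its members are hypothetical names
chosen for the illustration, and the real String constructors are
considerably more involved:

```java
import java.util.Arrays;

// Toy stand-in for a string class that hashes while it copies.
// Hypothetical; it only mirrors the h = 31*h + c recurrence discussed above.
final class EagerText {
    private final char[] value;
    private final int hash;  // can be final because it is set during construction

    EagerText(char[] source) {
        char[] v = new char[source.length];
        int h = 0;
        // One pass: copy each character and fold it into the hash.
        for (int i = 0; i < source.length; i++) {
            h = 31 * h + (v[i] = source[i]);
        }
        this.value = v;
        this.hash = h;
    }

    // Trivial getter: no "not computed yet" branch and no special case for 0.
    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EagerText && Arrays.equals(value, ((EagerText) o).value);
    }

    @Override
    public String toString() {
        return new String(value);
    }
}
```

The explicit loop is exactly what replaces the bulk array copy, which is
where the construction-time cost mentioned above would come from.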

A+
Osvaldo

2009/12/17 James Arlow <jimmyuniversal at yahoo.com>

> I'm not exactly up to date, but even reading entries from December, there
> are talks about computing the string hash at compile time.  This seems like
> a bad idea to me when looking towards future compatibility.
>
> For the sake of posterity, it would make more sense to store the strings as
> literals in the class file, and then compute the hash during the class
> loading process.  The amount of processing at run-time would be negligible,
> and it would eliminate the possibility of errors creeping up from an
> "improved" or non-standard hash function.
>
> While the "improved" case seems unlikely, it would prevent whole sections
> of code from breaking simply because a third party JVM introduced an
> accidental error into the hash process.
>
> I think everyone can agree its best to not risk cutting off open options
> unless there is a critical performance penalty that would be addressed by
> doing so, so for a function that is called once per switch option and only
> when the class is loaded, I think its safe to forget about compile time
> hashes altogether.
>
> If people are really worried about performance, then the best option would
> be to offer two ways to compile the class, one with string literals, for
> compatibility, and one with Java standard hash values, for performance.
>
>
>
>
>
>


From pbenedict at apache.org  Fri Dec 18 14:29:01 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Fri, 18 Dec 2009 16:29:01 -0600
Subject: Benefit from computing String Hash at compile time?
Message-ID: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>

James,

I concur with your thoughts. It's a risky decision to embed the hash
code into the class file. I can't imagine any other JDK implementation
would attempt this, but perhaps Sun can bet on some things that others
cannot. Regardless, some prominent people disagreed, but I don't think
it changes reality. Either the hash code should forever be made what
it is -- and why couldn't that be done? -- or have an alternate
implementation. I really like your idea of storing the Strings in the
class file and computing their hash when the class loads.

Paul


From abies at adres.pl  Fri Dec 18 15:01:43 2009
From: abies at adres.pl (Artur Biesiadowski)
Date: Sat, 19 Dec 2009 00:01:43 +0100
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>
	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>
Message-ID: <4B2C09D7.10405@adres.pl>

Osvaldo Doederlein wrote:
> - Some constructors (remarkably for substring and cloning) rely on
> Arrays.copyOfRange(), which implementation is more efficient than any Java
> loop (I guess it's a HotSpot intrinsic with optimization for alignment
> etc.). In that case, using an explicit loop so we can smuggle the hashcode
> calculation inside it, will probably have a measurable disadvantage. But
> this disadvantage is only for construction (and then only for large
> strings); for strings that are ever hashed, the net saving will always be
> still positive.
Especially in the case of substring, an optimized private constructor is
used, which just does 3 assignments. With your idea, it would have to
iterate over all elements. This is quite a common operation.

I wonder if there is anything (some HotSpot intrinsic?) preventing a quick
hack on java.lang.String, putting it in bootclasspath/a and measuring the
time of javac on a few thousand source files, reindexing huge Lucene data
and maybe hsql on some test database. It should at least give a rough
figure of whether it changes the speed in any measurable way.

Regards,
Artur Biesiadowski


From Ulf.Zibis at gmx.de  Fri Dec 18 17:09:34 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sat, 19 Dec 2009 02:09:34 +0100
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
Message-ID: <4B2C27CE.1080504@gmx.de>

+1

Am 18.12.2009 23:29, Paul Benedict schrieb:
> James,
>
> I concur with your thoughts. It's a risky decision to embed the hash
> code into the class file. I can't imagine any other JDK implementation
> would attempt this, but perhaps Sun can bet on some things that others
> cannot. Regardless, some prominent people disagreed, but I don't think
> it changes reality. Either the hash code should forever be made what
> it is -- and why couldn't that be done? -- or have an alternate
> implementation. I really like your idea of storing the Strings in the
> class file and computing their hash when the class loads.
>
> Paul
>
>
>   


From pbenedict at apache.org  Fri Dec 18 17:42:27 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Fri, 18 Dec 2009 19:42:27 -0600
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
Message-ID: <b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>

Reinier,

Thank you for your reply.

On Fri, Dec 18, 2009 at 5:04 PM, Reinier Zwitserloot
<reinier at zwitserloot.com> wrote:
> String.hashCode() has _already_ been defined as unchanging and set in stone.
> We could do so again, if it assuages recently stated fears, though I'm not
> sure what this would accomplish. It's right here:
> http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()

I hope to make some things clear:

My objection rests solely on the fact that it is not "set in stone".
If I remember correctly, Joe had to research whether the API had ever
changed (not since at least 1.2). None of Joe, Jonathan, or Josh
(all well respected) has claimed what you are claiming. The
highest assurance given is that it's "highly unlikely" and only if
"hell freezes over".

Now I grant the fact it's highly unlikely. I buy off on that. The odds
are hashCode() is not going to change. I also have no philosophical
problems with emitting the value from String.hashCode() into class
files. However, I believe the manufacturer of a JDK should have
*absolute certainty* when making this decision. It's pretty clear to
me this certainty is high, but not absolute. And since OpenJDK is made
by Sun, the bearer of Java, if it is good for them, it's good for
everyone. Follow the leader. Once this decision is made, I assert
String.hashCode() will have to be "set in stone" but only because of
Project Coin and Sun's influence, not the API.

Paul


From opinali at gmail.com  Sat Dec 19 04:53:23 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Sat, 19 Dec 2009 10:53:23 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2C09D7.10405@adres.pl>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>
	<4B2C09D7.10405@adres.pl>
Message-ID: <4B2CCCC3.1090401@gmail.com>

Em 18/12/2009 21:01, Artur Biesiadowski escreveu:
> Osvaldo Doederlein wrote:
>    
>> - Some constructors (remarkably for substring and cloning) rely on
>> Arrays.copyOfRange(), which implementation is more efficient than any Java
>> loop (I guess it's a HotSpot intrinsic with optimization for alignment
>> etc.). In that case, using an explicit loop so we can smuggle the hashcode
>> calculation inside it, will probably have a measurable disadvantage. But
>> this disadvantage is only for construction (and then only for large
>> strings); for strings that are ever hashed, the net saving will always be
>> still positive.
>>      
> Especially in case of substring, optimized private constructor is used,
> which just does 3 assignments. With your idea, it would have to iterate
> over all elements. This is quite common operation.
>    

The short answer: you are right, that's an important special case, 
remarkably in methods using several temporary strings (often substrings 
of some previous string) because temp strings are virtually never 
hashed; and the sharing of String.value is critical to String's 
immutable design. But this only means that eager computation of the 
hashcode is not always a good idea - so, perhaps we can do that eagerly 
in all/most constructors that create a new String.value; or more 
generally, in any constructor where this extra computation is proved to 
not produce any significant performance degradation. For other 
constructors, we just keep String.hash initialized with 0, so the 
current hashCode() is kept unchanged and will calculate the value if 
necessary.

The long answer: I'm coding a prototype implementation of this
optimization in some constructors so I can benchmark it and see if it's
worth the trouble. As usual, it's better to let the code do all the talking.

> I wonder if there is anything (some Hotspot intrinsic?) preventing quick
> hack on java.lang.String, putting it in bootclasspath/a and measuring
> time of javac few thousands source files, reindexing huge lucene data
> and maybe hsql on some test database. It should at least give a rough
> figure if it changes the speed in any measurable way.
>    

HotSpot is doing some intrinsic tricks for String/StringBuilder (IIRC) 
in recent JDK7 build, but I didn't check these changesets... but I don't 
think it would affect such benchmarking, if we don't change the data 
layout (fields).

A+
Osvaldo


From opinali at gmail.com  Sat Dec 19 05:24:37 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Sat, 19 Dec 2009 11:24:37 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
Message-ID: <4B2CD415.5070200@gmail.com>

Em 18/12/2009 23:42, Paul Benedict escreveu:
> On Fri, Dec 18, 2009 at 5:04 PM, Reinier Zwitserloot
> <reinier at zwitserloot.com>  wrote:
>    
>> String.hashCode() has _already_ been defined as unchanging and set in stone.
>> We could do so again, if it assuages recently stated fears, though I'm not
>> sure what this would accomplish. It's right here:
>> http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()
>>      
> I hope to make some things clear:
>
> My objection relies solely on the fact that it is not "set in stone".
> If I remember correctly, Joe had to do research if the API ever
> changed (not since at least 1.2). Neither Joe, Jonathan, and Josh
> (people well respected) have claimed what you are claiming. The
> highest assurance given is that it's "highly unlikely" and only if
> "hell freezes over". .
>
> Now I grant the fact it's highly unlikely. I buy off on that. The odds
> are hashCode() is not going to change. I also have no philosophical
> problems with emitting the value from String.hashCode() into class
> files. However, I believe the manufacturer of a JDK should have
> *absolute certainty* when making this decision. It's pretty clear to
> me this certainty is high, but not absolute. And since OpenJDK is made
> by Sun, the bearer of Java, if it is good for them, it's good for
> everyone. Follow the leader. Once this decision is made, I assert
> String.hashCode() will have to be "set in stone" but only because of
> Project Coin and Sun's influence, not the API.
>    

The hashcode algorithm has changed only once, in 1.2; I checked it too.
And yes, there's no formal guarantee yet that it won't change again;
but all it takes is a single line of javadoc, stating that the algorithm
- which is _already documented and contractual_ at least since 1.2 -
won't ever change again. Even independent cleanroom implementations (I
checked GNU Classpath) use the same algorithm.
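
For reference, the documented formula is small enough to restate in a few
lines. This is only a sketch of the algorithm as the javadoc describes it
(s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], in int arithmetic); the
class and method names are invented for the example:

```java
// Sketch of the documented String.hashCode() formula, evaluated with
// Horner's rule. DocumentedStringHash is a hypothetical helper name.
final class DocumentedStringHash {
    static int hash(CharSequence s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);  // int overflow wraps, as in the JDK
        }
        return h;
    }

    public static void main(String[] args) {
        // Both lines print the same value on a conforming implementation.
        System.out.println(hash("hello"));
        System.out.println("hello".hashCode());
    }
}
```

Any implementation that follows the javadoc has to agree with this on
every input, which is what a compiler emitting precomputed hash values
would be relying on.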

A+
Osvaldo


From reinier at zwitserloot.com  Sat Dec 19 08:43:07 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Sat, 19 Dec 2009 17:43:07 +0100
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2CD415.5070200@gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
	<4B2CD415.5070200@gmail.com>
Message-ID: <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>

I don't really understand the thread of conversation here. The fact that the
algorithm is explained in the javadoc means that it is part of the java
spec. There is no need to explain that it can't ever change; that notion is
already inherent in the fact that the algorithm is explained in the javadoc.
The mistake seems to be in what 'part of the java spec' means. It does not
actually mean: Cannot possibly change.

Like everything else in java that is frozen, pragmatic issues trump
backwards compatibility. String.hashCode was changed in 1.2, as was
explained earlier, because of a conflict in the JVM spec and the javadoc, as
well as a really stupid algorithm in the JVM spec. Pragmatism won out. An
analysis was made of the impact, and the analysis resulted in the decision
to change it, even though that wasn't, technically, backwards compatible.

That was then. At this point in time, it most definitely will not be
changing ever again, regardless of whether string-in-switch has a dependency
on String.hashCode's implementation.

The javadoc of String should also not be bogged down with the implementation
detail that string-on-switch is dependent on it. Implementation details have
no place in javadoc.

--Reinier Zwitserloot


On Sat, Dec 19, 2009 at 2:24 PM, Osvaldo Pinali Doederlein <
opinali at gmail.com> wrote:

> Em 18/12/2009 23:42, Paul Benedict escreveu:
>
>  On Fri, Dec 18, 2009 at 5:04 PM, Reinier Zwitserloot
>> <reinier at zwitserloot.com>  wrote:
>>
>>
>>> String.hashCode() has _already_ been defined as unchanging and set in
>>> stone.
>>> We could do so again, if it assuages recently stated fears, though I'm
>>> not
>>> sure what this would accomplish. It's right here:
>>> http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()
>>>
>>>
>> I hope to make some things clear:
>>
>> My objection relies solely on the fact that it is not "set in stone".
>> If I remember correctly, Joe had to do research if the API ever
>> changed (not since at least 1.2). Neither Joe, Jonathan, and Josh
>> (people well respected) have claimed what you are claiming. The
>> highest assurance given is that it's "highly unlikely" and only if
>> "hell freezes over". .
>>
>> Now I grant the fact it's highly unlikely. I buy off on that. The odds
>> are hashCode() is not going to change. I also have no philosophical
>> problems with emitting the value from String.hashCode() into class
>> files. However, I believe the manufacturer of a JDK should have
>> *absolute certainty* when making this decision. It's pretty clear to
>> me this certainty is high, but not absolute. And since OpenJDK is made
>> by Sun, the bearer of Java, if it is good for them, it's good for
>> everyone. Follow the leader. Once this decision is made, I assert
>> String.hashCode() will have to be "set in stone" but only because of
>> Project Coin and Sun's influence, not the API.
>>
>>
>
> The hashcode algorithm has changed only once, in 1.2, I checked it too. And
> yes, there's not formal guarantee yet that it won't change again; but all it
> takes is a single line of javadoc, stating that the algorithm - which is
> _already documented and contractual_ at least since 1.2 - won't ever change
> again. Even independent cleanroom implementations (I checked GNU Classpath),
> use the same algorithm.
>
> A+
> Osvaldo
>


From pbenedict at apache.org  Sat Dec 19 09:28:31 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Sat, 19 Dec 2009 11:28:31 -0600
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
	<4B2CD415.5070200@gmail.com>
	<560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>
Message-ID: <b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>

Reinier,

> There is no need to explain that it can't ever change; that notion is
> already inherent in the fact that the algorithm is explained in the javadoc.
> The mistake seems to be in what 'part of the java spec' means. It does not
> actually mean: Cannot possibly change.

The algorithm is explained. The documentation is good, isn't it? It
is; however, the documentation is for that version of the Java
platform.

> The javadoc of String should also not be bogged down with the implementation
> detail that string-on-switch is dependent on it. Implementation details have
> no place in javadoc.

I agree with you. No one has to reveal implementation details. All
that is necessary is a note that the algorithm must not change from
JDK version to JDK version.

Moving on...
If anyone at Sun is still listening (::grins::), I prefer to emit a
static method that contains a duplicate of the hashCode() algorithm.
Then, no one has to worry about JDK version upgrades and
String.hashCode() is free for future tweaking.

static int $switch_hashCode(String s) {... }
switch ($switch_hashCode(s)) {
...
}

Paul


From opinali at gmail.com  Sat Dec 19 11:06:48 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Sat, 19 Dec 2009 17:06:48 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>	
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>	
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>	
	<4B2CD415.5070200@gmail.com>	
	<560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>
	<b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>
Message-ID: <4B2D2448.4090505@gmail.com>

Em 19/12/2009 15:28, Paul Benedict escreveu:
> Moving on...
> If anyone at Sun is still listening (::grins::), I prefer to emit a
> static method that contains a duplicate of the hashCode() algorithm.
> Then, no one has to worry about JDK version upgrades and
> String.hashCode() is free for future tweaking.
>
> static int $switch_hashCode(String s) {... }
> switch ($switch_hashCode(s)) {
> ...
> }
>
> Paul
>    

This is wasteful, first because strings used in switch statements and
also in hashed collections will be hashed twice; second (and much more
important), every execution of switch(str) needs to call your special
hashcode function again for str, as this hashcode cannot be cached in
str. This extra cost makes switch-on-string O(N) in str.length,
which makes the hashing compilation strategy pointless.

A+
Osvaldo


From pbenedict at apache.org  Sat Dec 19 16:30:54 2009
From: pbenedict at apache.org (Paul Benedict)
Date: Sat, 19 Dec 2009 18:30:54 -0600
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2D2448.4090505@gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
	<4B2CD415.5070200@gmail.com>
	<560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>
	<b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>
	<4B2D2448.4090505@gmail.com>
Message-ID: <b9e663070912191630k55fa284do870874942afc2812@mail.gmail.com>

> This is wasteful, first because strings used in switch statements and also
> in hashed collections will be hashed twice; second (and much more
> important), every execution switch(str) needs to call your special hashcode
> function again for str, as this hashcode cannot be cached in str. This extra
> cost makes switch-on-string O(N) on the str.length, which makes the hashing
> compilation strategy pointless.

You make a good point. As for being "hashed twice", that's simply the
cost of removing the reliance on String.hashCode(). Well, couldn't
$switch_hashCode() perform some caching for itself?

Paul


From reinier at zwitserloot.com  Sun Dec 20 02:10:23 2009
From: reinier at zwitserloot.com (Reinier Zwitserloot)
Date: Sun, 20 Dec 2009 11:10:23 +0100
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <b9e663070912191630k55fa284do870874942afc2812@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
	<4B2CD415.5070200@gmail.com>
	<560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>
	<b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>
	<4B2D2448.4090505@gmail.com>
	<b9e663070912191630k55fa284do870874942afc2812@mail.gmail.com>
Message-ID: <560fb5ed0912200210s660cbbe6u867926b09f08e0c3@mail.gmail.com>

Huh?

All you need to do is this:

add a method to java.lang.String with the signature:

public synthetic int switchCode() {
    return hashCode();
}


Why would this cause performance issues?

--Reinier Zwitserloot


On Sun, Dec 20, 2009 at 1:30 AM, Paul Benedict <pbenedict at apache.org> wrote:

> > This is wasteful, first because strings used in switch statements and
> also
> > in hashed collections will be hashed twice; second (and much more
> > important), every execution switch(str) needs to call your special
> hashcode
> > function again for str, as this hashcode cannot be cached in str. This
> extra
> > cost makes switch-on-string O(N) on the str.length, which makes the
> hashing
> > compilation strategy pointless.
>
> You make a good point. As for being "hashed twice", that's simply the
> cost of removing the reliance on String.hashCode(). Well, couldn't
> $switch_hashCode() perform some caching for itself?
>
> Paul
>
>


From opinali at gmail.com  Sun Dec 20 04:16:31 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Sun, 20 Dec 2009 10:16:31 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <560fb5ed0912200210s660cbbe6u867926b09f08e0c3@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>	
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>	
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>	
	<4B2CD415.5070200@gmail.com>	
	<560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>	
	<b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>	
	<4B2D2448.4090505@gmail.com>	
	<b9e663070912191630k55fa284do870874942afc2812@mail.gmail.com>
	<560fb5ed0912200210s660cbbe6u867926b09f08e0c3@mail.gmail.com>
Message-ID: <4B2E159F.2090604@gmail.com>

Em 20/12/2009 08:10, Reinier Zwitserloot escreveu:
> Huh?
>
> All you need to do is this:
>
> add a method to java.lang.String with the signature:
>
> public synthetic int switchCode() {
>     return hashCode();
> }
>
>
> Why would this cause performance issues?

This is a good idea, as long as
1) that implementation doesn't change. If it ever needs to change, we're
again in hell, either having to recompute the switch-hash at every call,
or wasting an extra int field in the String object to cache this
secondary hashcode (that would be horribly wasteful because String is
the single most popular object in most Java heaps, and only a miserably
tiny fraction of all strings would ever be used in switches).
2) we don't mind tightly coupling java.lang.String (notably with a
public method) to the switch statement, which is a concern for some posters.

So I still think it is pointless. I say, just put in String.hashCode() 
"...and this algorithm is cast in stone, forever and ever until JDK 
+Infinity", and use it directly from switch, move on.

A+
Osvaldo

>
> --Reinier Zwitserloot
>
>
> On Sun, Dec 20, 2009 at 1:30 AM, Paul Benedict <pbenedict at apache.org 
> <mailto:pbenedict at apache.org>> wrote:
>
>     > This is wasteful, first because strings used in switch
>     statements and also
>     > in hashed collections will be hashed twice; second (and much more
>     > important), every execution switch(str) needs to call your
>     special hashcode
>     > function again for str, as this hashcode cannot be cached in
>     str. This extra
>     > cost makes switch-on-string O(N) on the str.length, which makes
>     the hashing
>     > compilation strategy pointless.
>
>     You make a good point. As for being "hashed twice", that's simply the
>     cost of removing the reliance on String.hashCode(). Well, couldn't
>     $switch_hashCode() perform some caching for itself?
>
>     Paul
>
>


From opinali at gmail.com  Sun Dec 20 09:50:10 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Sun, 20 Dec 2009 15:50:10 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2CCCC3.1090401@gmail.com>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>
	<4B2C09D7.10405@adres.pl> <4B2CCCC3.1090401@gmail.com>
Message-ID: <4B2E63D2.2070407@gmail.com>

Hi,

> The long answer, I'm coding a prototype impl of this optimization in 
> some constructors so I can benchmark this and see if it's worth the 
> trouble. As usual it's better telling the code to do all the talking.

Result of this benchmark, for JDK 7b78, creating a 500-char string:

new String(char value[]):
Normal = (server) 559ns, (client) 520ns; hashCode() = (server) 1046ns,
(client) 1182ns
Eager hashing = (server) 1704ns, (client) 1609ns

This first test was certainly a disaster, but that was expected - it is
one of the constructors that rely on native/intrinsic methods like
copyOf(), copyOfRange(), arraycopy(). Even adding the construction and
hash times together doesn't show any advantage for eager hashing.

new String(int[] codePoints, int offset, int count):
Normal = (server) 3277ns, (client) 3342ns
Eager hashing = (server) 3397ns, (client) 3610ns

This one is much better; we can see some performance degradation with
the eager hashing (+3.66% for server and +8.01% for client), but that's
a small overhead on already-slow constructors, and summing construction
and hashing times shows a nice advantage (-27% for server, -25% for
client). I expect the relative overhead of eager hashing to be even
smaller for the 3 constructors that use StringCoding.decode(), but I
didn't try these (much more code to hack).

I've used every optimization possible at source level - manual inlining, 
constant folding/propagation, caching fields in locals (sample code 
attached in the end) - to no avail. I believe one significant problem is 
that the hashing function is not friendly to optimization; something 
like str[0] ^ str[1] ^ str[2]... would be much better (i.e., any 
function that allows loop unrolling and SIMD tricks). But, as we've 
already discussed, String.hashCode()'s algorithm cannot be changed so 
it's pointless considering this.

My conclusion is "Myth Busted". Eager hashing helps only the most 
complex constructors, which are very rarely used. Even for these 
constructors, there is a measurable (if small) cost for strings that are 
never hashed, which probably doesn't offset the bigger, but still modest 
gain for those that are eventually hashed.

A+
Osvaldo

     public String(String original) {
         int size = original.count;
         char[] originalValue = original.value;
         char[] v;
         if (originalValue.length > size) {
             // The array representing the String is bigger than the new
             // String itself.  Perhaps this constructor is being called
             // in order to trim the baggage, so make a copy of the array.
             int off = original.offset;
             int newLength = off+size - off;
             v = new char[newLength];
             int h = 0;
             int end = Math.min(originalValue.length - off, newLength);
             for (int i = 0; i < end; ++i) {
               h = 31*h + (v[i] = originalValue[off + i]);
             }
             this.hash = h;
         } else {
             // The array representing the String is the same
             // size as the String, so no point in making a copy.
             v = originalValue;
         }
         this.offset = 0;
         this.count = size;
         this.value = v;
     }




From mthornton at optrak.co.uk  Sun Dec 20 11:25:47 2009
From: mthornton at optrak.co.uk (Mark Thornton)
Date: Sun, 20 Dec 2009 19:25:47 +0000
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2E63D2.2070407@gmail.com>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>	<4B2C09D7.10405@adres.pl>
	<4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com>
Message-ID: <4B2E7A3B.5010307@optrak.co.uk>

Osvaldo Pinali Doederlein wrote:
> attached in the end) - to no avail. I believe one significant problem is 
> that the hashing function is not friendly to optimization; something 
> like str[0] ^ str[1] ^ str[2]... would be much better (i.e., any 
> function that allows loop unrolling and SIMD tricks). But, as we've 
> already discussed, String.hashCode()'s algorithm cannot be changed so 
> it's pointless considering this.
>   

from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

= s[0]*31^(n-1) + s[2]*31^(n-3) + ... +
  s[1]*31^(n-2) + s[3]*31^(n-4) + ...


int hOdd = 0;
int hEven = 0;
final int mSquared = 31*31;

int ne = s.length() &~1;
for (int i=0; i<ne; i+=2) {
   hEven = mSquared*hEven+s.charAt(i);
   hOdd = mSquared*hOdd+s.charAt(i+1);
}
if (ne < s.length()) {
   hEven = mSquared*hEven+s.charAt(ne);
   return hEven+31*hOdd;
}
else {
   return 31*hEven+hOdd;
}

Other variations possible. For really long Strings, divide the string into 
M pieces where M is the number of processors available.
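
A minimal sketch of the "divide into M pieces" variation, assuming the pieces are combined with the identity hash(A+B) = hash(A)*31^length(B) + hash(B) in wrapping int arithmetic. The class and method names (ChunkedHash, pow31, hashRange) are made up for illustration and are not from any JDK code; the per-piece loop is written sequentially here, but each iteration is independent and could be handed to its own thread.

```java
final class ChunkedHash {
    // 31^exp in wrapping int arithmetic (overflow is intended, as in String.hashCode()).
    static int pow31(int exp) {
        int result = 1;
        int base = 31;
        while (exp > 0) {
            if ((exp & 1) != 0) {
                result *= base;
            }
            base *= base;
            exp >>= 1;
        }
        return result;
    }

    // The String.hashCode() polynomial restricted to s[from, to).
    static int hashRange(CharSequence s, int from, int to) {
        int h = 0;
        for (int i = from; i < to; i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Hashes s in `pieces` independent ranges (pieces >= 1), then combines them.
    static int hash(CharSequence s, int pieces) {
        int n = s.length();
        int[] start = new int[pieces + 1];
        for (int p = 0; p <= pieces; p++) {
            start[p] = (int) ((long) n * p / pieces);
        }
        int[] partial = new int[pieces];
        // Each iteration touches only its own range; this is the parallelizable part.
        for (int p = 0; p < pieces; p++) {
            partial[p] = hashRange(s, start[p], start[p + 1]);
        }
        // Combine left to right: h = h * 31^(length of piece) + hash of piece.
        int h = 0;
        for (int p = 0; p < pieces; p++) {
            h = h * pow31(start[p + 1] - start[p]) + partial[p];
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the lazy dog";
        System.out.println(hash(s, 4) == s.hashCode());
    }
}
```

Whether this pays off in practice would depend on the string length and on the cost of dispatching work to other processors; the sketch only shows that the result matches the documented algorithm.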

Mark Thornton


From opinali at gmail.com  Sun Dec 20 13:32:38 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Sun, 20 Dec 2009 19:32:38 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2E7A3B.5010307@optrak.co.uk>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>	<4B2C09D7.10405@adres.pl>
	<4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com>
	<4B2E7A3B.5010307@optrak.co.uk>
Message-ID: <4B2E97F6.2040206@gmail.com>

Em 20/12/2009 17:25, Mark Thornton escreveu:
> Osvaldo Pinali Doederlein wrote:
>> attached in the end) - to no avail. I believe one significant problem 
>> is that the hashing function is not friendly to optimization; 
>> something like str[0] ^ str[1] ^ str[2]... would be much better 
>> (i.e., any function that allows loop unrolling and SIMD tricks). But, 
>> as we've already discussed, String.hashCode()'s algorithm cannot be 
>> changed so it's pointless considering this.
>
> from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
>
> = s[0]*31^(n-1) + s[2]*31^(n-3) + ... +
>  s[1]*31^(n-2) + s[3]*31^(n-4) + ...

Good idea, I didn't consider this and I believe the JIT is not that 
smart. I made this change to the code, but with bad results again. For 
the simple String(char[]), only regressions (server: 1704->1749ns; 
client: 1609->1630ns). The String(int[],int,int) constructor can't use 
the 2x-unrolled code because it must proceed one char at a time due to 
supplementary code points (I could work around this - but then, that 
would add even more code).

As usual, it's dangerous to pollute very tight loops with extra 
variables and calculations; we risk losing performance due to factors 
like register allocation. The eager hashcode code was bad enough, and 
the more sophisticated version is even worse. Things are different when 
the code is really bound by the FSB; the benchmark allocated 500-char 
strings (roughly 1 KB), but this is too small: the new operator zero-fills 
the char[], so everything is in the L1 cache when it is initialized. Eager 
hashcode computation would probably look better for huge strings that 
don't fit in the caches, since the extra instructions could then be 
completely hidden behind cache misses, but this is an irrelevant scenario 
for String.

A+
Osvaldo

>
>
> int hOdd = 0;
> int hEven = 0;
> final int mSquared = 31*31;
>
> int ne = s.length() &~1;
> for (int i=0; i<ne; i+=2) {
>   hEven = mSquared*hEven+s.charAt(i);
>   hOdd = mSquared*hOdd+s.charAt(i+1);
> }
> if (ne < s.length()) {
>   hEven = mSquared*hEven+s.charAt(ne);
>   return hEven+31*hOdd;
> }
> else {
>   return 31*hEven+hOdd;
> }
>
> Other variations possible. For really long Strings, divide the string 
> into M pieces where M is the number of processors available.
>
> Mark Thornton
>
>
>


From mthornton at optrak.co.uk  Sun Dec 20 13:40:35 2009
From: mthornton at optrak.co.uk (Mark Thornton)
Date: Sun, 20 Dec 2009 21:40:35 +0000
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2E97F6.2040206@gmail.com>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>	<4B2C09D7.10405@adres.pl>
	<4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com>
	<4B2E7A3B.5010307@optrak.co.uk> <4B2E97F6.2040206@gmail.com>
Message-ID: <4B2E99D3.3030208@optrak.co.uk>

Osvaldo Pinali Doederlein wrote:
> Em 20/12/2009 17:25, Mark Thornton escreveu:
>> Osvaldo Pinali Doederlein wrote:
>>> attached in the end) - to no avail. I believe one significant 
>>> problem is that the hashing function is not friendly to 
>>> optimization; something like str[0] ^ str[1] ^ str[2]... would be 
>>> much better (i.e., any function that allows loop unrolling and SIMD 
>>> tricks). But, as we've already discussed, String.hashCode()'s 
>>> algorithm cannot be changed so it's pointless considering this.
>>
>> from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
>>
>> = s[0]*31^(n-1) + s[2]*31^(n-3) + ... +
>>  s[1]*31^(n-2) + s[3]*31^(n-4) + ...
>
> Good idea, I didn't consider this and I believe the JIT is not that 
> smart. I did this change to the code, but with bad results again. For 
> the simple String(char[]), only regressions (server: 1704->1749ns; 
> client: 1609->1630ns). The String(int[],int,int) constructor can't use 
> the 2x-unrolled code because it must proceed one char at a time due to 
> supplementary code points (I could work around this - but then, that 
> would add even more code).
>
> As usual, it's dangerous to pollute very tight loops with extra 
> variables and calculations; we risk losing performance due to factors 
> like register allocation. The eager hashcode code was bad enough, and 
> the more sophisticated version is even worse. Things are different 
> when the code is really bound by the FSB; the benchmark allocated 
> 500-char strings (roughly 1Kb) but this is too small, the new operator 
> zero-fills the char[] so everything is in the L1 when it is 
> initialized. Eager hashcode computation would probably look better for 
> huge strings that don't fit in the caches, then the extra instructions 
> could be completely hidden behind cache misses, but this is an 
> irrelevant scenario for String.
>
For something important like String.hashCode the JIT doesn't have to be 
smart because we can ask the JVM authors to handcode intrinsic 
alternatives. The fastest code is likely to depend on the hardware --- 
x64 has more registers available, the number of integer multiplier units 
varies, etc.

Regards,
Mark Thornton


From opinali at gmail.com  Mon Dec 21 05:49:36 2009
From: opinali at gmail.com (Osvaldo Pinali Doederlein)
Date: Mon, 21 Dec 2009 11:49:36 -0200
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <4B2E99D3.3030208@optrak.co.uk>
References: <201502.54372.qm@web57708.mail.re3.yahoo.com>	<fb5ec5090912180938h2953e542l6f990f58aaf00be8@mail.gmail.com>	<4B2C09D7.10405@adres.pl>
	<4B2CCCC3.1090401@gmail.com> <4B2E63D2.2070407@gmail.com>
	<4B2E7A3B.5010307@optrak.co.uk> <4B2E97F6.2040206@gmail.com>
	<4B2E99D3.3030208@optrak.co.uk>
Message-ID: <4B2F7CF0.9@gmail.com>

Em 20/12/2009 19:40, Mark Thornton escreveu:
> Osvaldo Pinali Doederlein wrote:
>> Good idea, I didn't consider this and I believe the JIT is not that 
>> smart. I did this change to the code, but with bad results again. For 
>> the simple String(char[]), only regressions (server: 1704->1749ns; 
>> client: 1609->1630ns). The String(int[],int,int) constructor can't 
>> use the 2x-unrolled code because it must proceed one char at a time 
>> due to supplementary code points (I could work around this - but 
>> then, that would add even more code).
>>
>> As usual, it's dangerous to pollute very tight loops with extra 
>> variables and calculations; we risk losing performance due to factors 
>> like register allocation. The eager hashcode code was bad enough, and 
>> the more sophisticated version is even worse. Things are different 
>> when the code is really bound by the FSB; the benchmark allocated 
>> 500-char strings (roughly 1Kb) but this is too small, the new 
>> operator zero-fills the char[] so everything is in the L1 when it is 
>> initialized. Eager hashcode computation would probably look better 
>> for huge strings that don't fit in the caches, then the extra 
>> instructions could be completely hidden behind cache misses, but this 
>> is an irrelevant scenario for String.
>>
> For something important like String.hashCode the JIT doesn't have to 
> be smart because we can ask the JVM authors to handcode intrinsic 
> alternatives. The fastest code is likely to depend on the hardware --- 
> x64 has more registers available, the number of integer multiplier 
> units varies, etc.
>

Fair enough, these tests were on a Core2 Duo laptop (Windows 7 32-bit), 
so I repeated them on another box with Solaris amd64 (with -d64):

new String(char value[]):
Normal = (server) 475ns, (client) 474ns; hashCode() = (server) 891ns, 
(client) 890ns
Eager hashing (2x unrolled) = (server) 836ns, (client) 837ns

new String(int[] codePoints, int offset, int count):
Normal = (server) 2030ns, (client) 2028ns
Eager hashing (normal) = (server) 2623ns, (client) 2626ns

The results for the simple constructor were surprisingly better; in 
fact they were "too good", enough to raise some suspicion: faster than the 
isolated hashCode() cost (but not by much, and weird things often 
happen in optimization). The cost over the standard, non-eager-hashing 
constructors is still too high for popular constructors + strings that 
are never hashed. Still, this confirms the superiority of 64-bit; the 
native code for these simple constructors should leave enough spare 
registers that HotSpot could accommodate the extra hash calculation with 
much less impact. But I'm actually surprised, because I thought modern x86 
CPUs, even (and remarkably) in 32-bit mode, would use tricks like 
virtual register windows so the tiny number of architectural registers 
wouldn't matter that much; it seems this doesn't work as well as 
advertised.

Unfortunately, for the complex constructors there was no gain; the result 
was even slightly worse (+29% for both server and client), although for a 
precise comparison I'd need to repeat the 32-bit tests on this different 
system.

A+
Osvaldo


From Joe.Darcy at Sun.COM  Mon Dec 21 19:32:56 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Mon, 21 Dec 2009 19:32:56 -0800
Subject: Benefit from computing String Hash at compile time?
In-Reply-To: <b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>
References: <b9e663070912181429v553f7e32o9eee5c62d57fd2de@mail.gmail.com>
	<560fb5ed0912181504h5033d229uf77827beb519460c@mail.gmail.com>
	<b9e663070912181742xfa20d61k68fb280647465adb@mail.gmail.com>
	<4B2CD415.5070200@gmail.com>
	<560fb5ed0912190843x6f2b4d10n2e309320ffa9d133@mail.gmail.com>
	<b9e663070912190928i6123cb69h101c0f473a6d49be@mail.gmail.com>
Message-ID: <4B303DE8.5010103@sun.com>

Paul Benedict wrote:
> Reinier,
>
>   
>> There is no need to explain that it can't ever change; that notion is
>> already inherent in the fact that the algorithm is explained in the javadoc.
>> The mistake seems to be in what 'part of the java spec' means. It does not
>> actually mean: Cannot possibly change.
>>     
>
> The algorithm is explained. The documentation is good, isn't it? It
> is; however, the documentation is for that version of the Java
> platform.
>
>   
>> The javadoc of String should also not be bogged down with the implementation
>> detail that string-on-switch is dependent on it. Implementation details have
>> no place in javadoc.
>>     
>
> I agree with you. No one has to reveal implementation details. All
> that is necessary is a note that the algorithm must not change from
> JDK version to JDK version.
>
> Moving on...
> If anyone at Sun is still listening (::grins::), I prefer to emit a
> static method that contains a duplicate of the hashCode() algorithm.
> Then, no one has to worry about JDK version upgrades and
> String.hashCode() is free for future tweaking.
>
> static int $switch_hashCode(String s) {... }
> switch ($switch_hashCode(s)) {
> ...
> }
>
>   

I'm still listening, but mostly on vacation until early 2010.

I'm quite familiar with the compatibility policies used to evolve the 
JDK and I've written about those in my blog; e.g.

"JDK Release Types and Compatibility Regions"
http://blogs.sun.com/darcy/entry/release_types_compatibility_regions

"Kinds of Compatibility: Source, Binary, and Behavioral"
http://blogs.sun.com/darcy/entry/kinds_of_compatibility

There is a vanishingly small chance that changing the hash algorithm of 
String would ever be contemplated for Java SE; the behavioral 
compatibility risk would be too great given the ubiquitous use of the 
String class.  As one of my professors was fond of saying, this is 
"making a virtue of necessity": since the string hashing algorithm 
effectively cannot be changed, the current strings in switch 
implementation assumes 
it will be stable.
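
To illustrate why a stable hash matters here, below is a hand-written sketch of a hash-based translation of a string switch. It is an approximation for illustration only, not the actual javac output; the class and method names are made up. The point is that the hash values appear as literal case labels in the compiled code, so they are only correct for as long as String.hashCode() keeps its documented algorithm.

```java
final class StringSwitchSketch {
    // What the programmer writes with strings in switch.
    static int sugared(String s) {
        switch (s) {
            case "Aa":  return 10;
            case "BB":  return 20;
            case "red": return 30;
            default:    return 0;
        }
    }

    // A hand-written approximation of a hash-based translation.
    static int desugared(String s) {
        int index = -1;
        switch (s.hashCode()) {
            case 2112: // "Aa".hashCode() and "BB".hashCode() collide
                if (s.equals("Aa")) {
                    index = 0;
                } else if (s.equals("BB")) {
                    index = 1;
                }
                break;
            case 112785: // "red".hashCode()
                if (s.equals("red")) {
                    index = 2;
                }
                break;
            default:
                break;
        }
        // Second switch on a dense index runs the original case bodies.
        switch (index) {
            case 0:  return 10;
            case 1:  return 20;
            case 2:  return 30;
            default: return 0;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"Aa", "BB", "red", "blue", ""};
        for (String in : inputs) {
            System.out.println(in + " -> " + sugared(in) + " / " + desugared(in));
        }
    }
}
```

The equals() calls are still needed because distinct strings can share a hash, as "Aa" and "BB" do.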

-Joe


From Ulf.Zibis at gmx.de  Tue Dec 22 05:47:07 2009
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Tue, 22 Dec 2009 14:47:07 +0100
Subject: Strings in Switch
In-Reply-To: <4B1FF7AD.50502@gmx.de>
References: <b9e663070912051735p2c7665d4i2e08c38a692de951@mail.gmail.com>		<4B1D9FED.7090406@sun.com>		<b9e663070912071800m5dc58848x663476cbb4d2c20e@mail.gmail.com>		<560fb5ed0912071934x1aa29e9md3031f668f1429bc@mail.gmail.com>		<4B1DD0BB.7010709@sun.com>		<17b2302a0912080049s765620beo453d59c5df663f0c@mail.gmail.com>		<b9e663070912080639s7a95e633h1dd0bf33068e5106@mail.gmail.com>		<4B1E833B.1040200@sun.com>		<b9e663070912080900t20da5951n3400049824b4e72b@mail.gmail.com>		<4B1F7EFC.2030601@oracle.com>	<560fb5ed0912090253i3f6b7f5dt69d21a3cec3f07b7@mail.gmail.com>	<4B1FC38E.2010702@oracle.com>
	<4B1FF7AD.50502@gmx.de>
Message-ID: <4B30CDDB.20108@gmx.de>

For more details see my bug report:
6912520 - String#equals(Object) should benefit from hash code 
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6912520>
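
A simplified sketch of the idea, using a stand-alone class rather than java.lang.String: if both objects already have a cached hash and the hashes differ, equals() can return false without comparing characters. HashedText is a made-up name, and this variant leaves out the equalByHashThreshold bookkeeping from the version quoted below; it never computes a hash just for the sake of equals().

```java
import java.util.Arrays;

final class HashedText {
    private final char[] value;
    private int hash; // 0 means "not computed yet", as in String

    HashedText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof HashedText)) {
            return false;
        }
        HashedText other = (HashedText) o;
        if (value.length != other.value.length) {
            return false;
        }
        // Fast reject: only applies when both hashes are already cached.
        if (hash != 0 && other.hash != 0 && hash != other.hash) {
            return false;
        }
        return Arrays.equals(value, other.value);
    }

    public static void main(String[] args) {
        HashedText a = new HashedText("coin-dev");
        HashedText b = new HashedText("coin-dex");
        a.hashCode();
        b.hashCode();
        System.out.println(a.equals(b));
    }
}
```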

-Ulf


Am 09.12.2009 20:17, Ulf Zibis schrieb:
>
> If compares on a string are rare, and especially if its 
> total length is not trivial, the hash code computation would be more 
> expensive, but after some repeated compares on the same string, the 
> hashcode algorithm would win.
> Here is an enhanced String#equals() implementation, which takes into 
> account the length of every character compare it performs:
>
>     int equalByHashThreshold = count;
>
>     public boolean equals(Object anObject) {
>         if (this == anObject) {
>             return true;
>         }
>         if (anObject instanceof String) {
>             String anotherString = (String)anObject;
>             int n = count;
>             if (n == anotherString.count &&
>                     (equalByHashThreshold > 0 ||
>                     hashCode() == anotherString.hashCode())) {
>                 char v1[] = value;
>                 char v2[] = anotherString.value;
>                 int i = offset;
>                 int j = anotherString.offset;
>                 while (n-- != 0)
>                     if (v1[i++] != v2[j++]) {
>                         if (equalByHashThreshold > 0)
>                             equalByHashThreshold -= (count - n);
>                         return false;
>                     }
>                 return true;
>             }
>         }
>         return false;
>     }
>
>     public int hashCode() {
>         int h = hash;
>         if (h == 0) {
>             int off = offset;
>             char val[] = value;
>             int len = count;
>
>             for (int i = 0; i < len; i++) {
>                 h = 31*h + val[off++];
>             }
>             hash = h;
>             equalByHashThreshold = 0;
>         }
>         return h;
>     }
>
>
>
> -Ulf
>
>
>
>
>
>
>   


From info at frankcornelis.be  Thu Dec 24 21:33:16 2009
From: info at frankcornelis.be (Frank Cornelis)
Date: Fri, 25 Dec 2009 06:33:16 +0100
Subject: Field attribute
Message-ID: <4B344E9C.1090304@frankcornelis.be>

Hi,


Here is an idea for a Java language extension. As a JBoss Seam user I 
frequently do something like:
@DataModel
private List<MyEntity> entities;

@Factory("entities")
public void initEntities() {
     this.entities = ...
}

If I refactor this code and rename the entities field, my factory method 
won't work anymore as it has to refer to the entities field name. So my 
suggestion would be to introduce a syntax as follows:
@DataModel
private List<MyEntity> entities;

@Factory(@this.entities.getName())
public void initEntities() {
     this.entities = ...
}

So @this.entities gives you the Field object for the this.entities 
variable.
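
For comparison, here is a sketch of what can be done in today's Java without new syntax: keep the name in a constant next to the field and check by reflection (for example in a unit test) that every annotation value still names a declared field. The @Factory annotation below is a hypothetical stand-in declared locally so the example compiles on its own; it is not Seam's annotation, and FactoryNameCheck, Bean and StaleBean are made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class FactoryNameCheck {
    // Hypothetical stand-in for a framework annotation that names a field.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Factory {
        String value();
    }

    static class Bean {
        static final String ENTITIES = "entities"; // must be kept in sync by hand
        private List<String> entities;

        @Factory(ENTITIES)
        public void initEntities() {
            this.entities = new ArrayList<String>();
        }
    }

    // The field was renamed but the string was not: the problem described above.
    static class StaleBean {
        private List<String> items;

        @Factory("entities")
        public void initItems() {
            this.items = new ArrayList<String>();
        }
    }

    // Returns true if every @Factory value names a field declared in `type`.
    static boolean factoriesMatchFields(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            Factory f = m.getAnnotation(Factory.class);
            if (f == null) {
                continue;
            }
            try {
                type.getDeclaredField(f.value());
            } catch (NoSuchFieldException e) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(factoriesMatchFields(Bean.class));
        System.out.println(factoriesMatchFields(StaleBean.class));
    }
}
```

This only detects a stale name at run time; the proposed syntax would let the compiler and refactoring tools keep the reference correct.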


Kind Regards,
Frank.


From Joe.Darcy at Sun.COM  Sun Dec 27 08:10:52 2009
From: Joe.Darcy at Sun.COM (Joseph D. Darcy)
Date: Sun, 27 Dec 2009 08:10:52 -0800
Subject: Field attribute
In-Reply-To: <4B344E9C.1090304@frankcornelis.be>
References: <4B344E9C.1090304@frankcornelis.be>
Message-ID: <4B37870C.1040905@sun.com>

Frank Cornelis wrote:
> Hi,
>
>
> Here is an idea for a Java language extension. 

The Project Coin call for proposals phase ended many months ago and the 
proposal form is a detailed examination of the language change and its 
implications rather than a sketch of the idea.

To have this kind of idea considered for a future language change, it 
should be submitted to
http://bugreport.sun.com/bugreport/

Regards,

-Joe Darcy


From matthew at matthewadams.me  Sun Dec 27 08:22:55 2009
From: matthew at matthewadams.me (Matthew Adams)
Date: Sun, 27 Dec 2009 08:22:55 -0800
Subject: Field attribute
In-Reply-To: <4B37870C.1040905@sun.com>
References: <4B344E9C.1090304@frankcornelis.be> <4B37870C.1040905@sun.com>
Message-ID: <1ba389ce0912270822u4e43ba5cvde141a0a4663dcfa@mail.gmail.com>

Hi Frank,

I made a similar proposal, but too late:

http://mail.openjdk.java.net/pipermail/coin-dev/2009-December/002638.html

I'll add a bug report myself.

-matthew

On Sun, Dec 27, 2009 at 8:10 AM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
> Frank Cornelis wrote:
>> Hi,
>>
>>
>> Here is an idea for a Java language extension.
>
> The Project Coin call for proposals phase ended many months ago and the
> proposal form is a detailed examination of the language change and its
> implications rather than a sketch of the idea.
>
> To have this kind of idea considered for a future language change, it
> should be submitted to
> http://bugreport.sun.com/bugreport/
>
> Regards,
>
> -Joe Darcy
>
>


-- 
mailto:matthew at matthewadams.me
skype:matthewadams12
yahoo:matthewadams
aol:matthewadams12
google-talk:matthewadams12 at gmail.com
msn:matthew at matthewadams.me
http://matthewadams.me
http://www.linkedin.com/in/matthewadams


From matthew at matthewadams.me  Sun Dec 27 08:30:41 2009
From: matthew at matthewadams.me (Matthew Adams)
Date: Sun, 27 Dec 2009 08:30:41 -0800
Subject: Field attribute
In-Reply-To: <1ba389ce0912270822u4e43ba5cvde141a0a4663dcfa@mail.gmail.com>
References: <4B344E9C.1090304@frankcornelis.be> <4B37870C.1040905@sun.com>
	<1ba389ce0912270822u4e43ba5cvde141a0a4663dcfa@mail.gmail.com>
Message-ID: <1ba389ce0912270830x2a0f321fmaf5252bd02bab090@mail.gmail.com>

Bug report/enhancement request filed with Sun just now.

-matthew

On Sun, Dec 27, 2009 at 8:22 AM, Matthew Adams <matthew at matthewadams.me> wrote:
> Hi Frank,
>
> I made a similar proposal, but too late:
>
> http://mail.openjdk.java.net/pipermail/coin-dev/2009-December/002638.html
>
> I'll add a bug report myself.
>
> -matthew
>
> On Sun, Dec 27, 2009 at 8:10 AM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:
>> Frank Cornelis wrote:
>>> Hi,
>>>
>>>
>>> Here is an idea for a Java language extension.
>>
>> The Project Coin call for proposals phase ended many months ago and the
>> proposal form is a detailed examination of the language change and its
>> implications rather than a sketch of the idea.
>>
>> To have this kind of idea considered for a future language change, it
>> should be submitted to
>> http://bugreport.sun.com/bugreport/
>>
>> Regards,
>>
>> -Joe Darcy
>>
>>
>
>
>
> --
> mailto:matthew at matthewadams.me
> skype:matthewadams12
> yahoo:matthewadams
> aol:matthewadams12
> google-talk:matthewadams12 at gmail.com
> msn:matthew at matthewadams.me
> http://matthewadams.me
> http://www.linkedin.com/in/matthewadams
>


-- 
mailto:matthew at matthewadams.me
skype:matthewadams12
yahoo:matthewadams
aol:matthewadams12
google-talk:matthewadams12 at gmail.com
msn:matthew at matthewadams.me
http://matthewadams.me
http://www.linkedin.com/in/matthewadams


From david.goodenough at linkchoose.co.uk  Sun Dec 27 08:58:51 2009
From: david.goodenough at linkchoose.co.uk (David Goodenough)
Date: Sun, 27 Dec 2009 16:58:51 +0000
Subject: Field attribute
In-Reply-To: <4B344E9C.1090304@frankcornelis.be>
References: <4B344E9C.1090304@frankcornelis.be>
Message-ID: <200912271658.52012.david.goodenough@linkchoose.co.uk>

You might like to look at Lombok and my Beans extension to Lombok.

Lombok can be found at http://projectlombok.org and my Beans extension
can be found at http://dga.co.uk/lombokbeans.

David

On Friday 25 December 2009, Frank Cornelis wrote:
> Hi,
> 
> 
> Here is an idea for a Java language extension. As JBoss Seam user I
> frequently do something like:
> @DataModel
> private List<MyEntity> entities;
> 
> @Factory("entities")
> public void initEntities() {
>      this.entities = ...
> }
> 
> If I refactor this code and rename the entities field, my factory method
> won't work anymore as it has to refer to the entities field name. So my
> suggestion would be to introduce a syntax as follows:
> @DataModel
> private List<MyEntity> entities;
> 
> @Factory(@this.entities.getName())
> public void initEntities() {
>      this.entities = ...
> }
> 
> So the @this.entities gives you the Field class of the this.entities
> variable.
> 
> 
> Kind Regards,
> Frank.
>