From charles.nutter at sun.com Tue Jun 3 16:56:06 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Tue, 03 Jun 2008 18:56:06 -0500 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> Message-ID: <4845DA16.8080504@sun.com> John Rose wrote: > I see two paths here: > > 1. Force the JIT to inline the code patterns you care about. (One > logical Ruby method should be compiled in one JIT compilation task.) > > 2. Make the JVM process the exceptions you care about more > efficiently, in the out-of-line case. > > Both should be investigated. Both are probably good graduate > theses... Anyone? > > The easiest one to try first is #1, since we already have a compiler > oracle, and could quickly wire up the right guidance statements (in a > JVM tweak), if we could figure out what Ruby wants to say to the JVM > about inlining. Ok, I've done some investigation with a LogCompilation tool and noticed a few things. - We have a few key methods in the call path that are "too big" in the log output...important stuff like InlineCachingCallSite.call. Those will be obvious areas we can look to fix. For now I've continued some investigation setting MaxInlineSize up a bit. - The max inline level seems to be a larger problem. It appears in my OpenJDK source that the default max level is 9. 
If you look at the traces I provided in earlier emails we're looking at a bare minimum of 8-9 calls in the simplest case of a non-local return, and there's very little I can do to reduce that on pre-invokedynamic JVMs:

at ruby.__dash_e__.block_0$RUBY$__block__(-e:1)
at ruby.__dash_e__BlockCallback$block_0$RUBY$__block__xx1.call(Unknown Source)
at org.jruby.runtime.CompiledBlockLight.yield(CompiledBlockLight.java:107)
at org.jruby.runtime.CompiledBlockLight.yield(CompiledBlockLight.java:88)
at org.jruby.runtime.Block.yield(Block.java:109)
at org.jruby.RubyInteger.times(RubyInteger.java:163)
at org.jruby.RubyIntegerInvoker$times_method_0_0.call(Unknown Source)
at org.jruby.runtime.CallSite$InlineCachingCallSite.call(CallSite.java:312)
at ruby.__dash_e__.method__0$RUBY$foo(-e:1)

So in this simple case, there's a pretty slim chance the throw and the catch would ever get inlined into the same body of code. And from what I'm seeing, there's almost no chance since the "sliding window" of where it bases the inlining appears unlikely to land exactly on the "foo" method where the non-local return should end up. Based on these two findings, I tried bumping up the inline level but have not been able to re-profile yet. I'd like to know if I'm on the right track. Obviously we want to do whatever we can to reduce the length of the call path, but it seems like even the best case for us now is going to be a few levels deeper than the standard inlining process can handle. Are there other settings I might want to look at for tweaking this? I have a build of OpenJDK7 I can fiddle with. 
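For reference, the knobs discussed in this thread are plain -XX flags; a sketch of the kind of invocation being experimented with (the numeric values and the JRuby launch line are illustrative, not recommendations, and LogCompilation may require a debug or diagnostics-enabled build):

```sh
# MaxInlineSize default is 35 bytecodes; MaxInlineLevel default is 9 levels.
java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation \
     -XX:MaxInlineSize=64 -XX:MaxInlineLevel=15 \
     -jar jruby.jar -e 'def foo; 5.times { return 1 }; end; foo'
```

The LogCompilation output is what the tool mentioned above post-processes to find "too big" and "too deep" inlining failures.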
From John.Rose at Sun.COM Tue Jun 3 18:13:19 2008 From: John.Rose at Sun.COM (John Rose) Date: Tue, 03 Jun 2008 18:13:19 -0700 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <4845DA16.8080504@sun.com> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> Message-ID: <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> On Jun 3, 2008, at 4:56 PM, Charles Oliver Nutter wrote: > Ok, I've done some investigation with a LogCompilation tool and > noticed > a few things. > > - We have a few key methods in the call path that are "too big" in the > log output...important stuff like InlineCachingCallSite.call. Those > will > be obvious areas we can look to fix. For now I've continued some > investigation setting MaxInlineSize up a bit. Yes, that's a good start. > - The max inline level seems to be a larger problem. It appears in my > OpenJDK source that the default max level is 9. If you look at the > traces I provided in earlier emails we're looking at a bare minimum of > 8-9 calls in the simpliest case of a non-local return, and there's > very > little I can do to reduce that on pre-invokedynamic JVMs: Yes. > at ruby.__dash_e__.block_0$RUBY$__block__(-e:1) > at ruby.__dash_e__BlockCallback$block_0$RUBY$__block__xx1.call(Unknown > Source) > at org.jruby.runtime.CompiledBlockLight.yield > (CompiledBlockLight.java:107) > at org.jruby.runtime.CompiledBlockLight.yield > (CompiledBlockLight.java:88) > at org.jruby.runtime.Block.yield(Block.java:109) > at org.jruby.RubyInteger.times(RubyInteger.java:163) > at org.jruby.RubyIntegerInvoker$times_method_0_0.call(Unknown Source) > at org.jruby.runtime.CallSite$InlineCachingCallSite.call > (CallSite.java:312) > at ruby.__dash_e__.method__0$RUBY$foo(-e:1) Are a lot of those (effectively) tail calls? This will be more common in dynamic languages, and we could tune the inline heuristics to prefer to inline tail calls. 
We should probably also experiment with (gulp) an @Inline annotation for methods, not to override the heuristics, but to add a vote from the code writer. It could be made Hotspot-specific, to start with. (I have an annotation parser for the VM classloader, that will be in my next post to the mlvm repo.) For a gross, gross POC hack, the VM could try looking for the string "_inline" in the method name, and bias the inlining heuristic. There is also a compiler oracle function in the JVM which can be used to place advice on individual methods, but it is hard to use and not modular (single property file). > Based on these two findings, I tried bumping up the inline level but > have not been able to re-profile yet. I'd like to know if I'm on the > right track. Obviously we want to do whatever we can to reduce the > length of the call path, but it seems like even the best case for > us now > is going to be a few levels deeper than the standard inlining process > can handle. Yes, that's the right track. We should also keep talking about incremental changes to the inlining heuristics. > Are there other settings I might want to look at for tweaking this? I > have a build of OpenJDK7 I can fiddle with. I started a wiki page to collect such information, and added a few more points there: http://wikis.sun.com/display/HotSpotInternals/Inlining Please consider contributing as you learn what works and doesn't. -- John From charles.nutter at sun.com Wed Jun 4 09:25:12 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Wed, 04 Jun 2008 11:25:12 -0500 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> Message-ID: <4846C1E8.3020408@sun.com> John Rose wrote: > Are a lot of those (effectively) tail calls? 
This will be more > common in dynamic languages, and we could tune the inline heuristics > to prefer to inline tail calls. Some of them are, but the need to have pre/post call logic (like artificial call frames) means many/most of these calls will ultimately not be tail calls. > We should probably also experiment with (gulp) an @Inline annotation > for methods, not to override the heuristics, but to add a vote from > the code writer. It could be made Hotspot-specific, to start with. > (I have an annotation parser for the VM classloader, that will be in > my next post to the mlvm repo.) For a gross, gross POC hack, the VM > could try putting looking for the string "_inline" in the method > name, and bias the inlining heuristic. There is also a compiler > oracle function in the JVM which can be used to place advice on > individual methods, but it is hard to use and not modular (single > property file). I have managed to eliminate one of the frames and simplify several others, with a subsequent perf boost in my simple JRuby nonlocal return benchmark (about 25% faster). It's still nowhere near as fast as the normal "falling out" return case though. Another point that became obvious when logging compilation is that all our crufty overhead to support Ruby is probably using up the little capital we have to spend on inlining. For example, to maintain an accurate Ruby-land call trace, we need to update a separate line number counter. To support Ruby's unsafe Thread#kill and Thread#raise methods, we need to periodically checkpoint. And of course there's the artificial frame stack, which holds out-of-band data we can't easily pass through the Java call stack (and which still might not be available everywhere it's needed). All this ends up getting inlined, because it's fairly short paths...but it seems like it makes the total inlined result a lot "flatter", probably because the total size of inlined code gets puffed up a lot. Exploration continues. 
I'm still looking for the magic bullet that will eliminate our need for artificial frames. - Charlie From John.Rose at Sun.COM Wed Jun 4 09:51:40 2008 From: John.Rose at Sun.COM (John Rose) Date: Wed, 04 Jun 2008 09:51:40 -0700 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <4846C1E8.3020408@sun.com> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> <4846C1E8.3020408@sun.com> Message-ID: <4D4194C3-DECE-40C0-AF01-B7002EFCEAAF@sun.com> Annotations on call sites might let you put all the static information aside. How much of that stuff is static or could be made static? -- John On Jun 4, 2008, at 9:25 AM, Charles Oliver Nutter wrote: > Exploration continues. I'm still looking for the magic bullet that > will > eliminate our need for artificial frames. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080604/1ef6d822/attachment.html From charles.nutter at sun.com Wed Jun 4 10:07:51 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Wed, 04 Jun 2008 12:07:51 -0500 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <4D4194C3-DECE-40C0-AF01-B7002EFCEAAF@sun.com> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> <4846C1E8.3020408@sun.com> <4D4194C3-DECE-40C0-AF01-B7002EFCEAAF@sun.com> Message-ID: <4846CBE7.1040806@sun.com> That's still an open question...potentially about half of it. My current "best path" for improving the situation is static inspection of the code to determine which OOB bits are actually needed and only instantiating/initializing those. I'll try to gather more of this information today, since I'm in LogCompilation land lately. 
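The periodic checkpointing described earlier (so that Thread#kill and Thread#raise can interrupt running code) is typically a cheap flag poll on the hot path with the slow path kept out of line. A hypothetical sketch of that pattern; the names are illustrative, not JRuby's actual runtime API:

```java
// Hypothetical sketch of a checkpoint/poll: the hot-path check is a single
// volatile read and branch, small enough to inline; the event handling is a
// separate method so it stays out of the inlined fast path.
final class ThreadContext {
    volatile boolean eventPending;   // set asynchronously by kill/raise

    void pollEvents() {
        if (eventPending) {          // almost always false: cheap, inlinable
            handlePendingEvent();    // slow path, rarely taken
        }
    }

    private void handlePendingEvent() {
        eventPending = false;
        // stand-in for delivering the cross-thread kill/raise event
        throw new RuntimeException("thread event delivered");
    }
}
```

Generated method bodies would call pollEvents() at loop back-edges and method entries; the cost Charles describes is that even this small check consumes part of the inlining budget at every such site.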
John Rose wrote: > Annotations on call sites might let you put all the static information > aside. How much of that stuff is static or could be made static? -- John > > On Jun 4, 2008, at 9:25 AM, Charles Oliver Nutter wrote: > >> Exploration continues. I'm still looking for the magic bullet that will >> >> eliminate our need for artificial frames. >> > > ------------------------------------------------------------------------ > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From forax at univ-mlv.fr Wed Jun 4 18:15:57 2008 From: forax at univ-mlv.fr (Rémi Forax) Date: Thu, 05 Jun 2008 03:15:57 +0200 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> Message-ID: <48473E4D.2030700@univ-mlv.fr> John Rose wrote: ... > > I started a wiki page to collect such information, and added a few > more points there: > http://wikis.sun.com/display/HotSpotInternals/Inlining > > Please consider contributing as you learn what works and doesn't. > I've sketched a prototype that enables forcing inlining at the Java level using a new method of sun.misc.Unsafe. I think it's a better solution than using an annotation (or an attribute) because it's more flexible. It seems to work on a small test. Because I was on the train with my laptop, I used the hotspot/hotspot repository, the only one available on my hard drive. I will re-target the patch to the Da Vinci VM this weekend. The attachment contains the hotspot patch, the patched sun.misc.Unsafe, and a small test that inlines java.nio.Buffer.position(int), a method used by System.out.println that is not inlined by default because its size is too large. 
> -- John > Rémi -------------- next part -------------- A non-text attachment was scrubbed... Name: inline.patch Type: text/x-patch Size: 8318 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080605/b8648850/attachment.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: InlineTest.java Type: text/x-java Size: 441 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080605/b8648850/attachment-0001.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: Unsafe.java Type: text/x-java Size: 39490 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080605/b8648850/attachment-0002.bin From John.Rose at Sun.COM Wed Jun 4 19:13:33 2008 From: John.Rose at Sun.COM (John Rose) Date: Wed, 04 Jun 2008 19:13:33 -0700 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <48473E4D.2030700@univ-mlv.fr> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> <48473E4D.2030700@univ-mlv.fr> Message-ID: On Jun 4, 2008, at 6:15 PM, Rémi Forax wrote: > I've sketched a prototype that enable to force inlining at Java > Level using a > new method of sun.misc.Unsafe. > I think it's a better solution than using an annotation (or an > attribute) > because it's more flexible. True. Annotations are *almost* what we want for passing this sort of information around, except that they must be known at javac time (or apt time). As we refactor the way the JVM loads things, we should consider allowing partial class files which contribute annotations to previously loaded classes. That way we wouldn't have to re-invent class schema decorations yet another time. For now, we have been using the compiler oracle mechanism as a way of tagging methods to the compilers... 
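For reference, the compiler oracle mentioned here is driven by a plain command file. A sketch of what such a file might contain, assuming the space-separated "command class method" form parsed by compilerOracle.cpp; the method choices are illustrative picks from the JRuby trace earlier in the thread:

```text
# Sketch of a CompilerOracle command file, passed with
# -XX:CompileCommandFile=.hotspot_compiler (assumption: directive syntax
# as in compilerOracle.cpp; method names below are only examples).
inline org/jruby/runtime/CallSite$InlineCachingCallSite call
inline org/jruby/runtime/Block yield
```

This is the "hard to use and not modular" single-file mechanism John refers to: one flat file of directives consulted for every compilation.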
> It seems to works on a small test. Yes, that's the right experiment to do... But I think the information should be passed through compilerOracle.cpp, which already has an "inline" directive. If that is not the right thing, perhaps we want a new command ("force_inline") or, better, an option (see the "option" command). -- John From forax at univ-mlv.fr Thu Jun 5 00:11:55 2008 From: forax at univ-mlv.fr (Rémi Forax) Date: Thu, 05 Jun 2008 09:11:55 +0200 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> <48473E4D.2030700@univ-mlv.fr> Message-ID: <484791BB.6070003@univ-mlv.fr> John Rose wrote: > On Jun 4, 2008, at 6:15 PM, Rémi Forax wrote: > > >> I've sketched a prototype that enable to force inlining at Java >> Level using a >> new method of sun.misc.Unsafe. >> I think it's a better solution than using an annotation (or an >> attribute) >> because it's more flexible. >> > > True. Annotations are *almost* what we want for passing this sort of > information around, except that they must be known at javac time (or > apt time). > > As we refactor the way the JVM loads things, we should consider > allowing partial class files which contribute annotations to > previously loaded classes. That way we wouldn't have to re-invent > class schema decorations yet another time. > > For now, we have been using the compiler oracle mechanism as a way of > tagging methods to the compilers... > > >> It seems to works on a small test. >> > > Yes, that's the right experiment to do... But I think the > information should be passed through compilerOracle.cpp, which > already has a "inline" directive. If that is not the right thing, > perhaps we want a new command ("force_inline") or, better, an option > (see the "option" command). 
> The problem of compilerOracle (my first attempt was to use it) is that it works because there are not many directives in the file. It allows wildcards (prefix or suffix), so all entries are matched each time the runtime wants to know if it can inline a method. Furthermore, compilerOracle only offers a hint: at least with c1, INLINE_LEVEL, INLINE_SIZE etc. have a higher priority than the compilerOracle inline directive (should_inline). > -- John > Rémi From John.Rose at Sun.COM Thu Jun 5 00:51:42 2008 From: John.Rose at Sun.COM (John Rose) Date: Thu, 05 Jun 2008 00:51:42 -0700 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <484791BB.6070003@univ-mlv.fr> References: <483D2213.2010307@sun.com> <611B8DC7-B9E0-4FB6-8893-A9AD3267A8CE@sun.com> <4845DA16.8080504@sun.com> <8F372C79-0EF3-43D6-AF1D-CE2DA796A9C8@sun.com> <48473E4D.2030700@univ-mlv.fr> <484791BB.6070003@univ-mlv.fr> Message-ID: On Jun 5, 2008, at 12:11 AM, Rémi Forax wrote: > It allows wildcards (prefix or suffix) so all entries are matched > each time the runtime want to know if it can inline a method. Yes. This has not been a problem, since the compiler does not perform such queries very often. It does not consume a significant fraction of CPU cycles, compared to the running application. If it were a problem, or if it did not scale well to many marked methods, we would want to index the oracle. At that point it would make sense to add a bit to the methodOop layout, saying whether the method was known to the oracle's index. > Futhermore, compilerOracle only offers a hint > at least with c1, INLINE_LEVEL, INLINE_SIZE etc have a > higher priority than compilerOracle inline directive (should_inline). We could add a must_inline or force_inline command to the oracle, then, unless there is some performance problem I've missed. Actually, it is best to change the semantics of the existing inline command. Let's try that first. 
If a question arises about compatibility, we can make new tuning variables which will govern methods hand-marked for inlining, instead of the existing variables:

InlineSmallCodeWhenRequested
MaxInlineSizeWhenRequested
MaxInlineLevelWhenRequested

The inline command in the oracle is a new feature, not yet in wide use, so this is a reasonable change. I see your point about adding a programmatic API. (The oracle is currently a flat file.) So let's add an Unsafe method, but have it take a string and pass it to the oracle to parse. Take the incoming Java string, convert it to UTF-8 (there's a function in javaClasses.cpp), and pass the resulting C string to CompilerOracle::parse_from_line. What do you think? -- John -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080605/2cb62592/attachment.html From john.rose at sun.com Fri Jun 6 19:09:18 2008 From: john.rose at sun.com (john.rose at sun.com) Date: Sat, 07 Jun 2008 02:09:18 +0000 Subject: hg: mlvm/mlvm/hotspot: annot: initial patch to class file parser Message-ID: <20080607020918.48A71282D3@hg.openjdk.java.net> Changeset: b4d73d35c59a Author: jrose Date: 2008-06-06 19:08 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/b4d73d35c59a annot: initial patch to class file parser + annot.patch + annot.txt ! series From charles.nutter at sun.com Sat Jun 14 17:37:24 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Sat, 14 Jun 2008 19:37:24 -0500 Subject: Fixnums: last mile for JRuby math perf Message-ID: <48546444.1010807@sun.com> FYI, given the upstart MagLev Ruby implementation (described below), I have a newfound respect for the proposed "fixnums in the VM" MLVM enhancement. On several benchmarks, especially those related to math, the constant fixnum churn has now started to become the primary bottleneck. 
Beyond that, our custom fixnum type obviously stands neatly in the way of several optimizations that could be applied to fixnums in general. Reading through John Rose's "fixnums" article, it doesn't sound like adding fixnums would be very easy, though it certainly doesn't sound impossible either. I was mostly curious if anyone on this list has an interest (and more importantly, the necessary skills) in pursuing fixnum support in MLVM. And I know John and other Sun folks are in perpetual "swamped" mode, but I'd also like to hear if this ranks at all on lists of MLVM/DVM priorities inside Sun. MagLev is a very new, unreleased, untested, incomplete Ruby implementation from GemStone based on their Smalltalk VM. In early, unofficial results, it positively spanks the other implementations (including JRuby) on fixnum/math-heavy benchmarks, largely because (I assume) they have true fixnum/SmallInteger support at a VM level. Now I seriously doubt that the majority of Ruby applications depend on superfast math, given the relative ease with which someone can bail out to an extension language like C or Java, but there are plenty of mundane operations where true fixnums would really be handy (e.g. simple loops, array indexing, etc.). FWIW, Ruby 1.8 and 1.9 use tagged integers for fixnum, and JRuby still outperforms both of them with a custom boxed reference type. This is almost certainly due to other parts of JRuby being substantially faster than either implementation, but I tremble to think what real fixnum support in the VM might do. And in my primitive experiments, a hand-written Java fib using our fixnum type for all integer operations performed no better than normal, dynamic-dispatched, compiled Ruby code in JRuby with all optimizations turned on, so we've pretty much turned execution and dispatch performance up as high as they can go without unboxing fixnum itself (which we will not do, given the monumental task of supporting additional primitive call paths). 
On a side note: any recommendations for optimizing fixnums on current JVMs? A few immediate ideas come to mind: fixnum cache (which we already have for -128..127), heavier memory ratios toward "newer" generations, and in general trying to reduce fixnum churn where possible (literal caches, static analysis to find unneeded constructions...). Comments on these and others? - Charlie From per at bothner.com Sun Jun 15 22:48:02 2008 From: per at bothner.com (Per Bothner) Date: Sun, 15 Jun 2008 22:48:02 -0700 Subject: Fixnums: last mile for JRuby math perf In-Reply-To: <48546444.1010807@sun.com> References: <48546444.1010807@sun.com> Message-ID: <4855FE92.8040806@bothner.com> Charles Oliver Nutter wrote: > On a side note: any recommendations for optimizing fixnums on current > JVMs? "Optimizing fixnums" is of course a number of different problems. One is "optimizing generic arithmetic" on unknown types - most of which will be fixnums. Kawa does pretty well on optimizing arbitrary-precision integers. It wins over java.math.BigInteger by using just two fields, one of which is an int and one an int[]. The latter is only non-null when the value of the integer doesn't fit in the bounds of int. I think this would be a worthwhile optimization for BigInteger. The various operations "fast-path" the common case when the values fit into an int (and thus the int[] is null). Kawa also preallocates the numbers -100 .. 1024. I haven't done any measurements to see if this is a good range to pre-allocate - that might be useful to get some numbers on. 
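The two-field scheme described above can be sketched as follows. This is an illustration of the idea, not Kawa's actual gnu.math.IntNum code; the names are mine, and the overflow path is faked via BigInteger for brevity where a real implementation would manage the int[] magnitude words itself:

```java
import java.math.BigInteger;

// Sketch: a number with an int fast path and an int[] slow path.
// words == null means the value is exactly `ival` and fits in an int.
final class HybridInt {
    final int ival;     // the value, when words == null
    final int[] words;  // two 32-bit halves of a long, when it overflows int

    private HybridInt(int ival) { this.ival = ival; this.words = null; }
    private HybridInt(int[] words) { this.ival = 0; this.words = words; }

    static HybridInt of(long v) {
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE)
            return new HybridInt((int) v);              // fast representation
        return new HybridInt(new int[] { (int) (v >>> 32), (int) v });
    }

    HybridInt add(HybridInt other) {
        if (words == null && other.words == null) {
            // fast path: both fit in int; do the math in long to catch overflow
            return of((long) ival + (long) other.ival);
        }
        // slow path: arbitrary precision (sketched; real code would be wider)
        return of(toBig().add(other.toBig()).longValueExact());
    }

    BigInteger toBig() {
        if (words == null) return BigInteger.valueOf(ival);
        return BigInteger.valueOf(((long) words[0] << 32)
                                  | (words[1] & 0xFFFFFFFFL));
    }

    long longValue() { return toBig().longValue(); }
}
```

The point Per makes carries over directly: the common case never allocates an array or touches BigInteger, and only overflow pays the slow path.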
-- --Per Bothner per at bothner.com http://per.bothner.com/ From charles.nutter at sun.com Mon Jun 16 14:25:43 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Mon, 16 Jun 2008 16:25:43 -0500 Subject: Fixnums: last mile for JRuby math perf In-Reply-To: <4855FE92.8040806@bothner.com> References: <48546444.1010807@sun.com> <4855FE92.8040806@bothner.com> Message-ID: <4856DA57.8050306@sun.com> Per Bothner wrote: > Charles Oliver Nutter wrote: >> On a side note: any recommendations for optimizing fixnums on current >> JVMs? > > "Optimizing fixnums" is of course a number of different problems. > One is "optimizing generic arithmetic" on unknown types - most > of which will be fixnums. Ok, be pedantic :) I meant optimizing arithmetic on Fixnum reference types. But on a larger scale, I think the whole problem is worth discussion and resolution; we have benchmarks now that appear to be limited by the cost of dealing with all those Fixnum objects, rather than the math itself. > Kawa does pretty well on optimizing arbitrary-precision integers. > It wins over java.math.BigInteger by using just two fields, one > which is an int and on is an int[]. The latter is only non-null > when the value of the integer doesn't fit in the bounds of int. > I think this would be a worthwhile optimization for BigInteger. > The various operations "fast-path" the common case when the values > fit into an int (and thus the int[] is null). FWIW, I'm not particularly interested in the performance of arbitrary-precision math...for the most part I can fake it by swapping out the backing store or mimicking "better" impls in our current Bignum. > Kawa also does preallocate the number -100 .. 1024. I haven't > done any measurements to see if this is a good range to > pre-allocate - that might be useful to get some numbers on. We do -127 to 128. 
I've bumped the range larger and it seems to have only a small impact on performance at the cost of a larger startup time to initialize the values. The current size is based on java.lang.Integer from OpenJDK, and I would expect quickly diminishing returns for anything outside that range. - Charlie From per at bothner.com Mon Jun 16 17:03:26 2008 From: per at bothner.com (Per Bothner) Date: Mon, 16 Jun 2008 17:03:26 -0700 Subject: Fixnums: last mile for JRuby math perf In-Reply-To: <4856DA57.8050306@sun.com> References: <48546444.1010807@sun.com> <4855FE92.8040806@bothner.com> <4856DA57.8050306@sun.com> Message-ID: <4856FF4E.2020905@bothner.com> Charles Oliver Nutter wrote: > Per Bothner wrote: >> Charles Oliver Nutter wrote: >>> On a side note: any recommendations for optimizing fixnums on current >>> JVMs? >> "Optimizing fixnums" is of course a number of different problems. >> One is "optimizing generic arithmetic" on unknown types - most >> of which will be fixnums. > > Ok, be pedantic :) I meant optimizing arithmetic on Fixnum reference > types. Not to be pointlessly pedantic, but presumably you don't know at compile time that you're working with fixnums - otherwise one might as well use primitive JVM ints. So part of the cost is checking at runtime that the operands are fixnums. Is your assumption this will be taken care of using invokedynamic? > FWIW, I'm not particularly interested in the performance of > arbitrary-precision math...for the most part I can fake it by swapping > out the backing store or mimicking "better" impls in our current Bignum. Right, but the mention of MagLev/GemStone/SmallTalk did make it sound like you do support arbitrary-precision math - though you rightly aren't focused on its performance. In any case, if one *does* support runtime-dispatched arithmetic then the extra cost of supporting arbitrary-precision integers should be small - as long as you can optimize for the fixnum case - which java.math.BigInteger doesn't do very well. 
A problem that annoys me is that if you use java.math.BigInteger for all integers then that is expensive. OTOH if you use BigInteger only for bignums and use java.lang.Integer for fixnums then the mapping of the type 'integer' is a union type, which neither Java nor the JVM support. This doesn't matter if everything is dynamic, but if your language has possibly-optional type specifications and/or type inference then the best you can do is java.lang.Number. Which sucks for anyone who likes static typing. The Kawa solution is to use its own gnu.math.IntNum class. (Originally, I didn't have much choice - the IntNum class actually predates the addition of java.math.BigInteger in JDK 1.1 ...) -- --Per Bothner per at bothner.com http://per.bothner.com/ From benh at ibsglobalweb.com Mon Jun 16 21:42:52 2008 From: benh at ibsglobalweb.com (Ben Hutchison) Date: Tue, 17 Jun 2008 14:42:52 +1000 Subject: Value Types for Java Message-ID: <485740CC.7090203@ibsglobalweb.com> I would really like to see support for value types (aka "structs") added to a JVM. My desire is for a more efficient way to store and process bulk quantities of small, simple objects. It's an issue with a very low profile in the Java community, it seems. I'd be interested in hearing from others who want, or better yet, are working towards, this goal. I've written up a post outlining my case for value type support: http://benhutchison.wordpress.com/2008/06/15/the-jvm-needs-value-types/ My motivation here is the belief that the Da Vinci project remains the least unlikely pathway by which value type support will be implemented, given that reviews of the JVM & bytecode specs don't exactly happen all the time. 
-Ben -- *Ben Hutchison Senior Developer * Level 2 476 St Kilda Road Melbourne VIC 3004 T 613 8807 5252 | F 613 8807 5203 | M 0423 879 534 | www.ibsglobalweb.com This e-mail (and any attachments to this e-mail) is for the exclusive use of the person, firm or corporation to which it is addressed and may contain information that by law is privileged, confidential or protected by copyright. If you are not the intended recipient or the person responsible for delivering this e-mail to the intended recipient, you are notified that any use, disclosure, distribution, printing or copying of this e-mail transmission is prohibited by law and that the contents must be kept strictly confidential. If you have received this e-mail in error, kindly notify us immediately on + 613 8807 0168 or respond to the sender by return e-mail. The original transmission of this e-mail must be destroyed. Internet Business Systems Australia Pty Ltd accepts no responsibility for any viruses this e-mail may contain. This notice should not be removed. From miles at milessabin.com Tue Jun 17 00:20:11 2008 From: miles at milessabin.com (Miles Sabin) Date: Tue, 17 Jun 2008 08:20:11 +0100 Subject: Value Types for Java In-Reply-To: <485740CC.7090203@ibsglobalweb.com> References: <485740CC.7090203@ibsglobalweb.com> Message-ID: <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> On Tue, Jun 17, 2008 at 5:42 AM, Ben Hutchison wrote: > Ive written up a post outlining my case for value type support: > http://benhutchison.wordpress.com/2008/06/15/the-jvm-needs-value-types/ Surely, in that very article you describe the exact encoding that's needed to get almost all of the benefits of value types that you list without any JVM changes: inline value type fields into the enclosing object, stack frame or argument list. Arrays are a little less natural, but one array per field would seem likely to do most of the job on the assumption that most value types have few fields (so, eg. 
an array of (String, int) would be encoded as an array of String and an array of int). In cases where the fields are all or partly of a homogeneous type the arrays of their encodings could be fused (eg. an array of n complex doubles could be encoded as an array of 2n doubles). Obviously the language has to hide this from the programmer, but that seems largely trivial. Interestingly, it seems quite possible that a smart compiler could go a long way with types which are not explicitly flagged as value types by inferring that particular objects could be inlined in some situations (there's a small literature out there on "object inlining"). So doesn't this get you what you want? What am I missing? Cheers, Miles From benh at ibsglobalweb.com Tue Jun 17 22:15:54 2008 From: benh at ibsglobalweb.com (Ben Hutchison) Date: Wed, 18 Jun 2008 15:15:54 +1000 Subject: Value Types for Java In-Reply-To: <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> Message-ID: <48589A0A.8090605@ibsglobalweb.com> Miles Sabin wrote: > On Tue, Jun 17, 2008 at 5:42 AM, Ben Hutchison wrote: > >> Ive written up a post outlining my case for value type support: >> http://benhutchison.wordpress.com/2008/06/15/the-jvm-needs-value-types/ >> > > Surely, in that very article you describe the exact encoding that's > needed to get almost all of the benefits of value types that you list > without any JVM changes: inline value type fields into the enclosing > object, stack frame or argument list. > > Arrays are a little less natural, but one array per field would seem > likely to do most of the job on the assumption that most value types > have few fields (so, eg. an array of (String, int) would be encoded as > an array of string and an array of Int). [snip] > > So doesn't this get you what you want? What am I missing? > It seems to do 90% of what I want. 
It's definitely an option I'd consider as a last resort. It's just really awkward. The awkwardness is that the programmer's view has to be so divergent from the underlying representation, particularly in the difficult-but-crucial array case. Where the discontinuity is revealed, the effect on the programmer could be jarring: 1. Reflection APIs. What would one see in reflection? How could one reflectively invoke a method on an array-contained value type, where the actual fields are smeared across a series of arrays? 2. Debugging. A typical Java debugger would expose the underlying "mangled" array form, which looks horrible in the case of nested arrays of value types. True debugger support would need to perform the reverse translation from the inlined representation back to a value-type-based representation. My gut feeling is that the awkwardness of the representation would limit the widespread adoption of such a scheme. Also, in terms of what it doesn't do: 1. Hard to see any way to extend to interpret a ByteBuffer region as an array of some value type, a feature ultimately needed for bulk interop with external IO/processes, I feel. 2. Separate arrays per field is less cache friendly than true value types. -Ben From miles at milessabin.com Wed Jun 18 00:51:08 2008 From: miles at milessabin.com (Miles Sabin) Date: Wed, 18 Jun 2008 08:51:08 +0100 Subject: Value Types for Java In-Reply-To: <48589A0A.8090605@ibsglobalweb.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> <48589A0A.8090605@ibsglobalweb.com> Message-ID: <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> On Wed, Jun 18, 2008 at 6:15 AM, Ben Hutchison wrote: > Also, in terms of what it doesn't do: > > 1. Hard to see any way to extend to interpret a ByteBuffer region as an > array of some value type, a feature ultimately needed for bulk interop > with external IO/processes, I feel. What do you mean here? That a ByteBuffer be viewable as an array of (value) objects? Again, I'm really not sure what it is that you want to do that you can't already. > 2. Separate arrays per field is less cache friendly than true value types That might be the case, but it's very dependent on the actual access patterns in a given application.
Cheers, Miles From benh at ibsglobalweb.com Wed Jun 18 01:28:25 2008 From: benh at ibsglobalweb.com (Ben Hutchison) Date: Wed, 18 Jun 2008 18:28:25 +1000 Subject: Value Types for Java In-Reply-To: <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> <48589A0A.8090605@ibsglobalweb.com> <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> Message-ID: <4858C729.1020002@ibsglobalweb.com> Miles Sabin wrote: > On Wed, Jun 18, 2008 at 6:15 AM, Ben Hutchison wrote: > >> Also, in terms of what it doesn't do: >> >> 1. Hard to see any way to extend to interpret a ByteBuffer region as an >> array of some value type, a feature ultimately needed for bulk interop >> with external IO/processes, I feel. >> > What do you mean here? That a ByteBuffer be viewable as an array of > (value) objects? Yes. Viewable and updatable. The example case I have in mind is an interaction between a Java process and a 3D graphics driver. Eg a Java process would be writing/updating triangle data in the buffer, and the graphics driver rendering it. (I admit it might be difficult to get this to work even with value type support, given that Java's philosophy says precise memory level representation is undefined by spec) > Again, I'm really not sure what it is that you want > to do that you can't already. > Your earlier idea seemed to be that heterogeneous value types would translate into multiple side-by-side arrays accessed in unison. Most external processes you might want to interact with aren't going to "think" that way. They would typically expect to find fields of a related struct next to each other. (Granted you did mention a single array encoding for homogeneous types, but I am concerned that's too fragile in practice - the moment it becomes heterogeneous, the encoding scheme totally changes.)
So, you could have one encoding of a given value type to/from ByteBuffer, and another otherwise, but that seems messy. You surely agree that your proposal represents a (worthy) workaround for current JVM architectures, and not the ideal solution? -Ben From Kenneth.Russell at Sun.COM Wed Jun 18 09:08:37 2008 From: Kenneth.Russell at Sun.COM (Kenneth Russell) Date: Wed, 18 Jun 2008 09:08:37 -0700 Subject: Value Types for Java In-Reply-To: <4858C729.1020002@ibsglobalweb.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> <48589A0A.8090605@ibsglobalweb.com> <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> <4858C729.1020002@ibsglobalweb.com> Message-ID: <48593305.4090602@sun.com> It's possible to do what you want with the existing NIO Buffer classes and a helper tool. GlueGen (http://gluegen.dev.java.net/) generates Java classes to access C data structures.
These accessor classes wrap ByteBuffers and provide setter / getter methods for accessing the fields of the structs. It maintains parallel ByteBuffer, IntBuffer, etc. views of the same memory to avoid multiple memory fetches for multi-byte primitive values. It supports arrays of C structures by providing a Java array of objects, each of which refers to a slice of the underlying C storage. It's not efficient. The need to dereference through the NIO Buffers means that each memory fetch gets expanded into something like three or four, assuming the HotSpot compiler can't common up some of those fetches for multiple accesses to the same object. When talking to the graphics card via an API like OpenGL there is less need to view the data in the form of structures. http://jogl.dev.java.net/ and http://jogl-demos.dev.java.net/ have links to various demos and JavaOne presentations. For several years we have consistently been able to achieve at least 90% of C speed in 3D graphical applications written in Java. Still, value types would be very useful for exactly the cases you describe. I would be concerned about the semantic change of pass-by-value and what that would mean for the easy readability and understanding of Java source code. -Ken Ben Hutchison wrote: > Miles Sabin wrote: >> On Wed, Jun 18, 2008 at 6:15 AM, Ben Hutchison wrote: >> >>> Also, in terms of what it doesn't do: >>> >>> 1. Hard to see any way to extend to interpret a ByteBuffer region as an >>> array of some value type, a feature ultimately needed for bulk interop >>> with external IO/processes, I feel. >>> >> What do you mean here? That a ByteBuffer be viewable as an array of >> (value) objects? > Yes. Viewable and updatable. > > The example case I have in mind is an interaction between a Java > process and a 3D graphics driver. Eg a Java process would be > writing/updating triangle data in the buffer, and the graphics driver > rendering it. 
(I admit it might be difficult to get this to work even > with value type support, given that Java's philosophy says precise > memory level representation is undefined by spec) > >> Again, I'm really not sure what it is that you want >> to do that you can't already. >> > Your earlier idea seemed to be that heterogeneous value types would > translate into multiple side-by-side arrays accessed in unison. Most > external processes you might want to interact with aren't going to > "think" that way. They would typically expect to find fields of a > related struct next to each other. > > (Granted you did mention a single array encoding for homogenous types, > but I am concerned thats too fragile in practice - the moment it becomes > heterogenous, the encoding scheme totally changes.) > > So, you could have one encoding of a given value type to/from > ByteBuffer, and another otherwise, but that seems messy. > > You surely agree that your proposal represents a (worthy) workaround to > work in current JVM architectures, and not the ideal solution? > > -Ben > From miles at milessabin.com Thu Jun 19 02:28:41 2008 From: miles at milessabin.com (Miles Sabin) Date: Thu, 19 Jun 2008 10:28:41 +0100 Subject: Value Types for Java In-Reply-To: <4858C729.1020002@ibsglobalweb.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> <48589A0A.8090605@ibsglobalweb.com> <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> <4858C729.1020002@ibsglobalweb.com> Message-ID: <30961e500806190228o7baf642fjbf3b7d9f580837de@mail.gmail.com> On Wed, Jun 18, 2008 at 9:28 AM, Ben Hutchison wrote: > The example case I have in mind is an interaction between a Java > process and a 3D graphics driver. Eg a Java process would be > writing/updating triangle data in the buffer, and the graphics driver > rendering it. 
(I admit it might be difficult to get this to work even > with value type support, given that Java's philosophy says precise > memory level representation is undefined by spec) Right. This strikes me as almost completely orthogonal to the value type issue. What you really want is safe and efficient access to raw memory and devices. I think it would make a lot of sense to think about that independently. Cheers, Miles From charles.nutter at sun.com Sat Jun 21 00:31:38 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Sat, 21 Jun 2008 16:31:38 +0900 Subject: A sudden concern about invokedynamic Message-ID: <485CAE5A.9070709@sun.com> It occurred to me just now there could be a small snag with invokedynamic. Maybe it's addressed in the spec, but I don't have it here right now. If it's not a problem, a 5-second explanation would do. Otherwise, we might want to discuss here, since it's certainly multi-language related. How would invokedynamic work if we don't have compiled Java bytecode anywhere to dispatch to? For example, JRuby is mixed mode right now, and some methods are interpreted (AST or various bytecode specs) while some methods get compiled to JVM bytecode on-the-fly. For the compiled ones, invokedynamic would just get a normal method handle. What about for the interpreted ones? Essentially we want to be able to pass out a method handle that's actually just a call into the JRuby interpreter. I don't recall seeing anything in the docs that might show how to do that. Did I miss something? - Charlie From charles.nutter at sun.com Sat Jun 21 02:41:19 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Sat, 21 Jun 2008 18:41:19 +0900 Subject: Time to reconsider m:n or green threading options? Message-ID: <485CCCBF.1040409@sun.com> Seems to me lately all the super-scaling languages and runtimes getting attention are based on m:n or green threads across multiple processes. 
Given that kernel threads obviously can't scale up to the tens of thousands of concurrent processes e.g. Erlang can handle, is it possibly a good time to consider adding an m:n threading model back into JVM? Forgive my ignorance about the history of threading in JVM...what I remember is that green threads used to be the only threading model, at least on platforms I used. I thought I'd heard that it was m:n at some point as well. And I thought I'd heard that Solaris was actually m:n internally at some point too. Too much noise floating around in this brain. Thoughts? Here's another framework trying to solve the threading issue with bytecode postprocessing, similar to how Rife already supports continuations. Seems like there's definitely demand for this, eh? http://www.malhar.net/sriram/kilim/ - Charlie From pdoubleya at gmail.com Sat Jun 21 03:18:36 2008 From: pdoubleya at gmail.com (Patrick Wright) Date: Sat, 21 Jun 2008 12:18:36 +0200 Subject: Time to reconsider m:n or green threading options? In-Reply-To: <485CCCBF.1040409@sun.com> References: <485CCCBF.1040409@sun.com> Message-ID: <64efa1ba0806210318m63546c5ambee9118b5db22133@mail.gmail.com> Looks like the original threading model in the Sun JVM was 1:n threading http://en.wikipedia.org/wiki/Green_threads Most of what I found on a quick Google seemed to confirm that article. Apparently "green threads", as implemented earlier in the Sun JVMs, had one OS thread, running the VM, which would then manage multiple green threads. There are apparently different mechanisms for deciding when to switch between the green threads. Big drawback appears to have been a) blocking I/O could cause the whole VM to wait and b) there's only 1 OS thread underneath, meaning no use of multiple processors. Apparently, the Blackdown JVM used to allow for green or native threading selection. I didn't find any references to m:n threading regarding green threads, though. I suspect the terminology will get in the way in this discussion. 
Actors, as implemented in Scala, allow many Actor instances to be scheduled from one thread pool, which means the number of Actors can be many many times the number of threads in use; but that is implemented as a library. Kilim has impressive benchmarks but is using some annotations and some complex bytecode manipulation to work its magic. Both of these work with the current JVM architecture. What would you want the JVM to provide that it doesn't provide currently? Patrick From charles.nutter at sun.com Sat Jun 21 04:58:26 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Sat, 21 Jun 2008 20:58:26 +0900 Subject: Time to reconsider m:n or green threading options? In-Reply-To: <64efa1ba0806210318m63546c5ambee9118b5db22133@mail.gmail.com> References: <485CCCBF.1040409@sun.com> <64efa1ba0806210318m63546c5ambee9118b5db22133@mail.gmail.com> Message-ID: <485CECE2.1070001@sun.com> Patrick Wright wrote: > Looks like the original threading model in the Sun JVM was 1:n threading > > http://en.wikipedia.org/wiki/Green_threads That fits my memory as well. > Most of what I found on a quick Google seemed to confirm that article. > Apparently "green threads", as implemented earlier in the Sun JVMs, > had one OS thread, running the VM, which would then manage multiple > green threads. There are apparently different mechanisms for deciding > when to switch between the green threads. Big drawback appears to have > been a) blocking I/O could cause the whole VM to wait and b) there's > only 1 OS thread underneath, meaning no use of multiple processors. > > Apparently, the Blackdown JVM used to allow for green or native > threading selection. > > I didn't find any references to m:n threading regarding green threads, > though. I suspect the terminology will get in the way in this > discussion. Yes, my memory is that Blackdown had the green/native split and on some platforms (freebsd for example) there was only green.
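Patrick's actors-on-a-pool point can be sketched at the library level in a few lines (names invented; this is a deliberately naive approximation of what Scala's actors and Kilim arrange, minus their machinery for avoiding blocked carrier threads):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Many logical "actors" multiplexed onto a small fixed pool of kernel
// threads. Each actor here is just a task that handles one message; a real
// actor library re-enqueues the actor when its mailbox is non-empty.
public class TinyScheduler {
    public static int runActors(int actorCount, int carrierThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(carrierThreads);
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(actorCount);
        for (int i = 0; i < actorCount; i++) {
            pool.execute(() -> {           // one logical actor handling one message
                handled.incrementAndGet();
                done.countDown();
            });
        }
        try { done.await(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        pool.shutdown();
        return handled.get();
    }

    public static void main(String[] args) {
        // 10,000 actors, 4 kernel threads.
        System.out.println(TinyScheduler.runActors(10_000, 4)); // prints 10000
    }
}
```

The limitation Charlie raises below is exactly what this sketch cannot fix: if one task blocks in unmodified library code, it pins a carrier thread.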
> Actors, as implemented in Scala, allow many Actor instances to be > scheduled from one thread pool, which means the number of Actors can > be many many times the number of threads in use; but that is > implemented as a library. Kilim has impressive benchmarks but is using > some annotations and some complex bytecode manipulation to works its > magic. Both of these work with the current JVM architecture. They work but they're limited in scope. If you call into a library that has not been manipulated, you either need to manually unroll stack, push that call off to another thread, or something else. Same goes for Rife. That sort of bytecode manipulation is very clever (i.e. very very clever, the kind of clever I really like), but it's not a general enough solution. > What would you want the JVM to provide that it doesn't provide currently? I would want the JVM to provide the current thread APIs but potentially backed by a smaller number of native threads. I want to be able to spin up 10k or 100k java.lang.Thread instances :) - Charlie From charles.nutter at sun.com Sat Jun 21 15:48:46 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Sun, 22 Jun 2008 07:48:46 +0900 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <483D2213.2010307@sun.com> References: <483D2213.2010307@sun.com> Message-ID: <485D854E.1090908@sun.com> Charles Oliver Nutter wrote: > This is a longer post, but very important for JRuby. > > In John Rose's post on using flow-control exceptions for e.g. nonlocal > returns, he showed that when the throw and catch are close enough > together (i.e. same JIT compilation unit) HotSpot can turn them into > jumps, making them run very fast. 
This seems to be borne out by a simple > case in JRuby, a return occurring inside Ruby exception handling: > > def foo; begin; return 1; ensure; end; end > > In order to preserve the stack, JRuby's compiler generates a synthetic > method any time it needs to do inline exception handling, such as for > begin/rescue/ensure blocks as above. In order to have returns from > within those synthetic methods propagate all the way back out through > the parent method, a ReturnJump exception is generated. Here's numbers > for without the begin/ensure and with it: BTW, I did find a problem unrelated to Hotspot optimization that improved performance substantially, and I describe it here for posterity. The original call site caching logic in JRuby was structured roughly like this: - lookup method - invoke method - if all goes well, cache method reference The idea was that if the method fails exceptionally, we don't want to cache it. However that completely ignored the fact that non-local flow control was implemented as an exception. So when a non-local return bubbled back out through the call sites, it basically caused them to skip caching logic. This meant that any methods between a non-local return throw and cache would never cache. Fixing this (by always caching the method first) brought performance to a much more reasonable level. Still not as fast as a "soft return" falling through the stack, but it was a real "d'oh" moment when I found it. Something to consider for anyone else supporting non-local flow control in a call sequence with specific sequencing requirements. 
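The caching-order pitfall described here can be reduced to a small sketch (all names invented; this is not JRuby's actual call-site code). Non-local returns implemented as exceptions never reach a "cache on success" step, so the affected call sites never warm up:

```java
// Functional-interface stand-ins for a dynamic language's call machinery.
public class CallSiteSketch {
    interface DynMethod { Object call(Object arg); }

    // Flow-control exception used to implement a non-local return.
    static class ReturnJump extends RuntimeException {
        final Object value;
        ReturnJump(Object value) { this.value = value; }
    }

    DynMethod cached;
    int lookups;

    // Stand-in for method lookup; the resolved method always exits via ReturnJump.
    DynMethod lookup() {
        lookups++;
        return arg -> { throw new ReturnJump(arg); };
    }

    // Original (buggy) order: invoke first, cache only on normal completion.
    Object callCachingAfter(Object arg) {
        DynMethod m = (cached != null) ? cached : lookup();
        Object result = m.call(arg); // ReturnJump propagates from here...
        cached = m;                  // ...so this line is never reached
        return result;
    }

    // Fixed order: cache the resolved method before invoking it.
    Object callCachingFirst(Object arg) {
        if (cached == null) cached = lookup();
        return cached.call(arg);
    }

    public static void main(String[] args) {
        CallSiteSketch buggy = new CallSiteSketch();
        for (int i = 0; i < 3; i++) {
            try { buggy.callCachingAfter("x"); } catch (ReturnJump j) { }
        }
        System.out.println("buggy lookups: " + buggy.lookups); // prints 3

        CallSiteSketch fixed = new CallSiteSketch();
        for (int i = 0; i < 3; i++) {
            try { fixed.callCachingFirst("x"); } catch (ReturnJump j) { }
        }
        System.out.println("fixed lookups: " + fixed.lookups); // prints 1
    }
}
```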
- Charlie From John.Rose at Sun.COM Mon Jun 23 12:40:46 2008 From: John.Rose at Sun.COM (John Rose) Date: Mon, 23 Jun 2008 12:40:46 -0700 Subject: A sudden concern about invokedynamic In-Reply-To: <485CAE5A.9070709@sun.com> References: <485CAE5A.9070709@sun.com> Message-ID: On Jun 21, 2008, at 12:31 AM, Charles Oliver Nutter wrote: > How would invokedynamic work if we don't have compiled Java bytecode > anywhere to dispatch to? For example, JRuby is mixed mode right > now, and > some methods are interpreted (AST or various bytecode specs) while > some > methods get compiled to JVM bytecode on-the-fly. For the compiled > ones, > invokedynamic would just get a normal method handle. What about for > the > interpreted ones? Essentially we want to be able to pass out a method > handle that's actually just a call into the JRuby interpreter. You probably want to use MethodHandles.insertArgument to introduce a hidden receiver argument, which represents the target method as known to the interpreter. You probably also want to use MethodHandles.collectArguments to regularize the calling sequence which the interpreted method gets. > I don't recall seeing anything in the docs that might show how to do > that. Did I miss something? All those sorts of calling sequence adjustments must be composable from method handle adapters, created by the MethodHandles methods mentioned in the EDR spec. The real question is, did the EDR miss anything? 
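The adapter composition John describes can be sketched with the java.lang.invoke API that eventually shipped, where the EDR-era names roughly correspond to MethodHandles.insertArguments and MethodHandle.asCollector (the "interpreter" and its method representation below are invented stand-ins, not JRuby's):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InterpreterHandles {
    // Invented stand-in for a language runtime's interpreted-method object.
    public static class InterpretedMethod {
        final String name;
        public InterpretedMethod(String name) { this.name = name; }
    }

    // Invented stand-in for "call into the interpreter" with boxed arguments.
    public static Object interpret(InterpretedMethod m, Object[] args) {
        return m.name + "/" + args.length;
    }

    // Build a plain (Object, Object) -> Object handle that secretly
    // re-enters the interpreter for the given method.
    public static MethodHandle handleFor(InterpretedMethod m) throws Exception {
        MethodHandle interp = MethodHandles.lookup().findStatic(
                InterpreterHandles.class, "interpret",
                MethodType.methodType(Object.class, InterpretedMethod.class, Object[].class));
        // Hide the extra "which method" receiver argument.
        MethodHandle bound = MethodHandles.insertArguments(interp, 0, m);
        // Regularize the calling sequence: collect the caller's two
        // arguments into the Object[] the interpreter expects.
        return bound.asCollector(Object[].class, 2);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle h = handleFor(new InterpretedMethod("foo"));
        System.out.println(h.invoke("a", "b")); // prints foo/2
    }
}
```

To the invokedynamic call site, the resulting handle is indistinguishable from one pointing at compiled bytecode.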
-- John From John.Rose at Sun.COM Mon Jun 23 12:42:38 2008 From: John.Rose at Sun.COM (John Rose) Date: Mon, 23 Jun 2008 12:42:38 -0700 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <485D854E.1090908@sun.com> References: <483D2213.2010307@sun.com> <485D854E.1090908@sun.com> Message-ID: <59B8186D-0CA6-43BB-9013-CBCCCFC96A92@sun.com> On Jun 21, 2008, at 3:48 PM, Charles Oliver Nutter wrote: > - lookup method > - invoke method > - if all goes well, cache method reference Caching after first return could also hurt recursive algorithms (e.g., tak). -- John -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080623/d11051e3/attachment.html From John.Rose at Sun.COM Mon Jun 23 14:11:40 2008 From: John.Rose at Sun.COM (John Rose) Date: Mon, 23 Jun 2008 14:11:40 -0700 Subject: Time to reconsider m:n or green threading options? In-Reply-To: <64efa1ba0806210318m63546c5ambee9118b5db22133@mail.gmail.com> References: <485CCCBF.1040409@sun.com> <64efa1ba0806210318m63546c5ambee9118b5db22133@mail.gmail.com> Message-ID: On Jun 21, 2008, at 3:18 AM, Patrick Wright wrote: > Big drawback appears to have > been a) blocking I/O could cause the whole VM to wait and b) there's > only 1 OS thread underneath, meaning no use of multiple processors. Yep, those are the reasons. The transition away from green threads occupied a year or two of my early career on the JVM. It was necessary and most unpleasant; some of the worst bugs I've ever seen. Both kernel threads and green threads are designed around the idea that you get a rich virtual processor with all the OS trimmings and stack space for as many stack frames as you are ever likely to need. This is inherently expensive to instantiate. You want the JVM to be able to pass control between unrelated actors without completely switching out the whole register file, etc. 
It would be great if we had really lightweight continuations, with a JVM scheduler (Scheme calls them engines, I think) which keeps running the next one. The part I can't see yet is how to make stack-based and heap-based activation records play together efficiently. (Maybe you JIT two versions of every method, with inlining to remove overheads as usual?) -- John -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080623/badf3529/attachment.html From John.Rose at Sun.COM Mon Jun 23 14:15:50 2008 From: John.Rose at Sun.COM (John Rose) Date: Mon, 23 Jun 2008 14:15:50 -0700 Subject: Time to reconsider m:n or green threading options? In-Reply-To: <485CECE2.1070001@sun.com> References: <485CCCBF.1040409@sun.com> <64efa1ba0806210318m63546c5ambee9118b5db22133@mail.gmail.com> <485CECE2.1070001@sun.com> Message-ID: <14D7B6EE-E52E-46A9-94FA-EC0BFE4F6E7C@sun.com> On Jun 21, 2008, at 4:58 AM, Charles Oliver Nutter wrote: > I would want the JVM to provide the current thread APIs but > potentially > backed by a smaller number of native threads. I want to be able to > spin > up 10k or 100k java.lang.Thread instances :) It's a hard puzzle (but maybe a good one) to make the current JVM thread API fit on top of really lightweight threads. Do you think green threads are really light enough? I suspect not, though maybe you could do 100k. But when you get to event clouds with 1M or 10M actors, green threads (which inherently seem to occupy many kilobytes of memory and address space) will start to look sluggish. -- John -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080623/7a3b0ba2/attachment.html From charles.nutter at sun.com Mon Jun 23 18:58:50 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Tue, 24 Jun 2008 10:58:50 +0900 Subject: Longjumps considered inexpensive...until they aren't In-Reply-To: <59B8186D-0CA6-43BB-9013-CBCCCFC96A92@sun.com> References: <483D2213.2010307@sun.com> <485D854E.1090908@sun.com> <59B8186D-0CA6-43BB-9013-CBCCCFC96A92@sun.com> Message-ID: <486054DA.3080707@sun.com> John Rose wrote: > Caching after first return could also hurt recursive algorithms (e.g., tak). Yes, I realized that as well. But I've got this nagging feeling that I did it this way for a reason. Perhaps comments do serve a purpose. At any rate, if there's a reason, I'm sure it will come up again. - Charlie From benh at ibsglobalweb.com Wed Jun 25 00:48:22 2008 From: benh at ibsglobalweb.com (Ben Hutchison) Date: Wed, 25 Jun 2008 17:48:22 +1000 Subject: Value Types for Java In-Reply-To: <30961e500806190228o7baf642fjbf3b7d9f580837de@mail.gmail.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> <48589A0A.8090605@ibsglobalweb.com> <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> <4858C729.1020002@ibsglobalweb.com> <30961e500806190228o7baf642fjbf3b7d9f580837de@mail.gmail.com> Message-ID: <4861F846.7050307@ibsglobalweb.com> Miles Sabin wrote: > On Wed, Jun 18, 2008 at 9:28 AM, Ben Hutchison wrote: > >> The example case I have in mind is an interaction between a Java >> process and a 3D graphics driver. Eg a Java process would be >> writing/updating triangle data in the buffer, and the graphics driver >> rendering it. (I admit it might be difficult to get this to work even >> with value type support, given that Java's philosophy says precise >> memory level representation is undefined by spec) >> > > Right. This strikes me as almost completely orthogonal to the value > type issue. 
What you really want is safe and efficient access to raw > memory and devices. I think it would make a lot of sense to think > about that independently. > It doesn't seem orthogonal to me. I'm unsure precisely what you would consider "safe and efficient", but the existing ByteBuffer provides a non-relocatable memory block that you can access efficiently, if you are happy to treat it as int[] or byte[]. I hope that value types might enable a memory block to be viewed in a more object-oriented fashion, as an array of value-type /objects/; eg (going back to my earlier example) as Triangle[], complete with named fields, nested structure, and methods. Note also, I see value types as being predominantly useful intra-JVM. They are not simply a method for IO with external processes, yet that is an area where they might add convenience. (Apologies for delayed reply - have been away from work for several days) Regards Ben
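The "ByteBuffer viewed as Triangle[]" idea can today only be approximated with an accessor class over flat storage, in the style of the generated GlueGen classes mentioned earlier in the thread. All names and the struct layout below are invented for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// A view of one triangle inside a flat buffer laid out the way a C
// consumer (e.g. a graphics driver) expects: 9 contiguous floats per
// triangle, three vertices of (x, y, z) each.
public class TriangleView {
    static final int FLOATS_PER_TRIANGLE = 9;
    static final int BYTES_PER_TRIANGLE = FLOATS_PER_TRIANGLE * 4;

    private final ByteBuffer buf;
    private final int base; // byte offset of this triangle within the buffer

    public TriangleView(ByteBuffer storage, int index) {
        this.buf = storage;
        this.base = index * BYTES_PER_TRIANGLE;
    }

    // vertex in 0..2, coord in 0..2 (x, y, z)
    public float get(int vertex, int coord) {
        return buf.getFloat(base + (vertex * 3 + coord) * 4);
    }

    public void set(int vertex, int coord, float v) {
        buf.putFloat(base + (vertex * 3 + coord) * 4, v);
    }

    // Helper: write v into triangle[index] (third vertex, x) and read it back.
    public static float roundTrip(int index, float v) {
        ByteBuffer storage = ByteBuffer.allocateDirect((index + 1) * BYTES_PER_TRIANGLE)
                                       .order(ByteOrder.nativeOrder());
        TriangleView t = new TriangleView(storage, index);
        t.set(2, 0, v);
        return t.get(2, 0);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1, 5.0f)); // prints 5.0
    }
}
```

The gap Ben identifies is visible here: every field access goes through index arithmetic and a Buffer dereference, and nothing in the type system connects `TriangleView` to the storage layout it assumes.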
From benh at ibsglobalweb.com Wed Jun 25 01:11:03 2008 From: benh at ibsglobalweb.com (Ben Hutchison) Date: Wed, 25 Jun 2008 18:11:03 +1000 Subject: Value Types for Java In-Reply-To: <48593305.4090602@sun.com> References: <485740CC.7090203@ibsglobalweb.com> <30961e500806170020i19c5547at7a24510957e9c4a9@mail.gmail.com> <48589A0A.8090605@ibsglobalweb.com> <30961e500806180051g482d19edr4e074a723873275c@mail.gmail.com> <4858C729.1020002@ibsglobalweb.com> <48593305.4090602@sun.com> Message-ID: <4861FD97.6040102@ibsglobalweb.com> Kenneth Russell wrote: > I would be concerned about the semantic change of > pass-by-value and what that would mean for the easy readability and > understanding of Java source code I would point to the CLR as a successful example here. Value types and reference types mingle fairly painlessly in .NET. I've done two commercial projects in .NET and spent a fair amount of time reading blogs, forums and articles in that community. I haven't got the impression that structs are a common stumbling block or pain point for .NET developers: developers sometimes wish structs allowed inheritance; the ability for structs to take a null value seems to be one significantly desired feature, that's now provided via a transparently boxed form (see "Nullable types"). Admittedly, that platform offered them from the outset (and even specification of field layout in memory) whereas Java has been reference-types-only for > 12 years, so it's a bigger cultural change. But given the proliferation of JVM languages whose needs drive this mailing list, it seems like the Java community is in the mood for change.
-Ben From jbaker at zyasoft.com Wed Jun 25 12:36:35 2008 From: jbaker at zyasoft.com (Jim Baker) Date: Wed, 25 Jun 2008 13:36:35 -0600 Subject: Time to reconsider m:n or green threading options? Message-ID: [just joined the mailing list and I don't have the mail-fu to connect this up to the original thread] I'm quite confident that in Jython, we don't want such full-blown green threads. One-shot continuations with bare-metal semantics that we can schedule exactly the way we want would work for us. This would correspond rather closely to the successful greenlet model, or to Stackless, which is being used to support the millions of actors/tasklets/greenlets seen in slide.com or Eve Online. - Jim -- Jim Baker jbaker at zyasoft.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20080625/ebcd4b80/attachment.html