Fwd: Fwd: Proposal for generics over primitives needs a rethink

martin odersky martin.odersky at epfl.ch
Mon Jan 5 17:48:53 UTC 2015

On Mon, Jan 5, 2015 at 12:06 PM, Remi Forax <forax at univ-mlv.fr> wrote:
> On 01/05/2015 02:54 AM, Gavin King wrote:
>> Sorry, Remi, I missed this email, it went straight to my spam folder
>> or something for some reason.
>> On Sat, Jan 3, 2015 at 5:10 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>>> On 01/02/2015 07:13 PM, Gavin King wrote:
>>> The first major problem of Any is that it introduce a new primitive kind,
>>> a lot of codes relies on the fact that you can only have boolean, byte,
>>> short,
>>> char, int, long, double and Object in Java. Introducing a new type will
>>> make a lot of code to behave weirdly.
>> You mean code in Java programs?
>> I'm not seeing this. By nature it's impossible to switch over the
>> primitive types with instanceof or whatever, so I'm having a really
>> hard time visualizing what kind of Java code would break.
>> Examples?
> switch(type.getName()) { ...
>>> The second major problem is how to implement it.
>>> It can be implemented as a compiler fiction only like in C# or in Scala,
>>> but in that case apart that one can write ArrayList<int>, it will
>>> rely on a form of boxing.
>> Well, that's not quite right. That the concrete instantiation
>> ArrayList<Any> works using boxing doesn't say much about whether
>> ArrayList<int> does. You can still use specialization, AFAICT.
> yes, but you don't need Any to do primitive type specialization,
> you just need Any as a bound of a type parameter that's
> the whole point of the discussion now ?
>>> It can be implemented in the VM. It seems to be the best solution
>>> but the engineering cost is insane and in fact Any will not solve
>>> the specialization problem.
>>> Why the engineering cost is very high ?
>>> Currently most of the Java VM implementations rely on the fact that
>>> you know upfront (without executing the code) if a field or a local
>>> variable
>>> is a reference or an object. In term of GC, there is a very clear
>>> distinction
>>> between the precise GC algorithms used in Java VM and the conservative
>>> GC algorithms used by example in C. Technically, with Any, you can
>>> implement something in the middle between these two categories of GC,
>>> nevertheless, it changes the category of GCs, Java will be able to use.
>> But you're introducing a new kind of field/variable here. Pre-existing
>> code should keep behaving the same. It's only for new usages of this
>> new thing that the new behavior is required.
>> I understand that the VM would need to change to accommodate this. But
>> I think it makes the VM more useful,
> Yes, it makes the VM more powerful, I think we can all agree with that.
> The problem is that its a pervasive change that as far as I know
> nobody has even tried.
> I will be very happy to see Redhat to found a project like this,
> taking the source of hotspot and trying to add Any.
> About the separation of behaviors, it's exactly like saying,
> if I use a JVM but only with primitive types, no objects,
> I will not pay the price of having to deal with objects.
> That's not true, the VM will still prepare internal data structures
> to be able to crawle the stack during a GC even if a GC
> will never occur in that case.
>> and I strongly speculate that all the dynamic language folks would thank
>> you for it.
> :)
> Having all values stored as Any require the VM to check the runtime type of
> all operands
> for each operation, this is what CPython does, but not what more advanced
> dynamic language runtimes do.
> They do optimistic speculative type analysis, i.e. generate a code by
> examples for ints
> and de-optimise if a type is a String and then re-generate a new code.
> So sure Any will help, but not as much as you think.
>>> Any will not solve the specialization problem.
>>> Let say we now we have introduce Any, so we can create an array of Any.
>>> But an array of Any can be at runtime by subtyping either
>>> an array of doubles or an array of objects.
>>> But by doing this we introduce a subtyping relationship between things
>>> that
>>> are not structurally equivalent. A possible solution is to align all
>>> cells
>>> of all arrays
>>> to 64 bits which is a very bad property because even if you don't use
>>> Any,
>>> you pay the cost of that you may use it. Another solution, the one used
>>> by
>>> V8,
>>> is to box all doubles (and long in Java) when you store them in an
>>> array*.
>>> So an array of Any will not be magically an array of ints, an array of
>>> doubles
>>> and an array of objects without some trade-offs.
>> Hrm, you're right, I had not considered that byte[] is an Any[], due
>> to the totally broken "covariance" of arrays in Java.
>> That sucks, and it's worse that it sucks because of the type system
>> being broken :-(

I think Any[] needs needs to erase either to Object, or to a new type
such as "AnyArray". You need to use reflection to access the elements
of of an Any[] by looking at the array element type tag.

In Scala, we do not have exactly the same problem as Array is
invariant, but an analogous problem arises for []T, where T is an
unbounded type variable. We erase []T to Object. It does not run
super-fast now (performance hit of about a factor of 10-20 compared to
native array access),  but it looks easy to optimize if the JVM is
aware of it.


 - Martin

>> Couldn't we just say that a byte[] actually isn't an Any[]? I mean,
>> that's actually *correct*. Why double down on stupid?
> how to implement ArrayList in that case ?
> class ArrayList<T extends Any> {
>   T[] array;   // <-- this is an array of Any !
> }
> cheers,
> Rémi

Martin Odersky

More information about the valhalla-dev mailing list