ArrayFactory SAM type / toArray

Wed Sep 19 15:13:43 PDT 2012

> I don't find these arguments convincing.  There's no race (any more than
> there is for any bulk operation) as the allocation is done by the object
> itself.  The allocation stuff is pretty much a red herring: most users
> don't preallocate the array. So it seems to me that using factories here
> might amount to needless complexity and inconsistency.

I agree with you that most users don't pre-allocate the array.  Which 
makes the existing form of toArray even more unfortunate!  Because then 
the allocation always involves multiple reflective calls.  (Some of 
which are sometimes optimized by some VMs in some conditions, but none 
of which are always optimized by all VMs in all conditions.)  So the 
performance will always be worse in the toArray(T[]) formulation.
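
To make the reflective cost concrete, here is a rough sketch of what a library has to do behind a toArray(T[]) signature when the caller passes an array that is too small. The class LegacyToArraySketch and its helper are hypothetical names for illustration, not the JDK's actual implementation; only java.lang.reflect.Array.newInstance and Class.getComponentType are real library calls.

```java
import java.lang.reflect.Array;
import java.util.List;

// Hypothetical illustration class; not taken from the JDK sources.
final class LegacyToArraySketch {

    // Mimics the toArray(T[]) contract: reuse 'a' if it is big enough,
    // otherwise allocate a new array of the same runtime type.
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<?> source, T[] a) {
        int size = source.size();
        if (a.length < size) {
            // Reflective call 1: discover the runtime element type.
            Class<?> componentType = a.getClass().getComponentType();
            // Reflective call 2: allocate, since "new T[size]" is not legal Java.
            a = (T[]) Array.newInstance(componentType, size);
        }
        for (int i = 0; i < size; i++) {
            a[i] = (T) source.get(i);
        }
        if (a.length > size) {
            a[size] = null; // null-terminate when the supplied array is larger
        }
        return a;
    }

    public static void main(String[] args) {
        String[] out = toArray(List.of("a", "b"), new String[0]);
        System.out.println(out.length + " elements of type "
                + out.getClass().getComponentType().getName());
    }
}
```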

The fundamental problem is that the client knows how best to create the 
array (the client can say "new Foo" but the library cannot say "new T", 
and therefore has to fall back to reflection), but the library knows 
best how big the array should be.
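
A minimal sketch of that division of labor, assuming java.util.function.IntFunction as a stand-in for the ArrayFactory SAM type in the subject line (the class and method names here are hypothetical): the caller supplies the "how", the library supplies the "how much", and no reflection is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical illustration class; IntFunction<T[]> stands in for ArrayFactory.
final class FactoryToArraySketch {

    // The library picks the size; the caller's factory picks the array type.
    static <T> T[] toArray(List<? extends T> source, IntFunction<T[]> factory) {
        T[] result = factory.apply(source.size()); // plain "new Foo[n]" on the caller's side
        for (int i = 0; i < result.length; i++) {
            result[i] = source.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] out = toArray(List.of("a", "b", "c"), n -> new String[n]);
        System.out.println(Arrays.toString(out));
    }
}
```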

This is a classic example of the sort of differences in APIs you get 
when designing an API with or without closures.  The client knows how; 
the library knows how much; ideally we'd like for the client to pass 
that knowledge into the library.  The approximations we got when it was 
hard to combine these are unfortunate; we can do better now.

I'd find David's suggestion of toArray(Class) more compelling (in some 
sense it is the most "right" in that it doesn't conflate "what" with 
"how") except I don't buy that the intrinsification of reflective array 
allocation in some VMs in some compilation modes in some situations 
makes all the reflective costs go away.
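
For comparison, a sketch of what a toArray(Class) variant would likely have to do internally (hypothetical names again; only Array.newInstance is a real library call).  The call site is clean, but the allocation is still reflective unless the VM happens to intrinsify it.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class for the toArray(Class) alternative.
final class ClassToArraySketch {

    // The caller says "what" (the element type); the library must still
    // allocate reflectively because it cannot write "new T[n]".
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<? extends T> source, Class<T> elementType) {
        T[] result = (T[]) Array.newInstance(elementType, source.size());
        for (int i = 0; i < result.length; i++) {
            result[i] = source.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[] out = toArray(List.of(1, 2, 3), Integer.class);
        System.out.println(Arrays.toString(out));
    }
}
```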

We're creating a new API here.  All things being equal, we should lean 
on consistency with existing APIs when we can, but obviously that is 
just a guideline (someday we're going to have to contend with the fact 
that an int isn't big enough to store the size of collections).  The 
existing toArray signatures are the best we could have done at the time 
(and that was a very different time), but that doesn't mean we shouldn't 
seek to do any better.

Here is what the client callsites might look like in various cases:

  // status quo
  Foo[] foos = ...toArray(new Foo[0]);            // ugh reflection
  Foo[] foos = ...toArray(new Foo[xyz.size()]);   // ugh ugly and racy

  // proposed
  Foo[] foos = ...toArray(n -> new Foo[n]);

  // David's alternative
  Foo[] foos = ...toArray(Foo.class);

I don't see the "complexity of factories" being a big problem here -- if 
people can deal with lambdas at all, this is a pretty simple case, and 
it's only a few characters longer than the "new Foo[0]" version.  I think 
the lambda code reads pretty naturally.  (Actually I find the "new 
Foo[0]" the most confusing -- why would I pass in a new empty array?)