Excessive copying of Collection.toArray()
Stuart Marks
stuart.marks at oracle.com
Thu Sep 4 03:55:13 UTC 2025
>> There have been previous attempts to reduce copying here and there, but they do suffer from the problems you have described. See https://github.com/openjdk/jdk/pull/12212
>
> That's a valuable link. It touches on the same problem and potential
> solutions. It's a shame it was abandoned. This superficially simple
> problem touches performance, security, and bootstrapping and that
> scares people off.
I think that PR is a good example of why we don't like to start off with code. As
you noted there are a bunch of subtleties to the problem. Starting off with a PR
with code mixes the fundamental issues with coding issues and the whole thing goes
off into the weeds. If the code was written under the assumption that it was trying
to solve problem A, but the real problem is problem B, then it turns into an uphill
struggle to convert the existing code from solving problem A to solving problem B.
Or worse, the code change bounces around trying to solve several problems but solves
none of them well. (There have been several other examples of this phenomenon.)
> But I would still plead that there is some low-hanging fruit for
> improvement here. There are defensive clones and "is it an ArrayList
> (but not ArrayList subclass!!!)" optimizations copy-pasted in half a
> dozen scattered places. If that optimization is still valuable, it
> would benefit from a reusable utility method, where it can be
> centrally maintained, documented, and evangelized. If not, it should
> be deleted.
Agreed, a simple-though-incomplete approach might provide some useful improvements
in the short term.
> Also, the knowledge of why the defensive clone is made at all, and why
> it must be of type Object[] and not a covariant subclass, these are
> currently arcane secrets held only by those who know the history of
> the associated bug reports. That's another part of why I argue for a
> central method that can be named and discussed. (I originally
> requested it in a webbug report, and was told it would be better
> discussed on the mailing list instead.)
Two issues here.
The need for an extra defensive copy was established and fixed via the security
vulnerability process and as such, all discussion of it was embargoed. Sorry.
However, the sources are all out there and if you look at the code prior to commit
[1] you can see the vulnerable code. Also, the eagle-eyed Heinz Kabutz noticed this
and wrote about it in his Java Specialists' newsletter [2]. (Leave it to Heinz to
relate the ArrayList.class check to the particular local origin of Cretan
vegetables.) But I think you've discerned the issue yourself already.
The second issue is that, by specification, the array returned by the no-arg
toArray() method must have a component type of Object. The specification previously
wasn't clear about this, and this led to a variety of bugs. See [3] and bugs linked
from there. Now, *why* is it specified this way? Well, the static return type of
toArray() is Object[], which leads callers to believe they can store any object into
it. If toArray() were to return an array with some other component type, they'd get
ArrayStoreException. So, the spec requires that you get what it says on the tin.
s'marks
[1]: https://github.com/openjdk/jdk/commit/343ecd806bb05065d31124573b3e58a864cfc617
[2]:
https://www.javaspecialists.eu/archive/Issue290-CopyOnWriteArrayList-and-Collection.toArray.html
[3]: https://bugs.openjdk.org/browse/JDK-8160406
More information about the core-libs-dev
mailing list