Excessive copying of Collection.toArray()

Shaun Spiller shaunspiller at gmail.com
Wed Sep 3 18:21:53 UTC 2025


> Hello Shaun, note that the optimizing compiler in the JVM (C2) can likely eliminate 3 of the 4 copies, all except 1 (which is not a defensive copy).
> 2 could be eliminated when escape analysis finds the result of HashSet.toArray does not escape beyond the toArray method, so subsequent clone is elided.
> 4 could be eliminated when EA concludes the result of ArrayList.toArray does not escape beyond the toArray method, so subsequent clone is elided.
> 3 could be eliminated when EA concludes the `ArrayList` does not escape beyond the sortedImmutableList method. This one is harder for compilers than the previous two.

Thank you. I didn't know the JVM could elide array clones. But then
either that JIT optimization is insufficient, or the hand-written
checks that classes currently use to avoid the clone are redundant.
They can't both be justified.

> There have been previous attempts to reduce copying here and there, but they do suffer from the problems you have described. See https://github.com/openjdk/jdk/pull/12212

That's a valuable link. It touches on the same problem and potential
solutions. It's a shame it was abandoned. This superficially simple
problem touches performance, security, and bootstrapping, and that
scares people off.

But I would still plead that there is some low-hanging fruit for
improvement here. There are defensive clones and "is it an ArrayList
(but not ArrayList subclass!!!)" optimizations copy-pasted in half a
dozen scattered places. If that optimization is still valuable, it
would benefit from a reusable utility method, where it can be
centrally maintained, documented, and evangelized. If not, it should
be deleted.
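
To make the idea concrete, here is a minimal sketch of what such a
utility could look like. The class and method names (ArraySnapshots,
trustedToArray) are hypothetical and are not existing JDK API. The
sketch assumes that java.util.ArrayList.toArray() returns a fresh
Object[] that nothing else holds a reference to, which is why the
exact-class check is treated as safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper class; the name is an assumption for illustration.
final class ArraySnapshots {
    private ArraySnapshots() {}

    /**
     * Returns an array of runtime type Object[] holding the elements of c,
     * which the caller may keep without it being aliased by c.
     */
    static Object[] trustedToArray(Collection<?> c) {
        Object[] a = c.toArray();
        // Exact class check, not instanceof: a subclass of ArrayList could
        // override toArray() and hand out an array it still references.
        if (c.getClass() == ArrayList.class) {
            return a;
        }
        // Any other collection is untrusted: copy, and force the runtime
        // type to Object[] in case toArray() returned a covariant array.
        return Arrays.copyOf(a, a.length, Object[].class);
    }
}
```

Call sites that currently carry their own copy of the check would
then reduce to a single call such as
Object[] elements = ArraySnapshots.trustedToArray(input);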

Also, the reasons why the defensive clone is made at all, and why the
result must have runtime type Object[] rather than a covariant array
type such as String[], are currently arcane secrets held only by
those who know the history of the associated bug reports. That's
another part of why I argue for a central method that can be named
and discussed. (I originally requested it in a webbug report, and was
told it would be better discussed on the mailing list instead.)
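
For readers who don't know that history, the two hazards can be shown
with a small self-contained example. The Leaky class below is a
hypothetical misbehaving collection written only for illustration. It
returns its own internal String[] from toArray(), so the caller's
array is both aliased and of a covariant runtime type.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;

final class WhyDefensiveCopy {
    // Hypothetical collection that hands out its internal array.
    static final class Leaky extends AbstractCollection<String> {
        final String[] shared = {"a", "b"};

        @Override public Iterator<String> iterator() {
            return Arrays.asList(shared).iterator();
        }
        @Override public int size() { return shared.length; }
        // Violates the toArray() contract: the array is retained by the
        // collection, and its runtime type is String[], not Object[].
        @Override public Object[] toArray() { return shared; }
    }

    public static void main(String[] args) {
        Leaky leaky = new Leaky();
        Object[] raw = leaky.toArray();

        // Hazard 1: aliasing. The collection can change "our" array later.
        leaky.shared[0] = "mutated";
        System.out.println(raw[0]); // prints "mutated"

        // Hazard 2: covariance. Storing a non-String fails at runtime.
        try {
            raw[1] = Integer.valueOf(42);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }

        // The defensive copy addresses both: new array, type Object[].
        Object[] safe = Arrays.copyOf(raw, raw.length, Object[].class);
        leaky.shared[0] = "mutated again";
        safe[1] = Integer.valueOf(42);    // allowed now
        System.out.println(safe[0]);      // still prints "mutated"
    }
}
```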


More information about the core-libs-dev mailing list