<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hello Shaun, note that the optimizing compiler in the JVM (C2) can likely eliminate 3 of the 4 copies, all except 1 (which is not a defensive copy).</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
2 could be eliminated when escape analysis finds the result of <code>HashSet.toArray</code> does not escape beyond the toArray method, so subsequent clone is elided.</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
4 could be eliminated when EA concludes the result of <code>ArrayList.toArray</code> does not escape beyond the toArray method, so subsequent clone is elided.</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
3 could be eliminated when EA concludes the `ArrayList` does not escape beyond the sortedImmutableList method. This one is harder for compilers than the previous two.</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Unlike the trusting by class loader, the escape analysis trusting can apply to user structures.</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
There have been previous attempts to reduce copying here and there, but they do suffer from the problems you have described. See
<a href="https://github.com/openjdk/jdk/pull/12212">https://github.com/openjdk/jdk/pull/12212</a>.</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Regards,</div>
<div style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Chen Liang</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> core-libs-dev <core-libs-dev-retn@openjdk.org> on behalf of Shaun Spiller <shaunspiller@gmail.com><br>
<b>Sent:</b> Tuesday, September 2, 2025 1:19 PM<br>
<b>To:</b> core-libs-dev@openjdk.org <core-libs-dev@openjdk.org><br>
<b>Subject:</b> Excessive copying of Collection.toArray()</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Hello!<br>
<br>
Please consider the following real utility method I have:<br>
<br>
/**<br>
* Returns an immutable list via {@link List#of} that is the<br>
* sorted content of the collection.<br>
*/<br>
public static <E> @NonNull List<E> sortedImmutableList(<br>
@NonNull Collection<E> col,<br>
@Nullable Comparator<? super E> comparator) {<br>
ArrayList<E> list = new ArrayList<>(col);<br>
list.sort(comparator);<br>
return List.copyOf(list);<br>
}<br>
<br>
Quiz question: Assume this method is invoked with a large HashSet as<br>
its collection argument. How many ARRAY copies of the collection are<br>
created by this method? Clearly there must be at least one for the<br>
result, held internally in the immutable list. Any others?<br>
<br>
<br>
(Scroll for the answer.)<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
.<br>
<br>
I was surprised to realize the answer is four:<br>
<br>
1. ArrayList calls toArray() on the HashSet.<br>
2. ArrayList defensively copies the array to assign to its internal<br>
elementData field, because it doesn't trust HashSet.<br>
3. List.copyOf calls toArray() on the ArrayList.<br>
4. List.of defensively copies the array to assign to its internal<br>
elements field, because it doesn't trust ArrayList.<br>
<br>
It is quite possible to rewrite my particular utility method with a<br>
raw array instead of ArrayList, to eliminate 2 array copies, but this<br>
requires an icky unchecked cast, and by rights is the kind of thing<br>
that shouldn't be necessary anyway.<br>
<br>
-----<br>
<br>
In general, this kind of excessive defensive copying is going on ALL THE TIME.<br>
<br>
The underlying nuisance is that a subclass of Collection could<br>
potentially override toArray() to keep a reference to the returned<br>
array and make mischief with it. (Also, for ArrayList, it would be a<br>
problem if the returned array were a subclass of Object[].)<br>
<br>
I did a search thru the JDK and found a slew of cases where code calls<br>
Collection.toArray() and then has to worry about defensively copying<br>
it -- or trying to avoid defensively copying it:<br>
<br>
- java.util.List.copyOf(Collection) -- avoids copying an existing<br>
List.of/copyOf list, but for any other collection it calls toArray()<br>
and then defensively copies it again<br>
<br>
- java.util.ArrayList(Collection) -- constructor has a special check<br>
`if (collection.getClass() == ArrayList.class)`, in which case it<br>
skips the defensive copy and trusts the result of toArray(), but it<br>
does not trust any other collection<br>
<br>
- java.util.Vector(Collection) -- constructor has the same<br>
optimization trick as ArrayList, where it trusts ArrayList, but<br>
creates a defensive copy for any other collection. Vector doesn't even<br>
trust itself!<br>
<br>
- java.util.PriorityQueue.initElementsFromCollection(Collection) --<br>
has optimization for ArrayList, but no other collection<br>
<br>
- java.util.concurrent.PriorityBlockingQueue(Collection) -- has<br>
optimization for ArrayList, but no other collection<br>
<br>
- java.util.concurrent.CopyOnWriteArrayList(Collection) -- has<br>
optimizations for ArrayList and CopyOnWriteArrayList, but no other<br>
collection.<br>
<br>
- java.util.concurrent.CopyOnWriteArrayList.addAllAbsent(Collection)<br>
-- has optimization for ArrayList but no other collection.<br>
<br>
- java.util.stream.Collectors.toUnmodifiableList() -- this avoids an<br>
excess copy, but does so in a very complicated way<br>
<br>
- sun.awt.util.IdentityArrayList(Collection) -- this appears to be an<br>
old copy-paste of ArrayList except with identity indexOf; like<br>
pre-2020 ArrayList, it blindly trusts that `col.toArray()` returns a<br>
new array, although based on this class's extremely limited use there<br>
is no manifestable bug.<br>
<br>
- java.lang.invoke.MethodHandles.dropArguments/dropArgumentsToMatch --<br>
these call `list.toArray()` and then clone the array unconditionally<br>
<br>
- java.time.zone.ZoneRules.of(...) -- calls `lastRules.toArray()` and<br>
then clones the array unconditionally<br>
<br>
-----<br>
<br>
I would suggest that there should be an internal utility method for<br>
use in all these cases, something like:<br>
<br>
public static Object[] safeToArray(Collection<?> c) {<br>
Object[] a = c.toArray();<br>
if (!isTrusted(c)) {<br>
a = Arrays.copyOf(a, a.length, Object[].class);<br>
}<br>
return a;<br>
}<br>
<br>
The delicate issue then is: what is the correct design of "isTrusted"?<br>
The check must be fast, and not spoofable by classes outside the JDK.<br>
<br>
My first thought was to detect that the class is loaded by the<br>
bootstrap class loader:<br>
<br>
static boolean isTrusted(Collection<?> c) {<br>
return c.getClass().getClassLoader() == null;<br>
}<br>
<br>
However, I now realize this is unwise because it would shift the<br>
burden of trust into **every** Collection class in the JDK, instead of<br>
into the classes that actually care about the result. For example,<br>
Collections.UnmodifiableCollection.toArray() simply returns the result<br>
of its internal `col.toArray()`, which immediately carves a hole the<br>
size of a bus thru the check.<br>
<br>
Perhaps better would be for classes to explicitly opt-in and say "Yes,<br>
I promise that my toArray() method is safe". Would a package-private<br>
marker interface work for this?<br>
<br>
I am not sure.<br>
<br>
Anyway, the point is that classes within the JDK should not need to<br>
create so many defensive copies of each other.<br>
<br>
Note that this is becoming more of an issue now that List.of is<br>
becoming a popular "go-to" list for many situations where ArrayList<br>
would traditionally have been used. The "only trust ArrayList" pattern<br>
was never ideal and is starting to age.<br>
<br>
This is my first post here so I apologize for oversights in formatting<br>
or etiquette.<br>
Thanks for reading.<br>
</div>
</span></font></div>
</body>
</html>