RFR JDK-8225339 Optimize HashMap.keySet()/HashMap.values()/HashSet toArray() methods
Stuart Marks
stuart.marks at oracle.com
Wed Jun 5 23:48:35 UTC 2019
On 6/5/19 7:27 AM, Claes Redestad wrote:
> On 2019-06-05 15:49, Tagir Valeev wrote:
>> In particular it's never mentioned in HashSet, HashMap or Collection spec that
>> toArray implements a fail-fast behavior: this is said only about the
>> iterator() method.
>
> that's not entirely true, since the @implSpec for toArray on
> AbstractCollection states it is equivalent to iterating over the
> collection[1].
>
> So since the implementations you're changing inherit their current
> implementation from AbstractCollection and the iterators of HashMap is
> specified to be fail-fast, then that behavior can be argued to be
> specified also for the affected toArray methods.
>
> I'm not fundamentally objecting to the behavior change, but I do think
> it needs careful review and a CSR (or at least plenty of reviewers
> agreeing that one isn't needed).
OK, let's slow down here a bit.
The fail-fast, CME-throwing behavior is primarily of interest when iteration is
external, that is, it's driven by the application making a succession of calls
on the Iterator. The main concern arises when the application makes other
modifications to the collection being iterated, between calls to the Iterator.
There is always the possibility of modification by some other thread, and this
is mentioned obliquely in the spec near "fail-fast", but given memory visibility
issues it's pretty much impossible to make any concrete statements about what
happens if modifications are made by another thread.
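For anyone who wants the concrete case in front of them, this is the externally-driven situation where fail-fast matters; a small self-contained example (class name and values are mine, just for illustration):

    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    public class FailFastDemo {
        public static void main(String[] args) {
            Set<String> set = new HashSet<>(Set.of("a", "b", "c"));
            Iterator<String> it = set.iterator();
            it.next();
            set.add("d");   // structural modification between Iterator calls...
            it.next();      // ...so this throws ConcurrentModificationException
        }
    }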
In the case of these methods, the iteration occurs entirely within the context
of a single method call. There's no possibility of the application making a
concurrent modification on the same thread. So, while the fail-fast behavior is
being changed, strictly speaking, I consider it "de minimis", as it's pretty
difficult for an application to observe this change in behavior.
Regarding the @implSpec, that applies only to the implementation in
AbstractCollection, and @implSpec is not inherited. Again, it is a change in
behavior since this method is being changed from being inherited to being
overridden, but it's not a specification issue.
In any case I don't think the concurrent modification behavior change is an
issue to be concerned about.
**
Now, regarding visible changes in behavior: it's actually quite easy to observe this change, at least from a subclass of HashSet. A HashSet subclass could override the iterator() method and detect that it is called when toArray() is called. With this change, toArray() is overridden, so iterator() would no longer be called in this circumstance.
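To make that concrete, here's a sketch of the kind of subclass that would notice (the class name and the printing are mine, purely for illustration):

    import java.util.HashSet;
    import java.util.Iterator;

    public class SpyHashSet<E> extends HashSet<E> {
        boolean iteratorCalled;

        @Override
        public Iterator<E> iterator() {
            iteratorCalled = true;
            return super.iterator();
        }

        public static void main(String[] args) {
            SpyHashSet<String> s = new SpyHashSet<>();
            s.add("x");
            s.toArray();
            // true with the inherited AbstractCollection.toArray(), which
            // iterates via iterator(); false once HashSet overrides toArray()
            // and bypasses iterator() entirely.
            System.out.println(s.iteratorCalled);
        }
    }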
This kind of change is generally allowable, but we have had complaints about the pattern of self-calls changing from release to release. This isn't a reason NOT to make the change, but it does change visible behavior, and that potentially affects actual code. (Code that I'd consider poorly written, but still.)
I talked this over with Joe Darcy (CSR chair) and we felt that it would be prudent to file a CSR request to document the behavior change.
**
Some comments on the code.
Overall I think the changes are going in the right direction. It's amazing that
after all this time, there are still cases in core collections that are
inheriting the slow Abstract* implementations.
In HashMap, I think the factoring between keysToArray() and prepareArray() can be improved. The prepareArray() method itself is sensible, in that it implements the weird toArray(T[]) semantics: it allocates a new array if the given one is too short, puts a 'null' at the right place if the given array is too long, and returns the possibly reallocated array.
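For readers without the webrev handy, the semantics being factored out are roughly these (a sketch, not the patch itself; the standalone signature with an explicit size parameter is mine):

    import java.lang.reflect.Array;

    // toArray(T[]) contract: too short -> allocate a new array of the same
    // component type; too long -> put a null just past the last element.
    @SuppressWarnings("unchecked")
    static <T> T[] prepareArray(T[] a, int size) {
        if (a.length < size)
            return (T[]) Array.newInstance(a.getClass().getComponentType(), size);
        if (a.length > size)
            a[size] = null;
        return a;
    }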
The first thing that keysToArray() does is to call prepareArray() and use its
return value. What concerns me is that the no-arg toArray() method does this:
return keysToArray(new Object[size]);
so the array is created with exactly the right size; yet the first thing
keysToArray() does is to call prepareArray(), which checks the size again and
determines that nothing need be done.
It would be a small optimization to shift things around so that keysToArray() takes an array known to be "prepared" already, removing its internal call to prepareArray(). The caller would then be responsible for calling prepareArray() if necessary.
Then we'd have this:
953 public Object[] toArray() { return keysToArray(new Object[size]); }
954 public <T> T[] toArray(T[] a) { return keysToArray(prepareArray(a)); }
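The bulk of the work would then live in something shaped roughly like this (a sketch against HashMap's existing Node/table/size internals, not the actual webrev code):

    // keysToArray() assumes its argument is already "prepared" (long enough,
    // null-terminated if longer than size), so the no-arg toArray() above can
    // pass a freshly sized array straight in without a redundant size check.
    <T> T[] keysToArray(T[] a) {
        Object[] r = a;
        Node<K,V>[] tab;
        int idx = 0;
        if (size > 0 && (tab = table) != null) {
            for (Node<K,V> e : tab) {
                for (; e != null; e = e.next) {
                    r[idx++] = e.key;
                }
            }
        }
        return a;
    }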
I'm not terribly concerned about avoiding an extra method call. But other code
in this file, for example like the following:
930 Node<K,V>[] tab;
931 int idx = 0;
932 if (size > 0 && (tab = table) != null) {
is written with an embedded assignment to 'tab', probably in order to avoid a useless load of the 'table' field if the size is zero. (This style occurs in several places.) So, if this code is already micro-optimizing at this level, let's not waste that by calling extra methods unnecessarily. (I don't know whether this will impact the benchmarks, though.)
Aside from performance, refactoring keysToArray/prepareArray this way makes more
sense to me anyway.
Similar adjustments to the call sites within HashSet and LinkedHashMap would
need to be done.
I'd recommend adding doc comments to HashMap's prepareArray() and keysToArray(), since they are called or overridden from outside this file. It needn't be a full-on spec comment, just enough to make it clear that other things depend on these internal interfaces.
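Something brief along these lines would be enough (wording is mine, only to show the intended level of detail):

    /**
     * Prepares the given array for a toArray(T[]) call: reallocates it if it
     * is shorter than size, null-terminates it if it is longer, and returns
     * the (possibly new) array. Also used from HashSet and LinkedHashMap, so
     * keep its contract stable.
     */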
Thanks,
s'marks