RFR JDK-8225339 Optimize HashMap.keySet()/HashMap.values()/HashSet toArray() methods

Thu Jun 6 04:25:29 UTC 2019

Hello, Stuart, Claes!

Thanks for review. Indeed, you have a point about behavioral change. I
filed a CSR:
https://bugs.openjdk.java.net/browse/JDK-8225393

I wanted to generate a specdiff and attach to CSR, but it seems that I
don't know how to do it :(
I found nothing in OpenJDK Wiki about this. Could somebody help me? It's
expected that
some differences will appear in HashSet and LinkedHashSet javadoc as now
toArray is
not inherited from AbstractCollection anymore.

Code-related suggestions are addressed in new webrev (along with Roger's
suggestions):
http://cr.openjdk.java.net/~tvaleev/webrev/8225339/r2/

With best regards,
Tagir Valeev.

On Thu, Jun 6, 2019 at 6:48 AM Stuart Marks <stuart.marks at oracle.com> wrote:

>
>
> On 6/5/19 7:27 AM, Claes Redestad wrote:
> > On 2019-06-05 15:49, Tagir Valeev wrote:
> >> In particular it's never mentioned in HashSet, HashMap or Collection
> spec that
> >> toArray implements a fail-fast behavior: this is said only about the
> >> iterator() method.
> >
> > that's not entirely true, since the @implSpec for toArray on
> > AbstractCollection states it is equivalent to iterating over the
> > collection[1].
> >
> > So since the implementations you're changing inherit their current
> > implementation from AbstractCollection and the iterators of HashMap is
> > specified to be fail-fast, then that behavior can be argued to be
> > specified also for the affected toArray methods.
> >
> > I'm not fundamentally objecting to the behavior change, but I do think
> > it needs careful review and a CSR (or at least plenty of reviewers
> > agreeing that one isn't needed).
>
> OK, let's slow down here a bit.
>
> The fail-fast, CME-throwing behavior is primarily of interest when
> iteration is
> external, that is, it's driven by the application making a succession of
> calls
> on the Iterator. The main concern arises when the application makes other
> modifications to the collection being iterated, between calls to the
> Iterator.
>
> There is always the possibility of modification by some other thread, and
> this
> is mentioned obliquely in the spec near "fail-fast", but given memory
> visibility
> issues it's pretty much impossible to make any concrete statements about
> what
> happens if modifications are made by another thread.
>
> In the case of these methods, the iteration occurs entirely within the
> context
> of a single method call. There's no possibility of the application making
> a
> concurrent modification on the same thread. So, while the fail-fast
> behavior is
> being changed, strictly speaking, I consider it "de minimis", as it's
> pretty
> difficult for an application to observe this change in behavior.
>
> Regarding the @implSpec, that applies only to the implementation in
> AbstractCollection, and @implSpec is not inherited. Again, it is a change
> in
> behavior since this method is being changed from being inherited to being
> overridden, but it's not a specification issue.
>
> In any case I don't think the concurrent modification behavior change is
> an
> issue to be concerned about.
>
> **
>
> Now, regarding visible changes in behavior, it's quite easy to observe
> this
> change in behavior, at least from a subclass of HashSet. A HashSet
> subclass
> could override the iterator() method and detect that it was called when
> toArray() is called. With this change, toArray() is overridden, and so
> iterator() would no longer be called in this circumstance.
>
> This kind of change is generally allowable, but we have had complaints
> about the
> pattern of self-calls changing from release to release. This isn't a
> reason NOT
> to make the change, but the fact is that it does make changes to the
> visible
> behavior, and this potentially does affect actual code. (Code that I'd
> consider
> poorly written, but still.)
>
> I talked this over with Joe Darcy (CSR chair) and we felt that it would be
> prudent to file a CSR a request to document the behavior change.
>
> **
>
> Some comments on the code.
>
> Overall I think the changes are going in the right direction. It's amazing
> that
> after all this time, there are still cases in core collections that are
> inheriting the slow Abstract* implementations.
>
> In HashMap, I think the factoring between keysToArray() and prepareArray()
> can
> be improved. The prepareArray() method itself is sensible, in that it
> implements
> to weird toArray(T[]) semantics of allocating a new array if the given one
> is
> too short, and it puts a 'null' at the right place if the given array is
> too
> long, and returns the possibly reallocated array.
>
> The first thing that keysToArray() does is to call prepareArray() and use
> its
> return value.  What concerns me is that the no-arg toArray() method does
> this:
>
>      return keysToArray(new Object[size]);
>
> so the array is created with exactly the right size; yet the first thing
> keysToArray() does is to call prepareArray(), which checks the size again
> and
> determines that nothing need be done.
>
> It would be a small optimization to shift things around so that
> keysToArray()
> takes an array known to be "prepared" already and to remove its call to
> prepareArray(). Then, require the caller to call prepareArray() if
> necessary.
> Then we'd have this:
>
>   953  public Object[] toArray() { return keysToArray(new Object[size]); }
>   954  public <T> T[] toArray(T[] a) { return
> keysToArray(prepareArray(a)); }
>
> I'm not terribly concerned about avoiding an extra method call. But other
> code
> in this file, for example like the following:
>
>   930  Node<K,V>[] tab;
>   931  int idx = 0;
>   932  if (size > 0 && (tab = table) != null) {
>
> is written with an embedded assignment to 'tab', probably in order to
> avoid a
> useless load of the 'table' field if the size is zero. (This style occurs
> in
> several places.) So, if this code is making this level of
> micro-optimizations
> already, let's not waste it by calling extra methods unnecessarily. (I
> don't
> know if this will impact the benchmarks though.)
>
> Aside from performance, refactoring keysToArray/prepareArray this way
> makes more
> sense to me anyway.
>
> Similar adjustments to the call sites within HashSet and LinkedHashMap
> would
> need to be done.
>
> I'd recommend adding doc comments to HashMap prepareArray() and
> keysToArray(),
> since they are either called from or overridden from outside this file. It
> needn't be a full-on spec comment, but enough to make it clear that other
> things
> are depending on these internal interfaces.
>
> Thanks,
>
> s'marks
>