[External] : Re: New candidate JEP: 431: Sequenced Collections

Stuart Marks stuart.marks at oracle.com
Sat Oct 29 00:16:06 UTC 2022



On 10/18/22 8:04 AM, forax at univ-mlv.fr wrote:
>> Introduction of new types always poses a dilemma for library maintainers. They can
>> move forward aggressively and embrace new features, or they can remain on older
>> releases in order to reach a broader audience. That's not a reason to avoid
>> introducing new types.
> 
> I agree in the absolute, but in this specific case you want to introduce interfaces deep into the existing hierarchy of interfaces, not at the top.
> The introduction of CharSequence was painful and very slow: that interface was added in 1.4, but we had to wait until Java 9, more than 10 years later, to have parseInt() that takes a CharSequence.
> As an application developer, a decade is the kind of time I will have to wait before I can use a sequenced collection.
> 
> Adding default methods is far faster in terms of adoption; computeIfAbsent() and removeIf() were available at the time of the 1.8 release.
> Default methods have issues too; they only work if a good default implementation exists.

You seem to be talking about introduction of new types in opposition to introduction 
of default methods. This proposal does both. Clearly if you're using an old library, 
you can't use new types with it. But if the old library consumes or produces, say, a 
List, new applications can definitely use all the default methods added to List by 
this proposal.
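
For instance (a minimal sketch; namesFromOldLibrary() is a hypothetical stand-in for 
an old API that returns a plain List, and getFirst()/getLast()/reversed() are the 
default methods the proposal adds to List):

    import java.util.List;

    public class SequencedDemo {
        // Hypothetical old library method that knows nothing about sequenced
        // collections and simply returns a List.
        static List<String> namesFromOldLibrary() {
            return List.of("alpha", "beta", "gamma");
        }

        public static void main(String[] args) {
            List<String> names = namesFromOldLibrary();
            // New default methods on List, per the proposal:
            System.out.println(names.getFirst());   // alpha
            System.out.println(names.getLast());    // gamma
            System.out.println(names.reversed());   // [gamma, beta, alpha]
        }
    }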

We've previously discussed adding new default methods (getFirst, reversed) to some 
existing type (like Collection) instead of on a new type. The problem is that they 
appear on all collections, even those that have no encounter order. Their 
specifications would thus be so weak as to be meaningless. That's the justification 
for adding a new type: to give these methods meaning.
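
For reference, this is roughly the shape of the proposed interface, simplified from 
the draft (the spec linked below is authoritative; SequencedSet and SequencedMap are 
omitted here). The defaults only make sense because the type's contract guarantees 
an encounter order:

    import java.util.Collection;
    import java.util.Iterator;

    interface SequencedCollection<E> extends Collection<E> {
        // A reverse-ordered view of this collection.
        SequencedCollection<E> reversed();

        default void addFirst(E e) { throw new UnsupportedOperationException(); }
        default void addLast(E e)  { throw new UnsupportedOperationException(); }

        // Meaningful only because encounter order is part of the contract.
        default E getFirst() { return this.iterator().next(); }
        default E getLast()  { return this.reversed().iterator().next(); }

        default E removeFirst() {
            Iterator<E> it = this.iterator();
            E e = it.next();
            it.remove();
            return e;
        }

        default E removeLast() {
            Iterator<E> it = this.reversed().iterator();
            E e = it.next();
            it.remove();
            return e;
        }
    }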

>>> - LinkedHashMap can be tweaked in two ways, by passing an access order as the 3rd
>>> parameter of the constructor or by overriding removeEldestEntry(); in both cases
>>> the resulting LinkedHashMap is at odds with the contract of SequencedMap, but
>>> the compiler will happily allow those LinkedHashMaps to be seen as SequencedMaps.
>>
>> SequencedMap specifies that entries have an encounter order but it doesn't make any
>> requirements on how that order is established. LinkedHashMap's access-order mode is
>> unusual in that it specifies that specific methods additionally have the side effect
>> of reordering the relevant entry. The removeEldestEntry() method modifies the
>> behavior of mapping-insertion methods to optionally remove the first ("eldest")
>> entry. Neither of these behaviors conflicts with anything in SequencedMap.
> 
> It conflicts with the idea of the first element (or last element)
>    SequencedSet set = ...
>    var first = set.getFirst();
>    assertNotNull(first);
>    set.add("foo");
>    var first2 = set.getFirst();
>    assertEquals(first, first2);   // should always be true, right ?

(LinkedHashSet doesn't have a removeEldestEntry() method; that's only on 
LinkedHashMap. But let's consider the example to be rewritten as if it were about 
LinkedHashMap, or a SequencedSet built from the keySet of a LinkedHashMap.)

See draft specification here:

https://github.com/openjdk/jdk/pull/7387/files#diff-be823ac1fb09a9858a40bdd29359911e534f5d9d3079e69ed31a34e37f72c7a0

There is a definition of encounter order, but there is no specification of how the 
encounter order is established or how it's modified. That's left to implementations 
and subtypes.

This example 1) gets the first element, 2) adds another element, and 3) gets the 
first element again. Step 2 is a modification to the collection, so there's no basis 
for asserting that the element obtained in step 1 always equals the element obtained 
in step 3.

In particular, LinkedHashMap's "removeEldestEntry" feature says that add() 
potentially has additional side effects beyond adding the argument element, 
including of course removing the first element.
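
A small illustration (the 2-entry bound is arbitrary, just to make the eviction 
visible):

    import java.util.LinkedHashMap;
    import java.util.Map;

    Map<String, Integer> cache = new LinkedHashMap<>(16, 0.75f, false) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
            return size() > 2;   // evict the first ("eldest") entry beyond 2 entries
        }
    };
    cache.put("a", 1);
    cache.put("b", 2);
    // the first entry is a=1
    cache.put("c", 3);   // side effect: a=1 is evicted
    // the first entry is now b=2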

Also, in the example, if SequencedSet is a TreeSet<String> containing "x", then 
adding "foo" quite sensibly changes the first element between steps 1 and 3.

>>> - LinkedHashMap/LinkedHashSet are dinosaurs; there are more efficient
>>> implementations of the concepts of ordered set / ordered map, both in terms of
>>> memory footprint and runtime execution, so adding new interfaces without
>>> exploring new implementations using Valhalla value classes and in relation to
>>> the Collection Literal JEP seems premature to me.
>>
>> LinkedHashMap and LinkedHashSet are in fact used fairly frequently. They're not used
>> as much as HashMap, but they're used more than ArrayDeque or TreeMap. Presumably
>> this is because people find LinkedHashMap/Set useful and that the overhead compared
>> to their non-linked counterparts is acceptable for the additional services they
>> provide.
>>
>> The performance and footprint of LinkedHashMap/Set are not impacted either way by
>> the JEP, nor are any potential future performance improvements (including those that
>> might arise because of Valhalla) precluded by the JEP.
> 
> I don't disagree with anything you say above; I'm saying that with Valhalla we need to explore implementations of HashMap based on open addressing to get better cache performance.
> So adding addFirst() to LinkedHashMap now is a risk.

As the LinkedHashMap implementation stands today (prior to JEP 431) it needs to 
perform a variety of operations that affect the order of the elements. The 
addFirst() method is a new operation, but it's only a small amount of additional 
work, given that there are already other operations that operate on both ends.

In a hypothetical future open-addressed Map, there would need to be some additional 
data structure (likely an array?) that maintains the encounter order. This structure 
would need to support various element shifting and copying operations to maintain 
the ordering; consider what would need to happen with a get() operation on an 
access-ordered map. I don't see any problem with implementing an addFirst() 
operation on this structure.
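
Here's a toy sketch of that idea. It is deliberately not an open-addressed table (a 
plain HashMap stands in for the hash structure); the point is only the separate 
order structure and the shifting/copying that operations like putFirst imply:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model: mappings live in a hash structure, encounter order in a
    // separate list of keys.
    class OrderedMapSketch<K, V> {
        private final Map<K, V> mappings = new HashMap<>();
        private final List<K> order = new ArrayList<>();

        void putLast(K key, V value) {
            if (mappings.put(key, value) != null) {
                order.remove(key);       // existing key: move it to the end
            }
            order.add(key);
        }

        void putFirst(K key, V value) {
            if (mappings.put(key, value) != null) {
                order.remove(key);       // existing key: move it to the front
            }
            order.add(0, key);           // shifts the other keys to the right
        }

        K firstKey() { return order.get(0); }
        K lastKey()  { return order.get(order.size() - 1); }
    }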

> Also, addFirst on a sequenced collection is a dangerous method, exactly like List.get() is: depending on the implementation, the worst-case complexity can be either O(1) or O(n).
> I believe we will all live in a better place if we do not add new methods with such a "complexity range".

Not the same at all. The problem with LinkedList.get(i) is that people put it in a 
for-loop over int indexes, which seems like it should be O(n), but as we know it's 
O(n^2). That occurs a lot more frequently than indexed add() or remove().
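
That is, the problematic pattern is something like:

    import java.util.LinkedList;
    import java.util.List;

    List<Integer> list = new LinkedList<>();
    for (int i = 0; i < 100_000; i++) {
        list.add(i);
    }

    long sum = 0;
    for (int i = 0; i < list.size(); i++) {
        sum += list.get(i);   // O(i) on a LinkedList, so the whole loop is O(n^2)
    }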

s'marks


