Why does Set.of disallow duplicate elements?

dfranken.jdk at gmail.com dfranken.jdk at gmail.com
Tue Feb 2 07:43:12 UTC 2021


Okay, so to summarize this discussion a bit:

Set.of is, like its counterpart Map.ofEntries, only meant to be used
with explicit varargs, not with an array that happens to be accepted as
a varargs parameter; passing an array would only confuse things. If it
is called with explicit arguments, the intention is to catch programming
errors at runtime by throwing an exception on duplicates instead of
silently ignoring
them.
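
To make that concrete, here is a minimal sketch contrasting Set.of with
the HashSet constructor mentioned at the start of the thread. The class
and method names are made up for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DuplicateRejectionDemo {

    // Returns true if Set.of refuses the duplicated element.
    static boolean setOfRejectsDuplicate() {
        try {
            Set.of("hello", "hello");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // The HashSet copy constructor silently drops the duplicate instead,
    // so two inputs end up as a set of size one.
    static int hashSetSize() {
        return new HashSet<>(Arrays.asList("hello", "hello")).size();
    }

    public static void main(String[] args) {
        System.out.println("Set.of rejects duplicate: " + setOfRejectsDuplicate());
        System.out.println("HashSet size: " + hashSetSize());
    }
}
```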

If we were to silently allow duplicates in Set.of you would run into
the situation where the number of varargs passed would not match the
size of the Set and this may also be surprising. Yet we already have
the understanding that Sets eliminate duplicates so this might not be
that big of an issue.

I do like the proposal of having a canonical way to turn an array into
a Set though, but that might be a discussion for another time and maybe
it will be better suited if the arrays themselves are somehow improved.

Thanks for your input,

Dave

On Mon, 2021-02-01 at 16:37 -0800, Stuart Marks wrote:
> Indeed it's the case that a varargs method can't determine whether it
> was called with several explicit arguments or whether it was called
> with an array. However, that doesn't have any bearing on whether or
> not Set.of rejects duplicates.
> 
> The model for Set.of is to support a collection-literal-like syntax
> where the programmer can write an arbitrary number of elements in the
> source code for inclusion in the set. Here's an example (though it
> uses Map.ofEntries instead of Set.of, the same rationale applies):
> 
> 
> Map<String, TokenType> tokens = Map.ofEntries(
>      entry("@",     AT),
>      entry("|",     VERTICAL_BAR),
>      entry("#",     HASH),
>      entry("%",     PERCENT),
>      entry(":",     COLON),
>      entry("^",     CARET),
>      entry("&",     AMPERSAND),
>      entry("|",     EXCLAM),
>      entry("?",     QUESTION),
>      entry("$",     DOLLAR),
>      entry("::",    PAAMAYIM_NEKUDOTAYIM),
>      entry("=",     EQUALS),
>      entry(";",     SEMICOLON)
> );
> 
> 
> This errors out (the key "|" appears twice) instead of silently
> dropping one of the entries.
> 
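A small sketch of the difference, assuming plain String values in place
of the TokenType constants above (which are not shown in this thread).
The class and method names are hypothetical. It contrasts Map.ofEntries
with a HashMap, where a second put on the same key quietly replaces the
first value.

```java
import static java.util.Map.entry;

import java.util.HashMap;
import java.util.Map;

class DuplicateKeyDemo {

    // Map.ofEntries throws when the same key is supplied twice.
    static boolean ofEntriesRejectsDuplicateKey() {
        try {
            Map.ofEntries(
                entry("|", "VERTICAL_BAR"),
                entry("&", "AMPERSAND"),
                entry("|", "EXCLAM"));
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // With HashMap.put the later value overwrites the earlier one, so the
    // mistake goes unnoticed.
    static String hashMapKeepsLastValue() {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("|", "VERTICAL_BAR");
        tokens.put("|", "EXCLAM");
        return tokens.get("|");
    }

    public static void main(String[] args) {
        System.out.println(ofEntriesRejectsDuplicateKey());
        System.out.println(hashMapKeepsLastValue());
    }
}
```
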
> As an optimization, the API provides several fixed-arg overloads of
> Set.of. With few arguments, the fixed-arg methods are called. If more
> arguments are added, at a certain point it transparently switches to
> the varargs form. "Transparently" means that you can't tell (without
> counting the arguments) whether a fixed-arg or varargs form of Set.of
> will be called. You don't want the duplicate rejection semantics to
> change if you add or remove an argument that happens to cross the
> fixed/varargs threshold. Thus, Set.of rejects duplicates, whether in
> fixed or varargs form.
> 
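As a rough illustration of that consistency: in current JDKs the
fixed-arg overloads of Set.of go up to ten elements, so a call with
eleven arguments resolves to the varargs form. The sketch below (class
and helper names are made up) checks that both forms reject a duplicate.

```java
import java.util.Set;

class OverloadThresholdDemo {

    // Runs the given factory call and reports whether it threw
    // IllegalArgumentException.
    static boolean rejects(Runnable factoryCall) {
        try {
            factoryCall.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Two arguments: resolved to a fixed-arg overload, Set.of(E, E).
    static boolean fixedArgRejects() {
        return rejects(() -> Set.of(1, 1));
    }

    // Eleven arguments: one more than the largest fixed-arg overload,
    // so this resolves to the varargs form, Set.of(E...).
    static boolean varargsRejects() {
        return rejects(() -> Set.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1));
    }

    public static void main(String[] args) {
        System.out.println("fixed-arg rejects: " + fixedArgRejects());
        System.out.println("varargs rejects:   " + varargsRejects());
    }
}
```
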
> Set.copyOf(Arrays.asList(...)) is the best way to deduplicate an
> explicit list of elements into a set.
> 
> s'marks
> 
> 
> 
> 
> On 2/1/21 3:01 PM, Aaron Scott-Boddendijk wrote:
> >   Dave,
> > 
> > > > Dave said...
> > > > Okay, I understand this reasoning, but when you want to
> > > > construct a Set from an array, you might be tempted to use
> > > > Set.of(...) because it looks like it supports an array and
> > > > indeed, you can do Set.of(new int[] {1, 2 }) I believe?
> > > > 
> > > > Maybe this is just a quirk because of how varargs work.
> > 
> > > Rémi said...
> > > Set.of(int[]) will call Set.of(E) with E being an int[].
> > > but
> > > Set.of(new Integer[] { ... }) calls Set.of(...).
> > > 
> > > Yes, exactly, it's a known issue with varargs: you have no way to
> > > say "I don't want this varargs to be called with an array".
> > 
> > I think the confusion is the interaction of boxing and varargs.
> > 
> > > List<Integer> list = List.of(1, 2);
> > 
> > is actually, once auto-boxing is applied by the compiler, executed
> > as...
> > 
> > > List<Integer> list = List.of(Integer.valueOf(1),
> > > Integer.valueOf(2));
> > 
> > So the equivalent explicit array form should use `Integer[]` not
> > `int[]`...
> > 
> > > Integer[] numbers = new Integer[] {1, 2};
> > > List<Integer> list = List.of(numbers);
> > 
> > Interestingly, if you actually wanted a `List<Integer[]>` you would
> > then
> > need to say
> > 
> > > Integer[] numbers = new Integer[] {1, 2};
> > > List<Integer[]> list = List.<Integer[]>of(numbers);
> > 
> > Which explicitly tells the compiler what the type arguments are for
> > this invocation of the generic method 'of' (rather than allowing it
> > to use type inference).
> > 
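The three cases discussed here can be checked with a short sketch. The
class and method names are illustrative only. The element type of each
result shows which overload the compiler picked.

```java
import java.util.List;
import java.util.Set;

class VarargsArrayDemo {

    // int[] cannot be an E[] (E must be a reference type), so this is
    // Set.of(E) with E = int[]: one element, the array itself.
    static int sizeFromPrimitiveArray() {
        Set<int[]> set = Set.of(new int[] {1, 2});
        return set.size();
    }

    // Integer[] matches the varargs parameter E... with E = Integer,
    // so the array's contents become the elements.
    static int sizeFromBoxedArray() {
        Set<Integer> set = Set.of(new Integer[] {1, 2});
        return set.size();
    }

    // An explicit type argument forces E = Integer[], which selects the
    // single-element overload again.
    static int sizeWithTypeArgument() {
        Integer[] numbers = {1, 2};
        List<Integer[]> list = List.<Integer[]>of(numbers);
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeFromPrimitiveArray());
        System.out.println(sizeFromBoxedArray());
        System.out.println(sizeWithTypeArgument());
    }
}
```
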
> > Regarding the use of `Set.copyOf(Arrays.asList(...))`. I do wonder
> > about improving the ceremony (because I agree that we want an obvious
> > way of getting immutable Sets from non-unique inputs) by following
> > the pattern presented in Optional (`Optional.of` and
> > `Optional.ofNullable`) and providing `Set.of` and `Set.ofMaybeUnique`
> > (better name needed - 'ofOptionallyUnique'?) - to which the
> > implementation could just be `Set.copyOf(Arrays.asList(args))`
> > (unless a more efficient path proves valuable).
> > 
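A sketch of what such a factory might look like as a plain static
helper. Note that `ofMaybeUnique` is only the name floated above and
does not exist in the JDK. The holder class `SetFactories` is likewise
hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

class SetFactories {

    // Hypothetical helper: like Set.of, but duplicates are tolerated and
    // collapsed. Null elements are still rejected, because Set.copyOf
    // throws NullPointerException for them.
    @SafeVarargs
    static <E> Set<E> ofMaybeUnique(E... elements) {
        return Set.copyOf(Arrays.asList(elements));
    }

    public static void main(String[] args) {
        Set<String> words = ofMaybeUnique("hello", "hello", "world");
        System.out.println(words.size());
    }
}
```
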
> > `Arrays.asList(...array...)` is not all that expensive. It is _not_
> > an ArrayList but a much simpler type with rather trivial
> > implementations for most methods (and 'always throws'
> > implementations for methods that are unsupported). So not only does
> > it mean that there's no copying occurring to make the list but it's
> > even possible that JIT manages enough specialisation and inlining to
> > elide the allocation entirely (though in practice this doesn't occur
> > as often as we might like).
> > 
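The view behaviour described above can be observed directly. The sketch
below (names are illustrative) shows that writes go through to the
array, that structural changes are unsupported, and that the returned
list is not a java.util.ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListViewDemo {

    // No copy is made: setting an element on the list changes the array.
    static boolean writesThroughToArray() {
        String[] array = {"a", "b"};
        List<String> view = Arrays.asList(array);
        view.set(0, "z");
        return array[0].equals("z");
    }

    // The list is fixed-size, so add is one of the 'always throws' methods.
    static boolean addIsUnsupported() {
        List<String> view = Arrays.asList("a", "b");
        try {
            view.add("c");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // The returned type is a private class inside java.util.Arrays.
    static boolean isJavaUtilArrayList() {
        return Arrays.asList("a") instanceof ArrayList;
    }

    public static void main(String[] args) {
        System.out.println(writesThroughToArray());
        System.out.println(addIsUnsupported());
        System.out.println(isJavaUtilArrayList());
    }
}
```
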
> > --
> > Aaron Scott-Boddendijk
> > 
> > On Mon, Feb 1, 2021 at 10:35 AM <forax at univ-mlv.fr> wrote:
> > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > > From: "dfranken jdk" <dfranken.jdk at gmail.com>
> > > > To: "Remi Forax" <forax at univ-mlv.fr>
> > > > Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>
> > > > Sent: Sunday, 31 January 2021 13:54:44
> > > > Subject: Re: Why does Set.of disallow duplicate elements?
> > > 
> > > 
> > > 
> > > 
> > > > > Okay, I understand this reasoning, but when you want to construct
> > > > > a Set from an array, you might be tempted to use Set.of(...)
> > > > > because it looks like it supports an array and indeed, you can do
> > > > > Set.of(new int[] {1, 2 }) I believe?
> > > 
> > > Set.of(int[]) will call Set.of(E) with E being an int[].
> > > but
> > > Set.of(new Integer[] { ... }) calls Set.of(...).
> > > 
> > > 
> > > > > Maybe this is just a quirk because of how varargs work.
> > > 
> > > > Yes, exactly, it's a known issue with varargs: you have no way to
> > > > say "I don't want this varargs to be called with an array".
> > > 
> > > 
> > > > > I wondered if there was a canonical way to create a Set from an
> > > > > array, but couldn't find it, maybe I am missing something?
> > > > > I did notice Arrays.asList exists (which makes sense because it
> > > > > creates an ArrayList backed by the array), but not Arrays.asSet.
> > > 
> > > > asList() reuses the same backing array; you cannot do that with
> > > > asSet(), or contains() would be O(n) in the worst case.
> > > 
> > > 
> > > > > So the way I would create a Set from an array would be either
> > > > > Arrays.stream(myArr).collect(Collectors.toUnmodifiableSet()) or
> > > > > new HashSet<>(Arrays.asList(myArray)) or
> > > > > Set.copyOf(Arrays.asList(myArray)).
> > > 
> > > yes, the last one is the easy way to create an unmodifiable set
> > > from an
> > > array.
> > > 
> > > 
> > > > > I'm not saying the way it is currently implemented is wrong, it's
> > > > > just something which can surprise developers as it surprised me. :)
> > > 
> > > > Arrays are currently second-class citizens in Java, because they
> > > > are always modifiable and always covariant (a String[] can be seen
> > > > as an Object[]).
> > > > We have talked several times about introducing new variants of
> > > > arrays (non-modifiable ones, non-covariant ones, etc.) under the
> > > > name Array 2.0, but Valhalla generics is a blocker for that project.
> > > > Once Valhalla is done, it may be a follow-up.
> > > 
> > > 
> > > > > Kind regards,
> > > > > 
> > > > > Dave
> > > 
> > > 
> > > regards,
> > > Rémi
> > > 
> > > 
> > > > > On Sat, 30 Jan 2021 at 21:30, Remi Forax <forax at univ-mlv.fr> wrote:
> > > > > 
> > > > > > Set.of() is the closest way we've got to a literal Set without
> > > > > > having introduced a special syntax for that in the language.
> > > > > > 
> > > > > > The idea is that if you conceptually want to write
> > > > > > Set<String> set = { "hello", "world" };
> > > > > > instead, you write
> > > > > > Set<String> set = Set.of("hello", "world");
> > > > > > 
> > > > > > In that context, it makes sense to reject a Set constructed
> > > > > > with the same element twice because this is usually a
> > > > > > programming error.
> > > > > > So
> > > > > > Set.of("hello", "hello")
> > > > > > throws an IAE.
> > > > > > 
> > > > > > If you want a Set from a collection of elements, you can use
> > > > > > Set.copyOf(List.of("hello", "hello"))
> > > > > > 
> > > > > > regards,
> > > > > > Rémi
> > > > > > 
> > > > > > ----- Original message -----
> > > > > > > From: "dfranken jdk" <dfranken.jdk at gmail.com>
> > > > > > > To: "core-libs-dev" <core-libs-dev at openjdk.java.net>
> > > > > > > Sent: Saturday, 30 January 2021 19:30:06
> > > > > > > Subject: Why does Set.of disallow duplicate elements?
> > > > > > 
> > > > > > > Dear users,
> > > > > > > 
> > > > > > > While looking at the implementation of Set.of(...) I noticed
> > > > > > > that duplicate elements are not allowed, e.g. Set.of(1, 1)
> > > > > > > will throw an IllegalArgumentException. Why has it been
> > > > > > > decided to do this?
> > > > > > > 
> > > > > > > My expectation was that duplicates would simply be removed.
> > > > > > > 
> > > > > > > If I do for instance new HashSet<>(<collection containing
> > > > > > > duplicates>) it works and duplicates are removed. To me, it
> > > > > > > looks a bit inconsistent to have duplicates removed for a
> > > > > > > collection passed in the constructor, but not for a
> > > > > > > collection (even though it is a vararg array) passed to a
> > > > > > > static factory method.
> > > > > > > 
> > > > > > > Kind regards,
> > > > > > > 
> > > > > > > Dave Franken
> > > 
> > > 
> > > 



