Bag (Multiset) Collection
ІП-24 Олександр Ротань
rotan.olexandr at gmail.com
Sat Apr 20 22:21:24 UTC 2024
Also, after some research, I found that HashMultisets also have some area
of application in image detection and event simulations
вс, 21 апр. 2024 г. в 01:19, ІП-24 Олександр Ротань <
rotan.olexandr at gmail.com>:
> Sorry for duplicate, I accidentally sent last message only to you
>
> вс, 21 апр. 2024 г. в 01:19, ІП-24 Олександр Ротань <
> rotan.olexandr at gmail.com>:
>
>> Firstly, about queues. As far as I know, priority queue is implemented as
>> heap, not tree, so things like subQueue doesn't make much sense in this
>> context.
>>
>> About popularity: this is indeed not very popular, might even be
>> negligible outside of sorted multiset (lets adopt java naming conventions
>> for those from now on). Although, I could provide at least few examples:
>>
>> 1. Batch tasks processing, especially in concurrent environments, when
>> tasks could have different priorities. Sometimes tasks that are stored in a
>> queue are processed in batches. Using today's stdlib of Java,
>> acquiring such a batch of tasks from collection would be a O(n * log n)
>> complexity, because each heapification is log n and called after each task
>> is popped. Using TreeMultiset, this could be achieved in log (n). This also
>> applies to messaging.
>>
>> 2. Implementing custom data structures. Not a task most of commercial
>> developers encounter, although for specific types of tasks, like big data
>> processing, custom datastructs could be a thing. Some collections,
>> expeciall ordered, like B-tree, could require sorted multiset as part of
>> internal implementation.
>>
>> 3. Again, for sorted multisets, counting frequency is a common task for
>> commercial developers. Counting element frequencies using multisets is the
>> thing that simply couldn't be achieved by set and is inefficient in list.
>> This could have various use cases, from data displaying in histograms in
>> generated reports to data validation is stateful systems (checking if user
>> has not exceeded permitted amount of attempts, for example).
>>
>> 4. Storage for the result of db queries. Another common task is to fetch
>> some ordered data from the database, and then modify it to preserve order.
>> This category of tasks could be optimized using ordered multisets.
>>
>> 5. Concurent bag, as you just mentioned. This could be used in concurrent
>> environment as I mentioned above, or just as less restricting
>> concurrent collection as preserving sequence of elements could be a complex
>> and resource consuming task in mutable concurrent collections
>>
>> As you might see, almost all of those indeed converge to a sorted
>> collection of duplicating elements. I am not sure If there is much more to
>> it, but what i would like to say is that feature like this would definitely
>> not be redundant for "enterprise" language like Java, where optimizing some
>> operations from O(n) or even O(n * log n) to O (log n) could introduce
>> significant performance improvement, as well as reducing amount of possible
>> bugs by taking on some of responsibilities related to preserving order of
>> elements sorted.
>>
>> вс, 21 апр. 2024 г. в 00:33, Holo The Sage Wolf <holo3146 at gmail.com>:
>>
>> вс, 21 апр. 2024 г. в 01:06, ІП-24 Олександр Ротань <
>> rotan.olexandr at gmail.com>:
>>
>>> Firstly, about queues. As far as I know, priority queue is implemented
>>> as heap, not tree, so things like subQueue doesn't make much sense in this
>>> context.
>>>
>>> About popularity: this is indeed not very popular, might even be
>>> negligible outside of sorted multiset (lets adopt java naming conventions
>>> for those from now on). Although, I could provide at least few examples:
>>>
>>> 1. Batch tasks processing, especially in concurrent environments, when
>>> tasks could have different priorities. Sometimes tasks that are stored in a
>>> queue are processed in batches. Using today's stdlib of Java,
>>> acquiring such a batch of tasks from collection would be a O(n * log n)
>>> complexity, because each heapification is log n and called after each task
>>> is popped. Using TreeMultiset, this could be achieved in log (n). This also
>>> applies to messaging.
>>>
>>> 2. Implementing custom data structures. Not a task most of commercial
>>> developers encounter, although for specific types of tasks, like big data
>>> processing, custom datastructs could be a thing. Some collections,
>>> expeciall ordered, like B-tree, could require sorted multiset as part of
>>> internal implementation.
>>>
>>> 3. Again, for sorted multisets, counting frequency is a common task for
>>> commercial developers. Counting element frequencies using multisets is the
>>> thing that simply couldn't be achieved by set and is inefficient in list.
>>> This could have various use cases, from data displaying in histograms in
>>> generated reports to data validation is stateful systems (checking if user
>>> has not exceeded permitted amount of attempts, for example).
>>>
>>> 4. Storage for the result of db queries. Another common task is to fetch
>>> some ordered data from the database, and then modify it to preserve order.
>>> This category of tasks could be optimized using ordered multisets.
>>>
>>> 5. Concurent bag, as you just mentioned. This could be used in
>>> concurrent environment as I mentioned above, or just as less restricting
>>> concurrent collection as preserving sequence of elements could be a complex
>>> and resource consuming task in mutable concurrent collections
>>>
>>> As you might see, almost all of those indeed converge to a sorted
>>> collection of duplicating elements. I am not sure If there is much more to
>>> it, but what i would like to say is that feature like this would definitely
>>> not be redundant for "enterprise" language like Java, where optimizing some
>>> operations from O(n) or even O(n * log n) to O (log n) could introduce
>>> significant performance improvement, as well as reducing amount of possible
>>> bugs by taking on some of responsibilities related to preserving order of
>>> elements sorted.
>>>
>>> вс, 21 апр. 2024 г. в 00:33, Holo The Sage Wolf <holo3146 at gmail.com>:
>>>
>>>> Let me prefix this by saying that I'm a mathematician, so maybe the
>>>> lingo I use is a bit different. A multi set has no structure apart from
>>>> "counting duplication", just like a set has no structure at all, ordered
>>>> set **is not a set** and ordered multi set **is not a multi set**.
>>>>
>>>> With the clarification you gave I understand what you mean and retract
>>>> my first question (although I believe that "sortedX" or "priorityX" are the
>>>> naming Java has for what you called "orderedX").
>>>>
>>>> > THe whole point of the bag is to store duplicate elements without
>>>> preserving order, which allows things like ordered collections.
>>>>
>>>> No, the whole point of the bag is to store elements. The difference
>>>> between a bag and a list is that a bag has less, anything you can do with a
>>>> bag you can do with a list.
>>>>
>>>> The benefit of having a bag is to have a more restrictive interface for
>>>> the times you don't need the extra structure, as well as the possible
>>>> performant implementation that doesn't need to respect a stricted contract.
>>>>
>>>> CopyOnWriteArratList is more than just a bag.
>>>>
>>>>
>>>> > then that just adds one more use case to potential Bag interface
>>>>
>>>> Indeed, I gave concurrent bag as an example of a possible
>>>> implementation of a bag interface, but you didn't answer my main question,
>>>> just how common is a use if a bag to change the std for it?
>>>>
>>>> To me it sounds like you are interested only on the sorted/navigable
>>>> variations, so an alternative solution will be to improve PriorityQueue to
>>>> have a "subQueue" methods (and friends) as well as extending it to a
>>>> Navigable variation.
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, 21 Apr 2024, 00:01 ІП-24 Олександр Ротань, <
>>>> rotan.olexandr at gmail.com> wrote:
>>>>
>>>>> Concurrent Bag is something like CopyOnWriteArrayList or Vector, don't
>>>>> we have it already? (I know vectors aren't optimized). THe whole point of
>>>>> the bag is to store duplicate elements without preserving order, which
>>>>> allows things like ordered collections. There could be mutable collections
>>>>> for concurrency optimized with read-write locks though, if that is what you
>>>>> are referring to as "concurrent bag", and I am missing the point why this
>>>>> can't be a part of list interface, then that just adds one more use case to
>>>>> potential Bag interface.
>>>>>
>>>>> PS: Ordered means it preserves order based on comparator, what you are
>>>>> referring to as ordered essentially means sequential, or am I missing a
>>>>> point?
>>>>>
>>>>> сб, 20 апр. 2024 г. в 23:52, Holo The Sage Wolf <holo3146 at gmail.com>:
>>>>>
>>>>>> By "ordered collection" you mean "unordered collection"?
>>>>>>
>>>>>> How common is it to actually use a multiset in general? Of course
>>>>>> there are use cases, and there are areas in which it is used a lot, but I
>>>>>> don't believe it is common enough to be part of the std with the one
>>>>>> exception of concurrent bag.
>>>>>>
>>>>>> On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, <
>>>>>> rotan.olexandr at gmail.com> wrote:
>>>>>>
>>>>>>> In this letter I would like to express some of my thoughts regarding
>>>>>>> the potential Multiset interface.
>>>>>>>
>>>>>>> I, personally, have encountered a few situations where such an
>>>>>>> interface could come in handy, mostly when I needed an ordered collection
>>>>>>> that permits duplicates. That time I used guava`s TreeMultiset, but I think
>>>>>>> Java itself should have such a collection in its std library. While it is
>>>>>>> not a very common problem, there are still a bunch of use cases where such
>>>>>>> things could come in handy.
>>>>>>>
>>>>>>> I am willing to take on development of such thing, but there are a
>>>>>>> few concerns about Multiset:
>>>>>>>
>>>>>>> 1. Is there any other use for this besides ordered collection that
>>>>>>> permits duplicates? I can't remember anything else from the top of my head.
>>>>>>>
>>>>>>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
>>>>>>> class hierarchy, while not being directly connected. I think introducing
>>>>>>> any other ordered collection will require some refactoring of existing
>>>>>>> collection interfaces (for example extract SortedCollection from SortedSet,
>>>>>>> Navigable Collection from NavigableSet etc.). From the perspective of clean
>>>>>>> code, this would be the right decision, but I feel like this will be a very
>>>>>>> complex task to accomplish.
>>>>>>>
>>>>>>> 3. Maybe there should be few versions of Tree collection (for
>>>>>>> example, for regular tasks red-black tree and B-Tree for large amounts of
>>>>>>> data). I have some expirience implementing both, but is it really needed in
>>>>>>> standard library?
>>>>>>>
>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240421/e006929e/attachment-0001.htm>
More information about the core-libs-dev
mailing list