Comments (4)
Sorry that was a typo, I forgot the logarithm.
One implementation is to keep a min heap of fixed size k. Insert every (item, count) pair into it and remove the minimum whenever the heap size exceeds k. Inserting and removing in this heap is O(log k) since its fixed to size k. Handling ties is tricky in this implementation but can be done.
Another implementation is to heapify all items into a max heap, and then pop the max until we're done. This approach makes it easier to handle ties, since they come at the end. Heapify is O(n). Popping the max from a max heap is O(log n). So overall, this approach is O(k log n).
from eclipse-collections.
the topOccurrences() method has an inefficient implementation that ought to be optimized. The current implementation sorts, as you said, making it O(n log n). The selection algorithm is O(n) and sorting would take O(k log k) in the absence of ties.
from eclipse-collections.
I took a look at Python's most_common
and it takes the front element from a heap k times, making it O(n * k). This is a nice compromise in that it's faster than our implementation, but still pretty easy to implement.
from eclipse-collections.
I took a look at Python's
most_common
and it takes the front element from a heap k times, making it O(n * k). This is a nice compromise in that it's faster than our implementation, but still pretty easy to implement.
That's only decent if k < log(n)
. For k near n, it's impossible to do better than sorting (simply because the result has to be sorted) and we don't want O(n^2). So the algorithm needs to be smart based on k vs n.
I wonder how bad it would be to stuff the first k elements in a tree, then remove the worst one as new ones are added (the tree would have to be a list at each node to contain the equals).
from eclipse-collections.
Related Issues (20)
- Update Eclipse Tycho and replace EBR dependency with Orbit HOT 28
- Add a bill of materials (BOM) POM HOT 2
- Why do primitive versions of interfaces do not implement java.util interfaces ? HOT 3
- MultiReaderFastList returning incorrect types for select, selectWith, reject, rejectWith HOT 1
- Unexpected behaviour in updateValue() with empty key HOT 2
- Put AbstractImmutablePrimitiveSetTestCase template in the right folder
- Update the copyrights HOT 1
- Bug in UnifiedMap? HOT 1
- eclipse-collections v12 JDK compatibility support HOT 4
- Interval types do not support reverse ranges for certain helper methods HOT 2
- Add groupByEachUniqueKey to RichIterable
- [OSGi] Opting in to Service Loader Mediator HOT 3
- can not found the GMF HOT 1
- Migrate tests from Junit4 to Junit5
- Perf Issue: toImmutable[List/Set/Bag] methods should be overridden on LazyIterable
- Upgrade Maven version for CodeQL GitHub Action build to 3.9.6
- Fix performance problem in MutableList.subList() / implement similar optimization as ArrayList.subList() HOT 5
- Implement ArrayStack distinct HOT 1
- Maven install failure: java.util.ConcurrentModificationException in Eclipse Collections OSGi Bundle HOT 1
- Optimize `clear` method in `SubList` class
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from eclipse-collections.