Comments (4)
Sounds good. With regard to benches, I'd recommend criterion
instead of nightly benches. I want to expose the entire surface of the library, including benchmarking, to end-users without requiring a nightly toolchain.
from counter-rs.
Refactoring the code is fine, to an extent. This is very much a style thing, and I don't want to pre-approve anything that I'll regret later on. That said, I agree that 2k lines is too big for a well-factored source file.
I'm willing to look at a PR adding Rayon, but that PR should include some benchmarks showing at what data magnitude the feature is justified, and documentation exposing that information to end users. Rayon is sometimes a game-changer, but you can't just naively throw it at problems and expect to see an improvement.
FWIW: In the context of
let mut items = self
.map
.iter()
.map(|(key, count)| (key.clone(), count.clone()))
.collect::<Vec<_>>();
I would be extremely surprised to discover that par_iter
performed better under any circumstance. That said, I'm willing to be convinced by a well-crafted benchmark.
Either way, a refactor is a wholly separate concern from adding feature-gated Rayon support, so these should be two distinct PRs.
from counter-rs.
I agree with your points.
Here is my plan
- prepare a separate PR for refactoring. A lot of it will be stylistic choices, and finding an acceptable solution will help me understand what you envision for the repo.
- when 1. is done, add benches against the current version of the code. It is likely that this will be a separate PR since there are some opinionated choices to be made such as
nightly
#[bench]
vscriterion
. The data i am planning to handle are in million/billion scale (LLM dataset deduplication/counting) I'll likely add benches that go that high. - Prepare a
rayon
PR that includes rayon integration and documentation, with further benches as identified as necessary.
from counter-rs.
1.
is started in #36
from counter-rs.
Related Issues (14)
- Use defaultmap crate HOT 3
- Most common should take a number HOT 2
- Add/Sub clone self unnecessarily HOT 7
- is_subset / is_superset methods? HOT 8
- impl<T> FromIterator<(T, usize)> for Counter<T>? HOT 2
- Suggestion: add 'multiset' and 'bag' tags to repository, README and Cargo.toml
- Suggestion: make Counter generic over the hashing algorithm HOT 2
- min count + into HOT 2
- Is the `zero` instance member still required?
- Counter doesn't derive Debug HOT 1
- `impl PartialOrd` via `is_{sub,super}set` HOT 1
- Make Counter generic over the count type? HOT 5
- Add into_hashmap function
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from counter-rs.