Giter VIP home page Giter VIP logo

Comments (4)

coriolinus avatar coriolinus commented on May 26, 2024 1

Sounds good. With regard to benches, I'd recommend criterion instead of nightly benches. I want to expose the entire surface of the library, including benchmarking, to end-users without requiring a nightly toolchain.

from counter-rs.

coriolinus avatar coriolinus commented on May 26, 2024

Refactoring the code is fine, to an extent. This is very much a style thing, and I don't want to pre-approve anything that I'll regret later on. That said, I agree that 2k lines is too big for a well-factored source file.

I'm willing to look at a PR adding Rayon, but that PR should include some benchmarks showing at what data magnitude the feature is justified, and documentation exposing that information to end users. Rayon is sometimes a game-changer, but you can't just naively throw it at problems and expect to see an improvement.

FWIW: In the context of

    let mut items = self
        .map
        .iter()
        .map(|(key, count)| (key.clone(), count.clone()))
        .collect::<Vec<_>>();

I would be extremely surprised to discover that par_iter performed better under any circumstance. That said, I'm willing to be convinced by a well-crafted benchmark.

Either way, a refactor is a wholly separate concern from adding feature-gated Rayon support, so these should be two distinct PRs.

from counter-rs.

chris-ha458 avatar chris-ha458 commented on May 26, 2024

I agree with your points.
Here is my plan

  1. prepare a separate PR for refactoring. A lot of it will be stylistic choices, and finding an acceptable solution will help me understand what you envision for the repo.
  2. when 1. is done, add benches against the current version of the code. It is likely that this will be a separate PR since there are some opinionated choices to be made such as nightly #[bench] vs criterion. The data i am planning to handle are in million/billion scale (LLM dataset deduplication/counting) I'll likely add benches that go that high.
  3. Prepare a rayon PR that includes rayon integration and documentation, with further benches as identified as necessary.

from counter-rs.

chris-ha458 avatar chris-ha458 commented on May 26, 2024

1. is started in #36

from counter-rs.

Related Issues (14)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.