I was thinking about incorporating rayon in certain sections of the code behind a feat

Refactoring & Optional data concurrency about counter-rs HOT 4 OPEN

coriolinus commented on May 26, 2024

Refactoring & Optional data concurrency

from counter-rs.

Comments (4)

coriolinus commented on May 26, 2024

Sounds good. With regard to benches, I'd recommend criterion instead of nightly benches. I want to expose the entire surface of the library, including benchmarking, to end-users without requiring a nightly toolchain.

coriolinus commented on May 26, 2024

Refactoring the code is fine, to an extent. This is very much a style thing, and I don't want to pre-approve anything that I'll regret later on. That said, I agree that 2k lines is too big for a well-factored source file.

I'm willing to look at a PR adding Rayon, but that PR should include some benchmarks showing at what data magnitude the feature is justified, and documentation exposing that information to end users. Rayon is sometimes a game-changer, but you can't just naively throw it at problems and expect to see an improvement.

FWIW: In the context of

    let mut items = self
        .map
        .iter()
        .map(|(key, count)| (key.clone(), count.clone()))
        .collect::<Vec<_>>();

I would be extremely surprised to discover that par_iter performed better under any circumstance. That said, I'm willing to be convinced by a well-crafted benchmark.
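Before reaching for a parallel iterator, it helps to know what the sequential version of the snippet above costs at a given size. The following is a rough, std-only sketch of such a measurement. The names `collect_items` and `time_collect` are hypothetical and not part of counter-rs, and a single `Instant` timing is far noisier than a proper criterion benchmark, so treat it as a sanity check only.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sequential clone-and-collect, mirroring the snippet quoted above.
/// (Hypothetical free function; in the issue this runs on `self.map`.)
pub fn collect_items<K: Clone, N: Clone>(map: &HashMap<K, N>) -> Vec<(K, N)> {
    map.iter()
        .map(|(key, count)| (key.clone(), count.clone()))
        .collect::<Vec<_>>()
}

/// Hypothetical helper: builds a map with `n` distinct keys, then times
/// one call to `collect_items`. Returns the item count and elapsed time.
pub fn time_collect(n: usize) -> (usize, Duration) {
    // Map construction is deliberately excluded from the timed region.
    let map: HashMap<String, usize> = (0..n).map(|i| (format!("key{i}"), i)).collect();

    let start = Instant::now();
    let items = collect_items(&map);
    let elapsed = start.elapsed();

    (items.len(), elapsed)
}
```

Calling `time_collect` with a few magnitudes (for example 1_000, 100_000, and 10_000_000) gives a first impression of how the cost scales, which is the baseline any parallel variant would need to beat.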

Either way, a refactor is a wholly separate concern from adding feature-gated Rayon support, so these should be two distinct PRs.

chris-ha458 commented on May 26, 2024

I agree with your points.
Here is my plan:

1. Prepare a separate PR for refactoring. A lot of it will be stylistic choices, and finding an acceptable solution will help me understand what you envision for the repo.
2. When 1. is done, add benches against the current version of the code. This will likely be a separate PR, since there are some opinionated choices to be made, such as nightly #[bench] vs criterion. The data I am planning to handle is at million/billion scale (LLM dataset deduplication/counting), so I'll likely add benches that go that high.
3. Prepare a rayon PR that includes rayon integration and documentation, with further benches as identified as necessary.
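To make the counting workload in this plan concrete, here is a minimal sketch of chunked parallel counting next to a sequential baseline. It uses `std::thread::scope` as a stand-in so that it has no dependencies. It is not Rayon and not counter-rs code, and the names `count_sequential` and `count_chunked` are hypothetical. A real feature-gated Rayon version would look different, but the shape (count per chunk, then merge) is the part a benchmark would need to justify.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::thread;

/// Sequential baseline: count occurrences of each item.
pub fn count_sequential<T: Hash + Eq + Clone>(items: &[T]) -> HashMap<T, usize> {
    let mut counts = HashMap::new();
    for item in items {
        *counts.entry(item.clone()).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical parallel variant: split the input into roughly `workers`
/// chunks, count each chunk on its own scoped thread, then merge.
pub fn count_chunked<T>(items: &[T], workers: usize) -> HashMap<T, usize>
where
    T: Hash + Eq + Clone + Send + Sync,
{
    let workers = workers.max(1);
    // Ceiling division; clamp to 1 because `chunks(0)` panics.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);

    let partials: Vec<HashMap<T, usize>> = thread::scope(|scope| {
        // Collect the handles first so every thread is spawned before any join.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || count_sequential(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });

    // The merge step is sequential, and is often where the parallel gain is lost.
    let mut total = HashMap::new();
    for partial in partials {
        for (key, n) in partial {
            *total.entry(key).or_insert(0) += n;
        }
    }
    total
}
```

Both functions should return the same map for the same input, so a benchmark can compare them directly across input sizes. For small inputs the thread spawn and merge overhead will likely dominate.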

chris-ha458 commented on May 26, 2024

1. is started in #36
