
Comments (8)

josecelano commented on September 10, 2024

Other potential solutions from ChatGPT that could be combined:

To improve the efficiency and speed of importing statistics from your BitTorrent Tracker to your BitTorrent Index, especially given the large scale of torrents and the potential mismatch between the two, you can consider several strategies. Each of these methods leverages different aspects of system design and algorithmic efficiency:

1. Batch Processing with Enhanced API

  • Idea: Enhance the Tracker's API to support batch processing. This means enabling the API to accept requests for statistics of multiple torrents at once, rather than one at a time.
  • Implementation: Update the Tracker's API to allow for batch requests. You can set a reasonable limit on the number of torrents per request to balance load.
  • Advantages: Reduces the number of API calls drastically, decreasing network overhead and API load.
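A client-side sketch of the batching idea, assuming a hypothetical `GET /v1/torrents/stats?info_hashes=…` batch endpoint on the tracker (the endpoint path and the 50-torrent limit are illustrative, not part of the current API):

```rust
/// Illustrative per-request limit to balance load on the tracker.
const MAX_BATCH_SIZE: usize = 50;

/// Split the full list of info-hashes into batches the tracker can accept.
fn build_batches(info_hashes: &[String], batch_size: usize) -> Vec<Vec<String>> {
    info_hashes
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    let hashes: Vec<String> = (0..120).map(|i| format!("hash{i:04}")).collect();
    let batches = build_batches(&hashes, MAX_BATCH_SIZE);
    // 120 torrents -> 3 requests instead of 120 single-torrent requests.
    assert_eq!(batches.len(), 3);
    for batch in &batches {
        // Hypothetical batch endpoint; the real tracker API is one torrent per call.
        println!("GET /v1/torrents/stats?info_hashes={}", batch.join(","));
    }
}
```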

2. Differential Update Strategy

  • Idea: Instead of importing all torrent statistics every hour, determine which torrents have likely changed and update only those.
  • Implementation: Implement a mechanism (like a timestamp or a change log) on the Tracker to identify torrents that have been updated since the last import. The Index then requests statistics only for these torrents.
  • Advantages: Significantly reduces the amount of data transferred and processed.
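A minimal sketch of the selection step, assuming the tracker keeps a hypothetical `stats_updated_at` change marker (unix seconds) per torrent:

```rust
/// A row from a hypothetical change log kept by the tracker: every time a
/// torrent's stats change, `stats_updated_at` (unix seconds) is bumped.
struct TrackedTorrent {
    info_hash: String,
    stats_updated_at: u64,
}

/// Select only the torrents that changed since the last import run, so the
/// Index requests statistics for those instead of the full catalogue.
fn changed_since(torrents: &[TrackedTorrent], last_import: u64) -> Vec<&str> {
    torrents
        .iter()
        .filter(|t| t.stats_updated_at > last_import)
        .map(|t| t.info_hash.as_str())
        .collect()
}

fn main() {
    let torrents = vec![
        TrackedTorrent { info_hash: "aaa".into(), stats_updated_at: 1_000 },
        TrackedTorrent { info_hash: "bbb".into(), stats_updated_at: 2_000 },
    ];
    // Only "bbb" changed after the last import at t = 1_500.
    assert_eq!(changed_since(&torrents, 1_500), vec!["bbb"]);
}
```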

3. Using Bloom Filters

  • Idea: Use Bloom filters to quickly check if a torrent in the Index is also in the Tracker, and update statistics accordingly.
  • Implementation:
    • The Tracker maintains a Bloom filter of all its torrents.
    • The Index queries this Bloom filter to check if its torrents are in the Tracker before making an API call.
    • Implement a scheduled task to update the Bloom filter periodically.
  • Advantages: Bloom filters are space-efficient and fast for membership checking, which reduces unnecessary API calls for torrents not in the Tracker.
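A minimal std-only Bloom filter illustrating the membership check (a real deployment would likely use an existing crate and tune the bit-array size and hash count to the expected number of torrents):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter: `k` seeded hash functions over a fixed bit array.
struct BloomFilter {
    bits: Vec<bool>,
    k: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, k: u64) -> Self {
        Self { bits: vec![false; num_bits], k }
    }

    /// Compute the `k` bit positions for an item.
    fn indexes(&self, item: &str) -> Vec<usize> {
        (0..self.k)
            .map(|seed| {
                let mut hasher = DefaultHasher::new();
                seed.hash(&mut hasher);
                item.hash(&mut hasher);
                (hasher.finish() as usize) % self.bits.len()
            })
            .collect()
    }

    fn insert(&mut self, item: &str) {
        for i in self.indexes(item) {
            self.bits[i] = true;
        }
    }

    /// May return false positives, but never false negatives: `false` means
    /// the tracker definitely does not have the torrent, so the Index can
    /// skip the API call.
    fn might_contain(&self, item: &str) -> bool {
        self.indexes(item).iter().all(|&i| self.bits[i])
    }
}

fn main() {
    let mut filter = BloomFilter::new(10_000, 3);
    filter.insert("e2467cbf021192c241367b892230dc1e05c0580e");
    assert!(filter.might_contain("e2467cbf021192c241367b892230dc1e05c0580e"));
    // An empty filter never reports a member.
    assert!(!BloomFilter::new(10_000, 3).might_contain("anything"));
}
```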

4. Webhooks or Push Mechanism

  • Idea: Instead of pulling data from the Tracker, have the Tracker push updates to the Index.
  • Implementation: Implement a webhook system in the Tracker that sends updates to the Index whenever torrent statistics change.
  • Advantages: Real-time updates and reduced load on both systems, as data is transferred only when there is a change.
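A sketch of the push model, with an in-process channel standing in for the HTTP webhook so the example stays dependency-free; `StatsUpdate` is an illustrative payload, not an existing type:

```rust
use std::sync::mpsc;
use std::thread;

/// The payload the tracker would POST to an Index webhook whenever a
/// torrent's statistics change.
#[derive(Debug, PartialEq)]
struct StatsUpdate {
    info_hash: String,
    seeders: u32,
    leechers: u32,
}

fn main() {
    let (tx, rx) = mpsc::channel::<StatsUpdate>();

    // "Tracker" side: push only when something actually changed.
    let tracker = thread::spawn(move || {
        tx.send(StatsUpdate { info_hash: "aaa".into(), seeders: 5, leechers: 2 })
            .expect("index receiver dropped");
        // `tx` is dropped here, which closes the channel.
    });

    // "Index" side: apply each update as it arrives; no hourly full import.
    for update in rx {
        println!(
            "updating {}: {} seeders, {} leechers",
            update.info_hash, update.seeders, update.leechers
        );
    }

    tracker.join().expect("tracker thread panicked");
}
```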

5. Database Replication or Shared Access

  • Idea: If both services can securely access a shared database, replication or direct querying could be used.
  • Implementation: Implement a shared database or a replication mechanism where the Tracker updates a shared dataset that the Index can read directly.
  • Advantages: Eliminates the need for API calls and allows the Index to directly query the data it needs.

6. Caching and Incremental Updates

  • Idea: Cache statistics in the Index and only request updates for those statistics.
  • Implementation: Implement a caching layer in the Index that stores the most recent statistics. Periodically, request updates for these statistics from the Tracker.
  • Advantages: Reduces API calls to only those torrents whose cached data is out of date.
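A sketch of the staleness check, using an illustrative `CachedStats` entry stamped with the unix time (seconds) at which it was fetched:

```rust
use std::collections::HashMap;

/// Cached tracker statistics plus the unix time they were fetched.
struct CachedStats {
    seeders: u32,
    leechers: u32,
    fetched_at: u64,
}

/// Return the info-hashes whose cached stats are older than `max_age_secs`;
/// only these need a request to the tracker on the next run.
fn stale_entries(
    cache: &HashMap<String, CachedStats>,
    now: u64,
    max_age_secs: u64,
) -> Vec<String> {
    cache
        .iter()
        .filter(|(_, stats)| now.saturating_sub(stats.fetched_at) > max_age_secs)
        .map(|(info_hash, _)| info_hash.clone())
        .collect()
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("fresh".to_string(), CachedStats { seeders: 3, leechers: 1, fetched_at: 10_000 });
    cache.insert("stale".to_string(), CachedStats { seeders: 9, leechers: 4, fetched_at: 1_000 });
    // With a one-hour (3600 s) max age at t = 10_500, only "stale" needs a refresh.
    assert_eq!(stale_entries(&cache, 10_500, 3_600), vec!["stale".to_string()]);
}
```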

Conclusion

Each of these strategies has its advantages and can be combined for optimal performance. Given that both services are implemented in Rust, you can leverage Rust's performance and concurrency features to implement these solutions efficiently. The choice of strategy will depend on your specific requirements, such as the frequency of updates, the volume of data, and the infrastructure setup of your Tracker and Index.

from torrust-index.

josecelano commented on September 10, 2024

We now have 9945 torrents on the live demo, and we are seeing this nice pattern:

[image]

I suppose it's the importer running every hour.


josecelano commented on September 10, 2024

Hi, today @da2ce7 and I were discussing a solution. @da2ce7 proposed:

  • Make sure we don't overlap executions of the importer. I think the current solution does not overlap executions; it simply waits one hour after finishing the import. If the process takes less than 1 hour, statistics are imported every hour. If it takes longer, for example 3 hours, then statistics are imported every 4 hours (3 hours for the process plus 1 hour waiting for the next tick). I guess the intention was to update statistics at least once an hour, right @WarmBeer? In that case, maybe we could just wait the remaining time between the duration of the import and 1 hour. @WarmBeer @da2ce7 ?
  • We can also improve the logs. See #468 (comment)
  • We should also take advantage of threads. @da2ce7 proposed importing a batch of torrents at the same time using tokio-spawned tasks. Currently we make a single request per torrent. In the future, we could add a new tracker endpoint that returns statistics for more than one torrent at a time.
  • We should use pagination for the query that gets all torrents from the Index. We can take, for example, 50 torrents, import them in parallel, and then continue with the next page (ordered by the implicit DB table order).
  • If there is a problem with the tracker connection while importing a torrent, the current behavior is to just reset the torrent's statistics in the Index and try again in the next import.

This is the current importer loop:
        let interval = std::time::Duration::from_secs(torrent_info_update_interval);
        let mut interval = tokio::time::interval(interval);

        interval.tick().await; // first tick is immediate...

        loop {
            interval.tick().await;

            info!("Running tracker statistics importer ...");

            if let Err(e) = send_heartbeat(importer_port).await {
                error!("Failed to send heartbeat from importer cronjob: {}", e);
            }

            if let Some(tracker) = weak_tracker_statistics_importer.upgrade() {
                // Ignore per-run errors; the next tick retries the import.
                drop(tracker.import_all_torrents_statistics().await);
            } else {
                // The importer was dropped: stop the cronjob loop.
                break;
            }
        }
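
The pagination-plus-parallel-batches idea above could be sketched as follows; `fetch_page` and `import_torrent_statistics` are illustrative stand-ins for the paginated DB query and the per-torrent tracker request, and plain scoped threads stand in for tokio-spawned tasks so the sketch compiles with std alone:

```rust
use std::thread;

const PAGE_SIZE: usize = 50;

/// Stand-in for the paginated DB query (`LIMIT … OFFSET …`, implicit table order).
fn fetch_page(all: &[String], page: usize) -> &[String] {
    let start = (page * PAGE_SIZE).min(all.len());
    let end = (start + PAGE_SIZE).min(all.len());
    &all[start..end]
}

/// Stand-in for the single-torrent statistics request against the tracker API.
fn import_torrent_statistics(info_hash: &str) {
    println!("importing stats for {info_hash}");
}

fn main() {
    let all: Vec<String> = (0..120).map(|i| format!("hash{i:03}")).collect();

    let mut page = 0;
    loop {
        let batch = fetch_page(&all, page);
        if batch.is_empty() {
            break;
        }
        // Import the whole page concurrently (tokio::spawn in the real code;
        // scoped threads here to keep the sketch dependency-free).
        thread::scope(|scope| {
            for info_hash in batch {
                scope.spawn(move || import_torrent_statistics(info_hash));
            }
        });
        page += 1;
    }
}
```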

Relates to: #468 (comment)


josecelano commented on September 10, 2024

It seems that, as the number of torrents increases, the server has more trouble importing all the statistics within 1 hour.

[image]

We have now 16823 torrents.


josecelano commented on September 10, 2024

We have 27570 torrents in the live demo.

[image]

[image]

It seems the server is now always busy.

[image]


josecelano commented on September 10, 2024

By the way, @WarmBeer @da2ce7 @mario-nt, what do you think about my proposed solution 2: importing statistics on the fly?

We are importing statistics for all torrents every hour, even though we may not need them: the Index might have no users, or the users might only be interested in 20% of the torrents. As far as I know, we only show statistics on the list and details pages.

We could:

  • Remove statistics from the Index database.
  • Import statistics only when we need them (list and detail pages).
  • We can use an in-memory cache valid for one hour. Before fetching statistics from the tracker, we check whether the cache has fresh data (less than 1 hour old).

Pros:

  • If the Index does not have users we don't overload the Tracker.
  • We only get the data we use. If only 20% of the torrents are listed or viewed in detail we only import statistics for those torrents.

Cons:

  • Assuming high load on the Index and users equally interested in 100% of the torrents, responses take longer to build because they are no longer a direct SQL query. But this should not be too slow.
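
The on-the-fly approach could be sketched with an illustrative read-through cache with a one-hour TTL; `Stats`, `StatsCache`, and the `fetch` callback are hypothetical names, with the callback standing in for the tracker API call made when rendering the list or details page:

```rust
use std::collections::HashMap;

const TTL_SECS: u64 = 3_600; // cached stats are considered fresh for one hour

#[derive(Clone, Copy)]
struct Stats {
    seeders: u32,
    leechers: u32,
}

/// In-memory read-through cache keyed by info-hash; each value carries the
/// unix time (seconds) at which it was fetched from the tracker.
struct StatsCache {
    entries: HashMap<String, (Stats, u64)>,
}

impl StatsCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return cached stats if they are fresh, otherwise call `fetch`
    /// (the tracker API in the real Index) and cache the result.
    fn get_or_fetch(
        &mut self,
        info_hash: &str,
        now: u64,
        fetch: impl Fn(&str) -> Stats,
    ) -> Stats {
        if let Some((stats, fetched_at)) = self.entries.get(info_hash) {
            if now.saturating_sub(*fetched_at) < TTL_SECS {
                return *stats;
            }
        }
        let stats = fetch(info_hash);
        self.entries.insert(info_hash.to_string(), (stats, now));
        stats
    }
}

fn main() {
    use std::cell::Cell;
    let calls = Cell::new(0u32);
    let fetch = |_: &str| {
        calls.set(calls.get() + 1);
        Stats { seeders: 7, leechers: 3 }
    };

    let mut cache = StatsCache::new();
    cache.get_or_fetch("aaa", 0, &fetch); // miss: hits the tracker
    cache.get_or_fetch("aaa", 1_800, &fetch); // fresh: served from cache
    cache.get_or_fetch("aaa", 4_000, &fetch); // expired: hits the tracker again
    assert_eq!(calls.get(), 2);
}
```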


mario-nt commented on September 10, 2024

I like the idea of sharing a database, but that option completely couples the tracker and the index.

My favourite is number 4, Webhooks or Push Mechanism: that way we only push data when it is updated. It may still be slow if there are a lot of changes, though.

And Bloom filters look good too.

