gamayun's People

Contributors

dominikh

gamayun's Issues

Be quicker at unchoking new peers

We choke/unchoke peers every 10 seconds, based on the choking algorithm. However, I don't think it makes sense to wait up to 10 seconds to unchoke new peers if we still have free slots. We should unchoke them as soon as they express their interest.
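
A sketch of what that could look like; the types, fields, and slot accounting here are hypothetical, not gamayun's actual API:

    // Sketch: unchoke a peer as soon as it expresses interest, instead of
    // waiting for the next 10-second choking round, as long as a slot is free.
    // All identifiers are hypothetical.
    package peerconn

    import "sync"

    type Peer struct {
        Interested bool
        Choked     bool
    }

    type Torrent struct {
        mu          sync.Mutex
        unchoked    int // peers we currently have unchoked
        maxUnchoked int // upload slots
    }

    // OnInterested is called when a peer sends the "interested" message.
    func (t *Torrent) OnInterested(p *Peer) {
        t.mu.Lock()
        defer t.mu.Unlock()
        p.Interested = true
        // Don't wait for the periodic choking round: if a slot is free,
        // unchoke the peer immediately.
        if p.Choked && t.unchoked < t.maxUnchoked {
            p.Choked = false
            t.unchoked++
            // ... send the "unchoke" message to the peer here
        }
    }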

Provide CLI tool for showing information from torrent file

Basically, provide a tool similar to transmission-show, but add support for structured output and access to all information, including the source tag. This tool will be part of the recommended gamayun workflow. For example, to seed torrents from directories chosen based on the tracker's source tag, one could do something like gamayun --add foo.torrent --dir /seed/$(gamayun-info --source foo.torrent)
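
A sketch of what gamayun-info --source might boil down to; flag handling is omitted, and github.com/anacrolix/torrent/bencode is assumed purely for illustration — the struct below is not gamayun's actual code:

    // Sketch: print the "source" tag from a torrent file's info dictionary.
    package main

    import (
        "fmt"
        "os"

        "github.com/anacrolix/torrent/bencode" // assumed decoder, for illustration
    )

    type metaInfo struct {
        Info struct {
            Source string `bencode:"source"`
        } `bencode:"info"`
    }

    func main() {
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: gamayun-info <file.torrent>")
            os.Exit(2)
        }
        data, err := os.ReadFile(os.Args[1])
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        var mi metaInfo
        if err := bencode.Unmarshal(data, &mi); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println(mi.Info.Source)
    }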

Support protocol encryption

On the surface, encryption > no encryption. However, BitTorrent's encryption is rather useless and more akin to obfuscation; it's not very secure. It also seems to fail at obfuscation:

Analysis of the BitTorrent protocol encryption (a.k.a. MSE) has shown that statistical measurements of packet sizes and packet directions of the first 100 packets in a TCP session can be used to identify the obfuscated protocol with over 96% accuracy.

Encryption would thus seem to be a "feel good" feature without practical use.

Work around bad traffic accounting in trackers

Traffic accounting in BitTorrent trackers is a hacky mess and has a tendency to break.

One series of events that has been observed to cause issues is this:

  1. peer_id=1, event=started, downloaded=0
  2. peer_id=1, event="", downloaded=5
  3. peer_id=1, event=completed, downloaded=10
  4. a day of no internet connectivity while the client continues to run
  5. peer_id=1, event="", downloaded=10

Some trackers seem to interpret event 5 as a new download with 10 bytes downloaded, resulting in a total of 20 bytes downloaded across two downloads. I don't know why they would do that instead of rejecting the announce (if the previous session has timed out, there is no matching event=started announce), but it's the only explanation I could come up with for the behavior I have observed, where a multi-hour internet outage resulted in doubled download stats.

We should probably stop retrying announces after a while and consider the session closed. The next announce would then be a new event=started announce with downloaded=0. This will cause us to under-report some downloaded traffic, but only the amount transferred between the last successful announce and the last failed one. That will generally be much less than what we can currently over-report, namely the entire torrent.
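
A sketch of that logic, with a hypothetical session type and an assumed timeout value:

    // Sketch: once announces have failed for longer than the tracker will keep
    // a session alive, consider the session closed and start a fresh one
    // (event=started, downloaded=0) on the next announce. Names and the
    // timeout value are assumptions.
    package tracker

    import "time"

    const sessionTimeout = 2 * time.Hour // assumed; should exceed common tracker session timeouts

    type session struct {
        started     bool      // have we sent event=started for this session?
        downloaded  int64     // bytes to report, counted per session
        lastSuccess time.Time // last announce the tracker acknowledged
    }

    func (s *session) nextAnnounceEvent(now time.Time) string {
        if !s.started || now.Sub(s.lastSuccess) > sessionTimeout {
            // New or presumed-expired session: start over. This under-reports
            // whatever we downloaded since lastSuccess, but avoids the tracker
            // double-counting the entire torrent.
            s.started = true
            s.downloaded = 0
            return "started"
        }
        return ""
    }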

We might also want to immediately stop and then start a torrent that has finished downloading, to ensure a fresh session with downloaded=0.

Investigate using io_uring for disk I/O

We don't want to maintain an in-process read cache, as that just duplicates what's already in the kernel's page cache. We also want to let the OS handle readahead for us. However, in the worst case, we'll end up doing 16 KiB random reads, which will spend a significant amount of time in syscall overhead. See if io_uring might help with this. It would also reduce the number of threads we'd need.
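
For illustration, this is the worst-case pattern today: every 16 KiB block is a separate pread(2) syscall via os.File.ReadAt. io_uring would let a batch of such reads be submitted and completed with far fewer syscalls. The function below is only a sketch of the status quo, not proposed code:

    // Sketch: one syscall per 16 KiB block. This per-block syscall overhead is
    // what batched submission via io_uring could amortize.
    package storage

    import "os"

    const blockSize = 16 << 10 // 16 KiB, the usual BitTorrent request size

    func readBlocks(f *os.File, offsets []int64) ([][]byte, error) {
        blocks := make([][]byte, 0, len(offsets))
        for _, off := range offsets {
            buf := make([]byte, blockSize)
            // os.File.ReadAt issues a pread(2) for each block.
            if _, err := f.ReadAt(buf, off); err != nil {
                return nil, err
            }
            blocks = append(blocks, buf)
        }
        return blocks, nil
    }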

Support fast resume

At a minimum support our own form of fast resume. We might also want to support libtorrent's format to make migration to our client easier.
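
As a rough sketch, our own fast-resume record might contain something like the following; the fields and their encoding are illustrative, not a decided format:

    // Sketch of a possible fast-resume record.
    package resume

    import "time"

    type FastResume struct {
        InfoHash [20]byte // which torrent this record belongs to
        Bitfield []byte   // which pieces have been downloaded and verified
        Files    []FileState
        SavedAt  time.Time
    }

    type FileState struct {
        Path    string    // path on disk (see also the per-file mapping issue)
        Size    int64     // file size when the record was written
        ModTime time.Time // used to detect modification since the last run
    }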

Allow flexible mapping between files in a torrent and files on the file system

Allow specifying per-file where to download to/seed from. For a single torrent file with the files a/a.ext and a/b.ext, it should be possible to back them with /whatever/cool.ext and /somewhere_else/not_so_cool.ext. This mapping should be changeable at runtime, especially after a torrent has finished downloading and has been moved.
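
A sketch of such a mapping; the type and method names are hypothetical. With it, a/a.ext could be backed by /whatever/cool.ext after calling Set("a/a.ext", "/whatever/cool.ext"), and the mapping could be changed again later without touching the torrent itself:

    // Sketch: map torrent-internal file paths to arbitrary paths on disk,
    // changeable at runtime.
    package storage

    import "sync"

    type FileMapping struct {
        mu    sync.RWMutex
        paths map[string]string // torrent-internal path -> filesystem path
    }

    // Set changes where a torrent-internal file is backed on disk, e.g. after
    // a finished download has been moved elsewhere.
    func (m *FileMapping) Set(torrentPath, fsPath string) {
        m.mu.Lock()
        defer m.mu.Unlock()
        if m.paths == nil {
            m.paths = make(map[string]string)
        }
        m.paths[torrentPath] = fsPath
    }

    // Resolve returns the filesystem path backing a torrent-internal path,
    // falling back to the torrent's own layout when no override exists.
    func (m *FileMapping) Resolve(torrentPath string) string {
        m.mu.RLock()
        defer m.mu.RUnlock()
        if p, ok := m.paths[torrentPath]; ok {
            return p
        }
        return torrentPath
    }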

Load metainfo lazily

In a seed-centric workload with lots of torrents, keeping all metainfo in memory at all times is a waste of memory. The majority of torrents will be idle and not need to know more than their infohash. Even active torrents won't need piece hashes if they're just seeding. We should load metainfo lazily, and only load the subset of metainfo that we need.
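
A sketch of the lazy loading, with a hypothetical Metainfo type and loader function:

    // Sketch: idle torrents keep only their infohash in memory; the rest of
    // the metainfo is loaded from disk the first time it is actually needed.
    package torrent

    import "sync"

    type Metainfo struct {
        PieceHashes [][20]byte // only needed while downloading, not while seeding
        // ... remaining metainfo fields
    }

    type Torrent struct {
        InfoHash [20]byte

        metaOnce sync.Once
        meta     *Metainfo
        metaErr  error
    }

    // Metainfo loads the full metainfo on first use via the provided loader.
    func (t *Torrent) Metainfo(load func([20]byte) (*Metainfo, error)) (*Metainfo, error) {
        t.metaOnce.Do(func() {
            t.meta, t.metaErr = load(t.InfoHash)
        })
        return t.meta, t.metaErr
    }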

Add torrent priorities

We have P available peer connections and t active torrents. By default, each torrent will be able to use P/t connections (but may use more if other torrents use fewer). However, we may want to prioritize some torrents over others. There are two ways to do this, and we probably want both of them:

  1. Give individual torrents a bigger share of connections. Each torrent will have a factor f in [0, ∞], defaulting to 1. The minimum number of peers for an individual torrent will then be f / ∑(all factors) · P. The user could define a set of such factors, like 1, 5, 10, ..., and give them names such as "normal", "medium", "high", and so on. Ultimately, the choice of factors determines just how starved a torrent can be in relation to other torrents (a sketch of this computation follows below).
  2. Assign priority groups to torrents, and allocate connections to groups in order of priority, allowing more important groups to use up all connections, leaving none for lower priority groups. Within a group, factors would apply, but only consider torrents within the group.

The same approaches can be extended to upload/download slots.
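
A sketch of the share computation from approach 1; the names are illustrative. Approach 2 would run the same computation per priority group, handing any connections a group doesn't use down to lower-priority groups:

    // Sketch of approach 1: each torrent's guaranteed share of the P available
    // connections is f / sum(all factors) * P.
    package sched

    type Torrent struct {
        Factor float64 // priority factor, defaulting to 1
    }

    // minConnections returns the guaranteed connection count per torrent,
    // given a global limit of p connections.
    func minConnections(torrents []*Torrent, p int) map[*Torrent]int {
        var sum float64
        for _, t := range torrents {
            sum += t.Factor
        }
        out := make(map[*Torrent]int, len(torrents))
        if sum == 0 {
            return out
        }
        for _, t := range torrents {
            out[t] = int(t.Factor / sum * float64(p))
        }
        return out
    }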

Implement flexible connection limit

A static per-torrent peer limit is quite wasteful.

If we're happy to have 100 peers globally, and we have only one active torrent, then that torrent should be able to have 100 peers. Instead, have a global peer limit, a limit of active torrents, and dynamically adjust per-torrent limits, possibly disconnecting peers from busy torrents when new torrents become active.

The same algorithm probably works for incoming and outgoing connections.
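
A minimal sketch of the dynamic adjustment, assuming an even split across active torrents (the per-torrent factors from the priorities issue could replace the even split); the names are illustrative:

    // Sketch: one global peer limit, split dynamically across active torrents.
    // When a new torrent becomes active, limits are recomputed and torrents
    // over their new share shed peers.
    package sched

    type torrentConns struct {
        peers int // current connection count
        limit int // dynamically assigned limit
    }

    func rebalance(torrents []*torrentConns, globalLimit int) (toDisconnect int) {
        if len(torrents) == 0 {
            return 0
        }
        share := globalLimit / len(torrents)
        for _, t := range torrents {
            t.limit = share
            if t.peers > t.limit {
                // Busy torrents shed peers so newly active torrents can reach
                // their share.
                toDisconnect += t.peers - t.limit
            }
        }
        return toDisconnect
    }

With a global limit of 100 peers and a single active torrent, that torrent may use all 100 connections; when a second torrent becomes active, each drops to a limit of 50.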

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.