bittorrent.org's Introduction

This is the repository backing www.bittorrent.org.

It primarily consists of BitTorrent Extension Proposals (BEPs), documenting various aspects, common practices and proposed standards of the BitTorrent protocol.

bittorrent.org's People

Contributors

arvidn, bommuraj2012, casey, ccbrown, dessalines, dpiers, feross, gubatron, jech, jknollbt, jzelinskie, lmatteis, louhong, mct, mindless2112, mpeklar, pushrax, r0ro, sca-bittorrent, skywalkerd, some1namednate, ssiloti, the8472, xercesblue

bittorrent.org's Issues

metadata terminology and implementation of glossary

I have found the terms metadata, metainfo, and torrent file being used interchangeably, which is rather confusing. I would like to propose that the documentation be updated to be consistent across the BEPs to help new readers and searching. Perhaps a glossary could also be started to aid reference.

The following seem to be the de facto standard terms, with metadata being used interchangeably with metainfo in some BEPs:

  • metainfo file: A bencoded dict with keys 'info' and 'announce'. A torrent file.
  • ???: The bdecoded metainfo file dictionary. Perhaps just metainfo.
  • metadata: The bencoded info dictionary of metainfo. The data requested by magnets.
  • info dictionary or infodict: A dictionary containing the details of the torrent files and pieces (bdecoded metadata).

I am not quite sure yet what to call decoded metainfo, but I would probably settle on just metainfo, since it is often referenced as the metainfo file and the file needs to be decoded to be usable.

For reference, this is a BEP that uses metadata where metainfo is likely meant:

BEP12 Multitracker Metadata Extension

I can create a PR if a decision is agreed upon :)

Edit: reworded and clarified

Design issues with BEP42

I have found some issues in BEP42.

Once enforced, write tokens from peers whose node ID does not match its external IP should be considered dropped.

This is somewhat ambiguous. Context clarifies it but it would be better to phrase it as follows:

Once enforced, responses to get_peers requests from nodes whose node ID does not match their external IP should be considered to not contain a token and thus not be eligible as a storage target. I.e. implementations should take care that they find the closest set of nodes which return a token and whose IDs match their IPs, and announce to those.

Additionally, the prescribed behavior does not prevent an attacker who has colluding nodes along the lookup path from effectively short-circuiting the lookup so that the terminal set consists of non-compliant nodes. This might cause implementations that have lax "is lookup done" conditions to store their announces on far-away nodes, thus rendering this BEP ineffective.

So it's probably necessary to also mandate stricter termination conditions for a lookup.

Additionally the transition mechanism should be improved by stating:

Enforcing nodes MAY also announce to non-compliant nodes as long as they also announce to the closest set of compliant nodes.
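
The selection rule suggested above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `Response` layout, the `announce_targets` name, and the default `k = 8` are invented for the example, and `is_compliant` is a hypothetical predicate standing in for the actual BEP 42 node-ID/IP check, which is not implemented here.

```python
from typing import Callable, List, NamedTuple, Optional


class Response(NamedTuple):
    """One get_peers response; the field layout is an assumption of this sketch."""
    node_id: bytes          # 20-byte node ID
    ip: str                 # source IP the response came from
    token: Optional[bytes]  # write token, or None if the response had none


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia XOR distance between two equal-length IDs."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def announce_targets(
    responses: List[Response],
    target: bytes,
    is_compliant: Callable[[bytes, str], bool],  # hypothetical BEP 42 check
    k: int = 8,                                  # assumed size of the closest set
    include_non_compliant: bool = False,
) -> List[Response]:
    """Pick the closest k token-bearing nodes whose IDs match their IPs."""
    with_token = [r for r in responses if r.token is not None]
    with_token.sort(key=lambda r: xor_distance(r.node_id, target))
    compliant = [r for r in with_token if is_compliant(r.node_id, r.ip)][:k]
    if not include_non_compliant:
        return compliant
    # Transition mode: non-compliant nodes are announced to only in
    # addition to the closest compliant set, never instead of it.
    others = [r for r in with_token if not is_compliant(r.node_id, r.ip)][:k]
    return compliant + others
```

Stricter lookup termination, as asked for above, would still be needed so that `responses` actually contains the closest compliant nodes.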

Why no state machine specifications?

I am preparing a BEP which I feel can only be well documented by means of a fully specified extended state machine (e.g. a UML behavioural state machine diagram). At the same time, I have noted that no prior BEP, not even BEP3, provides this. Is this discouraged for some reason, or do BEPs have to be fully prose/text?

publish RFC for bittorrent protocol

Hi,

I can't find any RFC for BitTorrent protocol and its extensions.
As someone new to BitTorrent I find it difficult to find an aggregated version of BitTorrent and its extensions.

What would you think of the idea of publishing an RFC for:

  • Bencode
  • BitTorrent
  • DHT

Encrypted Torrents

Users/Developers often bring up the private flag when discussing the distribution of non-public data. I think it's ill-suited for that purpose and encrypted torrents would be a better choice to use public infrastructure (pex/trackers/dht) and simply keep the data confidential.

@arvidn mentioned µT's encrypted torrent feature on SO but there's no spec for it. And I couldn't even find a feature description or information on its security properties.

It would be nice if at least a bare-bones sketch of how it works could be provided so we can derive a spec from it. I want to avoid reverse-engineering it.


Now, disregarding existing implementations, the following is what I would want to put into a spec:

obvious must-haves:

  • storage layer encryption
  • seekable cipher
  • encryption of file names

kinda important:

  • storing the files list as an opaque, encrypted block so individual file sizes can't be discerned. For backwards compatibility / storing data in encrypted form there can still be one flat file spanning the full length.
    While file sizes aren't totally unique, they could still allow an observer to infer what files are being shared if multiple files/their metadata are also publicly available. Privacy-conscious users could even include length-padding files once the lengths are obscured in the metadata.
  • suggestion for client devs to strip all non-essential parts from the torrent on creation

nice to have:

  • a separate file name encryption key derived from root key so that webpages could list encrypted file names and client-side decryption (e.g. javascript or browser extension) could decrypt the names without leaking the actual contents in case the file name key gets compromised by the website
  • optional password derivation + per torrent salt instead of random key so users can use a single password for multiple torrents

Transitioning to stronger hash function

With the published collision attack on SHA1, bittorrent may become less attractive for use-cases which don't just want simple integrity-checking (for which it still is perfectly fine) but also authentication.

We should come up with a plan to transition to stronger hashes for all uses, ideally while maintaining backwards compatibility for a transition period.

Thoughts:

  • To reuse existing software/infrastructure we'll have to truncate to 160bits in some places but not necessarily all of them.
  • To avoid bloating torrents any further, the new hashes should probably be exclusively in a merkle tree format, existing alongside pieces
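
To make the two thoughts above concrete, here is a minimal Python sketch of a binary hash tree with a truncated root. Everything specific in it is an assumption made for illustration: SHA-256 as the stronger hash, 16 KiB leaves, zero-hash padding of odd levels, and the function names. None of it is taken from an agreed design.

```python
import hashlib
from typing import List

BLOCK_SIZE = 16 * 1024  # assumed leaf size for this sketch


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Hash each block of data; empty input still yields one leaf."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, max(len(data), 1), block_size)]


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary SHA-256 tree over a non-empty list of leaf hashes.

    Odd-sized levels are padded with an all-zero hash (an assumption).
    """
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(bytes(32))
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def truncated(digest: bytes, bits: int = 160) -> bytes:
    """Truncate a digest so it fits places that expect 160-bit values."""
    return digest[:bits // 8]
```

Only the root (or its truncation) would need to be carried where a fixed-size identifier is expected; the interior hashes can be exchanged on demand.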

Scraping isn't documented

While working on chihaya, I was curious what clients are doing with the "downloaded" field in a Scrape. Not only could I not find out how this field was being used, but I discovered that BEP15 (UDP Protocol) is the first BEP to even mention scraping at all.

Proposal: Stronger hash function for BEP 47

If I understand BEP 47 correctly, it's possible to add extra file attributes to the files dictionary. That would be very nice for verifying or comparing file hashes before downloading the torrent. But it uses SHA1, which is not really safe anymore. I guess it's no problem to swap it with SHA256 or SHA384.

Anyway, is there any known torrent creator which supports this feature? I couldn't find any. :/

Decentralized Mutable Torrents

I've been searching for ways to have users benefit from DHT store (BEP 44) more directly. One use case would be the idea of "decentralized mutable torrents".

  1. User downloads a special .torrent (with a mutable flag inside it). This informs the client to use DHT put/get queries rather than doing normal DHT.
  2. Client then fetches (get) the mutable key from the .torrent (should always be a mutable key) which will point to an infohash (hence another torrent) which is where the actual contents are.
  3. The client starts downloading such content and periodically helps keep the mutable key alive in the DHT (put).
  4. An "auto-download" checkbox/option is displayed so when the mutable key changes, the client leaves the swarm and joins the new one to download the new data.

To me this is the bare minimum to allow for content creators to finally start using DHT store functionality directly, and benefit from its decentralized properties.

From a publisher's perspective I imagine this to be really simple: in the client's preferences there could be a "mutable torrents" section which contains a table mapping various mutable keys to torrents that are being seeded. The publisher can accordingly change which torrent maps to which mutable key.

From a consumer's perspective, they could add such a mutable .torrent (or magnet link) and there could be an "auto download" checkbox/option. Obviously it would function differently from normal torrents, because it would need to DHT get for updates, and change swarms when there are updates. I'm wondering though what the UI for a mutable torrent download would look like; I guess with "auto download" on, you'd have the torrent change automatically if the publisher changed it.

Thoughts?

@the8472 @arvidn

BEP 30 (merkle-tree) improvements

Thinking about PR #28: If we want to migrate to 16KiB pieces and merkle-tree-torrents, then I've found that the trees are hostile to BEP 38 and similar file-reuse strategies.

BEP 30 also mentions related caveats:

In theory we can create one swarm. In that swarm, new clients could serve pieces to old clients. For the new clients to benefit from the old clients, however, we need to add some way for the new to obtain the hashes required to check a piece. Designing a fool proof solution for this problem is not trivial.

Because we let the initial seeders recalculate the hash tree, this extension is incompatible with the proposed HTTP Seeding extensions in BEP 17 [4] and 19 [5] .

I think we need to fix this to some extent. I would suggest adding a list of hashes that are somewhere down in the tree, in a way that they roughly align with files, completely falling inside them if the files are large enough. For single-file torrents nothing would change. But multi-file torrents need a way for partial reconstruction that allows clients to generate Tr_hashpiece

Since each file in a torrent comes with its own footprint (path/length) scaling the number of included tree nodes with the number of files should only inflate the metadata size by a constant factor.

The overhead could be squeezed even further by lumping really small files together, e.g. if a torrent contains one large video file and a directory of hundreds of small image files, a user is likely to have either all or none of the images, so covering more than one of them in a subhash shouldn't be a big problem for reconstruction either.

Alternatively we could add some subtree_request and subtree_piece so clients that need it could grab a local copy of the whole leaf hash set.

poke @arvidn

BEP 52: please add DontHave message

The DontHave message is the opposite of a Have message: it allows a peer to renege on a piece that was previously announced using Bitmap or Have. It is defined by Arvid here:

https://www.libtorrent.org/extension_protocol.html

DontHave is trivial to implement: sending it is optional, and upon reception you just reset a bit in a bitmap somewhere. It's also essential for peers that discard pieces when they are short on space: if you have discarded a piece, there's nothing much you can do in the core protocol except ignoring requests for that piece (or sending Rejects if you're using the Fast state machine).
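
The receiving side described above (resetting a bit in a bitmap) might look roughly like this Python sketch. The class and method names are made up for illustration and are not taken from libtorrent; the bit order follows the BEP 3 bitfield convention, where the high bit of the first byte is piece 0.

```python
from typing import Tuple


class PeerBitfield:
    """Tracks which pieces a remote peer currently claims to have."""

    def __init__(self, num_pieces: int) -> None:
        self.num_pieces = num_pieces
        self.bits = bytearray((num_pieces + 7) // 8)

    def _locate(self, index: int) -> Tuple[int, int]:
        if not 0 <= index < self.num_pieces:
            raise IndexError("piece index out of range")
        # High bit of the first byte corresponds to piece 0.
        return index // 8, 0x80 >> (index % 8)

    def on_have(self, index: int) -> None:
        byte, mask = self._locate(index)
        self.bits[byte] |= mask

    def on_dont_have(self, index: int) -> None:
        # The whole cost of supporting DontHave on reception: clear one bit.
        byte, mask = self._locate(index)
        self.bits[byte] &= ~mask & 0xFF

    def has(self, index: int) -> bool:
        byte, mask = self._locate(index)
        return bool(self.bits[byte] & mask)
```

A real client would also want to cancel outstanding requests for that piece to this peer, which the sketch leaves out.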

I would like to suggest that DontHave should be made a core request of the v2 protocol.

Would a v2-aware DHT allow individual file search?

Having spent a few hours (definitely far too few to be asking this question, sorry) reading BEPs/issues/PRs here, I'm floored by how thorough and thoughtful the BitTorrent ecosystem's design is. This community is super-knowledgeable—therefore I feel very self-conscious asking this beginner question here (I didn't want to ask this as a comment in #59 since it's not a core, BEP 3-level concern).

With the v2 spec per BEP 52 storing the Merkle root hash for each sufficiently-large file, what is the thinking for DHT queries for individual files by their hashes?

That is, I could see peers announcing themselves on the Merkle root hash of each file they're aware of, just like they announce themselves on torrent files' infohashes. Clients would somehow disambiguate between a torrent's infohash and an individual file's Merkle root hash.

Readily-apparent advantages include de-duplicating content—both over the wire, i.e., downloading a common file from multiple swarms, but also on disk, i.e., symlinking identical files when seeding multiple torrents.

Such a function could also enable new use-cases in content-addressed linking (e.g., writing a blog post and including the Merkle hashes of referenced content, in order to increase the post's longevity and resistance to link rot). This would raise BitTorrent's profile among the crowd currently evaluating Dat, IPFS, et al.

But I can also imagine this could place a heavy load on both the DHT and on clients sharing torrents with large numbers of files, so maybe it's not feasible?

I would love any hints on whether the community has discussed this already and reached a conclusion, or if the matter is still in the research phase, or if there are (likely) other DHT+v2 interplay I'm unaware of (or if I'm just hopelessly confused!).

Improve clarification that info-hash is the digest of en bencoding found in .torrent file

There are perceived ambiguities in the wording of BEP3 at:
Oct 20, 2012 clarified that info-hash is the digest of en bencoding found in .torrent file.

We have:
https://github.com/bittorrent/bittorrent.org/blame/master/beps/bep_0003.html#L113

  • Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']}. Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

And then:
https://github.com/bittorrent/bittorrent.org/blame/master/beps/bep_0003.html#L170

info_hash

The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file. The info-hash must be the hash of the encoded form as found in the .torrent file, regardless of it being invalid. This value will almost certainly have to be escaped.

Which leads to confusion among developers, as exemplified by the issues below.

transmission/transmission#129 (comment)

torrent-file-editor/torrent-file-editor#47 (comment)

The referenced BEP3 sections could be integrated more concisely for improved implementation strategies.

Proposal: Reference file path in magnet link

I'm planning on creating a web browser that allows users to load torrents like they would a website.

The general idea is that you can paste a magnet URI into the address bar of the browser, and then view it like you would a website. It'd use semantics similar to those of web servers, where index.html would get used to load the web page if a specific page wasn't supplied.

However, this raises the question of how one can use paths relative to the magnet link. Ideally, when somebody authors a website to publish on bittorrent, they would expect to be able to use relative paths when referencing resources within the website.

E.g., if you have a torrent with the following folder structure:

/index.html
/posts/something.html
/media/example.png

How would one share a link to /posts/something.html?

With BEP 53 in #64 there's now a way to reference file indices within a torrent, but there's no straightforward way to specify a file path.

I propose the addition of an optional fp parameter which stands for file path which will yield URIs like:

magnet:?xs=urn:btpk:examplehere&fp=/posts/something.html

This will help with web pages that have relative URLs which would want to link to other pages, or link to resources like images.

A big problem with this approach is that browsers don't treat the search portion of a URI the way they would an HTTP URL.

For example, if you wanted to use the browser's relative path resolution function with a magnet link, it would break. Using the normal means to get the relative URL, /media/example.png, of the example above would result in a malformed magnet link that looks like:

magnet:/media/example.png

Even though it is a relative path to the URI, it breaks the magnet's properties since all the search parameters get stripped. That means that this approach would require fighting the browser engine's relative URL mechanics, which is kind of scary.

Any suggestions for how to accomplish something similar while preserving relative URL functionality would be very much appreciated, but I think it'd also be possible to work around this issue and use a new search parameter name.
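
One possible workaround, resolving relative references against the proposed fp parameter instead of the URI path, can be sketched in Python as follows. The function name `resolve_in_magnet` and the choice to leave ":" and "/" unescaped are assumptions of this sketch; only the fp parameter itself comes from the proposal above.

```python
import posixpath
from urllib.parse import parse_qsl, urlencode, urlsplit


def resolve_in_magnet(base_magnet: str, reference: str) -> str:
    """Resolve a relative reference against the fp parameter of a magnet URI."""
    parts = urlsplit(base_magnet)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qsl(parts.query, keep_blank_values=True)
    current = dict(params).get("fp", "/")
    # Join against the directory of the current file, then collapse "." and "..".
    target = posixpath.normpath(
        posixpath.join(posixpath.dirname(current), reference))
    # Keep every other parameter (xs, tr, ...) and replace only fp.
    params = [(k, v) for k, v in params if k != "fp"] + [("fp", target)]
    # Leave ":" and "/" readable; other reserved characters are escaped.
    return "magnet:?" + urlencode(params, safe=":/")
```

With the example link above as the base, the reference /media/example.png resolves to magnet:?xs=urn:btpk:examplehere&fp=/media/example.png, so the search parameters survive. A browser would have to apply this in place of its built-in URL resolution.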

Best practices document

Would it make sense to have an informal document, probably not even a BEP, documenting best implementation practices so that people don't have to rediscover the wheel? Even linking to blog posts, issue trackers and implementations may be appropriate for some points.

Examples would be the discussion about sanitizing path names for BTv2, or this Unicode LTR confusion thing.

Many questions on stack overflow or IRC also arise from the specs leaving a lot of things up to the implementations.

BEP 5 DHT token

Although the token value used to announce is opaque, the suggested BitTorrent implementation can be problematic:

The BitTorrent implementation uses the SHA1 hash of the IP address concatenated onto a secret that changes every five minutes and tokens up to ten minutes old are accepted.

This means that once a node gets a valid token from a get_peers request, it can then use it to perform an announce_peer request for any info_hash.
A malicious node could use this to fill the targeted node's peer storage by generating a lot of announce_peer requests with info_hashes close to the targeted node's ID, most likely resulting in genuine announces from other peers being dropped.

One easy way to prevent this would be to also include the info_hash from the get_peers request in the SHA1.
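
A rough Python sketch of that change is below. The `TokenIssuer` class, its method names and the 16-byte secret length are assumptions made for illustration; the hashing of IP plus secret and the keep-the-previous-secret rotation follow the behaviour quoted from BEP 5, with the info_hash added to the hashed input.

```python
import hashlib
import hmac
import os


class TokenIssuer:
    """Issues write tokens bound to both the requester's IP and the info_hash."""

    def __init__(self) -> None:
        self.secret = os.urandom(16)          # secret length is an assumption
        self.previous_secret = self.secret

    def rotate(self) -> None:
        """Call every five minutes; the old secret stays valid one more period."""
        self.previous_secret, self.secret = self.secret, os.urandom(16)

    @staticmethod
    def _token(ip: bytes, info_hash: bytes, secret: bytes) -> bytes:
        # ip is the packed address (4 or 16 bytes), info_hash is 20 bytes.
        return hashlib.sha1(ip + info_hash + secret).digest()

    def issue(self, ip: bytes, info_hash: bytes) -> bytes:
        return self._token(ip, info_hash, self.secret)

    def verify(self, ip: bytes, info_hash: bytes, token: bytes) -> bool:
        return any(
            hmac.compare_digest(token, self._token(ip, info_hash, secret))
            for secret in (self.secret, self.previous_secret)
        )
```

A token obtained through get_peers for one info_hash then fails verification on an announce_peer for any other info_hash, at no extra storage cost to the node.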

BEP 52 - Define client announce behavior when joining both swarms of a hybrid torrent

This is a follow-up to the discussion here (started from this comment).

TL;DR

BEP 52 currently states

For interoperability with BEP 3 a torrent can be created to contain the necessary data for both formats. Implementations supporting both formats can join both swarms by calculating the new and old infohashes and downloading them to the same storage.

It says that a client can join both swarms, but it doesn't specify how the client should announce its traffic to the tracker in such a case.

For example let's say a peer uploaded 100 MB in the v1 swarm and 200 MB in the v2 swarm. How must a client announce this to the tracker? I see two possibilities:

a) /announce?info_hash=<v1_hash>&uploaded=100 MB and /announce?info_hash=<v2_hash>&uploaded=200 MB

or

b) Both announces having the same (total) amount of uploaded/downloaded data (uploaded=300 MB in this example)
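
The difference between the two options can be spelled out with a small Python sketch. The function name `announce_queries` and the reading of 1 MB as 10**6 bytes are assumptions for illustration; the info hashes are passed in as opaque bytes.

```python
from typing import List
from urllib.parse import urlencode


def announce_queries(v1_hash: bytes, v2_hash: bytes,
                     up_v1: int, up_v2: int, combined: bool) -> List[str]:
    """Build one announce query per swarm of a hybrid torrent.

    combined=False is option a) (per-swarm counters),
    combined=True is option b) (the same total in both announces).
    """
    total = up_v1 + up_v2
    return [
        "/announce?" + urlencode(
            {"info_hash": h, "uploaded": total if combined else n})
        for h, n in ((v1_hash, up_v1), (v2_hash, up_v2))
    ]
```

For the 100 MB / 200 MB example, option a) reports 100000000 and 200000000 bytes, while option b) reports 300000000 in both announces, so a tracker summing per-infohash numbers would count the two cases very differently.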

I'm opening this issue as I think that this should be clearly specified in the BEP as the choice between the two has great impact on both the client and the tracker's implementation. For example for libtorrent, keeping separate stats (the a) option) would add significant complexity to the code. Also obviously, if one client implementation goes with the a) option and another with b) that's gonna cause big headaches to trackers who care about those numbers (basically all private trackers).

These are very good reasons as to why I think that this should be specified in the BEP as I think there should be no room open for different interpretations (aka different implementations).

Individual file hash DHT

Hi, I'm a programmer who has used torrent software for years but am new to the torrent protocols (excuse me if I use the wrong terminology).

Problem statement:
One thing I really noticed is how fragmented and redundant torrent swarms have become. The same file exists in multiple different torrents, and seeding one torrent doesn't help another torrent even though they contain the same file. This leads to the inevitable death of a swarm when seeders eventually leave.

Current solutions:
One solution that has been implemented is swarm merging, with which users can download identical files from different torrent swarms. But I feel this is very clunky and hackish, as you have to find another torrent with exactly the same file (how can you be certain it's the same file? The file could be the same but have a different name, and vice versa) and then download it separately.

Another solution is from bitcomet, which in my experience is wonderful but has its caveats. Once a file is downloaded, the file hash is uploaded to their server (the negative), so if another user downloads the same file based on its hash, the server offers the first user as a seed. It's done automatically, but the protocol is proprietary and no information has been released about it.

Proposed solution:
So what I propose is to include file hashes and, by using/extending the DHT or creating a separate one, find seeders or peers based on file hash. The current system uses the infohash of torrent files to search for peers, which in my opinion is pretty outdated and needs updating. Each file hash can easily be included in the torrent file.

Preemptive arguments:

  1. Privacy
    Some will argue it will publish too much info on what we're downloading, but if privacy is a concern, DHT should not be used anyway. Others can still know what you're downloading based on the infohash.

  2. Heavy traffic
    It'll be heavier than the infohash DHT, since publishing individual file hashes is more than a single infohash. Yes it will, but technology has advanced since the time DHT was first introduced, and average Internet speeds have increased and become cheaper, so why not?

Closing statement:
The torrent scene has become very redundant, as many files are placed in different torrents and seeded separately, leading to the rapid death of old torrents. A file from a dead torrent can often still be found in another seeded torrent, and this solution gives it a second chance to live. Regarding the downsides of my proposed solution, those who have slow connections can opt out by disabling it, as with DHT, while the upside is reduced redundancy in the torrent swarms.

Sorry if this is inappropriately posted here, but it's been in my head for a long time and I needed to share it. I could go and make a new client with this new protocol myself, but it's pointless if there's no wide user base or big-name clients adopting it.

Thanks for reading :)

Proposal: HTTP web seed exchange

HTTP webseeding is awesome for keeping old torrents alive, and so is HTTP seeding. However, if a link stops working at some point and there is no seed, then we have a problem. I think this would be easy to solve using a method like the tracker exchange known from BEP 28: clients could harvest mirrors the same way they harvest trackers when searching for seeds/peers. This way the list of mirrors could be refreshed regularly.

Scenario:

  1. Someone downloads a torrent.
  2. They simply download it straight to the cloud and share the link (Google Drive, Dropbox, or normal hosting like zippyshare/mediafire) for this torrent thanks to PEX.

For example, very old games (NES etc.) are not seeded anymore because nobody wants to seed them, while a webseed such as archive.org works all the time. The same content is available outside torrents over HTTP. It would be a cool solution, something like an offline mode ;)

BEP 44 immutable put request example lacks token

BEP 44 states "The token field also has the same semantics as the standard DHT message get_peers and announce_peer, when requesting an item and to write an item respectively."

But further down the immutable put request does not contain a token field.

I think the right thing here is to also require a token for immutable puts. Libtorrent already seems to do that (hopefully @arvidn can confirm) and so does my own implementation.

BEP42 and address translation

Thank you for your proposal for mitigating DHT attacks.

As I read the draft, one problem occurred to me: what happens to two nodes on opposite sides of a gateway doing address translation? You mentioned this as a problem during bootstrap, and the difficulty in determining one's own address. But I believe it's worse.

For instance, suppose you have two nodes, one on an IPv4 network, another on an IPv6 network, and a gateway in between them. Neither node will be able to validate messages from the other because the gateway will translate the addresses in the traffic.

Or suppose you have three nodes where two are on a private network behind a NAT, and the other on the public internet. If one of the nodes on the private network chooses a node ID based on its private network address, it will be able to validate messages with the other node on the private network but not to the node on the public network. Conversely, if it chooses its public address, it will validate with the node on the public network but not the private one.

The limit of only 8 node IDs per address is also rather limiting. Speaking as an employee of a company, there are sometimes dozens of people in our office, and all share a single connection to the internet.

Satellite-based ISPs also typically only present a handful of IP addresses to the public internet for all their internal users. If you've done geolocation before when running web or other service, seeing "satellite" in the results is rather common. It's impossible for all those thousands of folks to share a small number of node IDs.

Last, a number of folks use proxies and/or VPNs for privacy. All that traffic appears to come from one address (or a small number of addresses).

How should users/clients in one of these situations choose their node ID?

DHT feeds

Many moons ago we discussed an approach to implementing RSS feeds over the DHT (http://libtorrent.org/dht_rss.html), and I guess BEP44 spawned from that, but 44 is barely put to use and feeds have never been specified on top of it.

crossposting from LT-discuss:


We already have all the parts that could be used to implement decentralized torrent feeds manually.

All that's needed is some glue to tie it together in a way that more than one client could understand.

I think mutable, signed torrents and push-notifications/gossiping within a swarm have been mentioned in some related thread, but those would be new features essential to get a feed working.

As arvid mentioned the original idea was to have skip lists with a mutable head in the DHT pointing to infohashes.
But I'm a little doubtful that this would scale well. It would probably work but it seems a little brittle or could suffer from inefficiencies.

On the other hand the objection to using the metadata exchange is that it is heavyweight in comparison to a DHT lookup.

I think using the metadata exchange option is preferable for several reasons

a) new entries can be batched if frequent republishing would cause too much churn
b) the size of the most frequently accessed part of the feed (the newest entries) can be kept constant by growing a separate archive torrent that is accessed/needs to be refreshed less frequently
c) Trusting implementers to get republishing for mutable puts right is already dicey in my opinion.
Trusting them to get efficient republishing of large sets of skip list nodes right at scale seems even riskier
d) potential hiccups/failure modes would be more confined to the set of swarms that make a feed
e) swarms can carry larger sets of data. I.e. a meta-torrent can carry full-fledged torrent that also has entries in the outer dictionary (trackers, comments, creation date, other fluff in the future)


So, my proposal would be as follows

  • feeds start from a BEP 44 mutable value
  • value is an infohash
  • subscribers fetch torrent via metadata exchange
  • the torrent is a multifile torrent containing
    a) regular torrents as feed items
    b) various pointers (name + infohash) to related feed torrents (archive of older torrents, subfeeds, whatever). could abuse the filelist with zero-length files for that or put hashes into text files
  • the torrent is not ordered in filesystem/alphabetical order but by ascending creation date
  • subscribers slow-poll DHT. when the mutable value is changed subscribers switch over to the new swarm, attempting to re-use as much of the old data as possible (de-dup should be fairly easy here)
  • subscribers assist in re-publishing the DHT keys and keep swarm alive, publishers are not required to do active maintenance between updates

To make sharing of feeds easy we can introduce a new magnet. e.g. magnet:?xt=urn:btfeed:[...]

The rest would be implementation guidance/best practices to keep overhead low and the contents human-friendly.

All that implementers need for basic support is BEP44 + some UI/utilities for publishers or subscribers.

Standalone DHT-to-RSS adapters would also be possible.


I'm willing to implement a prototype + write BEP draft if there is some inclination/interest by other devs to adopt.

PEX failure on mostly-seed swarms

There is a sub-optimal interaction between PEX and the disconnect-seeds-while-seeding feature that many clients implement.

If a swarm is dominated by seeds then the seeds will not connect to each other and thus do not know about other seeds, which means they cannot tell peers about other seeds. And since the swarm is dominated by seeds it is also less probable that they have any downloaders that they could gossip about.

This is a problem for IPv6 deployment since many trackers still are IPv4-only, so learning about v6 peers via PEX from dualstack nodes is an important way to get around CGNAT.

I think a possible mitigation is to leave seed-seed connections open if the maximum connection limits of the client are not reached, which is more likely to be the case on heavily seeded swarms.

Of course this requires both sides to be less aggressive about the disconnection behavior, so this will only be really effective once it is rolled out by several major clients.
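
The mitigation could be expressed as a small policy function, sketched here in Python. The function name and parameters are hypothetical; real clients track connection limits in their own ways.

```python
def should_drop_seed_connection(we_are_seed: bool, peer_is_seed: bool,
                                open_connections: int,
                                max_connections: int) -> bool:
    """Drop seed-to-seed connections only when connection slots are scarce."""
    if not (we_are_seed and peer_is_seed):
        return False  # the connection may still carry piece traffic
    # With spare slots, keep the link so both seeds can keep exchanging
    # PEX information (e.g. IPv6 peers learned from dual-stack nodes).
    return open_connections >= max_connections
```

On a heavily seeded swarm the limit is rarely reached, so most seed-seed links would stay open and continue to carry PEX messages.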

Clarify if BEP 5 `nodes` metainfo key can contain IPv6 addresses

This is quite minor, but I'm writing a metainfo file creator, and it's not clear to me whether or not the BEP 5 nodes metainfo key can contain IPv6 addresses.

I assume it can, but on the other hand, it might break existing clients that don't expect an IPv6 address.

I also checked out BEP 32, since that relates to both the DHT and IPv6, but it doesn't mention the nodes key.

If IPv6 is indeed permitted, perhaps the example could be changed to:

nodes = [["<host>", <port>], ["<host>", <port>], ...]
nodes = [["127.0.0.1", 6881], ["your.router.node", 4804], ["2001:db8:100:0:d5c8:db3f:995e:c0f7", 1941]]

The above address is a random address in 2001:db8::/32, which is reserved for examples in documentation, and a random port.
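To make the question concrete, here is a minimal sketch of how a metainfo creator might classify and encode such a nodes list. The helper names (bencode, host_kind) are made up for illustration, and whether existing clients accept an IPv6 literal in this key is exactly the open question, so treat this as an assumption rather than specified behavior.

```python
import ipaddress


def bencode(value):
    """Minimal bencoder for ints, str/bytes, lists and dicts (illustrative helper)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def host_kind(host):
    """Return 'ipv4', 'ipv6' or 'hostname' for a nodes entry (hypothetical helper)."""
    try:
        return "ipv%d" % ipaddress.ip_address(host).version
    except ValueError:
        return "hostname"


nodes = [
    ["127.0.0.1", 6881],
    ["your.router.node", 4804],
    ["2001:db8:100:0:d5c8:db3f:995e:c0f7", 1941],  # documentation-only address
]
encoded = bencode({"nodes": nodes})
kinds = [host_kind(host) for host, _port in nodes]
```

A creator could use host_kind to warn the user, or to omit IPv6 entries, until the BEP says one way or the other.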

If this is correct and acceptable, I'd be happy to submit a PR.

the announce logic in BEP 7 should be changed

As far as I can tell, all implementations have (perhaps implicitly) agreed not to implement (EDIT: the &ipv4= and &ipv6= announce parameters from) BEP 7, but instead always announce once per source interface.

I personally support rejecting updating BEP7 and possibly replacing it with a BEP describing the expected announce-logic for clients with multiple external IP addresses/interfaces.

@ssiloti @the8472 thoughts?

Private magnet links

I'd like to propose adding a &private=1 parameter to magnet links, to be friendlier to private trackers.

In general private trackers distrust magnet links, though the reasons aren't always clear. It is possible to use them though, and I'm a member of at least one private tracker which does.

The normal usage seems to be that the magnet links will always contain a &tr= parameter (the tracker URL will have embedded "passkey"-style auth info). A client will connect to the tracker for a list of peers, retrieve the metadata from the peers using the ut_metadata extension, and download the torrent normally.

However I notice that my client (deluge 1.3.15 + libtorrent 1.1.4) will still do DHT searches for peers before the metadata is downloaded. This makes sense: although all private trackers enforce that the private flag is set in the infodict, the client won't see this until peers are connected and the infodict is transferred, so it can't know the torrent is private.

This leaks information outside the domain of the tracker, which is quite bad. At the least, it's a situation ripe for undefined behavior (would the client disconnect DHT-sourced peers on discovering the torrent is private?). In practice I do see DHT peers are sometimes discovered and connected before the metadata is downloaded, as some private torrents do get reposted publicly. In the theoretical nightmare case, DHT crawlers could use this to leech data, or expose tracker members. In my experience theoretical nightmares are the favorite pastime of tracker administrators.

So I propose the &private=1 parameter to magnet links having the same semantics as BEP 27, i.e. the client may only connect peers returned from the named tracker for metadata transfer. It's an error for &private=1 to be present without &tr=.
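As a rough sketch of the client-side check this implies (the function names and the set of peer-source labels are my own, not taken from any client), parsing could look like this:

```python
from urllib.parse import parse_qs, urlsplit


def magnet_is_private(uri):
    """Return True if the magnet link carries private=1.

    Raises ValueError if private=1 is present without any tr= parameter,
    which the proposal treats as an error.
    """
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parts.query)
    private = params.get("private", ["0"])[-1] == "1"
    if private and not params.get("tr"):
        raise ValueError("private=1 requires at least one tr= parameter")
    return private


def allowed_peer_sources(private):
    """Peer sources a client would use before metadata arrives (illustrative labels)."""
    if private:
        return {"tracker"}
    return {"tracker", "dht", "pex", "lsd"}
```

With this, a client would skip DHT, PEX and local discovery for the metadata phase whenever magnet_is_private returns True.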

I'd also like to define the behavior for the case where a torrent is initialized as public without metadata (i.e. a magnet link with no &private=1) but is later discovered via metadata to be private: the client should mark the torrent with an error, and ... attempt to undo the damage somehow. I'd like to say the client should disconnect from all peers that are not sourced from the tracker, but I'm not sure this does any good, since new peers from outside the tracker may establish incoming connections. I'd love some input on this part.

(It's true that suddenly requiring &private=1 for private magnet torrents would disrupt the workflows of private trackers who use magnets, but my guess is they would welcome the tradeoff of disruption for leak protection)

BEP 52: MUST close the connection

If an unsolicited piece is received a peer MUST close the connection.

I think this is too strong, since it assumes too much about the peer's implementation. In particular, it means that the peer cannot just fire a Cancel and forget, it must keep it in memory until it receives either Piece or Reject.

I suggest this should be a SHOULD.

Clarify BEP 42

BEP 42 says:

It is important that the ip field is in the top level dictionary. Nodes that enforce the node-ID will respond with an error message ("y": "e", "e": { ... }), whereas a node that supports this extension but without enforcing it will respond with a normal reply ("y": "r", "r": { ... }).

It is not clear under what conditions the enforcing node is supposed to respond with an error or normal reply. Is this suggesting to respond to any query that doesn't provide ip with an error message?

cc: @arvidn

Peers should specify a reason before closing connection

Currently, a peer can close the connection at any time for various reasons, and there is no way to find out why the remote end closed it. This makes it very hard to write a client that cooperates nicely with other clients.

I propose adding a new message type to the protocol (probably should be implemented via BEP 10):
If a client has enabled close_reason extension, the remote side should send an extension message specifying a close code (integer) and reason (short string).

A similar mechanism exists in WebSocket protocol:
https://tools.ietf.org/html/rfc6455#section-5.5.1

Some of the close reasons for WebSocket protocol can be seen in:
https://developer.mozilla.org/en-US/docs/Web/API/CloseEvent

@ssiloti what do you think? Should I write this properly in BEP format?

Proposal: protocol scheme for urls

Description

BitTorrent is great for sharing archives of static content through clients that download them to a folder, or render a single file from the folder in the case of videos. Magnet links have made the flow of creating a torrent and sharing it with people dead simple.

However, magnet links are limited in that they only use the search portion of a URL and don't allow you to specify a file or specify a hostname to navigate through. This is an essential feature for getting torrents to be loaded in web browsers and interoperating with their idea of what an "origin" is.

Prior / Existing art

IPFS is similar to BitTorrent in that it is a P2P network for transferring files. They developed their own ipfs:// and ipns:// protocols for use in browsers and link sharing. This is used in their various applications including ipfs-companion which enables people to view websites published on IPFS.

The bittorrent: protocol was proposed on the IETF mailing list in 2010. It's an interesting start, but it ended up trying to do things like connecting to initial peers and wasn't as good as magnet links (IMO). It also doesn't account for new features like BEP 46, which allows specifying a public key in a magnet URI instead of an infohash, so that people can publish mutable torrents on the DHT.

My Webrun project for which I created btih:// and btpk:// protocol schemes in order to load JS files from a torrent (from an infohash or public key). I modeled this after IPFS by having different schemes for the mutable and immutable torrents, but I'm not sure if this is the best way to go for users. I chose btih and btpk because that's what's used in the URNs for magnet links.

wtp-webext by @tom-james-watson is a Firefox extension that lets you load torrents as websites in a browser using WebTorrent and the experimental libdweb API in Firefox. He's using the wtp:// scheme to stand for web torrent protocol, and is planning on supporting DNS resolution for infohashes through the same way IPNS and Dat do it.

Design

The URL should support:

  • Referencing infoHashes as the origin for the URL
  • Supporting file paths into the torrent through the path of the URL
  • Referencing public keys for BEP 46 mutable torrents
  • Referencing domains that get resolved through DNS

Not sure if it's necessary to support all the features of a magnet URI like trackers. My reasoning is that a browser should primarily rely on a DHT / local discovery / PEX rather than trackers since it's more decentralized.
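To illustrate the design points above, here is a small parsing sketch. It assumes the placeholder scheme name bittorrent:// (the name is still undecided), treats a 40-hex-character host as a v1 infohash and a 64-hex-character host as a BEP 46 public key, and falls back to a DNS name otherwise. Host length alone cannot distinguish a 32-byte v2 infohash from a public key, so a real scheme would need some other marker; the function name is hypothetical.

```python
import re
from urllib.parse import unquote, urlsplit


def parse_torrent_url(url, scheme="bittorrent"):
    """Split a torrent URL into (kind, origin, file_path).

    kind is 'infohash', 'public_key' or 'domain'.
    """
    parts = urlsplit(url)
    if parts.scheme != scheme:
        raise ValueError("unexpected scheme: %r" % parts.scheme)
    host = parts.netloc.lower()
    if re.fullmatch(r"[0-9a-f]{40}", host):
        kind = "infohash"      # 20-byte SHA-1 infohash, hex encoded
    elif re.fullmatch(r"[0-9a-f]{64}", host):
        kind = "public_key"    # 32-byte key for BEP 46 mutable torrents
    elif host:
        kind = "domain"        # to be resolved through DNS
    else:
        raise ValueError("URL has no origin")
    # The URL path selects a file inside the torrent.
    return kind, host, unquote(parts.path.lstrip("/"))
```

Keeping the infohash or key in the host position is what lets a browser treat each torrent as its own origin.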

Protocol name suggestions

  • bittorrent://
  • bt://
  • wtp://
  • ???

I wanted to create this issue to get feedback on the idea and to figure out what would work. My goal is to have some sort of URL figured out by the summer and to have at least a couple of applications that are able to load torrents with it.

Proposal: Document the ut_holepunch extension

ut_holepunch is mentioned in BEP-11 (Peer Exchange), but it does not appear to be documented anywhere. An implementer who wants to implement the extension would have to find this (unofficial) gist, ask for help on a developer forum, or reverse-engineer it from an existing implementation.

It would be useful to document this extension in a BEP, so that all relevant extensions can be found in the same place.

multi tracker specification

I don't believe anyone has ever implemented the behavior specified by the multi-tracker extension, BEP12.

Specifically, uTorrent found it more useful to treat separate tiers as independent sources of peers, and will announce to trackers in separate tiers in parallel. Everyone seemed to expect and rely on this behavior so it was also implemented in libtorrent.

I propose the specification be updated to match the de-facto standard.

Now there's a private tracker that uses the multi-tracker extension in its torrents, but puts all trackers in the same tier. My reading of the specification (when I first implemented it in libtorrent) was that trackers in the same tier should be treated as load-balancing front-ends to the same peer database. Announces should be directed to a random host in that group, to evenly distribute the load.
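Combining the two readings (tiers as independent peer sources, trackers within a tier as load-balancing front-ends), target selection could be sketched roughly as below. This is only an illustration of the behavior being discussed, not what BEP 12 currently specifies, and the function name is made up.

```python
import random


def pick_announce_targets(tiers, rng=random):
    """Pick one tracker per tier; the caller announces to all picks in parallel.

    tiers is the announce-list: a list of tiers, each a list of tracker URLs.
    Within a tier a random tracker is chosen to spread the load.
    """
    return [rng.choice(tier) for tier in tiers if tier]
```

A client that pins one tracker per tier, as uTorrent may do, would cache the pick instead of re-rolling it on every announce.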

I have not tested this, but this forum post suggests that uTorrent either doesn't support multiple trackers in the same tier, or it pins a single tracker to be used consistently within the tier (probably the latter).

The specification is not very clear on this point. We should clarify this as well.

Another aspect the specification is silent on is how &event=started and &event=stopped should be treated. Currently, libtorrent records whether it has sent a started event per tracker, regardless of tier. So this is not quite treating every tracker in a tier as just front-ends to the same database, since it will send started to the same tier multiple times, via different trackers.

Any thoughts on what the behavior should be?

Proposal: Sampling DHT RPC

The idea is fairly simple: add a DHT call that returns a random sample of the infohashes that a node has stored locally.
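On the responding node this could be as small as the following sketch. The function name and the cap of 20 samples per reply are assumptions for illustration; the actual message format and limits would be up to the BEP.

```python
import random


def sample_infohashes(stored, max_samples=20, rng=random):
    """Return up to max_samples distinct infohashes chosen uniformly at random.

    stored is any iterable of the infohashes this node currently holds peers for.
    """
    stored = list(stored)
    return rng.sample(stored, min(max_samples, len(stored)))
```

Because each reply is a fresh random subset, an indexer needs repeated queries to enumerate a node's storage, which keeps individual responses small.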

This should make DHT indexing more efficient and provide a cleaner alternative to current approaches which are highly incentivized to misbehave to increase the amount of data they can gather.
It would also democratize the indexing process in the sense that it can be executed with moderate resources instead of advantaging those who have large IP blocks at their disposal.

Addressing privacy concerns: Swarms using the DHT already are open, public and de facto indexed. Additionally, using popular open trackers usually leads to the torrents being included in some of their data dumps, too. So in practice this would change very little.

But to offset the concerns anyway I would suggest using encrypted torrents (#20, once I finish the spec and a reference implementation) to allow people to use the DHT while keeping the content secret.

Non-Goal: Enabling average endusers to search the entire DHT network for content

Opinions?

Set the repository description on GitHub

It might be nice to set the description and website of this repository, so it displays something other than: "No description, website, or topics provided." and links to bittorrent.org.

Improving BEP 38

I don’t like how BEP 38 solves the problem of finding local data and have a suggestion on how I think it could be improved.

My problem is that it is not easy to index all your data and compare it with torrents without actually having all the torrents readily available. It is also not possible to index files when using piece hashes as hints due to alignment.

Add a new key to the torrent called “filehints” with a list of dictionaries similar to the “files” key from the info dictionary. The list contains dictionaries with the following keys:

  • filename - Name of the file, full path is not really needed as a hint
  • length - Size of file
  • start hash - sha1 hash of the first 2048 bytes
  • end hash - sha1 hash of the last 2048 bytes

If the file is less than 4096 bytes, the end hash is not necessary. The idea behind using both a start and an end hash is to detect files whose size or beginning has changed. It also prevents problems with files that happen to start with the same bytes. I’m not sure if this should be extended to contain hashes of bytes from the middle of the file.
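For illustration, computing one such entry could look like the sketch below. The dictionary keys follow the list above; the function name and the choice of hex-encoded digests are my own assumptions.

```python
import hashlib
import os

HINT_BYTES = 2048  # size of the hashed window at each end of the file


def file_hint(path):
    """Build one filehints entry for the file at path."""
    length = os.path.getsize(path)
    with open(path, "rb") as f:
        hint = {
            "filename": os.path.basename(path),
            "length": length,
            "start hash": hashlib.sha1(f.read(HINT_BYTES)).hexdigest(),
        }
        # Files shorter than 4096 bytes skip the end hash.
        if length >= 2 * HINT_BYTES:
            f.seek(-HINT_BYTES, os.SEEK_END)
            hint["end hash"] = hashlib.sha1(f.read(HINT_BYTES)).hexdigest()
    return hint
```

Only the first and last 2048 bytes are read, so indexing a large library stays cheap regardless of file sizes.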

With this it is possible to create a file containing a collection of different filehints and compare them with indexed data, fast. Then the user can find the torrents they can easily seed and files they can reuse. My hope with standardizing is to get sites to build databases of their filehints.

The focus is a bit different but I honestly think it's better overall than BEP 38 even for its current use-case. Partly due to my suggestion being more stateless and an exact and easy-to-implement solution.

Torrent search index format

With updatable torrents and torrent feeds we're slowly moving towards more decentralized ways for publishers to distribute large quantities of updatable content and have subscribers learn about updates.

One problem that consumers of such content face is that they usually first need to download the entire dataset, and therefore cannot search their way through it and prioritize the download of specific pieces based on their interest.

A solution to this would be to implement a B+tree structure, where nodes in the tree are the same size as the pieces of the torrent. The .torrent file would only hold the "root" of the tree.

Searching

A search would start from the root node (provided in the torrent file). Each pointer in the root node links to a piece number. Based on the search, a specific piece is downloaded, and the process continues until the leaf node holding the data is found.
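A rough sketch of that lookup is below. The node layout (dicts with "leaf", "keys", "children" and "values" fields) and the fetch_piece callback are invented for illustration; a real format would define a binary encoding that fits a node into one piece.

```python
from bisect import bisect_left, bisect_right


def search_index(root, fetch_piece, key):
    """Walk the tree from root to a leaf, downloading one piece per level.

    fetch_piece(piece_number) returns the decoded node stored in that piece.
    Returns (value or None, list of piece numbers that were fetched).
    """
    node = root
    fetched = []
    while not node["leaf"]:
        # Child i covers keys k with keys[i-1] <= k < keys[i].
        i = bisect_right(node["keys"], key)
        piece = node["children"][i]
        fetched.append(piece)
        node = fetch_piece(piece)
    i = bisect_left(node["keys"], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return node["values"][i], fetched
    return None, fetched
```

The number of pieces fetched equals the height of the tree, which is what keeps a keyword search cheap compared to downloading the whole index.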

Inserting and deleting

Since we want to reuse as much data as possible and not make subscribers re-download the entire index every time some data changes, a simple approach would be to use an append-only B+tree, also called a copy-on-write B+tree. The idea is that the tree is immutable and data is always appended. There are also ways to compact the tree once old portions aren't needed anymore. CouchDB also uses this structure: http://guide.couchdb.org/draft/btree.html

Use case

Imagine you're an entity such as Web Archive and want to distribute a large dump (~40 GB) of torrent files. You want your users to be able to run keyword searches to quickly find relevant content. Instead of creating a single torrent of these torrent files, you create a single "torrent search index" file which is the B+tree explained earlier. The keys in the B+tree are the words you want your users to search for, which can be extracted from the torrent files using available stemmers and analyzers.

You publish this resulting "torrent search index" as a mutable torrent (BEP46) making sure the torrent contains the root node.

Users consuming your mutable torrent will always have the latest index you publish. If they want to search for something they navigate the tree and only download pieces where the content is. Hopefully, with healthy swarms, a keyword search should provide relevant results in only a few seconds.

I was wondering whether something like this has been proposed before and what you think about it?

BEP 52: extension protocol?

I've just read BEP 52, and I cannot see anything in it that looks like the LibTorrent/µTorrent extension mechanism. What is the current thinking on protocol extensions in v2?

BEP 52 file alignment

@ssiloti @the8472 I would like some clarification on how files are required to be aligned in v2 torrent files.

The BEP talks about using pad files in order to be backwards compatible with v1 torrents.

Since the old format did not align files to piece boundaries a multifile torrent must use BEP 47 padding files to achieve identical alignment.

And the description of the file tree says:

Files are mapped into this piece address space so that each non-empty file is aligned to a piece boundary and occurs in the same order as in the file tree. The last piece of each file may be shorter than the specified piece length, resulting in an alignment gap.

I think of this as if there are implied pad files in a v2-only torrent. The way the libtorrent implementation does this right now is to unconditionally insert pad files when parsing a v2 torrent. Even if there are pad files that cause incorrect alignment, this logic will simply "fix" this.
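To spell out what "implied pad files" means here, the following sketch maps file lengths into the piece address space as the quoted text describes and reports the alignment gap after each file. The function name and the tuple layout are made up for illustration.

```python
def map_files_to_pieces(file_lengths, piece_length):
    """Return (first_piece, num_pieces, gap_bytes) for each file, in order.

    Every non-empty file starts on a piece boundary; gap_bytes is the unused
    tail of its last piece, i.e. the size of the implied pad file.
    Empty files occupy no pieces and are reported as (None, 0, 0).
    """
    layout = []
    next_piece = 0
    for length in file_lengths:
        if length == 0:
            layout.append((None, 0, 0))
            continue
        num_pieces = -(-length // piece_length)  # ceiling division
        gap = num_pieces * piece_length - length
        layout.append((next_piece, num_pieces, gap))
        next_piece += num_pieces
    return layout
```

An explicit pad file in a hybrid torrent would be "correct" in this sense when its length equals gap_bytes for the file before it.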

However, the case that isn't obvious is if there is an invalid pad file that's larger than it should be, the "fix" currently is to insert another padfile, which will push the next file a whole extra piece out.

For a v2 torrent, with padfiles, should it be a requirement that the pad files are correct? i.e. a fatal error if they aren't.

The reference torrent creator appears to be adding pad files unconditionally, for v2-only torrents also.

That should probably be fixed.
