Giter VIP home page Giter VIP logo

Comments (12)

the8472 avatar the8472 commented on July 28, 2024 1

In practice BEP1 is not really followed. You don't need a BEP number for now that the document is not even finished.

And a search feature is likely so complex that it'll need multiple BEPs for the building blocks. E.g. like BEP 9 relies on BEP 10 and optionally on BEP 5.

You'll probably need a working implementation that can be used as reference and to find real-world problems with the theoretical approach. possibly as stand-alone client or library.

I think everyone (other implementers) is quite skeptical of decentralized fulltext search because it's a very complex problem that other p2p networks have tried to tackle generally resulted in mixed outcomes.

The thing is that the search itself and its algorithmic complexity may not be the biggest problem. Gnutella&co already had limited-broadcast based solutions that sortof worked. The other problems are what really blows up the complexity and will make writing a concise BEP difficult.

Anyway, in my opinion a BEP would need to contain the following so that others could implment it in principle:

  • the network protocol
  • the algorithms for searching + example calculations where appropriate
  • privacy analysis
  • defenses against various malicious actors or at least an analysis where its weaknesses are

And that would just be the starting point for others to provide input, e.g. to improve privacy or add more defenses.


On a higher level, you have to consider what goals one would want from a fulltext search. some of the following goals are just nice-to-haves or weakly achieving them may be good enough:

  • resilience, i.e. thwarting attempts to disrupt the entire network
  • spam resistance
  • censorship resistance, i.e. DoSing particular locations in the network
  • sybil resistance
  • resistance against abuse of service/unduly consuming resources - e.g. attempts to dump the entire network or issue thousands of queries per second. otherwise someone will just try to implement a website that'll fulfill search queries by issuing thousands of of lookups per second.
  • performance
  • fine-grained searching: partial matches, boolean queries, sorting by various metadata.
  • obscuring what a client is searching for
  • obscuring who published a particular entry

from bittorrent.org.

ssiloti avatar ssiloti commented on July 28, 2024

Is there an actual proposal somewhere or is this just a proposal for someone to write a proposal? In any case it is very unlikely such a BEP would be accepted. There is a distinct lack of good solutions to this problem.

from bittorrent.org.

artsokol avatar artsokol commented on July 28, 2024

Today we publish document with our idea description. Waiting for your critical comments and questions.

from bittorrent.org.

arvidn avatar arvidn commented on July 28, 2024

I would think that there are some privacy concerns with this protocol. For instance, it sounds like this would make it fairly easy to query a peer for all the torrents it's downloading/seeding. One could argue that to some degree this is already public information, but it's a lot harder to get today (the easy query is "which peers are downloading this torrent", but it's still hard to ask "which torrents is this peer downloading").

from bittorrent.org.

the8472 avatar the8472 commented on July 28, 2024

I also don't see any defenses against malicious actors. Search networks are prime targets for inject of spam / fake / malware content under popular keywords.

from bittorrent.org.

artsokol avatar artsokol commented on July 28, 2024

For instance, it sounds like this would make it fairly easy to query a peer for all the torrents it's downloading/seeding.

That's not correct. Other Host(peer) returns only one closest to our query other host and sws node belonged to or itself if there are no any.
Possibly make sense to introduce new option "private" for torrent files that should not be involved in small world network

I also don't see any defenses against malicious actors. Search networks are prime targets for inject of spam / fake / malware content under popular keywords.

For the time being we should provide as more additional information about found files as we can (size, date of creating etc) to user can make a decision about downloading certain file. Some trust mechanism can be suggested in the nearest future by separate BEPs. For ex. small world is tightly connected network and complaining of one host causes the quick forwarding of rumor and malware info hash will be excluded from network by network.

from bittorrent.org.

aponom84 avatar aponom84 commented on July 28, 2024

@the8472, we understand that there is a problem related to "spam / fake / malware content under popular keywords". The main aim of this BEP is to bring to community the idea of the possibility of efficient full-text searching under P2P environment. We realize that there must be a mechanism for calculating the relevance. For example, relevance can be based on the number of seeding peers. Note, that a good torrent file with a good description together has a unique hash value. So it gives some protection against fakes.

from bittorrent.org.

the8472 avatar the8472 commented on July 28, 2024

Well, the algorithm may be a starting point for a fulltext search, but to be implementation-ready other problems need to be solved too.

I think a BEP for full-text search should include some minimum resilience against malicious actors, a protocol specification, tangential problems like segmenting languages where whitespaces do not necessarily separate words (e.g. logographic languages) etc. etc.

All you propose is one building block out of several needed to build a fulltext search, that's not sufficient for a standards track BEP.

from bittorrent.org.

artsokol avatar artsokol commented on July 28, 2024

We are in process of writing technical details and will provide it step by step. According information from BitTorrent.org first of all we should post Idea, get accept from community and get bep number. And that is why we have published just base theoretical description of our idea.

from bittorrent.org.

artsokol avatar artsokol commented on July 28, 2024

We are going to update our proposal regularly according parts finishing.

from bittorrent.org.

aponom84 avatar aponom84 commented on July 28, 2024

I just want to add a comment about computational complexity and performance.
My colleagues and I have studied this issue for several years. We have wanted to adapt small world networks for searching, but our goal was make something more universal then well known DHT. And we successfully done it. We published several papers about efficient algorithms for approximate nearest neighbour search in high dimensional metric space. These algorithms construct navigable small world networks in one simple way. And our experiments have confirmed logarithmical search complexity of the searching procedure.
You can more details in the following papers:
[Malkov, Y., Ponomarenko, A., Logvinov, A., & Krylov, V. (2012). Scalable distributed algorithm for approximate nearest neighbor search problem in high dimensional general metric spaces. In Similarity Search and Applications (pp. 132-147). Springer Berlin Heidelberg.] - here you can find the details of algorithm. (This paper has awarded as a best paper in the SISAP conference)
[Ponomarenko, A., Avrelin, N., Bilegsaikhan. N., Boytsov, L., 2014. Comparative Analysis of Data Structures for Approximate Nearest Neighbor Search. In Proceedings of The Third International Conference on Data Analytics] - in this paper you can find experiments with Wikipedia data set. Unfortunately we don't plot this graph in log log scale, but I sure you that in log log scale it has the from of a line.
So now the nearest neighbour search in high dimensional spaces is not a "problem" and it is possible to implement logarithmic full text search even in the p2p setting.

from bittorrent.org.

artsokol avatar artsokol commented on July 28, 2024

Added more details in common information, similarity and term vector description, searching algorithms and routing table.

from bittorrent.org.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.