
swarm's Issues

Mock datastore for synchronization tests

To conduct fast synchronization tests, it would be useful to implement a mock datastore which does not actually store data.
A central storage component would just record which chunk is available at which node; there would be no real data involved.
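A minimal sketch of such a mock datastore in Go (all type and method names here are illustrative, not the actual swarm storage API):

```go
package main

import (
	"errors"
	"fmt"
)

// mockStore is a hypothetical in-memory index for sync tests: it
// records which node claims to have which chunk, but never stores the
// chunk payload itself.
type mockStore struct {
	// chunk key (hex) -> set of node IDs that have the chunk
	have map[string]map[string]bool
}

func newMockStore() *mockStore {
	return &mockStore{have: make(map[string]map[string]bool)}
}

// Put registers that node holds the chunk with the given key;
// the chunk data is deliberately dropped.
func (m *mockStore) Put(node, key string, _ []byte) {
	if m.have[key] == nil {
		m.have[key] = make(map[string]bool)
	}
	m.have[key][node] = true
}

// Has reports whether node has registered the chunk.
func (m *mockStore) Has(node, key string) bool {
	return m.have[key][node]
}

// Get fails by design: a mock store has no payloads.
func (m *mockStore) Get(key string) ([]byte, error) {
	return nil, errors.New("mock store holds no data")
}

func main() {
	s := newMockStore()
	s.Put("node-a", "c0ffee", []byte("ignored"))
	fmt.Println(s.Has("node-a", "c0ffee")) // true
	fmt.Println(s.Has("node-b", "c0ffee")) // false
}
```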

Swap with no chequebook contract

Let's make swap work without a chequebook contract, so people can start to use swarm without holding any ether.
If the node provides an account address instead of a chequebook contract, let's just send ether there directly.

swarm needs a gateway mode (transparent proxy that rewrites bzz:// URLs)

Context:
You can run swarm on a local node and see swarm content on http://localhost:8500 or you could access swarm via a gateway. These require different HTML content than if you have a bzz:// URL handler installed in your browser.

Problem:
Swarm is meant to serve HTTP content for web3 dapps and serverless websites such as our swarm homepage theswarm.eth. In the HTML of that website we want to have URLs of the form

bzz://theswarm.eth/images/some-image.png

but at the moment we are using the relative URL

bzz:/theswarm.eth/some-image.png

which loads

http://localhost:8500/bzz:/theswarm.eth/images/some-image.png
or
http://swarm-gateways.net/bzz:/theswarm.eth/images/some-image.png

This is not good. We cannot require the content of swarm-hosted websites to be adapted in this way. We need the following:

  1. Default behaviour of all swarm hosted websites should be to use bzz:// URLs.

2a. It should be possible to run swarm with a --gateway flag which adds a transparent proxy to all swarm content, replacing bzz:// with http://localhost:8500/bzz:/ or with http://gateway-url/bzz:/.

2b. Swarm should not come with a gateway mode itself, but come bundled with an nginx config file that achieves the same as 2a. We should then describe this method in documentation and here: https://ethereum.stackexchange.com/questions/8187/how-to-run-a-swarm-gateway


Some notes:
An alternative is to do the re-writing client side, but then you cannot surf swarm content without a browser plugin / bzz:// handler
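As a sketch of the rewrite that either the --gateway flag (2a) or an nginx rule (2b) would have to perform (the function name and behaviour here are assumptions for illustration, not an existing swarm API):

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteBzzURLs is a minimal sketch of the server-side rewrite a
// gateway mode would perform: every absolute bzz:// URL in an HTML
// body is replaced with a link that resolves on the given gateway.
func rewriteBzzURLs(html, gateway string) string {
	return strings.ReplaceAll(html, "bzz://", gateway+"/bzz:/")
}

func main() {
	page := `<img src="bzz://theswarm.eth/images/some-image.png">`
	fmt.Println(rewriteBzzURLs(page, "http://localhost:8500"))
	// <img src="http://localhost:8500/bzz:/theswarm.eth/images/some-image.png">
}
```

A real proxy would apply this to text/html responses only; an nginx deployment could achieve the same with response-body substitution.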

Negotiate on chunk price in link

Upon connection, nodes should agree on a swap chunk price.
Minimal implementation: both peers offer a price; if the offers are the same, connect, otherwise do not connect.
Use a default price.
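The minimal implementation above could look roughly like this (names and the wei denomination are assumptions):

```go
package main

import (
	"errors"
	"fmt"
)

// defaultChunkPrice is an illustrative default, denominated in wei.
const defaultChunkPrice uint64 = 100

// negotiateChunkPrice sketches the minimal handshake rule: each side
// offers a price; if the offers match, the connection proceeds at
// that price, otherwise it is refused.
func negotiateChunkPrice(local, remote uint64) (uint64, error) {
	if local != remote {
		return 0, errors.New("chunk price mismatch: refusing connection")
	}
	return local, nil
}

func main() {
	price, err := negotiateChunkPrice(defaultChunkPrice, defaultChunkPrice)
	fmt.Println(price, err) // 100 <nil>
	_, err = negotiateChunkPrice(defaultChunkPrice, 200)
	fmt.Println(err != nil) // true
}
```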

p2p/simulations: event journal posting to p2p layer

(reminder)

System information

Geth version: NA
OS & Version: any/tests

Expected behaviour

Journal events for msg and node start/stop in simulation should be triggered by callback in p2p layer.

Actual behaviour

It's now triggered in the simulation layer.

SIP - Changes in manifest handling (the trailing slash problem and other issues)

Manifest traversal, from trailing slashes to over/under matched paths has been a headache for a while. We have gotten bugs, we have gotten unexpected behaviour, we have gotten confused.

This issue is created as a placeholder for the following discussion:

"Manifests should treat / as a special character and should always break on / and not on any substring."

Calculate swap in ether instead of chunk

Current swap code calculates the swap balance in number of chunks.
Instead, peers should agree on a chunk price during handshake (see issue #223) and do the accounting in ether from then on.

Implement intervals

Intervals are meant to store the chunk index intervals which are already fetched. When a peer disconnects it can happen that we lose an interval and historic syncing will have to fetch it on the next connect.
Intervals should be saved every time a batch is done (Client.BatchDone), before the takeover proof is sent back (if there is one).
This is partially implemented and commented out, we need to finish the implementation and write tests for it.

We need two different types of syncer streams: historical and session syncing. Between full storer nodes both will be present in both directions (so 4 streams between the two nodes). Implement these in the syncer and extend the Subscribe message to specify whether the stream is a historical or a session syncer.
Implement functional tests for historical syncing using intervals
Implement functional tests for historical syncing across sessions using persisted intervals
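A sketch of the interval bookkeeping described above, merging ranges on overlap or adjacency (types and names are illustrative, not the partially implemented code):

```go
package main

import "fmt"

// interval is a closed range [Start, End] of chunk indexes that have
// already been fetched.
type interval struct{ Start, End uint64 }

// intervals keeps a sorted list of disjoint, non-adjacent ranges.
type intervals struct{ ranges []interval }

// Add records [start, end] as fetched, merging it with any existing
// ranges it overlaps or touches.
func (iv *intervals) Add(start, end uint64) {
	merged := interval{start, end}
	var out []interval
	for _, r := range iv.ranges {
		switch {
		case r.End+1 < merged.Start: // r is strictly before, keep it
			out = append(out, r)
		case merged.End+1 < r.Start: // r is strictly after, emit merged
			out = append(out, merged)
			merged = r
		default: // overlap or adjacency: absorb r into merged
			if r.Start < merged.Start {
				merged.Start = r.Start
			}
			if r.End > merged.End {
				merged.End = r.End
			}
		}
	}
	iv.ranges = append(out, merged)
}

func main() {
	var iv intervals
	iv.Add(1, 5)
	iv.Add(10, 12)
	iv.Add(6, 9) // bridges the gap
	fmt.Println(iv.ranges) // [{1 12}]
}
```

Persisting this structure at every Client.BatchDone, before the takeover proof is sent, would let historical syncing resume from the recorded gaps after a disconnect.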

Fix discovery tests

fix network/simulations/discovery_test.go and rework it using p2p/testing/simulations

reduce the number of errors in http access

  • the GET request /favicon.ico should return our favicon bzz:/22481deec05d53e909e4f3933842686113927c67ab2a22c8ad5614e4e3dc505c/favicon.ico
  • the GET request for robots.txt should return a valid robots.txt
  • the default behaviour of any manifest should be to redirect to the entry for '/' and what was previously called the --defaultpath should be the entry at '/'
  • the GET request for just '/' (as opposed to bzz:/ etc.) should return an HTTP 200 and some minimal status info, not an error about an invalid protocol https://github.com/ethereum/go-ethereum/blob/master/swarm/api/http/server.go#L555
  • manifests without a default path should default to the behaviour that ?list=true induces. (Or at least it should be possible to enable this behaviour on the local node)
  • the GET request to /status should output a status page showing sync status, connected peers, disk usage etc. etc. ... possibly leading in future to a configuration assistant?
  • The configuration file should be populated with default values or should be edited manually. Currently, empty entries in the config file are overwritten by any command line arguments. This leads to funky config files if you get an argument wrong the first time you run swarm. For example, if I change httpaddr to '' in my config and then run swarm with --httpaddr doofus, then 'doofus' will be stuck in my config until I change it manually.

Syncer in mock storage mode

Allow syncer to operate in mock storage mode, i.e., no need to pass the chunk data

Chunk validity check should be abstracted (different for content chunk, resource chunk, test-mock-storage): #263

Syncer tests should pass, see: #242

Create a custom 502 page for the gateway

In our swarm-gateways.net cluster, we run a nginx server reverse proxying to swarm nodes. If swarm nodes can't be reached, nginx displays a standard 502 Bad Gateway error message.

The gateway should have a nicer and more informative custom page.

Rename swarm deb package

The Ubuntu package for go-ethereum conflicts with an existing package which also has an executable named swarm. bzz is also taken.

Rename to:

  • bzz (taken)
  • go-swarm
  • gbzz
  • go-bzz-
  • bzzd
  • bee: after all, you control/start/configure a bee of the swarm

Default manifest behavior

The default behaviour of any manifest should be to redirect to the entry for '/' and what was previously called the --defaultpath should be the entry at '/'

Fix overlay simulation in swarm-network-rewrite

From @holisticode on Gitter:
The overlay simulation may not be covered in the current rewrite branch; it is easily missed because it is a standalone main executable.
It may be a good idea to add a test for it if we want to keep the overlay simulation.
Please also add it to CI, to keep it working.

Http status page for swarm nodes

We need a status page on the http interface, e.g. on /status

Show the information gathered from the status API endpoint: #243

Maybe this can be merged with the admin page: #159

Need proper URL handlers for browser

We need an easy way to teach browser about bzz:// bzz-raw:// bzz-list:// bzz-hash:// and bzz-immutable:// URLs.

There has been a URL handler in the repository since forever, but it does not work correctly.

This is also important because at the moment we use relative links "bzz:/theswarm.eth/a/b/c" in our HTML when we should be using absolute links "bzz://theswarm.eth"

One of these works on the gateway, the other works with proper URL handling... what should the correct solution be?

Http admin page for swarm nodes

Closely connected to the status page issue #158

The difference from the status page is that the admin page should be interactive: it should contain parameters which can be updated.

It doesn't necessarily have to be a separate page from the status page.

Examples

  • update storage size
  • upload file

Possible issues

Some parameters are not updatable on the fly in the swarm implementation. So this issue is not just UI development; some refactoring is possibly needed in swarm too.

We have to take care of security; perhaps there is an "internal" domain, for example bzz://upload.internal and bzz://status.internal (a bit like chrome://settings/).

Clean up swarm documentation

Parts of the documentation are really old and confusing. To get started, here is a list of some things that could be changed:

  • remove any nohup commands and any input redirects (i.e. no more of this madness: 2>> $DATADIR/swarm.log < <(echo -n "MYPASSWORD") & )
  • remove variables such as $DATADIR - they only serve to confuse those who are not used to the command line and don't really help those who are.
  • Remove any reference to /tmp/BZZ datadirs and instead use a proper standard datadir (.ethereum/swarm/ ?)
  • remove documentation about running your own blockchain (under "testing swap") - that's confusing and out of scope.
  • remove any --verbosity 6 from the docs!
  • update the enode addresses to correspond to our current cluster or take them out entirely.
  • remove any networkid 322 still in the docs (we are on networkid 3 now and should switch to 1 soon)
  • make sure all configuration parameters are correct in the documentation and document the config file generation - https://swarm-guide.readthedocs.io/en/latest/runninganode.html#general-configuration-parameters
  • remove ENS registration docs that are better handled in the ENS docs themselves and only focus on how to add "content" to an ENS resolver contract.
  • introduction.rst Fix 0.4 release date (POC 0.4 expected in Q2 2018. -- this is wrong), fix pss gitter link, fix About / This document links, Remove reference to Swatch gitter (no activity on over 1 year), fix all the links in section "roadmap and resources"
  • installation - check which version of Go we need. We say 1.7 or later is preferred, but I suspect we require even later.
  • installation - does sudo apt install golang work, or is that not going to get us the version of go we need?
  • connecting to swarm (simple) - do not use the env var. As written "open another terminal window and connect to Swarm with" will not work.
  • why is swarm up not part of "How do I upload and download?" in the "simple guide". Perhaps refactor?
  • connecting to swarm (advanced) - let's delete everything up to the configuration section. no? We can keep a few snippets such as "how to manually add enodes" but the rest is superfluous as it follows from the documentation of the configuration options.
  • check if "setting up swap" section is still accurate - I have not tested it in a loooong time
  • apropos swarm up - are we renaming the swarm binary for the next release? because we should change the docs accordingly.
  • decide whether to keep the section "Content Retrieval Using a Proxy"
  • (mutable) resource updates: "infer" -> "imply"... and generally this needs reformulation because I have trouble parsing it.
  • replace references to "theswarm.test" with "theswarm.eth"
  • ENS remove all but the first two paragraphs and maybe add a note that ENS registered swarm content must be prefixed by 0x.
  • PSS section needs work
  • FUSE documentation - the actual how-to of mounting a manifest is missing.
  • Architecture section - let's delete it. It doesn't have to be part of the user docs. This is not the right place for it.

Network rewrite streamer/syncer related issues

network rewrite streamer phase breakdown of tasks

  • persistent stateStore implementation for kademlia table
  • there should be a test in discovery checking that the persisted peer set (1) is found and loaded and (2) bootstraps a healthy kademlia without any connections, and (3) benchmarking the reduction of the time to reach health
  • allow syncer to operate in mock storage mode, i.e., no need to pass the chunk data
  • chunk validity check should be abstracted (different for content chunk, resource chunk, test-mock-storage)
  • chunk interface
  • after the request fails it should be removed from the @memstore
  • request repeat if downstream disconnects
  • streamer API, syncer API. used in sim test => simplifies tests
  • write in batches https://github.com/ethersphere/go-ethereum/blob/swarm-gateways-db-fixes/swarm/storage/dbstore.go
  • for proper handling of waiting for storage #179 dbStored field should be lock protected
  • possibly combine with https://github.com/ethersphere/go-ethereum/blob/swarm-db-sync-fix-pyramid and test
  • streamer protocol message exchange unit tests using p2p/testing
  • request/delivery functional tests using p2p/simulations chain of nodes
  • per-bin syncer functional tests using p2p/simulations chain of nodes
  • fix network/simulations/discovery_test.go and rework it using p2p/testing/simulations
  • implement unsubscribe message and method
  • implement subscribe error responses
  • implement Client Close (just like Server Close) and call it in f568ef1#diff-524f169c33e854ca57a6e668ce319a9bR345
  • pass Live bool flag similarly to Key
  • adapt syncer iterator to new syncer
  • save intervals - needs to be concurrent for racing mode light client download request
  • upstream peer sends sessionAt on subscribe
  • functional tests for historical syncing using intervals
  • functional tests for historical syncing across sessions using persisted intervals
  • ~db purge/delete triggers provable syncer~
  • cached deliveries should not enter syncpool?
  • light node client/server streamers for requests
  • adapt livepeer streams
  • rework network/stream/testing as p2p/testing/simulation.go unless we want to keep it in our own yard
  • implement subscribe request for light node upload
  • dbstore: export/import to be adapted; dump, reindex, cleanup removed
  • write syncer tests with netsim and mock storage for old syncers

multihash support in swarm

We are often asked about multihash support.

We should have a proper discussion about pros and cons at least. I suggest this issue shall be the place for that discussion until someone formulates this as an EIP (or is it SIP?)

chunker modifications

  • use context for abort etc
  • eliminate wait groups from API
    • storage should be waited on by default, if not needed caller starts split in go routine
    • processors quitting not needed to be waited on
  • backend for progress bar (unclear how to combine split progress with disk storage progress)
  • chunk encryption API
  • API for shannonian obfuscation for plausible deniability
  • API for erasure coding. two modes of operation with join:
    • cheap/slow mode: retrieve first n hashes of intermediate chunk, fallback to parity chunks only if some not found.
    • fast mode: n out of m race

Don't give error if hashes are prefixed with 0x

The hash for the site theswarm.eth is currently 2c2d2adb8fd0cba399282fb59f8219e5fbbd67ba06fcf5c8d343f5eb1c8be022

It is documented that if you are setting a content hash in ENS and you submit that hash, you are making an error. The correct hash to submit to ENS is 0x2c2d2adb8fd0cba399282fb59f8219e5fbbd67ba06fcf5c8d343f5eb1c8be022

Conversely, I have just discovered that calling the 0x hash on bzz gives an error too
See for example:
http://swarm-gateways.net/bzz:/0x2c2d2adb8fd0cba399282fb59f8219e5fbbd67ba06fcf5c8d343f5eb1c8be022/

Surely this can be fixed.

Possible solutions:

    • If you encounter a bzz:/0xHASH request, redirect HTTP 301 to bzz:/HASH
    • Serve the same content at 0xHASH as you would at HASH
    • Display a helpful error to the user, telling them to remove the 0x from the URL
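The first two options reduce to normalizing the hash before lookup, e.g. (the function name is an assumption):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBzzHash sketches the proposed fix: strip an optional 0x
// prefix from the hash part of a bzz:/ URL so both forms resolve to
// the same content.
func normalizeBzzHash(hash string) string {
	return strings.TrimPrefix(hash, "0x")
}

func main() {
	h := "0x2c2d2adb8fd0cba399282fb59f8219e5fbbd67ba06fcf5c8d343f5eb1c8be022"
	fmt.Println(normalizeBzzHash(h))
	// 2c2d2adb8fd0cba399282fb59f8219e5fbbd67ba06fcf5c8d343f5eb1c8be022
}
```

Whether the server then redirects with HTTP 301 or serves the content directly under both forms is the remaining design choice.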

Swarm needs Browser plugins to handle bzz URLs

With the help of a browser plugin users should be able to use and select a bzz provider, be it localhost:8500, swarm-gateways.net or infra-swarm-gateway, and use bzz:// urls natively.

mod_time in manifest should be optional or better documented

When uploading using bzz, a manifest is created which includes a timestamp of when the upload was created, so the upload produces a different hash each time. Uploading using bzz-raw (i.e. just uploading the raw bytes) will always yield the same hash.
To demonstrate, if I upload the same data twice I'll get two hashes but inspecting the manifests, I see the same hash was stored under the path:

$ curl -F "my-file=data" http://localhost:8500/bzz:/
ca16a6b21ddb375fd718bb931cb039b0ff3fdaabc90b4c5e0e82604345764182

$ curl -F "my-file=data" http://localhost:8500/bzz:/
09b70636040f1a9cd88d399dc9cca3fe740e4b1283b5100e1eb57ac4d1b9c5ae

$ curl -s http://localhost:8500/bzz-raw:/ca16a6b21ddb375fd718bb931cb039b0ff3fdaabc90b4c5e0e82604345764182/ | jq .
{
  "entries": [
    {
      "hash": "61cc094e478970c7e58bf44cd1e13b2851d9cea254327d08dbdd1918b454b9f8",
      "path": "my-file",
      "size": 4,
      "mod_time": "2018-01-24T11:30:10.24894844Z"
    }
  ]
}

$ curl -s http://localhost:8500/bzz-raw:/09b70636040f1a9cd88d399dc9cca3fe740e4b1283b5100e1eb57ac4d1b9c5ae/ | jq .
{
  "entries": [
    {
      "hash": "61cc094e478970c7e58bf44cd1e13b2851d9cea254327d08dbdd1918b454b9f8",
      "path": "my-file",
      "size": 4,
      "mod_time": "2018-01-24T11:30:13.609200639Z"
    }
  ]
}

This being said, perhaps the addition of mod_time should be optional if people just want deterministic manifest creation.

This has been causing people headaches, so we should consider documenting it properly or making it optional.

Enable multihash support for swarm root hashes & ENS

Goal: as described in #166 we want to be able to request swarm data using URLs of the form bzz://<multi-hash>/path/in/manifest.

The reason is that this will allow people to store multi-hashes in the ENS resolver contracts at "content", thereby allowing swarm, ipfs and other systems to exist side by side.

This change also allows us to add to ENS Swarm content that has been uploaded with the --encrypt flag. In the current system that is not possible.

  • Enable retrieval of swarm-content using a multi-hash in the URL
  • Generate a multi-hash when uploading swarm content
  • Document the functionality in the swarm docs
  • Notify the ENS guys -> Need new resolver and new ENS tools.
  • Update all our own ENS names to use a multi-hash

swarm/storage: memstore cache of failed lookups

System information

Geth version: 1.5.10-unstable
OS & Version: Windows/Linux/OSX
Commit hash: 55901fffe2ae3b535b5063add279862e0484671e

Expected behaviour

A and B are fresh nodes with empty datastores
Bring up node A, upload to A, bring down A, bring up A and B, connect A and B (manually), request file from B, B forwards to A, B returns file.

Actual behaviour

If the file request is made too early after startup (possibly the threshold is the syncer run), the retrieval fails. Even if retrieval of other files works after this threshold, the file that failed will continue to fail until B (possibly also A, please double check this) is restarted.

Steps to reproduce the behaviour

See above…

p2p and p2p/protocols packages handle messages synchronously

The p2p and p2p/protocols packages are reading and then handling requests/messages synchronously.

As a result the PSS protocol is deadlocking on the sim adapter (which uses a net.Pipe).

# p2p/peer.go

func (p *Peer) readLoop(errc chan<- error) {
	defer p.wg.Done()
	for {
		msg, err := p.rw.ReadMsg()
		if err != nil {
			errc <- err
			return
		}
		msg.ReceivedAt = time.Now()
		if err = p.handle(msg); err != nil {
			errc <- err
			return
		}
	}
}

# p2p/protocols/protocol.go

// Run starts the forever loop that handles incoming messages
// called within the p2p.Protocol#Run function
func (p *Peer) Run(handler func(msg interface{}) error) error {
	for {
		if err := p.handleIncoming(handler); err != nil {
			return err
		}
	}
}

In effect the PSS protocol is handling one request/message at a time.

Let's discuss what's the best way to address the deadlock:

  1. Add concurrent multi-message handling to the PSS protocol ?
  2. Keep the synchronous behaviour, but make sure we have a sufficiently big read/write buffer on each connection/adapter ?
  3. Other ideas ?

Implement metrics for swarm

This issue has been opened in order to be tracked in the ethersphere project. Its source issue is ethereum/go-ethereum#15481:

For swarm, it would be good to be able to collect stats and metrics on a node concerning storage (chunks, local DB, etc.) and chequebook properties (consumed and delivered services, service peers, balances, cheques, etc.).

This issue is a parent issue for #177, which is about evaluating the technical infrastructure for the metrics implementation.

Also related are issues #159 and #158, which may visualize information based on metrics.

Mutable resource updates outstanding tasks

  • API for lookups: (1) latest (2) historical (3) particular version
  • use swarm storage stack for resource update chunks
  • signature implementation: the text that the owner signs should contain the key K = Hash(name|period|version), so S = Sign(Hash(K|CH)) where CH is the content hash of the version in question, and the data of the update chunk is D = S|CH|period|version|name. This is needed because otherwise any hash the owner ever signs could be submitted by malicious parties as a resource update.
  • resource update validation: content hash, period, version and name should be parsable from the data in order to validate the resource update, as per the previous point
  • http API for the API lookups
  • manifest entry content type to allow server to follow mutable resources;
  • strict lookup mode: complain if version forks (usecase: sw3 promise of no fork truthline)
  • resource update stream using network rewrite stream subscription layer; use pss for remote broadcast
  • create ENS resolver that accepts transactions with arguments blockNumber, version, CH, S from any third party to update the record see #205
  • The ENS resolver should register blockNumber and frequency of updates (the latter only changeable by the owner). Then root entry chunks are not needed.
  • instead of blockNumber, a period should be represented by its index, i.e., latestPeriod = (currentBlock - startBlock) / frequency. This would make period and version more like major and minor version numbers, plus we could use 32-bit integers for both
  • Parse update data as multihash if datalength is set to 0
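The digest construction from the signature bullet can be sketched as follows; sha256 stands in for whatever hash swarm actually uses, the field encoding is an assumption, and the actual signing step is elided:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// updateDigest binds a signature to one specific update of one
// specific resource: the owner signs Hash(K|CH) where
// K = Hash(name|period|version) and CH is the content hash.
func updateDigest(name string, period, version uint32, contentHash [32]byte) [32]byte {
	var buf [8]byte
	binary.BigEndian.PutUint32(buf[:4], period)
	binary.BigEndian.PutUint32(buf[4:], version)

	k := sha256.New()
	k.Write([]byte(name))
	k.Write(buf[:])
	key := k.Sum(nil) // K = Hash(name|period|version)

	d := sha256.New()
	d.Write(key)
	d.Write(contentHash[:])
	var out [32]byte
	copy(out[:], d.Sum(nil)) // digest to sign: Hash(K|CH)
	return out
}

func main() {
	var ch [32]byte
	a := updateDigest("theswarm.eth", 1, 1, ch)
	b := updateDigest("theswarm.eth", 1, 2, ch)
	fmt.Println(a != b) // different version, different digest
}
```

Because name, period and version all enter the digest, a signature captured for one update cannot be replayed for any other resource or version.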
