ipfs-go-ds-storj's Introduction

Storj Datastore for IPFS

This is an implementation of the IPFS datastore interface backed by Storj.

NOTE: Plugins only work on Linux and macOS at the moment. You can track the progress of this issue here: golang/go#19282

Bundling

The datastore plugin must be compiled and bundled together with go-ipfs. The plugin will be part of the go-ipfs binary after executing these steps:

# We use go modules for everything.
> export GO111MODULE=on

# Clone kubo (formerly go-ipfs).
> git clone https://github.com/ipfs/kubo
> cd kubo

# Checkout the desired release tag of go-ipfs.
> git checkout v0.19.2

# Pull in the datastore plugin (you can specify a version other than latest if you'd like).
> go get storj.io/ipfs-go-ds-storj/plugin@latest

# Add the plugin to the preload list.
> echo -e "\nstorjds storj.io/ipfs-go-ds-storj/plugin 0" >> plugin/loader/preload_list

# Update the dependency tree.
> go mod tidy

# Now rebuild go-ipfs with the plugin. The go-ipfs binary will be available at cmd/ipfs/ipfs.
> make build

# (Optionally) install go-ipfs. The go-ipfs binary will be copied to ~/go/bin/ipfs.
> make install

Installation

For a brand new ipfs instance (no data stored yet):

  1. Bundle the datastore plugin in go-ipfs as described above.
  2. Delete $IPFS_DIR if it exists.
  3. Run ipfs init --empty-repo (the --empty-repo flag prevents adding the default doc files to the local node, which could cause issues before the datastore plugin is configured).
  4. Edit $IPFS_DIR/config to include Storj details (see Configuration below).
  5. Overwrite $IPFS_DIR/datastore_spec as specified below (don't do this on an instance with existing data; it will be lost). A consolidated sketch of these steps follows this list.
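
For a fresh instance, the steps above might look like the following shell session. This is a minimal sketch: it assumes the plugin-bundled ipfs binary from the Bundling section is on your PATH and that $IPFS_DIR points at your repo directory (e.g. ~/.ipfs).

# Remove any previous repo (this destroys existing local data).
> rm -rf "$IPFS_DIR"

# Initialize an empty repo so no doc files are added before the
# datastore plugin is configured.
> IPFS_PATH="$IPFS_DIR" ipfs init --empty-repo

# Next, edit "$IPFS_DIR/config" (see Configuration below) and overwrite
# "$IPFS_DIR/datastore_spec" (see the end of the Configuration section).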

Configuration

The config file should include the following:

{
  "Datastore": {
    ...
    "Spec": {
      "mounts": [
        {
          "child": {
            "type": "storjds",
            "dbURI": "$databaseURI",
            "bucket": "$bucketname",
            "accessGrant": "$accessGrant",
            "packInterval": "$packInterval",
            "debugAddr": "$debugAddr",
            "updateBloomFilter": "$updateBloomFilter",
            "nodeConnectionPoolCapacity": "100",
            "nodeConnectionPoolKeyCapacity": "5",
            "nodeConnectionPoolIdleExpiration": "2m",
            "satelliteConnectionPoolCapacity": "10",
            "satelliteConnectionPoolKeyCapacity": "0",
            "satelliteConnectionPoolIdleExpiration": "2m"
          },
          "mountpoint": "/",
          "prefix": "storj.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "providers",
            "type": "levelds"
          },
          "mountpoint": "/providers",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    }
    ...
  }
  ...
}

$databaseURI is the URI of a Postgres or CockroachDB database. This database is used for local caching of blocks before they are packed and uploaded to the Storj bucket. The database must already exist.
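
For illustration only, a hypothetical Postgres setup (database name, user, and password below are placeholders, not project defaults):

# Create the cache database first; the plugin does not create it for you.
> createdb ipfsds

# A dbURI value in the usual Postgres URI form.
> export DATABASE_URI="postgres://ipfs:secret@localhost:5432/ipfsds?sslmode=disable"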

$bucketname is a bucket on Storj DCS. The bucket must be created before use.

$accessGrant is an access grant for Storj DCS with full permission on the entire $bucketname bucket.

$packInterval is an optional time duration that sets the packing interval. The default packing interval is 1 minute. If set to a negative duration, e.g. -1m, the packing job is disabled.

$debugAddr is an optional [host]:port address to listen on for the debug endpoints. If not set, the debug endpoints are disabled.

$updateBloomFilter is an optional boolean that enables the bloom filter updater. If not set, the updater is disabled.

nodeConnectionPoolCapacity is an optional total number of connections to keep open in the Storj node connection pool. Default: 500.

nodeConnectionPoolKeyCapacity is an optional maximum number of connections to keep open per Storj node in the connection pool. Zero means no limit. Default: 5.

nodeConnectionPoolIdleExpiration is an optional duration for how long a connection in the Storj node connection pool is allowed to be kept idle. Zero means no expiration. Default: 2m.

satelliteConnectionPoolCapacity is an optional total number of connections to keep open in the Storj satellite connection pool. Default: 10.

satelliteConnectionPoolKeyCapacity is an optional maximum number of connections to keep open per Storj satellite in the connection pool. Zero means no limit. Default: 0.

satelliteConnectionPoolIdleExpiration is an optional duration for how long a connection in the Storj satellite connection pool is allowed to be kept idle. Zero means no expiration. Default: 2m.

If you are configuring a brand new ipfs instance without any data, you can overwrite the datastore_spec file with:

{"mounts":[{"mountpoint":"/providers","path":"providers","type":"levelds"},{"bucket":"$bucketname","mountpoint":"/"}],"type":"mount"}

Otherwise, you need to do a datastore migration.

Run With Docker

docker run --rm -d \
    --network host \
    -e STORJ_DATABASE_URL=<database_url> \
    -e STORJ_BUCKET=<storj_bucket> \
    -e STORJ_ACCESS=<storj_access_grant> \
    -e IPFS_IDENTITY_PEER_ID=<peer_id> \
    -e IPFS_IDENTITY_PRIVATE_KEY=<priv_key> \
    -e IPFS_GATEWAY_NO_FETCH=true \
    -e IPFS_GATEWAY_DOMAIN=<gateway_domain_name> \
    -e IPFS_GATEWAY_USE_SUBDOMAINS=false \
    -e IPFS_GATEWAY_PORT=8080 \
    -e IPFS_API_PORT=5001 \
    -e IPFS_INDEX_PROVIDER_URL=http://index-provider:50617 \
    -e IPFS_BLOOM_FILTER_SIZE=1048576 \
    -e STORJ_UPDATE_BLOOM_FILTER=true \
    -e STORJ_PACK_INTERVAL=5m \
    -e STORJ_DEBUG_ADDR=<[host]:port> \
    -e GOLOG_FILE=/app/log/output.log \
    -e GOLOG_LOG_LEVEL="storjds=info" \
    -e STORJ_NODE_CONNECTION_POOL_CAPACITY=100 \
    -e STORJ_NODE_CONNECTION_POOL_KEY_CAPACITY=5 \
    -e STORJ_NODE_CONNECTION_POOL_IDLE_EXPIRATION=2m \
    -e STORJ_SATELLITE_CONNECTION_POOL_CAPACITY=10 \
    -e STORJ_SATELLITE_CONNECTION_POOL_KEY_CAPACITY=0 \
    -e STORJ_SATELLITE_CONNECTION_POOL_IDLE_EXPIRATION=2m \
    --mount type=bind,source=<log-dir>,destination=/app/log \
    storjlabs/ipfs-go-ds-storj:<tag>

Docker images are published to https://hub.docker.com/r/storjlabs/ipfs-go-ds-storj.

STORJ_DATABASE_URL can be set to a Postgres or CockroachDB database URL.

STORJ_BUCKET must be set to an existing bucket.

STORJ_ACCESS must be set to an access grant with full permission to STORJ_BUCKET.

IPFS_IDENTITY_PEER_ID can be set optionally to preserve the node identity between runs. The current peer ID can be found under Identity.PeerID in the config file.

IPFS_IDENTITY_PRIVATE_KEY must be set if IPFS_IDENTITY_PEER_ID is set, and the provided private key must match the peer ID. The current private key can be found under Identity.PrivKey in the config file.

IPFS_GATEWAY_NO_FETCH determines if the IPFS gateway is open (if set to false) or restricted (if set to true). Restricted gateways serve files only from the local IPFS node. Open gateways search the IPFS network if the file is not present on the local IPFS node.

IPFS_GATEWAY_DOMAIN can be set to the domain name assigned to the IPFS gateway. If set, IPFS_GATEWAY_USE_SUBDOMAINS determines whether to use subdomain resolution style. See https://docs.ipfs.io/concepts/ipfs-gateway/#subdomain for details.

IPFS_GATEWAY_PORT can be set to change the IPFS Gateway port from the default 8080.

IPFS_API_PORT can be set to change the IPFS HTTP API port from the default 5001.

IPFS_INDEX_PROVIDER_URL can be set to the delegated routing URL of an IPNI Index Provider.

IPFS_ADDRESSES_SWARM can be set to change the IPFS Addresses.Swarm config. It will overwrite the default values.

IPFS_PEERING_PEERS can be set to add extra public peers in the IPFS Peering.Peers config.

IPFS_BLOOM_FILTER_SIZE sets the size in bytes of the datastore bloom filter. It is recommended to set this on production installations to reduce the number of database calls caused by incoming requests from the IPFS network. The default value is 0, which means the bloom filter is disabled. Details in https://github.com/ipfs/kubo/blob/master/docs/config.md#datastorebloomfiltersize.

STORJ_UPDATE_BLOOM_FILTER enables the bloom filter updater, which listens to changes in the local database and updates the datastore bloom filter. The default value is false. It should be enabled when running multiple nodes attached to the same datastore. Only CockroachDB is supported. Requires admin privileges for the CockroachDB user.

STORJ_PACK_INTERVAL can be set to change the packing interval. The default packing interval is 1 minute. If set to a negative duration, e.g. -1m, the packing job is disabled.

STORJ_DEBUG_ADDR can be set to a specific [host]:port address to listen on for the debug endpoints. If not set, the debug endpoints are disabled.

GOLOG_FILE sets the log file location. If not set, the logs are printed to the standard error.

GOLOG_LOG_LEVEL sets the log level. The default level is ERROR. Use storjds=<level> to set the log level of only the Storj datastore plugin. See https://github.com/ipfs/go-log#golog_log_level for details.

STORJ_NODE_CONNECTION_POOL_CAPACITY sets the total number of connections to keep open in the Storj node connection pool. Default: 500.

STORJ_NODE_CONNECTION_POOL_KEY_CAPACITY sets the maximum number of connections to keep open per Storj node in the connection pool. Zero means no limit. Default: 5.

STORJ_NODE_CONNECTION_POOL_IDLE_EXPIRATION sets how long a connection in the Storj node connection pool is allowed to be kept idle. Zero means no expiration. Default: 2m.

STORJ_SATELLITE_CONNECTION_POOL_CAPACITY sets the total number of connections to keep open in the Storj satellite connection pool. Default: 10.

STORJ_SATELLITE_CONNECTION_POOL_KEY_CAPACITY sets the maximum number of connections to keep open per Storj satellite in the connection pool. Zero means no limit. Default: 0.

STORJ_SATELLITE_CONNECTION_POOL_IDLE_EXPIRATION sets how long a connection in the Storj satellite connection pool is allowed to be kept idle. Zero means no expiration. Default: 2m.

License

MIT

ipfs-go-ds-storj's Issues

Make sure Garbage Collection is not triggered

We need to make sure we have the proper config to avoid garbage collection being triggered on the IPFS node.

The GC process is very intensive and reads all the data from the IPFS node, which incurs a substantial egress cost from the storage nodes. At the same time, it's unlikely we want to delete a significant amount of data from our IPFS node, as it is designed to store only pinned data.
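
As a starting point (a sketch based on kubo's documented behavior, not a project-confirmed procedure): kubo only runs GC when it is explicitly enabled, so the main safeguard is to avoid opting in.

# Start the daemon WITHOUT --enable-gc, so periodic GC never runs.
> ipfs daemon

# Avoid manual GC as well; this command walks every block:
# > ipfs repo gc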

Packer job as a separate process

There should be only one packer job running to avoid concurrency issues. Currently, we have to disable the packer job on all IPFS nodes but one. Ideally, the packer job should run in its own separate process, with its own Docker image.

Optimize the SQL query for querying unpacked blocks

This SQL query becomes significantly slower as the blocks table grows:

WITH next_pack AS (
	SELECT b.cid, sum(b2.size) AS sums
	FROM blocks b
	INNER JOIN blocks b2 ON b.pack_status=b2.pack_status AND b2.created <= b.created
	WHERE b.pack_status = `+unpackedStatus+`
	GROUP BY b.cid
	HAVING sum(b2.size) <= $1
	ORDER BY b.created ASC
)
UPDATE blocks
SET pack_status = `+packingStatus+`
WHERE 
	$2 <= (SELECT max(sums) FROM next_pack) AND
	cid IN (SELECT cid FROM next_pack)
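
One possible mitigation (an assumption, not the project's confirmed fix) is a composite index matching the self-join's predicate on pack_status and created, created here via psql using the hypothetical $DATABASE_URI from the Configuration section:

> psql "$DATABASE_URI" -c 'CREATE INDEX IF NOT EXISTS blocks_pack_status_created_idx ON blocks (pack_status, created);'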

Reduce the number of GetSize queries to the database

The IPFS node is bombarded with GetSize requests from the IPFS network. These calls come from the IPFS discovery process when someone is looking for a specific IPFS hash.

We could reduce the pressure on the DB by enabling the bloom filter feature of IPFS. This can be done by setting the BloomFilterSize config to a positive value.
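
For example, with the kubo CLI (1048576 bytes = 1 MiB, matching the Docker example above):

> ipfs config --json Datastore.BloomFilterSize 1048576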

Preserve IPFS node identity between runs

We want to publish our IPFS service to the IPFS content provider list so others can peer their nodes with our nodes for faster download performance over the IPFS network.

As a prerequisite, we need to preserve the identity of our IPFS nodes between runs of the Docker containers. Currently, the identity is regenerated on every new run.

We need to take note of the current Peer ID and its respective private key from the config file of each of our IPFS nodes. The config file is available at /data/ipfs/config within the Docker container. The Peer ID and private key are under the Identity config and look like this:

  "Identity": {
    "PeerID": "12D3KooWSJXmjcZNnFnggjstN9APt3L6fbHyeGCCTGahnBHn7gwQ",
    "PrivKey": "CAESQG27djOGl6RLdW+CIUVcNXD7imilAp/zPbWOdteaAtGe9PIcqRrVT44T23IZrYQZqckj9MV87RPHoQ4L0E9chI8="
  },

Then we need to introduce new env vars to the Docker image to pass them to the IPFS config:

  • IPFS_IDENTITY_PEER_ID
  • IPFS_IDENTITY_PRIVATE_KEY

If the above env vars are not set, the IPFS node will generate a new identity, as it currently does.
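
A sketch of wiring this up, assuming jq is available on the host; <container> and the elided flags are placeholders:

# Read the current identity out of the container's config file.
> PEER_ID=$(docker exec <container> cat /data/ipfs/config | jq -r '.Identity.PeerID')
> PRIV_KEY=$(docker exec <container> cat /data/ipfs/config | jq -r '.Identity.PrivKey')

# Pass it back in on the next run so the identity is preserved.
> docker run --rm -d \
    -e IPFS_IDENTITY_PEER_ID="$PEER_ID" \
    -e IPFS_IDENTITY_PRIVATE_KEY="$PRIV_KEY" \
    ... \
    storjlabs/ipfs-go-ds-storj:<tag>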
