Gaia: A decentralized high-performance storage system

This document describes the high-level design and implementation of the Gaia storage system, which is also briefly explained at docs.stacks.co. It includes specifications for backend storage drivers and interactions between developer APIs and the Gaia service.

Developers who wish to use the Gaia storage system should see the stacks.js documentation here and in particular the storage package here.

Instructions on setting up, configuring and testing a Gaia Hub can be found here and here.

Overview

Gaia works by hosting data in one or more existing storage systems of the user's choice. These storage systems are typically cloud storage systems. We currently have driver support for S3 and Azure Blob Storage, but the driver model allows for other backend support as well. The point is, the user gets to choose where their data lives, and Gaia enables applications to access it via a uniform API.

Blockstack applications use the Gaia storage system to store data on behalf of a user. When the user logs in to an application, the authentication process gives the application the URL of a Gaia hub, which performs writes on behalf of that user. The Gaia hub authenticates writes to a location by requiring a valid authentication token, generated by a private key authorized to write at that location.

User Control: How is Gaia Decentralized?

Gaia's approach to decentralization focuses on user-control of data and storage. If a user can choose which gaia hub and which backend provider to store data with, then that is all the decentralization required to enable user-controlled applications.

In Gaia, the control of user data lies in the way that user data is accessed. When an application fetches a file data.txt for a given user alice.id, the lookup will follow these steps:

  1. Fetch the zonefile for alice.id, and read her profile URL from that zonefile
  2. Fetch Alice's profile and verify that it is signed by alice.id's key
  3. Read the application root URL (e.g. https://gaia.alice.org/) out of the profile
  4. Fetch file from https://gaia.alice.org/data.txt
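
The four steps above can be sketched as follows. This is only an illustration: `LookupDeps` and every function in it (`fetchZonefile`, `fetchProfile`, `isSignedBy`, `fetchText`) are hypothetical stand-ins supplied by the caller, not stacks.js APIs, and the profile is reduced to the single field the lookup needs.

```typescript
// Hypothetical profile shape: only the field this lookup needs.
interface Profile {
  appRootUrl: string; // e.g. "https://gaia.alice.org/"
}

// Hypothetical dependencies injected by the caller; none is a real stacks.js API.
interface LookupDeps {
  fetchZonefile(name: string): Promise<{ profileUrl: string }>;
  fetchProfile(profileUrl: string): Promise<Profile>;
  isSignedBy(profile: Profile, name: string): Promise<boolean>;
  fetchText(url: string): Promise<string>;
}

async function lookupFile(deps: LookupDeps, name: string, filename: string): Promise<string> {
  // 1. Fetch the zonefile for the name and read the profile URL from it.
  const zonefile = await deps.fetchZonefile(name);
  // 2. Fetch the profile and verify that it is signed by the name's key.
  const profile = await deps.fetchProfile(zonefile.profileUrl);
  if (!(await deps.isSignedBy(profile, name))) {
    throw new Error("profile is not signed by the key for " + name);
  }
  // 3. Read the application root URL out of the profile.
  const root = profile.appRootUrl;
  const base = root.charAt(root.length - 1) === "/" ? root : root + "/";
  // 4. Fetch the file from the application root.
  return deps.fetchText(base + filename);
}
```

Injecting the fetchers keeps the control flow visible: whoever controls the zonefile and the profile controls where step 4 ends up reading from.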

Because alice.id controls her zonefile, she can change where her profile is stored, if the current storage of the profile is compromised. Similarly, if Alice wishes to change her gaia provider, or run her own gaia node, she can change the entry in her profile.

Applications writing directly on behalf of Alice do not need to perform this lookup. Instead, the stacks.js authentication flow provides Alice's chosen application root URL to the application. This authentication flow is also within Alice's control, because the authentication response must be generated by Alice's browser.

While it is true that many Gaia hubs will use backend providers like AWS or Azure, users can easily operate their own hubs, which may select different backend providers (and we'd like to implement more backend drivers). This enables truly user-controlled data while preserving high performance and high availability for data reads and writes.

Write-to and Read-from URL Guarantees

A performance and simplicity oriented guarantee of the Gaia specification is that when an application submits a write to a URL https://myhub.service.org/store/foo/bar, the application is guaranteed to be able to read from a URL https://myreads.com/foo/bar. While the prefix of the read-from URL may change between the two, the suffix must be the same as the write-to URL.

This allows an application to know exactly where a written file can be read from, given the read prefix. To obtain that read prefix, the Gaia service defines an endpoint:

GET /hub_info/

which returns a JSON object with a read_url_prefix.

For example, if my service returns:

{ ...,
  "read_url_prefix": "https://myservice.org/read/"
}

I know that if I submit a write request to:

https://myservice.org/store/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json

I will be able to read that file from:

https://myservice.org/read/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json
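
As a minimal sketch of this guarantee, a client could derive the read URL from the write URL as shown below. The helper `readUrlFor` and its parameters are hypothetical; `readUrlPrefix` is assumed to be the `read_url_prefix` value returned by `GET /hub_info/`.

```typescript
// Hypothetical helper: map a write URL on a hub to the URL the file can be read from.
// Only the prefix changes; the suffix after "/store/" is kept as-is.
function readUrlFor(writeUrl: string, hubUrl: string, readUrlPrefix: string): string {
  const storePrefix = hubUrl.replace(/\/+$/, "") + "/store/";
  if (writeUrl.indexOf(storePrefix) !== 0) {
    throw new Error("not a write URL for this hub: " + writeUrl);
  }
  const suffix = writeUrl.slice(storePrefix.length);
  const prefix =
    readUrlPrefix.charAt(readUrlPrefix.length - 1) === "/" ? readUrlPrefix : readUrlPrefix + "/";
  return prefix + suffix;
}

const readUrl = readUrlFor(
  "https://myservice.org/store/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json",
  "https://myservice.org",
  "https://myservice.org/read/"
);
// readUrl is "https://myservice.org/read/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json"
```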

Address-based Access-Control

Access control in a gaia storage hub is performed on a per-address basis. Writes to URLs /store/<address>/<file> are only allowed if the writer can demonstrate that they control that address. This is achieved via an authentication token, which is a message signed by the private-key associated with that address. The message itself is a challenge-text, returned via the /hub_info/ endpoint.

V1 Authentication Scheme

The V1 authentication scheme uses a JWT, prefixed with v1: as a bearer token in the HTTP authorization field. The expected JWT payload structure is:

{
 'type': 'object',
 'properties': {
   'iss': { 'type': 'string' },
   'exp': { 'type': 'IntDate' },
   'iat': { 'type': 'IntDate' },
   'gaiaChallenge': { 'type': 'string' },
   'associationToken': { 'type': 'string' },
   'salt': { 'type': 'string' }
 },
 'required': [ 'iss', 'gaiaChallenge' ]
}

In addition to iss, exp, and gaiaChallenge claims, clients may add other properties (e.g., a salt field) to the payload, and they will not affect the validity of the JWT. Rather, the validity of the JWT is checked by ensuring:

  1. That the JWT is signed correctly by verifying with the pubkey hex provided as iss
  2. That iss matches the address associated with the bucket.
  3. That gaiaChallenge is equal to the server's challenge text.
  4. That the epoch time exp is greater than the server's current epoch time.
  5. That the epoch time iat (issued-at date) is greater than the bucket's revocation date (only if such a date has been set by the bucket owner).
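
Checks 3 to 5 operate only on the decoded payload, so they can be sketched without any cryptography. The function below is an illustration, not the hub's implementation: `checkV1Claims` and its parameters are hypothetical, signature and address verification (checks 1 and 2) are omitted, and two behaviors are assumptions, namely that a missing `exp` is not treated as expired and that a missing `iat` is rejected once a revocation date has been set.

```typescript
interface V1Payload {
  iss: string;
  gaiaChallenge: string;
  exp?: number; // epoch seconds
  iat?: number; // epoch seconds
  associationToken?: string;
  salt?: string;
}

// Returns null when the claims pass checks 3 to 5, otherwise a short reason.
function checkV1Claims(
  payload: V1Payload,
  serverChallenge: string,
  nowSeconds: number,
  revocationSeconds?: number
): string | null {
  // Check 3: the challenge must equal the server's challenge text.
  if (payload.gaiaChallenge !== serverChallenge) {
    return "challenge mismatch";
  }
  // Check 4: exp must be greater than the server's current time
  // (assumption: a token without exp is not treated as expired).
  if (payload.exp !== undefined && payload.exp <= nowSeconds) {
    return "token expired";
  }
  // Check 5: only applies when the bucket owner has set a revocation date
  // (assumption: a token without iat is rejected in that case).
  if (revocationSeconds !== undefined) {
    if (payload.iat === undefined || payload.iat <= revocationSeconds) {
      return "token issued before revocation date";
    }
  }
  return null;
}
```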

Association Tokens

The association token specification is considered private, as it is mostly used for internal Gaia use cases. This means that this specification can change or become deprecated in the future.

Oftentimes, a single user will use many different keys to store data. These keys may be generated on-the-fly. Instead of requiring the user to explicitly whitelist each key, the v1 authentication scheme allows the user to bind a key to an already-whitelisted key via an association token.

An association token is a JWT signed by a whitelisted key that, in turn, contains the public key that signs the authentication JWT that contains it. Put another way, the Gaia hub will accept a v1 authentication JWT if it contains an associationToken JWT that (1) was signed by a whitelisted address, and (2) identifies the signer of the authentication JWT.

The association token JWT has the following structure in its payload:

{
  'type': 'object',
  'properties': {
    'iss': { 'type': 'string' },
    'exp': { 'type': 'IntDate' },
    'iat': { 'type': 'IntDate' },
    'childToAssociate': { 'type': 'string' },
    'salt': { 'type': 'string' },
  },
  'required': [ 'iss', 'exp', 'childToAssociate' ]
}

Here, the iss field should be the public key of a whitelisted address. The childToAssociate field should be equal to the iss field of the authentication JWT. Note that the exp field is required in association tokens.
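
The relationship between the two tokens can be sketched as a check on their decoded payloads. This is a hedged illustration: `associationAllowsSigner` is a hypothetical name, `addressOf` is a caller-supplied function that maps a public key to its address, signature verification of both JWTs is omitted, and rejecting an expired association token is an assumption based on `exp` being required.

```typescript
interface AssociationPayload {
  iss: string;              // public key of a whitelisted address
  exp: number;              // epoch seconds, required in association tokens
  childToAssociate: string; // public key that signs the authentication JWT
  iat?: number;
  salt?: string;
}

function associationAllowsSigner(
  association: AssociationPayload,
  authIss: string,                            // iss of the authentication JWT
  whitelist: string[],                        // whitelisted addresses
  addressOf: (publicKeyHex: string) => string, // hypothetical key-to-address mapping
  nowSeconds: number
): boolean {
  // (1) The association token must come from a whitelisted address.
  const issuerWhitelisted = whitelist.indexOf(addressOf(association.iss)) !== -1;
  // (2) It must identify the signer of the authentication JWT.
  const bindsSigner = association.childToAssociate === authIss;
  // Assumption: an expired association token is not accepted.
  const notExpired = association.exp > nowSeconds;
  return issuerWhitelisted && bindsSigner && notExpired;
}
```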

Legacy authentication scheme

In the legacy scheme, the authentication token is the following signed message:

BASE64({ "signature" : ECDSA_SIGN(SHA256(challenge-text)),
         "publickey" : PUBLICKEY_HEX })

Currently, challenge-text must match the known challenge-text on the gaia storage hub. However, as future work enables more extensible forms of authentication, we could extend this to allow the auth token to include the challenge-text as well, which the gaia storage hub would then need to also validate.

Data storage format

A gaia storage hub will store the written data exactly as given. This means that the storage hub does not provide many different kinds of guarantees about the data. It does not ensure that data is validly formatted, contains valid signatures, or is encrypted. Rather, the design philosophy is that these concerns are client-side concerns. Client libraries (such as stacks.js) are capable of providing these guarantees, and we use a liberal definition of the end-to-end principle to guide this design decision.

Operation of a Gaia Hub

Configuration files

A configuration TOML/JSON file should be stored either in the top-level directory of the hub server, or a file location may be specified in the environment variable CONFIG_PATH.

An example configuration file is provided in ./hub/config.sample.json. You can specify the logging level, the number of social proofs required for addresses to write to the system, the backend driver, the credentials for that backend driver, and the readURL for the storage provider.

Private hubs

A private hub services requests for a single user. This is controlled via whitelisting the addresses allowed to write files. In order to support application storage, because each application uses a different app- and user-specific address, each application you wish to use must be added to the whitelist separately.

Alternatively, the user's client must use the v1 authentication scheme and generate an association token for each app. The user should whitelist her address, and use her associated private key to sign each app's association token. This removes the need to whitelist each application, but with the caveat that the user needs to take care that her association tokens do not get misused.

Open-membership hubs

An open-membership hub will allow writes to the top-level directory of any address. Each request is still validated: write requests must provide a valid authentication token for that address. Operating in this mode is recommended for service and identity providers who wish to support many different users.

In order to limit the users that may interact with such a hub to users who provide social proofs of identity, we support an execution mode where the hub checks that a user's profile.json object contains social proofs in order to be able to write to other locations. This can be configured via the config.json or config.toml.

Driver model

Gaia hub drivers are fairly simple. The biggest requirement is the ability to fulfill the write-to/read-from URL guarantee.

A driver can expect that two modification operations to the same path will be mutually exclusive. No writes, renames, or deletes to the same path will be concurrent.

As currently implemented, a gaia hub driver must implement the following functions:

interface DriverModel {

  /**
   * Return the prefix for reading files from.
   *  a write to the path `foo` should be readable from
   *  `${getReadURLPrefix()}foo`
   * @returns the read url prefix.
   */
  getReadURLPrefix(): string;

  /**
   * Performs the actual write of a file to `path`
   *   the file must be readable at `${getReadURLPrefix()}/${storageTopLevel}/${path}`
   *
   * @param options.path - path of the file.
   * @param options.storageTopLevel - the top level directory to store the file in
   * @param options.contentType - the HTTP content-type of the file
   * @param options.stream - the data to be stored at `path`
   * @param options.contentLength - the bytes of content in the stream
   * @param options.ifMatch - optional etag value to be used for optimistic concurrency control
   * @param options.ifNoneMatch - used with the `*` value to save a file not known to exist,
   * guaranteeing that another upload didn't happen before, losing the data of the previous
   * @returns Promise that resolves to an object containing a public-readable URL of the stored content and the objects etag value
   */
  performWrite(options: {
    path: string;
    storageTopLevel: string;
    stream: Readable;
    contentLength: number;
    contentType: string;
    ifMatch?: string;
    ifNoneMatch?: string;
  }): Promise<{
    publicURL: string,
    etag: string
  }>;

  /**
   * Deletes a file. Throws a `DoesNotExist` if the file does not exist.
   * @param options.path - path of the file
   * @param options.storageTopLevel - the top level directory
   */
  performDelete(options: {
    path: string;
    storageTopLevel: string;
  }): Promise<void>;

  /**
   * Renames a file given a path. Some implementations do not support
   * a first class move operation and this can be implemented as a copy and delete. 
   * @param options.path - path of the original file
   * @param options.storageTopLevel - the top level directory for the original file
   * @param options.newPath - new path for the file
   */
  performRename(options: {
    path: string;
    storageTopLevel: string;
    newPath: string;
  }): Promise<void>;

  /**
   * Retrieves metadata for a given file.
   * @param options.path - path of the file
   * @param options.storageTopLevel - the top level directory
   */
  performStat(options: {
    path: string;
    storageTopLevel: string;
  }): Promise<{
    exists: boolean;
    lastModifiedDate: number;
    contentLength: number;
    contentType: string;
    etag: string;
  }>;

  /**
   * Returns an object with a NodeJS stream.Readable for the file content
   * and metadata about the file.
   * @param options.path - path of the file
   * @param options.storageTopLevel - the top level directory
   */
  performRead(options: {
    path: string;
    storageTopLevel: string;
  }): Promise<{
    data: Readable;
    lastModifiedDate: number;
    contentLength: number;
    contentType: string;
    etag: string;
  }>;

  /**
   * Return a list of files beginning with the given prefix,
   * as well as a driver-specific page identifier for requesting
   * the next page of entries.  The return structure should
   * take the form { "entries": [string], "page"?: string }
   * @returns {Promise} the list of files and a possible page identifier.
   */
  listFiles(options: {
    pathPrefix: string;
    page?: string;
  }): Promise<{
    entries: string[];
    page?: string;
  }>;

  /**
   * Return a list of files beginning with the given prefix,
   * as well as file metadata, and a driver-specific page identifier
   * for requesting the next page of entries.
   */
  listFilesStat(options: {
    pathPrefix: string;
    page?: string;
  }): Promise<{
    entries: {
        name: string;
        lastModifiedDate: number;
        contentLength: number;
        etag: string;
    }[];
    page?: string;
  }>;
  
}
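
To make the write-related parts of the interface concrete, here is a minimal in-memory sketch. It is deliberately not a full `DriverModel` implementation: the class name `InMemoryDriver` is hypothetical, content is passed as a string instead of a `Readable` stream, only `performWrite`, `performStat` and `performDelete` are covered, errors are plain `Error` objects, and the read URL prefix is assumed to end with a slash.

```typescript
interface StoredFile {
  content: string;
  contentType: string;
  etag: string;
  lastModifiedDate: number;
}

class InMemoryDriver {
  private files = new Map<string, StoredFile>();
  private version = 0;
  private readUrlPrefix: string;

  constructor(readUrlPrefix: string) {
    this.readUrlPrefix = readUrlPrefix;
  }

  getReadURLPrefix(): string {
    return this.readUrlPrefix;
  }

  async performWrite(options: {
    path: string;
    storageTopLevel: string;
    content: string; // simplification: a string instead of a stream
    contentType: string;
    ifMatch?: string;
    ifNoneMatch?: string;
  }): Promise<{ publicURL: string; etag: string }> {
    const key = options.storageTopLevel + "/" + options.path;
    const existing = this.files.get(key);
    // If-None-Match: * only succeeds when no file exists at the path yet.
    if (options.ifNoneMatch === "*" && existing !== undefined) {
      throw new Error("PreconditionFailed: file already exists");
    }
    // If-Match with a concrete etag only succeeds when it matches the stored etag.
    if (options.ifMatch !== undefined && options.ifMatch !== "*" &&
        (existing === undefined || existing.etag !== options.ifMatch)) {
      throw new Error("PreconditionFailed: etag mismatch");
    }
    this.version += 1;
    const etag = "v" + this.version;
    this.files.set(key, {
      content: options.content,
      contentType: options.contentType,
      etag: etag,
      lastModifiedDate: Date.now(),
    });
    // Write-to/read-from guarantee: same suffix, read prefix in front.
    return { publicURL: this.getReadURLPrefix() + key, etag: etag };
  }

  async performStat(options: { path: string; storageTopLevel: string }): Promise<{
    exists: boolean;
    etag?: string;
    contentType?: string;
  }> {
    const file = this.files.get(options.storageTopLevel + "/" + options.path);
    if (file === undefined) {
      return { exists: false };
    }
    return { exists: true, etag: file.etag, contentType: file.contentType };
  }

  async performDelete(options: { path: string; storageTopLevel: string }): Promise<void> {
    if (!this.files.delete(options.storageTopLevel + "/" + options.path)) {
      throw new Error("DoesNotExist");
    }
  }
}
```

A real driver would additionally stream the content to its backend and implement the remaining methods; the sketch only shows how the etag preconditions and the public URL fit together.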

HTTP API

The Gaia storage API defines the following endpoints:


GET ${read-url-prefix}/${address}/${path}

This returns the data stored by the gaia hub at ${path}. The response headers include Content-Type and ETag, along with the required CORS headers Access-Control-Allow-Origin and Access-Control-Allow-Methods.


HEAD ${read-url-prefix}/${address}/${path}

Returns the same headers as the corresponding GET request. HEAD requests do not return a response body.


POST ${hubUrl}/store/${address}/${path}

This performs a write to the gaia hub at ${path}.

On success, it returns a 202 status, and a JSON object:

{
 "publicURL": "${read-url-prefix}/${address}/${path}",
 "etag": "version-identifier"
}

The POST must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.

Additionally, file ETags and conditional request headers are used as a concurrency control mechanism. All requests to this endpoint should contain either an If-Match header or an If-None-Match header. The three request types are as follows:

Update existing file: this request must specify an If-Match header containing the most up to date ETag. If the file has been updated elsewhere and the ETag supplied in the If-Match header doesn't match that of the file in gaia, a 412 Precondition Failed error will be returned.

Create a new file: this request must specify the If-None-Match: * header. If the file already exists at the given path, a 412 Precondition Failed error will be returned.

Overwrite a file: this request must specify the If-Match: * header. Note that this bypasses concurrency control and should be used with caution. Improper use can cause bugs such as unintended data loss.

The file ETag is returned in the response body of the store POST request, the response headers of GET and HEAD requests, and in the returned entries in list-files request.

Additionally, a request to a file path that already has a previous ongoing request still processing for the same file path will return with a 409 Conflict error. This can be handled with a retry.
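
On the client side, the three request types differ only in which conditional header is sent. The sketch below builds the headers for a store request; `WriteIntent` and `storeHeaders` are hypothetical names, and the exact `Authorization` formatting (`bearer v1:<jwt>`) is an assumption based on the V1 authentication scheme described earlier.

```typescript
type WriteIntent =
  | { kind: "create" }                // file must not exist yet
  | { kind: "update"; etag: string }  // file must still have this ETag
  | { kind: "overwrite" };            // bypasses concurrency control

function storeHeaders(
  intent: WriteIntent,
  v1Jwt: string,
  contentType: string
): Record<string, string> {
  const headers: Record<string, string> = {
    "Authorization": "bearer v1:" + v1Jwt,
    "Content-Type": contentType,
  };
  switch (intent.kind) {
    case "create":
      headers["If-None-Match"] = "*";
      break;
    case "update":
      // Pass the ETag through exactly as the hub returned it.
      headers["If-Match"] = intent.etag;
      break;
    case "overwrite":
      headers["If-Match"] = "*";
      break;
  }
  return headers;
}
```

A caller would treat a 412 response as a signal to re-read the file and its ETag before trying again, and a 409 response as a signal to retry the same request.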


DELETE ${hubUrl}/delete/${address}/${path}

This performs a deletion of a file in the gaia hub at ${path}.

On success, it returns a 202 status. Returns a 404 if the path does not exist. Returns 400 if the path is invalid.

The DELETE must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.


GET ${hubUrl}/hub_info/

Returns a JSON object:

{
 "challenge_text": "text-which-must-be-signed-to-validate-requests",
 "read_url_prefix": "${read-url-prefix}",
 "latest_auth_version": "v1"
}

The latest auth version allows the client to figure out which auth versions the gaia hub supports.


POST ${hubUrl}/revoke-all/${address}

The post body must be a JSON object with the following field:

{ "oldestValidTimestamp": "${timestamp}" }

Where the timestamp is an epoch time in seconds. The timestamp is written to a bucket-specific file (/${address}-auth). This becomes the oldest valid iat timestamp for authentication tokens that write to the /${address}/ bucket.

On success, it returns a 202 status, and a JSON object:

{ "status": "success" }

The POST must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.


POST ${hubUrl}/list-files/${address}

The post body can contain a page field with the pagination identifier from a previous request:

{ "page": "${lastListFilesResult.page}" }

If the post body contains a stat: true field then the returned JSON includes file metadata:

{
  "entries": [
    { "name": "string", "lastModifiedDate": "number", "contentLength": "number", "etag": "string" },
    { "name": "string", "lastModifiedDate": "number", "contentLength": "number", "etag": "string" },
    // ...
  ],
  "page": "string" // possible pagination marker
}

If the post body does not contain a stat: true field then the returned JSON entries will only be file name strings:

{
  "entries": [
    "fileNameExample1",
    "fileNameExample2",
    // ...
  ],
  "page": "string" // possible pagination marker
}

The POST must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.
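
Listing every file in a bucket means following the `page` marker until the hub stops returning one. A minimal sketch, assuming a caller-supplied `postListFiles` function (hypothetical) that performs the authenticated POST to `/list-files/${address}` and returns the parsed JSON:

```typescript
interface ListFilesPage {
  entries: string[];
  page?: string | null; // pagination marker, absent or empty on the last page
}

// Hypothetical transport: sends the body to POST /list-files/${address}.
type ListFilesRequest = (body: { page?: string }) => Promise<ListFilesPage>;

async function listAllFiles(postListFiles: ListFilesRequest): Promise<string[]> {
  const names: string[] = [];
  let page: string | undefined = undefined;
  do {
    // The first request has no page field; later ones echo the previous marker.
    const body: { page?: string } = page === undefined ? {} : { page: page };
    const result: ListFilesPage = await postListFiles(body);
    for (const name of result.entries) {
      names.push(name);
    }
    page = result.page ? result.page : undefined;
  } while (page !== undefined);
  return names;
}
```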


Future Design Goals

Dependency on DNS

The gaia specification requires that a gaia hub return a URL that a user's client will be able to fetch. In practice, most gaia hubs will use URLs with DNS entries for hostnames (though URLs with IP addresses would work as well). However, even though the spec uses URLs, that doesn't necessarily make an opinionated claim on underlying mechanisms for that URL. If a browser supported new URL schemes which enabled lookups without traditional DNS (for example, with the Blockstack Name System instead), then gaia hubs could return URLs implementing that scheme. As the Blockstack ecosystem develops and supports these kinds of features, we expect users would deploy gaia hubs that would take advantage.

Extensibly limiting membership sets

Some service providers may wish to provide hub services to a limited set of different users, with a provider-specific method of authenticating that a user or address is within that set. In order to provide that functionality, our hub implementation would need to be extensible enough to allow plugging in different authentication models.

A .storage Namespace

Gaia nodes can request data from other Gaia nodes, and can store data to other Gaia nodes. In effect, Gaia nodes can be "chained together" in arbitrarily complex ways. This creates an opportunity to create a decentralized storage marketplace.

Example

For example, Alice can make her Gaia node public and program it to store data to her Amazon S3 bucket and her Dropbox account. Bob can then POST data to Alice's node, causing her node to replicate data to both providers. Later, Charlie can read Bob's data from Alice's node, causing Alice's node to fetch and serve back the data from her cloud storage. Neither Bob nor Charlie have to set up accounts on Amazon S3 and Dropbox this way, since Alice's node serves as an intermediary between them and the storage providers.

Since Alice is on the read/write path between Bob and Charlie and cloud storage, she has the opportunity to make optimizations. First, she can program her Gaia node to synchronously write data to local disk and asynchronously back it up to S3 and Dropbox. This would speed up Bob's writes, but at the cost of durability (i.e. Alice's node could crash before replicating to the cloud).

In addition, Alice can program her Gaia node to service all reads from disk. This would speed up Charlie's reads, since he'll get the latest data without having to hit back-end cloud storage providers.

Service Description

Since Alice is providing a service to Bob and Charlie, she will want compensation. This can be achieved by having both of them send her money via the underlying blockchain.

To do so, she would register her node's IP address in a .storage namespace in Blockstack, and post her rates per gigabyte in her node's profile and her payment address. Once Bob and Charlie sent her payment, her node would begin accepting reads and writes from them up to the capacity purchased. They would continue sending payments as long as Alice provides them with service.

Other experienced Gaia node operators would register their nodes in .storage, and compete for users by offering better durability, availability, performance, extra storage features, and so on.

Notes on our deployed service

Our deployed service places some modest limitations on file uploads and rate limits. Currently, the service will only allow up to 20 write requests per second and a maximum file size of 5MB. However, these limitations apply only to our service. If you deploy your own Gaia hub, they are not necessary.

Project Comparison

Here's how Gaia stacks up against other decentralized storage systems. Features that are common to all storage systems are omitted for brevity.

Features Gaia Sia Storj IPFS DAT SSB
User controls where data is hosted X
Data can be viewed in a normal Web browser X X
Data is read/write X X X
Data can be deleted X X X
Data can be listed X X X X X
Deleted data space is reclaimed X X X X
Data lookups have predictable performance X X
Writes permission can be delegated X
Listing permission can be delegated X
Supports multiple backends natively X X
Data is globally addressable X X X X X
Needs a cryptocurrency to work X X
Data is content-addressed X X X X X

gaia's Issues

Document struct of database objects

For using the registrar, the code currently assumes that incoming registrations are stored in a DB and the registrar reads from that DB to process registrations.

The structure of the DB objects is not properly documented yet.

Default gaia hub read issues

Several app developers are reporting read issues with the default Gaia hub hosted by Blockstack PBC.

Richard Adjei [Today at 4:53 AM]
in #engineering
I would say the main issue is the servers are occasionally failing to recover files from storage even though the files exist. And the files that load are also taking a while. It could be due to our app requesting too many files at the same time and Gaia storage not being designed to handle that many accesses. But I can't say for sure so I just wanted to inform blockstack.

7 replies
avthar [3 hours ago]
As a caveat, this may also be a problem with the programming patterns we use to load stuff from GAIA. We’re following the examples laid out in the Blockstack tutorials though so I’m not sure if there’s too much more improvement on that front (edited)

JustinHunter [2 hours ago]
I've seen this behavior when trying to load more than one file on page load (ex: in componentDidMount method in React). I think it's good to bring it up. My solution has been a sub-optimal solution performance wise, but if I need that second (or third or whatever) file, I set a refresh call to keep fetching the file just to make sure I eventually get it.

avthar [2 hours ago]
thanks for sharing your experience — I think it’s the same issue — we’re trying to load like 5 files on the same page lol

Create an introduction to managing data in Gaia for the developer audience

We already have great tutorials showing how to build simple apps with Gaia, but there's not a place where developers can just read about how Gaia works at a conceptual level. It would be good to have a post where we explain how you can manage app data with Gaia.

This could also be a series:
Post 1: Working with Data Collections
Post 2: Maintaining and Scaling Data
Post 3: Sharing Data with Others

Driver specs

This isn't an issue but a question.
@kantai mentioned in issue #79 about pinning down a spec for gaia hub drivers. Obviously the current drivers are simple and it's easy to implement new ones. Is there a spec yet, or is there one in writing?

Gaia gateway

As a piece of ancillary tooling, add a gateway process that handles the following endpoints:

  • GET /${blockstack_id}/${address}/${filename}: Fetches the given ${filename} from the given ${blockstack_id}. For example, GET /ryan.id/1FrZTGQ8DM9TMPfGXtXMUvt2NNebLiSzad/status.json should resolve to [{"id":0,"text":"Hello, Blockstack!","created_at":1515786983492}].

  • GET /${blockstack_id}: Fetches the list of application URLs and Gaia bucket IDs (i.e. app addresses) for a particular user. This is taken straight from the apps object in the Blockstack ID's profile.

The purpose of this gateway is to make it easy to share files stored in Gaia with the rest of the Web.

Storage API endpoint

Add an API endpoint that will allow a client to upload their zonefile and profile information. The endpoint should verify that the zonefile's hash is already present in the blockchain, and that the profile is signed by the key indicated in the zonefile. This effectively makes the registrar a storage provider for the client.

Gaia Hub should allow file deletes

The gaia hub should allow file deletion. This will require an extension of the driver model, and a new endpoint (can use the DELETE HTTP method).

Multiple Gaia node support for redundancy

Allow users to configure multiple Gaia nodes for data storage.

@jcnelson mentioned that it is already possible to put multiple Gaia hub URLs in a zone file. The client logic should detect this and iterate through the URLs and try each of them. The old version of Gaia (pre-javascript) did this.
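
A rough sketch of that client logic, assuming the list of hub read URLs has already been extracted from the zone file. `readWithFallback` and the injected `fetchText` are hypothetical names, not part of any existing client library.

```typescript
// Try each configured hub in order and return the first successful read.
async function readWithFallback(
  hubUrls: string[],
  path: string,
  fetchText: (url: string) => Promise<string> // hypothetical fetcher supplied by the caller
): Promise<string> {
  let lastError: unknown = new Error("no Gaia hub URLs configured");
  for (const hubUrl of hubUrls) {
    const base = hubUrl.charAt(hubUrl.length - 1) === "/" ? hubUrl : hubUrl + "/";
    try {
      return await fetchText(base + path);
    } catch (err) {
      // Remember the failure and move on to the next hub.
      lastError = err;
    }
  }
  throw lastError;
}
```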

Add private servers as storage provider

I don't know how to solve this the best, but it would be cool to use generic own servers as storage provider (pros would be that I know who is responsible for broken things, how much space I have, etc).

Maybe by using current standards as a base (thinking of WebDav), it's even possible to be compatible with all the WebDav servers out there. (One open question is how to authenticate devices: would we need some "blockstack storage provider" software code at the server, or at least the operator to import login data?)

Potential loss of all identities due to remaining centralization

At the moment, all Blockchain IDs registered through Onename are vulnerable to loss.

Hackers able to hack into Onename's servers could obtain the private keys to everyone's identity, and if they were able to do that they would own them. Unlike identities stored on Twitter or Google, there would be little Onename could do to return them back to their rightful owners.

Gaia authentication payload must include client-side data

The authentication process for authenticating gaia hub should include client-generated content in the signed payload (and will therefore need a way to relay that data to the gaia hub as well). This would prevent a malicious gaia hub from requesting arbitrary challenge texts, because the client would be signing a payload with client-generated data, in addition to the challenge text.

Hub: Add IPFS Driver

We should support an IPFS backend for the Gaia Hub.

However, we should probably pin down a spec for gaia hub drivers before we do this.

Selection criteria for blockstored node

Before using a blockstored node, make sure that:

a) We fetch the latest bitcoind block from an independent source, and check that the blockstored node's last block is recent i.e., within the normal (~10 block) processing range.

b) We compare the consensus_hash obtained from blockstored with other sources.

Public list-files

Make it possible, through a configuration option in the Gaia hub, to list files publicly via a GET endpoint.

Windows support for diskDriver

When attempting to run a Gaia server on a Windows 10 machine with the default config.json (a copy of config.sample.disk.json), the following error shows up:
src\server\server.js -> lib\server\server.js

"function":"Module._load","line":489,"method":"_load","native":false}],"stack":["Error: UNKNOWN: unknown error, lstat '\\\\Users\\home\\'"," at Object.fs.lstatSync (fs.js:941:11)"," at DiskDriver.mkdirs (C:\\Users\\home\\dev\\gaia\\hub\\lib\\server\\drivers\\diskDriver.js:56:39)"," at new DiskDriver (C:\\Users\\home\\dev\\gaia\\hub\\lib\\server\\drivers\\diskDriver.js:41:10)"," at makeHttpServer (C:\\Users\\home\\dev\\gaia\\hub\\lib\\server\\http.js:51:14)"," at Object.<anonymous> (C:\\Users\\home\\dev\\gaia\\hub\\lib\\index.js:15:36)"," at Module._compile (module.js:635:30)"," at Object.Module._extensions..js (module.js:646:10)"," at Module.load (module.js:554:32)"," at tryModuleLoad (module.js:497:12)"," at Function.Module._load (module.js:489:3)"],"level":"error","message":"uncaughtException: UNKNOWN: unknown error, lstat '\\\\Users\\home\\'","timestamp":"2018-07-21T08:04:56.676Z"

I tried creating the /tmp/gaia-data folder on the main drive manually, but that did not seem to help. Changing the paths in config.json did not help either: it always ends with the same error, just with different paths.

Mac OS did not seem to have this issue. I'm not sure if the Gaia server is even supposed to run on Windows; if not, it would be a good idea to mention that in the README file.

Add check to authentication which requires the URL to be communicated

We can add a requirement to the authentication which ensures that the authentication token was generated by a client intending to communicate with this gaia hub -- basically, just add a claim like:

hubUrl: "https://hub.blockstack.org"

And then the Gaia hub asserts that the hubUrl in the authentication token matches the one that the hub expects of itself.
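
A sketch of what that assertion might look like on the hub side. `assertHubUrlClaim` is a hypothetical helper, `claims` is assumed to be the already-decoded and signature-verified token payload, and ignoring trailing slashes is my own assumption:

```javascript
// Hypothetical helper: `claims` is the decoded, signature-verified payload of
// the authentication token; `expectedHubUrl` comes from the hub's own config.
function assertHubUrlClaim (claims, expectedHubUrl) {
  // Assumption: trailing slashes are not significant when comparing URLs
  const normalize = (url) => url.replace(/\/+$/, '')
  if (typeof claims.hubUrl !== 'string' ||
      normalize(claims.hubUrl) !== normalize(expectedHubUrl)) {
    throw new Error('Authentication token was not issued for this gaia hub')
  }
}
```

A token without the claim is rejected as well, so older clients would need a transition period or a configuration flag to make the check optional.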

enable posting to gaia hub with multi-sig JWTs

In order to upload a profile that belongs to a Blockstack ID backed by a multi-sig address, Gaia should accept writes that are authenticated via a multi-party signature.

I'm not sure this formulation is as precise as I can make it. It is an issue that came up in a conversation with @jcnelson, hopefully he can clarify.

Create a generic, configurable indexing solution for gaia hubs

A common challenge for developers is how to query relational data that is spread out across multiple users' Gaia hubs. For example, a developer building a social media app might want to find all users that have added a certain hashtag to their social media profile.

One solution to this problem is to have an open source and configurable indexing service that periodically indexes this data and provides an interface from which the app can perform relational queries.

We already do something similar to this as part of our search endpoint which lets apps search for strings in the profiles of the global set of Blockstack IDs.

We should consider creating a generic solution to this problem.

This issue was raised in this engineering meeting: https://forum.blockstack.org/t/2018-05-24-developer-products-engineering-meeting/5330/6?u=larry

Write tutorial for setting up your own Gaia hub

  • create docs directory to contain user facing documentation
  • move the muti-player-storage.md tutorial to docs/multi-player-storage.md (?)
  • tutorials we will need
    • local storage for end-users
    • developer storage through cloud hosting (Azure, etc)

Add GitHub as storage provider

Although it's not the perfect thing for all types of files, it has at least version control and a good API directly built in.
Very useful for profiles, I think.

Support messaging for file changes

Just thinking about collaborative use cases I wonder if gaia hub could support web socket connections for the multi-reader contexts.
That way clients could get notified when files they are interested in (hosted on hubs of other users) change.

This is just an idea at this point, no idea about the implementation implications. Just looking for feedback.

Create dedicated website for Gaia

This entails creating a dedicated website for Gaia so developers can learn about it and get started with development apart from Blockstack.

The specification can be found here: https://paper.dropbox.com/doc/Gaia-website-specification--ALOb6_2C93xM3YpR3U17uUz4Ag-9beIbRJOZuhGbLIp2EEcD

The WIP design can be previewed here: https://s.codepen.io/markmhendrickson/debug/e5f242d0eae52642a84dce3abc80be47

I'm taking a minimalist, content-centric approach to the design to start before turning much attention to the site's aesthetics (e.g. creating a color palette or vector-based graphics such as icons or diagrams for the various sections).

However, I'm thinking we may want to go with a green and blue earthy aesthetic, for natural (ha) reasons. I think an organic feel with subtle animations that makes the site feel alive and lush could be tied well to the conceptual idea that Gaia fosters an ecosystem of life aka apps that are interconnected via energy aka data.

Support driver models with read paths (e.g. Dropbox or Google Drive)

It would be nice to be able to support some drivers with more complex requirements (i.e., because the storage provider doesn't provide the ability to pre-determine the URL for a file).

These drivers would require that the gaia hub be invoked on the read path -- so if we want to support something like this, we need a "read-path" gaia driver type. For this spec, I'd propose a get endpoint:

GET /read/

Which will return the data (or, in an ideal world, a 301, but I think that these kinds of backends wouldn't typically have CORS headers set).
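
To make the proposal concrete, here is a rough sketch of the handler logic. Everything in it is an assumption: `performRead` is a hypothetical driver method that current drivers do not have, `MemoryReadDriver` is an in-memory stand-in for a real backend, and the part of the URL after `/read/` is treated as the file path:

```javascript
// Hypothetical read-path driver: keeps files in memory and exposes a
// performRead method returning a promise for the file contents.
class MemoryReadDriver {
  constructor () {
    this.files = new Map()
  }

  performRead (path) {
    return this.files.has(path)
      ? Promise.resolve(this.files.get(path))
      : Promise.reject(new Error('not found'))
  }
}

// Handler logic for GET /read/...: assumes everything after "/read/"
// identifies the file. Returns a plain { status, body } object so it can be
// wired into any HTTP framework.
function handleRead (driver, requestPath) {
  const prefix = '/read/'
  if (!requestPath.startsWith(prefix)) {
    return Promise.resolve({ status: 404, body: null })
  }
  const filePath = requestPath.slice(prefix.length)
  return driver.performRead(filePath)
    .then((data) => ({ status: 200, body: data }))
    .catch(() => ({ status: 404, body: null }))
}
```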

Blockstack core not working on localhost after installing gaia.

My auth login was working with a plain Blockstack core installation.
But as soon as I installed gaia with python install, I became unable to log in or store data; it just gets stuck on the "choose your data storage" part.
Please advise ASAP, as I am working on production Blockstack.
I am working in a localhost environment, and the error in the console is the following:
TypeError: NetworkError when attempting to fetch resource

Add support for application addresses with whitelisting

This has to do with governing which addresses a gaia hub will accept writes on behalf of, which is a necessity for operating private hubs (https://github.com/blockstack/gaia#private-hubs)

Currently, the config.json file governs address whitelisting in Gaia. If a user wishes to restrict the set of addresses which can write to the gaia hub, this can pose a major usability issue.

Basically, if you want to whitelist an "identity address" like aaron.id's 1EJh2y3xKUwFjJ8v29a2NRruPJ71neozEE -- you cannot actually derive, say, my application address for app foo from the ID (or even the xpub, because these are all bip32 hardened indexes)

There's a couple of approaches we could take:

  1. Add endpoint to the gaia hub which will allow the blockstack-browser to modify the whitelist. This would require the notion of "admin addresses" which can modify the whitelist (you wouldn't want applications, for example, to have this access)

  2. blockstack.js attaches a "proof" in the authentication header on gaia hub writes. This proof would show that the app address is derived from the identity address. The auth response token already proves this fact (by signing the JWT with the identity key, and including the app private key) --- however, this would require (1) sharing the auth response token and (2) the app's transit key, which we should not do. Instead, we'd probably want to include this as a new field in the auth response as a JWT with payload = { appPublicKey, identityPublicKey } and signer = identityPublicKey

I'm in favor of (2), as that's probably the simplest to implement, and it keeps the gaia hub basically stateless (you would just whitelist identity addresses, and then the application addresses for those identities would follow from there.)
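
A rough sketch of approach (2), under loud assumptions: a plain signed JSON blob stands in for the JWT, PEM-encoded public keys stand in for identity addresses in the whitelist, and `makeProof` and `isWriteAllowed` are hypothetical names. It only uses Node's built-in `crypto` module:

```javascript
const crypto = require('crypto')

// Identity side: sign { appPublicKey, identityPublicKey } with the identity key.
function makeProof (identityKeyPair, appPublicKeyPem) {
  const payload = JSON.stringify({
    appPublicKey: appPublicKeyPem,
    identityPublicKey: identityKeyPair.publicKey.export({ type: 'spki', format: 'pem' })
  })
  const signature = crypto.sign('sha256', Buffer.from(payload), identityKeyPair.privateKey)
  return { payload, signature: signature.toString('base64') }
}

// Hub side: accept the write only if the named identity key is whitelisted,
// the proof covers the writing app key, and the signature verifies.
function isWriteAllowed (proof, whitelistedIdentityKeys, appPublicKeyPem) {
  const { appPublicKey, identityPublicKey } = JSON.parse(proof.payload)
  if (!whitelistedIdentityKeys.includes(identityPublicKey)) return false
  if (appPublicKey !== appPublicKeyPem) return false
  return crypto.verify('sha256', Buffer.from(proof.payload),
    crypto.createPublicKey(identityPublicKey), Buffer.from(proof.signature, 'base64'))
}
```

The hub keeps no per-application state here: it only holds the list of whitelisted identities, and each write carries its own proof.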

Would love to hear thoughts on this -- tagging some people who I remember having views on this: @jcnelson @jackzampolin @larrysalibra @cwackerfuss

Monitoring for default gaia hub of matching read urls

We need to monitor the correctness of various kinds of gaia hub returns -- in particular, we should assert that the readURLPrefix returned by hub_info is the same as the publicURL being returned by the POST method.
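
As a sketch, the core of such a monitor could be a single comparison. This assumes "the same" means that the publicURL returned by a write starts with the readURLPrefix from hub_info; `publicUrlMatchesPrefix` is a hypothetical name, and fetching both values from the hub is left out:

```javascript
// Hypothetical monitor check: the URL returned by a write should begin with
// the read prefix that hub_info advertises for the same hub.
function publicUrlMatchesPrefix (readURLPrefix, publicURL) {
  return typeof readURLPrefix === 'string' && readURLPrefix.length > 0 &&
    typeof publicURL === 'string' && publicURL.startsWith(readURLPrefix)
}
```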

Flow type checks consistency

Within the gaia codebase I have seen areas like server.js where there are flow type checks. I think it would be best if flow type checks were used either throughout the codebase or not at all, as the current state looks inconsistent. At first glance I wasn't sure if it was TypeScript or JS.

Inconsistent readURL use.

While experimenting with gaia-hub running against various providers, I noticed that the readURL configuration variable is used inconsistently.

In the disk driver, the readURL is used without modification besides ensuring a trailing slash:

this.readURL = config.readURL
if (this.readURL.slice(-1) !== '/') {
  // must end in /
  this.readURL = `${this.readURL}/`
}

In az it is prefixed with https:// and has the bucket name appended:

getReadURLPrefix () {
  if (this.readURL) {
    return `https://${this.readURL}/${this.bucket}/`
  }
  return `https://${this.accountName}.blob.core.windows.net/${this.bucket}/`
}

In gc and S3, it has https:// prepended, but nothing else

gc:

getReadURLPrefix () {
  if (this.readURL) {
    return `https://${this.readURL}/`
  }
  return `https://storage.googleapis.com/${this.bucket}/`
}

S3

getReadURLPrefix(): string {
  if (this.readURL) {
    return `https://${this.readURL}/`
  }
  return `https://${this.bucket}.s3.amazonaws.com/`
}

The inconsistency makes configuration confusing. I would suggest that readURL always expect a full url including bucket name.
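
For illustration, the suggested behavior could be a small helper shared by all drivers. `normalizeReadURL` is a hypothetical name, and rejecting values without a scheme is my assumption about what "full url" should mean:

```javascript
// Hypothetical shared helper: treat readURL as a full URL (scheme, host, and
// bucket or path) and only guarantee the trailing slash.
function normalizeReadURL (readURL) {
  if (!/^https?:\/\//.test(readURL)) {
    throw new Error(`readURL must be a full URL, got: ${readURL}`)
  }
  return readURL.endsWith('/') ? readURL : `${readURL}/`
}
```

Each driver's getReadURLPrefix could then return the normalized readURL when it is set, and fall back to its provider-specific default otherwise.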

I'm happy to fix this if you'll provide guidance on the desired behavior.

Thanks,
Bo

Formalize spec for Gaia Hub drivers

This spec should be the interface that needs to be satisfied by the different Gaia Hub drivers. Currently there are two drivers, AzDriver and S3Driver. The AzDriver is used in the deployed version and is the most complete. I've attempted to generalize below:

class SpecDriver {
  constructor (config) { 
    // instance of client for this storage provider 
    this.service = specLibrary.Client(config.credentials.accountName, config.credentials.accountKey)
    // Currently bucket for Az and S3, but maybe path more generally?
    this.path = config.path 
    // A winston logger config see /hub/server/config.js
    this.logger = config.logger
    // Needed for constructing URL in Az and S3
    this.accountName = config.credentials.accountName
    // Currently just the domain (see getReadURLPrefix in Az and S3)
    this.readURL = config.readURL
    // For setting cache headers on requests to the backend
    this.cacheControl = config.cacheControl
  }
  
  // Used to check if a filename is valid before writing
  // Could we change this to an instance method?
  static isPathValid (path) { return bool }

  // This returns the root path from which to perform reads
  getReadURLPrefix () { return readURLPrefix }

  // Currently doesn't return anything, should return a promise?
  performWrite (args) { return Promise }
}

@kantai @jcnelson @larrysalibra
