joystream / joystream Goto Github PK

Joystream Monorepo

License: GNU General Public License v3.0

Shell 0.79% Rust 60.89% Dockerfile 0.08% JavaScript 0.17% Batchfile 0.01% TypeScript 36.06% Python 0.17% Jinja 0.06% Jupyter Notebook 1.71% Handlebars 0.08%

joystream substrate rust blockchain cryptocurrency typescript

joystream's Introduction

Joystream

This is the main code repository for all Joystream software. In this mono-repo you will find all the software required to run a Joystream network: The Joystream full node, runtime and all reusable substrate runtime modules that make up the Joystream runtime. In addition to all front-end apps and infrastructure servers necessary for operating the network.

Overview

The Joystream network builds on the substrate blockchain framework, and adds additional functionality to support the various roles that can be entered into on the platform.

Development

For best results use GNU/Linux with minimum glibc version 2.28 for nodejs v18 to work. So Ubuntu 20.04 or newer.

You can check your version of glibc with ldd --version

The following tools are required for building, testing and contributing to this repo:

Rust toolchain - required
nodejs >= v14.18.x - required (However volta will try to use v18.6)
yarn classic package manager v1.22.x- required
docker and docker-compose v2.20.x or higher - required
ansible - optional

If you use VSCode as your code editor we recommend using the workspace settings for recommend eslint plugin to function properly.

After cloning the repo run the following to get started:

Install development tools

./setup.sh

If you prefer your own node version manager

Install development tools without Volta version manager.

./setup.sh --no-volta

For older operating systems which don't support node 18

Modify the root package.json and change volta section to use node version 16.20.1 instead of 18.6.0

"volta": {
    "node": "16.20.1",
    "yarn": "1.22.19"
}

Run local development network

# Build local npm packages
yarn build

# Build joystream/node docker testing image
RUNTIME_PROFILE=TESTING yarn build:node:docker

# Start a local development network
yarn start

Software

Substrate blockchain

Server Applications - infrastructure

Storage Node - Media Storage Infrastructure
Query Node
Distributor Node

Front-end Applications

Pioneer v2 - Main UI for accessing Joystream community and governance features
Atlas - Media Player

Tools and CLI

joystream-cli - CLI for community and governance activities

Testing infrastructure

Network integration - Joystream network integration testing framework

Running a local full node

git checkout master
WASM_BUILD_TOOLCHAIN=nightly-2022-11-15 cargo build --release
./target/release/joystream-node -- --pruning archive --chain joy-mainnet.json

Learn more about joystream-node.

A step by step guide to setup a full node and validator on the Joystream main network, can be found here.

Pre-built joystream-node binaries

Look under the 'Assets' section:

Ephesus release v8.3.0

Mainnet chainspec file

joy-mainnet.json

Integration tests

# Make sure yarn packages are built
yarn build

# Build the test joystream-node
RUNTIME_PROFILE=TESTING yarn build:node:docker

# Run tests
yarn test

Contributing

We have lots of good first issues open to help you get started on contributing code. If you are not a developer you can still make valuable contributions by testing our software and providing feedback and opening new issues.

A description of our branching model will help you to understand where work on different software components happens, and consequently where to direct your pull requests.

We rely on eslint for code quality of our JavaScript and TypeScript code and prettier for consistent formatting. For Rust we rely on rustfmt and clippy.

The husky npm package is used to manage the project git-hooks. This is automatically installed and setup when you run yarn install.

When you git commit and git push some scripts will run automatically to ensure committed code passes lint, tests, and code-style checks.

During a rebase/merge you may want to skip all hooks, you can use HUSKY_SKIP_HOOKS environment variable.

HUSKY_SKIP_HOOKS=1 git rebase ...

RLS Extension in VScode or Atom Editors

If you use RLS extension in your IDE, start your editor with the BUILD_DUMMY_WASM_BINARY=1 environment set to workaround a build issue that occurs in the IDE only.

BUILD_DUMMY_WASM_BINARY=1 code ./joystream

Authors

See the list of contributors who participated in this project.

License

All software under this project is licensed as GPLv3 unless otherwise indicated.

Acknowledgments

Thanks to the whole Parity Tech team for making substrate and helping in chat with tips, suggestions, tutorials and answering all our questions during development.

joystream's People

Contributors

Stargazers

Watchers

Forkers

mnaamani siman ltfschoen h4x0rbbq ksinghf juniuszhou blrhc shamil-gadelshin yourheropaul aarlt gleb-urvanov lezek123 bedeho dzhelezov semeano herou kdembler ondratra 0mustafasevinc0 bwhm chainlito simplexprotocol dzhidex metmirr dimitreee web3capital cae1um 96radhikajadhav 0xvolodya traumschule iorveth sulejman kiims lopegor jamiehewitt15 nuke-web3 freakstatic nkhlghbl ahhda ignazio-bovo ezaruba polymath-is xmonader maxlevush-coinside chandrastation p-sad liqum jmdtol thang83473361 pengandkun igrexac thesan oleksanderkorn badabum l3pereira destenson zeeshanakram3 ajunlonglive ihor171 standardgalactic prom3theu5 singulart jbelke playfloor bit2world sergey234 yasiryagi softwareengineerdeveloperh mkbeefcake kuatcapital crazysergo drillprop testlool wradoslaw ladaalekz tenyandr1 yikker30 linkoln477 aletm3 yohana25 golddydev okayko leetjoy goldstarhigher lbjbhdr kravangelika alvare555 jerrsuo darkobtc ytrewq3210 denisnagello eshark9312 rijulraj456 chrlschwb elizabethdev553 vrrayz attemka ivanturlakov lucky4angel lisa000123

joystream's Issues

Substrate coding conventions I

Add your suggestion as a comment!

Background

Our Substrate code base is starting to get more complicated, and it would be a benefit to harmonise the set of major conventions we follow, so as to follow good best practices, and make reviews more efficient. The goal of this issue is to accumulate suggestions over time, as replies, which we can turn into an eventual convention document. This document can further be turned into rules for our CI linter.

Major questions that

How to decide what is its own module, vs. combining with existing module?
Modules deserve their own repo? every module?

Initial suggestions

All maps must map to Option to avoid default construction behaviour of StorageMap from allowing us to be lazy about checking ::exists on the same map before lookup.
Always try to make a module easily reusable for another runtime, by:
- If possible, always provide your own traits for your expectations on other modules, rather than relying on public traits, or traits in the module you are expecting to use.
- Don't define and implement traits for your own module, see point above.
- If possible, strive to implement the business logic of your module separately from the runtime module itself, in a substrate agnostic way.
Assert as many invariants as possible!

Complete forum runtime specification

Add ASCII art of on app start

Implement Data Object Type Registry

Data Object Type field
Registry as map of ID to DOT
Modify only as root
Tests

Add readme

Create a README with instructions for how to build a node with another runtime.

Integrate existing forum work into upstream feature branch

This is literally about just establishing a baseline feature branch which builds, it will have a lot of commented out code, and to share the little amount of progress I have made so far.

WIP: Proposal: platform storage utlization tool

A tool to, based on on-chain state, figure out how much capacity exists for different data object types, and the rates at which they are being consumed over recent time scales. Should be used by the working group as a way to detect how to expand storage capacity over time.

write tests for staked roles runtime module

Add integration tests

We have observed that unit tests at the module level may still leave bugs at the runtime or node level. The easiest way to this is to address this is to have integration tests which spin up a full node and submit transactions of interest, and then asserting relevant state changes and events.

This test should run as part of CI.

Implement Content Directory

Content directory entries split between struct fields and JSON "blob" for extra data
Map of ID to entry

Add a notion of acceptable media/MIME/content types.

Data Object Types on the chain are not file types, but relate to storage tranches. However, for different purposes, different media types should be permissible for uploads.

This is really part of the as-yet unspecified filtering in the whitepaper.

Enforce migration execution before any other logic.

Question we posted in Substrate Riot chat

We are doing a runtime upgrade which requires running migration logic on the storage as well. This is done by running this logic in the on_initialize handler of a designated migration module.

How do we ensure that the handler for this callback is called first in the given module, in case other of our modules have code in their on_initialize that relies on the migration already having been executed after the runtime upgrade.

Use Consistent version numbers

In next release, be consistent with versioning.
Use same version number for github tagged release, cargo.toml package version and runtime to avoid any confusion.

The version should be of the form v{Authoring}.{Spec}.{Impl} declared in runtime/Cargo.toml

See this documentation to better understand how the RuntimeVersion is interpreted.

Proposal: Storage & distribution system benchmarking tool

Background

In the Joystream static data storage and distribution system, there are service provider participants paid for accepting uploads, synching, storing and distributing data. These providers will as part of their participation and compensation commit to having a certain quantity of resources available for utilisation at all times, and making them available to platform participants in a timely and reliable manner.

Currently, we have no tools for automatically measuring whether any given participants or the system as a whole, is actually performing as expected at some given amount of load. Such tools will not only be of use to us as we are developing, but also the storage working group when it is operational.

Goals

Be able to automatically determine how performant our node software and solution architecture is at different levels of load. This also includes the utilisation of on-chain resources.
Provide genuine real world like incentives for testnet participants, by being able to screen and reward participants based on performance.
Easily identify any catastrophic problems
Build tools the storage working group will require when operational.

Proposal

Disclaimer

This proposal needs to be augmented when distributors have been finalised as a standalone role. Two key assumption made in this proposal already

Storage and distributors are organised into groups, where all group members are replicating the same fully redundant service commitment over the same data objects.
Storage providers will allow any other storage provider or distributor to download from them, regardless of group membership.

Overview

This proposal is intended to be deployed for a version of the storage and distribution infrastructure where the distributor and storage provider roles have been separated, and storage and distribution groups (before called tranches) have been introduced.

The proposal has the following three key characteristics:

First, it proposes a format for defining a collection of tests, called a test battery, one may want to conduct against the storage and distribution infrastructure. This format is aimed at making it convenient to prepare and conduct such a tests, as well as conveniently analysing the result. A single test battery may include a variety of different particular tests, called test scenarios. Each scenario may vary in what part of the infrastructure they are targeting, in terms of both actors and function. Some scenarios may have on-chain or infrastructure side effects that other scenarios depend upon, while others not. In order to accommodate this, the scenarios are organised into a scenario DAG, where a directed edge indicates that the destination scenario can only be initiated when the source scenario has been completed.
Second, it makes it possible for the test conductor to query the infrastructure from multiple different hosts, called sentry test nodes, in a single battery or scenario. This allows the operator to capture whether different hosts are satisfying given geographically contingent latency constraints, and also makes it harder for an adversarial host to detect a test an change behaviour strategically. The operator controls the test via a node called the core test node, and lastly there is a data source node which is expected to serve all required data used for uploads in tests. This decoupled architecture can also just run on a single host, with a single data source and sentry node, if desired. Both sentry and data source nodes are inspected and operated through a RESTful HTTP API.
Third, it defines a set of different roles for hosts involved in conducting such a test, and corresponding software tools which will run on these hosts.

Concepts

Personae

A personae is the set of resources and information required to act as an active platform member in some specific role. Herein we specifically speak of personae for memberships in the consumer, storage provider and distributor roles. Each is made up of the seed for the private key for the primary account, that is the account corresponding to the membership, as well as any sort of role identifier that may be required (e.g. storage provider ID).

Data Object Profile

Given a deployed storage system, with an active data object type registry, a data object profile is a valid data object type and CID for an underlying data object which satisfied the data object type constraints.

Test Battery

A test battery defines sequence of tests to be conducted, and the corresponding resources or resource identifiers, for conducting the test. It is defined by the follow properties:

Name: Name of the test.
Description: Human readable description of test.
Consumer Personaes: Vector of consumer personaes, where the index of a personae is called the consumer personae ID.
Storage Provider Personaes: Vector of storage provider personaes, where the index of a personae is called the storage provider personae ID.
Distributor Personaes: Vector of distributor provider personaes, where the index of a personae is called the distributor provider ID.
Chain: Information needed to identify and connect to chain.
Data Object Profiles: Vector of data object profiles, where the index each is called the data object profile ID.
Data Source URL: URL for data source HTTP service at which one may download all required data objects by their ID, where the index is called the data source ID.
Sentry Host URLs: Vector of URLs for sentry host HTTP service, where the index of each is called the sentry node ID.
Scenario DAG: A vector of test scenarios, where the index of of each is called the scenario ID.

Test Scenarios

There are multiple scenario types, but all have the following properties:

Name: Name of the test.
Description: Human readable description of test.
Dependencies: Vector of scenario IDs which this scenario depend upon. May be empty.
Sentry Nodes: Whether to use a specific sentry node, identified with a sentry node ID, or any random node.
Request Pipeline Width: How many simultaneous outstanding requests to target per active sentry node.
Max Resolution Time: The maximum number of seconds it can take to resolve a host from a key before it is considered down.
Pause Time: Minimum pause time between download requests, both failure and success.

Beyond this there are two different types, with a corresponding set of special properties

Upload

A scenario where a set of new data objects are uploaded, and it has the following properties

Data Objects: Non-empty vector of data object profile IDs.
Consumer Personae: ID of the consumer personae ID to use for all uploads.
Max Upload Time Per Byte: This value, times the byte size of the data object, is the maximum number of seconds an upload can remain in progress before it is deemed a failure.

Download

A scenario where a set of data objects are downloaded from storage providers, and it has the following properties

Filter: A regular expression which will be used to filter the set of objects, by data object ID, to download.
Target: Is one among the following, with the corresponding semantics
- Nothing: Download all (filtered) data objects from all groups.
- A storage/distributor group ID: Download all (filtered) data objects from all members in the given groups.
- A storage/distributor group membership ID: Download all (filtered) data objects Can either be nothing, a specific storage group ID, or a specific storage membership ID. Nothing is interpreted as downloading from every group.
- Personae: The personae type and personae ID to use when connecting. Obviously, when connecting to a storage provider, a consumer personae is not valid.
Max Download Time Per Byte: This value, times the byte size of the data object, is the maximum number of seconds a download can remain in progress before it is deemed a failure.

Storage provider list

Anyone can enter
Kicked out by root

Forum Wishlist

Below is a WIP list of improvements for the Forum.
I have contained everything in one issue here, although some will require runtime changes, and are maybe not realistic.

Also credit to ascii in this thread, as I stole plenty of his ideas:

Functionality

Runtime + Pioneer

Multiple forum sudo(s).
- Can be hierarchical or not.
Modify category and subcategory name/description (sudo)
Delete entire thread, and all replies (sudo)
Delete thread (poster)
- Only if there are no replies. Perhaps also only within n blocks
Modify thread title (sudo)
Modify thread title (poster)
- Only if there are no replies. Perhaps also only within n blocks.
- Arguably, we can skip this and have to the poster ask forum Sudo (not practical though)

Pioneer Only

Threads and posts:

Creating threads/posts:

Preview of post (similar to how github does it). This may require a lot of work, but we should at least be able to help users see exactly how it will appear, as markdown isn't specific enough.
Editor to help users not familiar with markdown to add links, etc. Again, something like the one here in github would be nice, although you could make it even more friendly for non-technical users. (Obviously, I don't expect you to code these things yourself...)

Show more `chain state` data.

Show threadId in small font below threadTitle
Show time and date of post in the header, or below Avatar/Handle.
- If post has been edited, show history of edits in the footer.
- (Considered adding blockheight as well, but that might be confusing when we transfer old forum posts to new testnets...)
Show n as in "nr_in_thread":n at the far right of the post header.
- Use n such that you can link to a specific post in a thread. https://testnet.joystream.org/acropolis/pioneer/#/forum/threads/8/n or something.
Add a Reply button below the "bottom" post in each thread.
In the footer of each post, add a Quote button.
- Instead of the standard markdown Quote (see below)
- render the entire post inside the reply, in a darker grey color (see wireframe image)

> Quote some text from previous post

My reply

Front Page and Categories page

Front Page changes:

From:

To:

Category	Subcategories	Threads	Posts	Creator	Moderator
Name	Number	Number	Number	As Is `*`	Forum Sudo`**`

* and **
Not sure how much sense this makes while we have one Forum Sudo for the entire forum. As I am not exactly sure how the Communication Moderator role will work in the future, this will be a temporary fix anyway.

Subcategory Page

From:

To:

Notes:

Both shown as forum sudo (dropdown should not be visible for non-sudo)
This assumes "final" level of subcategories. ATM, there is no need for a level in between.

Report: Content Directory Encoding Discussion

WIP.

On-chain schemas redundant in content directory?

Background

The way the current content directory encoding is done is described here

https://github.com/Joystream/joystream/blob/master/reports/archive/1.md#the-json-approach

In particular, it references that all schema will be stored on chain with unique ids, and that content items will have a reference to one such schema via this id, and then have its actual json encoding stored as well.

Problem

There appears to be no good reason to store the schemas on-chain. Any production user experience using the content directory cannot be schema-driven, it has to render an experience which is tightly coupled to the schemas it supports at the implementation level. Hence the validation rules carried in the scheme must be in the application to begin with. This is indeed what we are doing now in Pioneer also obviously. Any new schemas introduced cannot be handled by such an application, and it must be updated. Hence there is no good reason to store the schemas on-chain.

Even the schema ids may be redundant for this same reason, as they then are not validated on-chain in any way, which was the main principle for putting things inside the json field in the json proposal.

update handle validation method to limit to: `^[a-z0-9_]+$`

regex package from crates.io doesn't compile to wasm, so need alternative or roll our own.

Add a metadata schema registry.

We're discussing back and forth how to store metadata about content, and one of the best approaches IMHO is to use https://schema.org/ for it.

The main reason is that embedding metadata in a future website in this format will make the content discoverable in most search engines, which is fairly important in the future.

But beyond that, it's also a fairly complete description of common audio and video metadata, so a great place to start.

We have two ways of going forward, though: either reference schema.org URLs, e.g. https://schema.org/VideoObject - or reference a Joystream internal schema that happens to contain most of the same fields, but can be extended.

Between @siman and myself, we prefer the latter.

The proposal, then, is to have a schema registry on chain where we map a numeric schema ID to a BLOB that contains a https://json-schema.org/ description of the metadata. In the content directory, then, we reference those IDs.

Refactor Storage module for Substrate

Create Wiki or README with instructions on how to create and verify runtime upgrade proposals

Rename "dot" variables to something else

DOT is already used in Polkadot, and shouldn't be used as an abbreviation for Data Object Type.

Sideproject: Rearchitect Pioneer

Background

What is Pioneer?

#338

The problem

These are the primary problems in the application architecture, in no particular order:

It is hard to understand and make changes because the organization is so complex and disorganized, and also includes lots of unused functionality from the original fork.
It is hard to test anything, both component rendering and business logic.
It is hard to reuse any work in other applications.
It is hard to develop the application independently of the full node backend, e.g. to explore new features or behaviors before the backend is ready
- There is no easy way to interact with individual React components in isolation with the given mocked state.
- There is no easy way to setup more complex mocked application states and interact with the full UI in such hypothetical scenarios.
It is hard to selectively pull in some changes, but not others, from the upstream apps repo. Pulling in more changes than we need results in us having to refactor parts of our application which we do not wish to update.
Lots of critical business logic is stuck to React components, such as signing transactions, even though it would be desirable to be able to use them in another context.

Some of these problems are related or may have a common underlying source. Be aware that the application has a broad range of shortcomings in terms of usability, but this is not the focus here.

Goal

A fresh implementation of the Pioneer app, effectively with the same external behavior, which is better-suited future development of it and other applications.

Requirements

Must start in a new repo, hosted on our Github org.
CI which runs tests, linters and deployment.
Introduce proper application and UI state management using Redux. Atlas will become a very large and complex application, so favour using the full set of Redux tools (e.g. actions, reselect, saga, observable, normalizr) to encourage good conventions, readibility, etc.
Only make pure React components.
Same as Pioneer:
- basic key management and backend data source.
- UI/UX.
- core tech stack: typescript, react, etc. However, no need to use Polkadotjs in the same way, or at all.
Introduce modern tooling and conventions for testing: both component rendering and business logic must be testable.
Introduce modern tooling for developing:
- hotloading
- be able to run with fully mocked data sources/local storage.
- Storybook for exploring components.
Introduce modern tooling for deploying.
Works with Rome testnet (soon to be released)
Setup for internationalization.
Organize the codebase into a set of assets, packages, and libraries which make it easy to reuse work in other similar applications.
Establish and writeup clear coding and architectural guidelines that prospective contributors can review or be directed towards.

Milestones

Concise 1-2 page writeup of a plan for how to address the described requirements. Every single requirement must be addressed directly. If you believe there is a better solution to address the same problem the requirement is meant to solve, then describe this alternative.
A sequence of software deliverables which can be decided iteratively, over 1-2 week sprints.

WIP: Research: Constrained runtime upgrading.

WIP

TLDR: How can we put constraints on runtime upgrades in order to preserve some sort of guarantees about what is being changed? e.g.

we may want to not do a wholesale upgrading of the entire runtime at a time, perhaps just modules
even for just a single module, we may want to only upgrade som subcomponent
we may want to put some constraint on the change that can be made, e.g. block any upgrade which changes the allocation of funds in the platform token.

There is initial work on these topics in the following

Cap9/BeakerOS: Cap9 is a capability-based exokernel protocol to run on Ethereum. As a smart contract framework, Cap9 provides developers the ability to perform upgrades in a secure and robust manner - preventing privilege escalation at any point in the development process.
ZeppelinOS: ZeppelinOS is an open-source, distributed platform of tools and services on top of the EVM to develop and manage smart contract applications securely.
AragonOS: A computer operating system manages how applications access the underlying resources of the computer—the hardware. aragonOS does the same for decentralized organizations or protocols. It abstracts away the management of how apps and external entities access the resources of the organization or protocol. These resources can be things such as assets, cryptocurrencies, the rights to claim a premium on a loan, or the rights to upgrade a smart contract. Its architecture is based on the idea of a decentralized organization or protocol being the aggregate of multiple components (called applications) connected by a pillar, called the Kernel, which are all governed by a special Access Control List (ACL) application that controls how these applications and other entities can interact with each other.

Sideproject: POC Substrate secure messaging integration

POC Substrate secure messaging integration

Status

Available

Purpose

The purpose of this POC is to

Define a self-contained project which new developers can tackle as part of the mutual evaluation process for joining the Jsgenesis development team.
Generate some initial findings, specs and implementations which can make their way into a Joystream testnet in the near future.

Background

The platform should have a communication subsystem in order to facilitate participant coordination in a number of different contexts. A key component of this communication subsystem is a messaging system. In the future, a messaging board will also be introduced.

Recall that the platform has

a membership registry where each member has a unique (human readable) string handle, signing keys and a media-rich profile.
a set of moderator authorities which are themselves policed by the governance system.

Goals

The messaging system is integrated with the following ways

The namespace for user handles in the messaging system is the same as the handle namespace in the on-chain membership system. Specifically, end users will be able to resolve the appropriate handle for a messaging system participant by looking up an on-chain mapping from the user key to the handle.
The communication in the messaging system is authenticated and possibly encrypted (depending on context) using keys, directly or indirectly, from the on-chain membership system.
The rooms in the messaging system have human-readable names in a namespace governed on-chain. Specifically, end users will be able to resolve the room authority (which is most likely identified by a public key) based on a human-readable room name by looking up an on-chain mapping from room names to room authority identifiers. As a consequence, the moderator of a given room is also then determined by the chain.

Requirements

Messaging protocol

Must support

direct messaging, with confidentiality
group/room messaging, and this must observable by anyone, not only participants in the group
non-moderating participation must work within browser constraints
servers/relayers cannot read or forge messages, or alter message order

Benefit, but not an absolute must, if it supports

receive messages offline, which are persisted. Assume exogenous incentives
group/room messaging confidentiality
forward secrecy

Practical

Use some existing messaging protocol.
Substrate runtime need not include the governance mechanism itself, changes to a governed state can be done through the authority key.
Re-purpose an existing messaging client, don't build from scratch, must work in browser/Electron.

Milestones

Select messaging protocol & client after reviewing multiple options: e.g.

Matrix (in developement?)
XMPP
Whisper (in developement?)
IRC
HOPR (in developement?)
Something over Webrtc?
other...

Write high-level specification
Implement required changes in messaging client, and
3.1. Centralized (skip if confident): role of the chain is played by a hosted web service, and messaging client integrates with it, as test.
3.2. Blockchain: replace role of web service with substrate instance

Deliverables

Messaging protocol review, analysis and decision
High level specification
Altered messaging client
Substrate runtime + CLI

Remove the `memo` field

When all roles can be integrated with a membership, there is no longer a need for the memo field.

Add bootstrap nodes for DHT on chain

Relates to Joystream/storage-node-joystream#26

(Rough) Design Proposal: Storage & Distribution System

Background

This is not an explanation at the level of runtime modules and protocols. It needs to be split up properly for reusability etc. It combines designs for working groups, proposal system and tranches (here called groups) for both distributors and storage providers.

Overview

The storage and distribution system is responsible for long term storage and distribution of static data objects.

Principles

All values are immutable unless explicitly listed otherwise
No value is ever deleted from the state, only marked as no longer in use.

Concepts

Payload Filter

An operationalized way of screening a data object deeply, as defined by

ID: A unique integer identifier.
Max Size: If set, the maximum possible data size.
Min Size (optional): If set, the minimum possible data size.
Inspection Routine: If set, the raw byte code encoding of WASM (or Javascript perhaps) pure function to be provided raw payload, and which decide whether the payload is acceptable.

Notice that payload size has been lifted out of the inspection routine for efficiency, as it will frequently be the primary dimensions to filter on, and explicitly specifying this outside of the routine allows for checks without always going through the costly exercise of loading a runtime environment for the routine.

Data Object Family

A shared profile for how a family of data objects is treated in the system, as defined by

ID: A unique integer identifier.
Description: Human readable description of the purpose of the data object type.
Payload Filter: ID of payload filter to be used.
Expected Daily Download Frequency: How many times during a representative 24 hour time period is it expected that a download session will be initiated.
Expected Download Progress: What percentage of the total data object is expected to be downloaded per download session.
Minimum Distribution Bitrate: The minimum rate at which such an object must be delivered.
Distribution Breaks: An array positions in the data stream, as well as corresponding time durations, where distribution will pause the given amount of time. So e.g. [(0,5s), (10MB, 10s)] means to pause at the beginning for 5 seconds, and at 10MB mark for 10 seconds.
Feasible Storage Groups: Either not set, which means any, or set to a list of specific storage group IDs.
Feasible Distributor Groups: Either not set, which means any, or set to a list of specific distributor group IDs.

NB: Formerly called data object type.
NB: Policy information about who is allowed to download what data object under what circumstances is exogenous to the system

Data Object

Presence of a static data blob in the system, as defined by

ID: A unique integer identifier.
Size: Number of bytes occupied by data.
CID: A secure hash commitment over data. This needs unpacking, e.g. to allow chunking, and even perhaps variable chunk sizes for different data types?.
Family: ID of data object family.
Assigned Storage Groups: List of IDs of storage providers currently responsible for storing a data object.
Added: Date and time for original upload event.
Origin: ID of the member who uploaded the data.
Liaison: ID of storage provider that was assigned the upload from the origin.
Status: One among
- Pending Liason Review: Liason is validating the object.
- Rejected By Liason: Upload was invalid.
- Accepted By Liason: Upload was valid.
- Removed: No longer stored for whatever reason.

Storage Group

A collection of storage providers, with identical terms of participation and fully replicated storage, as defined by

ID: A unique integer identifier.
Status: One among
- Expired: meaning the group is no longer in use, no members are part of the group, no no new members can enter, nor data objects can be added.
- Active: meaning it is fully operational.
- Paused: meaning it is temporarily not in use, hence no new member can join, and no new data can be added.
Slots: The number of providers which can at most be part of the group at any time.
Storage Utilisation: The total size of all data objects assigned.
Required Stake: The number of tokens required to currently enter this group.
Required Storage Capacity: The amount of storage capacity which any participant is expected to be able to store.
Required Total Downstream Bandwidth: The required total amount of downstream bandwidth required.
Eviction Slashing Percentage: The max percentage which can be slashed during an eviction.
Eviction Gating: One among
- Conductor: The conductor can unilaterally evict.
- Council: The conductor can only recommend eviction to the council, council decides.
Exit Terms: The earliest time when an exit can be initiated by the group member and the unstaking period from initiation of exit. During this period an eviction can still occur.

Storage Group Entry Application

An application for a member to enter a storage group as a storage provider, as defined by

ID: A unique integer identifier.
Storage Group ID: ID of the storage group.
Applicant: ID of membership for application.
Submitted: When the application was submitted by the applicant.
Expiry: How long after submitting the application will automatically expire.
Status: One among
- Pending: Initial status when created.
- Accepted: Was accepted, storage provider is in the group, includes when, and with rationale.
- Rejected: Rejected, storage provider not in the group, and cannot enter group based on this application, includes when, and rationale.
- Withdrawn: Application no longer active, initiated by the applicant.
- Expired: Application no longer active.

NB: Could add a separate possible application staking fee, both to avoid DoS abuse, and also signal the seriousness of applicants

Storage Group Membership

Membership of a given provider in a given group, as defined by

ID: A unique integer identifier.
Storage Group ID: ID of storage group in which membership applies.
Membership: ID of membership for application.
Established: When membership was established.
Application: ID of application which was accepted.
Status: One among
- Entering: In the process of becoming a fully operational member.
- Normal: Fully operational.
- Paused: Not actively servicing group or distributors at this time, since some point in time.
- Exiting: Is in the process of exiting, initiated at some time.
- Exited: Has completed exiting at some point in time, is no longer part of the group.
- Eviction: One among
  - Council: ID of eviction proposal
  - Conductor: Conductor has unilaterally evicted provider, with accompanying rationale, at a given time.

NB: Perhaps this can be generalized, this could be the general structure for any working group membership?

Storage Provider Eviction Proposal

Proposal to evict a storage provider from a group, as defined by

ID: A unique integer identifier.
Membership: ID of storage group membership where the provider was evicted.
Time: When it occurred.
Slashed: Amount slashed.
Conductor: ID of the conductor.
Rationale: Description of the underlying cause of eviction.
Deliberation: ID of deliberation which may have transpired.
Status: One among
- Opened: Open for deliberation
- Affirmed: Affirmed resolution.
- Cancelled: Cancelled resolution.

Deliberation

A discussion thread concerning some topic, as defined by

TBD

Deliberation Post

A post in a deliberation, as defined by

TBD

Distributor Group

A collection of distributors, with identical terms of participation and fully replicated distribution, as defined by a set of properties identical to the Storage Group, with the exceptions that Required Total Downstream Bandwidth is replaced Required Total Upstream Data Capacity Per Month.

Conductor

Analogous to Storage Group Membership, exceptions

evictions cannot happen via conductor, the only council based
Stake is stored directly, not in group <== this needs more thought

NB: Perhaps this can be generalized, this could be the general structure for any working group lead membership?

Conductor Entry Application Proposal

Analogous to Storage Group Entry Application, except as proposal.

Shared State

Conductor: ID of current Conductor, if any.
Conductors: Map ID to Conductor.
Storage Groups: Map ID to Storage Group.
Distributor Groups: Map ID to Distributor Group.
Data Object Families: Map ID to Data Object Family.
Data Objects: Map ID to Data Object.
Storage Group Entry Applications: Map ID to Storage Group Entry Application.
Storage Group Membership: Map ID to Storage Group Membership.
Storage Provider Eviction Proposal: Map ID to Storage Provider Eviction Proposal.
Payload Filters: Map ID to Payload Filter.

NB: Proposals (Storage Provider Eviction Proposal, Distributor Group Eviction Proposal, Conductor Entry Application Proposal) not here, unclear how to organise, and where

State Transitions

Here is (soon to be) complete list of state transitions supported.

Storage group expires
Storage group membership application expires
Distributor group membership application expires
Pending Storage group entry application expires
Accept, reject, withdraw conductor application proposal
Add storage group
Mutate a new storage group
- Pause or unpause group status
Add distributor group
Add data object family
Mutate a new data object family
- Update feasible storage groups
- Update feasible distributor groups
- Expected download frequency
- Expected download progress
- Distribution Breaks
Add a new data object
Mutate a data object
- Update assigned storage groups
- Set liason status
- Set as removed status
Add payload filter
Apply to storage group
Apply to distributor group
Exit storage group
Exit distributor group
Pause storage group membership
Pause distributor group membership.
Evict storage group member
Evict distributor group member

NB: missing proposal related state transitions, also deliberation

Communication Protocols

This section is highly incomplete, as it probably should incorporate some of the already existing design I am not familiar with. But the upload is shown for reference.

User upload

User issues transaction for adding a data object, which includes CID, size and object family. Runtime ensures that there is sufficient storage capacity, in that there is at least one feasible active with a normal status member, and which has sufficient space, and randomly assigns to one among them, and picks a liaison. The space utilization is automatically updated at this time. Otherwise, the transaction is rejected.
User resolves and connects to a host corresponding to the liasion, and attempts to make an upload by providing
- A reference to the new data object created
- The raw payload
- Optional: Request token
- Two signature over request using account corresponding to membership, one including request token if present, the other without.
Liaison validates owner, upload, and status of group. If checks pass, any applicable access policy can be applied based on request token, or otherwise. If access policy fails, upload is rejected on chain, and the interaction ends, otherwise, the status is set to accept.

The access token optional parameter which can, in a given instantiation of the protocol and system, carry information useful to determining under what context to allow or reject the upload.

NB: Why not let user directly pick liaison at random, based on chain state? It simplifies everything, offloads transactions, etc.
It also means there is no need for cleaning up failed uploads because only successful uploads are added.

User download

TBD

Distributor download

TBD

Storage provider download

TBD

Upload data object

TBD

Conductor Reporting

A conductor has to operate two separate public endpoints, one for submitting errors and one for submitting non-error utilization information. The purpose of the former is to make early stage detection of faults or malicious behavior, which could trigger direct inspection and inquiry. This could be things like

User is unable to resolve or connect to liaison.
User upload interrupted.
Distributor is unable to download from storage provider.
Storage provider is unable to download from storage provider.
Invalid data sent.

The purpose of the latter is to guide how storage and distribution resources are deployed, as well as maintain usage statistics, such as view counts, etc.
This could be things like

Upload initiated and or succeeded or failed for a given reason
Download initiated and or succeeded or failed for a given reason

It could be useful to have both user and infrastructure software report on the same events. Mismatches could be valuable information to guide policy.

Thoughts

Is there room to generalize some concepts? seems to be a lot of repetition for different roles and interactions with council. e.g. the concept of a working group, the lead, participation, membership in the group, application to the group. All of it could be generalized, and there could the capability to dynamically create group on chain even, for more bespoke purposes. It would take more work to give such a dynamic group members on chain capabilities, by defining what one can and cannot do dynamically. But that last step could be overkill. At least we coulda avoid recording the same almost identical structure of all the working groups (curation, communication, builders, discovery, storage&distribution, ...)
Should deliberation be here? should it be in generalized working group structure?
We must make a protocol and module design which allows for someone to reuse this type of subsystem in their Substrate chain.

Add a sudo fn: removeKnownContentId(id: ContentId)

As a bonus add one more sudo fn:
setKnownContentIds(ids: Vec<ContentId>)

Docker image

Hi there,

I have talk to someone from the team at #sub0 about the need for a docker image for the project. I can help with that. You can ping me on Riot at @chevdor:matrix.org.

Proposal: Accounting queries

Background

There will be a number of token flows across the platform for a variety of purposes. In order for the council and voters to exercise rational intertemporal planning, they require concise credible public information about these economic variables, both for the present and past.

Proposal

A set of queryies in the query node which track key economic flows over define time periods and commits this history to state. The major features could be

Events: The key financial events being tracked could e.g. be tokens burned, tokens slashed, tokens minted, token reward payout.
Categories: All all events to one among a dynamic set of categories, such as validators, storage etc.
Multi-asset: Supports tracking multiple assets, not just the native one.

WIP: Sideproject: Secure in-browser bootloader script

WIP: Secure in-browser bootloader script

Status

Available

Blockers

No currently working javascript light client & warp sync

Purpose

Filler project for Acropolis in case @siman runs out primary tasks, otherwise becomes normal side project with standard purpose:

Define a self-contained project which new developers can tackle as part of the screening process for Jsgenesis development team, as a side project.
Generate an actual working implementation which can be shipped the next testnet after completion of the side project.

Background

The browser has been chosen as the primary distribution environment for user-facing applications in the Joystream ecosystem. Currently, our only user-facing application is Pioneer, and it is distributed through www.joystream.org primarily. In particular

There is no server-side user specific state, as the user keys and state is stored in the browser
The client application has hardcoded to interact with the blockchain through a single full node instance operated by us, which is hardcoded in the Pioneer instance being served.
The client application is not performing any light client validation on the responses from the full node, and is not syncing the block headers.
The client application has no way of doing public key => {hosts} lookup which is secure, in order to resolve other storage providers, etc. In Pioneer our storage instance host is hardcoded as the liaison.

Goal

Have a javascript based bootloader script which can always secure load up a full working instance of a client application for Joystream with minimal hardcoded trust relationships.

This will allow us to publish static loaders on various secure immutable systems (ipfs, dat, bittorrent), which people can then use to load the application experience correctly and securely in perpetuitiy, avoiding the problem of no DNS+app store custody capacity for the platform.

Requirements

Loader works in a normal pure browser environment (not Nodejs), and also in an environment such as Electron, etc.
Does full light client validation of responses from full nodes: Check status if javascript light client for Substrate
Loader never changes, but can always load some client application even as the runtime has been upgraded and/or that client application has changed.
The client application (which need not be Pioneer in principle), needs to have some hash identifier on-chain which informs the loader where to get the actual payload for the application, e.g. from IPFS/HTTP. This hash identifier must authenticate payload. NB: Cannot be stored on our storage system, as the protocol for this system may change in arbitrary ways that loader cannot keep up with. Loader must be static.
The loader should be able to cache hosts and keys accumulated from prior sessions, so as to have a peer list to automatically try, in addition to the bootstrapping.
The loader must be independent of Joystream, Pioneer, and if possible, any specific runtime, as long as the runtime conforms to some standard way of exposing bootloading information.

Security

While we, in general, presume that the application being loaded is not malicious, it may still be worth exploring how we could loosen this assumption. Now, since we are assuming that the application runs in a browser environment, the user host (disk, devices, processes, etc) should be safe, however, if loaded naively, there is still a risk that the application can access user keys.

To prevent this, the loader could load the applicaiton in a sandboxed environment, and expose some sort of API through which the applicaiton could make requests to sign and send transactions, etc. Such requests could then be turned into user level prompts that explain what the application is actually trying to do, and requiring approval.

Aragon has done initial work on this in their AragonAPI, which is an API exposed by a wrapper environment in which upgradable apps can run in a static Aragon client application.

Note

The Augur project has shipped a loader which seems to be able to do something like this, as one can run the Augur UI under a static IPFS hash: Augur app through Cloudflare IPFS gateway
While we may be able to guarantee that a single fixed loader can work across runtime upgrades, it will obviously not work across other hard fork changes introduced by possibly altering the Substrate node being used in a chain.
This project cannot be started if javascript light client validation does not work yet, so investigate this first.
The most important question to resolve here is how to do bootstrapping into our Substrate network from the browser in a secure way, once this is settled
Keep in in mind that one cannot talk directly to the IPFS/BitTorrnt/Dat network from the browser,
Be aware that the first thing that must be established is a secure read capability of the correct blockchain state, no other authority can provide the required trust relationship across changes to applications, runtime rules and actor sets.

Milestones

Determine that Substrate light clients actually work <= also inspect whether fast finality based fast synch works or not, otherwise initial synch can take a long time
Decide how to bootstrap into full node network for blockchain from browser
Get a light client to synch with such a new network inside the browser
Decide on a standard format for how to expose
Be able to fetch and load the application and all its resources, and start it.
Run application inside of some sandbox environment

Deliverables

Loader
Runtime format & example
Demo app loading both in browser and Electron environment
Sandbox environment design
Loader v2.0 based on sandbox approach

Implement Data Directory

Data Object
Directory in storage as map of ID to DO
Update workflow:
- Owner creates as inactive
- Upload (outside of this repo)
- Storage provider marks as active (outside of this repo, but runtime functionality must be implemented here)
Tests

README missing on development branch

PR #47 to add readme was directed at master branch. I think we can simply merge master into development to get it in there.

List of web3 app distribution channels

Background

Asserting control over web 2.0 or internet assets for distributing apps, such as

app store entries
ICANN domain names
desktop app certificates

is a major barrier at for low friction and safe app distribution. Lets compile a list of alternative channels.

Requirements

WIP ( a bit handwavy):
A means of distributing must allow some place to download an authenticated payload based on some initial trusted hash or key, and from this run some sort of bootloader code which can start and render a user experience which can interact with the platform.

List

Clovyr: web 3.0 app platform
BlockStack Browser: integrated with Stacks naming system
Metamask: will get some sort of ENS integration
Brave Browser: built in metamask, ipfs and ENS?
Opera: ENS integration
Beaker Browser: Dat integration
ZeroNet Browser: Namecoin integration <= not sure
IPFS Browser Extension: Load IPFS assets in any browser
[Opera]: .crupto + IPFS integration https://twitter.com/momack28/status/1244556522081771521
~~(Mist)[https://github.com/ethereum/mist]~~

full node not syncing

OS: windows 10
launched using power shell: .\joystream-node.exe

result: node stopped syncing at block 222728

expected outcome: the finalized and best number should be going up and sync with the target number of blocks

Monero address: 42iysuonzWwezJLAtaB8sPT8RXWQGwg9UKkwAgG8oBpH88hYBZ6HtnieQwcDejtbTd9tjnkN46w2r4DdC18YjRWh6sjTjKs

staked roles: implement runtime module

Implementation in #6

Basic Publisher Profile

Relates to #9

Basic publisher profile with static assets on storage (content IDs)
Linked in content directory entries.

Write more tests for Forum SRML

Joystream Privacy Policy

We need a Privacy Policy covering all of our infrastructure.

Fix test for storage module and add new

Reproducible build script

The purpose of a reproducible build script is a first attempt to have a simple tool to assist council members when they come to vote on a runtime upgrade proposal.

Ideally the script would do the following:

clone the runtime repo, checkout the commit/branch specified by a proposal
Runs docker build to run unit tests and if successful compile the runtime wasm blob (as described in readme.)
If successful copy the wasm blob to local filesystem
Calculate and display the hash over the wasm file (using the same hash algorithm as the one produced in the runtime upgrade proposal https://github.com/Joystream/substrate-runtime-joystream/blob/ae76fc45109562ddb9415f0e333527b2b773a01d/src/governance/proposals.rs#L254

We should make use of / reuse the Dockerfile for our node Joystream/substrate-node-joystream#72

Sideproject: Add fuzzing

WIP: Experimentation with adding fuzzing to individual modules or the runtime as a whole.

Update CI to prevent merging with build warnings

Research: Upgradable Substrate client application framework

Upgradable client application framework

Status

Available

Purpose

Define a self-contained project which new developers can tackle as part of the screening process for Jsgenesis development team, as a side project.
Arrive at an actual framework which can be used for future application development on Substrate, both for us, and the ecosystem more broadly.

Background

Substrate

The Substrate SDK offers a full node application architecture which allows for upgrading of the consensus Runtime rules, and migration of state, through the transaction processing system itself. The primary motivation for this design is to allow for consensus upgrades that follow formalized rules for decision making and deployment, in order to reduce the various social transactions costs of this activity.

Joystream

In the context of Joystrem, we will be relying on this upgradability for the runtime. At the same time, the Joystream platform will be involved in paying for, developing and deploying other applications that depend on the runtime. That is, other applications than the full node, for example, content consumption experiences on desktop and mobile, web applications dynamically loaded in web3 style browsers, etc. These applications will depend on the particular structure of and function of the runtime at any time and may have locally stored state reflecting the same. When there is a runtime upgrade, these applications need to often be upgraded as well.

Goal

A methodology, and tooling, for writing Substrate client applications which gracefully does upgrading and local migration with runtime upgrades.

Requirements

New application code and migration policy will come from the chain, and there must be some backward compatible way for arbitrary old clients to upgrade to the newest version.
Upgrades should be announced and available with sufficient time, so that running client instances can fetch all required assets and perform any possibly time intensive computation before upgrade goes into effect.
Application code must run in some safe environment to prevent malicious/broken client-side code being shipped. Some abstraction is needed to give access to lower level system resources, such as disk space, etc. The Substrate Runtime uses Web assembly for this environment.
Must support a diversity of application types:
- Native apps
- Web apps
- Mobile apps
- Embedded apps
Must be independent of Joystream runtime or system in all ways, should be a community standard.

Milestones

More clearly define the problem and requirements, currently still quite loose
Review various alternatives, look closer at Substrate itself, and propose a framework.
Write any SDK or other tooling which can
Make one or two key demo applications based on framework and SDK.
Seek feedback from Parity/Substrate team and Web3 foundation, go to step 2 if required based on this.
Write a blog post and record video walkthrough of the upgrade of the client application

Deliverables

Framework
SDK
Web3 Foundation grant application (if we end up applying)
Demo
Blog post + Video

Proposal: Introduce Indexing Node

Background

Currently, all of our applications speak directly to the Substrate full node when reading the runtime state.

Problem

This has a number of problems

Convenience

In almost all non-trivial scenario, a client is not so much intersted in a given (key, value) mapping, but rather but a set of such mappings where the value satisifies some query constriant. For example, take

struct Person {
 isMale: bool
... 
}
..
decl_storage {
  GreatPeople: map u32 => Person;
}

A client may here want to fetch all Person instances in the GreatPeople map, where isMale has a given value. Currently, there is no way to do this very sort of thing talking directly to a full node, instead a client has to download all instances and do processing client side. This makes the application much more complex, and has performance penalties (see next section). This example may seem managable in isolation, but gets bad as the number of variables increases.

What is worse is that, if its a genuine map, then the client does not even know what keys to ask for, and if its a linked_map, the client has to do N requests in sequence to download all keys, which has disastrous latency and state management complexity as the number of maps increase. Another bad workaround is to introduce extra state variable which hold the key set explicitly.

Performance

A production system at scale may easily need to service hundreds of thousands, or millions, of secure key/value reads, at a given block height, by actors without access to a trusted full node. These numbers are perhaps on the low end when considering that end users may need to render rich low latency user interfaces which depend on dozens or hundreds of state values in each scene. It then becomes a problem that

Normal full nodes will need to recompute the same light client proofs a large number of times. Presumably, for a storage database with N mappings, this will involve O(lg N) database queries for every request.
Normal full nodes are in general not architecturally optimized for the task of servicing a large number of requests simultanously.
Normal full nodes cannot do any server side processing, as descrbied in the proper section, which would result in much greater bandwidth requirements.

Proposal

A state indexing node which specialises in serving storage values, and corresponding proofs, to a large volume of client requests. By parsing the storage metadata it automatically works with any runtime, and thus across any arbitray runtime upgrade. If unavoidable, light touch conventions may be imposed on the storage system (such as possibly requiring linked_map over map), but it shall not be tied to a specific runtime.
The API is based on the GraphQL standard, this has two benefits
1. This allows clients to specify complex query requirements (filtering, pagination, etc.) which can be done server side. This economizes on bandwidth, client CPU load and client memory load, and client state management complexity.
2. Unlike REST, it allows for a single API flexible enough to support a diversity of client applications, which is critical in a blockchain context.
A javascript client library that works in the browswer, with a built in light client, which can be used to query the node and have automatic validation of proofs.
Node operator incentives are not taken into account for now.

How is this different from Polkascan

Currently, and for the foreseeabl future, the project appears less focused on

Serving proofs.
Working automatically with the state of any runtime.
Providing a highly flexible API for application developers, such as GraphQL

Background

GraphQL used in Ethreum indexing services
Polkascan dynamic block explorer

joystream / joystream Goto Github PK

joystream's Introduction

Joystream

Overview

Development

Install development tools

If you prefer your own node version manager

For older operating systems which don't support node 18

Run local development network

Software

Running a local full node

Pre-built joystream-node binaries

Mainnet chainspec file

Integration tests

Contributing

RLS Extension in VScode or Atom Editors

Authors

License

Acknowledgments

joystream's People

Contributors

Stargazers

Watchers

Forkers

joystream's Issues

Add your suggestion as a comment!

Background

Major questions that

Initial suggestions

Question we posted in Substrate Riot chat

Proposal: Storage & distribution system benchmarking tool

Background

Goals

Proposal

Disclaimer

Overview

Concepts

Personae

Data Object Profile

Test Battery

Test Scenarios

Upload

Download

Functionality

Runtime + Pioneer

Pioneer Only

Threads and posts:

Creating threads/posts:

Show more chain state data.

Front Page and Categories page

Subcategory Page

Background

Problem

Background

What is Pioneer?

The problem

Goal

Requirements

Milestones

WIP

POC Substrate secure messaging integration

Status

Purpose

Background

Goals

Requirements

Messaging protocol

Practical

Milestones

Deliverables

(Rough) Design Proposal: Storage & Distribution System

Background

Overview

Principles

Concepts

Payload Filter

Data Object Family

Data Object

Storage Group

Storage Group Entry Application

Show more `chain state` data.