
youtube-synch's Introduction

Youtube-Synch

The Youtube-Synch node handles YouTube creator onboarding and automatically replicates their content on Joystream. The service periodically syncs videos from a YouTube channel to a Joystream channel.

Required Stack

  • Docker
  • AWS CLI
  • npm >= 9.0.0
  • nodejs >= 18.0.0

Building the Youtube-Synch node

  • Install dependencies: npm install
  • Build the project: npm run build

Running the Youtube-Synch node

Prerequisites

  • A channel collaborator account should be set up on the Joystream network. This collaborator account will be used to replicate YouTube videos to the Joystream network
  • An App (metaprotocol) should be created on the Joystream network. This app will be used for adding attribution information to synced videos. The app name & a string accountSeed should be provided in the config.yml file. On how to create an app on the Joystream network, see the documentation

Configuration

Config file

All the configuration values required by Youtube-Synch node are provided via a single configuration file (either yml or json).

The path to the configuration file is resolved as follows (ordered from highest to lowest priority):

  • The value of --configPath flag provided when running a command, or
  • The value of CONFIG_PATH environment variable, or
  • config.yml in the current working directory by default
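The resolution order above can be sketched as a small helper (illustrative only; the function name is an assumption, not the node's actual code):

```typescript
// Illustrative sketch of the config path precedence: --configPath flag,
// then the CONFIG_PATH env var, then the default config.yml.
function resolveConfigPath(
  flagValue: string | undefined,
  env: Record<string, string | undefined>
): string {
  return flagValue ?? env.CONFIG_PATH ?? 'config.yml';
}
```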

ENV variables

All configuration values can be overridden using environment variables, which may be useful when running the youtube-synch node as a Docker service.

To determine the environment variable name for a config key, for example endpoints.queryNode, use the following formula:

  • convert camelCase field names to SCREAMING_SNAKE_CASE: endpoints.queryNode => ENDPOINTS.QUERY_NODE
  • replace all dots with __: ENDPOINTS.QUERY_NODE => ENDPOINTS__QUERY_NODE
  • add the YT_SYNCH__ prefix: ENDPOINTS__QUERY_NODE => YT_SYNCH__ENDPOINTS__QUERY_NODE
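The formula above can be expressed as a small function (a sketch, not the node's actual implementation):

```typescript
// Derive the env var name for a dotted config key: camelCase segments
// become SCREAMING_SNAKE_CASE, dots become "__", and the YT_SYNCH__
// prefix is prepended.
function toEnvVar(configKey: string): string {
  return (
    'YT_SYNCH__' +
    configKey
      .split('.')
      // insert "_" at lower->upper boundaries, then uppercase
      .map((segment) => segment.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toUpperCase())
      .join('__')
  );
}
```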

In case of arrays or oneOf objects (i.e. keys), the values must be provided as a JSON string, for example YT_SYNCH__JOYSTREAM__CHANNEL_COLLABORATOR__ACCOUNT='[{"mnemonic":"escape naive annual throw tragic achieve grunt verify cram note harvest problem"}]'.

In order to unset a value you can use one of the following strings as the env variable value: "off", "null", "undefined", for example: YT_SYNCH__LOGS__FILE="off".
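A sketch of how an override value might be interpreted, assuming the sentinel and JSON rules above (the function name and exact behaviour are illustrative, not the node's actual parser):

```typescript
// Interpret one env override value: the sentinels "off"/"null"/"undefined"
// unset the key; values that look like JSON arrays/objects are parsed;
// everything else is kept as a plain string.
function parseEnvOverride(raw: string): unknown {
  if (raw === 'off' || raw === 'null' || raw === 'undefined') return undefined;
  if (raw.startsWith('[') || raw.startsWith('{')) return JSON.parse(raw);
  return raw;
}
```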

For more environment variable examples see the configuration in docker-compose.yml.

Setting Up DynamoDB

The Youtube-synch service uses DynamoDB to persist the state of all the channels being synced & their videos. The Youtube-synch node works with both a local instance of DynamoDB and a cloud-based AWS instance. To run a local instance of DynamoDB (useful for testing & development purposes), follow the steps below:

Local DynamoDB

  • Run npm run dynamodb:start to start the local instance of DynamoDB.
  • To make the node use the local instance, set the following environment variable:
    • YT_SYNCH__AWS__ENDPOINT to http://localhost:4566

AWS DynamoDB

To use AWS DynamoDB, generate AWS credentials (Access Key & Secret Key) from the AWS Console for a user that has access to the DynamoDB table.

Next, the credentials can be provided either in the ~/.aws/credentials file, as environment variables, or in the config file.

  • To configure the credentials in the ~/.aws/credentials file, use the aws configure CLI command
  • To configure the credentials as environment variables, set the following:
    • YT_SYNCH__AWS__CREDENTIALS__ACCESS_KEY_ID
    • YT_SYNCH__AWS__CREDENTIALS__SECRET_ACCESS_KEY
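A minimal sketch of the environment-variable option, assuming credentials are only taken from the environment when both variables are set (otherwise the AWS SDK's default chain, e.g. ~/.aws/credentials, would apply). Everything besides the env var names is hypothetical:

```typescript
interface AwsCredentials {
  accessKeyId: string;
  secretAccessKey: string;
}

// Prefer the YT_SYNCH__AWS__CREDENTIALS__* env vars; return undefined to
// fall back to the AWS SDK's default credential chain.
function credentialsFromEnv(env: Record<string, string | undefined>): AwsCredentials | undefined {
  const accessKeyId = env['YT_SYNCH__AWS__CREDENTIALS__ACCESS_KEY_ID'];
  const secretAccessKey = env['YT_SYNCH__AWS__CREDENTIALS__SECRET_ACCESS_KEY'];
  // Only use env credentials when both halves are present.
  return accessKeyId && secretAccessKey ? { accessKeyId, secretAccessKey } : undefined;
}
```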

Running the node

The Youtube-synch service can be run as a Node.js program or as a Docker container. The service depends on the configuration described above, so please make sure to configure the env vars/config file before running the node.

To run the Youtube-synch service as a Node.js program, run npm start.

To run the Youtube-synch service as a Docker container, run docker-compose up -d at the root of the project. This will start the service in the background.

Unauthorized replication/syncing of a YouTube channel's videos on Joystream

There is a CLI command for unauthorized replication/syncing of a YouTube channel's videos on Joystream. For more information see sync:addUnauthorizedChannelForSyncing.

Also, if you want to sync multiple unauthorized channels, you can use the sync:syncMultipleUnauthorizedChannels command.

Elasticsearch Alerting & Monitoring

The YT-Synch service logs can be sent to an Elasticsearch instance, which can then be used to create alerting & monitoring rules based on defined criteria. There is a script designed to automate the creation of Kibana Alert Rules and Action Connectors for monitoring the YouTube Synchronization Service. The script creates an alert rule that queries Elasticsearch for any errors occurring in the service within a configurable past time window (in minutes). If any errors are found, the alert triggers email and Discord notifications to inform recipients of the issue. The script also creates the necessary connectors for sending these notifications.

Environment Variables

The script uses the following environment variables to configure the alert rule and connectors:

  • KIBANA_URL: The URL of the Kibana instance (default: http://localhost:5601).
  • ELASTIC_USERNAME: The username for accessing Kibana (default: elastic).
  • ELASTIC_PASSWORD: The password for accessing Kibana.
  • EMAIL_RECIPIENTS: A comma-separated list of email addresses to receive email notifications.
  • DISCORD_WEBHOOK_URL: The webhook URL for sending notifications to Discord.
  • THRESHOLD: The threshold for triggering the alert (default: 10).
  • EMAIL_CONNECTOR_NAME: The name of the email connector to be created in Kibana (default: Elastic-Cloud-SMTP).
  • WEBHOOK_CONNECTOR_NAME: The name of the webhook connector to be created in Kibana (default: Discord Webhook).
  • ALERT_RULE_NAME: The name of the alert rule to be created in Kibana (default: YT-Sync-Service-Alert).

How to Run the Script

  1. Ensure that you have the required tools installed on your system: bash, curl, and jq.
  2. Set the necessary environment variables in your shell or export them in a .env file.
  3. Make the script executable by running chmod +x scripts/create-elasticsearch-alert.sh.
  4. Run the script with ./scripts/create-elasticsearch-alert.sh.
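The documented defaults could be applied like this (a sketch only; the real script is a bash script, and this TypeScript mirror of its defaults is purely illustrative):

```typescript
// Apply the defaults from the environment-variable table above.
function alertConfig(env: Record<string, string | undefined>) {
  return {
    kibanaUrl: env.KIBANA_URL ?? 'http://localhost:5601',
    elasticUsername: env.ELASTIC_USERNAME ?? 'elastic',
    threshold: Number(env.THRESHOLD ?? '10'),
    emailConnectorName: env.EMAIL_CONNECTOR_NAME ?? 'Elastic-Cloud-SMTP',
    webhookConnectorName: env.WEBHOOK_CONNECTOR_NAME ?? 'Discord Webhook',
    alertRuleName: env.ALERT_RULE_NAME ?? 'YT-Sync-Service-Alert',
  };
}
```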

Upon successful execution, the script will create an alert rule and the required connectors in Kibana. If any errors are encountered, the script will display an error message with details on the issue.

youtube-synch's People

Contributors

badabum, bedeho, dzhidex, kdembler, wradoslaw, zeeshanakram3


youtube-synch's Issues

Issues with synced videos

  • it seems not all of the synced videos have audio
  • https://github.com/Joystream/marketing/issues/322
  • they don't have a duration set. In QN it's set to just null.
  • I don't know if it's actually fixable, but thumbnails have a weird format with black bars on the top and bottom (screenshot omitted); they are more square-like. This is how it looks in Atlas (screenshot omitted).
  • they don't have mediaMetadata set correctly. pixelHeight, pixelWidth, is not set. Also mimeMediaType in encoding is null. Although, I'm not sure if we actually need all these values, and if we can live without these properties.

Questions:

  • Should we fill the publishedBeforeJoystream field in this case? They were created on youtube first, so I guess we could add the proper date to QN as well
  • There are a couple of properties that are just null, and I'm wondering if we should keep them like that in Atlas. Those are license, language, isExplicit, hasMarketing.

Effectively choose content for automated syncing

Background

#24 takes a deep dive into youtube-sync infrastructure scaling & latency issues. We don't want to overburden the blockchain infrastructure itself to such an extent that there is a degraded experience for real user interactions, e.g., metaprotocol actions, forums, and other DAO-related activities (council, proposals, etc.).
At the scale of syncing the videos of thousands to hundreds of thousands of channels, the synch infrastructure would have to employ clever tricks so that 1) aggressive syncing doesn't overwhelm the blockchain itself and 2) the syncing process accommodates the maximum number of creators. Hence, we need to prioritize the video syncing of each channel based on some filtration criteria (described below), so that the syncing infrastructure has maximum reach in terms of the number of channels being actively synched.

Proposal

  • Prioritize all new channels that opted in for syncing, so that every channel has at least one video synced
  • We should probably not favour syncing only a single channel's videos
  • At the scale of thousands of channels we should selectively choose which videos of a given channel to sync based on the video's Engagement Level. We can define video engagement level based on different criteria, e.g. video likes/view count, and when it was uploaded (we should probably prioritize new uploads over old ones). Including video duration in the selection criteria (prioritizing short videos) would also serve the sync infrastructure well

Solution

Manage multiple queues, each with a priority relative to the others, for uploading videos to the Joystream network; based on the priority of a video we can place it in the respective queue. Next there are two options:

  • A queue with higher priority should be completely processed before we move to a lower-priority queue
  • Or we can have different processing rates for different queues, e.g. a queue with higher priority could have 2x the processing rate of a lower-priority queue

We should let creators know that we are tracking their videos in the system, and that it will probably take an estimated amount of time for a specific video to be uploaded to Joystream. Basically, for every video in the system there would be an expected upload timestamp calculated based on the video's position in the queue and its duration.
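The second option and the upload-time estimate could be sketched as follows (all names and the simple rate model are hypothetical, not a committed design):

```typescript
// Drain a high-priority queue at `ratio` times the rate of the
// low-priority queue (the "2x processing rate" option above).
function drainWeighted<T>(high: T[], low: T[], ratio = 2): T[] {
  const order: T[] = [];
  while (high.length > 0 || low.length > 0) {
    // Take up to `ratio` items from the high-priority queue...
    for (let i = 0; i < ratio; i++) {
      const v = high.shift();
      if (v === undefined) break;
      order.push(v);
    }
    // ...then one item from the low-priority queue.
    const w = low.shift();
    if (w !== undefined) order.push(w);
  }
  return order;
}

// With a known processing rate, a video's queue position yields the
// expected upload timestamp mentioned above.
function expectedUploadTime(position: number, videosPerHour: number, now = Date.now()): Date {
  return new Date(now + (position / videosPerHour) * 3600_000);
}
```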

Question: Multiple YT-synch services?

Question

Should there be one or multiple YT-synch services?

One

There could be one service, that is one backend, auth and replication service, which can be used from multiple applications simultaneously. We could even make sure to have proper credit assignment by allowing capture of the member representing the app when adding a creator to the backend, which can be used for other reward payments later.

Of course, nothing actually prevents someone from running a new synching service; however, unless the tooling is properly set up for that, it is unlikely to work that well, and thus be attempted in the near term. For example, as a creator tries to use more than one app, each synching service would be unaware that the same channel is already created, or worse yet, they may start replicating videos into the same channel. This does not need to be malicious on the part of any app or creator. A malicious creator could attempt to do so in order to get duplicate rewards, depending on how rigorous the verification is at any given time, which depends on how well verification scales.

Another benefit of one synching service is that it will likely give creators a better experience, as resources for operating that service properly in terms of support, cost, abuse detection/prevention, reliability and communication with creators will be better than lots of distinct services with limited resources and expertise. It will also be easier to apply fixes and migrations without having to coordinate. The proper use of category mapping from YouTube to Joystream, which is still an open question (#38), will also be easier to iterate on. Lastly, since most of the early gateways will be operated by workers of the gateway working group, they will all be in proximity with the lead and the rest of the DAO, hence there is just less of a need for someone to run their own distinct operation.

Multiple

The main benefit I see here is that new operators don't need to coordinate with other participants in any way at all, and it is more permissionless. There is also perhaps some extra level of fault tolerance.

Proposed general architecture

The diagram here shows general system architecture based on AWS services.

Youtube Downloader (2)

  1. Create a database of users and channels to be monitored for downloads. This database would be structured to allow any number of users to register any number of YouTube channels.

  2. Create an AWS Lambda function that would execute at predefined time intervals (e.g. 5 min). This Lambda function would retrieve a list of available videos per channel and save the list in the DB. Every new video discovered by this Lambda would be recorded in the "Inbound" database for processing/download.

  3. Create a Node.js-based video downloader and a Docker image that will be executed for each individual download. Downloaders would be orchestrated by AWS container services and Kubernetes, which would allow full scalability. The video downloader will be responsible for the following:
    - Download video from youtube
    - Upload video to dedicated AWS S3 bucket
    - Record reference to S3 file in User/Channel Database
    - Mark downloaded video in Inbound DB
    Once the download process for an individual video is completed, the docker container will be removed.

  4. Initiator Lambda would, similar to Youtube Monitor Lambda, run in regular time intervals and initiate the download of the pre-configured number of videos by spinning download containers.

This architecture allows controlled scalability and cost management because setting how often videos are downloaded and the number of downloads per interval gives us full control.

Once again, I would like to remind you to verify the terms of use for Youtube and confirm that we can download a large number of videos without triggering YouTube security which might cause them to block our IP for accessing their services.

Requirements

Background

For creators who already operate a channel on YouTube, we would like to lower the effort required to get started with a new channel on Joystream, and also to maintain that channel. This initiative is described in detail here:

https://github.com/Joystream/youtube-synch/issues/2

Proposal

We should offer a service where existing YouTube creators can automatically get their content on YouTube copied over to Joystream on an ongoing basis.

Requirements

  • A web-based self-service signup process for the channel owner. The channel owner should be able to prove to our service that they own a given channel using suitable YouTube/Google APIs. It should only require that the creator has already created a channel on Joystream.
    Likewise, it should be possible for the creator to use the web-based experience to pause continued synchronisation, or remove their participation from the service entirely. The latter does not imply deleting their content or channel from Joystream.

  • When a channel owner signs up, the system should initially synch over all videos existing on their YT channel, and then monitor the channel for new uploads on an ongoing basis. Syncing over videos requires paying transaction fees in the Joystream system and dealing with possible failures when uploading to Joystream. Lastly, since each Joystream channel has a limit on how much storage space can be used, the service must detect this and pause synching until space is available for the channel.

  • The system must have an architecture where it is easy to scale up horizontally to handle on the order of 10 million channels, each with an average of 1 new video per week and a back catalogue of 20 videos on average. The scaling should be possible in production, without downtime.

  • The system should operate on commercial cloud infrastructure like AWS.

  • There must be a public API which reveals all relevant state about the system, including

    • channels subscribed
    • key metrics for ongoing activities
    • for each channel it should list all past replications, any ongoing video replications and their status.
  • There must be a way for an operator to view the state of the service and issue commands.

  • The system must have logging of all success and failure events in an archival system for future debugging, inspection and analysis.

  • Must gracefully recover from faults.

Resources

Looking at the source, the LBRY service probably has lots of useful lessons, and perhaps reusing parts of it may be feasible:

https://github.com/lbryio/ytsync

YPP Status Suspended

Upon manual update of status in Airtable - trigger suspended email and send status to Atlas

Scope

⚠️ draft: (@zeeshanakram3 will implement the final schema and share the endpoint details in a comment to the PR) - Create an endpoint to receive a JSON file along the lines of:

Suggested basic:
channel_ID
suspended = true
reason

Endpoint to return all synced channels for given member id

Context:
Let’s say we have a member who has 4 Atlas channels. Two of these channels are already synced with YouTube. In the design, once a user clicks the sign-up button, we should show a dialog with channels that aren’t synced with YouTube. Here is the flow: https://www.figma.com/file/oQqFqdAiPu16eeE2aA5AD5/YouTube-Partner-Program?node-id=1637%3A121102&t=jd7OSusYewdF0XSm-4 (read Adam’s description on the left)

Problem:
The problem is that I need to send 4 requests (GET /channels/id), one per channel, to get this data and check if each channel already exists in the DB (synced with YouTube).

Suggestion:
Could we improve this, and add a query that will return a list of channels that are already synced with youtube for a given member id?

Channel Deleted, Collaborator keys removed cases

If channel gets deleted:

  • opt out from both autosync and ypp programme

If channel owner removes the collaborator keys from the channel (via polkadot.js app)

  • Opt out from auto-sync, but remain in YPP

Unhandled exception when trying to register user

Getting this response when trying to create the user via /users POST:

{"statusCode":400,"message":"Cannot read properties of undefined (reading 'eq') is invalid for the query operation.","error":"Bad Request"}

Spike: YPP backend & architecture

Context

The YPP partner programme, in the absence of automation, is suggested to be based on Airtable as a backend/CRM. ⚠️ Importantly, some progress was made on the YouTube auto-sync programme with regard to the backend and underlying services: #22

Depending on the status of issue 22, we may end up with re-using some parts of it in pursuit of the end goal - to automate the sync.

If the progress is not sufficient for parts of it to be reusable, then we would need a service that can speak to YouTube, gets triggered by Atlas, reads from Airtable and exposes info from Airtable back to Atlas.

Process flow diagram: https://miro.com/app/board/uXjVOn84FzE=/

In brief the process from the creator and JSG perspective is as follows:

Users arrive to the programme landing page hosted by Atlas, and start the flow there.

  1. Create joystream channel
  2. Auth with Google wrt to channel ownership (get this confirmed)
  3. Sign T&Cs
  4. Provide email for notifications

There's a tiered structure, so a tier gets assigned to them based on how many followers they have on YouTube. ❓ It's an open question whether that can be done automatically based on the metadata provided by the YT API about their channel details.

  5. The status of their participation in the programme gets changed to Authorised (record on Airtable + tier assigned: three tiers of $JOY amount based on range of followers, defined as outcome of YouTube Auth.)

(screenshot omitted)

  6. That status is displayed by Atlas to channel owners

    Outside of systems (manual ops):
    Some actions happen on the user side (they post videos etc.)
    Some scoring happens by the relevant working group and is added to Notion/Airtable

  7. Payout happens and details are added to the Notion table and Airtable (tbc)
  8. Payout notifications are sent to user emails

Additional Info

Related tickets:

Scope

  • Take a view on the Miro board AND this ticket

  • #22

  • Finalise the approach for YPP backend, create a service that would be triggered by Atlas, can talk to Youtube API for Auth, writes data to Airtable and exposes this data from Airtable to Atlas via simple API that Atlas can easily work with


Youtube Downloader - Progress Tracking

Architecture documentation

  • Create an overall systems architecture specs
  • Confirm architecture

AWS Environment

  • Get access to AWS
    Done partially as not all permissions are given
  • Setup staging DB
  • Setup staging Lambdas
  • Setup staging Docker Orchestration

Database

  • Create local DB structure
  • Create staging DB structure
  • Create DB API service
  • Convert API service to AWS Lambda

Monitor service

  • Setup ORM and DB API
  • Service authentication and security
  • Youtube API wrapper
  • Monitor service implementation

Initiator service

  • Setup ORM and DB API
  • Service authentication and security
  • Docker Orchestration API wrapper
  • Initiator service implementation

Downloader service

  • Setup ORM and DB API
  • Service authentication and security
  • Downloader service implementation
  • Deploy the service

Infrastructure

  • Setup logging
  • Setup CI/CD deployment pipeline
  • Write Unit tests
  • Write Integration tests
  • Documentation of the system

Integration to other systems

TBD

Polling service to check channel subscribers

Context

Tiers of the channels subscribed to YPP programme may update and this needs to be fed to the App and used for rewards calculation

Scope

  • Add polling service for the YPP BE app to update channel subscribers x times per day. Start with once in 24hrs

Syncing YT channel without doing authentication

Sync real Youtube channels without doing their authentication in the YPP backend to test the complete syncing setup. Compile the list of YT channels and their videos (i.e., in JSON format). Write a script to update the channels and videos Dynamodb tables with the compiled list, and then start syncing the videos.


Investigate: "This is a private video. Please sign in to verify that you may see it" bug

While syncing a newly created video on YT, the syncing Lambda function logged the following error:

{
    "errorType": "Error",
    "errorMessage": "This is a private video. Please sign in to verify that you may see it.",
    "stack": [
        "Error: This is a private video. Please sign in to verify that you may see it.",
        "    at privateVideoError (/var/task/webpack:/youtube-sync/node_modules/ytdl-core/lib/info.js:109:12)",
        "    at validate (/var/task/webpack:/youtube-sync/node_modules/ytdl-core/lib/info.js:61:22)",
        "    at pipeline (/var/task/webpack:/youtube-sync/node_modules/ytdl-core/lib/info.js:184:11)",
        "    at process.info (internal/process/task_queues.js:95:5)",
        "    at exports.getBasicInfo (/var/task/webpack:/youtube-sync/node_modules/ytdl-core/lib/info.js:69:7)"
    ]
}

YouTube API service wrapper

@DzhideX I've pushed a ytmonitor lambda. You might need to run npm install -g serverless first. To test the lambda just run npm run local-monitor. It will invoke the lambda locally in your environment.

serverless.ts has environment variables that you might need to change.

I bootstrapped the API wrapper for YouTube API. Tasks for you:

  1. Add authenticated request for the API. - Get Authorization token and add it to the request.

Joystream Network Integration Notes

Background

Substantial progress has been made on the part of the system which is responsible for:

  1. Accepting & authenticating new claimed YT users.
  2. Using the API token of a new user to index all of their channels and videos
  3. Downloading the actual video asset to an S3 placeholder bucket.
  4. Continuously doing steps 2+3 by polling YT for any changed state.
  5. Exposing a basic API for inspecting the state of the system and interacting with it.

The best summary of the architecture of this system can be found here

https://github.com/Joystream/YT-synch/issues/22#issuecomment-1012282680

It has become clear that the integration with Joystream may in fact impact the required architecture in non-trivial ways, as the blockchain and the storage infrastructure will be very plausible sources of both latency and failures. For this reason, the purpose of this issue is to provide some high-level perspective and tips on what to keep in mind as we proceed.

Integration

Blockchain Integration

Terminology

  • Extrinsic: The Substrate name for a transaction, namely a state transition message in the ledger
  • Atlas: A video consumption, publishing and monetization webapp built for the Joystream network. Repo here: https://github.com/Joystream/atlas
  • Full Node: A node that downloads and validates all blocks and transactions in the ledger.
  • Finalized Block: A block which cannot be removed from the history of the chain.
  • Batched Extrinsic calls: A way to do an atomic call to a sequence of extrinsics, with corresponding parameters, where either all succeed or all fail. It is enabled by the runtime pallet utility, as seen here: https://paritytech.github.io/substrate/master/pallet_utility/pallet/enum.Call.html#variant.batch_all.

Joystream SDK

Unfortunately this does not exist yet, hence the best way to understand how to interact with the network in the way relevant to the synch infrastructure is to check out the Atlas code base, which does everything from creating memberships, channels and videos to uploading assets to the storage system and reading state via the query node, in TypeScript.

Reading

While it is feasible to read the state of the blockchain from the RPC interface of a full node, both in terms of its current actual storage state, content of blocks and extrinsics, and also recent events like block finalization and transaction finalization, the most practical way to read the state is through the query node, as described here

Joystream/atlas#1577

The most up-to-date version of the schema the query node exposes can be seen here:

https://hydra.joystream.org/graphql

For example the memberships query allows you to query all current on-chain memberships.

Writing

Writing to the state of the blockchain is done through extrinsics. Like all account based blockchains, successfully invoking an extrinsic is always done from an account, and it requires:

  1. The account having sufficient funds to pay for the invocation: be aware that different extrinsics have different fees, and the amount of fees may even depend on the parameter values to the extrinsic.
  2. That your invocation has a nonce value matching the current on-chain nonce for the account (to avoid replay attacks). The current on-chain nonce value can be recovered by asking a full node which is in synch with the chain tip. The on-chain nonce is incremented for each extrinsic invocation that is included in a finalized block.
  3. Having the ability to sign with the private key which corresponds to the account.

In particular, constraint 2 has a decisive impact on how the sync infrastructure should write to the chain at scale. Obviously there is a need to do extrinsics for membership, channel and video creation, possibly more, and the volume may eventually be quite large. The following factors will limit the throughput of such calls:

  • chain throughput: there is one block per 6s, and it only has so much computational weight and physical size.
  • query node latency: after an extrinsic invocation has been included in a block, the block has to be finalized, which can take many seconds, and only then will the query node begin to process the content of the block to update its query state. This means that from sending an extrinsic, and even from it being finalized, there is a non-deterministic amount of time until querying the query node will reflect the new changed state. This constraint has to be kept in mind whenever writing to the chain in a way that you hope to see reflected client side in order to proceed.
  • non-deterministic latency and faulty extrinsic finalization: simply sending off an extrinsic invocation does not provide any strong guarantee that it will actually be included in any given block that will be finalized. Even if your local full node signals accepting it, this provides no perfect guarantee about the full distributed system. The time it may take for this to happen, or even whether it will happen at all, is not predictable. Extrinsics can be dropped for any reason.

In particular the latter factor can be a source of substantial complication when interacting with constraint 2 above. If a large number of extrinsic invocations, from the same source account, are issued all at once, then any failure at any point in the sequence of extrinsics will block all subsequent extrinsics, possibly permanently, and complex logic will be needed to detect and recover from this by retransmitting the extrinsic. This is going to be a problem even if the actual Joystream-related business logic imposes no dependencies across extrinsics; it will be purely due to nonces. Thus, being able to manage nonces and accounts in a way that allows for decent overall throughput, yet is not brittle to non-deterministic errors, will be critical for the success of the system.

The simplest possible approach I can imagine is:

  • Use a dedicated account for simply creating memberships and corresponding channels (see below), and do this as one batched call. However, only send an extrinsic when the prior one has been finalized, and stop if there are no more funds or any other problems. Some sort of queue will probably be needed, and be cautious to make sure that if the synch infrastructure fails, there is some graceful way to continue from a queue that was not empty during the fault.
  • The initial controller and root accounts for a membership (see below) are unique, and thus if credited with some minimal amount of funds, all subsequent publishing of videos under any channel of this member can happen through this unique controller account. This automatically means that nonce management across other channels goes away, and you only need to think about each such channel separately. So you can then use the same approach as in the prior point of publishing one video at a time, waiting for one to be finalized before doing the next. The problem here is that eventually the creator will possibly claim the channel back, in which case all future publishing has to be done through a collaborator member. Here there is then a question of whether one should make one global collaborator across all channels, or make a new one per channel. Making a global one is probably simpler to start with. The benefit of this approach is that, if a large % of channels will be unclaimed a large part of the time, then uploads to these can happen in parallel, so if you are synching thousands of channels, this can really reduce the overall synch latency.

Now, perhaps going even simpler is a good idea, but this is one way to do it.
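The serialized, fail-stop submission described in the first bullet might look roughly like this (a sketch only; the actual submit-and-wait-for-finalization chain interaction is represented by a placeholder function type):

```typescript
// A Tx submits one extrinsic and resolves only once it is finalized.
type Tx = () => Promise<void>;

// Process the queue one extrinsic at a time, waiting for finalization
// before sending the next, so the account nonce only ever advances by
// one in flight. Returns how many extrinsics completed.
async function processSerially(queue: Tx[]): Promise<number> {
  let completed = 0;
  for (const tx of queue) {
    try {
      await tx();
      completed++;
    } catch {
      // Stop on the first failure (e.g. out of funds); the remaining
      // queue should be persisted so a restarted process can resume.
      break;
    }
  }
  return completed;
}
```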

Storage Integration

The most important storage system interaction will be to publish new assets, for example images or video files. This has two separate components assocaited with it

  1. An on-chain extrinsic which describes the nature of the asset you want to publish, for example its size and hash. This information is bundled together with the primary action in question, such as creating a channel (in create_channe) or creating a video (in create_membership).
  2. Uploading the asset to a first representative in the storage system, which will then replicate it to other key storage providers and content delivery nodes (all community operated!). The authentication required here is trivial: you don't even need to authenticate as the member who controls the relevant asset, you only need to be a member (this will be changed). But the infrastructure does validate the content against the hash supplied in step 1.
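A minimal sketch of preparing the step-1 descriptor: the extrinsic needs the asset's byte size and a content hash the storage node can validate against. Note that Joystream's storage system expects a specific multihash encoding for the content id, so the plain sha-256 hex below is only a stand-in to show the shape, and the field names are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

interface AssetDescriptor {
  size: number;          // byte length declared on-chain
  contentHash: string;   // hash the storage node validates the upload against
}

function describeAsset(content: Buffer): AssetDescriptor {
  return {
    size: content.length,
    // sha-256 as a placeholder; the real system uses its own multihash format
    contentHash: createHash("sha256").update(content).digest("hex"),
  };
}
```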

Membership Creation

When a YT user is authenticated and added to the system, a corresponding Joystream membership should be made on the Joystream blockchain, using an extrinsic buy_membership. Memberships are described here: https://joystream.gitbook.io/joystream-handbook/subsystems/membership. Channels, described below, are owned and operated by members.

The membership allows for Joystream-specific values:

  • handle
  • avatar: just a HTTP URL, not hosted on Joystream storage infra natively at this moment.
  • description

Each Joystream membership is identified by a unique u64 integer.
It would be ideal for these to somehow mirror whatever is found on YT; however, this may also constitute a privacy issue for the user (for example if they have fragments of their real name in their Google account name). For this reason, the user should probably be offered the option to override these defaults when signing up. For the time being, membership-level heavy assets, like image avatars, are not uploaded to the Joystream storage infrastructure; only links are used, but this will change in the future. Lastly, handles on the Joystream side have to be unique, hence the system must deal with a possible conflict that can arise if the YT handle or the user-provided handle is already taken.
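One possible way to resolve the handle-uniqueness conflict mentioned above: if the preferred handle (from YT or user input) is taken, append an incrementing numeric suffix until a free one is found. The isTaken predicate would query the chain or query node in practice; here it is abstracted as an assumption.

```typescript
function resolveHandle(preferred: string, isTaken: (h: string) => boolean): string {
  if (!isTaken(preferred)) return preferred;
  // Try preferred-1, preferred-2, ... until an unused handle is found.
  for (let n = 1; ; n++) {
    const candidate = `${preferred}-${n}`;
    if (!isTaken(candidate)) return candidate;
  }
}
```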

Critically, a membership is controlled by two accounts, a root and a controller account, where the former is a recovery account and the latter is used for authentication in all extrinsics associated with the account, such as creating a channel, updating membership metadata, etc. Under the assumption that the YT-synch system will not try to automatically detect new YT channels after signup, it is not needed for the system to know these accounts going forward. However, since we also want the friction for a new user signing up to be minimal, it would be best if the system automatically generated the required keys and stored them until the user was prepared to claim them. This is because in order for the user to fully control their membership, they would need to set up a wallet and their first keys, and this can be difficult. This does however mean that we should at a later time add a way for the user to ask the system to change the accounts on their behalf to something else that they control, but this will require building some new UIs that are not so urgent.

Channel Creation

Unfortunately the documentation for the content system is for an old version https://joystream.gitbook.io/joystream-handbook/subsystems/content-directory

For each YT channel the user both has, and wishes to synch over to Joystream, a corresponding channel must be created on the Joystream blockchain, using an extrinsic create_channel. The blockchain implementation can be seen here:
https://github.com/Joystream/joystream/blob/master/runtime-modules/content/src/lib.rs#L693

Like YT channels, channels on Joystream have a

  • handle: not unique
  • avatar image: must be uploaded to the Joystream storage system
  • cover image: must be uploaded to the Joystream storage system
  • verified status: should be set to Yes.
  • description

Each Joystream channel is identified by a unique u64 integer. It is not possible to publish any videos before the channel for that video has first been established; however, it is possible to create videos before the image assets of the channel have been uploaded. Control over a channel is exercised by two means: either by being the member who owns the channel, or by being one among a listed set of collaborator members on the channel. These collaborator members can be attached during initial on-chain channel creation with create_channel, or they can be attached later. This notion of a collaborator is probably the best way for the synch infrastructure to retain the ability to publish new videos under a channel, as this is something collaborators can do. For this reason, there should probably be one or more designated synch-infra controlled collaborator members, for which the synch system knows the keys, so that it can retain access to publish, but without control over other assets of the channel, like cash or NFTs. This does mean the user can also kick out the synch collaborator, in which case the synch system must be resilient and not fall over; it should perhaps pause, or do something else graceful.

Be aware that a channel can be deleted, either by the owning member in Joystream, or by a certain set of curating actors in the on-chain content index (called curators). In that case, the synch system has to not recreate the channel, and should probably avoid even looking it up in the future on YT.

Note: as I write this, I am not 100% sure if curators can actually do this, and the docs are stale, but the issue still stands.

Video Creation

Unfortunately the documentation for the content system is for an old version https://joystream.gitbook.io/joystream-handbook/subsystems/content-directory

For each YT video the user has under some YT channel they have signed up for synching, a corresponding video must be created on the Joystream blockchain, using an extrinsic create_video, under the corresponding Joystream channel and membership. The blockchain implementation can be seen here:
https://github.com/Joystream/joystream/blob/master/runtime-modules/content/src/lib.rs#L1071

A video on Joystream has

  • a title
  • a cover photo: must be uploaded to the Joystream storage system
  • a video media file: must be uploaded to the Joystream storage system
  • some other generic metadata following a standard.

Each Joystream video is identified by a unique u64 integer. Be aware that a video can be deleted, either by the owning member in Joystream, or by a certain set of curating actors in the on-chain content index (called curators). If a channel owner deletes a video from Joystream which was initially added using the synch infra, then it should not be re-uploaded again, hence the synch system needs to somehow track this. An important failure mode here is that a channel runs out of space: there are limits to how much you can upload under a given channel, and the system has to gracefully stop trying to publish new videos until this is resolved, and probably send some alert to someone.

Note: as I write this, I am not 100% sure if curators can actually do this, and the docs are stale, but the issue still stands.

Securely store user access & refresh tokens

After successful Google authentication, the frontend sends the user authorization code to the backend, which is then used to get access & refresh tokens. Both accessToken & refreshToken are then used for any subsequent retrieval of user/channel state, e.g., polling for new videos.
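The code-for-token exchange described above is a POST of standard OAuth2 form fields to Google's token endpoint, per Google's server-side web app flow documentation. A small helper that only builds the request body (the HTTP call, error handling, and credential storage are omitted from this sketch):

```typescript
function buildTokenExchangeBody(params: {
  code: string;          // authorization code received from the frontend
  clientId: string;
  clientSecret: string;
  redirectUri: string;   // must match the redirect URI used during authorization
}): URLSearchParams {
  return new URLSearchParams({
    code: params.code,
    client_id: params.clientId,
    client_secret: params.clientSecret,
    redirect_uri: params.redirectUri,
    grant_type: "authorization_code",
  });
}
// POSTing this body to https://oauth2.googleapis.com/token returns
// { access_token, refresh_token, expires_in, ... } on success.
```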

Currently, both of these tokens are saved in plaintext in the DB; we need to store them securely. AWS natively supports encrypting DynamoDB tables, so maybe we can look into that.
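Until table-level encryption is settled, one interim option (an assumption, not the current implementation) is to encrypt the tokens at the application layer before writing them to the table. A minimal AES-256-GCM sketch using Node's built-in crypto; key management, e.g. via AWS KMS, is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encryptToken(token: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(token, "utf8"), cipher.final()]);
  // Store iv + auth tag + ciphertext together as one opaque base64 string.
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

function decryptToken(blob: string, key: Buffer): string {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ct = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM authenticates; tampering makes final() throw
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```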

Synch Meeting Agenda

  • multiple resolutions: if we can at least get playback to work, then infra can plug in... conditional on not becoming redundant or in conflict with future adaptive streaming.

  • how are we dealing with category mapping? I believe it should be static, and the creator should provide input in the signup flow, but I don't know where we stand. This would also be a good place for a creator to discover that they may be trying to get into a service which does not match their content; it's not so clear how Gleev, for example, will signal this otherwise. Perhaps a separate screen is needed about this regardless?

  • Continuation of prior point. Does Gleev need some more custom messaging to make sure it's clear what it's about? e.g. updating the YPP page?

  • Continuation of prior point. Should this be configurable so that Vintio and others can change?

  • What is the status of YPP being configurable as on or off in an Atlas instance?

  • How will we deal, policy-wise, with people signing up, say with a cooking channel, even if we add all the messaging described? Obviously they should not receive a payout, but how will they learn about why they are not a fit?

  • It's very unclear to me how YPP should work outside of Gleev+Vinteo: YPP has distinct parts

    1. who pays (until the council does), and what are the terms? Operationally it seems complex for us to start paying for all sorts of other gateways, so probably not?
    2. what creator backend to use: if there are distinct backends, different gateways may end up onboarding the same creator, but perhaps this is totally fine, as the council in the end probably will not reward duplicates?
    3. what synching infra to use: these could all be totally distinct.

Fetch 1080 videos

@dmtrjsg commented on Wed Dec 14 2022

## Context
In some cases, videos with multiple resolution streams are synced without audio.

## Scope

  • If a video has multiple streams, we should stick with 1080p (falling back to 720p if not available) and make sure we pull in the audio track.
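A sketch of that selection rule: prefer a 1080p video stream, fall back to 720p, and if the chosen stream carries no audio, also pick an audio-only stream to merge in. The Stream shape below is an assumption for illustration, not the exact format object returned by any particular download library.

```typescript
interface Stream {
  itag: number;
  height?: number;   // absent for audio-only streams
  hasVideo: boolean;
  hasAudio: boolean;
}

function pickStreams(streams: Stream[]): { video: Stream; audio?: Stream } | null {
  const video =
    streams.find((s) => s.hasVideo && s.height === 1080) ??
    streams.find((s) => s.hasVideo && s.height === 720);
  if (!video) return null;
  // If the chosen video stream is video-only, pair it with a separate audio track.
  const audio = video.hasAudio
    ? undefined
    : streams.find((s) => s.hasAudio && !s.hasVideo);
  return { video, audio };
}
```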


MVP BE: Receive Call from Atlas

Info passed:

Atlas sends a register request to YB with:

  • Google auth code
  • email
  • JS channel ID
  • optional referral ID

Meta for all videos to triage conditions described here:

Video categories for synched videos

  • When a new creator signs up with a channel, it should be possible for that API call to provide a default video category which will be used with all subsequent uploads for videos from that channel. Make sure the category provided actually exists, otherwise fail. The information about this will come from the app, which will have sufficient context to figure this out in some way, such as asking the user.
  • This value should be possible to manually update later by an admin.
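The signup-time check described above can be sketched as a small guard: fail registration if the supplied default category does not exist. The set of known categories would come from the chain or query node in practice; it is abstracted here as an assumption.

```typescript
function validateDefaultCategory(categoryId: string, knownCategories: Set<string>): string {
  if (!knownCategories.has(categoryId)) {
    // Fail the registration call, as described above.
    throw new Error(`Unknown video category: ${categoryId}`);
  }
  return categoryId;
}
```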

Initial thoughts (architecture, implementation,..)

Backend:

I've found a guide on how to run Node.js code on AWS during my research, and I think the diagram found there [picture below] is pretty much exactly along the lines of what I was thinking of (with small changes).

aws-architecture

  • The Node.js app will take care of the functionality for all the different endpoints we may need.
  • We can use DynamoDB to store all the user data [atlas, youtube, videos, etc.]; it should be very fast and reliable, and it also scales horizontally to support tables of virtually any size.
  • To automate the transfer of videos from Youtube to Atlas, we will need to download the videos and reupload them to Atlas. While doing this, we need to keep these videos somewhere, and this is what we will use Amazon S3 for.
  • The one difference from the diagram is that I think we won't need Amazon SNS (a notification service).

Functionality:

  • We will need one endpoint [POST] (the main one) that will take Atlas, Youtube channel data and video data, and from this we can create a user inside the database. Upon doing this, we can start downloading videos from Youtube (from the oldest video, maybe) and moving them to Atlas. I think the state of the synchronisation should be kept in the database so that, in case anything goes wrong, we can start over without any problems.
  • We also need functionality for consistently checking for new videos and making sure that all new videos are uploaded to Atlas as well. I was thinking that doing this on an interval would be best/easiest (whatever the acceptable value is [1h, 2h, 12h, 24h, etc.]), but there is the problem of this quickly ramping up to an insane number of requests that need to be made (one for every user; 10 million users = 10 million requests) per check. (But I'm not sure if this is something that can be avoided anyway.)
  • After that, it should be relatively trivial to add other endpoints [GET] for one to be able to query the system for channel data, video data and other important metrics.
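One way to soften that per-interval request burst (a sketch, not part of any current design): derive a stable per-channel offset within the polling interval from the channel id, so checks are spread evenly over time instead of all firing at the same tick.

```typescript
// Map a channel id to a deterministic offset in [0, intervalMs),
// using a simple string hash. The same channel always gets the same slot.
function pollOffsetMs(channelId: string, intervalMs: number): number {
  let h = 0;
  for (const ch of channelId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % intervalMs;
}
```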

Final notes:

  • With regard to this, I don't think my experience really allows me to try and predict any more specifics than what I have already written. I have a pretty good idea of how it should work generally, and I think I could start coding it up right now.
  • Is downloading videos from youtube allowed? I've found this thread but am still not 100% sure I understood if we can do it even with explicit permission from the user who owns the content.

Frontend:

  • The functionality on the frontend shouldn't be too complex at all and could live anywhere really (Atlas, joystream-org, ..). I think it would realistically make the most sense for the users to have it on Atlas for the sake of continuity of functionality.
  • I am not yet certain what the full extent of the functionality on the app will need to be but the basics should go something like:
    • Form where the user will be able to prove that they indeed own an Atlas and a Youtube channel, along with accepting some TOS and the like.
    • After successfully submitting the data, they will be shown a UI where they will be able to follow the progress of the uploads along with possibly any other related info?
    • Something else?

Final notes:

  • We would preferably want some designs for this.

General steps

I've tried to make a diagram for this, but it was hard not to make it confusing, as one part of this is from the user's perspective and one from the perspective of the underlying system. After fleshing this out further, I think a diagram can be made, but currently what happens from the user's perspective after logging into the form is largely unknown (from my POV). Steps (simplified):

1. Atlas channel owner opens web application
2. User needs to prove they own channels on Youtube and Atlas.
3. Get all necessary data and create user in the database.
4. Start synching process. This means that a video needs to be downloaded (from YT) and uploaded (to Atlas). This should be done one by one, both to save storage space but also due to things like space constraints on Atlas and error logging.
5. The system should every so often (perhaps once a day), go through all users and check if there were any new updates and add any new videos to the system.

Finally: I think it may make the most sense to start from the frontend as that would make it easier for us to currently only implement the most important features in the backend and therefore shorten the time to completion of the MVP.

Create Endpoint to Expose YPP Requirements

Scope

  • Create endpoint to expose minimum requirements to Atlas

Context

As part of the YPP flow the minimum requirements are imposed on qualifying channels with params:

Age (date created)
Number of videos
Number of subscribers

Currently these are hardcoded on the BE side and FE modal:
Screenshot 2022-11-30 at 17.59.34.png

We want to make it parametrised so it can be easily tweaked by the JSG team and external GW operators.

Validation on Verification

Context

Miro Link > LINK

Action: Creator proves to our infrastructure that they own a youtube channel, using Google's authentication service.
Terms:
Joystream channel avatar, cover, title and description are properly set.
YT channel is at least 3 months old.
YT channel has at least 10 videos, all published at least 1 month ago
YT channel has at least 50 followers
Amount: Three tiers of $JOY amount based on range of followers.

Scope

  • conduct validation of the assessment criteria
  • based on the validation criteria described above, assign status: Authorised
  • for authorised channels, add a Tier: from 1 to 3

That's what we save to Airtable:

Referrer Joystream Channel ID (can be empty)
Joystream Channel ID
Youtube Channel Id

Tiers:
50 - 10k => 1
10k - 100k => 2
100k+ => 3
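The terms and tier boundaries above can be sketched as a single verification function. The YtChannelStats shape is an illustrative assumption; the "all published at least 1 month ago" term is folded into the plain video count here for brevity and would need per-video publish dates in practice.

```typescript
interface YtChannelStats {
  ageMonths: number;
  videoCount: number;
  subscribers: number;
}

function verifyAndTier(
  c: YtChannelStats
): { status: "Authorised"; tier: 1 | 2 | 3 } | { status: "Rejected" } {
  // Minimum requirements: >= 3 months old, >= 10 videos, >= 50 followers.
  const passes = c.ageMonths >= 3 && c.videoCount >= 10 && c.subscribers >= 50;
  if (!passes) return { status: "Rejected" };
  // Tier boundaries: 50-10k => 1, 10k-100k => 2, 100k+ => 3.
  const tier = c.subscribers >= 100_000 ? 3 : c.subscribers >= 10_000 ? 2 : 1;
  return { status: "Authorised", tier };
}
```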

Automation MVP scope

MVP requirements:
Prerequisites:

  1. Google app joystream-youtube-sync with configured OAuth2 and ApiKey credentials.
  2. AWS account to be used during testing

MVP scope/features:

  1. Minimal UI allowing users to:
    • [Required] authorize the joystream-youtube-sync google app to access the user's youtube data (channels, playlists, etc.). Needs the following permission: https://www.googleapis.com/auth/youtube.readonly
    • [Optional] For a previously authorized user, to:
      • see the channel(s) the system identified
      • see videos and their state. State can be one of the following:
        • new
        • downloading
        • downloaded
        • uploading to joystream
        • upload to joystream failed
        • uploaded to joystream
      • remove account from the system

The UI mentioned is for testing purposes only, thus styling and responsiveness are not important.

  2. HTTP API supporting (at least) the following operations:

    • GET /users - list all authorized users
    • GET /users/:id - get user by id
    • POST /users - create new user. Request body should contain (other properties might be added during development):
      Note: all properties should be populated from the google authorization response
      {
         email: string,
         name: string, 
         authorizationCode: string #authorizationCode is obtained by the client app after successful google authorization
         googleId: string,
         avatarUrl: url
      }
      
      This operation should also perform the authorizationCode exchange for accessToken+refreshToken. Details: https://developers.google.com/youtube/v3/guides/auth/server-side-web-apps#exchange-authorization-code
    • GET /users/:id/channels - list channels the system identified for the user
    • GET /users/:id/channels/:id/videos - list videos of a particular channel and their states in the system
    • [OPTIONAL] GET /users/:id/videos - list all videos across all of the user's channels (if we will support multiple), and their states in the system
  3. Youtube Sync background service
    Main capabilities:

    • Scan channels for newly added users. *
    • Scan channel videos. *
    • Download video to S3 bucket. Note: this step is optional, as storing the video in S3 does not bring any value to the system except locality of the data (possibly lower latency during upload).
      This is due to the fact that downloading from youtube is not subject to quotas or limits, thus upload to the joystream network can be restarted anytime, given we are able to get the video url from the database.
  4. Properly configured test database for users' data storage (except video files). We will start with AWS RDS (PostgreSQL api).

  5. Configured CI system. Github Actions should be used.

  6. Joystream testnet integration: TBD

Note: Operations marked with * are subject to youtube quota throttling (10k units/day), thus their implementations should handle such cases gracefully.
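One graceful-handling option for the quota-throttled operations marked with * above: have the scan loops consult a simple daily budget before issuing calls, and back off once it is exhausted. The class shape and any per-endpoint unit costs are assumptions for illustration, not actual YouTube API accounting.

```typescript
class QuotaBudget {
  private used = 0;

  constructor(private dailyLimit = 10_000) {} // 10k units/day, per the note above

  // Returns true and records the spend if the call fits in today's budget;
  // returns false if the caller should back off until the daily reset.
  tryConsume(units: number): boolean {
    if (this.used + units > this.dailyLimit) return false;
    this.used += units;
    return true;
  }

  resetDaily(): void {
    this.used = 0; // call at the quota reset boundary
  }
}
```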

Payments Table

With every new payment to each creator, a new record would be added to the payments table, for use by the Retool visualisation.

Fields:

Joystream ChannelID
Channel Reward Account
Youtube Channel Title
Youtube Channel ID
Block Executed (timestamp)
Action
Amount
Rationale/ Reason (optional)

Cooperative synching

Background

Right now, we presume that each operator will run their own synching infrastructure, meaning they have their own database of which YouTube channels have already been synched. This means that if a creator signs up to gateway A, and then later gateway B, then gateway B's infra may not detect this and start creating duplicate channels and content. This can also happen despite the creator not being malicious, just confused or solicited multiple times by distinct parties.

Proposal

Add some simple signal to the channel representation in the QN which holds an identifier of the YT channel being synched. That way gateway B can verify that it's not worth doing the synch. This does open the door to malicious attackers spamming the chain with fake channels carrying existing channel IDs, but if that happens, there is moderation, and worst case the gateways would then have to ignore this flag, at least for some period of time.
