
bundled.media's Introduction

What is bundled.media?

See a demo at bundled.media

Gateway to Christian media

Bundled.media is a software product that aims to become, in time, a gateway / API to the vast landscape of Christian media.

It aggregates and normalizes media metadata. It makes it possible to filter by language code (BCP 47), search term, and media type, and hopefully in the future by category.
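As a rough illustration of the filtering described above, here is a minimal sketch. The `MediaItem` shape, its field names, and the `filterItems` function are assumptions made for this example and are not taken from the bundled.media code base.

```typescript
// Hypothetical shape of one normalized metadata item (field names are assumptions).
interface MediaItem {
  url: string; // link to the source media
  name: string;
  inLanguage: string; // BCP 47 tag, for example "en" or "pt-BR"
  type: "video" | "audio" | "text";
}

// Hypothetical filter options mirroring the three filters named in the text.
interface MediaFilter {
  langCode?: string;
  text?: string;
  type?: MediaItem["type"];
}

function filterItems(items: MediaItem[], filter: MediaFilter): MediaItem[] {
  const needle = filter.text?.toLowerCase();
  const lang = filter.langCode?.toLowerCase();
  return items.filter((item) => {
    const itemLang = item.inLanguage.toLowerCase();
    // "pt" matches "pt" and "pt-BR", but not an unrelated tag such as "pta".
    const langOk = !lang || itemLang === lang || itemLang.startsWith(lang + "-");
    const textOk = !needle || item.name.toLowerCase().includes(needle);
    const typeOk = !filter.type || item.type === filter.type;
    return langOk && textOk && typeOk;
  });
}
```

Every filter is optional, so an empty filter object returns all items.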

The media itself, such as the video or audio files, is not touched. This product is only about metadata, which does contain a link to the source media.

The idea is that each media consumer installs the product for their own usage. With this decentralization, consumers are able to add credentials for protected sources. Also, when usage is heavy, the consumer pays for the computational costs.

Unique identifiers

To allow for a unified global gateway to Christian media, we need unique identifiers. One of the world's standards for this is URIs / URLs. A URI (uniform resource identifier) is very similar to a URL. A URL is also known as a link. Example link: https://example.com/hello-world. The main difference is that a URI does not need to resolve to a resource. Our ideal is to have resolvable identifiers, so URLs are better, but URIs are allowed.

YouTube URLs can be used as unique identifiers. Some media publishers publish the same video to YouTube and also to Vimeo, so how will we deal with that? We could have a file inside this repository containing data stating that one YouTube link is the same as one Vimeo link. RDF technology allows for such use cases. We would need a canonical URL, the primary one. Other URLs can resolve to the same video / media item. This way we would have unique identifiers.

We can have fallbacks to aliases if the YouTube one is not available anymore. Media publishers should take responsibility for keeping their unique identifiers as stable as they can, but sometimes things happen that are hard to prevent or are even out of the control of the media publisher. Hence the need for a way of aliasing.
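The canonical URL and alias idea can be sketched as a simple lookup table. This is only an illustration: the Vimeo URL below is made up, the `aliases` table and `canonicalUrl` function are hypothetical, and the text above suggests the real data might live in an RDF file instead.

```typescript
// Hypothetical alias table: each alias URL points at its canonical URL.
// The Vimeo URL is invented for this example.
const aliases = new Map<string, string>([
  ["https://vimeo.com/123456", "https://www.youtube.com/watch?v=5mZFXfYEYRY"],
]);

// Follow alias links until a URL without an alias entry is reached.
// The `seen` set stops the loop if the table accidentally contains a cycle.
function canonicalUrl(url: string, table: Map<string, string>): string {
  const seen = new Set<string>();
  let current = url;
  while (table.has(current) && !seen.has(current)) {
    seen.add(current);
    current = table.get(current)!;
  }
  return current;
}
```

A URL that has no entry in the table is treated as already canonical and is returned unchanged.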

Usage statistics for media publishers

When media is used, we want to send usage statistics back to the media publisher. One way of implementing this is by having media-publisher-specific functionality when a certain URL is called. This would require media consumers who want to use media offline to call this specific URL when the device is back online. The main target audience for this is Wi-Fi boxes such as ConnectBox and others.

This would require the use of specially crafted URLs. We do not want to make consumers very dependent on bundled.media, so we would prefer a better way. One option would be to have the special URL also contain the source URL.

Example: https://organization-c-bundled.media/open/https://www.youtube.com/watch?v=5mZFXfYEYRY

This would be good when a consumer decides to no longer use bundled.media. It would even allow for storing only the target source URL in the database and prefixing it with the URL of their bundled.media instance when it is called. A similar trick is used for image resizing services such as images.weserve.nl.
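A minimal sketch of this prefixing scheme follows. The function names `toTrackedUrl` and `fromTrackedUrl` are hypothetical; only the `/open/` path and the example URLs come from the text above.

```typescript
// Build a tracked link by prefixing the source URL with the instance's /open/ path.
function toTrackedUrl(instanceOrigin: string, sourceUrl: string): string {
  // Strip trailing slashes so the result never contains "//open/".
  return `${instanceOrigin.replace(/\/+$/, "")}/open/${sourceUrl}`;
}

// Recover the source URL from a tracked link, or null if it is not one.
function fromTrackedUrl(trackedUrl: string): string | null {
  const marker = "/open/";
  const index = trackedUrl.indexOf(marker);
  if (index === -1) return null;
  const source = trackedUrl.slice(index + marker.length);
  try {
    new URL(source); // throws if the remainder is not an absolute URL
    return source;
  } catch {
    return null;
  }
}

const tracked = toTrackedUrl(
  "https://organization-c-bundled.media",
  "https://www.youtube.com/watch?v=5mZFXfYEYRY",
);
```

Because the source URL is embedded as-is rather than percent-encoded, dropping the prefix is enough to get the original link back. One caveat of this choice: a `#fragment` in the source URL would not reach the server, since browsers do not send fragments.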

Integration into existing systems is simple because only this URL prefix needs to be added in the templating layer. When the URL is not known / not claimed by a DataSource, a redirect still happens.

Usage statistics for media consumers

What if consumers could see what media is often used? We want to create a way to have insight into the statistics. One option is an opt-in mode for worldwide media statistics. When that mode is enabled and consumers use the specific URL that first calls this product and then redirects to the source URL, the instance would send statistics to an aggregating place: a time series database that keeps track of usage. This URL is described in 'Usage statistics for media publishers'.

A global taxonomy to categorize media

With unique identifiers in place, we can also create categories for an identifier. Imagine searching through the vast landscape of Christian media with categories. The great thing about the proposed solution is that categorization does not need to happen at the media publisher. Initiatives could be created where a taxonomy is created for the top 600 media items from the Christian media landscape.

Imagine a taxonomy of categories, another taxonomy for keywords, one for target audience, one with ministry categories, or a taxonomy that describes where the audience is in their journey with Christ (see the gray matrix). These taxonomies can all be different initiatives started by different organizations or working groups. At some stage we would only need to support them here in bundled.media.

We hope to bootstrap a taxonomy of categories. We hope to create one that will become the standard taxonomy for Christian media. The scope of this product is not fully described here. We will start with English but hope to translate into many languages. We will begin with a small set of categories and will put effort into creating a logical categorization similar to library systems.

This taxonomy will make it possible for ministries to have systems where they search for media, curate media, apply that media to their website, and have their audience filter media in a very good way.

Notifications for new media

Would it be possible to subscribe to a search query, so that when new content is found you can get a notification? This would be awesome and ministries / media consumers could subscribe to media that would perfectly fit their audience. This might be a sub product that periodically calls bundled.media.


Technical details

Stack for bundled.media core

  • Deno TypeScript
  • Cache on the hard disk (might be using Redis in the future)
  • images.weserve.nl for image thumbnails

Technical ideas

  • Decentralization of instances, centralization of code
  • The source APIs remain the source of truth so we have no database
  • Identifiers that come from the source in the form of URLs
  • Fully protected against supply chain attacks via a run wrapper that only allows whitelisted domains

Installation

  • Install Deno: https://deno.land/
  • Copy .env.default.ts to .env.ts and configure it
  • deno run --allow-run --allow-env --allow-write --allow-read src/App.ts --watch

Development

  • rm -rf public/vendor && deno vendor public/search.ts --output public/vendor

Problems and solutions

A YouTube video is uploaded, used and then taken offline and reuploaded

When using the URL scheme as described in 'Usage statistics for media publishers', a failing URL is noticed by the statistics component. Technically we would be able to create a component where people could subscribe to failing URLs in their DataSource. If this URL is provided not by YouTube itself but by an API of a media publisher, it is even possible to notify the publisher that media consumers are using their broken URL.

The next step would be to add the broken URL to the list of URL aliases in the media.bundled repository (not yet existing). That file would be read each time a URL is used that returns a 404. A challenge there is partitioning. It should be a design requirement that the memory of bundled.media is emptied after each request. We should not have too much memory in use while idle because that would not scale very well. This means that the current solution is only a direction and not yet a full idea for a solution.

The reason for this is: a YouTube URL may be from any media publisher. To make the link between these two we could have a map, but loading maps of hundreds of thousands of links is not good for the memory. A possible solution is to store these on the hard disk in a way that we can easily resolve. For example, we could base64 encode them and see if the file exists.
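The base64 idea could look roughly like the sketch below. It swaps in the URL-safe base64 variant, because plain base64 output can contain `/`, which is not valid inside a file name. The `aliasFileName` and `aliasFilePath` names and the directory layout are assumptions for illustration.

```typescript
// Encode a URL as a URL-safe base64 string that can be used as a file name.
function aliasFileName(url: string): string {
  const bytes = new TextEncoder().encode(url);
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  // Replace "+" and "/" and drop "=" padding so the result is file-name safe.
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Hypothetical layout: one small file per broken URL inside an alias directory.
function aliasFilePath(aliasDir: string, url: string): string {
  return `${aliasDir.replace(/\/+$/, "")}/${aliasFileName(url)}`;
}
```

With this layout an instance would only need to check whether the file at `aliasFilePath(dir, brokenUrl)` exists (for example with `Deno.stat`) and read the replacement URL from it, so no map has to stay in memory between requests. Very long URLs could exceed the file name length limit of the file system, which this sketch does not handle.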

bundled.media's People

Contributors

danielbeeke, emanuelgustafzon


bundled.media's Issues

Ability to click through to the url

I think maybe I don't fully understand the purpose of the code that comes up when you click on a media tile...

Is the idea that you can then use that code to embed/display the content in another application using that code?

Is there a plan to also allow a click through so if a normal user is looking at the media directory they can click through to the url of the media?

I was wondering if the "View" button would take you to the url, but it seemed to be broken/unfinished...

Implement authenticated responses that may protect sources that need credentials

Bundled.media works with APIs that need credentials. To comply with the terms of some APIs, it would be great if we could have authenticated requests on bundled.media instances, which would give the authenticated requester access to more sources.

A source must give a property, like 'needsAuthentication' and then we should filter on that.
This also means we should have credentials for the API consumers to use.

We might consider adding users and a database. But I personally would like to hold off on the database / user system as long as we can.

Would basic auth be good enough?
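To make the proposal concrete, here is a minimal sketch of filtering on a `needsAuthentication` property together with parsing an HTTP Basic `Authorization` header. Only the property name comes from the issue text; `SourceInfo`, `visibleSources`, and `parseBasicAuth` are hypothetical, and the sketch does not settle whether basic auth is sufficient.

```typescript
// Hypothetical minimal description of a source.
interface SourceInfo {
  name: string;
  needsAuthentication?: boolean;
}

// Anonymous requesters only see sources that do not need authentication.
function visibleSources<T extends SourceInfo>(sources: T[], isAuthenticated: boolean): T[] {
  return sources.filter((source) => isAuthenticated || !source.needsAuthentication);
}

// Parse an "Authorization: Basic <base64(user:password)>" header value.
// Returns null when the header is missing or malformed.
function parseBasicAuth(header: string | null): { user: string; password: string } | null {
  const match = /^Basic\s+(\S+)\s*$/i.exec(header ?? "");
  if (!match) return null;
  let decoded: string;
  try {
    decoded = atob(match[1]);
  } catch {
    return null;
  }
  const separator = decoded.indexOf(":");
  if (separator === -1) return null;
  return { user: decoded.slice(0, separator), password: decoded.slice(separator + 1) };
}
```

Checking the parsed credentials against a configured list (for example in `.env.ts`) would avoid the need for a user database, which fits the wish to hold off on one.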

Optimize the order of the requested sources

When streaming data it is most helpful if we somehow can query sources with a high chance of giving data back first. We can create some kind of heuristic to determine this.
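One possible heuristic, sketched under the assumption that each instance keeps simple per-source counters: order sources by the smoothed fraction of past requests that returned at least one result. The `SourceStats` shape and `orderSources` function are hypothetical.

```typescript
// Hypothetical counters kept per source.
interface SourceStats {
  name: string;
  requests: number; // how often the source was queried
  requestsWithResults: number; // how often it returned at least one item
}

// Sort sources so that those most likely to return data come first.
// Adding 1 and 2 (Laplace smoothing) gives unseen sources a neutral score of 0.5.
function orderSources(stats: SourceStats[]): SourceStats[] {
  const score = (s: SourceStats) => (s.requestsWithResults + 1) / (s.requests + 2);
  return [...stats].sort((a, b) => score(b) - score(a));
}
```

The smoothing means a brand-new source is tried before one that has repeatedly returned nothing, but after one with a good track record.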

Do validation of objects with schemaorg-jsd

It might be good to put this under a URL parameter flag so that you can enable it to do debugging.
It might also be fine to just do this on the output and not all normalized items where some of those will be filtered away.

Add branding

We do not have a logo yet. But it would be great to have a logo and branding.

Vimeo DataSource runs endlessly when searching

When searching, the Vimeo DataSource may run endlessly.

I think it has to do with the pagination of the API, and specifically how I go from a zero-based index to a one-based index. Somehow it keeps being stuck at two.

Notification system for updates on search results

What if we could make it so that people can subscribe to a search query?
A user must be able to subscribe and unsubscribe.
I think we could even do this without creating an account. The unsubscribe link would be placed in the email and the subscribe option would be in the UI after you have done a search.

JSON-ld source

We can have a source which takes a schema.org class as one of the constructor options. It also needs a domain name.

The source goes in and uses Comunica to query this source, maybe even with the Comunica link traversal engine.

Make the UI end user focused

At the moment the UI is geared towards technical users.
Flip it around and make it so that it works for everyone.

The data popup can go behind a button with a <> icon.
When clicking on an item we should try to display the media. This will take a bit of code as we do not know what kind of data is in schema:url.

How to tackle unique identifiers?

Unique identifiers would give many possibilities. In an abstract way, they are a way of talking about things while knowing which specific one is meant.

And then:

  • World wide grouping of media with categories and keywords
  • World wide classification of target audience, with for example the gray matrix
  • Usage statistics for ministries (what media is generally popular)

Bug, not all results are shown

Reproduce by going to https://bundled.media/search

Set the pager to 40 and click on next until it's not possible anymore:
(11 * 40) + 31 = 471

Set the pager to 80 and click on next until it's not possible anymore:
7 * 80 + 4 = 564

This means somehow pagination is going wrong.

restrict dropdowns to available languages/types etc.

Just had a look through some of the content, and the UI is really nice and slick 😃

Is it possible or worth restricting dropdowns to only include the available languages/media types etc.?

Loving the centralised media idea...

'Streaming' CSV source

It is probably possible to create a CSV source that consumes very little memory.
It must be a token-based source.

This is the fetch method in pseudo code

  • Open a file handle, then read and parse the first line (CSV headers)
  • Start reading from the line that was given by the previous token or start from the first line if no token is given
  • Return the number of bytes that have been read so that in the next request we can continue where we left

With this kind of mechanism we can have a very low memory consuming CSV source.
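The steps above can be sketched as follows. To stay self-contained, this sketch reads from an in-memory byte array; a real source would instead seek to the offset in an open file handle so that only one page is in memory at a time. The `fetchCsvPage` name and `CsvPage` shape are hypothetical, and the comma splitting assumes no quoted fields.

```typescript
interface CsvPage {
  items: Record<string, string>[];
  nextToken: number | null; // byte offset where the next request should resume
}

function fetchCsvPage(data: Uint8Array, token: number | null, limit: number): CsvPage {
  const decoder = new TextDecoder();
  const NEWLINE = 10; // byte value of "\n"

  // Index just past the end of the line that starts at `start`.
  const lineEnd = (start: number): number => {
    const index = data.indexOf(NEWLINE, start);
    return index === -1 ? data.length : index + 1;
  };
  const readLine = (start: number, end: number): string =>
    decoder.decode(data.subarray(start, end)).replace(/\r?\n$/, "");

  // Step 1: the first line always holds the CSV headers.
  const headerEnd = lineEnd(0);
  const headers = readLine(0, headerEnd).split(",");

  // Step 2: resume at the token, or directly after the headers.
  let offset = token ?? headerEnd;
  const items: Record<string, string>[] = [];
  while (offset < data.length && items.length < limit) {
    const end = lineEnd(offset);
    const line = readLine(offset, end);
    offset = end;
    if (line === "") continue; // skip blank lines
    const cells = line.split(",");
    const item: Record<string, string> = {};
    headers.forEach((header, i) => {
      item[header] = cells[i] ?? "";
    });
    items.push(item);
  }

  // Step 3: hand back the byte offset so the next request can continue there.
  return { items, nextToken: offset < data.length ? offset : null };
}
```

The token is a plain byte offset, so the source keeps no state between requests, in line with the goal of emptying memory after each request.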
