tapedrive's Introduction

Tape Drive, a selfhosted Podcast Client and Archiver

Tape Drive is a self-hosted podcast client with an emphasis on long-term storage of episodes. The idea is to organize subscribed podcasts properly on disk, following a robust naming scheme and including the available metadata such as shownotes, episode and season numbering, etc.

In future versions I plan on extending Tape Drive, for example to include a web player for listening to downloaded episodes, hoping to turn it into a simple web-based podcatcher.

Whenever possible, Tape Drive takes a privacy-first approach. For example, this means removing tracking parameters from download URLs. And because many podcasts are starting to embed tracking pixels and pull images from external domains, Tape Drive shows such embeds only after the user explicitly requests them.
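As a minimal sketch of what removing tracking parameters from a download URL can look like, the following uses only the Python standard library. The parameter names and the function name are assumptions chosen for illustration; they are not taken from Tape Drive's code base.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical examples of tracking parameters; the real list is an assumption.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_NAMES
        and not name.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that are not on the list pass through untouched, so a URL that needs a query string to resolve the episode file keeps working.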

Current state of affairs

Tape Drive is built using Django and I'm working on a first stable release. Feel free to check it out but there are no guarantees on it working at all at any time right now.

Features

  • Aesthetically pleasing presentation of podcasts, episodes, and their metadata
  • Fully responsive web-UI with distinctively unexcited behavior (no fancy animations, clean look, etc.)
  • Automatic episode downloads for subscribed podcasts, including downloading the back catalog
  • Storage according to a robust and human-readable naming scheme, including shownotes metadata
  • Manually initiated episode downloads possible
  • Ability to efficiently fetch multi-page feeds

Impressions

Tape Drive login screen

Tape Drive welcoming you

Tape Drive podcast list view

Tape Drive podcast detail view

Prerequisites and setup

Right now, the only fully supported way of deploying Tape Drive is as a Docker container. The image is available through the GitHub Container Registry (ghcr.io), and in the hack/docker/ directory you will find an example docker-compose.yml file for the real out-of-the-box experience. Tape Drive supports PostgreSQL as its database back-end.

Non-dockerized deployment is absolutely possible, as Tape Drive is basically just a Django application.

When applying the initial batch of database migrations (users.0003_create_initial_superuser to be precise), an admin account is created with a random password. That password will be printed to the console log of the migrations run:

Applying users.0003_create_initial_superuser...Creating initial user: admin // pass: <randompass>

You may use those credentials to log in at first, and change the password or create additional users from within Tape Drive.

Tape Drive in a standalone Docker container

Creating a Docker container from the Tape Drive Docker image is a pretty straightforward process. As discussed above, Tape Drive expects you to provide a database connection, formatted as the well-known DATABASE_URL environment variable:

docker create \
  --name=tapedrive \
  -v <path to data>:/data \
  -e DATABASE_URL="postgres://USERNAME:PASSWORD@HOSTNAME:PORT/DATABASE_NAME" \
  -e DJANGO_ALLOWED_HOSTS=127.0.0.1,myfancy.domainname.example \
  -p 8273:8273 \
  ghcr.io/janw/tapedrive

Use the DJANGO_ALLOWED_HOSTS variable to tell Tape Drive which hostnames it should accept requests for (as a comma-separated list). Most likely you want to link the storage path inside the container to a real location on your filesystem. By default, Tape Drive downloads data to /data, hence the above -v mapping.
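To illustrate the shape of these two variables, the following sketch splits a comma-separated host list and breaks a DATABASE_URL into its components using only the standard library. It is not necessarily how Tape Drive reads them internally, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_allowed_hosts(raw: str) -> list[str]:
    """Split a comma-separated host list, ignoring blanks and stray spaces."""
    return [host.strip() for host in raw.split(",") if host.strip()]


def parse_database_url(url: str) -> dict:
    """Break a postgres://USER:PASSWORD@HOST:PORT/NAME URL into its parts."""
    parts = urlsplit(url)
    return {
        "USER": parts.username,
        "PASSWORD": parts.password,  # note: percent-encoding is not decoded here
        "HOST": parts.hostname,
        "PORT": parts.port,  # None when the URL does not specify a port
        "NAME": parts.path.lstrip("/"),
    }
```

Running a value through such a helper before starting the container is a quick way to spot a malformed URL, for example a password containing an unescaped @ character.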

Development setup

Poetry is used for dependency management of Tape Drive. Just clone the repo and set up the virtualenv:

git clone https://github.com/janw/tapedrive.git
cd tapedrive
poetry install

To further simplify running the dev environment, it is advised to add the necessary flags to Python / the virtualenv via a .env file. It will be automatically loaded as part of the Django configuration. You can use it to provide the default development settings, like a DATABASE_URL for your local database instance, or to set DEBUG flags for Django. My .env contains these variables:

ENVIRONMENT=DEVELOPMENT
DJANGO_DEBUG='yes'
DJANGO_TEMPLATE_DEBUG='yes'
DATABASE_URL=postgres://tapedrive:supersecretpassword@localhost/tapedrive

Setting ENVIRONMENT is not strictly necessary, as Tape Drive launches in development mode implicitly when cloned from the Git repository. The same is true for the DEBUG flags, which are enabled by default in the development environment.
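Flags such as DJANGO_DEBUG='yes' arrive as plain strings and have to be turned into booleans somewhere in the settings. The helper below is a minimal sketch of that step. The set of accepted spellings, the quote stripping, and the function name are assumptions rather than Tape Drive's actual behavior.

```python
import os

# Spellings treated as "enabled"; this set is an assumption for illustration.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=os.environ):
    """Read a boolean flag from the environment, falling back to a default."""
    raw = environ.get(name)
    if raw is None:
        return default
    # Tolerate values that kept their quotes, e.g. 'yes' from a .env file.
    return raw.strip().strip("'\"").lower() in TRUTHY
```

Passing default=True for the DEBUG flags would mirror the described behavior of enabling them implicitly in the development environment.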

Todos and Planned Features

Currently the main goal is completing the intended feature set around archiving. This mostly entails:

  • Automated periodic feed updates
  • Automated downloads of newly published episodes, including a Subscribed/Unsubscribed paradigm to include/exclude feeds from the automated downloads
  • Robust storage paradigm of episode files and metadata
  • Full test coverage, and replacing actual feed downloads with mocked/vcr'ed fixtures

Furthermore I am considering implementing some of the following features:

  • Web player for episode playback, including storing the playback state of episodes
  • Smart handling of duplicate downloads if applicable (hashing, filename comparison, etc.)
  • Turn the used utilities for handling podcast feeds into an externally usable library
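For the duplicate handling considered above, content hashing is one of the options mentioned. A minimal sketch, assuming SHA-256 over the file contents and hypothetical function names, could look like this:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large episodes need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (duplicate, original) pairs for files with identical content."""
    seen = {}
    duplicates = []
    for path in paths:
        key = file_digest(path)
        if key in seen:
            duplicates.append((path, seen[key]))
        else:
            seen[key] = path
    return duplicates
```

Hashing only catches byte-identical files. Re-encoded or re-tagged copies of the same episode would need a different strategy, such as the filename comparison mentioned in the list.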

Authors and License

The project is licensed under the Apache License 2.0 - see the LICENSE file for details.

tapedrive's Issues

Look into adding a dashboard showing latest changes

For this I am thinking about an additional section on the "Activity" page that (placed above the act. table) shows for example a graph of downloaded/fetched episodes over time, download sizes, sizes per podcast, etc.

Inspiration came from various dashboard modules for Django et al., and obviously Grafana. The feature should not be as elaborate as the latter. But just a little bit of graphical eye candy would be nice.

Some references

Potential libraries

Extend privacy-first shownotes functionality

This function belongs in the template tags and should be selectable per podcast (if wanted), with three levels:

  1. Load no images at all until clicked
  2. Load first-party images (from the domain of the feed)
  3. Load all images

This means that sanitizing shownotes should leave images untouched (via the EXTENDED_HTML_TAGS/ATTRIBUTES collections), and the template tag for displaying shownotes then replaces the unwanted images with placeholders, depending on the selected level.
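The three levels could be sketched roughly as follows, using the standard library's HTML parser to swap disallowed images for placeholders. This is a framework-free illustration under assumptions: the names ImagePolicy and filter_images, the placeholder markup, and the exact host comparison are all hypothetical, and a real template tag would wrap something like this.

```python
from enum import IntEnum
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImagePolicy(IntEnum):
    NONE = 1         # load no images until clicked
    FIRST_PARTY = 2  # load images hosted on the feed's domain
    ALL = 3          # load all images


class _ImageFilter(HTMLParser):
    def __init__(self, policy, feed_host):
        # Keep character references as written so the markup round-trips.
        super().__init__(convert_charrefs=False)
        self.policy, self.feed_host, self.out = policy, feed_host, []

    def _allowed(self, src):
        if self.policy == ImagePolicy.ALL:
            return True
        if self.policy == ImagePolicy.FIRST_PARTY:
            return urlsplit(src).hostname == self.feed_host
        return False

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src") or ""
        if tag == "img" and not self._allowed(src):
            # Hypothetical placeholder markup; the UI would load data-src on click.
            self.out.append(
                f'<span class="image-placeholder" data-src="{escape(src)}"></span>'
            )
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<img ... />) get no separate end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def filter_images(html, policy, feed_host):
    """Return shownotes HTML with disallowed images replaced by placeholders."""
    parser = _ImageFilter(policy, feed_host)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Comparing exact host names is the strictest reading of "first-party". A feed that serves images from a CDN subdomain would need a looser match.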

Restructure front-end, make use of JS framework

As I'm getting more aware of the limitations of my current level of front-end JS experience, and the way I understood it as a second-class citizen initially, it is time to restructure some of the front-end logic, and implement it using a proper JS framework.

Considering Angular, React, and Vue as the most sensible contestants, so far it looks like Vue is going to make the cut: Vue can easily be plugged into an existing setup, is built with HTML-first templates, and (to my eyes) looks the prettiest, syntax-wise. 🤩

Nail down basic database model

While implementing basic UI paradigms, establishing a solid database model is the main target of the first few weeks.

Global SCSS styling for elements

Currently I use on-element classes to add project-wide button stylings that are different from default-bootstrap styling. This should be done globally via the given SCSS variables, for example for

Buttons

  @include button-size($btn-padding-y, $btn-padding-x, $font-size-base, $btn-line-height, $btn-border-radius);
  @include transition($btn-transition);

Button Color

[Feature]: Automatic Wayback machine

I recently came across your project and had a quick idea that I didn't see listed anywhere. Would it be possible to automatically archive any links in the show notes to the Wayback Machine, and then use the archived link instead of the original, so that you don't end up with broken links in the show notes?

Clarify connection details on first launch

web_1  | 19:06:06 web.1    | [2018-05-29 19:06:06 +0000] [57] [INFO] Listening at: http://0.0.0.0:8273 (57)

0.0.0.0 is confusing, should be 'listening on all interfaces' or smthn.

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Add structures for saving shownotes

An episode's feed entry will contain its content as a list of dicts, for example:

feed['entries'][0]['content'] = [{'type': 'text/plain',
  'language': None,
  'base': 'https://www.relay.fm/rd/feed',
  'value': 'This week, John and Merlin come in hot with an accidental main topic: punctuality.'},
 {'type': 'text/html',
  'language': None,
  'base': 'https://www.relay.fm/rd/feed',
  'value': '<p>This week, J ...'}]

To support this structure and preserve all list entries, the most sensible approach seems to be a separate EpisodeContent model with a ForeignKey to an Episode and related_name='content'. The table will hold all values from the dicts.

For that, on each episode creation from the info dict, new instances of EpisodeContent would have to be created as well.
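The mapping from the content list to one row per entry can be sketched without Django, using a plain dataclass as a stand-in for the proposed EpisodeContent model. The field names follow the dict keys shown earlier. The episode_id field and the build_contents helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeContent:
    """Plain stand-in for the proposed Django model (one row per content dict)."""
    episode_id: int  # would be a ForeignKey to Episode, related_name='content'
    type: str
    value: str
    language: Optional[str] = None
    base: Optional[str] = None


def build_contents(episode_id, entry):
    """Create one EpisodeContent per dict in the entry's 'content' list."""
    return [
        EpisodeContent(
            episode_id=episode_id,
            type=item.get("type", ""),
            value=item.get("value", ""),
            language=item.get("language"),
            base=item.get("base"),
        )
        for item in entry.get("content", [])
    ]
```

In a Django implementation the same loop would typically feed a bulk insert right after the Episode itself has been saved, so that all variants (text/plain, text/html) are preserved.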

Extend onboarding

  • 1. Welcome, select/import feeds
  • 2. Make user define storage backend
  • 3. Let user set crawling/refresh rate
