janw / tapedrive

[WIP] The selfhosted Podcast Archive

License: Apache License 2.0

Python 65.66% HTML 0.23% JavaScript 3.38% Shell 0.12% Dockerfile 0.71% Vue 22.33% Procfile 0.06% SCSS 7.51%

podcast-client python self-hosted django

tapedrive's Introduction

Tape Drive, a self-hosted Podcast Client and Archiver

Tape Drive is a self-hosted podcast client with an emphasis on long-term storage of episodes. The idea is to organize subscribed podcasts properly on disk following a robust naming scheme, and including the available metadata such as shownotes, episode and season numbering, etc.

In future versions I plan on extending Tape Drive, for example to include a web player for listening to downloaded episodes, hoping to turn it into a simple web-based podcatcher.

Whenever possible, Tape Drive takes a privacy-first approach. For example, this means removing tracking parameters from download URLs. And as many podcasts are starting to embed tracking pixels and pull images from external domains, Tape Drive will show such embeds only after the user explicitly requests them.
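Removing tracking parameters from a download URL could look roughly like the sketch below. It is an illustration only, not Tape Drive's actual code: the function name strip_tracking_params and the blocklist (utm_ prefixes, fbclid, gclid) are assumptions, since the source does not say which parameters are stripped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical blocklist; the parameters Tape Drive really removes are not documented here.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_NAMES and not key.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with only the parameters that survived the filter.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

All other parts of the URL (scheme, host, path, remaining parameters) are passed through unchanged, so the download target stays the same.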

Current state of affairs

Tape Drive is built using Django and I'm working on a first stable release. Feel free to check it out but there are no guarantees on it working at all at any time right now.

Features

Aesthetically pleasing presentation of podcasts, episodes, and their metadata
Fully responsive web-UI with distinctively unexcited behavior (no fancy animations, clean look, etc.)
Automatic episode downloads for subscribed podcasts, including downloading the back catalog
Storage according to a robust and human-readable naming scheme, including shownotes metadata
Manually initiated episode downloads possible
Ability to efficiently fetch multi-page feeds

Prerequisites and setup

Right now, the only fully supported way of deploying Tape Drive is via a Docker container. The image is available through the GitHub Container Registry (ghcr.io), and in the hack/docker/ directory you will find an example docker-compose.yml file for the real out-of-the-box experience. Tape Drive supports PostgreSQL as its database back-end.

Non-dockerized deployment is absolutely possible, as Tape Drive is basically just a Django application.

When applying the initial batch of database migrations (users.0003_create_initial_superuser to be precise), an admin account is created with a random password. That password will be printed to the console log of the migrations run:

Applying users.0003_create_initial_superuser...Creating initial user: admin // pass: <randompass>

You may use those credentials to log in at first, and change the password or create additional users from within Tape Drive.

Tape Drive in a standalone Docker container

Creating a Docker container from the Tape Drive Docker image is a pretty straightforward process. As discussed above, Tape Drive expects you to provide a database connection, formatted as the well-known DATABASE_URL environment variable:

docker create \
  --name=tapedrive \
  -v <path to data>:/data \
  -e DATABASE_URL="postgres://USERNAME:PASSWORD@HOSTNAME:PORT/DATABASE_NAME" \
  -e DJANGO_ALLOWED_HOSTS=127.0.0.1,myfancy.domainname.example \
  -p 8273:8273 \
  ghcr.io/janw/tapedrive

Use the DJANGO_ALLOWED_HOSTS variable to tell Tape Drive which hostnames it may be served under (as a comma-separated list). Most likely you want to link the storage path inside the container to a real location on your filesystem. By default, Tape Drive downloads data to /data, hence the above -v mapping.
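For illustration, a comma-separated DJANGO_ALLOWED_HOSTS value could be turned into the list that Django's ALLOWED_HOSTS setting expects as sketched here. The helper name parse_allowed_hosts is hypothetical, and how Tape Drive itself reads the variable is not shown in the source.

```python
import os


def parse_allowed_hosts(raw: str) -> list[str]:
    """Split a comma-separated host list, dropping whitespace and empty entries."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# In a Django settings module this might feed the ALLOWED_HOSTS setting.
ALLOWED_HOSTS = parse_allowed_hosts(os.environ.get("DJANGO_ALLOWED_HOSTS", ""))
```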

Development setup

Poetry is used for dependency management of Tape Drive. Just clone the repo and set up the virtualenv:

git clone https://github.com/janw/tapedrive.git
cd tapedrive
poetry install

To further simplify running the dev environment, it is advised to add the necessary flags to Python / the virtualenv via a .env file. It will be loaded automatically as part of the Django configuration. You can use it to provide the default development settings, like a DATABASE_URL for your local database instance, or to set DEBUG flags for Django. My .env contains these variables:

ENVIRONMENT=DEVELOPMENT
DJANGO_DEBUG='yes'
DJANGO_TEMPLATE_DEBUG='yes'
DATABASE_URL=postgres://tapedrive:supersecretpassword@localhost/tapedrive

Setting ENVIRONMENT is not strictly necessary, as Tape Drive implicitly launches in development mode when cloned from the Git repository. The same is true for the DEBUG flags, which are enabled by default in the development environment.
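How a DATABASE_URL like the one above breaks down into the connection settings Django expects can be sketched as follows. This is a minimal illustration and not Tape Drive's configuration code; parse_database_url is a hypothetical helper, and in practice a dedicated helper package usually takes care of this step.

```python
from urllib.parse import unquote, urlsplit


def parse_database_url(url: str) -> dict:
    """Split a postgres:// URL into a Django-style database settings dict."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"Unsupported database scheme: {parts.scheme!r}")
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),  # empty string means "use the default port"
    }
```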

Todos and Planned Features

Currently the main goal is completing the sought-after feature set around archiving. This mostly entails:

Automated periodic feed updates
Automated downloads of newly published episodes, including a Subscribed/Unsubscribed paradigm to include/exclude feeds from the automated downloads
Robust storage paradigm of episode files and metadata
Full test coverage, and replacing actual feed downloads with mocked/vcr'ed fixtures

Furthermore I am considering implementing some of the following features:

Web player for episode playback, including storing the playback state of episodes
Smart handling of duplicate downloads if applicable (hashing, filename comparison, etc.)
Turn the utilities used for handling podcast feeds into an externally usable library
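As a rough idea of what hash-based duplicate handling could involve, the sketch below computes a SHA-256 digest of a downloaded file and checks it against a set of known digests. The function names and the choice of SHA-256 are assumptions; the feature is only under consideration and no implementation is described in the source.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path: str, known_digests: set) -> bool:
    """Check whether a file's content matches any previously stored digest."""
    return file_digest(path) in known_digests
```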

Authors and License

Jan Willhaus - Initial work
Cassette Icon by JustUI via IconFinder
Login backdrop photo "Grado Headphones SR80e" by Michael Mroczek via Unsplash

The project is licensed under the Apache License 2.0 - see the LICENSE file for details.

tapedrive's Issues

Look into adding a dashboard showing latest changes

For this I am thinking about an additional section on the "Activity" page that (placed above the activity table) shows, for example, a graph of downloaded/fetched episodes over time, download sizes, sizes per podcast, etc.

Inspiration came from various dashboard modules for Django et al., and obviously Grafana. The feature should not be as elaborate as the latter. But just a little bit of graphical eye candy would be nice.

Some references

django-controlcenter module

Potential libraries

Add per_podcast setting for sort order

Api calls to enable/disable subscription

Extend privacy-first shownotes functionality

The function belongs in the template tags, and should be selectable per podcast (if wanted) at one of 3 levels:

1. Load no images at all until clicked
2. Load first-party images (from domain of the feed)
3. Load all images

This means that sanitizing shownotes should leave images untouched (via the EXTENDED_HTML_TAGS/ATTRIBUTES collections), and the template tag for displaying shownotes then turns the unwanted images into placeholders, depending on the selected level.
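The three levels could be sketched like this. It is only an illustration of the idea: the constants, the filter_images name, and the placeholder markup are hypothetical, "first-party" is assumed to mean "same hostname as the feed URL", and a real template tag would be better served by a proper HTML parser than by this regular expression.

```python
import re
from urllib.parse import urlsplit

# Hypothetical names for the three levels described above.
LOAD_NONE, LOAD_FIRST_PARTY, LOAD_ALL = 1, 2, 3

# Simplified matcher for <img ... src="..."> tags with double-quoted src values.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"[^>]*>', re.IGNORECASE)


def filter_images(html: str, level: int, feed_url: str) -> str:
    """Replace images that the chosen level does not allow with placeholders."""
    feed_host = urlsplit(feed_url).hostname

    def replace(match):
        src = match.group(1)
        if level == LOAD_ALL:
            return match.group(0)
        if level == LOAD_FIRST_PARTY and urlsplit(src).hostname == feed_host:
            return match.group(0)
        # Keep the original URL around so the image can be loaded on click.
        return f'<span class="image-placeholder" data-src="{src}"></span>'

    return IMG_TAG.sub(replace, html)
```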

Develop 'Discovery' section

Differentiate view in serial itunes:type

Show current season, “load more” for past ones

Add user model

Make use of feeds' ETag / last-modified

Saves us from unnecessary feed updates / database queries / bandwidth usage

https://pythonhosted.org/feedparser/http-etag.html#using-last-modified-headers-to-reduce-bandwidth
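The underlying HTTP mechanism is a conditional request: the client sends back the validators it stored from the previous fetch, and the server answers 304 Not Modified if nothing changed. A minimal sketch, with hypothetical helper names and independent of any particular HTTP or feed library:

```python
def conditional_headers(etag=None, last_modified=None) -> dict:
    """Build request headers from validators stored after the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def feed_changed(status_code: int) -> bool:
    """A 304 response means the stored copy of the feed is still current."""
    return status_code != 304
```

When feed_changed returns False, parsing the feed and touching the database can be skipped entirely. feedparser, linked above, can send these validators itself when given the stored etag and modified values.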

If parsing shownotes, remove/disable external images/resources

Add django-channels + websocket based live updating UI

https://channels.readthedocs.io/en/latest/index.html

Add custom storage models

Dropbox
Local
FTP
SFTP
...

Allow utilization of itunes:season in path

This should be a setting to create a completely custom path from all possible components (in the style of youtube-dl, but using current .format() notation).
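Such a setting might work along these lines. The component names (podcast, season, episode, title, ext) and both helper functions are hypothetical; the source does not define which components would be available.

```python
from string import Formatter

# Hypothetical set of components a user may reference in the path template.
ALLOWED_FIELDS = {"podcast", "season", "episode", "title", "ext"}


def template_fields(template: str) -> set:
    """Collect the replacement field names used in a .format() template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def build_episode_path(template: str, **components) -> str:
    """Render a storage path, rejecting templates with unknown components."""
    unknown = template_fields(template) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unknown path components: {sorted(unknown)}")
    return template.format(**components)
```

A template such as "{podcast}/S{season:02d}/{title}.{ext}" would then place each episode in a per-season folder, with format specs like :02d available for zero-padding.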

Add site000* migration as dep. to PodcastsSettings

Also: add a custom RunPython step to fully initialize the settings object at migration time.

Restructure front-end, make use of JS framework

As I'm becoming more aware of the limitations of my current level of front-end JS experience, and of how I initially treated it as a second-class citizen, it is time to restructure some of the front-end logic and implement it using a proper JS framework.

Considering Angular, React, and Vue as the most sensible contestants, so far it looks like Vue is going to make the cut: Vue can easily be plugged into an existing setup, is built with HTML-first templates, and, to my eyes, looks the prettiest, syntax-wise. 🤩

Nail down basic database model

While implementing basic UI paradigms, establishing a solid database model is the main target of the first few weeks.

Global SCSS styling for elements

Currently I use on-element classes to add project-wide button stylings that are different from default-bootstrap styling. This should be done globally via the given SCSS variables, for example for

Buttons

  @include button-size($btn-padding-y, $btn-padding-x, $font-size-base, $btn-line-height, $btn-border-radius);
  @include transition($btn-transition);

Button Color

[Feature]: Automatic Wayback machine

I recently came across your project and had a quick idea that I didn't see listed anywhere. Would it be possible to automatically archive any links in the show notes to the Wayback Machine and then use the archived link instead of the original, so that you don't end up with broken links in the show notes?

Still errors when adding podcasts with emoji (?) in description

All things Git

Bring test suite up to snuff

Add options for date in filename segments

Scan storage_dir for existing files

Add “load more” button to episodes list

Deprecations from 1.11

https://docs.djangoproject.com/en/2.0/releases/1.11/

Save covers in storage_path

Clarify connection details on first launch

web_1  | 19:06:06 web.1    | [2018-05-29 19:06:06 +0000] [57] [INFO] Listening at: http://0.0.0.0:8273 (57)

0.0.0.0 is confusing, should be 'listening on all interfaces' or smthn.

Allow for manual editing of podcast details

For example:

image upload
Fix summary, description

Add webplayer functionality

https://github.com/redxtech/vue-plyr

Implement regular management job for crawling feeds

Use cron and python manage.py crawl_feeds etc.

Correlate summary and description, avoid showing duplicate content in frontend

https://github.com/seatgeek/fuzzywuzzy/blob/master/README.rst
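One way to decide whether summary and description are near-duplicates is a similarity ratio with a cut-off. The sketch below uses difflib from the standard library as a stand-in for the fuzzywuzzy package linked above; the function name and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher


def is_duplicate_text(summary: str, description: str, threshold: float = 0.9) -> bool:
    """Return True if both texts are similar enough to show only one of them."""
    # Normalize whitespace and case so trivial differences do not count.
    a = " ".join(summary.split()).lower()
    b = " ".join(description.split()).lower()
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold
```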

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Add structures for saving shownotes

An episode's feed entry contains its content as a list of dicts, for example:

feed['entries'][0]['content'] = [{'type': 'text/plain',
  'language': None,
  'base': 'https://www.relay.fm/rd/feed',
  'value': 'This week, John and Merlin come in hot with an accidental main topic: punctuality.'},
 {'type': 'text/html',
  'language': None,
  'base': 'https://www.relay.fm/rd/feed',
  'value': '<p>This week, J ...'}]

To support this structure and preserve all list entries, the most sensible approach seems to be a separate EpisodeContent model with a ForeignKey to an Episode and related_name='content'. The table will hold all values from the dicts.

For that, on each episode creation from the info dict, new instances of EpisodeContent would have to be created as well.
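The mapping from feed entry to content rows can be sketched with plain dataclasses as a stand-in for the proposed Django model. The field names mirror the dict keys shown above; episode_id stands in for the ForeignKey, and content_rows is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeContent:
    """Plain-Python stand-in for the proposed model; one row per content dict."""
    episode_id: int  # would be a ForeignKey to Episode in the Django model
    type: str
    language: Optional[str]
    base: str
    value: str


def content_rows(episode_id: int, entry: dict) -> list:
    """Create one EpisodeContent per item in the entry's content list."""
    return [
        EpisodeContent(
            episode_id=episode_id,
            type=item.get("type", ""),
            language=item.get("language"),
            base=item.get("base", ""),
            value=item.get("value", ""),
        )
        for item in entry.get("content", [])
    ]
```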

1. Welcome, select/import feeds
2. Make user define storage backend
3. Let user set crawling/refresh rate