library (media toolkit)

A wise philosopher once told me: "the future is autotainment".

Manage and curate large media libraries. An index for your archive. The primary use is indexing a local filesystem, but it also supports some virtual constructs like tracking online video playlists (e.g. YouTube subscriptions) and scheduling browser tabs.

Install

Linux recommended but Windows setup instructions available.

pip install xklb

Should also work on Mac OS.

External dependencies

Required: ffmpeg

Some features work better with: mpv, fd-find, fish

Getting started

Local media

1. Extract Metadata

For thirty terabytes of video the initial scan takes about four hours to complete. After that, subsequent scans of the path (or any subpaths) are much quicker--only new files will be read by ffprobe.

library fsadd tv.db ./video/folder/
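The incremental behaviour described above can be pictured as a set difference between the paths on disk and the paths already recorded. This is only a sketch of the idea, not xklb's implementation; `find_new_files` and `known_paths` are hypothetical names used for illustration.

```python
import os
import tempfile


def find_new_files(root, known_paths):
    """Return files under root that are not in known_paths, sorted."""
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in known_paths:
                new_files.append(path)
    return sorted(new_files)


# Demo with a throwaway folder: the first scan sees everything,
# the second scan only sees the file added afterwards.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.mkv", "b.mkv"):
        open(os.path.join(root, name), "w").close()
    first_scan = find_new_files(root, known_paths=set())
    known = set(first_scan)

    open(os.path.join(root, "c.mkv"), "w").close()
    second_scan = find_new_files(root, known_paths=known)

first_names = [os.path.basename(p) for p in first_scan]
second_names = [os.path.basename(p) for p in second_scan]
print(first_names)
print(second_names)
```

Only the paths returned by the second call would need to be probed for metadata, which is why rescans are much quicker.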

2. Watch / Listen from local files

library watch tv.db                           # the default post-action is to do nothing
library watch tv.db --post-action delete      # delete file after playing
library listen finalists.db -k ask_keep       # ask whether to keep file after playing

To stop playing press Ctrl+C in either the terminal or mpv

Online media

1. Download Metadata

Download playlist and channel metadata. Break free of the YouTube algo~

library tubeadd educational.db https://www.youtube.com/c/BranchEducation/videos

And you can always add more later--even from different websites.

library tubeadd maker.db https://vimeo.com/terburg

To prevent mistakes the default configuration is to download metadata for only the most recent 20,000 videos per playlist/channel.

library tubeadd maker.db --extractor-config playlistend=1000

Be aware that there are some YouTube Channels which have many items--for example the TEDx channel has about 180,000 videos. Some channels even have upwards of two million videos. More than you could likely watch in one sitting--maybe even one lifetime. On a high-speed connection (>500 Mbps), it can take up to five hours to download the metadata for 180,000 videos.
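As a rough worked example, the figures quoted above (180,000 videos in up to five hours) imply about ten videos per second in the slow case. The sketch below extrapolates linearly from that, which is an assumption; real throughput depends on the site and your connection. `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(n_videos, ref_videos=180_000, ref_hours=5):
    """Linear extrapolation from the reference figures quoted in the text."""
    videos_per_second = ref_videos / (ref_hours * 3600)
    return n_videos / videos_per_second


rate = 180_000 / (5 * 3600)               # videos per second in the slow case
default_limit = estimate_seconds(20_000)  # seconds for the default 20,000 limit
print(rate, default_limit)
```

Under that assumption the default limit of 20,000 videos works out to 2,000 seconds, or a little over half an hour per playlist in the slow case.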

TIP! If you often copy and paste many URLs you can paste line-delimited text as arguments via a subshell. For example, in fish shell with cb:

library tubeadd my.db (cb)

Or in BASH:

library tubeadd my.db $(xclip -selection c -o)

1a. Get new videos for saved playlists

The tube-update subcommand goes through the list of added playlists and fetches metadata for any videos not previously seen.

library tube-update tube.db

2. Watch / Listen from websites

library watch maker.db

To stop playing press Ctrl+C in either the terminal or mpv

List all subcommands
$ library
library (v2.9.014; 82 subcommands)

Create database subcommands:
╭─────────────────┬──────────────────────────────────────────╮
│ fs-add          │ Add local media                          │
├─────────────────┼──────────────────────────────────────────┤
│ tube-add        │ Add online video media (yt-dlp)          │
├─────────────────┼──────────────────────────────────────────┤
│ web-add         │ Add open-directory media                 │
├─────────────────┼──────────────────────────────────────────┤
│ gallery-add     │ Add online gallery media (gallery-dl)    │
├─────────────────┼──────────────────────────────────────────┤
│ tabs-add        │ Create a tabs database; Add URLs         │
├─────────────────┼──────────────────────────────────────────┤
│ links-add       │ Create a link-scraping database          │
├─────────────────┼──────────────────────────────────────────┤
│ site-add        │ Auto-scrape website data to SQLITE       │
├─────────────────┼──────────────────────────────────────────┤
│ reddit-add      │ Create a reddit database; Add subreddits │
├─────────────────┼──────────────────────────────────────────┤
│ hn-add          │ Create / Update a Hacker News database   │
├─────────────────┼──────────────────────────────────────────┤
│ substack        │ Backup substack articles                 │
├─────────────────┼──────────────────────────────────────────┤
│ tildes          │ Backup tildes comments and topics        │
├─────────────────┼──────────────────────────────────────────┤
│ nicotine-import │ Import paths from nicotine+              │
├─────────────────┼──────────────────────────────────────────┤
│ places-import   │ Import places of interest (POIs)         │
├─────────────────┼──────────────────────────────────────────┤
│ row-add         │ Add arbitrary data to SQLITE             │
╰─────────────────┴──────────────────────────────────────────╯

Text subcommands:
╭──────────────────┬──────────────────────────────────────────────╮
│ cluster-sort     │ Sort text and images by similarity           │
├──────────────────┼──────────────────────────────────────────────┤
│ extract-links    │ Extract inner links from lists of web links  │
├──────────────────┼──────────────────────────────────────────────┤
│ extract-text     │ Extract human text from lists of web links   │
├──────────────────┼──────────────────────────────────────────────┤
│ markdown-links   │ Extract titles from lists of web links       │
├──────────────────┼──────────────────────────────────────────────┤
│ nouns            │ Unstructured text -> compound nouns (stdin)  │
├──────────────────┼──────────────────────────────────────────────┤
│ dates            │ Unstructured text -> timestamps, dates, time │
├──────────────────┼──────────────────────────────────────────────┤
│ json-keys-rename │ Rename JSON keys by substring match          │
├──────────────────┼──────────────────────────────────────────────┤
│ combinations     │ Enumerate possible combinations              │
╰──────────────────┴──────────────────────────────────────────────╯

Folder subcommands:
╭─────────────────┬─────────────────────────────────────────────────────────────────────╮
│ merge-mv        │ Move files and merge folders in BSD/rsync style, rename if possible │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ merge-folders   │ Merge two or more file trees, check for conflicts before merging    │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ rel-mv          │ Move files preserving parent folder hierarchy                       │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ mergerfs-cp     │ cp files with reflink on mergerfs                                   │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ scatter         │ Scatter files between folders or disks                              │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ mv-list         │ Find specific folders to move to different disks                    │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ mount-stats     │ Show some relative mount stats                                      │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ big-dirs        │ Show large folders                                                  │
├─────────────────┼─────────────────────────────────────────────────────────────────────┤
│ similar-folders │ Find similar folders based on folder name, size, and count          │
╰─────────────────┴─────────────────────────────────────────────────────────────────────╯

File subcommands:
╭────────────────┬─────────────────────────────────────────────────────╮
│ christen       │ Clean file paths                                    │
├────────────────┼─────────────────────────────────────────────────────┤
│ sample-hash    │ Calculate a hash based on small file segments       │
├────────────────┼─────────────────────────────────────────────────────┤
│ sample-compare │ Compare files using sample-hash and other shortcuts │
├────────────────┼─────────────────────────────────────────────────────┤
│ similar-files  │ Find similar files based on filename and size       │
├────────────────┼─────────────────────────────────────────────────────┤
│ llm-map        │ Run LLMs across multiple files                      │
╰────────────────┴─────────────────────────────────────────────────────╯

Tabular data subcommands:
╭──────────────────┬───────────────────────────────────────────────╮
│ eda              │ Exploratory Data Analysis on table-like files │
├──────────────────┼───────────────────────────────────────────────┤
│ mcda             │ Multi-criteria Ranking for Decision Support   │
├──────────────────┼───────────────────────────────────────────────┤
│ markdown-tables  │ Print markdown tables from table-like files   │
├──────────────────┼───────────────────────────────────────────────┤
│ columns          │ Print columns of table-like files             │
├──────────────────┼───────────────────────────────────────────────┤
│ incremental-diff │ Diff large table-like files in chunks         │
╰──────────────────┴───────────────────────────────────────────────╯

Media File subcommands:
╭────────────────┬────────────────────────────────────────────────────────╮
│ media-check    │ Check video and audio files for corruption via ffmpeg  │
├────────────────┼────────────────────────────────────────────────────────┤
│ process-ffmpeg │ Shrink video/audio to AV1/Opus format (.mkv, .mka)     │
├────────────────┼────────────────────────────────────────────────────────┤
│ process-image  │ Shrink images by resizing and AV1 image format (.avif) │
╰────────────────┴────────────────────────────────────────────────────────╯

Multi-database subcommands:
╭──────────────────┬────────────────────────╮
│ merge-dbs        │ Merge SQLITE databases │
├──────────────────┼────────────────────────┤
│ copy-play-counts │ Copy play history      │
╰──────────────────┴────────────────────────╯

Filesystem Database subcommands:
╭────────────┬──────────────────────────╮
│ disk-usage │ Show disk usage          │
├────────────┼──────────────────────────┤
│ search-db  │ Search a SQLITE database │
╰────────────┴──────────────────────────╯

Media Database subcommands:
╭─────────────────┬─────────────────────────────────────────────────────────────╮
│ block           │ Block a channel                                             │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ playlists       │ List stored playlists                                       │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ download        │ Download media                                              │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ download-status │ Show download status                                        │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ redownload      │ Re-download deleted/lost media                              │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ history         │ Show and manage playback history                            │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ history-add     │ Add history from paths                                      │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ stats           │ Show some event statistics (created, deleted, watched, etc) │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ search          │ Search captions / subtitles                                 │
├─────────────────┼─────────────────────────────────────────────────────────────┤
│ optimize        │ Re-optimize database                                        │
╰─────────────────┴─────────────────────────────────────────────────────────────╯

Playback subcommands:
╭────────────┬────────────────────────────────────────────────────────╮
│ watch      │ Watch / Listen                                         │
├────────────┼────────────────────────────────────────────────────────┤
│ now        │ Show what is currently playing                         │
├────────────┼────────────────────────────────────────────────────────┤
│ next       │ Play next file and optionally delete current file      │
├────────────┼────────────────────────────────────────────────────────┤
│ seek       │ Set playback to a certain time, fast-forward or rewind │
├────────────┼────────────────────────────────────────────────────────┤
│ stop       │ Stop all playback                                      │
├────────────┼────────────────────────────────────────────────────────┤
│ pause      │ Pause all playback                                     │
├────────────┼────────────────────────────────────────────────────────┤
│ tabs-open  │ Open your tabs for the day                             │
├────────────┼────────────────────────────────────────────────────────┤
│ links-open │ Open links from link dbs                               │
├────────────┼────────────────────────────────────────────────────────┤
│ surf       │ Auto-load browser tabs in a streaming way (stdin)      │
╰────────────┴────────────────────────────────────────────────────────╯

Database enrichment subcommands:
╭────────────────────┬────────────────────────────────────────────────────╮
│ dedupe-db          │ Dedupe SQLITE tables                               │
├────────────────────┼────────────────────────────────────────────────────┤
│ dedupe-media       │ Dedupe similar media                               │
├────────────────────┼────────────────────────────────────────────────────┤
│ merge-online-local │ Merge online and local data                        │
├────────────────────┼────────────────────────────────────────────────────┤
│ mpv-watchlater     │ Import mpv watchlater files to history             │
├────────────────────┼────────────────────────────────────────────────────┤
│ reddit-selftext    │ Copy selftext links to media table                 │
├────────────────────┼────────────────────────────────────────────────────┤
│ tabs-shuffle       │ Randomize tabs.db a bit                            │
├────────────────────┼────────────────────────────────────────────────────┤
│ pushshift          │ Convert pushshift data to reddit.db format (stdin) │
╰────────────────────┴────────────────────────────────────────────────────╯

Update database subcommands:
╭────────────────┬─────────────────────────────────╮
│ fs-update      │ Update local media              │
├────────────────┼─────────────────────────────────┤
│ tube-update    │ Update online video media       │
├────────────────┼─────────────────────────────────┤
│ web-update     │ Update open-directory media     │
├────────────────┼─────────────────────────────────┤
│ gallery-update │ Update online gallery media     │
├────────────────┼─────────────────────────────────┤
│ links-update   │ Update a link-scraping database │
├────────────────┼─────────────────────────────────┤
│ reddit-update  │ Update reddit media             │
╰────────────────┴─────────────────────────────────╯

Misc subcommands:
╭────────────────┬─────────────────────────────────────────╮
│ export-text    │ Export HTML files from SQLite databases │
├────────────────┼─────────────────────────────────────────┤
│ dedupe-czkawka │ Process czkawka diff output             │
╰────────────────┴─────────────────────────────────────────╯

Examples

Watch online media on your PC

wget https://github.com/chapmanjacobd/library/raw/main/example_dbs/mealtime.tw.db
library watch mealtime.tw.db --random --duration 30m

Listen to online media on a chromecast group

wget https://github.com/chapmanjacobd/library/raw/main/example_dbs/music.tl.db
library listen music.tl.db -ct "House speakers" --random

Hook into HackerNews

wget https://github.com/chapmanjacobd/hn_mining/raw/main/hackernews_only_direct.tw.db
library watch hackernews_only_direct.tw.db --random --ignore-errors

Organize via separate databases

library fsadd --audio audiobooks.db ./audiobooks/
library fsadd --audio podcasts.db ./podcasts/ ./another/more/secret/podcasts_folder/

# merge later if you want
library merge-dbs --pk path -t playlists,media audiobooks.db podcasts.db both.db

# or split
library merge-dbs --pk path -t playlists,media both.db audiobooks.db -w 'path like "%/audiobooks/%"'
library merge-dbs --pk path -t playlists,media both.db podcasts.db -w 'path like "%/podcasts%"'
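
The `-w` filters above are ordinary SQL `LIKE` patterns. A minimal sketch of how the two patterns behave, using Python's built-in `sqlite3` and a made-up `media` table with only a `path` column (the real schema has more columns, and the sample paths are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (path TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO media (path) VALUES (?)",
    [
        ("/mnt/d/audiobooks/dune.m4b",),
        ("/mnt/d/podcasts/ep01.mp3",),
        ("/mnt/d/podcasts_extra/ep02.mp3",),
    ],
)


def select_paths(pattern):
    """Return the paths matching a LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT path FROM media WHERE path LIKE ? ORDER BY path", (pattern,)
    )
    return [row[0] for row in rows]


audiobooks = select_paths("%/audiobooks/%")
podcasts = select_paths("%/podcasts%")
print(audiobooks)
print(podcasts)
```

Note that `%/podcasts%` has no trailing slash, so it also matches sibling folders whose names merely start with "podcasts", such as the `podcasts_folder` added earlier.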

Guides

Music alarm clock

via termux crontab

Wake up to your own music

30 7 * * * library listen ./audio.db

Wake up to your own music only when you are not home (computer on local IP)

30 7 * * * timeout 0.4 nc -z 192.168.1.12 22 || library listen --random

Wake up to your own music on your Chromecast speaker group only when you are home

30 7 * * * ssh 192.168.1.12 library listen --cast --cast-to "Bedroom pair"
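
The `timeout 0.4 nc -z 192.168.1.12 22` check in the second crontab line is a plain TCP reachability test. A rough Python equivalent is sketched below; `host_is_up` is a hypothetical helper, and the demo uses a temporary local listener instead of a real machine so that it is self-contained.

```python
import socket


def host_is_up(host, port=22, timeout=0.4):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-contained demo against a temporary local listener
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
up_while_listening = host_is_up("127.0.0.1", port)
listener.close()
up_after_close = host_is_up("127.0.0.1", port)
print(up_while_listening, up_after_close)
```

In the crontab line the `||` means playback starts only when the check fails, i.e. when the home computer is not reachable.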

Browser Tabs

Visit websites on a schedule

tabs is a way to organize your visits to URLs that you want to remember every once in a while.

The main benefit of tabs is that you can save a large number of tabs (say 500 monthly tabs) and only the minimum needed to get through them all within the period (500/30) will open each day. 17 tabs per day seems manageable--500 all at once does not.

The use case for tabs is websites that you know are going to change: subreddits, games, or tools that you want to use for a few minutes daily, weekly, monthly, quarterly, or yearly.
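
The 17-tabs-per-day figure is just the saved count divided by the period length, rounded up. A small sketch of that arithmetic follows; the period lengths other than the 30-day month used above are assumptions, and `tabs_per_day` is a hypothetical helper rather than part of the tool.

```python
import math

# Days per period; only the 30-day month comes from the text above
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30, "quarterly": 91, "yearly": 365}


def tabs_per_day(saved_tabs, frequency):
    """Smallest daily batch that covers every saved tab once per period."""
    return math.ceil(saved_tabs / PERIOD_DAYS[frequency])


monthly_batch = tabs_per_day(500, "monthly")
print(monthly_batch)
```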

1. Add your websites

library tabsadd tabs.db --frequency monthly --category fun \
    'https://old.reddit.com/r/Showerthoughts/top/?sort=top&t=month' \
    'https://old.reddit.com/r/RedditDayOf/top/?sort=top&t=month'

2. Add library tabs to cron

library tabs is meant to run once per day. Here is how you would configure it with crontab:

45 9 * * * DISPLAY=:0 library tabs /home/my/tabs.db

Or with systemd:

tee ~/.config/systemd/user/tabs.service
[Unit]
Description=xklb daily browser tabs

[Service]
Type=simple
RemainAfterExit=no
Environment="DISPLAY=:0"
ExecStart=library tabs /home/my/tabs.db

tee ~/.config/systemd/user/tabs.timer
[Unit]
Description=xklb daily browser tabs timer

[Timer]
Persistent=yes
OnCalendar=*-*-* 9:58

[Install]
WantedBy=timers.target

systemctl --user daemon-reload
systemctl --user enable --now tabs.timer

You can also invoke tabs manually:

library tabs tabs.db -L 1  # open one tab

Incremental surfing. 📈🏄 totally rad!

Find large folders

Curate with library big-dirs

If you are looking for candidate folders for curation (i.e. you need space but don't want to buy another hard drive), the big-dirs subcommand was written for that purpose:

$ library big-dirs fs/d.db

You may filter by folder depth (similar to QDirStat or WizTree)

$ library big-dirs --depth=3 audio.db
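
To illustrate what filtering by depth means, here is a sketch that sums file sizes per folder after truncating each parent folder to a given number of path parts. It is not how big-dirs is implemented; `folder_sizes`, the sample paths, and the exact meaning of "depth" here are assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def folder_sizes(files, depth):
    """Sum file sizes per parent folder truncated to `depth` path parts."""
    totals = defaultdict(int)
    for path, size in files:
        parts = PurePosixPath(path).parent.parts
        folder = str(PurePosixPath(*parts[:depth]))
        totals[folder] += size
    # Largest folders first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


files = [
    ("music/rock/a.flac", 300),
    ("music/rock/b.flac", 200),
    ("music/jazz/c.flac", 100),
    ("video/tv/d.mkv", 900),
]
by_depth_1 = folder_sizes(files, depth=1)
by_depth_2 = folder_sizes(files, depth=2)
print(by_depth_1)
print(by_depth_2)
```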

There is also a flag to prioritize folders that contain many deleted files (for example, you delete songs you don't like--now you can see who wrote those songs and delete all their other songs...)

$ library big-dirs --sort-groups-by deleted audio.db

Recently, this functionality has also been integrated into watch/listen subcommands so you could just do this:

$ library watch --big-dirs ./my.db
$ lb wt -B  # shorthand equivalent

Backfill data

Backfill missing YouTube videos from the Internet Archive
for base in https://youtu.be/ http://youtu.be/ http://youtube.com/watch?v= https://youtube.com/watch?v= https://m.youtube.com/watch?v= http://www.youtube.com/watch?v= https://www.youtube.com/watch?v=
    sqlite3 video.db "
        update or ignore media
            set path = replace(path, '$base', 'https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/')
              , time_deleted = 0
        where time_deleted > 0
        and (path = webpath or path not in (select webpath from media))
        and path like '$base%'
    "
end
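
The SQL above boils down to swapping a known YouTube URL prefix for the Wayback Machine prefix. The same rewrite on a single URL might look like the sketch below. `to_archive_url` is a hypothetical helper and `VIDEO_ID` is a placeholder; unlike SQL `replace()`, it only touches the start of the string, which matches the `like '$base%'` condition.

```python
ARCHIVE_PREFIX = "https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/"
BASES = [
    "https://youtu.be/",
    "http://youtu.be/",
    "http://youtube.com/watch?v=",
    "https://youtube.com/watch?v=",
    "https://m.youtube.com/watch?v=",
    "http://www.youtube.com/watch?v=",
    "https://www.youtube.com/watch?v=",
]


def to_archive_url(path):
    """Swap a known YouTube URL prefix for the Wayback Machine prefix."""
    for base in BASES:
        if path.startswith(base):
            return ARCHIVE_PREFIX + path[len(base):]
    return path  # not a recognised YouTube URL, leave untouched


rewritten = to_archive_url("https://youtu.be/VIDEO_ID")
untouched = to_archive_url("https://vimeo.com/terburg")
print(rewritten)
print(untouched)
```
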
Backfill reddit databases with pushshift data

https://github.com/chapmanjacobd/reddit_mining/

for reddit_db in ~/lb/reddit/*.db
    set subreddits (sqlite-utils $reddit_db 'select path from playlists' --tsv --no-headers | grep old.reddit.com | sed 's|https://old.reddit.com/r/\(.*\)/|\1|' | sed 's|https://old.reddit.com/user/\(.*\)/|u_\1|' | tr -d "\r")

    cd ~/github/xk/reddit_mining/links/
    for subreddit in $subreddits
        if not test -e "$subreddit.csv"
            echo "octosql -o csv \"select path,score,'https://old.reddit.com/r/$subreddit/' as playlist_path from `../reddit_links.parquet` where lower(playlist_path) = '$subreddit' order by score desc \" > $subreddit.csv"
        end
    end | parallel -j8

    for subreddit in $subreddits
        sqlite-utils upsert --pk path --alter --csv --detect-types $reddit_db media $subreddit.csv
    end

    library tubeadd --safe --ignore-errors --force $reddit_db (sqlite-utils --raw-lines $reddit_db 'select path from media')
end

Datasette

Explore `library` databases in your browser
pip install datasette
datasette tv.db

Pipe to mnamer

Rename poorly named files
pip install mnamer
mnamer --movie-directory ~/d/70_Now_Watching/ --episode-directory ~/d/70_Now_Watching/ \
    --no-overwrite -b (library watch -p fd -s 'path : McCloud')
library fsadd ~/d/70_Now_Watching/

Pipe to lowcharts

$ library watch -p f -col time_created | lowcharts timehist -w 80
Matches: 445183.
Each ∎ represents a count of 1896
[2022-04-13 03:16:05] [151689] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
[2022-04-19 07:59:37] [ 16093] ∎∎∎∎∎∎∎∎
[2022-04-25 12:43:09] [ 12019] ∎∎∎∎∎∎
[2022-05-01 17:26:41] [ 48817] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
[2022-05-07 22:10:14] [ 36259] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
[2022-05-14 02:53:46] [  3942] ∎∎
[2022-05-20 07:37:18] [  2371] ∎
[2022-05-26 12:20:50] [   517]
[2022-06-01 17:04:23] [  4845] ∎∎
[2022-06-07 21:47:55] [  2340] ∎
[2022-06-14 02:31:27] [   563]
[2022-06-20 07:14:59] [ 13836] ∎∎∎∎∎∎∎
[2022-06-26 11:58:32] [  1905] ∎
[2022-07-02 16:42:04] [  1269]
[2022-07-08 21:25:36] [  3062] ∎
[2022-07-15 02:09:08] [  9192] ∎∎∎∎
[2022-07-21 06:52:41] [ 11955] ∎∎∎∎∎∎
[2022-07-27 11:36:13] [ 50938] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
[2022-08-02 16:19:45] [ 70973] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
[2022-08-08 21:03:17] [  2598] ∎

BTW, for some cols like time_deleted you'll need to specify a where clause so they aren't filtered out:

$ library watch -p f -col time_deleted -w time_deleted'>'0 | lowcharts timehist -w 80

Usage

Create database subcommands

fs-add
Add local media
$ library fs-add -h
usage: library fs-add [(--video) | --audio | --image |  --text | --filesystem] DATABASE PATH ...

The default database type is video

    library fsadd tv.db ./tv/
    library fsadd --video tv.db ./tv/  # equivalent

You can also create audio databases. Both audio and video use ffmpeg to read metadata

    library fsadd --audio audio.db ./music/

Image uses ExifTool

    library fsadd --image image.db ./photos/

Text will try to read files and save the contents into a searchable database

    library fsadd --text text.db ./documents_and_books/

Create a text database and scan with OCR and speech-recognition

    library fsadd --text --ocr --speech-recognition ocr.db ./receipts_and_messages/

Create a video database and read internal/external subtitle files into a searchable database

    library fsadd --scan-subtitles tv.search.db ./tv/ ./movies/

Decode media to check for corruption (slow)

    library fsadd --check-corrupt
    # See media-check command for full options

Normally only relevant filetypes are included. You can scan all files with this flag

    library fsadd --scan-all-files mixed.db ./tv-and-maybe-audio-only-files/
    # I use that with this to keep my folders organized
    library watch -w 'video_count=0 and audio_count>=1' -pf mixed.db | parallel mv {} ~/d/82_Audiobooks/

Remove path roots with --force

    library fsadd audio.db /mnt/d/Youtube/
    [/mnt/d/Youtube] Path does not exist

    library fsadd --force audio.db /mnt/d/Youtube/
    [/mnt/d/Youtube] Path does not exist
    [/mnt/d/Youtube] Building file list...
    [/mnt/d/Youtube] Marking 28932 orphaned metadata records as deleted
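
Conceptually, "marking orphaned metadata records as deleted" means setting `time_deleted` on rows under the scanned root whose files were not found. A sketch of that idea with an in-memory SQLite table is shown below; the two-column schema and `mark_orphans` are simplifications for illustration, not the tool's actual code.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (path TEXT PRIMARY KEY, time_deleted INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO media (path) VALUES (?)",
    [("/mnt/d/Youtube/a.opus",), ("/mnt/d/Youtube/b.opus",), ("/mnt/e/other/c.opus",)],
)


def mark_orphans(conn, root, found_paths, now=None):
    """Mark rows under root that were not found on disk as deleted; return the count."""
    now = int(now if now is not None else time.time())
    rows = conn.execute(
        "SELECT path FROM media WHERE time_deleted = 0 AND path LIKE ?", (root + "%",)
    ).fetchall()
    orphans = [(now, path) for (path,) in rows if path not in found_paths]
    conn.executemany("UPDATE media SET time_deleted = ? WHERE path = ?", orphans)
    return len(orphans)


# The root no longer exists, so the scan finds nothing under it
marked = mark_orphans(conn, "/mnt/d/Youtube/", found_paths=set())
remaining = conn.execute("SELECT COUNT(*) FROM media WHERE time_deleted = 0").fetchone()[0]
print(marked, remaining)
```

Rows outside the scanned root are left alone, which is why only the records under the missing path are affected.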

If you run out of RAM, for example scanning large VR videos, you can lower the number of threads via --threads

    library fsadd vr.db --delete-unplayable --check-corrupt --full-scan-if-corrupt 15% --delete-corrupt 20% ./vr/ --threads 3

Move files on import

    library fsadd audio.db --move ~/library/ ./added_folder/
    This will run destination paths through `library christen` and move files relative to the added folder root
tube-add
Add online video media (yt-dlp)
$ library tube-add -h
usage: library tube-add [--safe] [--extra] [--subs] [--auto-subs] DATABASE URL ...

Create a dl database / add links to an existing database

    library tubeadd dl.db https://www.youtube.com/c/BranchEducation/videos

Add links from a line-delimited file

    cat ./my_yt_subscriptions.txt | library tubeadd reddit.db -

Add metadata to links already in a database table

    library tubeadd --force reddit.db (sqlite-utils --raw-lines reddit.db 'select path from media')

Fetch extra metadata

    By default tubeadd will quickly add media at the expense of less metadata.
    If you plan on using `library download` then it doesn't make sense to use `--extra`.
    Downloading will add the extra metadata automatically to the database.
    You can always fetch more metadata later via tubeupdate
    library tube-update tw.db --extra
web-add
Add open-directory media
$ library web-add -h
usage: library web-add [(--filesystem) | --video | --audio | --image | --text] DATABASE URL ...

Scan open directories

    library web-add open_dir.db --video http://1.1.1.1/

Check download size of all videos matching some criteria

    library download --fs open_dir.db --prefix ~/d/dump/video/ -w 'height<720' -E preview -pa

    path         count  download_duration                  size    avg_size
    ---------  -------  ----------------------------  ---------  ----------
    Aggregate     5694  2 years, 7 months and 5 days  724.4 GiB   130.3 MiB

Download all videos matching some criteria

    library download --fs open_dir.db --prefix ~/d/dump/video/ -w 'height<720' -E preview

Stream directly to mpv

    library watch open_dir.db
gallery-add
Add online gallery media (gallery-dl)
$ library gallery-add -h
usage: library gallery-add DATABASE URL ...

Add gallery_dl URLs to download later or periodically update

If you have many URLs use stdin

    cat ./my-favorite-manhwa.txt | library galleryadd your.db --insert-only -
tabs-add
Create a tabs database; Add URLs
$ library tabs-add -h
usage: library tabs-add [--frequency daily weekly (monthly) quarterly yearly] [--no-sanitize] DATABASE URL ...

Adding one URL

    library tabsadd -f daily tabs.db https://wiby.me/surprise/

    Depending on your shell you may need to escape the URL (add quotes)

    If you use Fish shell know that you can enable features to make pasting easier
        set -U fish_features stderr-nocaret qmark-noglob regex-easyesc ampersand-nobg-in-token

    Also I recommend turning Ctrl+Backspace into a super-backspace for repeating similar commands with long args
        echo 'bind \b backward-kill-bigword' >> ~/.config/fish/config.fish

Importing from a line-delimited file

    library tabsadd -f yearly -c reddit tabs.db (cat ~/mc/yearly-subreddit.cron)
links-add
Create a link-scraping database
$ library links-add -h
usage: library links-add DATABASE PATH ... [--case-sensitive] [--cookies-from-browser BROWSER[+KEYRING][:PROFILE][::CONTAINER]] [--selenium] [--manual] [--scroll] [--auto-pager] [--poke] [--chrome] [--local-html] [--file FILE]

Database version of extract-links

You can fine-tune what links get saved with --path/text/before/after-include/exclude.

    library links-add --path-include /video/

Import links from args

    library links-add --no-extract links.db (cb)

Import lines from stdin

    cb | library linksdb example_dbs/links.db --skip-extract -

How I use it

    library links-add links.db https://video/site/ --path-include /video/

    library links-add links.db https://loginsite/ --path-include /article/ --cookies-from-browser firefox
    library links-add links.db https://loginsite/ --path-include /article/ --cookies-from-browser chrome

    cb -t text/html | xidel -s - -e '//@title' | unique | lb linksdb ~/mc/music.db -c p1 --skip-extract -

    library links-add --path-include viewtopic.php --cookies-from-browser firefox \
    --page-key start --page-start 0 --page-step 50 --fixed-pages 14 --stop-pages-no-match 1 \
    plab.db https://plab/forum/tracker.php?o=(string replace ' ' \n -- 1 4 7 10 15)&s=2&tm=-1&f=(string replace ' ' \n -- 1670 1768 60 1671 1644 1672 1111 508 555 1112 1718 1143 1717 1851 1713 1712 1775 1674 902 1675 36 1830 1803 1831 1741 1676 1677 1780 1110 1124 1784 1769 1793 1797 1804 1819 1825 1836 1842 1846 1857 1861 1867 1451 1788 1789 1792 1798 1805 1820 1826 1837 1843 1847 1856 1862 1868 284 1853 1823 1800 1801 1719 997 1818 1849 1711 1791 1762)
site-add
Auto-scrape website data to SQLITE
$ library site-add -h
usage: library site-add DATABASE PATH ... [--auto-pager] [--poke] [--local-html] [--file FILE]

Extract data from website requests to a database

    library siteadd jobs.st.db --poke https://hk.jobsdb.com/hk/search-jobs/python/

Requires selenium-wire
Requires xmltodict when using --extract-xml

    pip install selenium-wire xmltodict

Run with `-vv` to see and interact with the browser
reddit-add
Create a reddit database; Add subreddits
$ library reddit-add -h
usage: library reddit-add [--lookback N_DAYS] [--praw-site bot1] DATABASE URL ...

Fetch data for redditors and reddits

    library redditadd interesting.db https://old.reddit.com/r/coolgithubprojects/ https://old.reddit.com/user/Diastro

If you have a file with a list of subreddits you can do this

    library redditadd 96_Weird_History.db --subreddits (cat ~/mc/96_Weird_History-reddit.txt)

Likewise for redditors

    library redditadd shadow_banned.db --redditors (cat ~/mc/shadow_banned.txt)

To remove entries (for example when you get 404s)

    library search-db reddit.db playlists --or --exact subreddit1 subreddit2 --soft-delete

Note that reddit's API is limited to 1000 posts and it usually doesn't go back very far historically.
Also, it may be the case that reddit's API (praw) will stop working in the near future. For both of these problems
my suggestion is to use pushshift data.
You can find more info here: https://github.com/chapmanjacobd/reddit_mining#how-was-this-made
hn-add
Create / Update a Hacker News database
$ library hn-add -h
usage: library hn-add [--oldest] DATABASE

Fetch latest stories first

    library hnadd hn.db -v
    Fetching 154873 items (33212696 to 33367569)
    Saving comment 33367568
    Saving comment 33367543
    Saving comment 33367564
    ...

Fetch oldest stories first

    library hnadd --oldest hn.db
substack
Backup substack articles
$ library substack -h
usage: library substack DATABASE PATH ...

Backup substack articles
tildes
Backup tildes comments and topics
$ library tildes -h
usage: library tildes DATABASE USER

Backup tildes.net user comments and topics

    library tildes tildes.net.db xk3

Without cookies you are limited to the first page. You can use cookies like this
    https://github.com/rotemdan/ExportCookies
    library tildes tildes.net.db xk3 --cookies ~/Downloads/cookies-tildes-net.txt
nicotine-import
Import paths from nicotine+
$ library nicotine-import -h
usage: library nicotine-import DATABASE PATH ...

Load records from Nicotine+ File Lists

    library nicotine-import ~/lb/soulseek.db /home/xk/.local/share/nicotine/usershares/*

By default we track deletions when only one file list is specified

    library nicotine-import ~/lb/soulseek.db /home/xk/.local/share/nicotine/usershares/user1
    Marking 508387 orphaned metadata records as deleted

    library nicotine-import ~/lb/soulseek.db /home/xk/.local/share/nicotine/usershares/user2
    Marking 31862 metadata records as undeleted
    Marking 216495 orphaned metadata records as deleted

If this is undesirable, pass the `--no-track-deleted` flag
places-import
Import places of interest (POIs)
$ library places-import -h
usage: library places-import DATABASE PATH ...

Load POIs from Google Maps Google Takeout
row-add
Add arbitrary data to SQLite
$ library row-add -h
usage: library row-add DATABASE [--table-name TABLE_NAME] --COLUMN-NAME VALUE

Add a row to sqlite

    library row-add t.db --test_b 1 --test-a 2

    ### media (1 rows)
    |   test_b |   test_a |
    |----------|----------|
    |        1 |        2 |

Text subcommands

cluster-sort
Sort text and images by similarity
$ library cluster-sort -h
usage: library cluster-sort [input_path | stdin] [output_path | stdout]

Group lines of text into sorted output

    echo 'red apple
    broccoli
    yellow
    green
    orange apple
    red apple' | library cluster-sort

    orange apple
    red apple
    red apple
    broccoli
    green
    yellow

Show the groupings

    echo 'red apple
    broccoli
    yellow
    green
    orange apple
    red apple' | library cluster-sort --print-groups

    [
        {'grouped_paths': ['orange apple', 'red apple', 'red apple']},
        {'grouped_paths': ['broccoli', 'green', 'yellow']}
    ]

Auto-sort images into directories

    echo 'image1.jpg
    image2.jpg
    image3.jpg' | library cluster-sort --image --move-groups

Print similar paths

    library fs 0day.db -pa --cluster --print-groups
extract-links
Extract inner links from lists of web links
$ library extract-links -h
usage: library extract-links PATH ... [--case-sensitive] [--scroll] [--download] [--local-html] [--file FILE]

Extract links from within local HTML fragments, files, or remote pages; filtering on link text and nearby plain-text

    library links https://en.wikipedia.org/wiki/List_of_bacon_dishes --path-include https://en.wikipedia.org/wiki/ --after-include famous
    https://en.wikipedia.org/wiki/Omelette

Read from local clipboard and filter out links based on nearby plain text

    library links --local-html (cb -t text/html | psub) --after-exclude paranormal spooky horror podcast tech fantasy supernatural lecture sport
    # note: the equivalent BASH-ism is <(xclip -selection clipboard -t text/html)

Use --selenium for sites that require JavaScript

    library links --selenium https://archive.org/search?query=subject%3A%22Archive.org+Census%22 --path-include census

    Run with `-vv` to see the browser that normally loads in the background
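
For a rough idea of what link extraction with a `--path-include` filter involves, here is a minimal standard-library sketch. It is not xklb's implementation: `LinkCollector` and `extract_links` are hypothetical names, the filter is assumed to be a plain substring match on the resolved URL, and the link-text / nearby-text filters are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, path_include=()):
    """Return links found in `html`; keep those containing any `path_include` substring."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    if not path_include:
        return parser.links
    # Assumption: a link is kept when it contains at least one of the substrings
    return [link for link in parser.links if any(s in link for s in path_include)]
```

Relative links are resolved with `urljoin` before filtering so that a filter like `https://en.wikipedia.org/wiki/` also matches `/wiki/...` hrefs.
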
extract-text
Extract human text from lists of web links
$ library extract-text -h
usage: library extract-text PATH ... [--skip-links]

Sorting suggestions

    library extract-text --skip-links --local-html (cb -t text/html | psub) | library cs --groups | jq -r '.[] | .grouped_paths | "\n" + join("\n")'
markdown-links
Extract titles from lists of web links
$ library markdown-links -h
usage: library markdown-links URL ... [--cookies COOKIES] [--cookies-from-browser BROWSER[+KEYRING][:PROFILE][::CONTAINER]] [--firefox] [--chrome] [--allow-insecure] [--scroll] [--manual] [--auto-pager] [--poke] [--file FILE]

Convert URLs into Markdown links with page titles filled in

    library markdown-links https://www.youtube.com/watch?v=IgZDDW-NXDE
    [Work For Peace](https://www.youtube.com/watch?v=IgZDDW-NXDE)
nouns
Unstructured text -> compound nouns (stdin)
$ library nouns -h
usage: library nouns (stdin)

Extract compound nouns and phrases from unstructured mixed HTML plain text

    xsv select text hn_comment_202210242109.csv | library nouns | sort | uniq -c | sort --numeric-sort
dates
Unstructured text -> timestamps, dates, time
$ library dates -h
usage: library dates ARGS_OR_STDIN

Parse dates

    library dates 'October 2017'
    2017-10-01

Parse times

    library dates --time 'October 2017 3pm'
    2017-10-01T15:00:00
json-keys-rename
Rename JSON keys by substring match
$ library json-keys-rename -h
usage: library json-keys-rename --new-key 'old key substring' (stdin)

Rename/filter keys in JSON

    echo '{"The Place of Birthings": "Yo Mama", "extra": "key"}' | library json-keys-rename --country 'place of birth'
    {"country": "Yo Mama"}
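
The example above suggests that keys are matched by substring and that unmatched keys are dropped. A minimal sketch of that behavior, assuming case-insensitive matching (inferred from the example, not confirmed) and using the hypothetical helper name `rename_keys`:

```python
import json


def rename_keys(record, renames):
    """Keep only values whose key contains an old-key substring, stored under the new key."""
    out = {}
    for new_key, old_substring in renames.items():
        for key, value in record.items():
            # Assumption: matching ignores case, as the example output implies
            if old_substring.lower() in key.lower():
                out[new_key] = value
                break  # first matching key wins
    return out


record = json.loads('{"The Place of Birthings": "Yo Mama", "extra": "key"}')
print(json.dumps(rename_keys(record, {"country": "place of birth"})))
# prints {"country": "Yo Mama"}
```
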
combinations
Enumerate possible combinations
$ library combinations -h
usage: library combinations --PROPERTY OPTION

Enumerate the possible combinations of things that have multiple properties with more than one option

    library combinations --prop1 opt1 --prop1 opt2 --prop2 A --prop2 B

    {"prop1": "opt1", "prop2": "A"}
    {"prop1": "opt1", "prop2": "B"}
    {"prop1": "opt2", "prop2": "A"}
    {"prop1": "opt2", "prop2": "B"}
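
The same enumeration is a Cartesian product over the options of each property. A minimal sketch with `itertools.product` (the function name `combinations` is just for illustration, not xklb's code):

```python
import json
from itertools import product


def combinations(properties):
    """Return one dict per combination, choosing one option for each property."""
    names = list(properties)
    option_lists = [properties[name] for name in names]
    return [dict(zip(names, choice)) for choice in product(*option_lists)]


for combo in combinations({"prop1": ["opt1", "opt2"], "prop2": ["A", "B"]}):
    print(json.dumps(combo))
# prints the same four lines as the example above
```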

Folder subcommands

merge-mv
Move files and merge folders in BSD/rsync style, rename if possible
$ library merge-mv -h
usage: library merge-mv SOURCE ... DEST [--simulate] [--ext EXT]

By default it won't matter if source folders end with a path separator or not

    library merge-mv folder1  folder2/  # folder1 will be merged with folder2/
    library merge-mv folder1/ folder2/  # folder1 will be merged with folder2/

--bsd mode: an ending path separator determines if each source is to be placed within or merged with the destination

    library merge-mv --bsd folder1/ folder2/  # folder1 will be merged with folder2/
    library merge-mv --bsd folder1  folder2/  # folder1 will be moved to folder2/folder1/

--parent mode: always include the parent folder name when merging

    library merge-mv --parent folder1  folder2/  # folder1 will be moved to folder2/folder1/
    library merge-mv --parent folder1/ folder2/  # folder1 will be moved to folder2/folder1/
    library merge-mv --parent file1.txt folder2/ # file1 will be moved to folder2/file1_parent_folder/file1.txt

nb. This tool, like other library subcommands, only works on files. Empty folders will not be moved to the destination
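
The three folder modes above boil down to one decision: does the source folder get merged into the destination, or placed inside it? A sketch of that rule as described in the examples (folders only; `destination_root` is a hypothetical helper, not xklb's code):

```python
import posixpath


def destination_root(source, dest, bsd=False, parent=False):
    """Return the folder that the contents of `source` end up in."""
    name = posixpath.basename(source.rstrip("/"))
    if parent:
        return posixpath.join(dest, name)  # --parent: always keep the folder name
    if bsd and not source.endswith("/"):
        return posixpath.join(dest, name)  # --bsd without trailing slash: place within
    return dest  # default, or --bsd with trailing slash: merge


print(destination_root("folder1", "folder2/"))            # folder2/
print(destination_root("folder1", "folder2/", bsd=True))  # folder2/folder1
```
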
merge-folders
Merge two or more file trees, check for conflicts before merging
$ library merge-folders -h
usage: library merge-folders [--replace] [--no-replace] [--simulate] SOURCES ... DESTINATION

Merge multiple folders with the same file tree into a single folder.

https://github.com/chapmanjacobd/journal/blob/main/programming/linux/misconceptions.md#mv-src-vs-mv-src

Trumps are new or replaced files from an earlier source which now conflict with a later source.
If you only have one source then the count of trumps will always be zero.
The count of conflicts also includes trumps.
rel-mv
Move files preserving parent folder hierarchy
$ library rel-mv -h
usage: library rel-mv [--simulate] SOURCE ... DEST

Move files/folders without losing hierarchy metadata

Move fresh music to your phone every Sunday

    # move last week's music back to its source folders
    library mv /mnt/d/sync/weekly/ /mnt/d/check/audio/

    # move new music for this week
    library relmv (
        library listen audio.db --local-media-only --where 'play_count=0' --random -L 600 -p f
    ) /mnt/d/sync/weekly/
mergerfs-cp
cp files with reflink on mergerfs
$ library mergerfs-cp -h
usage: library mergerfs-cp SOURCE ... DEST [--simulate] [--ext EXT]

Copy files with reflink and handle mergerfs mounts

    library mergerfs-cp --dry-run d/files* d/folder2/
    cp --interactive --reflink=always /mnt/d9/files1.txt /mnt/d9/folder2/files1.txt
    ...

    btrfs fi du /mnt/d3/files1.txt /mnt/d3/folder2/files1.txt
        Total   Exclusive  Set shared  Filename
    12.57GiB       0.00B    12.57GiB  /mnt/d3/files1.txt
    12.57GiB       0.00B    12.57GiB  /mnt/d3/folder2/files1.txt
scatter
Scatter files between folders or disks
$ library scatter -h
usage: library scatter [--limit LIMIT] [--policy POLICY] [--sort SORT] --targets TARGETS DATABASE RELATIVE_PATH ...

Scatter filesystem folder trees (without mountpoints; limited functionality; good for balancing fs inodes)

    library scatter scatter.db /test/{0,1,2,3,4,5,6,7,8,9}

Reduce number of files per folder (creates more folders)

    library scatter scatter.db --max-files-per-folder 16000 /test/{0,1,2,3,4,5,6,7,8,9}

Balance files across filesystem folder trees or multiple devices (mostly useful for mergerfs)

Multi-device re-bin: balance by size

    library scatter -m /mnt/d1:/mnt/d2:/mnt/d3:/mnt/d4/:/mnt/d5:/mnt/d6:/mnt/d7 fs.db subfolder/of/mergerfs/mnt
    Current path distribution:
    ╒═════════╤══════════════╤══════════════╤═══════════════╤════════════════╤═════════════════╤════════════════╕
    │ mount   │   file_count │ total_size   │ median_size   │ time_created   │ time_modified   │ time_downloaded│
    ╞═════════╪══════════════╪══════════════╪═══════════════╪════════════════╪═════════════════╪════════════════╡
    │ /mnt/d1 │        12793 │ 169.5 GB     │ 4.5 MB        │ Jan 27         │ Jul 19 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d2 │        13226 │ 177.9 GB     │ 4.7 MB        │ Jan 27         │ Jul 19 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d3 │            1 │ 717.6 kB     │ 717.6 kB      │ Jan 31         │ Jul 18 2022     │ yesterday      │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d4 │           82 │ 1.5 GB       │ 12.5 MB       │ Jan 31         │ Apr 22 2022     │ yesterday      │
    ╘═════════╧══════════════╧══════════════╧═══════════════╧════════════════╧═════════════════╧════════════════╛

    Simulated path distribution:
    5845 files should be moved
    20257 files should not be moved
    ╒═════════╤══════════════╤══════════════╤═══════════════╤════════════════╤═════════════════╤════════════════╕
    │ mount   │   file_count │ total_size   │ median_size   │ time_created   │ time_modified   │ time_downloaded│
    ╞═════════╪══════════════╪══════════════╪═══════════════╪════════════════╪═════════════════╪════════════════╡
    │ /mnt/d1 │         9989 │ 46.0 GB      │ 2.4 MB        │ Jan 27         │ Jul 19 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d2 │        10185 │ 46.0 GB      │ 2.4 MB        │ Jan 27         │ Jul 19 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d3 │         1186 │ 53.6 GB      │ 30.8 MB       │ Jan 27         │ Apr 07 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d4 │         1216 │ 49.5 GB      │ 29.5 MB       │ Jan 27         │ Apr 07 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d5 │         1146 │ 53.0 GB      │ 30.9 MB       │ Jan 27         │ Apr 07 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d6 │         1198 │ 48.8 GB      │ 30.6 MB       │ Jan 27         │ Apr 07 2022     │ Jan 31         │
    ├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
    │ /mnt/d7 │         1182 │ 52.0 GB      │ 30.9 MB       │ Jan 27         │ Apr 07 2022     │ Jan 31         │
    ╘═════════╧══════════════╧══════════════╧═══════════════╧════════════════╧═════════════════╧════════════════╛
    ### Move 1182 files to /mnt/d7 with this command: ###
    rsync -aE --xattrs --info=progress2 --remove-source-files --files-from=/tmp/tmpmr1628ij / /mnt/d7
    ### Move 1198 files to /mnt/d6 with this command: ###
    rsync -aE --xattrs --info=progress2 --remove-source-files --files-from=/tmp/tmp9yd75f6j / /mnt/d6
    ### Move 1146 files to /mnt/d5 with this command: ###
    rsync -aE --xattrs --info=progress2 --remove-source-files --files-from=/tmp/tmpfrj141jj / /mnt/d5
    ### Move 1185 files to /mnt/d3 with this command: ###
    rsync -aE --xattrs --info=progress2 --remove-source-files --files-from=/tmp/tmpqh2euc8n / /mnt/d3
    ### Move 1134 files to /mnt/d4 with this command: ###
    rsync -aE --xattrs --info=progress2 --remove-source-files --files-from=/tmp/tmphzb0gj92 / /mnt/d4

Multi-device re-bin: balance device inodes for specific subfolder

    library scatter -m /mnt/d1:/mnt/d2 fs.db subfolder --group count --sort 'size desc'

Multi-device re-bin: only consider the most recent 100 files

    library scatter -m /mnt/d1:/mnt/d2 -l 100 -s 'time_modified desc' fs.db /

Multi-device re-bin: empty out a disk (/mnt/d2) into many other disks (/mnt/d1, /mnt/d3, and /mnt/d4)

    library scatter fs.db -m /mnt/d1:/mnt/d3:/mnt/d4 /mnt/d2

This tool is intended for local use. If transferring many small files across the network something like
[fpart](https://github.com/martymac/fpart) or [fpsync](https://www.fpart.org/fpsync/) will be better.
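
To illustrate what "balance by size" means, here is a small greedy sketch. It is an assumption about the general idea, not xklb's actual policy: files on mounts above the average total size are moved, largest first, to the emptiest mount as long as that mount stays at or below the average. `plan_moves` is a hypothetical name.

```python
def plan_moves(files, mounts):
    """files: list of (path, size, mount). Return (path, destination_mount) pairs."""
    totals = dict.fromkeys(mounts, 0)
    for _, size, mount in files:
        totals[mount] += size
    target = sum(totals.values()) / len(mounts)  # ideal total size per mount

    moves = []
    for path, size, mount in sorted(files, key=lambda f: f[1], reverse=True):
        if totals[mount] <= target:
            continue  # source mount is not over-full
        dest = min(mounts, key=totals.__getitem__)  # emptiest mount right now
        if totals[dest] + size > target:
            continue  # moving this file would overfill the destination
        totals[mount] -= size
        totals[dest] += size
        moves.append((path, dest))
    return moves


files = [("a", 4, "/mnt/d1"), ("b", 3, "/mnt/d1"), ("c", 2, "/mnt/d1"), ("d", 1, "/mnt/d1")]
print(plan_moves(files, ["/mnt/d1", "/mnt/d2"]))
# [('a', '/mnt/d2'), ('d', '/mnt/d2')]
```
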
mv-list
Find specific folders to move to different disks
$ library mv-list -h
usage: library mv-list [--limit LIMIT] [--lower LOWER] [--upper UPPER] MOUNT_POINT DATABASE

Free up space on a specific disk. Find candidates for moving data to a different mount point


The program takes a mount point and a xklb database file. If you don't have a database file you can create one like this

    library fsadd --filesystem d.db ~/d/

But this should definitely also work with xklb audio and video databases

    library mv-list /mnt/d/ video.db

The program will print a table with a sorted list of folders which are good candidates for moving.
Candidates are chosen by how many files a folder contains (so you don't spend hours waiting for folders with millions of tiny files to copy over).
The default range is 4 to 4000 files per folder, but it can be adjusted via the --lower and --upper flags.

    ██╗███╗░░██╗░██████╗████████╗██████╗░██╗░░░██╗░█████╗░████████╗██╗░█████╗░███╗░░██╗░██████╗
    ██║████╗░██║██╔════╝╚══██╔══╝██╔══██╗██║░░░██║██╔══██╗╚══██╔══╝██║██╔══██╗████╗░██║██╔════╝
    ██║██╔██╗██║╚█████╗░░░░██║░░░██████╔╝██║░░░██║██║░░╚═╝░░░██║░░░██║██║░░██║██╔██╗██║╚█████╗░
    ██║██║╚████║░╚═══██╗░░░██║░░░██╔══██╗██║░░░██║██║░░██╗░░░██║░░░██║██║░░██║██║╚████║░╚═══██╗
    ██║██║░╚███║██████╔╝░░░██║░░░██║░░██║╚██████╔╝╚█████╔╝░░░██║░░░██║╚█████╔╝██║░╚███║██████╔╝
    ╚═╝╚═╝░░╚══╝╚═════╝░░░░╚═╝░░░╚═╝░░╚═╝░╚═════╝░░╚════╝░░░░╚═╝░░░╚═╝░╚════╝░╚═╝░░╚══╝╚═════╝░

    Type "done" when finished
    Type "more" to see more files
    Paste a folder (and press enter) to toggle selection
    Type "*" to select all files in the most recently printed table

Then it will give you a prompt

    Paste a path:

Wherein you can copy and paste paths you want to move from the table and the program will keep track for you.

    Paste a path: /mnt/d/75_MovieQueue/720p/s11/
    26 selected paths: 162.1 GB ; future free space: 486.9 GB

You can also press the up arrow or paste it again to remove it from the list

    Paste a path: /mnt/d/75_MovieQueue/720p/s11/
    25 selected paths: 159.9 GB ; future free space: 484.7 GB

After you are done selecting folders you can type "done" (or press ctrl-d) and it will save the list to a tmp file

    Paste a path: done

        Folder list saved to /tmp/tmp7x_75l8. You may want to use the following command to move files to an EMPTY folder target:

            rsync -a --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmp7x_75l8 -r --relative -vv --dry-run / jim:/free/real/estate/
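
The candidate selection described above (folders with a file count between `--lower` and `--upper`) can be sketched roughly like this. The inclusive bounds and the sort by total size are assumptions, and `candidate_folders` is a hypothetical name:

```python
import posixpath
from collections import defaultdict


def candidate_folders(files, lower=4, upper=4000):
    """files: {path: size_in_bytes}. Return (folder, file_count, total_size), largest first."""
    stats = defaultdict(lambda: [0, 0])  # folder -> [file_count, total_size]
    for path, size in files.items():
        folder = posixpath.dirname(path)
        stats[folder][0] += 1
        stats[folder][1] += size
    candidates = [
        (folder, count, total)
        for folder, (count, total) in stats.items()
        if lower <= count <= upper  # skip tiny folders and folders with too many files
    ]
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```
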
mount-stats
Show some relative mount stats
$ library mount-stats -h
usage: library mount-stats MOUNTPOINT ...

Print relative use and free for multiple mount points

    lb mu (fd -td -d1 'd[0-9]+$' /mnt)
    Relative disk dependence:
    /mnt/d1: ###### 8.1%
    /mnt/d2: ######### 12.2%
    /mnt/d3: ######### 12.2%
    /mnt/d4: ####### 9.5%
    /mnt/d5: ####### 9.5%
    /mnt/d6: ######### 12.2%
    /mnt/d7: ######### 12.2%
    /mnt/d8: ######### 12.2%
    /mnt/d9: ######### 12.2%

    Relative free space:
    /mnt/d1: ##### 6.9%
    /mnt/d2: ########### 13.8%
    /mnt/d3: ######## 10.4%
    /mnt/d4: ######## 10.5%
    /mnt/d5: ###### 8.7%
    /mnt/d6: ######### 11.8%
    /mnt/d7: ######### 11.9%
    /mnt/d8: ######### 12.2%
    /mnt/d9: ########### 13.8%
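
A sketch of how percentages like these could be computed, assuming "relative disk dependence" means each mount's share of the total used space and "relative free space" means its share of the total free space (that reading is a guess from the output, and the function names are hypothetical):

```python
import shutil


def relative_shares(values):
    """Return each value as a percentage of the sum of all values."""
    total = sum(values.values())
    return {name: (value / total * 100 if total else 0.0) for name, value in values.items()}


def mount_stats(mounts):
    """Return (used_share, free_share) dicts for the given mount points."""
    usage = {mount: shutil.disk_usage(mount) for mount in mounts}
    used = relative_shares({mount: u.used for mount, u in usage.items()})
    free = relative_shares({mount: u.free for mount, u in usage.items()})
    return used, free
```
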
big-dirs
Show large folders
$ library big-dirs -h
usage: library big-dirs PATH ... [--limit (4000)] [--depth (0)] [--sort-groups-by deleted | played]

See what folders take up space

    library big-dirs ./video/

Filter folders by size

    library big-dirs ./video/ -FS+10GB -FS-200GB

Filter folders by count

    library big-dirs ./video/ -FC+300 -FC-5000

Filter folders by depth

    library big-dirs ./video/ --depth 5
    library big-dirs ./video/ -D 7

Load from fs database

    library fs video.db --cols path,duration,size,time_deleted --to-json | library big-dirs --from-json

Only include files between 1MiB and 5MiB

    library fs video.db -S+1M -S-5M --cols path,duration,size,time_deleted --to-json | library big-dirs --from-json

You can even sort by auto-MCDA ~LOL~

    library big-dirs ./video/ -u 'mcda median_size,-deleted'
similar-folders
Find similar folders based on folder name, size, and count
$ library similar-folders -h
usage: library similar-folders PATH ...

Find similar folders based on foldernames, similar size, and similar number of files

    library similar-folders ~/d/

    group /home/xk/d/dump/datasets/*vector          total_size    median_size      files
    ----------------------------------------------  ------------  -------------  -------
    /home/xk/d/dump/datasets/vector/output/         1.8 GiB       89.5 KiB          1980
    /home/xk/d/dump/datasets/vector/output2/        1.8 GiB       89.5 KiB          1979

Find similar folders based on ONLY foldernames, using the full path

    library similar-folders --filter-names --full-path ~/d/

Find similar folders based on ONLY number of files

    library similar-folders --filter-counts ~/d/

Find similar folders based on ONLY median size

    library similar-folders --filter-sizes ~/d/

Find similar folders based on ONLY total size

    library similar-folders --filter-sizes --total-size ~/d/

Read paths from dbs

    library fs audio.db --cols path,duration,size,time_deleted --to-json | library similar-folders --from-json -v

Print only paths

    library similar-folders ~/d/ -pf
    /home/xk/d/dump/datasets/vector/output/
    /home/xk/d/dump/datasets/vector/output2/

How I use it

    library fs video.db --cols path,duration,size,time_deleted --to-json | library similar-folders --from-json -FS=+8G --filter-names --filter-counts --filter-durations

File subcommands

christen
Clean file paths
$ library christen -h
usage: library christen [--run]

Rename files to be somewhat normalized

Default mode is simulate

    library christen ~/messy/

To actually do stuff use the run flag

    library christen . --run

You can optionally replace all the spaces in your filenames with dots

    library christen --dot-space
sample-hash
Calculate a hash based on small file segments
$ library sample-hash -h
usage: library sample-hash [--same-file-threads 1] [--chunk-size BYTES] [--gap BYTES OR 0.0-1.0*FILESIZE] PATH ...

Calculate hashes for large files by reading only small segments of each file

    library sample-hash ./my_file.mkv

The threads flag seems to be faster for rotational media but slower on SSDs
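
The idea of hashing only small segments can be sketched as follows. This is not the hash that `sample-hash` produces: the choice of SHA-256, the default sizes, and treating `gap` as the fraction of the file between segment starts are all assumptions made for illustration.

```python
import hashlib
import io
import os


def sample_hash_stream(f, size, chunk_size=64 * 1024, gap=0.1):
    """Hash `chunk_size` bytes at every `gap` fraction of a seekable binary stream."""
    digest = hashlib.sha256()
    digest.update(str(size).encode())  # include the size so truncated files differ
    step = max(int(size * gap), chunk_size, 1)  # samples never overlap
    for offset in range(0, size, step):
        f.seek(offset)
        digest.update(f.read(chunk_size))
    return digest.hexdigest()


def sample_hash(path, **kwargs):
    """Convenience wrapper for files on disk."""
    with open(path, "rb") as f:
        return sample_hash_stream(f, os.path.getsize(path), **kwargs)


data = io.BytesIO(b"a" * 1000)
print(sample_hash_stream(data, 1000, chunk_size=10))
```

The trade-off is that a change in a region that is never sampled goes unnoticed, so this kind of hash is a fast shortcut rather than a full-content checksum.
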
sample-compare
Compare files using sample-hash and other shortcuts
$ library sample-compare -h
usage: library sample-compare [--same-file-threads 1] [--chunk-size BYTES] [--gap BYTES OR 0.0-1.0*FILESIZE] PATH ...

Convenience subcommand to compare multiple files using sample-hash
similar-files
Find similar files based on filename and size
$ library similar-files -h
usage: library similar-files PATH ...

Find similar files using filenames and size

    library similar-files ~/d/

Find similar files based on ONLY filenames, using the full path

    library similar-files --filter-names --full-path ~/d/

Find similar files based on ONLY size

    library similar-files --filter-sizes ~/d/

Read paths from dbs

    library fs audio.db --cols path,duration,size,time_deleted --to-json | library similar-files --from-json -v

How I use it

    library similar-files --filter-names --filter-durations --estimated-duplicates 3 .
llm-map
Run LLMs across multiple files
$ library llm-map -h
usage: library llm-map LLAMA_FILE [paths ...] [--llama-args LLAMA_ARGS] [--prompt STR] [--text [INT]] [--rename]

Run a llamafile with a prompt including path names and file contents

Rename files based on file contents

    library llm-map ./gemma2.llamafile ~/Downloads/booka.pdf --rename --text

    cat llm_map_renames.csv
    Path,Output
    /home/xk/Downloads/booka.pdf,/home/xk/Downloads/Mining_Massive_Datasets.pdf

Using GGUF files

    wget -P ~/Downloads https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.9/llamafile-0.8.9
    chmod +x ~/Downloads/llamafile-0.8.9
    mv ~/Downloads/llamafile-0.8.9 ~/.local/bin/llamafile  # move it somewhere in your $PATH

    library llm-map --model ~/Downloads/llava-v1.5-7b-Q4_K.gguf --image-model ~/Downloads/llava-v1.5-7b-mmproj-Q4_0.gguf --prompt 'what do you see?' ~/Downloads/comp_*.jpg

Tabular data subcommands

eda
Exploratory Data Analysis on table-like files
$ library eda -h
usage: library eda PATH ... [--table STR] [--end-row INT] [--repl]

Perform Exploratory Data Analysis (EDA) on one or more files

Only 500,000 rows per file are loaded for performance purposes. Set `--end-row inf` to read all the rows and/or run out of RAM.
mcda
Multi-criteria Ranking for Decision Support
$ library mcda -h
usage: library mcda PATH ... [--table STR] [--end-row INT]

Perform Multiple Criteria Decision Analysis (MCDA) on one or more files

Only 500,000 rows per file are loaded for performance purposes. Set `--end-row inf` to read all the rows and/or run out of RAM.

    library mcda ~/storage.csv --minimize price --ignore warranty

    ### Goals
    #### Maximize
    - size
    #### Minimize
    - price

    |    |   price |   size |   warranty |   TOPSIS |      MABAC |   SPOTIS |   BORDA |
    |----|---------|--------|------------|----------|------------|----------|---------|
    |  0 |     359 |     36 |          5 | 0.769153 |  0.348907  | 0.230847 | 7.65109 |
    |  1 |     453 |     40 |          2 | 0.419921 |  0.0124531 | 0.567301 | 8.00032 |
    |  2 |     519 |     44 |          2 | 0.230847 | -0.189399  | 0.769153 | 8.1894  |
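
For readers unfamiliar with the TOPSIS column, here is a textbook-style sketch in plain Python using equal weights and vector normalization. It is not the code behind `library mcda`, and because the normalization may differ the scores will not necessarily equal the numbers in the table above; only the idea (closeness to the ideal option) is the same.

```python
import math


def topsis(rows, maximize=(), minimize=()):
    """Score each row between 0 and 1; higher is closer to the ideal option."""
    criteria = list(maximize) + list(minimize)
    weight = 1 / len(criteria)  # assumption: all criteria weigh the same
    norms = {c: math.sqrt(sum(row[c] ** 2 for row in rows)) or 1.0 for c in criteria}
    scaled = [[weight * row[c] / norms[c] for c in criteria] for row in rows]

    columns = list(zip(*scaled))
    best = [max(col) if c in maximize else min(col) for c, col in zip(criteria, columns)]
    worst = [min(col) if c in maximize else max(col) for c, col in zip(criteria, columns)]

    scores = []
    for values in scaled:
        to_best = math.dist(values, best)
        to_worst = math.dist(values, worst)
        total = to_best + to_worst
        scores.append(to_worst / total if total else 0.0)
    return scores


rows = [
    {"price": 359, "size": 36},
    {"price": 453, "size": 40},
    {"price": 519, "size": 44},
]
print(topsis(rows, maximize=["size"], minimize=["price"]))
```

With this data the first row scores highest and the last row lowest, the same order as in the table.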

It also works with HTTP/GCS/S3 URLs

    library mcda https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films --clean --minimize Year

    ### Goals

    #### Maximize

    - Nominations
    - Awards

    #### Minimize

    - Year

    |      | Film                                                                    |   Year |   Awards |   Nominations |      TOPSIS |    MABAC |      SPOTIS |   BORDA |
    |------|-------------------------------------------------------------------------|--------|----------|---------------|-------------|----------|-------------|---------|
    |  378 | Titanic                                                                 |   1997 |       11 |            14 | 0.999993    | 1.38014  | 4.85378e-06 | 4116.62 |
    |  868 | Ben-Hur                                                                 |   1959 |       11 |            12 | 0.902148    | 1.30871  | 0.0714303   | 4116.72 |
    |  296 | The Lord of the Rings: The Return of the King                           |   2003 |       11 |            11 | 0.8558      | 1.27299  | 0.107147    | 4116.76 |
    | 1341 | West Side Story                                                         |   1961 |       10 |            11 | 0.837716    | 1.22754  | 0.152599    | 4116.78 |
    |  389 | The English Patient                                                     |   1996 |        9 |            12 | 0.836725    | 1.2178   | 0.162341    | 4116.78 |
    | 1007 | Gone with the Wind                                                      |   1939 |        8 |            13 | 0.807086    | 1.20806  | 0.172078    | 4116.81 |
    |  990 | From Here to Eternity                                                   |   1953 |        8 |            13 | 0.807086    | 1.20806  | 0.172079    | 4116.81 |
    | 1167 | On the Waterfront                                                       |   1954 |        8 |            12 | 0.785       | 1.17235  | 0.207793    | 4116.83 |
    | 1145 | My Fair Lady                                                            |   1964 |        8 |            12 | 0.785       | 1.17235  | 0.207793    | 4116.83 |
    |  591 | Gandhi                                                                  |   1982 |        8 |            11 | 0.755312    | 1.13663  | 0.243509    | 4116.86 |
markdown-tables
Print markdown tables from table-like files
$ library markdown-tables -h
usage: library markdown-tables PATH ... [--table STR] [--end-row INT]

Print tables from files as markdown

Only 500,000 rows per file are loaded for performance purposes. Set `--end-row inf` to read all the rows and/or run out of RAM.
columns
Print columns of table-like files
$ library columns -h
usage: library columns PATH ... [--table STR] [--start-row INT]

Print columns from table-like files

Only print column names

    library columns https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films --cols name --table-index 0
    Film
    Year
    Awards
    Nominations
incremental-diff
Diff large table-like files in chunks
$ library incremental-diff -h
usage: library incremental-diff PATH1 PATH2 [--join-keys JOIN_KEYS] [--table1 TABLE1] [--table2 TABLE2] [--table1-index TABLE1_INDEX] [--table2-index TABLE2_INDEX] [--start-row START_ROW] [--batch-size BATCH_SIZE]

See data differences in an incremental way to quickly see how two different files differ.

Data (PATH1, PATH2) can be two different files of different file formats (CSV, Excel) or it could even be the same file with different tables.

If files are unsorted you may need to use `--join-keys id,name` to specify ID columns. Rows that have the same ID will then be compared.
If you are comparing SQLite files you may be able to use `--sort id,name` to achieve the same effect.

To diff everything at once run with `--batch-size inf`
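
A rough sketch of the batch-wise comparison idea, using lists of dicts in place of real tables. The function names are hypothetical and this is not the actual implementation. Rows are only matched within the same batch, which is why unsorted inputs need join keys (or sorting) to line up.

```python
from itertools import islice, zip_longest


def batched(rows, size):
    """Yield lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def diff_batch(left, right, join_keys):
    """Compare two batches of dict rows that share the given key columns."""
    def key(row):
        return tuple(row[k] for k in join_keys)

    left_by_key = {key(row): row for row in left}
    right_by_key = {key(row): row for row in right}
    return {
        "only_left": [r for k, r in left_by_key.items() if k not in right_by_key],
        "only_right": [r for k, r in right_by_key.items() if k not in left_by_key],
        "changed": [
            (r, right_by_key[k])
            for k, r in left_by_key.items()
            if k in right_by_key and r != right_by_key[k]
        ],
    }


def incremental_diff(rows1, rows2, join_keys, batch_size=1000):
    """Yield one diff per pair of batches so the first differences show up early."""
    pairs = zip_longest(batched(rows1, batch_size), batched(rows2, batch_size), fillvalue=[])
    for left, right in pairs:
        yield diff_batch(left, right, join_keys)
```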

Media File subcommands

media-check
Check video and audio files for corruption via ffmpeg
$ library media-check -h
usage: library media-check [--chunk-size SECONDS] [--gap SECONDS OR 0.0-1.0*DURATION] [--delete-corrupt >0-100] [--full-scan] [--audio-scan] PATH ...

By default, decode 0.5 seconds per 10% of each file

    library media-check ./video.mp4

Decode all the frames of each file to evaluate how corrupt it is
(scantime is very slow; about 150 seconds for an hour-long file)

    library media-check --full-scan ./video.mp4

Decode all the packets of each file to evaluate how corrupt it is
(scantime is about one second per file, but only accurate for formats where 1 packet == 1 frame)

    library media-check --full-scan --gap 0 ./video.mp4

Decode all audio of each file to evaluate how corrupt it is
(scantime is about four seconds per file)

    library media-check --full-scan --audio ./video.mp4

Decode at least one frame at the start and end of each file to evaluate how corrupt it is
(scantime is about one second per file)

    library media-check --chunk-size 5% --gap 99.9% ./video.mp4

Decode 3s every 5% of a file to evaluate how corrupt it is
(scantime is about three seconds per file)

    library media-check --chunk-size 3 --gap 5% ./video.mp4

Delete the file if 20 percent or more of checks fail

    library media-check --delete-corrupt 20% ./video.mp4
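
To make the `--chunk-size` / `--gap` / `--delete-corrupt` relationship concrete, here is a small sketch of how sample points and the delete decision could be computed. It only covers the "seconds per fraction of the file" case; `scan_segments` and `should_delete` are hypothetical names and the real scheduling inside media-check may differ.

```python
def scan_segments(duration, chunk_size=0.5, gap_fraction=0.1):
    """Return (start_seconds, length_seconds) pairs spread across a file."""
    step = max(duration * gap_fraction, chunk_size)  # never overlap chunks
    segments = []
    i = 0
    while i * step < duration:
        start = i * step
        segments.append((start, min(chunk_size, duration - start)))
        i += 1
    return segments


def should_delete(failed_checks, total_checks, threshold_percent=20.0):
    """True when the share of failed checks reaches the threshold."""
    return total_checks > 0 and failed_checks * 100 >= threshold_percent * total_checks


# A 100-second file with the default settings is sampled at 10 points
print(len(scan_segments(100.0)))  # 10
```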

To scan a large folder use `fsadd`. I recommend something like this two-stage approach

    library fsadd --delete-unplayable --check-corrupt --chunk-size 5% tmp.db ./video/ ./folders/
    library media-check (library fs tmp.db -w 'corruption>15' -pf) --full-scan --delete-corrupt 25%

The above can now be done in one command via `--full-scan-if-corrupt`

    library fsadd --delete-unplayable --check-corrupt --chunk-size 5% tmp.db ./video/ ./folders/ --full-scan-if-corrupt 15% --delete-corrupt 25%

Corruption stats

    library fs tmp.db -w 'corruption>15' -pa
    path         count  duration             avg_duration         size    avg_size
    ---------  -------  -------------------  --------------  ---------  ----------
    Aggregate      907  15 days and 9 hours  24 minutes      130.6 GiB   147.4 MiB

Corruption graph

    sqlite --raw-lines tmp.db 'select corruption from media' | lowcharts hist --min 10 --intervals 10

    Samples = 931; Min = 10.0; Max = 100.0
    Average = 39.1; Variance = 1053.103; STD = 32.452
    each ∎ represents a count of 6
    [ 10.0 ..  19.0] [561] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
    [ 19.0 ..  28.0] [ 69] ∎∎∎∎∎∎∎∎∎∎∎
    [ 28.0 ..  37.0] [ 33] ∎∎∎∎∎
    [ 37.0 ..  46.0] [ 18] ∎∎∎
    [ 46.0 ..  55.0] [ 14] ∎∎
    [ 55.0 ..  64.0] [ 12] ∎∎
    [ 64.0 ..  73.0] [ 15] ∎∎
    [ 73.0 ..  82.0] [ 18] ∎∎∎
    [ 82.0 ..  91.0] [ 50] ∎∎∎∎∎∎∎∎
    [ 91.0 .. 100.0] [141] ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
process-ffmpeg
Shrink video/audio to AV1/Opus format (.mkv, .mka)
$ library process-ffmpeg -h
usage: library process-ffmpeg PATH ... [--always-split] [--split-longer-than DURATION] [--min-split-segment SECONDS] [--simulate]

Resize videos to at most 1440x960px and encode to AV1 and/or Opus to save space

Convert audio to Opus. Optionally split up long tracks into multiple files.

    fd -tf -eDTS -eAAC -eWAV -eAIF -eAIFF -eFLAC -eAIFF -eM4A -eMP3 -eOGG -eMP4 -eWMA -j4 -x library process --audio

Use --always-split to _always_ split files if silence is detected

    library process-audio --always-split audiobook.m4a

Use --split-longer-than to _only_ detect silence for files in excess of a specific duration

    library process-audio --split-longer-than 36mins audiobook.m4b audiobook2.mp3
process-image
Shrink images by resizing and AV1 image format (.avif)
$ library process-image -h
usage: library process-image PATH ...

Resize images to at most 2400x2400px and convert to AVIF to save space

Multi-database subcommands

merge-dbs
Merge SQLite databases
$ library merge-dbs -h
usage: library merge-dbs SOURCE_DB ... DEST_DB [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...]

Merge-DBs will insert new rows from source dbs to target db, table by table. If primary key(s) are provided,
and there is an existing row with the same PK, the default action is to delete the existing row and insert the new row
replacing all existing fields.

Upsert mode will update each matching PK row such that if a source row has a NULL field and
the destination row has a value then the value will be preserved instead of changed to the source row's NULL value.

Ignore mode (--only-new-rows) will insert only rows which don't already exist in the destination db
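
The difference between the three modes is easiest to see on a single conflicting row. The sketch below uses plain `sqlite3` statements that behave the way the modes are described above; it is an illustration, not the SQL that merge-dbs runs. The table layout and the `merge_row` helper are made up for the example.

```python
import sqlite3


def merge_row(mode):
    """Insert a conflicting row using the given mode and return the table contents."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE media (path TEXT PRIMARY KEY, title TEXT, duration INTEGER)")
    con.execute("INSERT INTO media VALUES ('a.mkv', 'Old title', 60)")
    incoming = ("a.mkv", None, 90)  # same PK, NULL title, new duration

    if mode == "replace":  # default: delete the existing row, insert the new one
        con.execute("INSERT OR REPLACE INTO media VALUES (?, ?, ?)", incoming)
    elif mode == "upsert":  # keep existing values where the source has NULL
        con.execute(
            """INSERT INTO media VALUES (?, ?, ?)
               ON CONFLICT(path) DO UPDATE SET
                   title = COALESCE(excluded.title, title),
                   duration = COALESCE(excluded.duration, duration)""",
            incoming,
        )
    elif mode == "ignore":  # --only-new-rows: existing rows are left alone
        con.execute("INSERT OR IGNORE INTO media VALUES (?, ?, ?)", incoming)
    return con.execute("SELECT * FROM media").fetchall()


for mode in ("replace", "upsert", "ignore"):
    print(mode, merge_row(mode))
# replace [('a.mkv', None, 90)]
# upsert [('a.mkv', 'Old title', 90)]
# ignore [('a.mkv', 'Old title', 60)]
```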

Test first by using temp databases as the destination db.
Try out different modes / flags until you are satisfied with the behavior of the program

    library merge-dbs --pk path tv.db movies.db (mktemp --suffix .db)

Merge database data and tables

    library merge-dbs --upsert --pk path tv.db movies.db video.db
    library merge-dbs --only-target-columns --only-new-rows --table media,playlists --pk path --skip-column id audio-fts.db audio.db

    library merge-dbs --pk id --only-tables subreddits audio.db reddit/81_New_Music.db
    library merge-dbs --only-new-rows --pk subreddit,path --only-tables reddit_posts audio.db reddit/81_New_Music.db -v

To skip copying primary-keys from the source table(s) use --business-keys instead of --primary-keys

Split DBs using --where

    library merge-dbs --pk path big.db specific-site.db -v --only-new-rows -t media,playlists -w 'path like "https://specific-site%"'
copy-play-counts
Copy play history
$ library copy-play-counts -h
usage: library copy-play-counts SOURCE_DB ... DEST_DB [--source-prefix x] [--target-prefix y]

Copy play count information between databases

    library copy-play-counts phone.db audio.db --source-prefix /storage/6E7B-7DCE/d --target-prefix /mnt/d
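
The prefix options suggest that paths recorded on one device are rewritten so that they match the paths on another. A minimal sketch of that idea, assuming a simple leading-string substitution (the real matching logic may differ; `remap_path` is a hypothetical name):

```python
def remap_path(path, source_prefix, target_prefix):
    """Swap a leading source_prefix for target_prefix; leave other paths untouched."""
    if path.startswith(source_prefix):
        return target_prefix + path[len(source_prefix):]
    return path


# A path as it might be stored in phone.db, mapped to its location in audio.db
phone_path = "/storage/6E7B-7DCE/d/music/song.mp3"
desktop_path = remap_path(phone_path, "/storage/6E7B-7DCE/d", "/mnt/d")
```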

Filesystem Database subcommands

disk-usage
Show disk usage
$ library disk-usage -h
usage: library disk-usage DATABASE [--sort-groups-by size | count] [--depth DEPTH] [PATH / SUBSTRING SEARCH]

Only include files smaller than 1 KiB

    library disk-usage du.db --size=-1Ki
    library du du.db -S-1Ki
    | path                                  |      size |   count |
    |---------------------------------------|-----------|---------|
    | /home/xk/github/xk/lb/__pycache__/    | 620 Bytes |       1 |
    | /home/xk/github/xk/lb/.github/        |    1.7 kB |       4 |
    | /home/xk/github/xk/lb/__pypackages__/ |    1.4 MB |    3519 |
    | /home/xk/github/xk/lb/xklb/           |    4.4 kB |      12 |
    | /home/xk/github/xk/lb/tests/          |    3.2 kB |       9 |
    | /home/xk/github/xk/lb/.git/           |  782.4 kB |    2276 |
    | /home/xk/github/xk/lb/.pytest_cache/  |    1.5 kB |       5 |
    | /home/xk/github/xk/lb/.ruff_cache/    |   19.5 kB |     100 |
    | /home/xk/github/xk/lb/.gitattributes  | 119 Bytes |         |
    | /home/xk/github/xk/lb/.mypy_cache/    | 280 Bytes |       4 |
    | /home/xk/github/xk/lb/.pdm-python     |  15 Bytes |         |

Only include files with a specific depth

    library disk-usage du.db --depth 19
    library du du.db -d 19
    | path                                                                                                                                                                |     size |
    |---------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
    | /home/xk/github/xk/lb/__pypackages__/3.11/lib/jedi/third_party/typeshed/third_party/2and3/requests/packages/urllib3/packages/ssl_match_hostname/__init__.pyi        | 88 Bytes |
    | /home/xk/github/xk/lb/__pypackages__/3.11/lib/jedi/third_party/typeshed/third_party/2and3/requests/packages/urllib3/packages/ssl_match_hostname/_implementation.pyi | 81 Bytes |
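
To make the depth option concrete, here is a rough sketch of grouping file sizes by the first N path components. It assumes that depth counts components from the filesystem root and that paths shallower than the requested depth are left out; both are assumptions read off the example output, and `disk_usage` is a hypothetical function, not the tool's code.

```python
def disk_usage(files, depth):
    """Group (path, size) pairs by their first `depth` path components.

    Folders map to (total_size, file_count); a file sitting exactly at the
    requested depth maps to (size, None), mirroring the empty count column.
    """
    totals = {}
    for path, size in files:
        parts = path.strip("/").split("/")
        if len(parts) < depth:
            continue  # assumption: paths shallower than the depth are skipped
        if len(parts) == depth:
            totals[path] = (size, None)
            continue
        folder = "/" + "/".join(parts[:depth]) + "/"
        old_size, old_count = totals.get(folder, (0, 0))
        totals[folder] = (old_size + size, old_count + 1)
    return totals


example = disk_usage(
    [("/a/b/x.txt", 10), ("/a/b/y.txt", 5), ("/a/c.txt", 1), ("/a/d/e/f.txt", 2)],
    depth=2,
)
```

Under this reading the two paths in the table above each have exactly 19 components, which is why they appear as individual files rather than as folders.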
search-db
Search a SQLITE database
$ library search-db -h
usage: library search-db DATABASE TABLE SEARCH ... [--delete-rows]

Search all columns in a SQLITE table. If the table does not exist, the table whose name starts with the given string is used (if there is only one match)

Media Database subcommands

block
Block a channel
$ library block -h
usage: library block DATABASE URL ...

Blocklist specific URLs (eg. YouTube channels, etc)

    library block dl.db https://annoyingwebsite/etc/

Or URL substrings

    library block dl.db "%fastcompany.com%"

Block videos from the playlist uploader

    library block dl.db --match-column playlist_path 'https://youtube.com/playlist?list=PLVoczRgDnXDLWV1UJ_tO70VT_ON0tuEdm'

Or other columns

    library block dl.db --match-column title "% bitcoin%"
    library block dl.db --force --match-column uploader Zeducation

Display subdomains (similar to `library download-status`)

    library block audio.db
    subdomain              count    new_links    tried  percent_tried      successful  percent_successful      failed  percent_failed
    -------------------  -------  -----------  -------  ---------------  ------------  --------------------  --------  ----------------
    dts.podtrac.com         5244          602     4642  88.52%                    690  14.86%                    3952  85.14%
    soundcloud.com         16948        11931     5017  29.60%                    920  18.34%                    4097  81.66%
    twitter.com              945          841      104  11.01%                      5  4.81%                       99  95.19%
    v.redd.it               9530         6805     2725  28.59%                    225  8.26%                     2500  91.74%
    vimeo.com                865          795       70  8.09%                      65  92.86%                       5  7.14%
    www.youtube.com       210435       140952    69483  33.02%                  66017  95.01%                    3467  4.99%
    youtu.be               60061        51911     8150  13.57%                   7736  94.92%                     414  5.08%
    youtube.com             5976         5337      639  10.69%                    599  93.74%                      40  6.26%

Find some words to block based on frequency / recency of downloaded media

    library watch dl.db -u time_downloaded desc -L 10000 -pf | library nouns | sort | uniq -c | sort -g
    ...
    183 ArchiveOrg
    187 Documentary
    237 PBS
    243 BBC
    ...
playlists
List stored playlists
$ library playlists -h
usage: library playlists DATABASE

List of Playlists

    library playlists

Search playlists

    library playlists audio.db badfinger
    path                                                        extractor_key    title                             count
    ----------------------------------------------------------  ---------------  ------------------------------  -------
    https://music.youtube.com/channel/UCyJzUJ95hXeBVfO8zOA0GZQ  ydl_Youtube      Uploads from Badfinger - Topic      226

Aggregate Report of Videos in each Playlist

    library playlists -p a


Print only playlist urls

    Useful for piping to other utilities like xargs or GNU Parallel.
    library playlists -p f
    https://www.youtube.com/playlist?list=PL7gXS9DcOm5-O0Fc1z79M72BsrHByda3n

Remove a playlist/channel and all linked videos

    library playlists --delete-rows https://vimeo.com/canal180
download
Download media
$ library download -h
usage: library download DATABASE [--prefix /mnt/d/] --video [--subs] [--auto-subs] [--small] | --audio | --photos [--safe]

Files will be saved to <prefix>/<extractor>/. The default prefix is the current working directory.

By default things will download in a random order

    library download dl.db --prefix ~/output/path/root/

But you can sort; eg. oldest first

    library download dl.db -u m.time_modified,m.time_created

Limit downloads to specified playlist URLs

    library fs video.db --to-json --playlists https://www.youtube.com/c/BlenderFoundation/videos | library download --video video.db --from-json -

Limit downloads to specified video URLs or substrings

    library download dl.db --include https://www.youtube.com/watch?v=YE7VzlLtp-4
    library download dl.db -s https://www.youtube.com/watch?v=YE7VzlLtp-4  # equivalent

Maximize the variety of subdomains

    library download photos.db --photos --image --sort "ROW_NUMBER() OVER ( PARTITION BY SUBSTR(m.path, INSTR(m.path, '//') + 2, INSTR( SUBSTR(m.path, INSTR(m.path, '//') + 2), '/') - 1) )"
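
The sort expression above is dense, so here is a small standalone sketch of what it does: the nested SUBSTR/INSTR calls extract the host from each URL, and ROW_NUMBER() OVER (PARTITION BY host) numbers the rows within each host so that sorting by that number interleaves hosts. The table layout and `interleave_by_host` are assumptions for illustration (the original uses the `m.` alias), and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Same host-extraction expression as in the --sort argument, without the m. alias
HOST = "SUBSTR(path, INSTR(path, '//') + 2, INSTR(SUBSTR(path, INSTR(path, '//') + 2), '/') - 1)"


def interleave_by_host(urls):
    """Return (path, host) rows ordered so that each host takes turns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE media (path TEXT)")
    conn.executemany("INSERT INTO media VALUES (?)", [(u,) for u in urls])
    query = f"""
        SELECT path, host FROM (
            SELECT path,
                   {HOST} AS host,
                   ROW_NUMBER() OVER (PARTITION BY {HOST}) AS rn
            FROM media
        )
        ORDER BY rn
    """
    return conn.execute(query).fetchall()


rows = interleave_by_host([
    "https://a.com/1", "https://a.com/2", "https://a.com/3",
    "https://b.com/1", "https://b.com/2",
    "https://c.com/1",
])
```

The first three rows each come from a different host; the order of hosts within one round is not defined by the query.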

Print list of queued up downloads

    library download --print

Print list of saved playlists

    library playlists dl.db -p a

Print download queue groups

    library download-status audio.db

Check videos before downloading

    library watch open_dir.db --online-media-only --loop --exit-code-confirm -i --action ask-keep -m 4  --start 35% --volume=0 -w 'height<720' -E preview

    Assuming you have bound in mpv input.conf a key to 'quit' and another key to 'quit 4',
    using the ask-keep action will mark a video as deleted when you 'quit 4' and it will mark a video as watched when you 'quit'.

    For example, here I bind "'" to "KEEP" and  "j" to "DELETE"

        ' quit
        j quit 4

    This is pretty intuitive after you use it a few times but another option is to
    define your own post-actions

        `--cmd5 'echo {} >> keep.txt' --cmd6 'echo {} >> rejected.txt'`

    But you will still bind keys in mpv input.conf

        k quit 5  # goes to keep.txt
        r quit 6  # goes to rejected.txt
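
The mechanism described above boils down to reading the player's exit code and dispatching on it. A minimal sketch of that pattern follows; the action names and the `play` helper are hypothetical, and a short Python subprocess stands in for mpv so the example runs anywhere.

```python
import subprocess
import sys

# Hypothetical mapping that mirrors the key bindings described above
ACTIONS = {
    0: "mark_watched",            # ' quit
    4: "mark_deleted",            # j quit 4
    5: "append_to_keep_txt",      # k quit 5
    6: "append_to_rejected_txt",  # r quit 6
}


def action_for_exit_code(code):
    """Translate a player exit code into a post-action name."""
    return ACTIONS.get(code, "no_action")


def play(player_cmd):
    """Run the player and decide what to do from its exit code."""
    result = subprocess.run(player_cmd)
    return action_for_exit_code(result.returncode)


# Stand-in for mpv exiting through a key bound to `quit 4`
fake_player = [sys.executable, "-c", "import sys; sys.exit(4)"]
chosen = play(fake_player)
```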

Download checked videos

    library download --fs open_dir.db --prefix ~/d/dump/video/ -w 'id in (select media_id from history)'
download-status
Show download status
$ library download-status -h
usage: library download-status DATABASE

Print download queue groups

    library download-status video.db

Simulate --safe flag

    library download-status video.db --safe
redownload
Re-download deleted/lost media
$ library redownload -h
usage: library redownload DATABASE

If you have previously downloaded YouTube or other online media, but your
hard drive failed or you accidentally deleted something, and if that media
is still accessible from the same URL, this script can help to redownload
everything that was scanned-as-deleted between two timestamps.

List deletions

    library redownload news.db
    Deletions:
    ╒═════════════════════╤═════════╕
    │ time_deleted        │   count │
    ╞═════════════════════╪═════════╡
    │ 2023-01-26T00:31:26 │     120 │
    ├─────────────────────┼─────────┤
    │ 2023-01-26T19:54:42 │      18 │
    ├─────────────────────┼─────────┤
    │ 2023-01-26T20:45:24 │      26 │
    ╘═════════════════════╧═════════╛
    Showing most recent 3 deletions. Use -l to change this limit

Mark videos as candidates for download via specific deletion timestamp

    library redownload city.db 2023-01-26T19:54:42

...or between two timestamps inclusive

    library redownload city.db 2023-01-26T19:54:42 2023-01-26T20:45:24
history
Show and manage playback history
$ library history -h
usage: library history [--frequency daily weekly (monthly) yearly] [--limit LIMIT] DATABASE [(all) watching watched created modified deleted]

View playback history

    library history web_add.image.db
    In progress:
      play_count  time_last_played    playhead    path                                     title
    ------------  ------------------  ----------  ---------------------------------------  -----------
               0  today, 20:48        2 seconds   https://siliconpr0n.org/map/COPYING.txt  COPYING.txt

Show only completed history

    library history web_add.image.db --completed

Show only in-progress history

    library history web_add.image.db --in-progress

Delete history

    Delete two hours of history
    library history web_add.image.db --played-within '2 hours' -L inf --delete-rows

    Delete all history
    library history web_add.image.db -L inf --delete-rows

See also: library stats -h
          library history-add -h
history-add
Add history from paths
$ library history-add -h
usage: library history-add DATABASE PATH ...

Add history

    library history-add links.db $urls $paths
    library history-add links.db (cb)

Items that don't already exist in the database will be counted under "skipped"
stats
Show some event statistics (created, deleted, watched, etc)
$ library stats -h
usage: library stats DATABASE TIME_COLUMN

View watched stats

    library stats video.db --completed

View download stats

    library stats video.db time_downloaded --frequency daily

    See also: library stats video.db time_downloaded -f daily --hide-deleted

View deleted stats

    library stats video.db time_deleted

View time_modified stats

    library stats example_dbs/web_add.image.db time_modified -f year
    Time_Modified media:
    year      total_size    avg_size    count
    ------  ------------  ----------  -------
    2010         4.4 MiB     1.5 MiB        3
    2011       136.2 MiB    68.1 MiB        2
    2013         1.6 GiB    10.7 MiB      154
    2014         4.6 GiB    25.2 MiB      187
    2015         4.3 GiB    26.5 MiB      167
    2016         5.1 GiB    46.8 MiB      112
    2017         4.8 GiB    51.7 MiB       95
    2018         5.3 GiB    97.9 MiB       55
    2019         1.3 GiB    46.5 MiB       29
    2020        25.7 GiB   113.5 MiB      232
    2021        25.6 GiB    96.5 MiB      272
    2022        14.6 GiB    82.7 MiB      181
    2023        24.3 GiB    72.5 MiB      343
    2024        17.3 GiB   104.8 MiB      169
    14 media
search
Search captions / subtitles
$ library search -h
usage: library search DATABASE QUERY

Search text databases and subtitles

    library search fts.db boil
        7 captions
        /mnt/d/70_Now_Watching/DidubeTheLastStop-720p.mp4
           33:46 I brought a real stainless steel boiler
           33:59 The world is using only stainless boilers nowadays
           34:02 The boiler is old and authentic
           34:30 - This boiler? - Yes
           34:44 I am not forcing you to buy this boiler…
           34:52 Who will give her a one liter stainless steel boiler for one Lari?
           34:54 Glass boilers cost two

Search and open file

    library search fts.db 'two words' --open
optimize
Re-optimize database
$ library optimize -h
usage: library optimize DATABASE [--force]

Optimize library databases

The force flag is usually unnecessary and it can take much longer

Playback subcommands

watch
Watch / Listen
$ library watch -h
usage: library watch DATABASE [optional args]

Control playback

    To stop playback press Ctrl-C in either the terminal or mpv

    Or use `lb next` or `lb stop`

    Or create global shortcuts in your desktop environment by sending commands to mpv_socket
    echo 'playlist-next force' | socat - /run/user/1000/mpv_socket  # library listen default
    echo 'playlist-next force' | socat - /home/xk/.config/mpv/socket  # library watch default

    If you prefer you can also send mpv the playlist, but this is incompatible with post-actions
    mpv --playlist=(lb wt videos.db --ext mp4 -l 50 -p fw | psub)  # fish shell, mark 50 videos as watched
    mpv --playlist=<(lb wt videos.db --ext mp4 -p f)  # BASH, all videos
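
If socat is not available, the same socket can be driven from a script. The sketch below uses mpv's JSON IPC format (one JSON object per line with a "command" array) instead of the plain-text command shown above; the function names are hypothetical and the socket path is whichever one your mpv instance was started with.

```python
import json
import socket


def encode_command(*args):
    """Build one newline-terminated JSON IPC message for mpv."""
    return (json.dumps({"command": list(args)}) + "\n").encode("utf-8")


def send_mpv_command(socket_path, *args):
    """Connect to the mpv IPC socket and send a single command."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(encode_command(*args))


message = encode_command("playlist-next", "force")
# Usage, assuming mpv is listening on this path:
# send_mpv_command("/run/user/1000/mpv_socket", "playlist-next", "force")
```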

Print an aggregate report of deleted media

    library fs -w time_deleted!=0 -pa
    path         count  duration               size
    ---------  -------  ------------------  -------
    Aggregate      337  2 days and 5 hours  1.6 GiB

Print an aggregate report of media that has no duration information (ie.
online media or corrupt local media)

    library watch -w 'duration is null' -pa

Print a list of filenames which have below 1280px resolution

    library watch -w 'width<1280' -pf

View how much time you have played

    library watch -w play_count'>'0 -pa

View all the columns

    library watch -p -L 1 --cols '*'

Open ipython with all of your media

    library watch -vv -p --cols '*'
    ipdb> len(media)
    462219

View most recent files

    library watch example_dbs/web_add.image.db -u time_modified desc --cols path,width,height,size,time_modified -p -l 10
    path                                                                                                                      width    height       size  time_modified
    ----------------------------------------------------------------------------------------------------------------------  -------  --------  ---------  -----------------
    https://siliconpr0n.org/map/infineon/m7690-b1/single/infineon_m7690-b1_infosecdj_mz_nikon20x.jpg                           7066     10513   16.4 MiB  2 days ago, 20:54
    https://siliconpr0n.org/map/starchip/scf384g/single/starchip_scf384g_infosecdj_mz_nikon20x.jpg                            10804     10730   19.2 MiB  2 days ago, 15:31
    https://siliconpr0n.org/map/hp/2hpt20065-1-68k-core/single/hp_2hpt20065-1-68k-core_marmontel_mz_ms50x-1.25.jpg            28966     26816  192.2 MiB  4 days ago, 15:05
    https://siliconpr0n.org/map/hp/2hpt20065-1-68k-core/single/hp_2hpt20065-1-68k-core_marmontel_mz_ms20x-1.25.jpg            11840     10978   49.2 MiB  4 days ago, 15:04
    https://siliconpr0n.org/map/hp/2hpt20065-1/single/hp_2hpt20065-1_marmontel_mz_ms10x-1.25.jpg                              16457     14255  101.4 MiB  4 days ago, 15:03
    https://siliconpr0n.org/map/pervasive/e2213ps01e1/single/pervasive_e2213ps01e1_azonenberg_back_roi1_mit10x_rotated.jpg    18880     61836  136.8 MiB  6 days ago, 16:00
    https://siliconpr0n.org/map/pervasive/e2213ps01e/single/pervasive_e2213ps01e_azonenberg_back_mit5x_rotated.jpg            62208     30736  216.5 MiB  6 days ago, 15:57
    https://siliconpr0n.org/map/amd/am2964bpc/single/amd_am2964bpc_infosecdj_mz_lmplan10x.jpg                                 12809     11727   39.8 MiB  6 days ago, 10:28
    https://siliconpr0n.org/map/unknown/ks1804ir1/single/unknown_ks1804ir1_infosecdj_mz_lmplan10x.jpg                          6508      6707    8.4 MiB  6 days ago, 08:04
    https://siliconpr0n.org/map/amd/am2960dc-b/single/amd_am2960dc-b_infosecdj_mz_lmplan10x.jpg                               16434     15035   64.9 MiB  7 days ago, 19:01
    10 media (limited by --limit 10)

How I use it

    lb lt ~/lb/audio.db --local-media-only -k delete-if-audiobook -w play_count=0 --fetch-siblings each
    lb wt ~/lb/video.db --local-media-only -k delete --cmd5 'echo skip'

    When sorting videos
    focus_under_mouse
    lb wt ~/lb/sort.db --action ask_move_or_delete --keep-dir /home/xk/d/library/video/ --loop --exit-code-confirm -i --cmd130 exit_multiple_playback --cmd5 'library process-audio --no-preserve-video' --cmd6 'mv {} /mnt/d/library/vr/' -m 4 --start 35% --volume=0 -u size desc
    focus_follows_mouse

    On-the-go mobile smartphone mode (Android)
    repeat lb wt ~/lb/video.db --player termux-open -L1 --refresh --action ask_move_or_delete --keep-dir ~/sync/video/keep/ --portrait -u duration desc
now
Show what is currently playing
$ library now -h
usage: library now

Print now playing
next
Play next file and optionally delete current file
$ library next -h
usage: library next

Go to the next track in the playqueue, optionally delete the currently playing media
seek
Set playback to a certain time, fast-forward or rewind
$ library seek -h
usage: library seek

Seek to an exact time

    library seek 5:30     # 5 minutes, 30 seconds
    library seek 5:30:00  # 5 hours, 30 minutes

Seek forward or backward a relative duration

    library seek +5:00    # 5 minutes forward
    library seek +5:      # 5 minutes forward
    library seek +5       # 5 seconds forward
    library seek 5        # 5 seconds forward

    library seek -5       # 5 seconds backward
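
The examples above imply a small grammar: a leading sign always means a relative seek, an unsigned value containing a colon is an absolute time, and a bare number is a relative seek forward. Here is a sketch of a parser consistent with those examples; it is inferred from the help text rather than taken from the tool, and `parse_seek` is a hypothetical name.

```python
def parse_seek(arg):
    """Return ("absolute" | "relative", seconds) for a seek argument."""
    signed = arg[0] in "+-"
    sign = -1 if arg[0] == "-" else 1
    body = arg[1:] if signed else arg

    # Fold H:M:S (or M:S) into seconds; an empty field such as in "5:" counts as 0
    seconds = 0
    for field in body.split(":"):
        seconds = seconds * 60 + (int(field) if field else 0)

    if not signed and ":" in body:
        return ("absolute", seconds)
    return ("relative", sign * seconds)
```

For instance, `parse_seek("5:30")` is an absolute seek to 330 seconds, while `parse_seek("5")` and `parse_seek("+5")` are both five seconds forward.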
stop
Stop all playback
$ library stop -h
usage: library stop

Stop playback (close mpv, turn off chromecast, etc)
pause
Pause all playback
$ library pause -h
usage: library pause

Pause playback (pause mpv, pause chromecast, etc)
tabs-open
Open your tabs for the day
$ library tabs-open -h
usage: library tabs-open DATABASE

Tabs is meant to run **once per day**. Here is how you would configure it with `crontab`

    45 9 * * * DISPLAY=:0 library tabs /home/my/tabs.db

If things aren't working you can use `at` to simulate a similar environment as `cron`

    echo 'fish -c "export DISPLAY=:0 && library tabs /full/path/to/tabs.db"' | at NOW

Also, if you're just testing things out be aware that `tabs-add` assumes that you visited the
website right before adding it; eg. if you use `tabs-add --frequency yearly` today the tab won't
open until one year from now (at most). You can override this default

    library tabs-add --allow-immediate ...

To re-"play" some tabs, delete some history

    library history ~/lb/tabs.db --played-within '1 day' -L inf -p --delete-rows
    library tabs ~/lb/tabs.db

You can also invoke tabs manually

    library tabs -L 1  # open one tab

Print URLs

    library tabs -w "frequency='yearly'" -p

View how many yearly tabs you have

    library tabs -w "frequency='yearly'" -p a

Delete URLs

    library tabs -p -s cyber
    ╒═══════════════════════════════════════╤═════════════╤══════════════╕
    │ path                                  │ frequency   │ time_valid   │
    ╞═══════════════════════════════════════╪═════════════╪══════════════╡
    │ https://old.reddit.com/r/cyberDeck/to │ yearly      │ Dec 31 1970  │
    │ p/?sort=top&t=year                    │             │              │
    ├───────────────────────────────────────┼─────────────┼──────────────┤
    │ https://old.reddit.com/r/Cyberpunk/to │ yearly      │ Aug 29 2023  │
    │ p/?sort=top&t=year                    │             │              │
    ├───────────────────────────────────────┼─────────────┼──────────────┤
    │ https://www.reddit.com/r/cyberDeck/   │ yearly      │ Sep 05 2023  │
    ╘═══════════════════════════════════════╧═════════════╧══════════════╛

    library tabs -p -w "path='https://www.reddit.com/r/cyberDeck/'" --delete-rows
    Removed 1 metadata records

    library tabs -p -s cyber
    ╒═══════════════════════════════════════╤═════════════╤══════════════╕
    │ path                                  │ frequency   │ time_valid   │
    ╞═══════════════════════════════════════╪═════════════╪══════════════╡
    │ https://old.reddit.com/r/cyberDeck/to │ yearly      │ Dec 31 1970  │
    │ p/?sort=top&t=year                    │             │              │
    ├───────────────────────────────────────┼─────────────┼──────────────┤
    │ https://old.reddit.com/r/Cyberpunk/to │ yearly      │ Aug 29 2023  │
    │ p/?sort=top&t=year                    │             │              │
    ╘═══════════════════════════════════════╧═════════════╧══════════════╛
links-open
Open links from link dbs
$ library links-open -h
usage: library links-open DATABASE [search] [--title] [--title-prefix TITLE_PREFIX]

Open links from a links db

    wget https://github.com/chapmanjacobd/library/raw/main/example_dbs/music.korea.ln.db
    library open-links music.korea.ln.db

Only open links once

    library open-links ln.db -w 'time_modified=0'

Print a preview instead of opening tabs

    library open-links ln.db -p
    library open-links ln.db --cols time_modified -p

Delete rows

    Make sure you have the right search query
    library open-links ln.db "query" -p -L inf
    library open-links ln.db "query" -pa  # view total

    library open-links ln.db "query" -pd  # mark as deleted

Custom search engine

    library open-links ln.db --title --prefix 'https://duckduckgo.com/?q='

Skip local media

    library open-links dl.db --online
    library open-links dl.db -w 'path like "http%"'  # equivalent
surf
Auto-load browser tabs in a streaming way (stdin)
$ library surf -h
usage: library surf [--count COUNT] [--target-hosts TARGET_HOSTS] < stdin

Streaming tab loader: press ctrl+c to stop.

Open tabs from a line-delimited file

    cat tabs.txt | library surf -n 5

You will likely want to use this setting in `about:config`

    browser.tabs.loadDivertedInBackground = True

If you prefer GUI, check out https://unli.xyz/tabsender/

Database enrichment subcommands

dedupe-db
Dedupe SQLITE tables
$ library dedupe-db -h
usage: library dedupe-db DATABASE TABLE --bk BUSINESS_KEYS [--pk PRIMARY_KEYS] [--only-columns COLUMNS]

Dedupe your database (not to be confused with the dedupe subcommand)

It should not need to be said but *back up* your database before trying this tool!

Dedupe-DB will help remove duplicate rows based on non-primary-key business keys

    library dedupe-db ./video.db media --bk path

By default all non-primary and non-business key columns will be upserted unless --only-columns is provided
If --primary-keys is not provided table metadata primary keys will be used
If your duplicate rows contain exactly the same data in all the columns you can run with --skip-upsert to save a lot of time
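
As a rough illustration of business-key deduplication, the sketch below keeps one row per key, folds non-NULL values from the duplicates into it, and deletes the rest. Which row survives and how values are combined are assumptions made for this example (lowest rowid survives, later non-NULL values win); dedupe-db may resolve conflicts differently. `dedupe_by_business_key` is a hypothetical name, and the table and column names are interpolated directly, so they must be trusted.

```python
import sqlite3


def dedupe_by_business_key(conn, table, bk, columns):
    """Collapse rows sharing the same `bk` value into the row with the lowest rowid."""
    keys = conn.execute(
        f"SELECT {bk} FROM {table} WHERE {bk} IS NOT NULL GROUP BY {bk} HAVING COUNT(*) > 1"
    ).fetchall()
    for (key,) in keys:
        rowids = [
            r[0]
            for r in conn.execute(
                f"SELECT rowid FROM {table} WHERE {bk} = ? ORDER BY rowid", (key,)
            )
        ]
        keep, dupes = rowids[0], rowids[1:]
        for dupe in dupes:
            for col in columns:
                # Take the duplicate's value unless it is NULL
                conn.execute(
                    f"UPDATE {table} SET {col} = COALESCE("
                    f"(SELECT {col} FROM {table} WHERE rowid = ?), {col}) WHERE rowid = ?",
                    (dupe, keep),
                )
            conn.execute(f"DELETE FROM {table} WHERE rowid = ?", (dupe,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, path TEXT, title TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO media VALUES (?, ?, ?, ?)",
    [(1, "a.mkv", "Title", None), (2, "a.mkv", None, 120), (3, "b.mkv", "B", 60)],
)
dedupe_by_business_key(conn, "media", "path", ["title", "duration"])
remaining = conn.execute("SELECT * FROM media ORDER BY id").fetchall()
```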
dedupe-media
Dedupe similar media
$ library dedupe-media -h
usage: library dedupe-media [--audio | --id | --title | --filesystem] [--only-soft-delete] [--limit LIMIT] DATABASE

Dedupe your files (not to be confused with the dedupe-db subcommand)

Exact file matches

    library dedupe-media --fs video.db

Dedupe based on duration and file basename or dirname similarity

    library dedupe-media video.db --duration --basename -s release_group  # pre-filter with a specific text substring
    library dedupe-media video.db --duration --basename -u m1.size  # sort such that small files are treated as originals and larger files are deleted
    library dedupe-media video.db --duration --basename -u 'm1.size desc'  # sort such that large files are treated as originals and smaller files are deleted

Dedupe online against local media

    library dedupe-media --compare-dirs video.db / http
merge-online-local
Merge online and local data
$ library merge-online-local -h
usage: library merge-online-local DATABASE

If you have previously downloaded YouTube or other online media, you can dedupe
your database and combine the online and local media records as long as your
files have the youtube-dl / yt-dlp id in the filename.
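
For reference, yt-dlp's default output template puts the id in square brackets before the extension (`%(title)s [%(id)s].%(ext)s`), while youtube-dl's default appends it after a dash (`%(title)s-%(id)s.%(ext)s`). The sketch below extracts an id under those two conventions; it is a heuristic written for illustration, not the matching logic merge-online-local uses, and the dash pattern assumes an 11-character YouTube-style id, so it can produce false positives.

```python
import re
from pathlib import Path

BRACKETED_ID = re.compile(r"\[([0-9A-Za-z_-]+)\]$")       # yt-dlp default template
DASHED_YOUTUBE_ID = re.compile(r"-([0-9A-Za-z_-]{11})$")  # youtube-dl default template


def extract_media_id(filename):
    """Return the id embedded at the end of the file stem, or None."""
    stem = Path(filename).stem
    for pattern in (BRACKETED_ID, DASHED_YOUTUBE_ID):
        match = pattern.search(stem)
        if match:
            return match.group(1)
    return None
```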
mpv-watchlater
Import mpv watchlater files to history
$ library mpv-watchlater -h
usage: library mpv-watchlater DATABASE [--watch-later-directory ~/.config/mpv/watch_later/]

Extract timestamps from MPV to the history table
reddit-selftext
Copy selftext links to media table
$ library reddit-selftext -h
usage: library reddit-selftext DATABASE

Extract URLs from reddit selftext from the reddit_posts table to the media table
tabs-shuffle
Randomize tabs.db a bit
$ library tabs-shuffle -h
usage: library tabs-shuffle DATABASE

Moves each tab to a random day-of-the-week by default

It may also be useful to shuffle monthly tabs, etc. You can accomplish this like so

    library tabs-shuffle tabs.db -d  31 -f monthly
    library tabs-shuffle tabs.db -d  90 -f quarterly
    library tabs-shuffle tabs.db -d 365 -f yearly
pushshift
Convert pushshift data to reddit.db format (stdin)
$ library pushshift -h
usage: library pushshift DATABASE < stdin

Download data (about 600GB jsonl.zst; 6TB uncompressed)

    wget -e robots=off -r -k -A zst https://files.pushshift.io/reddit/submissions/

Load data from files via unzstd

    unzstd --memory=2048MB --stdout RS_2005-07.zst | library pushshift pushshift.db

Or multiple (output is about 1.5TB SQLITE fts-searchable)

    for f in psaw/files.pushshift.io/reddit/submissions/*.zst
        echo "unzstd --memory=2048MB --stdout $f | library pushshift (basename $f).db"
        library optimize (basename $f).db
    end | parallel -j5

Update database subcommands

fs-update
Update local media
$ library fs-update -h
usage: library fs-update DATABASE

Update each path previously saved

    library fsupdate video.db
tube-update
Update online video media
$ library tube-update -h
usage: library tube-update [--audio | --video] DATABASE

Fetch the latest videos for every playlist saved in your database

    library tubeupdate educational.db

Fetch extra metadata

    By default tubeupdate will quickly add media.
    You can run with --extra to fetch more details: (best resolution width, height, subtitle tags, etc)

    library tubeupdate educational.db --extra https://www.youtube.com/channel/UCBsEUcR-ezAuxB2WlfeENvA/videos

Remove duplicate playlists

    library dedupe-db video.db playlists --bk extractor_playlist_id
web-update
Update open-directory media
$ library web-update -h
usage: library web-update DATABASE

Update saved open directories
gallery-update
Update online gallery media
$ library gallery-update -h
usage: library gallery-update DATABASE

Check previously saved gallery_dl URLs for new content
links-update
Update a link-scraping database
$ library links-update -h
usage: library links-update DATABASE

Fetch new links from each path previously saved

    library links-update links.db
reddit-update
Update reddit media
$ library reddit-update -h
usage: library reddit-update [--audio | --video] [--lookback N_DAYS] [--praw-site bot1] DATABASE

Fetch the latest posts for every subreddit/redditor saved in your database

    library redditupdate edu_subreddits.db

Misc subcommands

export-text
Export HTML files from SQLite databases
$ library export-text -h
usage: library export-text DATABASE

Generate HTML files from SQLite databases
dedupe-czkawka
Process czkawka diff output
$ library dedupe-czkawka -h
usage: library dedupe-czkawka [--volume VOLUME] [--auto-seek] [--ignore-errors] [--folder] [--folder-glob [FOLDER_GLOB]] [--replace] [--no-replace] [--override-trash OVERRIDE_TRASH] [--delete-files] [--gui]
           [--auto-select-min-ratio AUTO_SELECT_MIN_RATIO] [--all-keep] [--all-left] [--all-right] [--all-delete]
           czkawka_dupes_output_path

Choose which duplicate to keep by opening both side-by-side in mpv
Chicken mode
       ////////////////////////
      ////////////////////////|
     //////////////////////// |
    ////////////////////////| |
    |    _\/_   |   _\/_    | |
    |     )o(>  |  <)o(     | |
    |   _/ <\   |   /> \_   | |        just kidding :-)
    |  (_____)  |  (_____)  | |_
    | ~~~oOo~~~ | ~~~0oO~~~ |/__|
   _|====\_=====|=====_/====|_ ||
  |_|\_________ O _________/|_|||
   ||//////////|_|\\\\\\\\\\|| ||
   || ||       |\_\\        || ||
   ||/||        \\_\\       ||/||
   ||/||         \)_\)      ||/||
   || ||         \  O /     || ||
   ||             \  /      || LGB

               \________/======
               / ( || ) \\

You can expand all by running this in your browser console:

(() => { const readmeDiv = document.querySelector("article"); const detailsElements = readmeDiv.getElementsByTagName("details"); for (let i = 0; i < detailsElements.length; i++) { detailsElements[i].setAttribute("open", "true"); } })();

library's People

Contributors

chapmanjacobd, deldesir

library's Issues

What is "xk"?

I just stumbled onto this repository today and think it's pretty cool!

Please excuse my ignorance, what does "xk" stand for in the repository description? As in, "xk media library".

I spent a few minutes searching but I think it might be a code name or old project name.

I wanted to see if I was missing something obvious and thought maybe this was part of a larger ecosystem of stuff, but plan to dive into trying this out nonetheless, as the "autotainment" idea resonates with me.

Thank you!

all: follow yt-dlp print arg syntax

Is your feature request related to a problem? Please describe.
The current -p and --cols flags are a little confusing. There are a few benefits but overall the situation could likely be cleaned up significantly and made more intuitive.

Describe the solution you'd like
It would be good to follow a convention like yt-dlp's output template: https://github.com/yt-dlp/yt-dlp#output-template
But careful design will be required to preserve many aspects of the xklb printer (aggregation, etc)

feat: galleryadd, galleryupdate

Is your feature request related to a problem? Please describe.
Integrate gallery-dl to the same level as yt-dlp

Describe the solution you'd like
xklb/galy_extract.py contains some existing work

Additional context
I don't plan on pursuing this at the moment

error: invalid choice: 'media-check'

Describe the bug
when running /home/manderso/.local/bin/library media-check --full-scan unraid-media/movies/american\ psycho/American\ Psycho\ \(2000\).mkv I receive lb: error: invalid choice: 'media-check' (choose from...

Expected behavior
the media-check plugin to be found and get started on its task

To Reproduce
library media-check --full-scan <filename>

  • library --version
    2.2.135

Captions/Subtitle Search feature is broken

Describe the bug
Searching captions/subtitles no longer works
Error:
sqlite3.OperationalError: no such column: m.time_deleted

Expected behavior
Xklb should return the captions found and the titles of the video(s) related to them

To Reproduce

root@box:/usr/local/calibre-web-py3# lb search /library/calibre-web/xklb-metadata.db people
Traceback (most recent call last):
  File "/usr/local/bin/lb", line 8, in <module>
    sys.exit(library())
             ^^^^^^^^^
  File "/root/.local/share/pipx/venvs/xklb/lib/python3.12/site-packages/xklb/lb.py", line 304, in library
    return args.func()
           ^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/xklb/lib/python3.12/site-packages/xklb/lb.py", line 250, in import_func
    return getattr(module, function_name)()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/xklb/lib/python3.12/site-packages/xklb/mediadb/search.py", line 100, in search
    captions = list(args.db.query(query, bindings))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/xklb/lib/python3.12/site-packages/sqlite_utils/db.py", line 503, in query
    cursor = self.execute(sql, params or tuple())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/xklb/lib/python3.12/site-packages/sqlite_utils/db.py", line 521, in execute
    return self.conn.execute(sql, parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: no such column: m.time_deleted
  • library --version 2.8.0.49

Too many values to unpack

Describe the bug
When attempting to use the lb dl command with the provided arguments and configuration, an error occurs during the parsing of the --extractor-config argument. The specific error message is: "argument --extractor-config/-extractor-config: Could not parse argument "['format=bestvideo[height<=720][vcodec=vp9]+bestaudio/best[height<=720][vcodec=vp9]']" as k1=1 k2=2 format too many values to unpack (expected 2)."

To Reproduce

  1. Execute the following command in the terminal:
    lb tubeadd test.db https://youtu.be/WcLlpWmEpQ8  --verbose && \
    lb dl test.db --prefix /home/user/Downloads --extractor-config "writethumbnail=True" \
         --extractor-config "format=bestvideo[height<=720][vcodec=vp9]+bestaudio/best[height<=720][vcodec=vp9]" --video https://youtu.be/WcLlpWmEpQ8 --verbose
    

Expected behavior
The lb dl command should execute without errors, utilizing the specified video URL and extractor configurations, and initiate the download process.

Screenshots
N/A

Desktop:

  • OS: Ubuntu 23.10
  • xklb version: 2.2.165

Additional context
It seems that the issue is related to the parsing of the --extractor-config argument, specifically with the provided configuration string "['format=bestvideo[height<=720][vcodec=vp9]+bestaudio/best[height<=720][vcodec=vp9]']". The error reports too many values to unpack, even though the argument technically contains only two (a key and a value).

Here is the actual log:

library v2.2.165
['/usr/local/bin/lb', 'dl', '/var/tmp/test.db', '--video', 'https://youtu.be/WcLlpWmEpQ8', '--extractor-config', 'writethumbnail=True', '--extractor-config', 'format=bestvideo[height<=720][vcodec=vp9]+bestaudio/best[height<=720][vcodec=vp9]', '--verbose']
usage: library download [--prefix /home/user/Downloads] [--safe] [--subs] [--auto-subs] [--small] DATABASE --video | --audio | --photos

    Files will be saved to <lb download prefix>/<extractor>/. If prefix is not specified the current working directory will be used

    By default things will download in a random order

        library download dl.db --prefix ~/output/path/root/

    Limit downloads to a specified playlist URLs or substring

        library download dl.db https://www.youtube.com/c/BlenderFoundation/videos

    Maximizing the variety of subdomains

        library download photos.db --photos --image --sort "ROW_NUMBER() OVER ( PARTITION BY SUBSTR(m.path, INSTR(m.path, '//') + 2, INSTR( SUBSTR(m.path, INSTR(m.path, '//') + 2), '/') - 1) )"

    Print list of queued up downloads

        library download --print

    Print list of saved playlists

        library playlists dl.db -p a

    Print download queue groups

        library download-status audio.db
        ╒═══════════════╤══════════════════╤════════════════════╤══════════╕
        │ extractor_key │ duration         │   never_downloaded │   errors │
        ╞═══════════════╪══════════════════╪════════════════════╪══════════╡
        │ Soundcloud    │                  │                 10 │        0 │
        ├───────────────┼──────────────────┼────────────────────┼──────────┤
        │ Youtube       │ 10 days, 4 hours │                  1 │     2555 │
        │               │ and 20 minutes   │                    │          │
        ├───────────────┼──────────────────┼────────────────────┼──────────┤
        │ Youtube       │ 7.68 minutes     │                 99 │        1 │
        ╘═══════════════╧══════════════════╧════════════════════╧══════════╛
library download: error: argument --extractor-config/-extractor-config: Could not parse argument "['format=bestvideo[height<=720][vcodec=vp9]+bestaudio/best[height<=720][vcodec=vp9]']" as k1=1 k2=2 format too many values to unpack (expected 2)
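
One plausible cause, offered as an assumption rather than a confirmed diagnosis of xklb's parser: the format value itself contains = characters (height<=720, vcodec=vp9), so splitting the pair on every = yields more than two pieces. A minimal sketch of splitting on the first = only follows; parse_key_value is a hypothetical helper, not a function from xklb.

```python
def parse_key_value(arg: str) -> tuple[str, str]:
    """Split 'key=value' on the first '=' only (hypothetical helper)."""
    # str.partition stops at the first separator, so any later '='
    # characters stay inside the value untouched.
    key, sep, value = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    return key, value


print(parse_key_value("format=bestvideo[height<=720][vcodec=vp9]+bestaudio"))
```

With this approach the key is "format" and everything after the first = is passed through as the value.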

du: textualize tui

Is your feature request related to a problem? Please describe.
I tried using textualize to create a TUI for the disk-usage subcommand but it was taking too much time and was not fun so I gave up.

Describe the solution you'd like

textual console
TEXTUAL=devtools python xklb/scripts/disk_usage.py du.db

Describe alternatives you've considered
Given the different runtime config flags (ie. --size, --depth, --include, --exclude), the value of the tool is diminished when squished into a TUI. So I will not pursue it further but PRs are welcome to make the existing code work or even to add more features to the TUI mode

fs feature: support subs/ subfolder

Is your feature request related to a problem? Please describe.
Some of my videos have a subs/ subfolder with external subtitles

Describe the solution you'd like
A few new functions to search for those common patterns and make the subtitle files known (part of the fs_extract:external_subtitles array) for extraction into the tags column
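
A rough sketch of what one such function could look like. The function name, the folder names, and the extension list are all assumptions for illustration; none of them come from fs_extract.

```python
from pathlib import Path

# Assumed values for illustration only
SUBTITLE_EXTENSIONS = {".srt", ".vtt", ".ass", ".ssa"}
SUBFOLDER_NAMES = {"subs", "subtitles"}


def find_subfolder_subtitles(video_path):
    """Return subtitle files stored in a subs/ style folder next to the video."""
    video = Path(video_path)
    found = []
    for entry in sorted(video.parent.iterdir()):
        # Compare folder names case-insensitively so Subs/ and subs/ both match
        if entry.is_dir() and entry.name.lower() in SUBFOLDER_NAMES:
            for candidate in sorted(entry.rglob("*")):
                if candidate.is_file() and candidate.suffix.lower() in SUBTITLE_EXTENSIONS:
                    found.append(candidate)
    return found
```

The returned paths could then be appended to the external_subtitles array mentioned above.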

Additional context
Very low priority for me because fewer than 1 percent of folders that I have are structured like that

tube: yt-dlp arguments

Is your feature request related to a problem? Please describe.

Requiring the use of API subcommands is kinda weird so it would be good to allow using yt-dlp commands directly:

To distinguish from library arguments the double dash would need to be removed. Not sure how to handle single-dash arguments

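from pprint import pprint

import yt_dlp

def cli_to_api(*opts):
    # Parse an empty command line to learn yt-dlp's default options
    default = yt_dlp.parse_options([]).ydl_opts
    # Keep only the options that differ from the defaults
    diff = {k: v for k, v in yt_dlp.parse_options(opts).ydl_opts.items() if default[k] != v}
    # 'postprocessors' is absent from diff when no postprocessor flag was passed
    if 'postprocessors' in diff:
        diff['postprocessors'] = [pp for pp in diff['postprocessors'] if pp not in default['postprocessors']]
    return diff

pprint(cli_to_api('--embed-metadata', '--embed-thumbnail'))  # Change according to your need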

Output

{'outtmpl': {'pl_thumbnail': ''},
 'postprocessors': [{'add_chapters': True,
                     'add_infojson': 'if_exists',
                     'add_metadata': True,
                     'key': 'FFmpegMetadata'},
                    {'already_have_thumbnail': False, 'key': 'EmbedThumbnail'}],
 'writethumbnail': True}

Stolen from https://discord.com/channels/807245652072857610/1023190793491599391/1023190796826066974

Log progress while downloading

Is your feature request related to a problem? Please describe.
When using lb tubeadd and lb dl with the --verbose option, I wish the yt_dlp output weren't suppressed. The suppression makes it difficult to track the progress of video downloads.

Describe the solution you'd like
I would like the ability to see the progress of video downloads. This could be achieved by allowing an option to display the download progress logs during execution.

Describe alternatives you've considered
I have considered manually modifying tube_backend.py to enable yt_dlp output

   "default_opts": {
    ...
    "quiet": False,
    "noprogress": False,
    "skip_download": False,
    ...
    }

but a built-in option to control log visibility during execution would be more convenient and user-friendly.
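
As a sketch of how such an option might work, the two yt-dlp options shown above could be derived from the verbosity level instead of being hard-coded. progress_opts is a hypothetical helper, and the threshold of one -v is an assumption.

```python
def progress_opts(verbose: int) -> dict:
    """Map a verbosity count to the yt-dlp 'quiet' and 'noprogress' options."""
    show_output = verbose >= 1  # assumed threshold: a single -v enables output
    return {"quiet": not show_output, "noprogress": not show_output}


# Merging over existing defaults; the later dict wins on key conflicts
default_opts = {"quiet": True, "noprogress": True, "skip_download": False}
print({**default_opts, **progress_opts(verbose=1)})
```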

Additional context
Having a real-time progress indicator for downloads would make it easier to monitor the status of ongoing downloads.

fs: split_by_silence without modifying files

Is your feature request related to a problem? Please describe.
Split by silence without needing to modify the source files

Describe the solution you'd like
Because path is the PK the solution should create a new, separate table called tracks or something like that with the start and end timestamps of each section. The playqueue would be lengthened by left join with the tracks table to make the multiple tracks accessible.
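
A small in-memory sketch of that idea. The column names start_time and end_time and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media (path TEXT PRIMARY KEY, duration REAL);
    CREATE TABLE tracks (
        path TEXT REFERENCES media(path),
        start_time REAL,
        end_time REAL
    );
    INSERT INTO media VALUES ('album.mp3', 600), ('single.mp3', 200);
    INSERT INTO tracks VALUES ('album.mp3', 0, 180), ('album.mp3', 185, 600);
    """
)

# LEFT JOIN keeps media that have no tracks (one row with NULL timestamps)
# and yields one row per track for media that were split by silence.
playqueue = conn.execute(
    """
    SELECT m.path, t.start_time, t.end_time
    FROM media m
    LEFT JOIN tracks t ON t.path = m.path
    ORDER BY m.path, t.start_time
    """
).fetchall()

for row in playqueue:
    print(row)
```

Each row with timestamps could then be played as a section of the unmodified source file, for example through mpv's --start and --end options.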

Describe alternatives you've considered

https://stackoverflow.com/questions/40896370/detecting-the-index-of-silence-from-a-given-audio-file-using-python

wt: non-mpv recently played

It might be nice to let people know if something was recently played. This won't overlap 100% with the existing mpv watch_log functionality because things are only marked watched when post_actions are run, while mpv saves timestamp data on a more configurable basis. Still, it might be useful for some:

SELECT * FROM media 
WHERE time_created > cast(STRFTIME('%s', datetime( time_played, 'unixepoch', '-1 month', '-3 hours' )) as int)  
AND play_count > 0

Error: 'Namespace' object has no attribute 'hash'

Describe the bug
When running the lb dl command, an AttributeError is encountered in the xklb/fs_extract.py module. The error message indicates that the 'Namespace' object has no attribute 'hash', leading to the failure of the extraction process.

AttributeError: 'Namespace' object has no attribute 'hash'
> /home/dev/.local/pipx/venvs/xklb/lib/python3.11/site-packages/xklb/fs_extract.py(174)extract_metadata()
    172     }
    173 
--> 174     if mp_args.hash:
    175         # TODO: it would be better if this was saved to and checked against an external global file
    176         media["hash"] = sample_hash.sample_hash_file(path)

To Reproduce

  1. Execute the following command in the terminal:
    lb tubeadd test.db https://www.youtube.com/watch?v=sqoOzGMqCQU
    lb dl test.db --video https://www.youtube.com/watch?v=sqoOzGMqCQU -v
  2. Observe the AttributeError in the xklb/fs_extract.py module.

Expected behavior
The lb dl command should execute successfully without encountering an AttributeError. The 'hash' attribute should be properly handled in the fs_extract.py module, ensuring a smooth extraction process.
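
One defensive pattern, shown only as a sketch and not as the project's actual fix, is to read the attribute with a default so that subcommands which never define --hash do not crash. wants_hash is a hypothetical helper name.

```python
from argparse import Namespace


def wants_hash(mp_args: Namespace) -> bool:
    """Return the hash flag, treating a missing attribute as False."""
    return bool(getattr(mp_args, "hash", False))


print(wants_hash(Namespace(verbose=1)))  # Namespace without a hash attribute
print(wants_hash(Namespace(hash=True)))
```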

Desktop (please complete the following information):

  • OS: Ubuntu 24.04 (Noble Numbat)
  • xklb version: 2.3.004

Additional context
The error occurred within an ipdb (IPython Debugger) session.

Error: unrecognized argument when using reddit-add

Describe the bug
No matter what I try I'm getting an "error: unrecognized arguments: https://old.reddit.com/r/coolgithubprojects/" when using reddit-add. I have already created a test.db file for the database. I've left the type blank and also used text as the type with no luck.

Expected behavior
The command should download the selfposts from a subreddit so that reddit-selftext can then extract the links contained in them.

To Reproduce
Command I'm running:
library redditadd /home/gary/libraryDB/test.db https://old.reddit.com/r/coolgithubprojects/

  • library --version
    2.8.010

im2txt but for video

What is the problem that is being solved with the new feature?

I would like to extract more metadata from videos, objective data that can be used to cluster similar videos together, preferably offline and output <1kb per row per column.

Enumerate an unordered list of alternatives that you've thought about

  • extract frames and run im2txt
  • foundation model
  • use subtitles or generate captions from audio

If applicable, state your preferred solution

It would be nice if there was an existing C, Rust, or Python application or library that can do this already

time limit

  • Split the file: cut the first x minutes into a new file, leave the remainder under the existing filename, then cast or play the temporary file

lb dl downloads more than it should

Suppose I add the following playlist https://www.youtube.com/playlist?list=PLqxP5EuGxPnfmg5P0_96bz9E__-XrLuFc by running:

lb tubeadd test.db https://www.youtube.com/playlist?list=PLqxP5EuGxPnfmg5P0_96bz9E__-XrLuFc -vv

How can I download a single video using its relevant url from all those added to the database by lb tubeadd?

Normally, I'd expect the following command to do the trick:

lb dl test.db --video https://www.youtube.com/watch?v=MhlFR2wWqHA -vv

Instead of downloading only this video, lb dl downloads them all, which makes progress polling problematic and leads to a race condition.

pl: search playlists

Describe the bug

Right now lb pl -s will search the media table but that should not be the default. The default should be to search the playlists table. Perhaps an option could be added to search the media table in addition but that should not be the primary functionality.

The library playlists subcommand should provide interactivity when querying or managing playlists.

xklb 2.2.107 works but 2.2.110 fails — 'lb tubeadd <db> <url>' returns 'sqlite3.OperationalError: no such table: playlists'

Describe the bug

xklb 2.2.107 works but 2.2.110 fails — lb tubeadd <db> <url> returns sqlite3.OperationalError: no such table: playlists

To Reproduce

Example error below; Thanks @chapmanjacobd for taking a look!

lb tubeadd panama5.db https://youtu.be/6GI0zANA3S4
Importing playlist-less media https://youtu.be/6GI0zANA3S4
Traceback (most recent call last):
  File "/home/iiab-admin/.local/lib/python3.10/site-packages/xklb/playlists.py", line 136, in decrease_update_delay
    args.db.conn.execute(
sqlite3.OperationalError: no such table: playlists

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/iiab-admin/.local/bin/lb", line 8, in <module>
    sys.exit(library())
  File "/home/iiab-admin/.local/lib/python3.10/site-packages/xklb/lb.py", line 311, in library
    args.func()
  File "/home/iiab-admin/.local/lib/python3.10/site-packages/xklb/tube_extract.py", line 109, in tube_add
    tube_backend.get_playlist_metadata(args, path, tube_backend.tube_opts(args))
  File "/home/iiab-admin/.local/lib/python3.10/site-packages/xklb/tube_backend.py", line 174, in get_playlist_metadata
    playlists.decrease_update_delay(args, playlist_path)
  File "/home/iiab-admin/.local/lib/python3.10/site-packages/xklb/playlists.py", line 152, in decrease_update_delay
    args.db.conn.execute("ALTER TABLE playlists ADD COLUMN hours_update_delay INTEGER DEFAULT 70")
sqlite3.OperationalError: no such table: playlists

Expected behavior

Eliminating xklb 2.2.110 regression, so things presumably work like 2.2.107 😅
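
The traceback shows the ALTER TABLE statement running against a database that has no playlists table yet, which is the case for playlist-less media. A sketch of one possible guard follows; the helper names are hypothetical and this is not necessarily how xklb resolved the issue.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check sqlite_master for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def ensure_update_delay_column(conn: sqlite3.Connection) -> bool:
    """Add hours_update_delay only when the playlists table exists."""
    if not table_exists(conn, "playlists"):
        return False  # nothing to alter for playlist-less databases
    columns = {row[1] for row in conn.execute("PRAGMA table_info(playlists)")}
    if "hours_update_delay" not in columns:
        conn.execute(
            "ALTER TABLE playlists ADD COLUMN hours_update_delay INTEGER DEFAULT 70"
        )
    return True
```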

Additional context

Ubuntu 22.04.3 LTS

$ uname -a
Linux lrn2 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python3 --version
Python 3.10.12

Just FYI "urllib3" appears twice in pdm.lock

No doubt harmless! ✅

But just FYI if in future dependencies ever need to be organized or reconciled... 💯

library/pdm.lock

Lines 2322 to 2345 in 8e1d57c

[[package]]
name = "urllib3"
version = "2.1.0"
requires_python = ">=3.8"
summary = "HTTP library with thread-safe connection pooling, file post, and more."
files = [
{file = "urllib3-2.1.0-py3-none-any.whl", hash = "sha256:55901e917a5896a349ff771be919f8bd99aff50b79fe58fec595eb37bbc56bb3"},
{file = "urllib3-2.1.0.tar.gz", hash = "sha256:df7aa8afb0148fa78488e7899b2c59b5f4ffcfa82e6c54ccb9dd37c1d7b52d54"},
]
[[package]]
name = "urllib3"
version = "2.1.0"
extras = ["socks"]
requires_python = ">=3.8"
summary = "HTTP library with thread-safe connection pooling, file post, and more."
dependencies = [
"pysocks!=1.5.7,<2.0,>=1.5.6",
"urllib3==2.1.0",
]
files = [
{file = "urllib3-2.1.0-py3-none-any.whl", hash = "sha256:55901e917a5896a349ff771be919f8bd99aff50b79fe58fec595eb37bbc56bb3"},
{file = "urllib3-2.1.0.tar.gz", hash = "sha256:df7aa8afb0148fa78488e7899b2c59b5f4ffcfa82e6c54ccb9dd37c1d7b52d54"},
]

all: follow fd-find size arg syntax

Describe the solution you'd like

I would like to be able to constrain using more than the default unit.

For example, size:

  • current: library watch -z+1024 (more than 1024 MB; this should still be valid)
  • desired: library watch -z+1gb (more than 1GB)

Duration:

  • current: library watch -d+60 (more than 60 mins; this should still be valid)
  • desired: library watch -d+1h (more than 1 hour)
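
A sketch of how both forms could be parsed while keeping the bare-number behaviour. The unit tables, the binary (1024-based) multipliers, the meaning of a leading minus sign, and the function name are assumptions for illustration.

```python
import re

# Assumed unit tables; binary multipliers are chosen so 1024 MB equals 1 GB
SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
DURATION_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_constraint(text, units, default_unit):
    """Turn '+1gb' or '+1024' into (operator, amount in base units)."""
    match = re.fullmatch(r"([+-]?)(\d+(?:\.\d+)?)([a-z]*)", text.strip().lower())
    if match is None:
        raise ValueError(f"cannot parse constraint {text!r}")
    sign, number, unit = match.groups()
    unit = unit or default_unit  # a bare number keeps the current default unit
    if unit not in units:
        raise ValueError(f"unknown unit {unit!r}")
    operator = {"+": ">", "-": "<", "": "="}[sign]
    return operator, int(float(number) * units[unit])


print(parse_constraint("+1gb", SIZE_UNITS, "mb"))
print(parse_constraint("+60", DURATION_UNITS, "m"))
```

Under these assumptions -z+1024 and -z+1gb resolve to the same byte count, and -d+60 and -d+1h to the same number of seconds.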

upscale command

Is your feature request related to a problem? Please describe.

Download higher quality videos; use webpath, PURL, or comment to get the URL

Describe the solution you'd like

The solution would need to check that the media are, in fact, equivalent before replacing the file.

Describe alternatives you've considered

Not related to A.I. upscaling...

GUI

What is the problem that is being solved with the new feature?

It would be good to have a lightweight GUI for people that are afraid of CLI/TUI.

lb disk-usage might be a good candidate for early exploration as the command options are not too complicated and the interactivity loop might be interesting to experiment with

Enumerate an unordered list of alternatives that you've thought about

  • TUI but it seems like the benefits are lower than the cost

If applicable, state your preferred solution

Maybe something like:

Windows Support

I would like to make sure this works on Windows but I don't use Windows so if anyone is willing to try this program out please report any Windows specific issues here

Describe alternatives you've considered
well people could just switch to linux but whatevs

all: multiple simultaneous playback

Is your feature request related to a problem? Please describe.
It might be interesting to experiment with multiple-playback: playing back multiple videos at the same time with different grids

Describe the solution you'd like
Something like -m 3 would open three files at the same time and,

  • if the number matches the number of screens, play fullscreen on each
  • if there are fewer or more screens then it would grid by splitting up the screen into equal parts
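
For the grid case, a sketch of computing one window rectangle per file on a single screen. The near-square layout rule and the function name are assumptions; the strings follow the WxH+X+Y form that mpv's --geometry option accepts.

```python
import math


def grid_geometries(count, screen_width, screen_height):
    """Split one screen into a near-square grid and return WxH+X+Y strings."""
    if count < 1:
        raise ValueError("count must be at least 1")
    cols = math.ceil(math.sqrt(count))
    rows = math.ceil(count / cols)
    cell_w = screen_width // cols
    cell_h = screen_height // rows
    geometries = []
    for index in range(count):
        row, col = divmod(index, cols)  # fill the grid left to right, top to bottom
        geometries.append(f"{cell_w}x{cell_h}+{col * cell_w}+{row * cell_h}")
    return geometries


print(grid_geometries(3, 1920, 1080))
```

With three files on a 1920x1080 screen this produces a 2x2 grid with one empty cell, so the parts are equal in size but do not fill the whole screen.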

Describe alternatives you've considered

Additional considerations

  1. --loop for passing loop to mpv
  2. If one of the videos is closed then another takes its place.
  3. How to handle post-actions?

lt: flag to beep before playing something less than 30 seconds

Is your feature request related to a problem? Please describe.
Sometimes when mobile it's nice to know if some track is short or long. If you press next track then one might accidentally skip the song that just started playing.

Describe the solution you'd like
It would be nice to hear a chime or another soft sound before a short track is played

Describe alternatives you've considered
Perhaps the real solution for this will be to ignore media-key next if the currently playing song playhead is less than 5 seconds within Tasker or something.

Is there a safe/recommended way to upgrade yt-dlp before xklb has quite caught up?

Or is that unwise?

(Many thanks @chapmanjacobd for your suggestions and/or recommendation here!)

Related:

  • https://github.com/yt-dlp/yt-dlp/releases/tag/2023.11.16
  • https://github.com/yt-dlp/yt-dlp/releases/tag/2023.11.14
  • e.g. xklb currently includes yt-dlp 2023.10.13:

    library/pdm.lock

    Lines 2533 to 2549 in e96edb0

    [[package]]
    name = "yt-dlp"
    version = "2023.10.13"
    requires_python = ">=3.7"
    summary = "A youtube-dl fork with additional features and patches"
    dependencies = [
    "brotli; platform_python_implementation == \"CPython\"",
    "brotlicffi; platform_python_implementation != \"CPython\"",
    "certifi",
    "mutagen",
    "pycryptodomex",
    "websockets",
    ]
    files = [
    {file = "yt-dlp-2023.10.13.tar.gz", hash = "sha256:e026ea1c435ff36eef1215bc4c5bb8c479938b90054997ba99f63a4541fe63b4"},
    {file = "yt_dlp-2023.10.13-py2.py3-none-any.whl", hash = "sha256:2b069f22675532eebacdfd6372b1825651a751fef848de9ae6efe6491b2dc38a"},
    ]

Dogsheep RSS feed

Is your feature request related to a problem? Please describe.

rssadd rssupdate commands

Describe the solution you'd like

A stub exists in xklb/rss_extract.py. It should probably keep using feedparser.

Full video description not recorded in database

Describe the bug
After starting a video download with the lb dl command, the complete description/caption of the video is no longer recorded in the description field of the database. Only the web path is stored in this field.

To Reproduce

  1. Execute the following command in the terminal using xklb:
    lb dl /path/to/database.db --video https://youtu.be/example-video-id --verbose
  2. Examine the description field in the database to confirm whether the full description is accurately recorded.

Expected behavior
The entire description of the downloaded video should be captured in the description field of the database entry.

Screenshots
N/A

Desktop (please complete the following information):

  • OS: Ubuntu 24.04 (Noble Numbat)
  • xklb version: 2.2.189

Additional context
The absence of the full description in the database obstructs users from obtaining available information about the downloaded videos, as highlighted in this reported case.

No module named 'pyparsing'

Describe the bug
Upon running the lb tubeadd command with the provided YouTube video link (https://www.youtube.com/watch?v=sqoOzGMqCQU), the following error is encountered:

ModuleNotFoundError: No module named 'pyparsing'

This error suggests that the pyparsing module is missing or not installed in the environment where the lb command is executed.

To Reproduce

  1. Execute the following command in the terminal:
    lb tubeadd https://www.youtube.com/watch?v=sqoOzGMqCQU
  2. Observe the error message indicating the absence of the pyparsing module.

Expected behavior
The lb tubeadd command should execute successfully without encountering a ModuleNotFoundError. The necessary dependencies, including pyparsing, should be available in the environment.

Desktop (please complete the following information):

  • OS: Ubuntu 24.04 (Noble Numbat)
  • xklb version: 2.2.197

Additional context
Should be easily fixable by running the following inside the xklb venv (but it failed):

pip install pyparsing
