yt-fts's People

Contributors

cherrries, danlamanna, dimakov, notjoemartinez, teddybear06, tonym128

yt-fts's Issues

Prevent duplicate subtitle entries in db

The way we currently parse vtt files inserts duplicate quote entries whose timestamps are off by a couple of seconds. This happens because the vtt files we get from yt-dlp contain each entry twice, and one copy carries a lot of markup that segments the quote. See line 192. Removing these duplicates would shrink the database and probably speed up searches.
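
A minimal sketch of one way to drop those duplicates, assuming the parser already yields (start_time, text) pairs in file order and that the extra markup consists of inline tags in angle brackets. The function name and the tuple shape are hypothetical, not the project's actual code:

```python
import re

# Matches inline cue markup such as <c>, </c> or <00:00:01.500>.
TAG_RE = re.compile(r"<[^>]+>")


def dedupe_quotes(entries):
    """Strip inline markup and drop entries that repeat the previous text.

    entries: iterable of (start_time, text) tuples in file order.
    """
    cleaned = []
    last_text = None
    for start_time, text in entries:
        # Remove tags, then collapse runs of whitespace.
        plain = " ".join(TAG_RE.sub("", text).split())
        if not plain or plain == last_text:
            continue
        cleaned.append((start_time, plain))
        last_text = plain
    return cleaned
```

Only consecutive repeats are dropped, so a phrase that is genuinely said again later in the video is kept.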

README: How to use old subtitles db?

The program downloaded many subtitles, which took some time. But then I closed the terminal session, and now it does not appear to recognise the .db file in the current working path. Is there a way to specify the db file to use?

Update database

Hi, can I update my database without downloading all the subtitles of a YouTube channel again?

Missing LICENSE

Hi, what is the license of this code? The LICENSE file is missing.

Support Live Streamed Videos

It seems that the download command only downloads transcripts of uploaded videos.
It would be nice to also support videos that were live streamed.

Alias for channel

Hi, first thanks for this useful package!

It would be great if it could support aliases.

For example:

python3 yt_fts.py alias [NAME] [channel_id]
python3 yt_fts.py search [ALIAS_NAME or ID] [search text]

Even better: when downloading, we could also specify the alias and have it created automatically.
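
A rough sketch of how alias lookup could work, assuming aliases live in their own SQLite table. The table name, column names, and both function names are hypothetical and not part of yt-fts:

```python
import sqlite3

CREATE_ALIASES = (
    "CREATE TABLE IF NOT EXISTS aliases ("
    "name TEXT PRIMARY KEY, channel_id TEXT NOT NULL)"
)


def set_alias(conn, name, channel_id):
    """Create or overwrite an alias for a channel id."""
    conn.execute(CREATE_ALIASES)
    conn.execute(
        "INSERT OR REPLACE INTO aliases (name, channel_id) VALUES (?, ?)",
        (name, channel_id),
    )


def resolve_channel(conn, name_or_id):
    """Return the channel id for an alias, or the input unchanged if no alias matches."""
    conn.execute(CREATE_ALIASES)
    row = conn.execute(
        "SELECT channel_id FROM aliases WHERE name = ?", (name_or_id,)
    ).fetchone()
    return row[0] if row else name_or_id
```

With this shape, the search command could accept either form and pass it through resolve_channel before querying.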

Implement sqlite_utils full-text search

from pr #17

As suggested on HN, yt-fts currently uses the LIKE operator for searches.
The goal here is to leverage SQLite FTS5 full-text search using the sqlite_utils library.

HN suggestion:

It looks like you're running searches using LIKE: https://github.com/NotJoeMartinez/yt-fts/blob/050981c0519a96...

SQLite has a really powerful full-text search mechanism built in - FTS5. It can handle things like stemming and stop words and relevance ranking.

My sqlite-utils Python library includes helper methods for setting that up: https://sqlite-utils.datasette.io/en/stable/python-api.html#...
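
For illustration, here is a small sketch of FTS5 with stemming and relevance ranking. It uses the standard library sqlite3 module directly rather than sqlite_utils, and it assumes the SQLite build bundled with Python has FTS5 enabled. The table name, column names, and function names are hypothetical:

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (video_id, start_time, text) rows."""
    conn = sqlite3.connect(":memory:")
    # The porter tokenizer adds stemming; UNINDEXED columns are stored but not searched.
    conn.execute(
        "CREATE VIRTUAL TABLE subtitles_fts USING fts5("
        "video_id UNINDEXED, start_time UNINDEXED, text, tokenize='porter')"
    )
    conn.executemany(
        "INSERT INTO subtitles_fts (video_id, start_time, text) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def search(conn, query):
    """Return matching rows, best match first."""
    return conn.execute(
        "SELECT video_id, start_time, text FROM subtitles_fts "
        "WHERE subtitles_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
```

Because of stemming, a query for "run" also matches a subtitle containing "running", which a plain LIKE '%run%' search would only catch by accident of substring overlap.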

[Feature request] Allow searching only some videos in channel

This is an alternative to #18 to achieve similar goals.

It would be nice to be able to supply a regex on video titles as well as searching for content.

Using Lex Fridman's channel as an example:

His podcast has 376 videos: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4

However his "channel" has 689 videos: https://www.youtube.com/@lexfridman/videos

After downloading the channel content and querying through the episodes, a regex of /(Podcast)(?! Clips)/ will return all his podcast episodes but none of the other content.

This is obviously not as reliable as allowing a playlist URL but it might be a handy feature nonetheless and would seemingly only involve adjusting the search command with a new flag.
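
A minimal sketch of the title filter, using the regex from the example above. The function name and the sample titles are made up for illustration:

```python
import re


def filter_titles(titles, pattern):
    """Keep only the titles in which the regex matches somewhere."""
    regex = re.compile(pattern)
    return [title for title in titles if regex.search(title)]


# Hypothetical titles, only to show the negative lookahead at work.
titles = [
    "Some Guest: Topic | Lex Fridman Podcast #300",
    "A short moment | Lex Fridman Podcast Clips",
    "A lecture recording",
]
podcast_only = filter_titles(titles, r"(Podcast)(?! Clips)")
```

Here podcast_only keeps the first title only: the second is rejected by the negative lookahead and the third never mentions "Podcast".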

No such file or directory: 'yt-dlp'

I tried to run the example python yt_fts.py download "https://www.youtube.com/@TimDillonShow/videos"
UC4woSp8ITBoYDmjkukhEhxg

and consistently end up with an error No such file or directory: 'yt-dlp'


Downloading channel
Saving vtt files to /var/folders/x7/0r36c9sn7yg7tvs5sdm471000000gn/T/tmpbrh06qzz
The Tim Dillon Show
Traceback (most recent call last):
  File "/Users/saif/WORKSPACE/yt-fts/yt_fts.py", line 273, in <module>
    cli()
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/saif/WORKSPACE/yt-fts/yt_fts.py", line 31, in download
    download_channel(channel_id)
  File "/Users/saif/WORKSPACE/yt-fts/yt_fts.py", line 84, in download_channel
    subprocess.run([
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/saif/opt/anaconda3/envs/yt/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'yt-dlp'
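
The traceback shows subprocess.run failing because no yt-dlp executable is on PATH. A small sketch of a check that could run before the subprocess call and give a clearer message; the function name is hypothetical:

```python
import shutil


def require_executable(name="yt-dlp"):
    """Return the full path of an executable, or raise with an actionable message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH. "
            "Install it (for example with 'pip install yt-dlp') and try again."
        )
    return path
```

Calling require_executable() at the start of the download command would turn the raw FileNotFoundError into an explanation of what is missing.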

Cookies consent page

Hi,

First, thanks for this tool, really useful.

As reported on HN by users in Europe, YouTube shows a cookie consent page that blocks retrieval of the channel_id and, consequently, all subsequent requests.

French version

English version

File ".../yt-fts/yt_fts.py", line 29, in download
    channel_id = get_channel_id(channel_url)
  File ".../yt-fts/yt_fts.py", line 176, in get_channel_id
    channel_id = re.search('channelId":"(.{24})"', html).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

I already faced this issue and adding a cookie indicating that consent has been given to a requests session can "solve" this.

s = requests.session()
s.cookies.set("CONSENT", "YES+1")
[...]
res = s.get(url)

To respect the purpose of this consent page, we could ask the user to give their consent through a CLI argument, like so:

python yt_fts.py download "https://www.youtube.com/@ycombinator/videos" --cookies_consent=1

This is just a suggestion. It could also be a question prompted in the CLI during download, but that requires knowing that the user is in Europe (or it could apply to all users, which would be annoying if it isn't really needed).

I tried to analyse the behavior of the "Reject all" option, but the CONSENT cookie's content is still PENDING+{RANDOM NUMBER} (perhaps not random from Google's POV, but I couldn't explain this value), so from my point of view only "Accept all" is "working".

Do you have any thoughts about this?

Kind regards,
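
Independently of how consent is handled, the AttributeError in the traceback comes from calling .group(1) on a failed re.search. A sketch of a guarded version, reusing the regex from the traceback; the function name and the error wording are assumptions:

```python
import re

# Same pattern as in the traceback above.
CHANNEL_ID_RE = re.compile(r'channelId":"(.{24})"')


def extract_channel_id(html):
    """Return the channel id found in the page HTML, or raise a descriptive error."""
    match = CHANNEL_ID_RE.search(html)
    if match is None:
        raise ValueError(
            "channelId not found in the page; "
            "the response may be a cookie consent page"
        )
    return match.group(1)
```

This does not get past the consent page, but it tells the user why the download stopped.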

Support downloading a specific quote as audio or video.

Imagine you need a sound bite. Currently the workflow is as follows:

  1. You run download for all the subs.
  2. Then you search and find a quote that fits your needs.
  3. Now you need to manually download the file with yt-dl. yt-dl <link> or yt-dl -x <link>
  4. The next step is to cut the media file: ffmpeg -i <input file> -ss <ts> -t <duration> -acodec copy -vcodec copy <output file>

A streamlined workflow could look like this:

  1. download channel subs
  2. search key words
  3. Get quote id from listing
  4. yt-fts quote-dl --audio <ID> to download sound or video bite. Maybe this needs a duration argument?
  5. You find a file name <video-ID>-<quote-ID><Sanitized Quote>.mp3 (or similar) in your working dir.

Done. yt-fts would download the file as specified (e.g. via --audio or --video) and cut it to bits.

Is this something that is in scope for this project? Do any users have this use case?
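
A rough sketch of the two pieces a quote-dl command would need: the ffmpeg argument list from the manual workflow above, and an output file name along the lines suggested. All function names are hypothetical, and the exact file name layout is only one reading of "or similar":

```python
import re


def sanitize(text, max_len=40):
    """Reduce a quote to a file-name-safe slug of limited length."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")
    return slug[:max_len].strip("-")


def clip_filename(video_id, quote_id, quote, ext="mp3"):
    """Build a name of the form <video-ID>-<quote-ID>-<sanitized quote>.<ext>."""
    return f"{video_id}-{quote_id}-{sanitize(quote)}.{ext}"


def build_clip_command(input_file, start, duration, output_file):
    """Return the ffmpeg invocation that cuts the clip without re-encoding."""
    return [
        "ffmpeg", "-i", input_file,
        "-ss", str(start), "-t", str(duration),
        "-acodec", "copy", "-vcodec", "copy",
        output_file,
    ]
```

The returned list can be handed to subprocess.run as is, which avoids shell quoting problems with quotes that contain spaces or punctuation.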

Search across channels

It would be nice if it were possible to search across all downloaded channels.
Maybe with an --all flag?
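
One way this could look at the query level, sketched under the assumption that subtitles sit in a single table with a channel_id column. The table and column names here are hypothetical:

```python
import sqlite3


def search_quotes(conn, text, channel_id=None):
    """Search subtitle text; restrict to one channel only when channel_id is given."""
    sql = "SELECT channel_id, video_id, text FROM subtitles WHERE text LIKE ?"
    params = [f"%{text}%"]
    if channel_id is not None:
        sql += " AND channel_id = ?"
        params.append(channel_id)
    return conn.execute(sql, params).fetchall()
```

An --all flag would then simply call search_quotes without a channel_id.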

Only fetch videos with CC

I think yt-dlp fetches all the videos in a channel, then fetches the stats of each video (checking to see if there are captions).

Large channels where only a single-digit number of videos have captions are slow to download (and hit API limits).

The (paid and official) YouTube API allows you to retrieve the video IDs with captions in a specific channel.

curl

curl \
  'https://youtube.googleapis.com/youtube/v3/search?channelId=[ChannelID]&part=id&type=video&videoCaption=closedCaption&key=[KEY]' \
  --header 'Accept: application/json' \
  --compressed

response

{
  "kind": "youtube#searchListResponse",
  "etag": "995jyKTI3Q_SpXkNvcBCDR77qP0",
  "nextPageToken": "CAUQAA",
  "regionCode": "",
  "pageInfo": {
    "totalResults": 141,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#searchResult",
      "etag": "",
      "id": {
        "kind": "youtube#video",
        "videoId": ""
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "",
      "id": {
        "kind": "youtube#video",
        "videoId": ""
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "",
      "id": {
        "kind": "youtube#video",
        "videoId": ""
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "",
      "id": {
        "kind": "youtube#video",
        "videoId": ""
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "",
      "id": {
        "kind": "youtube#video",
        "videoId": ""
      }
    }
  ]
}
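
Since the response is paginated (5 results per page out of 141 here), the ids have to be collected page by page. A sketch of that loop, assuming a caller-supplied fetch_page function that performs the HTTP request and passes the token as the pageToken parameter. Both function names are hypothetical:

```python
def collect_caption_video_ids(fetch_page):
    """Gather every videoId across all result pages.

    fetch_page(page_token) must return a dict shaped like the
    searchListResponse above; page_token is None for the first page.
    """
    video_ids = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        for item in response.get("items", []):
            video_id = item.get("id", {}).get("videoId")
            if video_id:
                video_ids.append(video_id)
        # The last page carries no nextPageToken.
        page_token = response.get("nextPageToken")
        if not page_token:
            return video_ids
```

Keeping the HTTP call outside the loop makes the pagination logic easy to test without an API key.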

[Feature request] Playlist support

Please add playlist support. Many video collections of interest are organized in playlists and not channels. I don't know if the identifier for playlists is in a different namespace. yt-dlp supports playlists.
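
As a starting point, a small sketch of telling playlist URLs apart from channel URLs, assuming the playlist id is carried in the list query parameter as in YouTube playlist links. The function name is hypothetical:

```python
from urllib.parse import parse_qs, urlparse


def playlist_id_from_url(url):
    """Return the value of the 'list' query parameter, or None if there is none."""
    query = parse_qs(urlparse(url).query)
    values = query.get("list")
    return values[0] if values else None
```

A download command could branch on the result: a non-None id means the URL points at a playlist rather than a channel.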

Fix default database config not being created

On macOS/Linux, the default config path should be

db_path = os.path.join(os.getenv('HOME'), '.config', 'yt-fts', 'subtitles.db')

On Windows

db_path = os.path.join(os.getenv('APPDATA'), 'yt-fts', 'subtitles.db')

For some reason it's defaulting to the current directory.
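
A sketch of how the two cases could be folded into one helper. The function name is hypothetical, and the platform and environment are passed in as parameters only to make the logic easy to test:

```python
import os
import sys


def default_db_path(platform=sys.platform, env=os.environ):
    """Return the default subtitles.db location for the given platform."""
    if platform.startswith("win"):
        config_dir = os.path.join(env["APPDATA"], "yt-fts")
    else:
        config_dir = os.path.join(env["HOME"], ".config", "yt-fts")
    return os.path.join(config_dir, "subtitles.db")
```

The directory itself still has to exist before SQLite can create the file, for example via os.makedirs(os.path.dirname(path), exist_ok=True). A missing directory is one possible reason for a fallback to the current directory, though that is a guess.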

Save database and config files to user .config folder

The script currently saves the database to the current working directory; ideally it should be somewhere like ~/.local/share/yt-fts/subtitles.db. I don't know the best practices for writing software that "invites itself" into a user's config directories.

My general questions are:

  • Do I prompt the user for a config path or just make one without asking?
  • Where do I store these configs on different platforms?
  • Do packages installed through pypi have the system permissions to do this on their own?
  • How is pip uninstall yt-fts supposed to know where this is?

Support for Windows

Hi, this looks like a promising tool. A few points to hopefully help towards Windows support:

  1. The README should be updated with instructions to set up a venv using activate.bat.

  2. What Python version(s) are supported? What versions do we know work with yt-fts?

  3. In its current state, the download command fails to run on Windows. Here is the output from my terminal:

python yt_fts.py download "https://www.youtube.com/@TimDillonShow/videos"

UC4woSp8ITBoYDmjkukhEhxg
Downloading channel
Saving vtt files to C:\Users\FOO\AppData\Local\Temp\tmp6oqtgfyb
The Tim Dillon Show
Traceback (most recent call last):
  File "C:\Users\FOO\Documents\git\yt-fts\yt_fts.py", line 273, in <module>
    cli()
  File "C:\Users\FOO\Documents\git\yt-fts\.env\Lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Documents\git\yt-fts\.env\Lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\FOO\Documents\git\yt-fts\.env\Lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\FOO\Documents\git\yt-fts\.env\Lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\FOO\Documents\git\yt-fts\.env\Lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\FOO\Documents\git\yt-fts\yt_fts.py", line 31, in download
    download_channel(channel_id)
  File "C:\Users\FOO\Documents\git\yt-fts\yt_fts.py", line 84, in download_channel
    subprocess.run([
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1509, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
