
scrapetube's Introduction

Scrapetube

This module helps you scrape YouTube without the official YouTube API and without Selenium.

With this module you can:

  • Get all videos from a YouTube channel.
  • Get all videos from a playlist.
  • Search YouTube.

Installation

pip3 install scrapetube

Usage

Here are a few short code examples.

Get all videos for a channel

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")

for video in videos:
    print(video['videoId'])

Get all videos for a playlist

import scrapetube

videos = scrapetube.get_playlist("PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU")

for video in videos:
    print(video['videoId'])

Make a search

import scrapetube

videos = scrapetube.get_search("python")

for video in videos:
    print(video['videoId'])

Full Documentation

https://scrapetube.readthedocs.io/en/latest/

scrapetube's People

Contributors

ahmetbersoz, beheadedstraw, dermasmid, emresvd, nannosilver, surajbhari, sylvqin, twissell-


scrapetube's Issues

Shouldn't key names use double quotes, instead of single?

If I copy the output of the search result (print(videos)) and paste it into a JSON editor, the editor fails to parse it, because the names are single-quoted. That is, it's like {'name': 'value'}. Shouldn't it be {"name": "value"}?
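
The single quotes come from printing Python dicts (their repr), not from malformed JSON: scrapetube yields plain dicts, not JSON text. A minimal sketch that serializes results to valid, double-quoted JSON with the standard library:

import json

import scrapetube

for video in scrapetube.get_search("python", limit=3):
    print(json.dumps(video))  # json.dumps always emits double-quoted JSON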

One channel has 22,000 videos in total.

What is the limit of scrapetube?
I am using it to get all video links at once from YouTube channels that have a lot of videos. Doing that by hand is not humanly possible, so we need software to do it. But scrapetube can't seem to manage it.

I just want to know why. Is it possible to get 22,000 video links at once or not?
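
For what it's worth, get_channel returns a lazy generator, so for very large channels the practical approach is to stream ids to disk as they arrive instead of collecting them all in memory. A minimal sketch, reusing the channel id from the README example above:

import scrapetube

# one video id per line; memory use stays flat no matter how large the channel is
with open("video_ids.txt", "w") as f:
    for video in scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g"):
        f.write(video["videoId"] + "\n")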

In Flask, scrapetube is getting an error, please help

from flask import Flask, jsonify
from flask_cors import CORS
import scrapetube

app = Flask(__name__)
CORS(app)

@app.route("/")
def hello_world():
    return "Hello world"

@app.route("/name/<string:song_name>")
def search(song_name):
    videos = scrapetube.get_search(song_name, limit=1, sleep=1)
    for video in videos:
        return video["videoId"]  # note: the key is videoId, not videoid

if __name__ == "__main__":
    app.run(debug=True)


After YouTube update nothing works

After the latest update I can't get videos from a channel.

video_generator = scrapetube.get_channel(channel_id=channel_id, limit=5, sort_by="newest")

The generator gives an empty array

get_search weird output length

Hello
Am I doing something wrong? It seems like get_search always finds around 500-600 results even if there are more (see the screenshot below), and the count is not the same from one run to the next (about 600 for "global warming" vs. 2 million in reality?).
Do you know why?

In [1]: import scrapetube
In [2]: videos = scrapetube.get_search("global warming")
In [3]: len([1 for _ in videos])
Out[3]: 604

In [4]: videos = scrapetube.get_search("global warming")
In [5]: len([1 for _ in videos])
Out[5]: 605

In [6]: videos = scrapetube.get_search("global warming")
In [7]: len([1 for _ in videos])
Out[7]: 602

[screenshot: YouTube's website showing about 2 million results for "global warming"]

How to scrape a channel like @Danidev

How do I scrape a channel with an @handle in the URL?

I tried channel_url="https://www.youtube.com/@Danidev" and channel_id="@Danidev" but neither worked.

Any suggestion or help would be appreciated.
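
One workaround, if your scrapetube version predates handle support: resolve the @handle to its UC... channel id yourself and pass that as channel_id. A rough sketch; the regex targets YouTube's current page HTML and may break:

import re

import requests
import scrapetube

def resolve_handle(handle: str) -> str:
    # fetch the channel page and pull the canonical UC... id out of the HTML
    html = requests.get(f"https://www.youtube.com/{handle}").text
    match = re.search(r'"channelId":"(UC[\w-]{22})"', html)
    if match is None:
        raise ValueError(f"could not resolve {handle}")
    return match.group(1)

videos = scrapetube.get_channel(resolve_handle("@Danidev"))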

How I got it working

Partly working for me.

Pip install version did not work. Can't remember the error. (Python 3.8)

Ended up downloading the repository as a zip file and unzipping it into my desired folder.

I now use the lovely code via a modified version of the test script in the 'tests' folder.

That works lovely.

Thanks for sharing 💯

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")

for video in videos:
    print(video['videoId'])

results in:

Traceback (most recent call last):
  File "/tmp/test/main.py", line 5, in <module>
    for video in videos:
  File "/tmp/test/scrapetube.py", line 75, in get_channel
    for video in videos:
  File "/tmp/test/scrapetube.py", line 199, in get_videos
    client = json.loads(
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

[Feature Request] Equivalent of the "UPLOAD DATE" option on YouTube's website for `get_search`.

The get_search method does not have the "UPLOAD DATE" filter that is available on YouTube's website. The thing is, when "SORT BY" is "Upload date", the results differ depending on whether "UPLOAD DATE" is set to "Today" or left unset. From my experiment, get_search returns the same results as searching YouTube's website without setting "UPLOAD DATE".

I need this "UPLOAD DATE" argument because, without it, the search does not return a lot of the new videos.

If adding this argument is technically infeasible, due to YouTube's encryption, obfuscation, or something like that, please close this issue.

Screenshots:

  • Search on YouTube's website
  • Searching for "puppies", with "SORT BY" = "Upload date", "UPLOAD DATE" = "Last hour"
  • Searching for "puppies", with "SORT BY" = "Upload date"

Feature request: Get playlist links?

There is already a playlist_id parameter in scrapetube.get_playlist, which gets the videos in a playlist.


What if I want to get all the playlists from a specific channel, just like I can already get its videos?

get_playlist is limited to 100 videos

Currently it appears that the get_playlist function is only able to collect 100 videos, and if the playlist is longer, it simply skips over the remaining videos. Is it possible for this to be fixed?

Set Accept-Language header to English

Some of the data returned by the internal YouTube API is localized, e.g. the upload date string. If no Accept-Language header is present, YouTube will guess the language from the user's IP.

This leads to inconsistent output data and possible parsing errors if scrapetube is used in a non-English-speaking country.

That's why I would suggest setting the Accept-Language header to en. I am from Germany and can confirm that YouTube outputs English date strings with that setting.

session = requests.Session()
session.headers[
    "User-Agent"
] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36"
session.headers["Accept-Language"] = "en"

[Feature Request] Video Information by ID

Feature Request

Description:

I would like to request a new feature to retrieve video titles and dates published using a video ID.

Feature Details:

  • Feature Name: Retrieve Video Title and Date Published
  • Description: This feature will allow users to fetch the title and date of publication for a video by providing its video ID.
  • Use Case: This feature is essential for users who need to extract video metadata, such as title and publication date, using video IDs.

Suggested Implementation:

The feature could be implemented by adding a new function, e.g., scrapetube.get_video_metadata(video_id: str), which accepts a video ID and returns the video's title and date of publication.
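
For reference, scrapetube already exposes a get_video(id) helper (see the code excerpt in a later issue below), and its output appears to include both title and date. A rough sketch; the key paths mirror YouTube's internal videoPrimaryInfoRenderer layout and may change:

import scrapetube

video = scrapetube.get_video("dQw4w9WgXcQ")
title = video["title"]["runs"][0]["text"]     # video title
published = video["dateText"]["simpleText"]   # absolute date string; this key path is an assumption
print(title, published)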

get_playlist

The get_playlist method, even without the limit arg, always returns at most 100 videos. I noticed this when I tried to get a playlist with 1,000 videos.

Any way to scrape traffic thru a proxy

Very interesting and useful library.

I wonder if there is a way to force the traffic to go through a proxy, instead of directly to YouTube, so as not to get blocked.
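
scrapetube has no proxies argument of its own, but it uses requests internally, and requests honors the standard proxy environment variables by default. A minimal sketch (the proxy URL is a placeholder):

import os

# requests picks these up automatically (trust_env is on by default)
os.environ["HTTP_PROXY"] = "http://proxy.example:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example:8080"

import scrapetube

for video in scrapetube.get_search("python", limit=5):
    print(video["videoId"])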

"ResourceWarning: Unclosed socket" while stopping generator iteration

I would like to exit the get channel generator after I have reached the video I want. However, it seems there is no safe exit, a socket is left open. Below is the code:

import scrapetube

# excerpted from inside a generator function; channel_id and since_video_id come from the caller
def new_video_ids(channel_id, since_video_id):
    channel_videos_generator = scrapetube.get_channel(channel_id=channel_id, sort_by="newest")

    for video in channel_videos_generator:
        if video["videoId"] != since_video_id:
            yield video["videoId"]
        else:
            channel_videos_generator.close()

When I run the above code, I get the warning below:

ResourceWarning: unclosed <ssl.SSLSocket fd=756, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.0.20', 58222), raddr=('216.58.223.78', 443)>
  channel_videos_generator.close()
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Add Channel Name support

This is not a bug; it's a feature request!

YouTube supports at least two formats of channel URL:

  1. /channel/{channel_id}, which is currently covered by the default channel_id argument in the get_channel() function
  2. /c/{channel_name}, which typically pops up when YouTube channels are big enough that they can claim an actual "name" instead of a randomized ID value.

The current workaround is to use the existing channel_url argument; however, I thought it might prove more streamlined to support an actual channel_name argument, in case users want to search videos by a list of actual channel names!

Here's an example of the additional format which this argument would support (one of my favorite streamers): https://www.youtube.com/c/Welyn

I'll make a PR with the changes; open to thoughts / comments / concerns! And, as always, thank you for your work on this useful package / contribution to the open source community!

Cannot go beyond 20K videos

Previously we could get all the video data for each channel. However, recently it is limited to only the last 20k videos.
How can we address this problem?

Published Date?

I only see this in the output:

'publishedTimeText': {
    'simpleText': 'Streamed 59 minutes ago'
},

Is there a way to get a proper date?
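
A partial answer: for a single video, scrapetube's get_video(id) (excerpted in a later issue below) returns data that appears to include an absolute dateText. Otherwise you can approximate a date from the relative string; a rough sketch, where the parsing rules are my own assumptions rather than anything scrapetube provides:

import re
from datetime import datetime, timedelta

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400,
                "week": 604800, "month": 2592000, "year": 31536000}

def approximate_date(relative: str) -> datetime:
    # "Streamed 59 minutes ago" -> roughly now minus 59 minutes
    match = re.search(r"(\d+)\s+(second|minute|hour|day|week|month|year)", relative)
    if match is None:
        raise ValueError(f"unrecognized relative date: {relative!r}")
    seconds = int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return datetime.now() - timedelta(seconds=seconds)

print(approximate_date("Streamed 59 minutes ago"))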

Scraping periodically returns potentially cached results.

Issue: when posting a new video, the first scrape returns the correct video ids including the new video; the next scrape returns the previous videos without the new video; the third scrape returns the correct video ids again.

How to reproduce:
Every 70 seconds, run:

videos = scrapetube.get_channel(channel_username="your-youtube-username")
video_ids = [video["videoId"] for video in videos]
print(video_ids)

Run the script and wait for a print to compare with, then post a video on your YouTube channel, wait for 3+ prints, and compare the results.

I'll use numbers instead of YouTube video IDs to demonstrate the results I get; think of each number as a video ID.

[5,4,3,2,1] (state of the channel before new video posted)
[6,5,4,3,2] (new video posted)
[5,4,3,2,1] (scrape now returns the old state of the channel, the previous 5 videos, is this from cache?)
[6,5,4,3,2] (from now on, it returns the correct video ID's)
[6,5,4,3,2]
[6,5,4,3,2]

EDIT: I increased the scrape period to 120 seconds and that worked

JSONDecodeError

I'm getting the same error as #37 and #36

The package is up to date, as it was installed after the latest release.

Unhandled exception in internal background task 'upload_check'.
Traceback (most recent call last):
  File "C:\Users\lee_p\AppData\Roaming\Python\Python311\site-packages\disnake\ext\tasks\__init__.py", line 162, in _loop
    await self.coro(*args, **kwargs)
  File "C:\Users\lee_p\Documents\Programming\The Handler\cogs\youtube.py", line 27, in upload_check
    for latest_video in videos:
  File "C:\Users\lee_p\AppData\Roaming\Python\Python311\site-packages\scrapetube\scrapetube.py", line 75, in get_channel
    for video in videos:
  File "C:\Users\lee_p\AppData\Roaming\Python\Python311\site-packages\scrapetube\scrapetube.py", line 199, in get_videos
    client = json.loads(
             ^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It was initially working but suddenly stopped.

Unable to get popular YouTube channel videos after 2.4.0

Hi, I am facing issues after latest update of the package.

With scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g", sort_by="popular"), the function is not retrieving popular videos; instead it gets only the recent videos.

Please fix it.

Getting videos of channel not working

It seems that getting videos for a channel is not working because the HTML YouTube responds with doesn't contain the desired JSON anymore.

Consider the example

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")

for video in videos:
    print(video['videoId'])

it currently only leads to

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[removed for privacy]/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 46, in get_channel
    for video in videos:
  File "[removed for privacy]/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 142, in get_videos
    client = json.loads(
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I'm using the latest version 2.2.2. The search example still works fine.

The API is sending old videos

import scrapetube
from discord.ext import commands, tasks  # assumption: a discord.py-style bot; use the disnake/nextcord equivalent

class notify(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
        self.channels = {
            "Airi Viridis Ch. 【V-Dere】": f"@AiriViridis"
        }
        self.videos = {}

    @commands.Cog.listener()
    async def on_ready(self):
        self.check.start()

    @tasks.loop(seconds=60)
    async def check(self):
        discord_channel = self.bot.get_channel(838365746069372982)

        for channel_name in self.channels:
            videos = scrapetube.get_channel(channel_url=self.channels[channel_name], limit=1,content_type="streams")
            video_ids = [video["videoId"] for video in videos]
            print(video_ids)

            if self.check.current_loop == 0:
                self.videos[channel_name] = video_ids
                continue

            for video_id in video_ids:
                if video_id not in self.videos[channel_name]:
                    url = f"https://youtu.be/{video_id}"
                    await discord_channel.send(f"@everyone\n{url}")

            self.videos[channel_name] = video_ids

def setup(bot):
    bot.add_cog(notify(bot))

I don't know if I'm doing something wrong, but when the bot is running, it sends old videos, like 1 month old or more.

The video list is empty when using channel_url

Hello, this library you made is very good, I really like it, but I have a problem while using it.

When I try to get a list of videos from a YouTube channel using channel_url, it doesn't return anything. You can see my code below.

>>> from scrapetube import get_channel
>>> list(get_channel(channel_url = "https://youtube.com/@INSOMNIAFILM?si=idnQuTmk6g5XODzT", limit = 1))
[]

When I use channel_url it doesn't return anything, but when I use the id of the YouTube channel it returns a list of videos.

>>> list(get_channel(channel_id = 'UCoWUsYrb3xtukm91d3_S3jw', limit = 1))
[{'videoId': 'WH692kPmQfQ', 'thumbnail': {'thumbnails': [{'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLDj0SgQg8icqmdeYnY8SuhU-Nm9aQ', 'width': 168, 'height': 94}, {'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEbCMQBEG5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLDZ2tkPU-XkFB3gd56YAA9kjvp1Qw', 'width': 196, 'height': 110}, {'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEcCPYBEIoBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBy3H3kE0Nwo7-zP0E8SyRogwUyDw', 'width': 246, 'height': 138}, {'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEcCNACELwBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBUvFhCh4Ty2CzJy-mnRG9EZ2xEvg', 'width': 336, 'height': 188}]}, 'title': {'runs': [{'text': 'HABIS NONTON FILM INI LANGSUNG PRAKTEKIN ILMUNYA...'}], 'accessibility': {'accessibilityData': {'label': 'HABIS NONTON FILM INI LANGSUNG PRAKTEKIN ILMUNYA... by INSOMNIA FILM 343,444 views 4 days ago 27 minutes'}}}, 'descriptionSnippet': {'runs': [{'text': '"Terinspirasi" dari Film LE BRIO\n\nPengisi Suara :\nRichard Chandra\n\nManager :\nJoko Mulyanto\n\nPenulis :\nSurya Yahya Wijaya\n\nEditor :\nAnton Ramdhani\nDedi Hidayat\nAchmad Regi Permana\n\nMakasih udah...'}]}, 'publishedTimeText': {'simpleText': '4 days ago'}, 'lengthText': {'accessibility': {'accessibilityData': {'label': '27 minutes, 7 seconds'}}, 'simpleText': '27:07'}, 'viewCountText': {'simpleText': '343,444 views'}, 'navigationEndpoint': {'clickTrackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJcWhhVQ29XVXNZcmIzeHR1a205MWQzX1MzaneaAQMQ8jg=', 'commandMetadata': {'webCommandMetadata': {'url': '/watch?v=WH692kPmQfQ', 'webPageType': 'WEB_PAGE_TYPE_WATCH', 'rootVe': 3832}}, 'watchEndpoint': {'videoId': 'WH692kPmQfQ', 'watchEndpointSupportedOnesieConfig': {'html5PlaybackOnesieConfig': {'commonConfig': {'url': 'https://rr1---sn-uxa3vhnxa-ngpe.googlevideo.com/initplayback?source=youtube&oeis=1&c=WEB&oad=3200&ovd=3200&oaad=11000&oavd=11000&ocs=700&oewis=1&oputc=1&ofpcc=1&beids=24350018&msp=1&odepv=1&id=587ebdda43e641f4&ip=182.3.140.209&initcwndbps=270000&mt=1694766340&oweuc='}}}}}, 'ownerBadges': [{'metadataBadgeRenderer': {'icon': {'iconType': 'CHECK_CIRCLE_THICK'}, 'style': 'BADGE_STYLE_TYPE_VERIFIED', 'tooltip': 'Verified', 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'accessibilityData': {'label': 'Verified'}}}], 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJcQPSDmZ-ku6-_WA==', 'showActionMenu': False, 'shortViewCountText': {'accessibility': {'accessibilityData': {'label': '343K views'}}, 'simpleText': '343K views'}, 'menu': {'menuRenderer': {'items': [{'menuServiceItemRenderer': {'text': {'runs': [{'text': 'Add to queue'}]}, 'icon': {'iconType': 'ADD_TO_QUEUE_TAIL'}, 'serviceEndpoint': {'clickTrackingParams': 'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True}}, 'signalServiceEndpoint': {'signal': 'CLIENT_SIGNAL', 'actions': [{'clickTrackingParams': 'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc', 'addToPlaylistCommand': {'openMiniplayer': True, 'videoId': 'WH692kPmQfQ', 'listType': 'PLAYLIST_EDIT_LIST_TYPE_QUEUE', 'onCreateListCommand': {'clickTrackingParams': 'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/playlist/create'}}, 'createPlaylistServiceEndpoint': {'videoIds': ['WH692kPmQfQ'], 'params': 'CAQ%3D'}}, 'videoIds': ['WH692kPmQfQ']}}]}}, 'trackingParams': 
'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'menuServiceItemDownloadRenderer': {'serviceEndpoint': {'clickTrackingParams': 'CPEBENGqBRgIIhMI1uPzk5qsgQMVY071BR1sNwJc', 'offlineVideoEndpoint': {'videoId': 'WH692kPmQfQ', 'onAddCommand': {'clickTrackingParams': 'CPEBENGqBRgIIhMI1uPzk5qsgQMVY071BR1sNwJc', 'getDownloadActionCommand': {'videoId': 'WH692kPmQfQ', 'params': 'CAI%3D'}}}}, 'trackingParams': 'CPEBENGqBRgIIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'menuServiceItemRenderer': {'text': {'runs': [{'text': 'Share'}]}, 'icon': {'iconType': 'SHARE'}, 'serviceEndpoint': {'clickTrackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/share/get_share_panel'}}, 'shareEntityServiceEndpoint': {'serializedShareEntity': 'CgtXSDY5MmtQbVFmUQ%3D%3D', 'commands': [{'clickTrackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'openPopupAction': {'popup': {'unifiedSharePanelRenderer': {'trackingParams': 'CPABEI5iIhMI1uPzk5qsgQMVY071BR1sNwJc', 'showLoadingSpinner': True}}, 'popupType': 'DIALOG', 'beReused': True}}]}}, 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc'}}], 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'accessibility': {'accessibilityData': {'label': 'Action menu'}}}}, 'thumbnailOverlays': [{'thumbnailOverlayTimeStatusRenderer': {'text': {'accessibility': {'accessibilityData': {'label': '27 minutes, 7 seconds'}}, 'simpleText': '27:07'}, 'style': 'DEFAULT'}}, {'thumbnailOverlayToggleButtonRenderer': {'isToggled': False, 'untoggledIcon': {'iconType': 'WATCH_LATER'}, 'toggledIcon': {'iconType': 'CHECK'}, 'untoggledTooltip': 'Watch later', 'toggledTooltip': 'Added', 'untoggledServiceEndpoint': {'clickTrackingParams': 'CO8BEPnnAxgCIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/browse/edit_playlist'}}, 'playlistEditEndpoint': {'playlistId': 'WL', 'actions': [{'addedVideoId': 'WH692kPmQfQ', 'action': 'ACTION_ADD_VIDEO'}]}}, 'toggledServiceEndpoint': {'clickTrackingParams': 'CO8BEPnnAxgCIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/browse/edit_playlist'}}, 'playlistEditEndpoint': {'playlistId': 'WL', 'actions': [{'action': 'ACTION_REMOVE_VIDEO_BY_VIDEO_ID', 'removedVideoId': 'WH692kPmQfQ'}]}}, 'untoggledAccessibility': {'accessibilityData': {'label': 'Watch later'}}, 'toggledAccessibility': {'accessibilityData': {'label': 'Added'}}, 'trackingParams': 'CO8BEPnnAxgCIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'thumbnailOverlayToggleButtonRenderer': {'untoggledIcon': {'iconType': 'ADD_TO_QUEUE_TAIL'}, 'toggledIcon': {'iconType': 'PLAYLIST_ADD_CHECK'}, 'untoggledTooltip': 'Add to queue', 'toggledTooltip': 'Added', 'untoggledServiceEndpoint': {'clickTrackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True}}, 'signalServiceEndpoint': {'signal': 'CLIENT_SIGNAL', 'actions': [{'clickTrackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc', 'addToPlaylistCommand': {'openMiniplayer': True, 'videoId': 'WH692kPmQfQ', 'listType': 'PLAYLIST_EDIT_LIST_TYPE_QUEUE', 'onCreateListCommand': {'clickTrackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/playlist/create'}}, 'createPlaylistServiceEndpoint': {'videoIds': ['WH692kPmQfQ'], 'params': 'CAQ%3D'}}, 'videoIds': ['WH692kPmQfQ']}}]}}, 'untoggledAccessibility': {'accessibilityData': 
{'label': 'Add to queue'}}, 'toggledAccessibility': {'accessibilityData': {'label': 'Added'}}, 'trackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'thumbnailOverlayNowPlayingRenderer': {'text': {'runs': [{'text': 'Now playing'}]}}}], 'richThumbnail': {'movingThumbnailRenderer': {'movingThumbnailDetails': {'thumbnails': [{'url': 'https://i.ytimg.com/an_webp/WH692kPmQfQ/mqdefault_6s.webp?du=3000&sqp=CIDuj6gG&rs=AOn4CLD3RhdFDTEnfUy23ZcDvsbpDMBPSQ', 'width': 320, 'height': 180}], 'logAsMovingThumbnail': True}, 'enableHoveredLogging': True, 'enableOverlay': True}}}]

Can you help me?:)
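
A guess at the cause, based on how the channel URL is used: scrapetube builds page URLs by appending to channel_url, so the ?si=... share token ends up in the middle of the request URL. Stripping query parameters before passing the URL may help; a small sketch:

from urllib.parse import urlsplit, urlunsplit

from scrapetube import get_channel

url = "https://youtube.com/@INSOMNIAFILM?si=idnQuTmk6g5XODzT"
# keep scheme/host/path, drop the query string and fragment
clean = urlunsplit(urlsplit(url)._replace(query="", fragment=""))
print(list(get_channel(channel_url=clean, limit=1)))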

No attribute 'channel_url'

Scrapetube's documentation says this about the parameter 'channel_url':

channel_url (str, optional) – The url to the channel you want to get the videos for. Since there is a few type’s of channel url’s, you can use the one you want by passing it here instead of using channel_id.


...yet I can't actually use that parameter:

>>> videos = scrapetube.channel_url("https://www.youtube.com/c/Cmaj7/videos")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'scrapetube' has no attribute 'channel_url'
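
channel_url is a parameter of get_channel, not a module-level function, so the call should look like this (passing the base channel URL, without the /videos suffix, since scrapetube seems to append that itself):

import scrapetube

videos = scrapetube.get_channel(channel_url="https://www.youtube.com/c/Cmaj7")

for video in videos:
    print(video['videoId'])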

Feature request: Check the publication date of a video & set limit by date

It would be nice to have this feature when getting videos from a channel.

video['publishedTimeText']['simpleText']

will return strings like "13 days ago" or "2 weeks ago", but the result will change over time.
There are also some issues when converting the text to a date.

Is there a way to check the publication date of a video while running get_channel function?

Duration

Is it possible to get the video duration?
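
It is already in the data for regular uploads: each yielded dict carries a lengthText field (you can see it in the large output dump in an earlier issue above). A small sketch; the key layout mirrors YouTube's renderers and can be absent on live items:

import scrapetube

for video in scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g", limit=5):
    duration = video.get("lengthText", {}).get("simpleText")  # e.g. "27:07"
    print(video["videoId"], duration)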

not work in linux when behind proxy

Hi,
I'm trying to use scrapetube on Ubuntu 20.04 (Python 3.8.10) behind a company proxy, set up as follows:
proxies = {"http": "http://x.x.253.137:80", "https": "http://x.x.253.137:80"}

scrapetube returns:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/blairfancy/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 121, in get_search
    for video in videos:
  File "/home/blairfancy/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 138, in get_videos
    client = json.loads(
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I suspect that the proxy blocks the connection.

Strangely, the get_video function doesn't work

I use the Replit free environment.

I kept getting the error message when using the scrapetube.get_video function.
I don't understand why, because the code looks fine.

It works after I created a custom Python file that keeps only the functions I need for get_video.

import json
from typing import Generator

import requests
from typing_extensions import Literal

type_property_map = {
    "videos": "videoRenderer",
    "streams": "videoRenderer",
    "shorts": "reelItemRenderer"
}

def get_session() -> requests.Session:
    session = requests.Session()
    session.headers[
        "User-Agent"
    ] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    session.headers["Accept-Language"] = "en"
    return session

def get_initial_data(session: requests.Session, url: str) -> str:
    session.cookies.set("CONSENT", "YES+cb", domain=".youtube.com")
    response = session.get(url, params={"ucbcb": 1})

    html = response.text
    return html

def get_json_from_html(html: str, key: str, num_chars: int = 2, stop: str = '"') -> str:
    pos_begin = html.find(key) + len(key) + num_chars
    pos_end = html.find(stop, pos_begin)
    return html[pos_begin:pos_end]

def search_dict(partial: dict, search_key: str) -> Generator[dict, None, None]:
    stack = [partial]
    while stack:
        current_item = stack.pop(0)
        if isinstance(current_item, dict):
            for key, value in current_item.items():
                if key == search_key:
                    yield value
                else:
                    stack.append(value)
        elif isinstance(current_item, list):
            for value in current_item:
                stack.append(value)

def get_video(
    id: str,
) -> dict:

    """Get a single video.

    Parameters:
        id (str):
            The video id from the video you want to get.
    """

    session = get_session()
    url = f"https://www.youtube.com/watch?v={id}"
    html = get_initial_data(session, url)
    client = json.loads(
        get_json_from_html(html, "INNERTUBE_CONTEXT", 2, '"}},') + '"}}'
    )["client"]
    session.headers["X-YouTube-Client-Name"] = "1"
    session.headers["X-YouTube-Client-Version"] = client["clientVersion"]
    data = json.loads(
        get_json_from_html(html, "var ytInitialData = ", 0, "};") + "}"
    )
    return next(search_dict(data, "videoPrimaryInfoRenderer"))

[Question] How to search for the next page when I want?

I have tried the following code, but it kept on searching on its own. I am not good at Python, so I am not sure how that is possible, but my guess is that get_search keeps searching asynchronously, and Python's for statement keeps monitoring videos for newly added elements and continues looping?

import scrapetube

videos = scrapetube.get_search("what to search",  sort_by="upload_date")

for video in videos:
    print(video["title"]["runs"][0]["text"])

But what I want is to manually continue the search when I want, in the traditional way: like having a "more" button that shows more search results when the user presses it. Is that possible with this library?
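
It is possible: get_search returns a lazy generator, so nothing is fetched until you ask for the next item, and a "more" button amounts to pulling a fixed number of items at a time. A minimal sketch using itertools.islice:

from itertools import islice

import scrapetube

results = scrapetube.get_search("what to search", sort_by="upload_date")

def more(n=20):
    # each call advances the same generator by up to n results
    return list(islice(results, n))

first_page = more()   # fetched now
second_page = more()  # fetched only when requested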

Shorts thumbnails?

I have plenty of channel links from which I would like to get the video thumbnail URLs (shorts), so that I can add them to a Google Sheet. Can scrapetube achieve that?
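
It can, assuming a scrapetube version with the content_type parameter (the code excerpt in an earlier issue above maps "shorts" to reelItemRenderer). A sketch; the thumbnail key layout follows YouTube's renderers and is an assumption here:

import scrapetube

for short in scrapetube.get_channel(
    "UCCezIgC97PvUuR4_gbFUs5g", content_type="shorts", limit=10
):
    # the last entry in the thumbnails list is usually the largest
    print(short["thumbnail"]["thumbnails"][-1]["url"])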

Add get_user

Hi dermasmid,
first, thanks for your work, I really appreciate it.
I'm testing scrapetube, and I think that adding support for YouTube users (get_user), in a similar way to get_channel, would expand its usefulness. I modified it in order to test.

Again thanks.

Getting videos of channel not working [part.2]

Hi,
I'm using the latest version, 2.5.0. It worked a few days ago, but now it doesn't work anymore; it has the same error it had last year:

for video in videos:

  File "/usr/local/lib/python3.10/dist-packages/scrapetube/scrapetube.py", line 75, in get_channel
    for video in videos:
  File "/usr/local/lib/python3.10/dist-packages/scrapetube/scrapetube.py", line 199, in get_videos
    client = json.loads(
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

scrapetube.get_channel # JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [764]: import scrapetube
     ...: videos = scrapetube.get_channel('@ToolingUSME')
     ...: vs=list(videos)
     ...: print(U.stime(),len(vs))

~/anaconda3/lib/python3.9/site-packages/scrapetube/scrapetube.py in get_channel(channel_id, channel_url, limit, sleep, sort_by)
     48     api_endpoint = "https://www.youtube.com/youtubei/v1/browse"
     49     videos = get_videos(url, api_endpoint, "videoRenderer", limit, sleep)
---> 50     for video in videos:
     51         yield video
     52 

~/anaconda3/lib/python3.9/site-packages/scrapetube/scrapetube.py in get_videos(url, api_endpoint, selector, limit, sleep)
    148         if is_first:
    149             html = get_initial_data(session, url)
--> 150             client = json.loads(
    151                 get_json_from_html(html, "INNERTUBE_CONTEXT", 2, '"}},') + '"}}'
    152             )["client"]

~/anaconda3/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    344             parse_int is None and parse_float is None and
    345             parse_constant is None and object_pairs_hook is None and not kw):
--> 346         return _default_decoder.decode(s)
    347     if cls is None:
    348         cls = JSONDecoder

~/anaconda3/lib/python3.9/json/decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

~/anaconda3/lib/python3.9/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
