
bernardro / actor-youtube-scraper


Apify actor to scrape YouTube search results. You can set the maximum number of videos to scrape per page as well as the date from which to start scraping.

Home Page: https://apify.com/bernardo/youtube-scraper

License: Apache License 2.0

Languages: JavaScript 98.02%, Dockerfile 1.98%
Topics: apifier, apify, crawler, pupetteer, search, youtube

actor-youtube-scraper's People

Contributors

bernardro, levent91, metalwarrior665, olehveselov92, pocesar, rajivm1991, x0r0x, zpelechova

actor-youtube-scraper's Issues

Each request fails with timeout error

Looks like they changed something on the site. The selector is still in the code, but each request fails with:
TimeoutError: waiting for XPath "//ytd-video-primary-info-renderer/div/h1/yt-formatted-string" failed: timeout 30000ms exceeded
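
For anyone debugging this locally, a minimal reproduction sketch in plain Puppeteer (outside the actor's crawler). The XPath is the one from the error above; the fallback CSS selector is only an assumption about the newer watch-page layout:

```js
// Reproduction sketch: wait for the title element the actor expects, then try a
// fallback selector for the redesigned watch-page markup (assumption, may need tweaking).
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.youtube.com/watch?v=BDZ6ujYN610', { waitUntil: 'networkidle2' });

    try {
        // The selector the actor currently waits for (from the error above).
        await page.waitForXPath('//ytd-video-primary-info-renderer/div/h1/yt-formatted-string', { timeout: 30000 });
    } catch (err) {
        // Hypothetical fallback for the newer layout.
        await page.waitForSelector('ytd-watch-metadata h1 yt-formatted-string', { timeout: 30000 });
    }

    const title = await page.$eval('h1 yt-formatted-string', (el) => el.textContent.trim());
    console.log('Title:', title);
    await browser.close();
})();
```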

Only 30 results and old videos missing

{
  "maxResults": 200,
  "postsFromDate": "15 years",
  "verboseLog": false,
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n return item; \n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "handlePageTimeoutSecs": 3600,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "startUrls": [
    {
      "url": "https://www.youtube.com/c/scooterofficial/videos"
    }
  ],
  "customData": {}
}

How to capture the type of subtitles

Hi guys, thanks for this great tool. We would like a new feature so that the actor can determine the type of subtitles provided, i.e. whether they are user-generated or auto-generated.

This info is available on the YouTube video page if the actor clicks the three dots to reveal the transcript popup.

See image here: https://ibb.co/dLYcJFv

Can you guys code this for us in the script? We can pay for your work.
Thank you!
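
Not part of the actor, only a possible direction: a sketch that reads the caption track list from the watch page's player response. The ytInitialPlayerResponse structure and the 'asr' kind marker for auto-generated tracks are assumptions about YouTube's internal player data, not a documented API.

```js
// Hypothetical helper: given a Puppeteer page already on a watch URL, report each
// caption track and whether it appears to be auto-generated ('asr' = automatic
// speech recognition). The shape of ytInitialPlayerResponse is an assumption.
async function getSubtitleTypes(page) {
    return page.evaluate(() => {
        const renderer = window.ytInitialPlayerResponse
            && window.ytInitialPlayerResponse.captions
            && window.ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer;
        const tracks = (renderer && renderer.captionTracks) || [];
        return tracks.map((t) => ({
            language: t.languageCode,
            autoGenerated: t.kind === 'asr',
        }));
    });
}
```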

Too few results

A general search input like this gave only 29 results and some errors:

  "searchKeywords": "makeup ",
  "maxResults": 999,
  "postsFromDate": "6 month ago",
  "startUrl": "https://www.youtube.com/",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "verboseLog": true
}```

Stealth mode failing

It seems all attempts to scrape are failing.

Here are the errors:

"ERROR Stealth: StealthError: Failed to apply stealth trick reason: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: "script-src 'report-sample' 'nonce-7PyoQsK7StgoaW6nndvU9g' 'unsafe-inline'".
2021-03-12T20:49:52.869Z ",

2021-03-12T20:50:03.949Z ERROR Request https://www.youtube.com/watch?v=BDZ6ujYN610 failed too many times
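
A minimal check, assuming plain Puppeteer rather than the actor's stealth integration: the CSP error means injected script evaluation is being blocked, and Puppeteer's setBypassCSP can confirm whether that is the only problem.

```js
// Diagnostic sketch: bypass the page's Content Security Policy so injected
// string evaluation runs. Not a fix inside the actor itself.
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setBypassCSP(true); // must be called before navigation
    await page.goto('https://www.youtube.com/watch?v=BDZ6ujYN610', { waitUntil: 'networkidle2' });
    // Without setBypassCSP, string evaluation like this can be refused under the 'unsafe-eval' rule.
    const title = await page.evaluate('document.title');
    console.log(title);
    await browser.close();
})();
```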

Old videos are not scraped

{
  "maxResults": 999999,
  "postsFromDate": "20 years",
  "verboseLog": false,
  "startUrls": [
    {
      "url": "https://www.youtube.com/user/dysonteam",
      "method": "GET"
    }
  ],
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n  return item; \n  \"title\"; \"likes\"; \"dislikes\"; \"url\"; \"upload date\"\n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "customData": {}
}

It did not scrape 7-year-old videos from the channel.

How to go to the next page?

Hi there, I managed to get the actor working and returning 50 results. The channel we are scraping, however, has many more than 50 videos. How can we get the actor to step through ALL pages so that it scrapes every video on the channel, not just the first 50?
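
Not the actor's own pagination code, but a sketch of the underlying mechanics: channel video grids use infinite scroll rather than numbered pages, so scrolling until the rendered item count stops growing (or a limit is reached) is what exposes the remaining videos. The ytd-grid-video-renderer selector is an assumption about the current markup.

```js
// Hypothetical helper: scroll a channel /videos page until no new video tiles
// appear or maxResults is reached, then return how many tiles are rendered.
async function scrollForAllVideos(page, maxResults = 1000) {
    let previousCount = -1;
    let count = await page.$$eval('ytd-grid-video-renderer', (els) => els.length);
    while (count < maxResults && count !== previousCount) {
        previousCount = count;
        await page.evaluate(() => window.scrollTo(0, document.documentElement.scrollHeight));
        await page.waitForTimeout(2000); // give the next batch of tiles time to load
        count = await page.$$eval('ytd-grid-video-renderer', (els) => els.length);
    }
    return count;
}
```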

Multiple requests per video - ERROR PuppeteerCrawler handleRequestFunction failed

For some reason, when I try to scrape the videos of a channel, multiple requests are made per video because of the error mentioned in the title. Even though the data is collected on the first request, the error makes the request repeat until it has failed too many times. Unfortunately, this has an impact on time and cost, so I would greatly appreciate some feedback on whether this is a known problem or I am doing something wrong.

My input:

{
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n  return item; \n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "handlePageTimeoutSecs": 3600,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "startUrls": [
    {
      "url": "https://www.youtube.com/channel/UCCgVtpDnUeUgOjqKVB3bE_A"
    }
  ],
  "subtitlesLanguage": "en",
  "customData": {},
  "maxComments": 0
}
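
To rule out retry cost while this is investigated, here is a minimal Apify SDK (v1.x) sketch, not the actor's code, showing the crawler options that cap retries; the page-handling body is a placeholder.

```js
// Sketch: a PuppeteerCrawler that retries each failed request at most once,
// so a recurring handler error cannot repeat a video request many times.
const Apify = require('apify');

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({ url: 'https://www.youtube.com/channel/UCCgVtpDnUeUgOjqKVB3bE_A' });

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        maxRequestRetries: 1, // stop re-queuing a request after one retry
        handlePageTimeoutSecs: 3600,
        handlePageFunction: async ({ request }) => {
            // Placeholder: the real actor extracts video data here.
            Apify.utils.log.info(`Handled ${request.url}`);
        },
        handleFailedRequestFunction: async ({ request }) => {
            Apify.utils.log.warning(`Request ${request.url} failed too many times`);
        },
    });

    await crawler.run();
});
```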
