
actor-youtube-scraper's Issues

Only 30 results and old videos missing

{
  "maxResults": 200,
  "postsFromDate": "15 years",
  "verboseLog": false,
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n return item; \n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "handlePageTimeoutSecs": 3600,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "startUrls": [
    {
      "url": "https://www.youtube.com/c/scooterofficial/videos"
    }
  ],
  "customData": {}
}

Multiple requests per video - ERROR PuppeteerCrawler handleRequestFunction failed

For some reason, when I try to scrape the videos of a channel, each video is requested multiple times because of the error in the title. Even though the data is collected on the first request, the error causes the request to be retried until it has failed too many times. This increases both run time and cost, so I would appreciate feedback on whether this is a known problem or whether I am doing something wrong.

My input:

{
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n  return item; \n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "handlePageTimeoutSecs": 3600,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "startUrls": [
    {
      "url": "https://www.youtube.com/channel/UCCgVtpDnUeUgOjqKVB3bE_A"
    }
  ],
  "subtitlesLanguage": "en",
  "customData": {},
  "maxComments": 0
}
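Why a handler error multiplies time and cost can be illustrated with a small simulation. It assumes, as the report describes, that the crawler treats any error thrown from the request handler as a failure and retries the request up to `maxRequestRetries` times (3 here, an assumed value). `processWithRetries` and `makeFlakyHandler` are hypothetical names; this is not the actor's or the Apify SDK's code.

```javascript
// Minimal sketch, NOT the actor's or Apify SDK's code: it assumes a crawler
// that retries a failing request up to `maxRequestRetries` extra times.
function processWithRetries(handler, maxRequestRetries) {
  const errors = [];
  for (let attempt = 1; attempt <= maxRequestRetries + 1; attempt += 1) {
    try {
      return { ok: true, attempts: attempt, result: handler(attempt), errors };
    } catch (err) {
      // A throw after the data was already extracted still marks the request
      // as failed, so the whole page is loaded and processed again.
      errors.push(err.message);
    }
  }
  return { ok: false, attempts: maxRequestRetries + 1, result: undefined, errors };
}

// Hypothetical handler: it "extracts" the video data, then throws anyway.
function makeFlakyHandler(extracted) {
  return (attempt) => {
    extracted.push(`video data (attempt ${attempt})`);
    throw new Error("handleRequestFunction failed");
  };
}

const extracted = [];
const outcome = processWithRetries(makeFlakyHandler(extracted), 3);
console.log(outcome.ok, outcome.attempts, extracted.length); // false 4 4
```

Under these assumptions, one video page is loaded four times even though its data was captured on the first attempt, and the request still ends up marked as failed.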

Each request fails with a timeout error

It looks like YouTube changed something on the site. The selector is there, but each request fails with:
TimeoutError: waiting for XPath "//ytd-video-primary-info-renderer/div/h1/yt-formatted-string" failed: timeout 30000ms exceeded

Too few results

A general search input like this one returned only 29 results and some errors:

{
  "searchKeywords": "makeup ",
  "maxResults": 999,
  "postsFromDate": "6 month ago",
  "startUrl": "https://www.youtube.com/",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "verboseLog": true
}

How to go to the next page?

Hi there, I managed to get the actor working and returning 50 results. However, the channel we are scraping has many more than 50 videos. How can we get the actor to step through all pages so that it scrapes every video on the channel, not just the first 50?
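The idea of stepping through all pages instead of stopping after the first batch can be sketched as a loop that keeps requesting the next page until no continuation token is left or `maxResults` is reached. This is a hypothetical sketch: `fetchPage`, `nextToken`, `collectAllPages`, and `makeFakeChannel` are invented names, and it does not describe how the actor or YouTube actually paginates.

```javascript
// Hypothetical pagination loop, NOT the actor's API: `fetchPage(token)` is an
// assumed function that returns one batch of videos plus a continuation token
// (`nextToken`), which is null once the last page has been served.
function collectAllPages(fetchPage, maxResults) {
  const items = [];
  if (maxResults <= 0) return items;
  let token = null;
  do {
    const page = fetchPage(token);
    for (const item of page.items) {
      items.push(item);
      // Stop as soon as the requested number of results is reached.
      if (items.length >= maxResults) return items;
    }
    token = page.nextToken;
  } while (token !== null);
  return items;
}

// Fake channel with `total` videos, served `pageSize` at a time.
function makeFakeChannel(total, pageSize) {
  return (token) => {
    const start = token === null ? 0 : Number(token);
    const end = Math.min(start + pageSize, total);
    const items = [];
    for (let i = start; i < end; i += 1) items.push(`video-${i}`);
    return { items, nextToken: end < total ? String(end) : null };
  };
}

const all = collectAllPages(makeFakeChannel(120, 50), Infinity);
console.log(all.length); // 120
```

A scraper that returns only the first 50 results behaves as if it stopped after a single `fetchPage` call.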

Stealth mode failing

It seems all attempts to scrape are failing.

Here are the errors:

ERROR Stealth: StealthError: Failed to apply stealth trick reason: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: "script-src 'report-sample' 'nonce-7PyoQsK7StgoaW6nndvU9g' 'unsafe-inline'".
2021-03-12T20:49:52.869Z

2021-03-12T20:50:03.949Z ERROR Request https://www.youtube.com/watch?v=BDZ6ujYN610 failed too many times
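The first log line explains the failure: the page's Content Security Policy lists `script-src` sources but not `'unsafe-eval'`, so the browser refuses to evaluate a string as JavaScript, which is what that stealth trick attempted. A small checker over the policy string makes this concrete. It is a simplified sketch (single policy, only the `script-src` to `default-src` fallback); `parseCsp` and `allowsStringEval` are hypothetical helpers, not part of the actor.

```javascript
// Sketch: parse a Content-Security-Policy value and report whether string
// evaluation (eval, new Function) is permitted for scripts.
// Simplified: handles one policy and only the script-src -> default-src fallback.
function parseCsp(policy) {
  const directives = new Map();
  for (const part of policy.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // Only the first occurrence of a directive is honored; duplicates are ignored.
    if (!directives.has(name)) directives.set(name, tokens.slice(1));
  }
  return directives;
}

function allowsStringEval(policy) {
  const directives = parseCsp(policy);
  // script-src falls back to default-src when it is absent.
  const sources = directives.get("script-src") ?? directives.get("default-src");
  if (sources === undefined) return true; // no script restriction at all
  return sources.some((source) => source.toLowerCase() === "'unsafe-eval'");
}

// The directive quoted in the log above.
const youtubePolicy =
  "script-src 'report-sample' 'nonce-7PyoQsK7StgoaW6nndvU9g' 'unsafe-inline'";
console.log(allowsStringEval(youtubePolicy)); // false
```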

Old videos are not scraped

{
  "maxResults": 999999,
  "postsFromDate": "20 years",
  "verboseLog": false,
  "startUrls": [
    {
      "url": "https://www.youtube.com/user/dysonteam",
      "method": "GET"
    }
  ],
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n  return item; \n  \"title\"; \"likes\"; \"dislikes\"; \"url\"; \"upload date\"\n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "customData": {}
}

The actor did not scrape the channel's videos that are 7 years old.
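For reports like this one it helps to be explicit about what a relative `postsFromDate` value implies. A minimal sketch, assuming the value means "now minus N units" (the actor's real parsing rules are not shown in these issues): with `"20 years"`, a video uploaded seven years earlier falls well inside the window, so a date filter alone should not drop it. `parsePostsFromDate` and `filterByCutoff` are hypothetical helpers, and the sample videos are made up.

```javascript
// Hypothetical parser for relative dates such as "15 years", "6 month ago" or
// "20 years". NOT the actor's implementation: it assumes the value means
// "now minus N units" and only illustrates the cutoff a video must pass.
const UNIT_MS = {
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
};

function parsePostsFromDate(value, now = new Date()) {
  const match = /^\s*(\d+)\s*(minute|hour|day|week|month|year)s?(\s+ago)?\s*$/i.exec(value);
  if (!match) throw new Error(`Unrecognized postsFromDate: "${value}"`);
  const amount = Number(match[1]);
  const unit = match[2].toLowerCase();
  const cutoff = new Date(now.getTime());
  // Months and years use calendar arithmetic (day overflow is not clamped);
  // smaller units are fixed durations.
  if (unit === "month") cutoff.setUTCMonth(cutoff.getUTCMonth() - amount);
  else if (unit === "year") cutoff.setUTCFullYear(cutoff.getUTCFullYear() - amount);
  else cutoff.setTime(cutoff.getTime() - amount * UNIT_MS[unit]);
  return cutoff;
}

// Keep only videos uploaded on or after the cutoff.
function filterByCutoff(videos, cutoff) {
  return videos.filter((video) => new Date(video.uploadDate) >= cutoff);
}

const now = new Date("2021-03-12T00:00:00Z");
const cutoff = parsePostsFromDate("20 years", now);
console.log(cutoff.toISOString()); // 2001-03-12T00:00:00.000Z

// Made-up sample data: one recent video, one roughly seven years old.
const videos = [
  { title: "recent upload", uploadDate: "2020-11-02" },
  { title: "seven-year-old upload", uploadDate: "2014-01-15" },
];
console.log(filterByCutoff(videos, cutoff).length); // 2
```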

How to capture the type of subtitles

Hi guys. Thanks for this great tool. We would like a new feature that lets the actor determine the type of subtitles provided: whether they are user-generated or auto-generated.

This information is available on the YouTube video page; the actor could click the three-dot menu to open the transcript popup.

See image here: https://ibb.co/dLYcJFv

Can you code this into the script for us? We can pay for your work.
Thank you!
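Rather than clicking through the transcript popup, one possible approach is to read caption-track metadata that the page already loads. This sketch assumes tracks are plain objects with `languageCode` and an optional `kind` field, where `kind: "asr"` marks automatic speech recognition captions; that shape is an observation of YouTube's undocumented player data, not a stable API. `classifyCaptionTracks` and `pickTrack` are hypothetical helpers.

```javascript
// Sketch under an assumption: caption tracks are plain objects like the
// entries YouTube has been observed to embed in its (undocumented) player
// data, e.g. { languageCode: "en", kind: "asr" }, where kind "asr" marks
// automatic speech recognition captions. This shape may change at any time.
function classifyCaptionTracks(captionTracks) {
  return captionTracks.map((track) => ({
    languageCode: track.languageCode,
    type: track.kind === "asr" ? "auto-generated" : "user-generated",
  }));
}

// Prefer a user-generated track in the requested language (for example the
// value of "subtitlesLanguage"), falling back to an auto-generated one.
function pickTrack(captionTracks, languageCode) {
  const inLanguage = captionTracks.filter((t) => t.languageCode === languageCode);
  return inLanguage.find((t) => t.kind !== "asr") ?? inLanguage[0] ?? null;
}

// Made-up sample tracks for illustration.
const tracks = [
  { languageCode: "en", kind: "asr" },
  { languageCode: "en" },
  { languageCode: "de" },
];
console.log(classifyCaptionTracks(tracks).map((t) => t.type).join(", "));
// auto-generated, user-generated, user-generated
```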
