vaclavrut / actor-amazon-crawler

Amazon crawler - this configuration extracts items for the keywords you specify in the input and automatically extracts all result pages for each keyword. You can specify several keywords in the input for a single run.

Home Page: https://apify.com

Dockerfile 2.89% JavaScript 97.11%
amazon-crawler extract-items amazon-de amazon-com amazon-extractor apify apify-sdk apify-cli apify-proxy

actor-amazon-crawler's Introduction

Amazon Scraper

Features

This actor crawls items for the specified keywords on Amazon and automatically extracts all result pages for those keywords. The scraper then extracts all seller offers for each item it finds. If the seller offers page is paginated, the scraper follows the pagination, so you get every offer.

Find out more about why you should use this scraper for your business and suggestions on how to use the data in this YouTube Video.

Sample result

{
  "title": "Samsung SE450 Series 27 inch FHD 1920x1080 Desktop Monitor for Business, DVI, VGA, DisplayPort, VESA mountable, 3-Year Warranty, TAA (S27E450D)",
  "thumbnailImage": "https://images-na.ssl-images-amazon.com/images/I/51kKM4aZ+WL.jpg",
  "sellers": [
    {
      "price": "$174.99",
      "priceParsed": 174.99,
      "condition": "Used - Like New",
      "sellerName": "Luigi & Co. LLC",
      "prime": true,
      "shippingInfo": "",
      "shopUrl": "www.amazon.com/gp/aag/main/ref=olp_merch_name_1/?seller=AP5WUUVHWNT7",
      "pricePerUnit": null
    },
    {
      "price": "$208.50",
      "priceParsed": 208.5,
      "condition": "New",
      "sellerName": "Scatterlings Store",
      "prime": true,
      "shippingInfo": "",
      "shopUrl": "www.amazon.com/gp/aag/main/ref=olp_merch_name_2/?seller=A42717TRWCXE4",
      "pricePerUnit": null
    }
  ],
  "asin": "B010N07D4W",
  "itemDetailUrl": "https://www.amazon.com/dp/B010N07D4W",
  "sellerOffersUrl": "https://www.amazon.com/gp/offer-listing/B010N07D4W",
  "currency": "USD",
  "itemDetail": {
    "InStock": true,
    "delivery": "Arrives:  July 30 - Aug 14",
    "featureDesc": "About this item\n\n\n\n\n\n\n\n\n\nThis fits your .\n\n\n\n\n\n Make sure this fits\nby entering your model number.\n\n\n\n\n\n\n\n27-inch 16:9 FHD 1920 x 1200 resolution, LED-backlit LCD screen delivers bright, sharp images with a low-glare TN panel and MagicAngle technology providing a comfortable wide-angle viewing experience\n\n\n\n\nVersatile connectivity options including VGA, DVI, and DisplayPort 1.2 inputs\n\n\n\n\nVESA compatibility enables easy mounting to a wall or monitor stand, along with a fully adjustable stand included with height, tilt, swivel, and pivot features\n\n\n\n\nEye Saver Mode and Flicker-Free technology help minimize eye strain during long working hours\n\n\n\n\n3-Year Business Warranty with extended warranties available for purchase, TAA Compliant for Federal Government Customers",
    "desc": "The Samsung S27E450D 27” desktop business monitor offers the ideal balance between value and features for everyday business use. Offering impressive picture quality at an accessible price point, this business desktop monitor excels across a variety of commercial applications. The Full HD 1920 x 1080 LED low-glare TN panel displays a sharp, bright, and beautiful image, while Mega infinity dynamic contrast ratio ensures subtle detail even in lighter and darker areas of the picture. The monitor is also made with up to 30% recycled plastic, and with a low-energy consumption of less than 0.005W in standby, and true 0W in off mode, the S27E450D is ideal for eco-conscious businesses looking to reduce their carbon footprint and save on energy costs. Additionally, users can adjust the monitor for their ideal ergonomic comfort with the fully adjustable stand including height, tilt, swivel and pivot feature, which lets you use the monitor in portrait mode, or mount the monitor on any VESA compatible mount or stand. To top it off, your investment is secured with a 3-year business warranty.",
    "breadCrumbs": "Electronics›Computers & Accessories›Monitors",
    "NumberOfQuestions": 7,
    "reviewsCount": "21 ratings",
    "stars": "3.7",
    "details": {
      "Standing screen display size": "27 Inches",
      "Max Screen Resolution": "1920 x 1080 Pixels",
      "Brand": "Samsung Business",
      "Series": "S27E450D",
      "Item model number": "LS27E45KDHG/GO",
      "Item Weight": "13.9 pounds",
      "Product Dimensions": "25.2 x 8.8 x 15.7 inches",
      "Item Dimensions  L x W x H": "25.2 x 8.8 x 15.7 inches",
      "Color": "Black",
      "Manufacturer": "Samsung",
      "ASIN": "B010N07D4W",
      "Is Discontinued By Manufacturer": "No",
      "Date First Available": "June 30, 2015",
      "Customer Reviews": "3.7 out of 5 stars\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n21 ratings\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n3.7 out of 5 stars",
      "Best Sellers Rank": "#22,010 in Electronics (See Top 100 in Electronics)\n\n\n#648 in Computer Monitors"
    },
    "images": [
      "https://images-na.ssl-images-amazon.com/images/I/91%2BDRhesUGL._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/81P-%2B%2BINuIL._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/91WIergLFwL._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/81Or5w16-DL._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/81SEgFF5GiL._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/91jMSIRM08L._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/61gnj0tv%2BDL._AC_SL1500_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/31S4T6bcjaL._AC_.jpg"
    ],
    "NumberOfReviews": 13,
    "reviews": [
      {
        "userName": "A. J. Kim",
        "reviewTitle": "Nice replacement for my Samsung 22\"",
        "reviewedIn": "Reviewed in the United States on May 9, 2017",
        "reviewDescription": "There are probably much better resolution monitors out there which is why not giving 5-stars, but I bought this monitor for other reasons. I was looking for 27\" monitors that had DVI connector to replace my Samsung 22\" monitors. Surprising, I was having a hard time finding monitors that had DVI connections, most seem to be HDMI, USB, or DisplayPort. This one has a DVI, D-Sub, and DisplayPort. I didn't want a curved monitor. I didn't want speakers in the monitor if at all possible. It was my intention to mount on a monitor arm, so I needed it to have VESA mounting. I didn't care for much of a stand or any frills it might have on it. One big reason I liked this monitor is its has a AC connection with a real power cord and not a DC connection with a power pack that would be sitting on the floor. I like Samsung products. I have several home electronic devices that are Samsung that have been really nice, quality products. If one of my Samsung 22\" monitors hadn't started to flicker when I powered on, I would have kept using them. I probably had them for more than 10yrs. Since one monitor was starting to go bad, I thought maybe it was time to upgrade along with a computer hardware upgrade I'm also planning for in next couple of months."
      },
      {
        "userName": "Michael R",
        "reviewTitle": "There are better choices",
        "reviewedIn": "Reviewed in the United States on April 25, 2020",
        "reviewDescription": "Thick bezel. Color not vivid. Has to be turned on after the computer No sync  Display menu limited. For the price. I liked my Acer better. 60 Hz refresh.  Many connection options. Upon boot, the monitor has trouble finding the correct connection and cycles through the options several times looking for the connection. It turns itself off after a period of not using instead of standby.  The screen is muted in color."
      },
      {
        "userName": "David White",
        "reviewTitle": "Amazon search sucks",
        "reviewedIn": "Reviewed in the United States on May 17, 2020",
        "reviewDescription": "Monitor is fine.  Problem is that when you search Amazon for \"monitor with speakers\", it should be able to show only those.  This monitor DOES NOT have speakers!  Totally frustrating!!!"
      }
    ]
  }
}
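
A common first step with a result like the one above is to pick the cheapest offer out of the sellers array. The following is a minimal sketch, not part of the actor: the helper name cheapestOffer and its optional condition filter are hypothetical, and it assumes priceParsed is a number whenever a price could be parsed.

```javascript
// Pick the lowest-priced offer from an item's "sellers" array.
// Offers whose priceParsed is missing or not a finite number are skipped.
// cheapestOffer is a hypothetical helper, not something the actor exports.
function cheapestOffer(item, condition) {
  const offers = (item.sellers || []).filter((offer) =>
    Number.isFinite(offer.priceParsed) &&
    (condition === undefined || offer.condition === condition));
  if (offers.length === 0) return null;
  return offers.reduce((best, offer) =>
    (offer.priceParsed < best.priceParsed ? offer : best));
}

// Trimmed-down copy of the sample result above.
const sampleItem = {
  asin: 'B010N07D4W',
  sellers: [
    { price: '$174.99', priceParsed: 174.99, condition: 'Used - Like New', sellerName: 'Luigi & Co. LLC' },
    { price: '$208.50', priceParsed: 208.5, condition: 'New', sellerName: 'Scatterlings Store' },
  ],
};

console.log(cheapestOffer(sampleItem).sellerName);        // Luigi & Co. LLC
console.log(cheapestOffer(sampleItem, 'New').sellerName); // Scatterlings Store
```

The function returns null when no offer matches, so callers can tell "no offers" apart from a price of 0.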

Proxy

The actor needs proxies to function correctly. We don't recommend running it on a free account for more than a sample of results. If you plan to run it for more than a few results, subscribing to the Apify platform will give you access to a large pool of proxies.

Asin crawling

One of the features of the scraper is that it can get price offers for a list of ASINs. If this is what you need, you can specify the ASINs in the input, along with the countries you want results for.

"asins": [{
      "asin":"B07JG7DS1T",
      "countries":["de","it","es","gb","us","fr","in","ca"]
  }]

With this setup, the scraper will check whether that ASIN is available in each of the listed countries and get all seller offers for it.
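
If you have many ASINs, you may prefer to generate the asins array rather than write it by hand. This is a small sketch under stated assumptions: buildAsinsInput is a hypothetical helper, and it assumes every ASIN should be checked against the same list of countries.

```javascript
// Build the "asins" input array: one entry per unique ASIN,
// each paired with its own copy of the same country list.
// buildAsinsInput is a hypothetical helper, not part of the actor.
function buildAsinsInput(asins, countries) {
  const uniqueAsins = [...new Set(asins.map((asin) => asin.trim()))];
  return uniqueAsins.map((asin) => ({ asin, countries: [...countries] }));
}

const input = {
  asins: buildAsinsInput(['B07JG7DS1T', ' B07P6Y8L3F', 'B07JG7DS1T'], ['de', 'us']),
};

console.log(JSON.stringify(input, null, 2));
```

Duplicates are dropped so the same ASIN is not queued twice for the same countries.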

Direct URLs crawling

If you already have your ASINs and don't want to crawl them manually, you can enqueue the requests from the input.

Here is a sample object to get itemDetail info:

{
    "url": "https://www.amazon.com/dp/B07P6Y8L3F",
    "userData": {
        "label": "detail",
        "keyword": "B07P6Y8L3F",
        "asin": "B07P6Y8L3F",
        "detailUrl": "https://www.amazon.com/dp/B07P6Y8L3F",
        "sellerUrl": "https://www.amazon.com/gp/offer-listing/B07P6Y8L3F"
    }
}

Here is a sample object to get seller info:

{
  "url": "https://www.amazon.de/gp/offer-listing/B07XRR7N5V/",
  "userData": {
      "label": "seller",
      "asin": "B07XRR7N5V",
      "detailUrl": "https://www.amazon.de/dp/B07XRR7N5V/",
      "sellerUrl": "https://www.amazon.de/gp/offer-listing/B07XRR7N5V/",
      "country": "DE"
  }
}
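
The two sample objects above follow the same pattern, so they can be generated from an ASIN and a domain. The sketch below only mirrors the fields shown in the samples; buildDirectUrls is a hypothetical helper, and whether the actor needs a trailing slash on the URLs (the .de sample has one, the .com sample does not) is not stated here, so the sketch omits it.

```javascript
// Build a "detail" and a "seller" request object for one ASIN,
// mirroring the shape of the two samples above.
// buildDirectUrls is a hypothetical helper, not part of the actor.
function buildDirectUrls(asin, domain, country) {
  const detailUrl = `https://${domain}/dp/${asin}`;
  const sellerUrl = `https://${domain}/gp/offer-listing/${asin}`;
  return {
    detail: {
      url: detailUrl,
      userData: { label: 'detail', keyword: asin, asin, detailUrl, sellerUrl },
    },
    seller: {
      url: sellerUrl,
      userData: { label: 'seller', asin, detailUrl, sellerUrl, country },
    },
  };
}

const requests = buildDirectUrls('B07P6Y8L3F', 'www.amazon.com', 'US');
console.log(JSON.stringify(requests.detail, null, 2));
```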

Additional options

maxResults - To limit the number of results extracted, set this value to the number you want; otherwise leave it blank or set it to 0. The limit is not exact: because requests run concurrently, a run with maxResults set to 5 may produce more than five records.
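
Because the limit can be overshot, you may want to trim the dataset yourself after the run. A minimal sketch, assuming you have already loaded the results into an array; limitResults is a hypothetical helper, and it treats a blank or 0 value as "no limit", matching the description above.

```javascript
// Trim a list of results to at most maxResults entries.
// A blank, 0, or negative maxResults means "no limit".
// limitResults is a hypothetical helper, not part of the actor.
function limitResults(items, maxResults) {
  if (!maxResults || maxResults <= 0) return items.slice();
  return items.slice(0, maxResults);
}

// Hypothetical over-long result list, as a run with maxResults = 5 might produce.
const results = ['a', 'b', 'c', 'd', 'e', 'f', 'g'].map((asin) => ({ asin }));

console.log(limitResults(results, 5).length); // 5
console.log(limitResults(results, 0).length); // 7
```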

Compute unit consumption

  • Using raw requests - 0.0884 CU when extracting 20 results from keyword search
  • Using a browser - 0.6025 CU when extracting 20 results from keyword search
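
To compare the two modes, it helps to convert the figures above into a per-result cost. The sketch below does that arithmetic and extrapolates to a larger run. The extrapolation assumes consumption scales linearly with the number of results, which these two measurements alone do not confirm, so treat the estimates as rough.

```javascript
// The two measurements quoted above, both for 20 results from a keyword search.
const measurements = [
  { mode: 'raw requests', computeUnits: 0.0884, results: 20 },
  { mode: 'browser', computeUnits: 0.6025, results: 20 },
];

// Compute units consumed per extracted result.
function perResult(measurement) {
  return measurement.computeUnits / measurement.results;
}

// Rough estimate for a larger run, ASSUMING linear scaling.
function estimate(measurement, targetResults) {
  return perResult(measurement) * targetResults;
}

for (const m of measurements) {
  console.log(
    `${m.mode}: ${perResult(m).toFixed(4)} CU per result, ` +
    `~${estimate(m, 1000).toFixed(1)} CU per 1000 results`,
  );
}

const ratio = measurements[1].computeUnits / measurements[0].computeUnits;
console.log(`browser / raw requests: ${ratio.toFixed(1)}x`);
```

That works out to roughly 0.0044 CU per result with raw requests and 0.0301 CU per result with a browser, a factor of about 6.8.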

Supported countries

You can specify the country where you want to scrape items. We currently support these countries:

If you want us to add another country, please email [email protected]

Changelog

Changes related to new versions are listed in the CHANGELOG file.

actor-amazon-crawler's People

Contributors

adonishi, gippy, lucie20, meanaverage, metalwarrior665, novotnyj, pocesar, vaclavrut

actor-amazon-crawler's Issues

Not able to deliver to US

The dropdown to select delivery country does not have United States. So what if I want to deliver to a specific state in the US?

Add support for input of URLs

To be able to add a link to:

{
 "url":"link",
 "label":"category" // item
}

And we will get all the items for given urls.

Searching for ISBN Books works on site, returns nothing on crawler

Example: https://www.amazon.co.uk/s?k=9780312944926&ref=nb_sb_noss

INPUT:

{
  "scraper": true,
  "country": "UK",
  "category": "stripbooks-intl-ship",
  "searchType": "keywords",
  "search": "9780312944926",
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  },
  "maxReviews": 0,
  "delivery": ""
}

Leads to log entry and 1 dataset item with "No items for this keyword":

2020-08-25T16:03:45.241Z INFO: Found 0 on a site, going to crawl them. URL: https://www.amazon.co.uk/s?k=9780312944926&i=&ref=nb_sb_noss
2020-08-25T16:03:45.447Z INFO: BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.

Output the input search term

Good day,
I would like to output the search term in the CSV/Excel file when it has been run, so I know which search term resulted in which output results. Is this possible please?
Thanks

Can't set deliveryTo to US City or State

I'm using your crawler and trying to set deliveryTo to a location inside the US, but I can't find a way to do this. I can only set it to locations outside the US (e.g. Canada).

How can I do this? Could you please shed some light?

Can't run crawler on apify

Hi @VaclavRut,

I have a project that used your crawler in the past. We decided to stop it a few months ago and cancelled the Apify subscription. Recently we restarted the project and tried to run your crawler on Apify without a subscription, but I keep getting an error.

Input

{
  "country": "US",
  "directUrls": [
    {
      "url": "https://www.amazon.com/dp/B07P6Y8L3F",
      "userData": {
        "label": "detail",
        "keyword": "B07P6Y8L3F",
        "asin": "B07P6Y8L3F",
        "detailUrl": "https://www.amazon.com/dp/B07P6Y8L3F",
        "sellerUrl": "https://www.amazon.com/gp/offer-listing/B07P6Y8L3F"
      }
    }
  ],
  "maxResults": 1,
  "proxy": {
    "useApifyProxy": false
  }
}

Output

2020-05-16T05:34:52.631Z ACTOR: Creating Docker container.

2020-05-16T05:34:57.465Z ACTOR: Starting Docker container.
2020-05-16T05:34:58.769Z 
2020-05-16T05:34:58.769Z > [email protected] start /usr/src/app
2020-05-16T05:34:58.770Z > node ./src/main.js
2020-05-16T05:34:58.771Z 
2020-05-16T05:35:00.372Z INFO: System info {"apifyVersion":"0.19.1","apifyClientVersion":"0.5.26","osType":"Linux","nodeVersion":"v12.16.1"}
2020-05-16T05:35:00.373Z WARNING: You are using an outdated version (0.19.1) of Apify SDK. We recommend you to update to the latest version (0.20.4).
2020-05-16T05:35:00.374Z          Read more about Apify SDK versioning at: https://help.apify.com/en/articles/3184510-updates-and-versioning-of-apify-sdk
2020-05-16T05:35:00.491Z INFO: Going to enqueue 1 requests from input.
2020-05-16T05:35:00.492Z https://www.amazon.com/dp/B07P6Y8L3F
2020-05-16T05:35:02.277Z INFO: AutoscaledPool state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2020-05-16T05:35:02.392Z ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.amazon.com/dp/B07P6Y8L3F","retryCount":1,"id":"CxCejBi58nyfOEt"}
2020-05-16T05:35:02.393Z   Error: Request for https://www.amazon.com/dp/B07P6Y8L3F aborted due to abortFunction
2020-05-16T05:35:02.393Z     at DuplexWrapper.<anonymous> (/usr/src/app/node_modules/@apify/http-request/src/index.js:167:25)
2020-05-16T05:35:02.394Z     at DuplexWrapper.emit (events.js:311:20)
2020-05-16T05:35:02.394Z     at EventEmitter.<anonymous> (/usr/src/app/node_modules/got/source/as-stream.js:60:9)
2020-05-16T05:35:02.395Z     at EventEmitter.emit (events.js:311:20)
2020-05-16T05:35:02.395Z     at module.exports (/usr/src/app/node_modules/got/source/get-response.js:22:10)
2020-05-16T05:35:02.396Z     at ClientRequest.handleResponse (/usr/src/app/node_modules/got/source/request-as-event-emitter.js:155:5)
2020-05-16T05:35:02.396Z     at Object.onceWrapper (events.js:418:26)
2020-05-16T05:35:02.397Z     at ClientRequest.emit (events.js:323:22)
2020-05-16T05:35:02.397Z     at ClientRequest.origin.emit (/usr/src/app/node_modules/@szmarczak/http-timer/source/index.js:37:11)
2020-05-16T05:35:02.398Z     at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:603:27)
2020-05-16T05:35:02.398Z     at HTTPParser.parserOnHeadersComplete (_http_common.js:119:17)
2020-05-16T05:35:02.398Z     at Socket.socketOnData (_http_client.js:476:22)
2020-05-16T05:35:02.399Z     at Socket.emit (events.js:311:20)
2020-05-16T05:35:02.399Z     at Socket.Readable.read (_stream_readable.js:512:10)
2020-05-16T05:35:02.400Z     at Socket.read (net.js:618:39)
2020-05-16T05:35:02.401Z     at flow (_stream_readable.js:989:34)
2020-05-16T05:35:02.401Z     at resume_ (_stream_readable.js:970:3)
2020-05-16T05:35:02.402Z     at processTicksAndRejections (internal/process/task_queues.js:84:21)
2020-05-16T05:35:05.483Z ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.amazon.com/dp/B07P6Y8L3F","retryCount":2,"id":"CxCejBi58nyfOEt"}
2020-05-16T05:35:05.484Z   Error: Request for https://www.amazon.com/dp/B07P6Y8L3F aborted due to abortFunction
2020-05-16T05:35:05.484Z     at DuplexWrapper.<anonymous> (/usr/src/app/node_modules/@apify/http-request/src/index.js:167:25)
2020-05-16T05:35:05.485Z     at DuplexWrapper.emit (events.js:311:20)
2020-05-16T05:35:05.485Z     at EventEmitter.<anonymous> (/usr/src/app/node_modules/got/source/as-stream.js:60:9)
2020-05-16T05:35:05.486Z     at EventEmitter.emit (events.js:311:20)
2020-05-16T05:35:05.494Z     at module.exports (/usr/src/app/node_modules/got/source/get-response.js:22:10)
2020-05-16T05:35:05.494Z     at ClientRequest.handleResponse (/usr/src/app/node_modules/got/source/request-as-event-emitter.js:155:5)
2020-05-16T05:35:05.495Z     at Object.onceWrapper (events.js:418:26)
2020-05-16T05:35:05.495Z     at ClientRequest.emit (events.js:323:22)
2020-05-16T05:35:05.496Z     at ClientRequest.origin.emit (/usr/src/app/node_modules/@szmarczak/http-timer/source/index.js:37:11)
2020-05-16T05:35:05.496Z     at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:603:27)
2020-05-16T05:35:05.496Z     at HTTPParser.parserOnHeadersComplete (_http_common.js:119:17)
2020-05-16T05:35:05.497Z     at Socket.socketOnData (_http_client.js:476:22)
2020-05-16T05:35:05.501Z     at Socket.emit (events.js:311:20)
2020-05-16T05:35:05.501Z     at Socket.Readable.read (_stream_readable.js:512:10)
2020-05-16T05:35:05.502Z     at Socket.read (net.js:618:39)
2020-05-16T05:35:05.502Z     at flow (_stream_readable.js:989:34)
2020-05-16T05:35:05.503Z     at resume_ (_stream_readable.js:970:3)
2020-05-16T05:35:05.504Z     at processTicksAndRejections (internal/process/task_queues.js:84:21)
2020-05-16T05:35:08.654Z ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.amazon.com/dp/B07P6Y8L3F","retryCount":3,"id":"CxCejBi58nyfOEt"}
2020-05-16T05:35:08.656Z   Error: Request for https://www.amazon.com/dp/B07P6Y8L3F aborted due to abortFunction
2020-05-16T05:35:08.657Z     at DuplexWrapper.<anonymous> (/usr/src/app/node_modules/@apify/http-request/src/index.js:167:25)
2020-05-16T05:35:08.658Z     at DuplexWrapper.emit (events.js:311:20)
2020-05-16T05:35:08.659Z     at EventEmitter.<anonymous> (/usr/src/app/node_modules/got/source/as-stream.js:60:9)
2020-05-16T05:35:08.660Z     at EventEmitter.emit (events.js:311:20)
2020-05-16T05:35:08.660Z     at module.exports (/usr/src/app/node_modules/got/source/get-response.js:22:10)
2020-05-16T05:35:08.661Z     at ClientRequest.handleResponse (/usr/src/app/node_modules/got/source/request-as-event-emitter.js:155:5)
2020-05-16T05:35:08.662Z     at Object.onceWrapper (events.js:418:26)
2020-05-16T05:35:08.663Z     at ClientRequest.emit (events.js:323:22)
2020-05-16T05:35:08.663Z     at ClientRequest.origin.emit (/usr/src/app/node_modules/@szmarczak/http-timer/source/index.js:37:11)
2020-05-16T05:35:08.664Z     at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:603:27)
2020-05-16T05:35:08.664Z     at HTTPParser.parserOnHeadersComplete (_http_common.js:119:17)
2020-05-16T05:35:08.666Z     at Socket.socketOnData (_http_client.js:476:22)
2020-05-16T05:35:08.666Z     at Socket.emit (events.js:311:20)
2020-05-16T05:35:08.667Z     at Socket.Readable.read (_stream_readable.js:512:10)
2020-05-16T05:35:08.667Z     at Socket.read (net.js:618:39)
2020-05-16T05:35:08.668Z     at flow (_stream_readable.js:989:34)
2020-05-16T05:35:08.668Z     at resume_ (_stream_readable.js:970:3)
2020-05-16T05:35:08.669Z     at processTicksAndRejections (internal/process/task_queues.js:84:21)
2020-05-16T05:35:11.922Z INFO: Request https://www.amazon.com/dp/B07P6Y8L3F failed 4 times
2020-05-16T05:35:11.924Z ERROR: BasicCrawler: runTaskFunction error handler threw an exception. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler. If you are sure that there is no error in your code, selecting "Restart on error" in the actor's settingswill make sure that the run continues where it left off, if programmed to handle restarts correctly.
2020-05-16T05:35:11.924Z   ReferenceError: $ is not defined
2020-05-16T05:35:11.926Z     at BasicCrawler.handleFailedRequestFunction (/usr/src/app/src/main.js:161:46)
2020-05-16T05:35:11.926Z     at BasicCrawler._requestFunctionErrorHandler (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:475:17)
2020-05-16T05:35:11.927Z     at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-05-16T05:35:11.928Z     at async BasicCrawler._runTaskFunction (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:413:9)
2020-05-16T05:35:11.928Z     at async AutoscaledPool._maybeRunTask (/usr/src/app/node_modules/apify/build/autoscaling/autoscaled_pool.js:463:7)
2020-05-16T05:35:11.929Z ERROR: AutoscaledPool: runTaskFunction failed.
2020-05-16T05:35:11.930Z   ReferenceError: $ is not defined
2020-05-16T05:35:11.931Z     at BasicCrawler.handleFailedRequestFunction (/usr/src/app/src/main.js:161:46)
2020-05-16T05:35:11.931Z     at BasicCrawler._requestFunctionErrorHandler (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:475:17)
2020-05-16T05:35:11.932Z     at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-05-16T05:35:11.933Z     at async BasicCrawler._runTaskFunction (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:413:9)
2020-05-16T05:35:11.947Z     at async AutoscaledPool._maybeRunTask (/usr/src/app/node_modules/apify/build/autoscaling/autoscaled_pool.js:463:7)
2020-05-16T05:35:11.948Z INFO: Crawler final request statistics: {"avgDurationMillis":null,"perMinute":0,"finished":0,"failed":1,"retryHistogram":[null,null,null,1]}
2020-05-16T05:35:11.949Z ERROR: The function passed to Apify.main() threw an exception:
2020-05-16T05:35:11.950Z   ReferenceError: $ is not defined
2020-05-16T05:35:11.950Z     at BasicCrawler.handleFailedRequestFunction (/usr/src/app/src/main.js:161:46)
2020-05-16T05:35:11.951Z     at BasicCrawler._requestFunctionErrorHandler (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:475:17)
2020-05-16T05:35:11.952Z     at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-05-16T05:35:11.952Z     at async BasicCrawler._runTaskFunction (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:413:9)
2020-05-16T05:35:11.953Z     at async AutoscaledPool._maybeRunTask (/usr/src/app/node_modules/apify/build/autoscaling/autoscaled_pool.js:463:7)
2020-05-16T05:35:11.953Z npm ERR! code ELIFECYCLE
2020-05-16T05:35:11.954Z npm ERR! errno 91
2020-05-16T05:35:11.955Z npm ERR! [email protected] start: `node ./src/main.js`
2020-05-16T05:35:11.955Z npm ERR! Exit status 91
2020-05-16T05:35:11.956Z npm ERR! 
2020-05-16T05:35:11.956Z npm ERR! Failed at the [email protected] start script.
2020-05-16T05:35:11.957Z npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
2020-05-16T05:35:11.957Z 
2020-05-16T05:35:11.958Z npm ERR! A complete log of this run can be found in:
2020-05-16T05:35:11.959Z npm ERR!     /root/.npm/_logs/2020-05-16T05_35_11_942Z-debug.log

Could you give us some guidance on this issue? Is it something to do with the crawler, or with our Apify subscription?
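
The log ends with "ReferenceError: $ is not defined" raised inside handleFailedRequestFunction. That suggests, though the log alone cannot confirm it, that the failure handler refers to a parsed-page handle that does not exist once the request itself has failed. Below is a minimal sketch of failure handling that reads only from the request and the error. It is not the actor's code: buildFailureRecord and the fields of the record are hypothetical, and it assumes the handler is called with an object containing request and error.

```javascript
// Build a plain record describing a request that failed all of its retries.
// It reads only from the request and error objects, never from a parsed page.
// buildFailureRecord and the record's field names are hypothetical.
function buildFailureRecord({ request, error }) {
  const userData = request.userData || {};
  return {
    status: 'failed',
    url: request.url,
    asin: userData.asin || null,
    label: userData.label || null,
    retryCount: request.retryCount,
    errorMessage: error && error.message ? error.message : String(error),
  };
}

// Stand-in objects shaped like the request in the log above (demo values only).
const record = buildFailureRecord({
  request: {
    url: 'https://www.amazon.com/dp/B07P6Y8L3F',
    retryCount: 3,
    userData: { label: 'detail', asin: 'B07P6Y8L3F' },
  },
  error: new Error('Request for https://www.amazon.com/dp/B07P6Y8L3F aborted due to abortFunction'),
});

console.log(record);
```

A record like this could be stored in place of the missing item, so a blocked request shows up in the output instead of terminating the run.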

Add optional Puppeteer

Unless we solve the larger problem of Cheerio requests being blocked, it would be good to add Puppeteer as an option, maybe even as the default.

Issues with Crawling Video Game items, Books

There are issues with specific items (I'm guessing due to their particular presentation) for video games and books.
Examples on amazon.fr :

Item price

There's no way to retrieve the item's own price. There is only a price for "other sellers", but not for the item shown in the returned image, nor its Prime status. Is there a way to get this data?

This ASIN is not available for this country.

Hello!

I keep getting this output when fetching seller data (price, etc.):
"This ASIN is not available for this country." in the "STATUS" column

The input is:

{ "scraper": true, "country": "IT", "category": "aps", "searchType": "asins", "search": "B06Y45ZZDN,B00E4KXRTS,B073QT6S2Z,B07M6718BV,B00YBYMQN0,B07MNZ9DSH,B074N258GS,B00ID52G2Y,B00ID50NOM,B00E4KYCZ6,B07MDK4GHJ,B00E4KYNIW,B07C9BDQWV,B07JXNCRCK,B07HR21HSG,B07HQYP688,B07HR1GJS9,B07HR19T7Y,B07HR7R8NN,B07HQYCP9Y,B01CQLIW78,B015GYQQCK,B015GYQZMQ,B015GYR854,B015GYRFJS,B015GYRC28,B015GYRM82,B015GYQUDK,B015RMV8BU,B015GYQM06,B015RMVAYU,B015GYRIHM,B015GYRQCO,B015GYQFQW,B01GHP57S6,B0746SCP9H,B0725QRFNW,B0711SGZP4,B0751M89P7,B0746P8ZZ7,B074N2PHP9,B07CS44BML,B07CS444QJ,B07CS6L2MH,B07CS351D2,B07CS7J5LY,B07CS444QK,B07HD3BBF7,B07HD7BN26,B07XNNQZ3Z,B07XJKSL21,B07XNM7BQT,B07XNHCHXY,B07Y3G7YL2,B07DCZ6XY7,B073HGNMH2,B073HBNC2D,B073H9NVTH,B073H8VF7V,B073HBZRZ4,B073H9TC5V,B073H9TBPJ,B073HBZRX3,B00X9XKHDU,B00VP2SSFG,B016C2APSG,B01E3ETMUW,B07DF3MLH6,B00KLXFUW4,B00NPXEEC4,B07BSSK2D9,B07C9PY9GC,B07BSRJCGH,B07MWYQDDZ,B07SSBLH13,B07SSBKVTQ,B07SSBK3D8,B07SM7VV68,B07SQWM1FM,B074Y8LM6T,B07VB2KH24,B00WZRJSPY,B00WZRP6K0,B00WZRMTPA,B079YSXCD8,B00WZRQNAC,B017OF39W4,B00WZQTCS8,B00WZRY1M4,B01IW02HX2,B00B4YVU4G,B07VF83LS2,B00VF9Z7OE", "proxy": { "useApifyProxy": false }, "delivery": "IT,GLUXCountryList_112", "maxReviews": 0

Thanks in advance!

Huge issue with price

Since the last update 20 days ago, price and priceParsed are no longer retrieved, which makes the crawler useless!
Please fix this ASAP.

Visit detail of the product

If you need this feature, let me know which details you would need, for example:

  • dimensions of the product,
  • all images,
  • description,
    ?
