
tripadvisor-scraper's People

Contributors

apify-alexey, davidlukacapify, dtrungtin, gustavotr, lhotanok, maxcopell, metalwarrior665, olehveselov92, petrpatek, pocesar, theovasilis, zpelechova

tripadvisor-scraper's Issues

Cannot scrape restaurants anymore

Hi,

I've been using the Tripadvisor scraper for restaurants. I would always uncheck hotels, and I didn't need to enter a check-in date. Since the last time I used it, I am required to enter a check-in date even though I don't need the hotel data. Whichever city I try to run, I get the same error: "The main function of the actor threw an exception."

I'm not very technical, so I hope my explanation makes sense.
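
For maintainers reading this report, one way to avoid forcing a date on restaurant-only runs is to validate the check-in date only when hotels are requested. The sketch below is hypothetical: the function name is invented for illustration, and the field names includeHotels and checkInDate are taken from the inputs posted in other issues on this page, not from the actor's actual validation code.

```javascript
// Hypothetical helper, not taken from the actor's source.
// Validates checkInDate only when hotel data is actually requested.
function validateCheckInDate(input) {
    const { includeHotels = false, checkInDate = '' } = input;

    // Restaurant-only and attraction-only runs do not need a date.
    if (!includeHotels) return;

    // When hotels are requested, expect a YYYY-MM-DD string.
    if (!/^\d{4}-\d{2}-\d{2}$/.test(checkInDate)) {
        throw new Error('checkInDate (YYYY-MM-DD) is required when includeHotels is true');
    }
}
```

With this shape, an input that sets includeHotels to false and leaves checkInDate empty passes validation, while a hotel run without a date fails with a readable message.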

Unable to get application to start. TypeError: Cannot destructure property 'locationFullName' of 'input' as it is null.

When I run node index in PowerShell, I get the following error. How and where do I set "locationFullName" so that the app runs?

INFO System info {"apifyVersion":"0.20.3","apifyClientVersion":"0.6.0","osType":"Windows_NT","nodeVersion":"v12.16.3"}
WARN Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="C:\Users\chris\documents\Scraper\apify_storage"
ERROR The function passed to Apify.main() threw an exception:
TypeError: Cannot destructure property 'locationFullName' of 'input' as it is null.
at validateInput (C:\Users\chris\documents\Scraper\src\tools\general.js:208:9)
at C:\Users\chris\documents\Scraper\src\main.js:35:5
at async run (C:\Users\chris\documents\Scraper\node_modules\apify\build\actor.js:238:13)
PS C:\Users\chris\documents\Scraper>
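
The error means the input object itself is null, not just locationFullName. As far as I know, when an actor runs locally with this generation of the Apify SDK, Apify.getInput() reads the input from the INPUT.json record of the default key-value store, which under the storage directory shown in the log would be apify_storage/key_value_stores/default/INPUT.json; if that file is missing, the input is null. The guard below is only a sketch of how the failure could be made clearer. The name assertInputPresent is invented, and the real validateInput in src/tools/general.js is not shown in this thread and may look different.

```javascript
// Sketch of a null-safe input check. Hypothetical helper: the actor's real
// validateInput is not shown in this thread and may differ.
function assertInputPresent(input) {
    if (input === null || input === undefined) {
        throw new Error(
            'No input found. When running locally, create '
            + 'apify_storage/key_value_stores/default/INPUT.json '
            + 'containing the actor input.',
        );
    }
    return input;
}

// Intended use inside Apify.main (not executed here):
//   const { locationFullName } = assertInputPresent(await Apify.getInput());
```

A minimal INPUT.json would then be a JSON object with a "locationFullName" field, or a "locationId" as used in other issues on this page; which fields the actor actually requires is not confirmed here.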

Only able to get 20 reviews

When scraping the data of a restaurant, I'm only able to get 20 reviews.

Looking through the code, I found that it fetches reviews 20 at a time inside a while (true) loop. I believe there is an issue with the exit condition, so that the loop always exits on its first iteration.

https://github.com/maxCopell/tripadvisor-scraper/blob/master/src/tools/general.js

 while (true) {

    //...
    //...

    if (reviews.length < limit || result.length >= maxReviews || shouldSlice) break;
}

My input

{
  "locationId": "4879161",
  "includeRestaurants": true,
  "includeAttractions": false,
  "includeHotels": false,
  "includeReviews": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "maxReviews": 0,
  "maxItems": 1,
  "language": "en",
  "currency": "CAD",
  "debugLog": true,
  "checkInDate": "",
  "includeTags": false
}
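
One possible explanation, assuming maxReviews is passed to the loop unchanged: with "maxReviews": 0, the test result.length >= maxReviews is true for any length, so the loop breaks right after the first page of 20. The sketch below shows an exit test that treats 0 as "no limit". That interpretation of 0 is an assumption, the function name shouldStopPaging is invented, and the parameter names only mirror the variables in the snippet above.

```javascript
// Hypothetical rewrite of the loop's exit test. Treating maxReviews = 0
// (or a missing value) as "no limit" is an assumption, not confirmed behavior.
function shouldStopPaging({ pageLength, limit, collected, maxReviews, shouldSlice = false }) {
    const unlimited = !maxReviews || maxReviews <= 0;

    // A page shorter than the page size means there is nothing left to fetch.
    const lastPage = pageLength < limit;

    // Only enforce the cap when a positive maxReviews was given.
    const reachedCap = !unlimited && collected >= maxReviews;

    return lastPage || reachedCap || shouldSlice;
}
```

With the input above (a full page of 20, limit 20, maxReviews 0) this returns false, so paging would continue until a short page arrives. Setting maxReviews to a positive number in the input may also work around the problem without a code change.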

Default run is missing

I would very much appreciate it if, when I open a task, I could just run it to see whether the actor is working.

API returns Status code 400 - Client key not set

The Tripadvisor scraper makes calls to an internal Tripadvisor API, such as:

https://api.tripadvisor.com/api/internal/1.14/location/187275/hotels?currency=USD&lang=en&limit=1

This call now returns "Error: Status code 400" in the logs, and when called directly in the browser it shows an "UnauthorizedException" error:

{
   errors: [{
      type: "UnauthorizedException",
      message: "client key not set",
      code: "160"
   }]
}

Apparently this breaks the scraper 😞
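
Since retrying cannot fix a missing client key, the scraper could recognize this payload and fail fast with a clear message. The check below is only a sketch: the payload shape is copied from the response above, while the function name isClientKeyError and the fail-fast idea are assumptions rather than anything in the actor's code.

```javascript
// Sketch only: detects the "client key not set" payload shown above.
// The function name is hypothetical; the type and code values come from
// the response quoted in this issue.
function isClientKeyError(body) {
    const errors = (body && Array.isArray(body.errors)) ? body.errors : [];
    return errors.some((e) => e.type === 'UnauthorizedException' && e.code === '160');
}
```

A caller could run this on the parsed response body of a 400 reply and abort the run instead of reclaiming the request.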

Hotel prices not returned

Thanks for making this; it makes scraping TripAdvisor so much easier! The API docs show a sample output for hotels containing an array of prices from various providers. However, an empty prices array is always returned for me. Even the example run does not contain that data.

I noticed that the call to getPlacePrices has been commented out. Are there plans to enable support for retrieving hotel prices?

// placePrices = await getPlacePrices(id, randomDelay);

Error with hotelId or restaurantId

I keep getting this error when trying to fetch data from a single hotel or restaurant.

2021-12-16T21:08:43.851Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com","retryCount":2,"id":"nMLwWw9hmTQOlju"}
2021-12-16T21:08:43.854Z TypeError: Cannot destructure property 'location_id' of 'placeInfo' as it is undefined.
2021-12-16T21:08:43.856Z at processHotel (/usr/src/app/src/tools/hotel-tools.js:20:26)

JSON template I use (the $ values are replaced before the run):

{
  "maxItems": 1,
  "includeRestaurants": true,
  "includeHotels": false,
  "includeAttractions": false,
  "includeTags": false,
  "includeReviews": true,
  "maxReviews": $max,
  "lastReviewDate": "2018-01-01",
  "locationId": "$id",
  "restaurantId": "$url",
  "language": "en",
  "currency": "USD",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "debugLog": false
}

The scraper API calls keep returning 400 error.

Is this scraper no longer maintained? It relies on the Tripadvisor API, which is forbidden without a proper token.

Here are the logs from Apify:

2022-04-09T14:25:21.681Z ACTOR: Pulling Docker image from repository.
2022-04-09T14:25:22.895Z ACTOR: Creating Docker container.
2022-04-09T14:25:22.926Z ACTOR: Starting Docker container.
2022-04-09T14:25:26.230Z INFO  System info {"apifyVersion":"2.2.2","apifyClientVersion":"2.2.0","osType":"Linux","nodeVersion":"v16.14.0"}
2022-04-09T14:25:26.335Z INFO  Input validation OK
2022-04-09T14:25:26.670Z INFO  BasicCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2022-04-09T14:25:28.282Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:28.294Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":1,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:28.298Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:28.300Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:28.302Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:28.304Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:28.306Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:28.307Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:28.309Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:28.311Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:28.313Z       at Socket.emit (node:events:520:28)
2022-04-09T14:25:28.315Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:28.317Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:28.319Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:28.321Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:28.323Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:25:31.561Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:31.577Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":2,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:31.580Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:31.582Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:31.584Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:31.587Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:31.589Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:31.591Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:31.593Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:31.595Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:31.597Z       at Socket.emit (node:events:520:28)
2022-04-09T14:25:31.599Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:31.601Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:31.604Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:31.606Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:31.608Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:25:34.705Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:34.714Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":3,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:34.716Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:34.718Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:34.720Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:34.722Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:34.724Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:34.726Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:34.728Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:34.730Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:34.732Z       at Socket.emit (node:events:520:28)
2022-04-09T14:25:34.734Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:34.736Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:34.739Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:34.741Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:34.743Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:25:37.877Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:37.890Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":4,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:37.892Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:37.895Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:37.899Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:37.901Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:37.904Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:37.906Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:37.909Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:37.913Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:37.915Z       at Socket.emit (node:events:520:28)
2022-04-09T14:25:37.918Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:37.921Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:37.923Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:37.925Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:37.927Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:12.037Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:12.046Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":5,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:12.048Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:12.050Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:12.052Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:12.054Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:12.056Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:12.058Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:12.060Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:12.062Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:12.065Z       at Socket.emit (node:events:520:28)
2022-04-09T14:26:12.067Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:12.070Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:12.072Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:12.075Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:12.077Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:26.675Z INFO  BasicCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2022-04-09T14:26:26.723Z INFO  Statistics: BasicCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60111,"retryHistogram":[]}
2022-04-09T14:26:34.587Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:34.594Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":6,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:34.597Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:34.598Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:34.600Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:34.602Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:34.604Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:34.606Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:34.608Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:34.611Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:34.613Z       at Socket.emit (node:events:520:28)
2022-04-09T14:26:34.615Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:34.617Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:34.618Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:34.620Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:34.622Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:37.924Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:37.935Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":7,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:37.940Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:37.943Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:37.945Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:37.947Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:37.949Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:37.952Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:37.954Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:37.956Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:37.959Z       at Socket.emit (node:events:520:28)
2022-04-09T14:26:37.961Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:37.963Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:37.966Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:37.968Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:37.970Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:41.050Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:41.057Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":8,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:41.060Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:41.062Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:41.064Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:41.066Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:41.068Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:41.070Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:41.072Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:41.074Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:41.076Z       at Socket.emit (node:events:520:28)
2022-04-09T14:26:41.078Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:41.080Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:41.082Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:41.084Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:41.086Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:44.198Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:44.206Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":9,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:44.208Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:44.210Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:44.212Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:44.214Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:44.216Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:44.218Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:44.220Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:44.222Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:44.225Z       at Socket.emit (node:events:520:28)
2022-04-09T14:26:44.227Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:44.229Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:44.231Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:44.233Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:44.235Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:47.414Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:47.423Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":10,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:47.426Z   RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:47.428Z       at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:47.430Z       at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:47.433Z       at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:47.443Z       at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:47.446Z       at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:47.448Z       at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:47.450Z       at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:47.452Z       at Socket.emit (node:events:520:28)
2022-04-09T14:26:47.454Z       at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:47.456Z       at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:47.457Z       at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:47.459Z       at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:47.461Z       at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:50.973Z WARN  Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n    at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n    at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n    at Object.onceWrapper (node:events:640:26)\n    at ClientRequest.emit (node:events:520:28)\n    at Socket.socketOnData (node:_http_client:522:11)\n    at Socket.emit (node:events:520:28)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at Socket.Readable.push (node:internal/streams/readable:228:10)\n    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n    at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:51.009Z INFO  Request https://www.tripadvisor.com/ failed too many times
2022-04-09T14:26:51.152Z INFO  BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2022-04-09T14:26:51.304Z INFO  BasicCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[null,null,null,null,null,null,null,null,null,null,1],"requestAvgFailedDurationMillis":35,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":35,"requestsTotal":1,"crawlerRuntimeMillis":84691}
2022-04-09T14:26:51.307Z INFO  Requests failed: 1
2022-04-09T14:26:51.309Z INFO  Crawler finished.

Duplicate and missing values for Singapore tripadvisor scrape

Hi, I recently downloaded Singapore hotel data and found that some hotels were missing and others were duplicated. I'm not sure why, and would appreciate any help. I took a screenshot of the data while going through it in Power BI. I'm not a developer, but my developer gave feedback and I went to explore it on my own. I'm thinking of signing up for Apify but want to make sure that the data I get is usable without much cleaning. Thanks!
[Screenshot: 2021-10-13 21 24 58]

Another screenshot where a hotel was recorded 6 times.

[Screenshot: 2021-10-13 22 31 16]
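
Until the cause of the duplicates is found, repeated rows can be removed after download. Below is a minimal sketch, assuming each exported record carries a stable identifier. The `id` and `name` fields and the `dedupeBy` helper are hypothetical and are not taken from the scraper's actual output schema. It cannot recover missing hotels; it only drops repeats.

```javascript
// Hypothetical helper: keep the first record seen for each key, preserving order.
function dedupeBy(items, keyFn) {
    const seen = new Map();
    for (const item of items) {
        const key = keyFn(item);
        if (!seen.has(key)) {
            seen.set(key, item);
        }
    }
    return [...seen.values()];
}

// Made-up sample rows for illustration; the field names are assumptions.
const rows = [
    { id: '1', name: 'Hotel A' },
    { id: '2', name: 'Hotel B' },
    { id: '1', name: 'Hotel A' },
];

const unique = dedupeBy(rows, (row) => row.id);
console.log(unique.length);
```

If the export has no identifier column, a combination such as name plus address could serve as the key, at the cost of occasional false matches.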

handleRequestFunction failed

Hi there,
first of all congrats on the amazing work you've done so far.
When searching for restaurants, I'm suddenly encountering an issue which doesn't allow the scraper to return a lot of data.
I tried with different cities as input but still this keeps on happening.

Here's the error details:

ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue
TypeError: Cannot read property 'replace' of null
    at getSecurityToken (/usr/src/app/src/tools/general.js:33:26)
    at getClient (/usr/src/app/src/tools/general.js:197:31)

Thanks in advance
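
For context, this TypeError is what JavaScript raises when `.replace` is called on a `null` value, which typically happens when a lookup or regex match finds nothing. The sketch below shows one way such a lookup could fail with a clearer message. The `extractToken` function and the `data-token` pattern are hypothetical and are not the actor's actual `getSecurityToken` code.

```javascript
// Hypothetical token extraction with an explicit guard against a missing match.
function extractToken(html) {
    const match = html.match(/data-token="([^"]*)"/);
    if (match === null) {
        // Without this check, match[1].replace(...) would throw a TypeError.
        throw new Error('Token not found in response; the page may be blocked or its layout may have changed');
    }
    // Strip whitespace from the captured value.
    return match[1].replace(/\s+/g, '');
}

console.log(extractToken('<div data-token="ab c"></div>'));
```

A guard like this does not fix the underlying problem, but it makes the log say what was missing rather than where the code crashed.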

Restaurants and Hotels database request

Hi Maximillian.

I work at a wholesale food distributor in Rio de Janeiro, Brazil, and I am interested in a database of all restaurants, markets and hotels in my state.
I saw your profile on GitHub and noticed that you have expertise in Tripadvisor and Google scrapers. I have tried your scraper on Apify but it doesn't work very well.

How much do you charge for an Excel sheet with all restaurants and hotels in Rio de Janeiro State, Brazil?

Best,
Lucas Saúde.

Change .tld

Hey, thanks for this great scraper! Is there a way to change the top-level domain (to get results in the needed language)?

Greetings

Scraping of Attractions Reviews is not working

Hi
Running a vanilla query for scraping attraction reviews for a specific location fails with an error of the form "Could not get reviews for attraction xyz due to session.getCookieString is not a function".
The attractions of the location are correctly identified but the reviews are not retrieved.

Can you please fix this?

Also, scraping hotel and restaurant reviews works perfectly.

My JSON settings:
{
  "locationFullName": "Kabul",
  "locationId": "660089",
  "lastReviewDate": "2010-01-01",
  "includeRestaurants": false,
  "includeAttractions": true,
  "includeHotels": false,
  "includeReviews": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Log summary:
2021-05-31T09:45:07.452Z ACTOR: Pulling Docker image from repository.
2021-05-31T09:45:07.558Z ACTOR: Creating Docker container.
2021-05-31T09:45:07.656Z ACTOR: Starting Docker container.
2021-05-31T09:45:11.066Z INFO System info {"apifyVersion":"0.20.3","apifyClientVersion":"0.6.0","osType":"Linux","nodeVersion":"v12.18.3"}
2021-05-31T09:45:11.089Z WARN You are using an outdated version (0.20.3) of Apify SDK. We recommend you to update to the latest version (1.1.2).
.....
2021-05-31T09:45:11.133Z INFO Input validation OK
2021-05-31T09:45:11.148Z INFO Processing locationId: 660089
...
2021-05-31T09:45:17.125Z INFO Found 20 attractions
2021-05-31T09:45:17.126Z INFO Processing detail for Babur Tomb attraction
.....
2021-05-31T09:45:17.169Z INFO Processing detail for Bibi Mahroo Hill attraction
2021-05-31T09:45:17.170Z ERROR Could not get reviews for attraction Babur Tomb due to session.getCookieString is not a function
...
2021-05-31T09:45:17.176Z ERROR Could not get reviews for attraction Bibi Mahroo Hill due to session.getCookieString is not a function
2021-05-31T09:45:17.550Z ERROR Could not process attraction... Data item at index 0 is not serializable to JSON.
2021-05-31T09:45:17.581Z Cause: Parameter "item" of type Object must be provided
2021-05-31T09:45:17.754Z INFO BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2021-05-31T09:45:17.859Z INFO Crawler final request statistics: {"avgDurationMillis":1560,"perMinute":34,"finished":1,"failed":0,"retryHistogram":[1]}
2021-05-31T09:45:17.860Z INFO Requests failed: 0
2021-05-31T09:45:17.861Z INFO Crawler finished.

It's great, but could the following Hotel data be added to the scrape?

I'm just trying out the free scraper and it's mostly working great, but I've noticed the following while scraping Hotel Monge, Paris:

  1. The summary text is missing i.e. "Located in the heart of the fifth arrondissement of Paris, between the Jardin des Plantes and Notre Dame Cathedral..." etc
  2. The 'Tripadvisor Best of the Best award 2022' is not included in the awards array in the json, i.e. the awards array is empty.
  3. The amenities aren't included in the json, e.g. parking, WiFi, 24-hour front desk, etc, i.e. the amenities array in the json is empty.
  4. It would also be useful to have the room types in the json too; these aren't listed currently.

Thanks.

Actor sometimes hangs

A run with this input hung for 6 minutes with 5 requests still in the queue.

{
  "lastReviewDate": "2019-06-06",
  "locationFullName": "Como",
  "includeRestaurants": true,
  "includeAttractions": false,
  "includeHotels": false,
  "includeReviews": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "locationId": ""
}

Traveler Rating

Hi Max,

The data I'm looking for is the review counts by Traveler Rating. I want to be able to scrape this data weekly so I can track the change in rating.

  • Excellent 359
  • Very Good 273
  • Average 50
  • Poor 20
  • Terrible 18
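
Once the per-rating counts are available, tracking the weekly change is simple arithmetic. Here is a minimal sketch, assuming the five labels map to 5 down to 1 stars. That mapping, the helper names, and the `lastWeek` snapshot are assumptions made for illustration; only the `thisWeek` counts come from the list above.

```javascript
// Star value assumed for each Traveler Rating label (an assumption, not scraper output).
const STAR_VALUES = { Excellent: 5, 'Very Good': 4, Average: 3, Poor: 2, Terrible: 1 };

// Total review count and weighted mean rating for one snapshot.
function summarize(counts) {
    let total = 0;
    let weighted = 0;
    for (const [label, stars] of Object.entries(STAR_VALUES)) {
        const n = counts[label] || 0;
        total += n;
        weighted += n * stars;
    }
    return { total, average: total === 0 ? null : weighted / total };
}

// Per-label change between two snapshots (current minus previous).
function diff(previous, current) {
    const delta = {};
    for (const label of Object.keys(STAR_VALUES)) {
        delta[label] = (current[label] || 0) - (previous[label] || 0);
    }
    return delta;
}

// Counts quoted in the post above.
const thisWeek = { Excellent: 359, 'Very Good': 273, Average: 50, Poor: 20, Terrible: 18 };
// Made-up earlier snapshot, for illustration only.
const lastWeek = { Excellent: 350, 'Very Good': 270, Average: 50, Poor: 20, Terrible: 17 };

const summary = summarize(thisWeek);
console.log(summary.total, summary.average.toFixed(2));
console.log(diff(lastWeek, thisWeek));
```

Storing one snapshot per weekly run and diffing consecutive snapshots gives the change per rating bucket as well as the drift in the average.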

Unable to run the scraper: Apify.main() threw an exception

Hello,
I am trying to run the script on the Apify platform. I got this error:

2020-09-13T07:44:10.864Z ERROR The function passed to Apify.main() threw an exception:
2020-09-13T07:44:10.866Z TypeError: Cannot read property 'data' of undefined
2020-09-13T07:44:10.867Z at getLocationId (/usr/src/app/src/tools/api.js:58:29)
2020-09-13T07:44:10.869Z at processTicksAndRejections (internal/process/task_queues.js:97:5)

No proxy is used; I selected a city and a date as input.
Thank you.
Thanks.
