
amazon-scraper-python's Introduction

amazon-scraper-python


Description

This package allows you to search for products on Amazon and extract useful information about them (rating, number of customer reviews).

I wrote a French blog post about it here

Requirements

  • Python 3
  • pip3

Installation

pip3 install -U amazonscraper

Command line tool amazon2csv.py

After the package installation, you can use the amazon2csv.py command in the terminal.

Pass it a search request (and an optional maximum number of products) and it returns the results as CSV:

amazon2csv.py --keywords="Python programming" --maxproductnb=2
Product title,Rating,Number of customer reviews,Product URL,Image URL,ASIN
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,370,https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036,https://images-na.ssl-images-amazon.com/images/I/51F48HFHq6L.jpg,1593276036
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.7,384,https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B,https://images-na.ssl-images-amazon.com/images/I/51fNZfTUPXL.jpg,B077Z55G3

You can also pass a search URL (for example if you added complex filters) and save the output to a file:

amazon2csv.py --url="https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=python+scraping" > output.csv

You can then open it with your favorite spreadsheet editor (and play with the filters):

[Screenshot: amazon2csv output opened in a spreadsheet]

More information about the command is available in the help:

amazon2csv.py --help

Using the amazonscraper Python package

# -*- coding: utf-8 -*-
import amazonscraper

results = amazonscraper.search("Python programming", max_product_nb=2)

for result in results:
    print("{}".format(result.title))
    print("  - ASIN : {}".format(result.asin))
    print("  - {} out of 5 stars, {} customer reviews".format(result.rating, result.review_nb))
    print("  - {}".format(result.url))
    print("  - Image : {}".format(result.img))
    print()

print("Number of results : %d" % (len(results)))

Which will output:

Python Crash Course: A Hands-On, Project-Based Introduction to Programming
  - ASIN : 1593276036
  - 4.5 out of 5 stars, 370 customer reviews
  - https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036
  - Image : https://images-na.ssl-images-amazon.com/images/I/51F48HFHq6L.jpg

A Smarter Way to Learn Python: Learn it faster. Remember it longer.
  - ASIN : B077Z55G3B
  - 4.7 out of 5 stars, 384 customer reviews
  - https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B
  - Image : https://images-na.ssl-images-amazon.com/images/I/51fNZfTUPXL.jpg

Number of results : 2

Attributes of the Product object

  • title: Product title
  • rating: Rating of the product (a number between 0 and 5, False if missing)
  • review_nb: Number of customer reviews (False if missing)
  • url: Product URL
  • img: Image URL
  • asin: Product ASIN (Amazon Standard Identification Number)
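
Since rating and review_nb fall back to False when the information is missing, it is worth guarding before using them; a minimal sketch (the search keywords are just an example):

# -*- coding: utf-8 -*-
import amazonscraper

results = amazonscraper.search("Python programming", max_product_nb=5)

for product in results:
    # rating and review_nb are False when Amazon does not expose them
    rating = product.rating if product.rating else "n/a"
    reviews = product.review_nb if product.review_nb else "n/a"
    print("{} | rating: {} | reviews: {}".format(product.title, rating, reviews))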

Docker

You can use the amazon2csv tool with the Docker image.

You may execute:

docker run -it --rm thibdct/amazon2csv --keywords="Python programming" --maxproductnb=2

🤘 The easy way 🤘

I also built a bash wrapper to execute the Docker container easily.

Install it with:

curl -s https://raw.githubusercontent.com/tducret/amazon-scraper-python/master/amazon2csv \
> /usr/local/bin/amazon2csv && chmod +x /usr/local/bin/amazon2csv

You may replace /usr/local/bin with another folder that is in your $PATH.

Check that it works:

On the first execution, the script will download the Docker image, so please be patient.

amazon2csv --help
amazon2csv --keywords="Python programming" --maxproductnb=2

You can upgrade the app with:

amazon2csv --upgrade

and even uninstall it with:

amazon2csv --uninstall

TODO

  • If no product is found with the CSS selectors, the page may use a new Amazon layout: change the user agent, fetch the page again, and loop over all known user agents while rechecking every CSS selector (see the sketch after this list)
  • Find a way to extract the products without relying on CSS selectors
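
A minimal sketch of the first idea; the user agents and CSS selectors below are illustrative placeholders, not the values the package actually uses:

import requests
from bs4 import BeautifulSoup

# Illustrative placeholders; the real package maintains its own lists
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]
PRODUCT_SELECTORS = ["div.s-result-item", "li.s-result-item"]

def fetch_products(search_url):
    # Try every user agent; for each page returned, try every known selector
    for user_agent in USER_AGENTS:
        html = requests.get(search_url, headers={"User-Agent": user_agent}).text
        soup = BeautifulSoup(html, "html.parser")
        for selector in PRODUCT_SELECTORS:
            products = soup.select(selector)
            if products:
                return products  # this page style matches one of the known selectors
    return []  # no user agent / selector combination worked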

amazon-scraper-python's People

Contributors

andreabisello, bitofbreeze, jpeacock29, kevinl95, tducret, tducretcnes


amazon-scraper-python's Issues

No module named 'click'

I pip-installed 'click' and it is in my pip list (version 7.1.2), but I keep getting the following error:

Traceback (most recent call last):
File "/VSCode/amazon2csv/amazon2csv.py", line 3, in
import click
ModuleNotFoundError: No module named 'click'
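
A frequent cause of this error is that click was installed for a different interpreter than the one running amazon2csv.py; a purely illustrative check is to print the interpreter path and install click for that exact interpreter:

import sys

# Shows which Python actually runs the script; then install click for it:
#     <printed path> -m pip install click
print(sys.executable)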

How to obtain all the text reviews of a product

Sorry, but I'm new to the Python world.

I'm using Jupyter Notebook and I can't obtain the text reviews for a product; I don't even know if it is possible with this package.

Can you help me, please?

Thank you very much!

amazon2csv.py unable to generate output.csv with data extracted from Amazon

Hi author/ tducret,

I downloaded all your files and ran python setup.py install. Next, I tried running amazon2csv.py following the instructions at https://github.com/tducret/amazon-scraper-python, which say "You can also pass a search url (if you added complex filters for example), and save it to a file".

However, when I run the command " amazon2csv.py --url="https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=python+scraping" > output.csv " in a Windows command prompt session, output.csv is generated, but there is no data inside it. Any advice on which step I missed that resulted in the empty output.csv (i.e. no data extracted from https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=python+scraping)?

See the attached screenshot as proof / reference.

Is there a way to capture this info?

Hi,

I have been searching for Amazon scrapers, and this project is the most effective and efficient, especially its ability to work with keyword search. Great work!

I forked this and was trying to make some changes so that it works with Amazon in my region (Amazon.com/au & Amazon.co.jp), but unfortunately it turned out that I do not have the skill to do so.
It would be very good to have a way to alter parameters like this.

I also shared the same thought in the other issue thread regarding "Price", and I was also thinking of getting "Seller" and "Stock level" data as well for a more thorough analysis.

Thanks so much again for this fascinating project.

Kenneth

pip3 install broken

When I used the latest version of pip to install, I encountered the following error. Pip changed its internal implementation recently, which is probably why this broke:

pip3 --version

pip 21.2.4 from /.../lib/python3.9/site-packages/pip (python 3.9)

pip3 install -U amazonscraper

Collecting amazonscraper
  Downloading amazonscraper-0.1.2.tar.gz (8.6 kB)
    ERROR: Command errored out with exit status 1:
     command: /Users/liuxiaolu/Work/amazon/analyzer/env/bin/python3.9 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_f415d4529b4844dcb21f497a53af1347/setup.py'"'"'; __file__='"'"'/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_f415d4529b4844dcb21f497a53af1347/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-pip-egg-info-t8errgvi
         cwd: /private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_f415d4529b4844dcb21f497a53af1347/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_f415d4529b4844dcb21f497a53af1347/setup.py", line 22, in <module>
        requirements = [str(ir.req) for ir in install_reqs]
      File "/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_f415d4529b4844dcb21f497a53af1347/setup.py", line 22, in <listcomp>
        requirements = [str(ir.req) for ir in install_reqs]
    AttributeError: 'ParsedRequirement' object has no attribute 'req'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/c0/15/bab4563fe795fadce45ac42eea0ed0988d7f478dc9c4ba8d845f0c2d2d4a/amazonscraper-0.1.2.tar.gz#sha256=b683d98fabe0f0548a28707bf399a1e32840fdd4f6117fba8152c6bbd4dc6bc5 (from https://pypi.org/simple/amazonscraper/) (requires-python:>=3). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Downloading amazonscraper-0.1.1.tar.gz (8.3 kB)
    ERROR: Command errored out with exit status 1:
     command: /Users/liuxiaolu/Work/amazon/analyzer/env/bin/python3.9 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_2dd4acea14d8428e8993c5ab3b4911e3/setup.py'"'"'; __file__='"'"'/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_2dd4acea14d8428e8993c5ab3b4911e3/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-pip-egg-info-kj603xnr
         cwd: /private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_2dd4acea14d8428e8993c5ab3b4911e3/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_2dd4acea14d8428e8993c5ab3b4911e3/setup.py", line 22, in <module>
        requirements = [str(ir.req) for ir in install_reqs]
      File "/private/var/folders/_1/9b6tfd017bx3878sxkcsyhk40000gn/T/pip-install-8a51jct1/amazonscraper_2dd4acea14d8428e8993c5ab3b4911e3/setup.py", line 22, in <listcomp>
        requirements = [str(ir.req) for ir in install_reqs]
    AttributeError: 'ParsedRequirement' object has no attribute 'req'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/71/5f/16139dbe286630c2aeda864c3974da01cf016731f50d3f4108dda5102831/amazonscraper-0.1.1.tar.gz#sha256=1254df358f7329d3d8e6c5d66a44b362d7cd3c699e1e5dd2848a76b4f05c4d39 (from https://pypi.org/simple/amazonscraper/) (requires-python:>=3). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Japanese Amazon Review Extraction

The review parser for Amazon JP parses the rating values incorrectly, because the number comes after the phrase "5つ星のうち". As a result, every parsed rating is returned as "5つ星のうち".

[Screenshot: parsed rating values all equal to "5つ星のうち"]
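
A minimal sketch of pulling the numeric value out of that label with a regular expression (the sample string is illustrative, not taken from the package):

import re

label = "5つ星のうち4.2"  # illustrative sample of the Amazon JP rating text
match = re.search(r"5つ星のうち\s*([0-9.]+)", label)
rating = float(match.group(1)) if match else None
print(rating)  # 4.2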

German Amazon character issue

Thank you very much, this app is very easy to use and powerful.

I have a small question: why does the German Amazon information I grabbed contain a lot of garbled "?" characters?

How can I solve this? @tducret

Carlos
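
The "?" characters usually mean the response was decoded with the wrong character set; a minimal sketch of one possible workaround with requests (the URL is illustrative, and this is an assumption about the cause, not a confirmed fix for the package):

import requests

response = requests.get("https://www.amazon.de/s?k=kaffeemaschine")  # illustrative URL
# Use the encoding requests detects from the page body instead of the HTTP header
response.encoding = response.apparent_encoding
html = response.text
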
del

variant, Lambda

Does this work with product variants? Please show an example if so.

Also, can I use this script with Lambda?

Getting no products when searched

I'm running this code in PyCharm:

import amazonscraper

results = amazonscraper.search("coffee", max_product_nb=2)

for result in results:
    print("{}".format(result.title))
    print("  - ASIN : {}".format(result.asin))
    print("  - {} out of 5 stars, {} customer reviews".format(result.rating, result.review_nb))
    print("  - {}".format(result.url))
    print("  - Image : {}".format(result.img))
    print()

print("Number of results : %d" % (len(results)))

and the output looks like this:

C:\Users\bordi\PycharmProjects\amazon\venv\Scripts\python.exe C:/Users/bordi/PycharmProjects/amazon/Scrapper.py
Number of results : 0

Process finished with exit code 0

Please help me with this problem.

Images

It would be awesome if we could get a product's images (links) into the CSV file. If possible all of them (they could share a single CSV field, separated by a delimiter), or at least the first (main) image link.

EDIT:
I looked at it a bit more: you could get (all) the thumbnail photos from the side and just change the URL afterwards to get the full-size image. For example, for this item (https://www.amazon.com/gp/product/B0040EGNIU?pf_rd_p=1581d9f4-062f-453c-b69e-0f3e00ba2652&pf_rd_r=SB6K7AFYBC9WW0PTX71F), the first thumbnail has the link https://images-na.ssl-images-amazon.com/images/I/41%2B1lH%2BGP0L._SS40_.jpg; if we change the SS40 to SX522, we get a full-sized image: https://images-na.ssl-images-amazon.com/images/I/41%2B1lH%2BGP0L._SX522_.jpg

Thumbnail links can be found like this: $("div#altImages ul li.item.imageThumbnail img").
Hope I helped a bit. Thanks!

EDIT 2:
I just found out that we can change the link to SL1500, which gives us an even better picture.
Maybe even implement a way to choose which resolution we want, although we can also do this by changing the links ourselves later.
Thanks again!
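
A minimal sketch of the URL rewrite described above (the helper name is illustrative; the size tokens come from the comments in this thread):

import re

def resize_amazon_image(url, size="SL1500"):
    # Replace the size token (e.g. _SS40_) in an Amazon image URL with the requested one
    return re.sub(r"\._[A-Z]{2}\d+_\.", "._{}_.".format(size), url)

thumb = "https://images-na.ssl-images-amazon.com/images/I/41%2B1lH%2BGP0L._SS40_.jpg"
print(resize_amazon_image(thumb))  # ...41%2B1lH%2BGP0L._SL1500_.jpg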

Doesn't work for smile.amazon.com

Don't have time to investigate at the moment, but when using a search_url like https://smile.amazon.com/s/field-keywords=,%207.5%20Fluid%20Ounces,%206%20Pack it gives requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

IP change

Hi there, I am using Crawlera proxy rotation and need to edit settings.py. However, I don't see any option in this scraper to plug in proxy credentials. Can you please help me with this?
Many thanks

Is it possible to scrape the Product Comments?

Hi - very cool project.

I'm just throwing this out there - it would be very useful if the product's comments/reviews could be scraped. Then, from the CSV file, I could run some text analysis or a word cloud, to see what the customers are saying about the product.

I'm not a skilled enough programmer to attempt to add this myself but would be excited to see it added.

Thanks -Ian

Is it possible to obtain results for more than 15 products?

I am only able to get results for 15 products when I perform a search, even though I have set max_product_nb to a value greater than 15. A manual search with the same keyword returns over 100 products on the amazon.com website. I'd appreciate any comments. Thank you.

Is there a way I can compile this project into a .exe or a .dll and call it from C# code?

I am a C# developer and I am looking for a way to use this. Last month I worked with the FFmpeg binary: call the exe with parameters and it does the work and returns the result (reading the console output).

I am looking to use this project in a similar way: call the exe with parameters and get the output. @tducret, please check if there is a better way to use it.

I have used Python but don't want to translate the code to C#. Is there any other way to use it from C#?

Thanks
