tellerlin / amazonme

This project forked from sushil-rgb/amazonme


Introducing the AmazonMe web scraper, a tool for extracting data from Amazon.com using the Requests and BeautifulSoup libraries in Python. It lets users navigate Amazon's website and extract information from it.

License: GNU General Public License v3.0

Python 100.00%


AmazonMe

Welcome to AmazonMe, a web scraper designed to extract information from the Amazon website and store it in a MongoDB database. This repository contains the code for the scraper, which uses the Requests and BeautifulSoup libraries to automate the scraping process. The scraper also uses asyncio concurrency to extract thousands of data points from the website efficiently.
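To illustrate the asyncio pattern mentioned above, here is a minimal sketch of bounded concurrent fetching. It is not the project's actual code: `fetch_page` is a hypothetical stand-in that only simulates a network request, `fetch_all` and `max_concurrency` are names chosen for this example, and the URLs are placeholders.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (no network access)."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"<html>{url}</html>"


async def fetch_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    """Fetch every URL concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:
            return await fetch_page(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


pages = asyncio.run(fetch_all([f"https://example.com/page/{i}" for i in range(10)]))
print(len(pages))
```

Because Requests is a blocking library, a real scraper would likely need to move each request off the event loop (for example with `asyncio.to_thread`) for this pattern to give any speed-up.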

Install necessary requirements:

It's good practice to create and activate a virtual environment before installing the requirements:

python.exe -m venv environmentname
environmentname\Scripts\activate

Then install the requirements:

  pip install -r requirements.txt

Usage

  async def main():
      # Amazon URL to scrape:
      base_url = ""
      # Set to True to route requests through a random proxy:
      proxy = False
      if proxy:
          mongo_to_db = await export_to_mong(base_url, f"http://{rand_proxies()}")
      else:
          mongo_to_db = await export_to_mong(base_url, None)
      # To export a MongoDB collection to an Excel spreadsheet, uncomment the two
      # lines below and set sheet_name to the name of that collection:
      # sheet_name = "Dinner Plates"
      # sheets = await mongo_to_sheet(sheet_name)
      return mongo_to_db

To run the script, open a terminal and run:

  python main.py
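`python main.py` presumably drives the `main()` coroutine through an event loop. The sketch below shows one way such an entry point could look. The `export_to_mong` defined here is only a stub standing in for the project's function (its real behavior and return value are not shown in this README), and the URL is a placeholder.

```python
import asyncio
from typing import Optional


async def export_to_mong(base_url: str, proxy: Optional[str]) -> dict:
    """Stub for illustration only; the real function scrapes and writes to MongoDB."""
    return {"url": base_url, "proxy": proxy}


async def main() -> dict:
    base_url = "https://www.amazon.com/s?k=dinner+plates"  # placeholder URL
    return await export_to_mong(base_url, None)


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```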

Demo of the scraper scraping the content from Amazon

Discord bot

Features

When the program runs, the scraper extracts the following fields for each product and stores them in a MongoDB database:

  • Product
  • Asin
  • Description
  • Breakdown
  • Price
  • Deal Price
  • You Saved
  • Rating
  • Rating count
  • Availability
  • Hyperlink
  • Image url
  • Image lists
  • Store
  • Store link
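As a rough picture of what one scraped record might look like, the sketch below models the fields above as a dataclass and converts it to a plain dictionary. The attribute names, types, and sample values are assumptions made for this example; the project's actual keys may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProductRecord:
    """Illustrative record; attribute names are assumptions, not the project's schema."""
    product: str
    asin: str
    description: Optional[str] = None
    breakdown: Optional[str] = None
    price: Optional[str] = None
    deal_price: Optional[str] = None
    you_saved: Optional[str] = None
    rating: Optional[str] = None
    rating_count: Optional[str] = None
    availability: Optional[str] = None
    hyperlink: Optional[str] = None
    image_url: Optional[str] = None
    image_lists: list[str] = field(default_factory=list)
    store: Optional[str] = None
    store_link: Optional[str] = None


# Made-up sample values, for illustration only.
record = ProductRecord(product="Example dinner plate set", asin="B000000000", price="$24.99")
document = asdict(record)  # plain dict, the shape a MongoDB document would take
print(document["asin"])
```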

Supported domains:

  • ".com" (US)
  • ".co.uk" (UK)
  • ".com.mx" (Mexico)
  • ".com.br" (Brazil)
  • ".com.au" (Australia)
  • ".co.jp" (Japan)
  • ".com.be" (Belgium)
  • ".in" (India)
  • ".fr" (France)
  • ".se" (Sweden)
  • ".de" (Germany)
  • ".it" (Italy)
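A scraper that supports several storefronts needs some way to tell which one a URL belongs to. The sketch below shows one possible check using only the standard library. The function name `amazon_suffix` and the `SUPPORTED_SUFFIXES` tuple are hypothetical, and the tuple holds only a subset of the domains listed above.

```python
from typing import Optional
from urllib.parse import urlparse

# A subset of the domain suffixes from the list above; extend as needed.
SUPPORTED_SUFFIXES = (".com", ".co.uk", ".com.mx", ".de", ".in")


def amazon_suffix(url: str) -> Optional[str]:
    """Return the Amazon domain suffix a URL belongs to, or None if unsupported."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in SUPPORTED_SUFFIXES:
        # Accept both the bare domain and any subdomain such as "www.".
        if host == f"amazon{suffix}" or host.endswith(f".amazon{suffix}"):
            return suffix
    return None


print(amazon_suffix("https://www.amazon.co.uk/dp/B000000000"))
```

Matching on the full host name, rather than searching for a substring, keeps look-alike hosts such as `notamazon.com` from being accepted.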

MongoDB Integration

AmazonMe now integrates with MongoDB, so the scraper can save scraped data directly to a database for further analysis or use.

To enable MongoDB integration, you need to follow these steps:

  1. Make sure you have MongoDB installed and running on your machine or a remote server.
  2. Install the pymongo package by running the following command:

    pip install pymongo

  3. In the script or module where you handle the scraping and data extraction, import the pymongo package.

With the MongoDB integration, you can query and retrieve the scraped data from the database, perform analytics, or use it for other purposes.
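The steps above could translate into something like the following sketch. It is not the project's own storage code: the connection URI, the database and collection names, the `asin` key, and the helper names `build_upsert` and `save_products` are all assumptions for illustration. `pymongo` is imported inside `save_products` so that the pure helper can be tried without a MongoDB installation.

```python
from typing import Any, Iterable


def build_upsert(product: dict[str, Any]) -> tuple[dict, dict]:
    """Return (filter, update) arguments for an upsert keyed on the ASIN."""
    return {"asin": product["asin"]}, {"$set": product}


def save_products(
    products: Iterable[dict[str, Any]],
    uri: str = "mongodb://localhost:27017",  # assumed local server
    db_name: str = "amazon",                 # hypothetical database name
    collection_name: str = "products",       # hypothetical collection name
) -> None:
    """Insert or update each product so that re-running a scrape does not duplicate it."""
    from pymongo import MongoClient  # requires `pip install pymongo`

    client = MongoClient(uri)
    try:
        collection = client[db_name][collection_name]
        for product in products:
            query, update = build_upsert(product)
            collection.update_one(query, update, upsert=True)
    finally:
        client.close()


print(build_upsert({"asin": "B000000000", "price": "$24.99"}))
```

Keying the upsert on the ASIN is one design choice among several; plain `insert_many` is simpler if duplicates across runs are not a concern.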

Note

Please note that the script is designed to work with Amazon and may not work with other websites. Additionally, the website may block the script if it detects excessive scraping activity, so please use this tool responsibly and in compliance with Amazon's terms of service.

If you have any issues or suggestions for improvements, please feel free to open an issue on the repository or submit a pull request.

License

This project is licensed under the GPL-3.0 license. This scraper is provided as-is and for educational purposes only. The author is not responsible for any damages or legal issues that may result from its use. Use it at your own risk. Thank you for using AmazonMe!
