Giter VIP home page Giter VIP logo

codingforentrepreneurs / scrape-websites-with-python-fastapi-celery-nosql Goto Github PK

View Code? Open in Web Editor NEW
84.0 5.0 28.0 535 KB

Learn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, & NoSQL with Cassandra via AstraDB.

License: Apache License 2.0

Jupyter Notebook 99.50% Python 0.50%
python fastapi requests-html celery scheduled-tasks cassandra cassandra-driver astradb

scrape-websites-with-python-fastapi-celery-nosql's Introduction

Scrape Websites with Python & NoSQL

Learn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, & NoSQL.

Here's what each tool is used for:

  • Python 3.9 download - programming the logic.
  • AstraDB sign up - highly perfomant and scalable database service by DataStax. AstraDB is a Cassandra NoSQL Database. Cassandra is used by Netflix, Discord, Apple, and many others to handle astonding amounts of data.
  • Selenium docs - an automated web browsing experience that allows:
    • Run all web-browser actions through code
    • Loads JavaScript heavy websites
    • Can perform standard user interaction like clicks, form submits, logins, etc.
  • Requests HTML docs - we're going to use this to parse an HTML document extracted from Selenium
  • Celery docs - Celery providers worker processes that will allow us to schedule when we need to scrape websites. We'll be using redis as our task queue.
  • FastAPI docs - as a web application framework to Display and monitor web scraping results from anywhere

This series is broken up into 4 parts:

  • Scraping How to scrape and parse data from nearly any website with Selenium & Requests HTML.
  • Data models how to store and validate data with cassandra-driver, pydantic, and AstraDB.
  • Worker & Scheduling how to schedule periodic tasks (ie scraping) integrated with Redis & AstraDB
  • Presentation How to combine the above steps in as robust web application service

Setup your system.

Below is a preflight checklist to ensure you system is fully setup to work with this course. All guides and setup can be found in the setup directory of this repo.

Preflight checklist

  • [] Install Selenium & Chromedriver - setup guide
  • [] Install Redis - setup guide
  • [] Create a virtual environment & install dependencies
  • [] Setup an account with DataStax
  • [] Create your first AstraDB and get API credentials
  • [] Use cassandra-driver to verify your connection to AstraDB

scrape-websites-with-python-fastapi-celery-nosql's People

Contributors

codingforentrepreneurs avatar teamcfe avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

scrape-websites-with-python-fastapi-celery-nosql's Issues

Getting ('Multiple objects found') Error when trying to display all items with same ASIN in Fastapi

THIS IS THE ROUTE I'M EXECUTING

@app.get("/all-entries/{asin}",)

def display_entries(asin):
data = list(table_2.objects.get(asin=asin))
data['events'] = list(table_2.objects().filter(asin=asin))
return data

#THIS IS THE COMPLETE TRACEBACK
Traceback (most recent call last):
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\uvicorn\protocols\http\h11_impl.py", line 377, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\uvicorn\middleware\proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\fastapi\applications.py", line 261, in call
await super().call(scope, receive, send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\middleware\errors.py", line 181, in call
raise exc
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\middleware\errors.py", line 159, in call
await self.app(scope, receive, _send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\exceptions.py", line 82, in call
raise exc
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\exceptions.py", line 71, in call
await self.app(scope, receive, sender)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in call
raise e
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in call
await self.app(scope, receive, send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\routing.py", line 656, in call
await route.handle(scope, receive, send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\routing.py", line 259, in handle
await self.app(scope, receive, send)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\routing.py", line 61, in app
response = await func(request)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\fastapi\routing.py", line 227, in
app
raw_response = await run_endpoint_function(
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\fastapi\routing.py", line 162, in
run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\starlette\concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\anyio\to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\anyio_backends_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\anyio_backends_asyncio.py", line 754, in run
result = context.run(func, *args)
File "C:\Users\alek\Desktop\WEBSCRAPING\PRATICA\PROJECTS\Scrape-Websites-with-Python-FastAPI-Celery-NoSQL.\app\API.py", line 60, in display_product
return dict(table_2.objects.get(asin=asin))
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\cassandra\cqlengine\query.py", line 763, in get
return self.filter(*args, **kwargs).get()
File "C:\Users\alek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\cassandra\cqlengine\query.py", line 770, in get
raise self.model.MultipleObjectsReturned('Multiple objects found')
cassandra.cqlengine.models.MultipleObjectsReturned: Multiple objects found

PLEASE HELP!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.