
Moths of Aurora app's backend hosted on an AWS EC2 instance using Firebase Realtime Database and Cloud Functions.

Home Page: https://play.google.com/store/apps/details?id=amit.apps.aurora_raw3


moths-of-aurora_backend

The backend's purpose is to scrape data from the artist Aurora Aksnes's social media accounts (Facebook, Instagram, Twitter and YouTube), update the database, and generate notifications for the Android app when new activity is detected. It also includes scripts that scrape the artist's official site for live show data, and a script that uses the Genius API to scrape and store the lyrics of all her songs that are available on their site.

Cloud_Function

I have only included the file you need to modify when implementing Firebase Cloud Functions. The code sets listeners on various data nodes and pushes a notification to all registered devices running the app whenever those nodes are updated.
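
The decision the listener makes can be sketched without any Firebase SDK. This is a hypothetical, library-free illustration of the check performed on an update; the function and node names are mine, not the repo's actual identifiers:

```python
def make_notification(node, before, after):
    """Return a push-notification payload for a data-node update,
    or None when the node was deleted or did not actually change."""
    if after is None or after == before:
        return None  # deletion or no-op write: nothing to notify
    return {
        "title": "New activity on {}".format(node),
        "body": str(after)[:100],  # truncate long content for the banner
    }
```

In the real Cloud Function the payload would be handed to Firebase Cloud Messaging for delivery to the registered devices.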

EC2_data

This directory contains all the scrapers, the cronjob file and the requirements file. The contents of this directory are stored on the AWS EC2 instance I'm using. Each scraper downloads data from its corresponding site and updates a shared Firebase Realtime Database if the downloaded data differs from the data already stored.
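
The shared update-if-changed pattern can be sketched like this. Hashing a stable fingerprint is one way to compare payloads; the actual scripts may simply compare the data directly:

```python
import hashlib
import json

def fingerprint(data):
    """Stable hash of a JSON-serialisable payload, used to detect changes."""
    blob = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def needs_update(new_data, stored_fingerprint):
    """True when freshly scraped data differs from what the DB holds."""
    return fingerprint(new_data) != stored_fingerprint
```

Only when `needs_update` is true does a scraper write to the database, which in turn is what triggers the Cloud Function notifications.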

fb-scraper

I couldn't create a Facebook app because my requests kept getting denied, so I had to use Selenium and scrape the mobile version of the Facebook site to get the data. Thanks to this article for the idea. The script gets the URL of the account's profile picture using Facebook's Graph API. It then uses a chromedriver binary (included in the repository, for Linux) to first log in to Facebook, then visit the FB page and scrape all the posts by visiting them one by one. You can uncomment options.add_argument('--headless') to run this in the background. The script scrapes each post's:

  • time of creation
  • thumbnail url
  • link
  • message
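
The Graph API request for the profile picture can be sketched like this. With redirect=false the endpoint returns JSON containing the image URL instead of redirecting to the image itself; the page id used below is just an example:

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"

def profile_picture_url(page_id, size="large"):
    """Build the Graph API request for a page's profile picture.
    redirect=false makes the API answer with JSON holding the image URL."""
    query = urlencode({"type": size, "redirect": "false"})
    return "{}/{}/picture?{}".format(GRAPH_BASE, page_id, query)
```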

insta-scraper

This script scrapes Instagram posts from the artist's Instagram page. I used the code from this repo and modified it a bit for my own use.

twitter-scraper

This script uses the Twython library to scrape tweets. I had to create a Twitter developer account and then register an app to get the required credentials. You can check out the instructions at the link.
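
Tweets returned by the Twitter REST API carry a created_at string such as Wed Oct 10 20:19:24 +0000 2018. A small helper like the following (the name is mine, not the repo's) can parse it for freshness comparisons:

```python
from datetime import datetime

def parse_created_at(created_at):
    """Parse Twitter's created_at timestamp format, e.g. as returned in
    tweet objects, into an aware datetime for comparing recency."""
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
```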

youtube-scraper

This script uses a Heroku app to scrape YouTube for videos related to the artist. I took the code from this repo and changed the URL to var url = 'https://www.youtube.com/results?search_query=aurora+aksnes+-"mobile+legends"+-"camille"&search_sort=video_date_uploaded'; (check out the sheltered-tundra-55930 folder) to get search results for the relevant keywords. The JS script is hosted on Heroku and returns search results based on the number of pages to scrape provided in the GET request. In the script scraped-data-formatter.py, I'm fetching search results from 5 pages. The following details of the videos are stored in the database:

  • duration
  • title
  • upload date
  • uploader
  • video url
  • number of views
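
The GET request to the Heroku app can be sketched like this. The base URL and parameter names are placeholders for illustration; the actual endpoint and its parameters live in the sheltered-tundra-55930 folder:

```python
from urllib.parse import urlencode

def search_request_url(base_url, search_query, pages=5):
    """Build the GET request asking the Heroku scraper app for
    `pages` pages of YouTube search results for `search_query`."""
    query = urlencode({"q": search_query, "pages": pages})
    return "{}?{}".format(base_url, query)
```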

lyrics-scraper

This script scrapes all the songs and their lyrics stored in the Genius database using their API. I had to find the artist ID for Aurora Aksnes first, though. The script doesn't check for new content before inserting the scraped data into the database because no Cloud Function is set on its corresponding data node.
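
Genius paginates its /artists/&lt;id&gt;/songs endpoint and signals the last page with next_page = null, so collecting everything is a loop over pages. In this sketch, fetch_page stands in for the actual HTTP call, and the artist id in the test is illustrative, not Aurora's real one:

```python
def collect_artist_songs(fetch_page, artist_id):
    """Walk the Genius API's paginated songs endpoint for one artist.
    fetch_page(artist_id, page) must return the parsed JSON response body;
    the API marks the end of the listing with next_page = None."""
    songs, page = [], 1
    while page is not None:
        body = fetch_page(artist_id, page)
        songs.extend(body["response"]["songs"])
        page = body["response"]["next_page"]
    return songs
```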

ticket-scraper

Last one, phew. This one scrapes Aurora's official site for data on her live shows, which includes:

  • date
  • festival at which the show is to be held
  • location
  • ticket links
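
The repo uses beautifulsoup4, but the extraction idea can be sketched with the standard library's html.parser. The ticket-link class and the sample markup below are made-up stand-ins for whatever the official site actually serves:

```python
from html.parser import HTMLParser

class TicketLinkParser(HTMLParser):
    """Collect href values from anchors marked as ticket links."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Hypothetical markup: <a class="ticket-link" href="...">
        if tag == "a" and "ticket-link" in (attrs.get("class") or ""):
            self.links.append(attrs.get("href"))

sample = '<div><a class="ticket-link" href="https://tix.example/1">Tickets</a></div>'
parser = TicketLinkParser()
parser.feed(sample)
```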

crontab

The scripts have to be run regularly to keep the database updated. This is where cron jobs come in. Cron runs all the above scrapers at regular intervals, with the frequency depending on the corresponding activity. For instance, the Twitter scraper runs more often than the lyrics scraper because new tweets appear more frequently than new songs.
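
A hypothetical crontab illustrating the idea; the paths and frequencies below are placeholders, not the repo's actual schedule:

```
# every 15 minutes: tweets change often
*/15 * * * * /usr/bin/python3 /home/ubuntu/EC2_data/twitter-scraper.py
# hourly: Facebook posts
0 * * * * /usr/bin/python3 /home/ubuntu/EC2_data/fb-scraper.py
# weekly: new songs are rare
0 0 * * 0 /usr/bin/python3 /home/ubuntu/EC2_data/lyrics-scraper.py
```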

requirements.txt

All the packages you need to install to run the code.

Contributors

semmet95
