EnhancedTwitterScraper

A repackaging of an existing Python wrapper around snscrape.


Requirements

Install the development version of snscrape.

Python

  • Python 3.8
  • questionary
  • csv (part of the Python standard library)
  • pathlib (part of the Python standard library)

Use

Currently, this app only writes output as a .csv file and only scrapes Twitter hashtags. Future versions will add the ability to scrape by username.

If you're scraping a potentially large dataset, it is recommended to run several instances of this app in separate terminals, each pulling a different date range, to increase the total number of tweets found per second. For example, a single instance pulls approximately 35-50 tweets per second, but 6 months of #Bitcoin yields over 3 million tweets. At that rate, one instance would need roughly 17 to 24 hours to collect them all.
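As a rough planning aid, the sketch below splits a date range into contiguous chunks, one per terminal, and estimates the single-instance run time from the figures above. The helper names `split_date_range` and `estimated_hours` are hypothetical and are not part of this app. The example dates and the choice of four instances are only illustrative.

```python
from datetime import date, timedelta


def split_date_range(start, end, parts):
    """Split [start, end) into `parts` contiguous chunks of near-equal length."""
    total_days = (end - start).days
    if parts < 1 or total_days < parts:
        raise ValueError("need at least one day per chunk")
    # Chunk boundaries, spread as evenly as whole days allow.
    bounds = [start + timedelta(days=(total_days * i) // parts)
              for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def estimated_hours(tweet_count, tweets_per_second):
    """Hours needed to collect `tweet_count` tweets at a steady rate."""
    return tweet_count / tweets_per_second / 3600


# One chunk per terminal. Each chunk maps onto the app's two date prompts:
# the later date is the start date (not inclusive), the earlier date is the
# end of the range (inclusive), so adjacent chunks do not overlap.
chunks = split_date_range(date(2021, 1, 1), date(2021, 10, 1), 4)
for earlier, later in chunks:
    print(f"start date: {later} (not inclusive), end of range: {earlier} (inclusive)")

fastest = estimated_hours(3_000_000, 50)
slowest = estimated_hours(3_000_000, 35)
print(f"single instance: {fastest:.1f} to {slowest:.1f} hours")
```

Because each chunk ends where the next begins, and the later bound is exclusive, no tweet should be collected twice across instances.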

Open the CLI and run: python TweetScraperQuest.py

You will be prompted to fill out the data you wish to scrape, as well as which attributes of a tweet you wish to collect.

  • Enter the desired file path and name, with no quotes: tweetDemo.csv
  • The app does not currently integrate with pandas, so always select Yes when asked about the output file type.
  • Enter the hashtag you wish to search for, without the leading #: bitcoin
  • Selecting every tweet attribute can generate very large files! Otherwise, select only the ones you care about.
  • Enter the start date, the one closest to today, in YYYY-MM-DD format, e.g. 2021-10-01 (not inclusive)
  • Enter the end of the date range to scrape, the earliest date, in YYYY-MM-DD format, e.g. 2021-01-01 (inclusive)
  • Select the tweet stats frequency; 100 to 250 works well for medium to large data sets
  • Optionally, set a limit on the number of tweets to collect.
  • Sit back and enjoy!
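The prompts above can be pictured with the following minimal sketch. It is not the app's actual code: `build_query` and `write_tweets` are hypothetical helpers, the attribute names `date` and `content` are placeholders, and the stand-in tweet objects replace a real snscrape scraper so the example runs offline. It assumes the dates map onto Twitter's `since:` (inclusive) and `until:` (exclusive) search operators, which matches the inclusive and non-inclusive notes in the prompts.

```python
import csv
import io
from types import SimpleNamespace


def build_query(hashtag, since, until):
    """Build a search string; since: is inclusive, until: is exclusive."""
    return f"#{hashtag.lstrip('#')} since:{since} until:{until}"


def write_tweets(tweets, out, attributes, limit=None, stats_every=100):
    """Write the chosen attributes of each tweet to `out` as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(attributes)  # header row
    count = 0
    for tweet in tweets:
        if limit is not None and count >= limit:
            break  # optional cap on the number of tweets
        # Missing attributes become empty cells rather than raising.
        writer.writerow([getattr(tweet, name, "") for name in attributes])
        count += 1
        if count % stats_every == 0:
            print(f"{count} tweets written")  # periodic progress stats
    return count


query = build_query("bitcoin", "2021-01-01", "2021-10-01")

# Stand-in tweets (hypothetical data) in place of a live scraper.
fake_tweets = [SimpleNamespace(date=f"2021-09-{day:02d}", content=f"tweet {day}")
               for day in range(1, 6)]

buffer = io.StringIO()
written = write_tweets(fake_tweets, buffer, ["date", "content"],
                       limit=3, stats_every=2)
rows = buffer.getvalue().splitlines()
```

Writing only the selected attributes is what keeps file sizes down, and a larger `stats_every` value means fewer progress messages on long runs.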

Credit

  • snscrape source code was used in this app. See the Requirements section for more details.
  • MartinBeckUT's Python Wrapped TwitterScraper is the core of how snscrape was integrated into this project.
  • Idea and troubleshooting for this came up during a project with Lisa and Meghan
