twitter-scraper's Introduction

Twitter Scraper

This project consists of two scripts:

  • twitter-scraper.py lists all the replies in a Twitter thread, giving you a fast and complete view of complex threads even if you are not mentioned in every branch
  • tweet.monitor.sh checks for new replies in Twitter threads and notifies you about them

twitter-scraper.py

twitter-scraper.py reads a text file filled with tweet URLs (one per line) in the following format: https://twitter.com/<SCREEN_NAME>/status/<ID>
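As an illustration of the input format, here is a minimal sketch of how such a file could be parsed. The names parse_tweet_url and read_tweet_list are hypothetical and are not taken from twitter-scraper.py; the script itself may validate its input differently.

```python
import re

# Matches https://twitter.com/<SCREEN_NAME>/status/<ID>
TWEET_URL_RE = re.compile(
    r"^https://twitter\.com/(?P<screen_name>[A-Za-z0-9_]+)/status/(?P<tweet_id>\d+)/?$"
)


def parse_tweet_url(line):
    """Return (screen_name, tweet_id), or None if the line is not a tweet URL."""
    match = TWEET_URL_RE.match(line.strip())
    if match is None:
        return None
    return match.group("screen_name"), int(match.group("tweet_id"))


def read_tweet_list(path):
    """Read one URL per line, skipping blank or malformed lines."""
    tweets = []
    with open(path) as handle:
        for line in handle:
            parsed = parse_tweet_url(line)
            if parsed is not None:
                tweets.append(parsed)
    return tweets
```

Calling read_tweet_list("tweet.list") would then return a list of (screen_name, tweet_id) pairs, one per valid line.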

The script searches for replies to each given tweet and writes them to stdout, indenting each reply according to its depth in the thread.

The -s option produces short CSV output that is useful for diffing the results of successive runs of the script; this way you can detect newer replies and send a notification.

usage: twitter-scraper.py -f file [-s]
Options:
-f  : name of the input file that contains twitter URLs (1 per line) in the following format: https://twitter.com/<SCREEN_NAME>/status/<ID>
-s  : csv output; useful to diff content between different iterations of the script

Twitter Access Tokens

Before you start playing with twitter-scraper.py you need your Twitter access tokens (keys/secrets).

Generate them here: https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

Twitter's API doesn't offer a direct way to get all the replies to a tweet, so the script uses the search API to find replies to a given tweet and, in turn, replies to each reply.
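The recursive search described above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: Tweet, collect_replies and fake_search are hypothetical names, and the search callable stands in for a real API query. With python-twitter, one plausible way to implement it would be a thin wrapper around Api.GetSearch with a to:<SCREEN_NAME> query and since_id set to the tweet ID, but that wiring is an assumption.

```python
from collections import namedtuple

# Minimal stand-in for a tweet; a real client returns richer objects.
Tweet = namedtuple("Tweet", "id screen_name in_reply_to_status_id")


def collect_replies(tweet, search, depth=0):
    """Yield (depth, reply) for every reply below `tweet`, depth-first.

    `search(screen_name, since_id)` is a hypothetical callable returning
    tweets addressed to `screen_name` that are newer than `since_id`.
    """
    for candidate in search(tweet.screen_name, tweet.id):
        # The search returns everything sent to the user, so keep only
        # the tweets that reply directly to this one.
        if candidate.in_reply_to_status_id != tweet.id:
            continue
        yield depth, candidate
        # Recurse to pick up replies to the reply.
        for item in collect_replies(candidate, search, depth + 1):
            yield item


# In-memory data used in place of the real search API (made-up IDs).
ROOT = Tweet(1, "alice", None)
ADDRESSED_TO = {
    "alice": [Tweet(2, "bob", 1), Tweet(4, "carol", 1), Tweet(5, "dave", 99)],
    "bob": [Tweet(3, "alice", 2)],
}


def fake_search(screen_name, since_id):
    """Return stored tweets addressed to `screen_name` newer than `since_id`."""
    return [t for t in ADDRESSED_TO.get(screen_name, []) if t.id > since_id]


thread = [(depth, reply.id) for depth, reply in collect_replies(ROOT, fake_search)]
```

The depth value is what would drive the indentation of nested replies in the output. Tweet 5 is addressed to alice but replies to a different tweet, so it is filtered out.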

Limitation

The Twitter search API only returns results from the last 7 days, so replies older than that will not be found.

Requirements

The script is tested with Python 2 and 3 on Ubuntu (from 18.04 to 22.04).

To start playing with twitter-scraper.py:

  • install required pip packages:
# Python 2.7
sudo pip install python-twitter
sudo pip install pytz
# Python 3
sudo pip3 install python-twitter
sudo pip3 install pytz

Configuration

Before the first run, set your Twitter access tokens by finding and replacing the following placeholders in the script (mandatory):

CONSUMER_KEY
CONSUMER_SECRET
ACCESS_TOKEN
ACCESS_TOKEN_SECRET

Modify the script's local_timezone variable to print tweet dates in your local time zone (optional):

local_timezone = 'Europe/Rome'

Full list here: https://gist.github.com/heyalexej/8bf688fd67d7199be4a1682b3eec7568
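To show what local_timezone affects, here is a minimal sketch of converting a UTC timestamp to local time with pytz. It assumes the timestamp arrives in the "Wed Jan 16 13:01:07 +0000 2019" style used by the Twitter API and that Python 3 is used (for %z parsing); the helper name to_local is hypothetical, and the script may do the conversion differently.

```python
from datetime import datetime

import pytz

local_timezone = 'Europe/Rome'


def to_local(created_at, tz_name=local_timezone):
    """Convert a Twitter-style UTC timestamp to a local date string."""
    # Parse the timestamp, including its +0000 offset, into an aware datetime.
    utc_time = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    # Shift it into the configured time zone.
    local_time = utc_time.astimezone(pytz.timezone(tz_name))
    return local_time.strftime("%d/%m/%Y %H:%M:%S")
```

With 'Europe/Rome', a tweet created at 13:01:07 UTC on 16 January 2019 is printed as 16/01/2019 14:01:07.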

tweet.monitor.sh

tweet.monitor.sh is a bash script that uses the CSV output of twitter-scraper.py to diff two iterations of the script and show new replies (if any).
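The comparison between two runs can be illustrated with a short Python sketch. This is not how tweet.monitor.sh is implemented (it is a bash script); it only shows the idea using the date,reply,parent_thread columns of the CSV output. The function name new_replies is hypothetical.

```python
import csv
import io


def new_replies(previous_csv, current_csv):
    """Return rows present in the current run but not in the previous one."""
    def rows(text):
        return list(csv.DictReader(io.StringIO(text)))

    # A reply is identified by its URL in the "reply" column.
    seen = {row["reply"] for row in rows(previous_csv)}
    return [row for row in rows(current_csv) if row["reply"] not in seen]


HEADER = "date,reply,parent_thread\n"
PARENT = "https://twitter.com/benkow_/status/1085483319347867649"
FIRST = "16/01/2019 14:01:07,https://twitter.com/Jesse_V_Burke/status/1085522335095054336," + PARENT + "\n"
SECOND = "16/01/2019 12:17:21,https://twitter.com/benkow_/status/1085496220183945216," + PARENT + "\n"

# The second run contains one reply that the first run did not.
fresh = new_replies(HEADER + FIRST, HEADER + FIRST + SECOND)
```

Keying on the reply URL rather than on the whole line keeps the comparison stable even if the date formatting changes between runs.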

Note that the script executes the following command:

twitter-scraper.py -f tweet.list -s

Put your tweet URLs in the tweet.list file or edit the TWEETLIST variable.

If curl is installed (sudo apt install curl), the tweet content is printed as well; in that case remember to set APIKEY and APISECRETKEY to the same values as CONSUMER_KEY and CONSUMER_SECRET in the twitter-scraper.py file.

Examples

twitter-scraper.py

command line

$ ./twitter-scraper.py  -f tweet.list
[...]
=========================================
[INFO] Start scraping from tweet URL https://twitter.com/benkow_/status/1085483319347867649

/----------------------------------
| From:	 Jesse V. Burke (@Jesse_V_Burke)
| Date:	 16/01/2019 14:01:07
| URL:	 https://twitter.com/Jesse_V_Burke/status/1085522335095054336
| @benkow_ Benkow I have slides analyzing bulehero probably 6+ months ago. DM me let’s chat about this c2
\----------------------------------

/----------------------------------
| From:	 Benkøw moʞuƎq (@benkow_)
| Date:	 16/01/2019 12:17:21
| URL:	 https://twitter.com/benkow_/status/1085496220183945216
| A lot of web vulns / BF  behind that binary (looks like a worm)
\----------------------------------

  /----------------------------------
  | From:	 Benkøw moʞuƎq (@benkow_)
  | Date:	 16/01/2019 12:20:55
  | URL:	 https://twitter.com/benkow_/status/1085497120726097921
  | also Eternalblue
  \----------------------------------
[...]
$ ./twitter-scraper.py  -f tweet.list -s
date,reply,parent_thread
16/01/2019 14:01:07,https://twitter.com/Jesse_V_Burke/status/1085522335095054336,https://twitter.com/benkow_/status/1085483319347867649
16/01/2019 12:17:21,https://twitter.com/benkow_/status/1085496220183945216,https://twitter.com/benkow_/status/1085483319347867649
16/01/2019 12:20:55,https://twitter.com/benkow_/status/1085497120726097921,https://twitter.com/benkow_/status/1085483319347867649
[...]

tweet.monitor.sh

command line

$ ./tweet.monitor.sh 
=== TWEET MONITOR ===
Log file found, archiving...
Executing ~/twitter-scraper/twitter-scraper.py...
Checking for new tweets...
Found  new replies

> New reply to tweet https://twitter.com/iGio90/status/1244250696125427715 on 30/03/2020 11:32:10
>> link: https://twitter.com/marketingpmi/status/1244558024548769796
---
Giovanni fai come vuoi. Di solito non si fa cosi. E se è stato sistemato in meno di 6 ore (tra l'altro di domenica) significa che dietro ci sono persone in gamba e proattive che magari potevi aiutare senza tutto sto casino. Gli errori li facciamo tutti in ICT. Cmq grazie… https://t.co/DCXmmg1GtY" 
---
[...]
Bye!

Credits

Based on the initial work by @edsu: https://gist.github.com/edsu/54e6f7d63df3866a87a15aed17b51eaf

twitter-scraper's Issues

Lynx command not working anymore

Cannot get tweet content using lynx (if available) in tweet.monitor.sh

TWO=https://twitter.com/kridyltneg/status/1278721162751721473
lynx -dump "${TWO}" \
       | grep 'Twitter:' -m1 -A4 \
       | tr -d '\n' \
       | sed -e 's/[^"]*"//' -e 's/\[[0-9]*\][a-zA-Z]*//g' \
       | tr -s ' '

The command fails as follows:

lynx -dump "${TWO}"     
[...]
   We've detected that JavaScript is disabled in your browser. Would you
   like to proceed to legacy Twitter?

   (BUTTON) Yes

   Something went wrong, but don’t fret — let’s give it another shot.
   Try again
[...]

Attached is a screenshot from Firefox with JavaScript disabled (2020-07-02_18-56).

Getting only Replies from tweet creator.

If you want a user's own tweet thread, how do you get it? From the README, it seems this only gets new replies to a tweet. Can this be used as a thread reader?

Script not working

Hi,
I am trying to use the replies scraper and I am getting 0 replies for all tweets. Could you please advise?
