martinkbeck / twitterscraper
Repository containing all files relevant to my basic and advanced tweet scraping articles.
Hi Martin,
Thank you so much for your Medium blog on this tool. This tool is super useful, and you did a great job describing how to use snscrape. I am just curious, do you know if you can filter retweets and replies with this module? Or if there is a way to know if the Tweet you are getting back is a RT, a reply, a part of a thread, etc.
Thanks so much in advance.
JB
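For reference, a minimal sketch of how retweets, replies, and quotes can be told apart: snscrape's Tweet objects carry `inReplyToUser`, `quotedTweet`, and `retweetedTweet` fields (they appear in the snippets later in this thread), so a filter can check whether each one is set. Stand-in objects are used here so the logic runs without scraping.

```python
# Classify a scraped tweet by which optional fields are populated.
# The attribute names match snscrape's Tweet objects; the SimpleNamespace
# stand-ins below are hypothetical substitutes so this runs offline.
from types import SimpleNamespace

def classify(tweet):
    """Return a rough label for a scraped tweet."""
    if getattr(tweet, "retweetedTweet", None) is not None:
        return "retweet"
    if getattr(tweet, "inReplyToUser", None) is not None:
        return "reply"
    if getattr(tweet, "quotedTweet", None) is not None:
        return "quote"
    return "original"

# Hypothetical stand-ins mimicking snscrape's Tweet attributes:
plain = SimpleNamespace(retweetedTweet=None, inReplyToUser=None, quotedTweet=None)
reply = SimpleNamespace(retweetedTweet=None, inReplyToUser="someone", quotedTweet=None)

print(classify(plain))  # original
print(classify(reply))  # reply
```

Detecting membership in a longer thread is harder; checking whether `inReplyToUser` equals the tweet's own author is one common heuristic.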
Hello, thank you very much for this tutorial, it's great. I would like to ask how I can make a query for a particular conversation using its conversation ID?
Thanks in advance.
Good day.
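One hedged sketch: snscrape passes the query string through to Twitter search, which supports a `conversation_id:` operator (this operator is an assumption about Twitter's search syntax, not something documented by snscrape itself), so a whole thread can in principle be pulled by ID.

```python
# Build a search query for one conversation. The conversation_id: operator
# is Twitter search syntax; the ID below is hypothetical.
conversation_id = "1234567890123456789"  # hypothetical conversation ID
query = f"conversation_id:{conversation_id}"
print(query)

# Usage (not executed here, requires network access):
# import snscrape.modules.twitter as sntwitter
# for tweet in sntwitter.TwitterSearchScraper(query).get_items():
#     print(tweet.date, tweet.content)
```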
import os
os.environ["http_proxy"] = "http://127.0.0.1:56916"
os.environ["https_proxy"] = "http://127.0.0.1:56916"
import snscrape.modules.twitter as sntwitter
from transformers import pipeline
import pandas as pd
from tqdm import tqdm

# Scrape data for a specific user
# Creating list to append tweet data
tweets_list1 = []
# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:QCompounding').get_items()):  # CharlieMunger00 Mayhem4Markets QCompounding
    if i > 20:  # number of tweets you want to scrape
        break
    tweets_list1.append([tweet.date, tweet.content, tweet.user.username, tweet.likeCount,
                         tweet.user.displayname, tweet.lang, tweet.hashtags, tweet.mentionedUsers,
                         tweet.inReplyToUser, tweet.quotedTweet, tweet.retweetedTweet, tweet.media])

# Creating a dataframe from the tweets list above
tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Text', 'Username', 'Like Count',
                                                 'Display Name', 'Language', 'hashtags', 'mentionedUsers',
                                                 'inReplyToUser', 'quotedTweet', 'retweetedTweet', 'media'])
tf = tweets_df1[tweets_df1['inReplyToUser'].isnull()]

from urllib.request import urlretrieve
tf = tweets_df1[tweets_df1['media'].isnull() == False]
for i in range(tf.shape[0]):
    try:
        kk = str(i) + 'i'
        urlretrieve(tf.iloc[i, -1][0].fullUrl, "d:/data/photo2/{}.jpg".format(kk))
    except Exception:
        continue
`  File "e:\temp\ipykernel_16024\2936908550.py", line 14, in <cell line: 14>
    for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:QCompounding').get_items()): # CharlieMunger00 Mayhem4Markets QCompounding
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 680, in get_items
    for obj in self._iter_api_data('https://api.twitter.com/2/search/adaptive.json', params, paginationParams, cursor = self._cursor):
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 369, in _iter_api_data
    obj = self._get_api_data(endpoint, reqParams)
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 338, in _get_api_data
    self._ensure_guest_token()
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 301, in _ensure_guest_token
    r = self._get(self._baseUrl if url is None else url, headers = {'User-Agent': self._userAgent}, responseOkCallback = self._check_guest_token_response)
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\base.py", line 216, in _get
    return self._request('GET', *args, **kwargs)
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\base.py", line 212, in _request
    raise ScraperException(msg)
ScraperException: 4 requests to https://twitter.com/search?f=live&lang=en&q=from%3AQCompounding&src=spelling_expansion_revert_click failed, giving up.`
Hi! Thanks for your super helpful Jupyter Notebook and Medium tutorial. Really appreciate the time and effort you put into this! Quick question, how do you scrape multiple users in a list? I would ideally like to iterate through a list of usernames and use your code below:
`# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
    if i > maxTweets:
        break
    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])`
I tried to iterate through my list like below, but I think I'm doing something wrong.
list = [user1, user2, user3, ...]
i = 0
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:list[i]').get_items()):
Would appreciate any advice! Thank you :)
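A sketch of the fix: the quotes around `'from:list[i]'` make it a literal string, so the username is never substituted. Building one query string per username (the names below are hypothetical) avoids that.

```python
# Build one snscrape search query per username using an f-string, so the
# variable is actually substituted into the query text.
users = ["jack", "user2", "user3"]  # hypothetical usernames
queries = [f"from:{u}" for u in users]
print(queries[0])  # from:jack

# Usage (not executed here, requires network access):
# import snscrape.modules.twitter as sntwitter
# for q in queries:
#     for i, tweet in enumerate(sntwitter.TwitterSearchScraper(q).get_items()):
#         ...
```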
Please update the CLI code run through os.system. It doesn't seem to work with retweetCount or likeCount; when I run these CLI commands in a .ipynb notebook they fail. Please help. Thank you.
Hi, I'm trying to extract tweets combining the geocode filter and the since filter but every time I run it I end up having this error: 'Unable to find guest token'. I've run this same search using Tweepy and I do get a lot of tweets but because of the time constraint I'm very interested in making it run with this scraper. Do you know why could this be happening?
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('covid geocode:"34.052235,-118.243683,10km" since:2021-12-24').get_items()):
    if i > maxTweets:
        break
    tweets_list1.append([tweet.url, tweet.date, tweet.id, tweet.content,
                         tweet.user.username, tweet.replyCount, tweet.retweetCount,
                         tweet.likeCount, tweet.quoteCount, tweet.source, tweet.media,
                         tweet.retweetedTweet, tweet.mentionedUsers])
print('Complete')
Also, if I want to append the coordinates to the dataframe or the country/city, what attribute of tweet. should I use?
Thanks a lot!
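On the coordinates question, a hedged sketch: some snscrape versions expose `coordinates` and `place` attributes on Tweet objects (these attribute names are an assumption; check the Tweet dataclass of your installed version). A stand-in object is used here so the extraction logic runs offline.

```python
# Extract location fields from a tweet-like object. The nested attribute
# names (coordinates.longitude, place.fullName) are assumptions about
# snscrape's Tweet dataclass; the object below is a hypothetical stand-in.
from types import SimpleNamespace

tweet = SimpleNamespace(
    coordinates=SimpleNamespace(longitude=-118.243683, latitude=34.052235),
    place=SimpleNamespace(fullName="Los Angeles, CA"),
)

row = [tweet.coordinates.longitude if tweet.coordinates else None,
       tweet.coordinates.latitude if tweet.coordinates else None,
       tweet.place.fullName if tweet.place else None]
print(row)
```

Note that most real tweets have no GPS coordinates attached, so expect many `None` values.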
Is there any way to get tweets from a specific time span for a specific user? I've tried different things in a Jupyter notebook and only gotten blank dataframes back. Thank you.
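For reference, a sketch of one way to do this: combine the `from:`, `since:`, and `until:` search operators in a single query string (the username and dates below are hypothetical). A blank dataframe usually means the query matched nothing, so printing the query and pasting it into Twitter's search box is a quick sanity check.

```python
# Combine user and date-range operators in one snscrape search query.
username = "jack"  # hypothetical username
query = f"from:{username} since:2021-01-01 until:2021-02-01"
print(query)  # from:jack since:2021-01-01 until:2021-02-01
```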
Hi, for some reason I am getting a lot of irrelevant tweets when I run this code.
I've got the dataframe set up to show which keyword was used to scrape the tweet. I get a wall of relevant tweets from each user with the keyword listed, and then a whole bunch of irrelevant tweets for which keyword column is blank. Can anybody tell why?
# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd
# Query by text search
# Setting variables to be used below
maxTweets = 500
# Creating list to append tweet data to
tweets_list2 = []
# Creating lists from SearchWords and TwitterHandles txt files:
keywords_list = open("SearchWords.txt", mode='r', encoding='utf-8').read().splitlines()
users_list = open("TwitterHandles.txt", mode='r', encoding='utf-8').read().splitlines()
# Using TwitterSearchScraper to scrape data and append tweets to list
for n, k in enumerate(users_list):
    for m, j in enumerate(keywords_list):
        for i, tweet in enumerate(sntwitter.TwitterSearchScraper('{} from:{} since:2020-07-07 until:2021-07-07'.format(keywords_list[m], users_list[n])).get_items()):
            if i > maxTweets:
                break
            tweets_list2.append([tweet.url, tweet.date, tweet.id, tweet.content, tweet.user.username, tweet.retweetedTweet, keywords_list[m]])
# Creating a dataframe from the tweets list above
tweets_df2 = pd.DataFrame(tweets_list2, columns=['URL', 'Datetime', 'Tweet Id', 'Text', 'Username', 'Retweet', 'Keywords'])
# Display first 5 entries from dataframe
tweets_df2.head()
# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets9.csv', sep=',', index=False)
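As a side note, the nested `enumerate` loops over users and keywords can be sketched more simply with `itertools.product`, which yields every (user, keyword) pair; the handles and keywords below are hypothetical.

```python
# Build one search query per (user, keyword) pair with itertools.product,
# replacing the doubly nested enumerate loops.
from itertools import product

users_list = ["alice", "bob"]         # hypothetical handles
keywords_list = ["covid", "vaccine"]  # hypothetical keywords

queries = [f"{kw} from:{user} since:2020-07-07 until:2021-07-07"
           for user, kw in product(users_list, keywords_list)]
print(len(queries))  # 4, one query per pair
print(queries[0])    # covid from:alice since:2020-07-07 until:2021-07-07
```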
I am getting this issue when scraping tweets with Tweepy. Can you help me resolve it?
Hi! First of all, thank you very much for this tool, this is helping me with my dissertation so much!
I'm having a problem while scraping tweets about a certain topic for a specified period of time. When I try to get tweets from, say, 01/05/2019 to 31/05/2019, I only get tweets up to the 30th.
For my dissertation I needed 10k tweets a day for the past 3 years. I built a function and it worked perfectly, extracting around 300k tweets a month, but in the end I found out that I always miss the last day of each month.
To add all the missing days, I need to download tweets from the last day of each month, but from what I understand I always have to specify a "since" date and an "until" date. So to get tweets from 1st May I specify from 01/05/19 to 02/05/19. The problem is that for the 31st of May I cannot specify an until date, as it would be the 32nd, which doesn't exist.
Am I missing something? How can I get tweets from just a specific day?
p.s. if I set the same date as both the since and until date, it doesn't work.
Thank you in advance
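A sketch of how the month-end problem can be avoided: compute the `until` date with a `timedelta` instead of adding 1 to the day number, since `datetime` rolls 31 May over to 1 June automatically. Because the `until:` bound is exclusive, this yields exactly one day of tweets. The query below is hypothetical.

```python
# Compute an exclusive "until" bound one day after "since"; timedelta
# handles month and year rollover, so 2019-05-31 becomes 2019-06-01.
from datetime import date, timedelta

since = date(2019, 5, 31)
until = since + timedelta(days=1)
query = f"covid from:someuser since:{since} until:{until}"  # hypothetical query
print(until)  # 2019-06-01
```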
Hello @MartinBeckUT, I read your article on Medium; by the way, it is a fantastic one. I tried your code but the CSV and JSON files are blank.
Despite several attempts, none of them work. Always getting the same error.
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=Hello%20from%3Abarackobama%20since%3A2011-01-01%20until%3A2016-12-20&src=typd
An exception has occurred, use %tb to see the full traceback.
SystemExit
Hi Martin
Thank you for the good work you are doing.
I was wondering if there is a limit to the number of tweets one can scrape with snscrape. What about the date, any limit?
Thank you.
Is it possible to scrape the number of likes and retweets for each tweet?
Thank you.
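Yes; a minimal sketch: snscrape's Tweet objects carry `likeCount` and `retweetCount` fields (they also appear in earlier snippets in this thread), so each can be appended to the row per tweet. Stand-in objects keep this runnable without scraping.

```python
# Collect engagement counts from tweet-like objects. The likeCount and
# retweetCount attribute names match snscrape's Tweet objects; the
# SimpleNamespace stand-ins below are hypothetical.
from types import SimpleNamespace

tweets = [SimpleNamespace(content="hi", likeCount=3, retweetCount=1),
          SimpleNamespace(content="yo", likeCount=7, retweetCount=2)]

rows = [(t.content, t.likeCount, t.retweetCount) for t in tweets]
print(rows)  # [('hi', 3, 1), ('yo', 7, 2)]
```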