getoldtweets3's People

Contributors

aogier, bmjr, bubavv, dorfman, fernandoramacciotti, jefferson-henrique, jfabdo, jtaylor351, mattiasostmar, mawic, michaelkarpe, mottl, ndw, phaerus

getoldtweets3's Issues

Document is empty...sometimes

Describe the bug
Most of the time, seemingly for all dates except those in 2015, I get "Document is empty". See below for examples. Any clue what's going on? Many thanks in advance.

WORKS
import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('trump')\
                                           .setSince("2015-09-14")\
                                           .setUntil("2015-09-19")\
                                           .setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.text)

DOES NOT WORK
import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('trump')\
                                           .setSince("2019-05-14")\
                                           .setUntil("2019-05-18")\
                                           .setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.text)

Yields:
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/GetOldTweets3/manager/TweetManager.py", line 70, in getTweets
scrapedTweets = PyQuery(json['items_html'])
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pyquery/pyquery.py", line 255, in __init__
elements = fromstring(context, self.parser)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pyquery/pyquery.py", line 99, in fromstring
result = getattr(lxml.html, meth)(context)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/html/__init__.py", line 875, in fromstring
doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
"Document is empty")
lxml.etree.ParserError: Document is empty

How can I get tweets that contain at least one of the keywords provided in QuerySearch()?

I wonder if there is a way I can retrieve tweets with at least one of the keywords provided by the query search method.
I've tried this:

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees')\
                                           .setSince("2015-05-01")\
                                           .setUntil("2015-09-30")\
                                           .setMaxTweets(10)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.text)

But this returns only tweets that contain both keywords, europe and refugees. I want the first 10 tweets that contain either europe or refugees. Is this possible?
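One thing that may be worth trying: Twitter's search syntax treats space-separated words as AND and accepts an uppercase OR between terms. The sketch below only builds such a query string. It assumes GetOldTweets3 hands the query to Twitter search unchanged, and build_or_query is a hypothetical helper, not part of the library.

```python
def build_or_query(keywords):
    """Join keywords with the OR operator, quoting multi-word phrases."""
    terms = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip empty entries
        terms.append('"%s"' % keyword if " " in keyword else keyword)
    return " OR ".join(terms)


query = build_or_query(["europe", "refugees"])
print(query)  # europe OR refugees
```

The resulting string would then be passed to .setQuerySearch(query) in place of 'europe refugees'.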

Issue with --near

Hi, when I try to collect tweets from a specific location, the country is not taken into account. For example, I tried --near "Vienna, Austria" and it returned tweets from California.

Retweets and favorites

Hi,
Thank you for this great tool, it was very helpful.
I am wondering if there is any plan to add new features, such as collecting all historical retweets and favorites?

Help in running command line

I have never done this kind of thing before and have rarely had to use an external library, so I hope I can get help here. I am trying to run the command-line tool to extract tweets directly to CSV. My error is as follows:

GetOldTweets3 -h
Traceback (most recent call last):
File "", line 1, in
NameError: name 'h' is not defined

pip installed the GetOldTweets3 folder into site-packages. A tutorial on how and where to run the commands would be really helpful.

Can Only Get Tweets for This Month

I tried this command:
GetOldTweets3 --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 10
and got nothing.
Then I removed the date range:
GetOldTweets3 --username "barackobama"
This returned tweets, but only for the current month.
How can I fix this problem?
I need tweets for the last 10 years.

cannot run pip install GetOldTweets3

Hi,
I tried to run "pip install GetOldTweets3" on a Linux server, but it failed with this message:

Could not find a version that satisfies the requirement GetOldTweets3 (from versions: )
No matching distribution found for GetOldTweets3

I need help solving this. Thank you.

Multithreaded Date Range Based Download

Hello!

I've been modifying the source code to build a multithreaded crawler that downloads tweets in a given date range. I'm using this architecture to download a huge number of tweets on a cluster. Since it seems to work pretty well, I thought about sharing my code. Would you find this useful?

P.S. I'm using this to download 60M tweets, which is about 20 GB of data. With a single-threaded scheme my program would have needed almost a month to download all the data. With the multithreaded scheme I can download it faster.

Cheers,
Victor
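The poster's code is not shown, so the following is only a minimal sketch of the general idea: split the date range into windows and hand each window to a thread pool. It treats the until date as exclusive, which is an assumption, and fetch_window is a hypothetical placeholder where a real per-window download would go.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def date_windows(since, until, days=1):
    """Yield (start, end) ISO date strings covering [since, until) in steps of `days`."""
    start = date.fromisoformat(since)
    stop = date.fromisoformat(until)
    step = timedelta(days=days)
    while start < stop:
        end = min(start + step, stop)
        yield start.isoformat(), end.isoformat()
        start = end


def fetch_window(window):
    """Placeholder worker: replace the body with a real download for one window."""
    since, until = window
    return "%s..%s" % (since, until)


def download_range(since, until, workers=4):
    """Run fetch_window over every window in a thread pool, keeping date order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_window, date_windows(since, until)))


print(download_range("2018-02-18", "2018-02-21"))
```

Threads suit this workload because each worker spends most of its time waiting on the network. The number of workers is a trade-off against being rate limited.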

tweet.to property is wrong if the Tweet is a response to multiple people

Update to the latest version of GetOldTweets3 before committing the issue!

Describe the bug
tweet.to property is wrong if the Tweet is a response to multiple people

Solution
TweetManager.py needs to be changed from:
tweet.to = usernames[1] if len(usernames) == 2 else None
to:
tweet.to = usernames[1] if len(usernames) >= 2 else None
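The difference between the two checks can be seen in isolation. This is a standalone sketch, not the library's code: it assumes usernames lists the author first and then each addressee, and both function names are hypothetical.

```python
def first_recipient(usernames):
    """Return the first addressee, or None if the tweet is not a reply."""
    return usernames[1] if len(usernames) >= 2 else None


def first_recipient_old(usernames):
    """The original check, which only handles exactly one addressee."""
    return usernames[1] if len(usernames) == 2 else None


reply_to_two = ["author", "alice", "bob"]
print(first_recipient_old(reply_to_two))  # None
print(first_recipient(reply_to_two))      # alice
```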

error in extracting all tweets

hello!
I was trying to extract tweets by running
python3 /Users/Jham/src/getoldtweets3/bin/GetOldTweets3 --usernames-from-file userlist.txt --since 2017-01-01 --until 2017-12-31

but I keep getting the following error:
Found 94 usernames in userlist.txt
Downloading tweets...
Saved 600Traceback (most recent call last):
File "/Users/Jham/src/getoldtweets3/bin/GetOldTweets3", line 206, in main
got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer, debug=debug)
File "/Users/Jham/src/getoldtweets3/GetOldTweets3/manager/TweetManager.py", line 88, in getTweets
rawtext = TweetManager.textify(tweetPQ("p.js-tweet-text").html(), tweetCriteria.emoji)
File "/Users/Jham/src/getoldtweets3/GetOldTweets3/manager/TweetManager.py", line 190, in textify
if "u-hidden" in attr["class"]:
KeyError: 'class'

'class'

Done. Output file generated "output_got.csv".

Thank you!!
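The KeyError in the traceback comes from indexing attr["class"] on an element that has no class attribute. A possible workaround, sketched here under the assumption that attr behaves like a dict, is to use .get() with a default. The helper name is_hidden is hypothetical.

```python
def is_hidden(attr):
    """Return True if the class attribute contains "u-hidden".

    .get() returns an empty string when the attribute is missing,
    so elements without a class no longer raise KeyError.
    """
    return "u-hidden" in attr.get("class", "")


print(is_hidden({"class": "u-hidden"}))    # True
print(is_hidden({"href": "https://t.co"}))  # False
```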

How to export all tweets in csv file?

Update to the latest version of GetOldTweets3 before committing the issue!

Describe the bug
A clear and concise description of what the bug is.

debug.log
Run GetOldTweets with the --debug option:

GetOldTweets ... --debug > debug.log

Upload debug.log to somewhere like http://gist.github.com or https://pastebin.com and provide with the link to your debug.log.

For general issues with running GetOldTweets3
If you have a general question please provide with OS, Python version and the method you have used to install GetOldTweets3

Can I get tweets' "via" information?

Hi, there!
This program is super cool!

I am using this program for analysis. To make the analysis more reliable, I want to get only tweets posted via "blah-blah". Is that possible? Is there a method for it in this program?

Possible to include re-tweets?

Hi there, thanks for this version - this is a very helpful tool.

I am doing a search by username bounded by start and end dates, and it works well. But I noticed that retweets are not returned, only the specific user's own tweets.

Is it possible to return both tweets AND retweets? I don't need a full release for this, but if you could point out where in the code I should make the changes, it would be much appreciated. I spent 12+ hours going through it but didn't find a way to do it. Thanks a MILLION!

Can I get the location of a tweet?

Too Many Requests

I wonder if there is a way to break the download into pieces and pause between them to avoid the "Too Many Requests" error? I am getting tweets for one very common word, and I want to split the download into batches of 10,000 tweets and pause between batches.
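The batching idea can be sketched independently of the library. The helper below is hypothetical: it groups any iterable into fixed-size batches and sleeps after each full batch before pulling more items. The pause only throttles requests when the items are produced lazily, for example a generator that runs one small query (such as one day) at a time.

```python
import time


def in_batches(items, batch_size, pause_seconds, sleep=time.sleep):
    """Yield lists of up to batch_size items, pausing after each full batch."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
            sleep(pause_seconds)  # wait before pulling the next items
    if batch:
        yield batch  # final, possibly shorter batch


# Demo with a zero-second pause so it finishes immediately.
for batch in in_batches(range(5), 2, 0):
    print(batch)
```

For the case described above, batch_size would be 10000 and pause_seconds whatever delay turns out to avoid the error. The right value is not known here.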

Not all tweets collected

Hello,
Thanks to the contributors for this work !

I am having issues collecting all tweets related to a hashtag using the --querysearch argument.
Interestingly, the small number of tweets I retrieve seems to be sampled from the whole database (I get tweets from as early as 2011 even though my topic of interest has been buzzing in September 2018), and both the number of tweets and the tweets themselves are always the same for a given query.

Going through the Twitter search page by hand gives me the expected results, though, i.e. a lot of recent tweets.

I tried to use another ISP but the results remain the same.

Do you have any idea where this could come from?
I am using Python 3.7.2.

Thanks in advance,
Best,
J.

Problem getting started (SyntaxError)

Hi, I wanted to use GetOldTweets3 to download the tweets of a userlist and went through the installation using sudo pip install -e git+https://github.com/Mottl/GetOldTweets3#egg=GetOldTweets3 but whatever I do, even if I just want to call the -h, it just gives me an error:

Traceback (most recent call last):
File "/usr/local/bin/GetOldTweets3", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/GetOldTweets3-master/src/getoldtweets3/bin/GetOldTweets3", line 144
nonlocal cnt
^
SyntaxError: invalid syntax

The old GetOldTweets by Jefferson-Henrique works without a problem and I'm at a bit of a loss since I don't know anything about the nonlocal statement in this context. Could somebody maybe help me out here?
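For context, nonlocal is a Python 3 statement, so a SyntaxError on that line suggests the script may have been executed by a Python 2 interpreter (for example because pip and /usr/local/bin belong to Python 2 on that machine; this is an assumption about the setup above). A small illustration of what the statement does, with hypothetical names:

```python
import sys


def make_counter():
    """Return a function that increments a count kept in the enclosing scope."""
    cnt = 0

    def increment():
        nonlocal cnt  # Python 3 only: rebinds cnt in make_counter's scope
        cnt += 1
        return cnt

    return increment


counter = make_counter()
counter()
print(sys.version_info.major, counter())
```

Running this file with the same interpreter that launches GetOldTweets3 shows quickly whether that interpreter is Python 3.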

Always prints only one tweet. The output csv is not created.

Hi. I ran the code exactly as in the instructions.
However, it always prints only one tweet, which doesn't make sense given the user and the period.
It doesn't matter whether or not I set the maximum number of tweets with .setMaxTweets(1).

Also, the output CSV file is not created anywhere. How can I create a CSV file containing the results?

Here is my code:

import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setUsername("washingtonpost")\
                                           .setSince("2019-01-01")\
                                           .setUntil("2019-01-05")
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.text)

Here is the output:

C:/............../GetOldTweets.py
How to be likable, by @petridisheshttps://wapo.st/2F9uL1J

Process finished with exit code 0
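The snippet above takes element [0] of the returned list and prints it, which would explain why only one tweet appears, and nothing in it writes a file. A minimal sketch of looping over all results and writing them to CSV follows. The stand-in objects replace the list returned by getTweets() so the example runs on its own, write_tweets_csv is a hypothetical helper, and only the text and to attributes mentioned elsewhere on this page are used.

```python
import csv
import io
from types import SimpleNamespace


def write_tweets_csv(tweets, handle):
    """Write one row per tweet; the csv module quotes commas and newlines."""
    writer = csv.writer(handle)
    writer.writerow(["to", "text"])
    for tweet in tweets:
        writer.writerow([tweet.to or "", tweet.text])


# Stand-in objects; in real use this would be the list returned by getTweets().
tweets = [
    SimpleNamespace(to=None, text="first tweet, with a comma"),
    SimpleNamespace(to="alice", text="second tweet"),
]

buffer = io.StringIO()
write_tweets_csv(tweets, buffer)
print(buffer.getvalue())
```

To write a real file, the handle would be opened with open("tweets.csv", "w", newline="", encoding="utf-8").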

Problem with VPN

Hi,

First, let me say thank you for your work! You have made my life so much easier over the last couple of months.
Second, the program worked flawlessly until today. Let me explain the issue. I use Windows 10 and PyCharm, and run GetOldTweets3 in Bash on Ubuntu. The program still works when I am not connected to a VPN. However, until yesterday I could also run GetOldTweets3 without any issues while connected to my VPN.
I have attached the debug file; any help would be wonderful. I have not changed anything, so I am not sure what is happening...

> GetOldTweets3 0.0.10
> Downloading tweets...
> https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=Possession%20since%3A2012-05-30%20until%3A2012-09-04&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
> Host: twitter.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
> Accept: application/json, text/javascript, */*; q=0.01
> Accept-Language: en-US,en;q=0.5
> X-Requested-With: XMLHttpRequest
> Referer: https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=Possession%20since%3A2012-05-30%20until%3A2012-09-04&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
> Connection: keep-alive
> An error occured during an HTTP request: <urlopen error [Errno -3] Temporary failure in name resolution>
> Try to open in browser: https://twitter.com/search?q=Possession%20since%3A2012-05-30%20until%3A2012-09-04&src=typd
> 
> Done. Output file generated "output_got.csv".

https://gist.github.com/Appotrooper/adc89b1976e0dc96a4caa726103e9742

Thanks in advance!

Similar alternative for getting Followers / Followings list ?

Great library of immense utility. Kudos to the developers.
I was wondering if there is a similar tool to grab the followers of a user too? It would be extremely valuable, since the rate limits on downloading followers/followings are even more painful than those on tweets.
If there is already an existing solution or another alternative, would you let me know?

How can I specify the time?

Download stops after a lot of tweets

I tried to download tweets with the query search 'bitcoin' since 2018-02-18 until 2018-02-19. The issue is that the script stopped before reaching the until date.

The log was too big to put it all, so I deleted the log of the first 31000 tweets.

You can find the log here

Can this be because twitter detects a bot downloading a lot of tweets?

I get 0 tweets

Hello,

When I request tweets I don't receive any. Yesterday it worked perfectly well.

Do you know what can be the problem?

empty cells not divided by commas

In a generated CSV file, at least some of the empty cells are not delimited properly. As a result, the "favorites" column contains what should be in "text", and so on, which makes the data unusable.

I'm wondering if there is a way to correct this on my side. It seems to break when the text contains a long web address (with syntax such as /?mbid=social_facebook&amp&utm_brand=p4k&amp... etc.).

How to use this?

Hi. I have no background in programming, but I have a project that involves getting old tweets and I can't seem to make it work. Could anyone kindly show or tell me how to do it step by step? Thank you.

Words from queries that contain the $ symbol are ignored

I need to search for tweets that contain a company name (e.g. Apple) AND the stock ticker symbol (e.g. $APPL). However, if I use the following command:

GetOldTweets3 --querysearch "Apple $APPL lang:en" --maxtweets 10

then the word $APPL is not present in any of the tweets in the output doc.

The same happens with any word that starts with $ symbol. The tool somehow ignores such words. What can I do to solve this issue?

Thank you in advance and thanks for the very useful tool.
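A possible cause, assuming the command is typed in a POSIX shell such as bash: inside double quotes the shell expands $APPL as a variable, which is usually unset, so the tool may never receive that word. Single quotes prevent the expansion. The sketch below uses the standard shlex module to show how the same argument list is quoted safely.

```python
import shlex

argv = ["GetOldTweets3", "--querysearch", "Apple $APPL lang:en", "--maxtweets", "10"]

# shlex.join quotes each argument so a POSIX shell passes it through literally.
command = shlex.join(argv)
print(command)  # GetOldTweets3 --querysearch 'Apple $APPL lang:en' --maxtweets 10
```

When launching the tool from Python, passing argv as a list to subprocess.run (without shell=True) avoids shell expansion altogether.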

Issue when using until for bound date search

My scraper uses a bounded search to extract tweets for individual dates using the tweet criteria "QuerySearch", "Since" and "Until". Yesterday, it stopped returning a list of tweets and instead returned only an empty list. After some experimenting, it looks like setting the "until" property leads to this faulty behaviour:

GetOldTweets3 --querysearch "europe refugees" --since 2015-09-10 --until 2016-09-11 --maxtweets 10 --output "out.csv" --debug
/home/lks/anaconda3/bin/GetOldTweets3 --querysearch europe refugees --since 2015-09-10 --until 2016-09-11 --maxtweets 10 --output out.csv --debug
GetOldTweets3 0.0.10
Downloading tweets...
https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=europe%20refugees%20since%3A2015-09-10%20until%3A2016-09-11&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=europe%20refugees%20since%3A2015-09-10%20until%3A2016-09-11&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
{"min_position":"thGAVUV0VFVBYBFgESNQAVACUAVQAVAAA=","has_more_items":false,"items_html":"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n","new_latent_count":0,"focused_refresh_interval":30000}
---


Done. Output file generated "out.csv".

When the --until option is not provided (or .setUntil() is not used) the scraper works as expected.
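In the debug output above, the items_html field contains only whitespace and has_more_items is false, so there is nothing to parse. A small check like the following can tell such a response apart from one that carries tweets. It is a sketch, has_tweets is a hypothetical helper, and the sample string is a shortened copy of the response shown above.

```python
import json


def has_tweets(response_text):
    """Return True if the search response carries any non-blank items_html."""
    payload = json.loads(response_text)
    return bool(payload.get("items_html", "").strip())


empty = '{"has_more_items": false, "items_html": "\\n\\n \\n", "new_latent_count": 0}'
print(has_tweets(empty))  # False
```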

Same number of tweets from different locations

I ran my query like this:

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('#Barcelona')\
                                           .setSince("2017-01-01")\
                                           .setUntil("2018-12-31")\
                                           .setNear("'Medan, Indonesia'")\
                                           .setMaxTweets(100000000)

I got 1559 tweets with this code. Then I changed the location:

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('#Barcelona')\
                                           .setSince("2017-01-01")\
                                           .setUntil("2018-12-31")\
                                           .setNear("'Jambi, Indonesia'")\
                                           .setMaxTweets(100000000)

I got 1559 tweets again. Logically, this cannot be right. Am I doing something wrong?

How to use GetOldTweets3

Hello.
Excuse me if this question is stupid, but how do I use the GetOldTweets3 command?
I'm inside the GetOldTweets3-master folder and I run the example:
GetOldTweets3 --username "barackobama" --maxtweets 1
or
GetOldTweets3 -h

But I get: "'GetOldTweets3' is not recognized as an internal or external command,
operable program or batch file"

What exactly do I have to type in place of "GetOldTweets3" to run the code?

Thank you.

Issue to run the script

Hi,

I get an error when I run GetOldTweets3. The script does not seem to be found, and the message says: module 'GetOldTweets3' has no attribute 'manager'

Geocode search

Hi there,
I am trying to scrape tweets from a geographic coordinates point (not by place name!). Twitter search allows it with a query: "geocode:latitude,longitude,radius around the point". As there is no argument in the parser to specify geocode, I tried to just put my geocode into the query and search. Looks like Twitter generally gets it. But sometimes it gives inconsistent results:

query = 'geocode:30.3255568815976,-81.7671865745302,0.578km'
tweetCriteria = got.manager.TweetCriteria().setQuerySearch(query)
storage = got.manager.TweetManager.getTweets(tweetCriteria)

or

GetOldTweets3 --querysearch "geocode:30.3255568815976,-81.7671865745302,0.578km"

gives 5 tweets, but there are about 90 if I search the same query in Twitter manually.

Could this be happening because I use setQuerySearch to search by coordinates instead of, say, setGeocode (if this existed, which would be awesome, by the way)? If not, do you have any idea why this is happening? Thanks!

Log file here

Could the program run multiple queries in parallel?

Hello!!

The program is awesome, congrats!

I am using the program to run very heavy queries that take a long time to complete, and I was wondering if it could be used to run queries in parallel. If so, could you give some direction on what needs to be changed?

Some errors in output when searching for accented words

Hi Mottl, thanks for bringing this script back to life.

I am here about a possible bug (or misunderstanding) when searching for accented words. Please take a look at my log.

I, [2018-12-14T11:46:09.782525 #4968]  INFO -- :  - Processing Term: Caja de compensación 18
I, [2018-12-14T11:46:09.782637 #4968]  INFO -- :    CMD: /bin/bash -l -c 'cd ~/venv/get-old-tweets3_mottl && source bin/activate && GetOldTweets3 --output "/home/ubuntu/artool-utils/releases/20181207142540/public/dl/Twitter_20181214-084609_750-#2.csv" --querysearch "Caja de compensación 18" --since "2018-12-11" --until "2018-12-15"'
I, [2018-12-14T11:46:10.055077 #4968]  INFO -- pid 5243 exit 0: Downloading tweets...
Traceback (most recent call last):
  File "/home/ubuntu/venv/get-old-tweets3_mottl/bin/GetOldTweets3", line 171, in main
    got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer, debug=debug)
  File "/home/ubuntu/venv/get-old-tweets3_mottl/lib/python3.5/site-packages/GetOldTweets3/manager/TweetManager.py", line 65, in getTweets
    json = TweetManager.getJsonReponse(tweetCriteria, refreshCursor, cookieJar, proxy, user_agent, debug=debug)
  File "/home/ubuntu/venv/get-old-tweets3_mottl/lib/python3.5/site-packages/GetOldTweets3/manager/TweetManager.py", line 180, in getJsonReponse
    url = url % (urllib.parse.quote(urlGetData.strip()), urlLang, urllib.parse.quote(refreshCursor))
  File "/usr/lib/python3.5/urllib/parse.py", line 706, in quote
    string = string.encode(encoding, errors)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 18: surrogates not allowed

'utf-8' codec can't encode character '\udcc3' in position 18: surrogates not allowed

Done. Output file generated "/home/ubuntu/artool-utils/releases/20181207142540/public/dl/Twitter_20181214-084609_750-#2.csv".

I, [2018-12-14T11:46:10.055220 #4968]  INFO -- :  - Processing Term: Caja de compensacion 18
I, [2018-12-14T11:46:10.055336 #4968]  INFO -- :    CMD: /bin/bash -l -c 'cd ~/venv/get-old-tweets3_mottl && source bin/activate && GetOldTweets3 --output "/home/ubuntu/artool-utils/releases/20181207142540/public/dl/Twitter_20181214-084609_750-#3.csv" --querysearch "Caja de compensacion 18" --since "2018-12-11" --until "2018-12-15"'
I, [2018-12-14T11:46:10.491267 #4968]  INFO -- pid 5337 exit 0: Downloading tweets...

Done. Output file generated "/home/ubuntu/artool-utils/releases/20181207142540/public/dl/Twitter_20181214-084609_750-#3.csv".

As you can see, the term "Caja de compensación 18" has an accented "ó", which triggers the error backtrace above, while the other term is fine.

Anyway, the tweets seem to be downloaded just fine too.
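The '\udcc3' in the error is how Python represents a raw byte 0xC3 that it could not decode from the command line. This typically happens when the process runs under a non-UTF-8 locale, which is an assumption about the environment above (the command is launched through /bin/bash -l -c). Under that assumption, the original text can be recovered as sketched below. The helper name repair_surrogates is hypothetical.

```python
def repair_surrogates(text):
    """Re-decode text whose raw bytes were carried as lone surrogates.

    Python maps undecodable command-line bytes to U+DC80..U+DCFF
    (the "surrogateescape" error handler); this reverses that mapping
    and decodes the original bytes as UTF-8.
    """
    return text.encode("utf-8", "surrogateescape").decode("utf-8")


broken = "Caja de compensaci\udcc3\udcb3n 18"
fixed = repair_surrogates(broken)
print(fixed == "Caja de compensaci\u00f3n 18")  # True
```

Setting a UTF-8 locale (for example through LANG or LC_ALL) for the calling process may avoid the problem at its source.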

Get age or birth date

This is a wonderful tool!
Is it also possible to retrieve the age or birth date of the users somehow?

Download Stop - list index out of range for timestamps not in UTC

Hi all,

I tried this program and it works when I get data for tweets from the UK.
The problem is that when I try to get tweets from Asian countries (where the tweet timestamps in the CSV are not in UTC), the download stops abruptly after some tweets.

The example command:

GetOldTweets3 --near "Jakarta Pusat, DKI Jakarta" --within 100mi --since 2018-11-29 --until 2018-11-30

and the result says: "Saved 92600 list index out of range"

Why does this error happen? Is there anything I need to modify to download data from a country that is not in the UTC time zone? Thank you.
