Comments (7)
@andersolarsson Those are actually verbatim retweets. By default they are trimmed, and there is currently no way to fix this in the V2 API. The full text of the source tweets is, however, available from the saved JSON files (if you have specified data_path). We are still determining how to deliver the so-called 'tidy' data frame (#112), so please stay tuned. Below is a little preview.
require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
random_path <- academictwitteR:::.gen_random_dir()
test <- get_all_tweets("#standwithhongkong", data_path = random_path,
                       start_tweets = "2021-06-21T00:00:00Z",
                       end_tweets = "2021-06-21T10:00:00Z", verbose = FALSE)
test$text[1:3]
#> [1] "#StandWithHongKong https://t.co/pGonWTd3We"
#> [2] "RT @GordonGChang: The people in #HongKong are resisting. Let’s pitch in against our common enemy, #China’s regime. #StandWithHongKong"
#> [3] "RT @chjackie797: From 2013. Still applicable in 2021. How democracy has progress in Hong Kong\n\n#StandWithHongKong \n#AppleDaily \n#PressFreed…"
## files containing the "includes" section (which holds the source tweets)
files <- list.files(random_path, pattern = "^users_", full.names = TRUE)
includes_section <- jsonlite::read_json(files, simplifyVector = TRUE)
source_tweets <- data.frame(source_id = includes_section$tweets$id, source_text = includes_section$tweets$text)
## return the id of the first referenced tweet, or NA if there is none
get_id <- function(x) {
  if (is.null(x)) {
    return(NA_character_)
  }
  x$id[1]
}
source_id <- purrr::map_chr(test$referenced_tweets, get_id)
tibble(text = test$text, source_id = source_id) %>%
  left_join(source_tweets, by = "source_id")
#> # A tibble: 100 x 3
#> text source_id source_text
#> <chr> <chr> <chr>
#> 1 "#StandWithHongKong https://t… 14068912572… "One of the largest circulating …
#> 2 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#> 3 "RT @chjackie797: From 2013. … 14068453938… "From 2013. Still applicable in …
#> 4 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#> 5 "RT @chjackie797: The latest … 14068839531… "The latest news from Next Media…
#> 6 "RT @the852spirit: 1100: 🚨🚨… 14068086786… "1100: 🚨🚨🚨 BRB . Handling som…
#> 7 "RT @KokdamonLam: #AppleDaily… 14069121634… "#AppleDaily #StandWithHongKong …
#> 8 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#> 9 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#> 10 "#StandWithHongKong #AppleDai… <NA> <NA>
#> # … with 90 more rows
Created on 2021-06-21 by the reprex package (v2.0.0)
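Going one step beyond the preview above, the joined source text could be used to fill in a full-text column for retweets. The following is only a minimal sketch on invented toy data, not the package's own method: `tweets`, `full_text` and all values are made up for illustration, and it assumes that retweets can be recognised by the "RT @" prefix. Quoted tweets and replies also carry a source id, so their own text is left untouched.

```r
# Toy stand-ins for `test` and `source_tweets` above (all values invented).
tweets <- data.frame(
  text = c("RT @alice: A long tweet that the API cuts off after",
           "An original tweet",
           "Quoting someone https://t.co/xyz"),
  source_id = c("1", NA, "2"),
  stringsAsFactors = FALSE
)
source_tweets <- data.frame(
  source_id = c("1", "2"),
  source_text = c("A long tweet that the API cuts off after some characters",
                  "The quoted tweet"),
  stringsAsFactors = FALSE
)

# Look up each source text by id; match() keeps the row order of `tweets`
# and yields NA where there is no referenced tweet.
idx <- match(tweets$source_id, source_tweets$source_id)
tweets$source_text <- source_tweets$source_text[idx]

# Use the source text only for retweets with a known source;
# everything else keeps its own text.
is_retweet <- grepl("^RT @", tweets$text) & !is.na(tweets$source_text)
tweets$full_text <- ifelse(is_retweet, tweets$source_text, tweets$text)

print(tweets[, c("text", "full_text")])
```

Note that this drops the "RT @user:" prefix from the retweet; whether that is desirable depends on the analysis.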
from academictwitter.
Hello,
First of all, thank you for a very good package! As for my question, a problem similar to the one mentioned by JijoC-13 above is also present for "get_user_tweets": the tweet text appears to be shortened, and I can't figure out whether there is some sort of logic to this abbreviation. The link above (ropensci/rtweet#575) suggests that tweet_mode = "extended" could be used somehow; however, academictwitteR does not seem to recognize this argument.
Am I missing something here, or is getting the full tweet text with academictwitteR currently an unresolved issue?
Hello again,
Thank you for the reply. Perhaps I should clarify what I am trying to do. The code below:
get_user_tweets(users = "account_name_here",
                start_tweets = "2009-01-01T00:00:00Z",
                end_tweets = "2014-12-31T00:00:00Z",
                bearer_token = bearer_token,
                data_path = "../output/",
                bind_tweets = TRUE)
… is my attempt to get all posts from a specified account name. If I can somehow get just the created_at column along with the full text and the various public_metrics columns (e.g. public_metrics.like.count, public_metrics.reply.count etc.), that would solve what I'm working on now. I take it from your reply that a tidy format of academictwitteR data is being worked on (which sounds amazing), but is there a way for me to get the mentioned variables in a data.frame format? Thanks
@andersolarsson Is this what you want?
require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
random_path <- academictwitteR:::.gen_random_dir()
test <- get_all_tweets(users = "katypearce", data_path = random_path,
                       start_tweets = "2021-06-21T00:00:00Z",
                       end_tweets = "2021-06-21T10:00:00Z", verbose = FALSE)
test
#> author_id possibly_sensitive entities.mentions
#> 1 93782410 FALSE 3, 17, Livingstone_S
#> 2 93782410 FALSE NULL
#> 3 93782410 FALSE 0, 12, jeanburgess
#> 4 93782410 FALSE 0, 11, profknowak
#> 5 93782410 FALSE 0, 3, UW
#> entities.annotations referenced_tweets source
#> 1 NULL retweeted, 1406633438266941445 Twitter Web App
#> 2 NULL NULL Twitter Web App
#> 3 26, 29, 0.5926, Person, Burr replied_to, 1406793556572852227 Twitter Web App
#> 4 NULL replied_to, 1406785045608677381 Twitter Web App
#> 5 NULL replied_to, 1406763823802036230 Twitter Web App
#> public_metrics.retweet_count public_metrics.reply_count
#> 1 31 0
#> 2 0 0
#> 3 0 2
#> 4 0 0
#> 5 0 0
#> public_metrics.like_count public_metrics.quote_count created_at
#> 1 0 0 2021-06-21T06:20:46.000Z
#> 2 4 0 2021-06-21T05:04:43.000Z
#> 3 0 0 2021-06-21T03:50:17.000Z
#> 4 0 0 2021-06-21T01:33:33.000Z
#> 5 0 0 2021-06-21T00:02:33.000Z
#> id
#> 1 1406859620744777728
#> 2 1406840484048216064
#> 3 1406821751820931072
#> 4 1406787341365039105
#> 5 1406764440473665536
#> text
#> 1 RT @Livingstone_S: “This is the first time we’ve had a society in which almost by default, everything is recorded and shared and aggregated…
#> 2 The best part about being allergic to insect stings is being able to have a good reason to make someone else remove a nest or whatever.
#> 3 @jeanburgess Incorrect is Burr-Guess right? It is a smoother Burjis?
#> 4 @profknowak True.
#> 5 @UW Thanks but isn't it closed now?
#> lang conversation_id in_reply_to_user_id
#> 1 en 1406859620744777728 <NA>
#> 2 en 1406840484048216064 <NA>
#> 3 en 1406793556572852227 18202677
#> 4 en 1406759806526840835 3903374964
#> 5 en 1406763823802036230 27103822
test %>% select(created_at, text) %>% bind_cols(test$public_metrics)
#> created_at
#> 1 2021-06-21T06:20:46.000Z
#> 2 2021-06-21T05:04:43.000Z
#> 3 2021-06-21T03:50:17.000Z
#> 4 2021-06-21T01:33:33.000Z
#> 5 2021-06-21T00:02:33.000Z
#> text
#> 1 RT @Livingstone_S: “This is the first time we’ve had a society in which almost by default, everything is recorded and shared and aggregated…
#> 2 The best part about being allergic to insect stings is being able to have a good reason to make someone else remove a nest or whatever.
#> 3 @jeanburgess Incorrect is Burr-Guess right? It is a smoother Burjis?
#> 4 @profknowak True.
#> 5 @UW Thanks but isn't it closed now?
#> retweet_count reply_count like_count quote_count
#> 1 31 0 0 0
#> 2 0 0 4 0
#> 3 0 2 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
Created on 2021-06-21 by the reprex package (v2.0.0)
Hello again,
In my installation of academictwitteR, get_all_tweets() does not seem to have a users argument, so I tried get_user_tweets() instead:
test <- get_user_tweets(users = "sdriks", # actual user name tried
                        start_tweets = "2020-06-21T00:00:00Z",
                        end_tweets = "2021-06-21T10:00:00Z",
                        verbose = FALSE,
                        bearer_token = bearer_token)
… but this does not solve the issue with the truncated text. If I understand you correctly, a solution to get the full text of the tweet is in the works? Thank you very much for your assistance so far!
@andersolarsson's question and part of @JijoC-13's question (usernames and full text) are solved in v0.2 with bind_tweets(output_format = "tidy"). The language filtering problem is a deficiency of the Twitter API itself, so we can't solve it.
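For readers who want to try this, here is a hedged sketch of reading previously saved tweets back in the tidy format and keeping only a few columns. The directory "../output/" is taken from the earlier question and is an assumption about where the JSON files were saved; `keep_available()` is a hypothetical helper written for this example, and the column names in `wanted` are assumptions about the tidy output, which is why only those actually present are kept.

```r
# Hypothetical helper: keep only those wanted columns that exist in df.
keep_available <- function(df, wanted) {
  df[, intersect(wanted, names(df)), drop = FALSE]
}

# Assumed location of the JSON files saved by an earlier get_user_tweets() call.
data_path <- "../output/"

# Column names below are assumptions about the tidy format.
wanted <- c("created_at", "text", "like_count", "retweet_count", "quote_count")

# Only run when the package (>= 0.2) and the data directory are available.
if (requireNamespace("academictwitteR", quietly = TRUE) && dir.exists(data_path)) {
  tidy <- academictwitteR::bind_tweets(data_path = data_path,
                                       output_format = "tidy")
  print(keep_available(as.data.frame(tidy), wanted))
}
```

Checking `names(tidy)` first is the safest way to see which columns a given package version actually returns.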