Entire Tweet, Location, and Language in "get_all_tweets" (academictwitteR issue, closed)

cjbarrie commented on June 16, 2024

Entire Tweet, Location, and Language in "get_all_tweets"

from academictwitter.

Comments (7)

chainsawriot commented on June 16, 2024

@andersolarsson Those are actually verbatim retweets. By default, the V2 API trims retweet text, and there is currently no way to change that. However, the full text of the source tweets is available in the saved JSON files (if you have specified data_path). We are still working out how to deliver the so-called 'tidy' data frame (#112), so please stay tuned. Below is a small preview.

require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
random_path <- academictwitteR:::.gen_random_dir()
test <- get_all_tweets("#standwithhongkong", data_path = random_path, 
                       start_tweets = "2021-06-21T00:00:00Z", 
                       end_tweets = "2021-06-21T10:00:00Z", verbose = FALSE)

test$text[1:3]
#> [1] "#StandWithHongKong https://t.co/pGonWTd3We"                                                                                                      
#> [2] "RT @GordonGChang: The people in #HongKong are resisting. Let’s pitch in against our common enemy, #China’s regime. #StandWithHongKong"           
#> [3] "RT @chjackie797: From 2013. Still applicable in 2021. How democracy has progress in Hong Kong\n\n#StandWithHongKong \n#AppleDaily \n#PressFreed…"
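## files containing the "includes" section (saved under data_path as users_*)
files <- list.files(random_path, pattern = "^users_", full.names = TRUE)
## read_json() takes a single path, so this assumes the query produced exactly one users_ file
includes_section <- jsonlite::read_json(files, simplifyVector = TRUE)
## the referenced (source) tweets carry the untruncated text
source_tweets <- data.frame(source_id = includes_section$tweets$id, source_text = includes_section$tweets$text)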

## return the id of the first referenced tweet, or NA for tweets that reference nothing
get_id <- function(x) {
  if (is.null(x)) {
    return(NA_character_)
  }
  x$id[1]
}

source_id <- purrr::map_chr(test$referenced_tweets, get_id)

tibble(text = test$text, source_id = source_id) %>% 
  left_join(source_tweets, by = "source_id")
#> # A tibble: 100 x 3
#>    text                           source_id    source_text                      
#>    <chr>                          <chr>        <chr>                            
#>  1 "#StandWithHongKong https://t… 14068912572… "One of the largest circulating …
#>  2 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#>  3 "RT @chjackie797: From 2013. … 14068453938… "From 2013. Still applicable in …
#>  4 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#>  5 "RT @chjackie797: The latest … 14068839531… "The latest news from Next Media…
#>  6 "RT @the852spirit: 1100: 🚨🚨… 14068086786… "1100: 🚨🚨🚨 BRB . Handling som…
#>  7 "RT @KokdamonLam: #AppleDaily… 14069121634… "#AppleDaily #StandWithHongKong …
#>  8 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#>  9 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#> 10 "#StandWithHongKong #AppleDai… <NA>          <NA>                            
#> # … with 90 more rows

Created on 2021-06-21 by the reprex package (v2.0.0)
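
The join in the reprex above can be tried without API access. What follows is a minimal base-R sketch under stated assumptions: the toy data frames stand in for `test` and `source_tweets`, every ID and text is made up, and the `ref_type` and `full_text` columns are hypothetical names introduced only for illustration. It substitutes the source text only for retweets, because replies and quotes also reference another tweet but keep their own text.

```r
# Toy stand-ins for the objects in the reprex above; all values are made up.
tweets <- data.frame(
  text = c("RT @someone: A long tweet that gets cut off in the ret\u2026",
           "@someone A reply",
           "An original tweet"),
  ref_type = c("retweeted", "replied_to", NA),  # hypothetical column
  source_id = c("101", "102", NA),
  stringsAsFactors = FALSE
)
source_tweets <- data.frame(
  source_id = c("101", "102"),
  source_text = c("A long tweet that gets cut off in the retweet payload",
                  "The tweet being replied to"),
  stringsAsFactors = FALSE
)

# match() yields NA for tweets that reference nothing, so indexing
# source_text with it gives NA for those rows.
idx <- match(tweets$source_id, source_tweets$source_id)

# Only retweets should take over the text of the tweet they reference.
is_retweet <- !is.na(tweets$ref_type) & tweets$ref_type == "retweeted"

# Hypothetical convenience column: untruncated text where available.
tweets$full_text <- ifelse(is_retweet,
                           source_tweets$source_text[idx],
                           tweets$text)
tweets$full_text
```

Note that the recovered text drops the "RT @user:" prefix, so keep the original `text` column if the retweeted account matters.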


chainsawriot commented on June 16, 2024

ref ropensci/rtweet#575


andersolarsson commented on June 16, 2024

Hello,

First of all, thank you for a very good package! As for my question: a problem similar to the one mentioned by JijoC-13 above also occurs with "get_user_tweets". The tweet text appears to be shortened, and I can't figure out whether there is any logic to this truncation. The link above (ropensci/rtweet#575) suggests that tweet_mode = "extended" could be used somehow, but academictwitteR does not seem to recognize this argument.

Am I missing something here, or is getting the full tweet text with academictwitteR currently an unresolved issue?


andersolarsson commented on June 16, 2024

Hello again,

Thank you for your reply. Perhaps I should clarify what I am trying to do. The code below:

get_user_tweets(users = "account_name_here",
                start_tweets = "2009-01-01T00:00:00Z",
                end_tweets = "2014-12-31T00:00:00Z",
                bearer_token = bearer_token,
                data_path = "../output/",
                bind_tweets = TRUE)

… is my attempt to get all posts from a specified account. If I could get just the created_at column along with the full text and the various public_metrics columns (e.g. public_metrics.like_count, public_metrics.reply_count, etc.), that would solve what I'm working on now. I take it from your reply that a tidy format for academictwitteR data is being worked on (which sounds amazing), but is there a way for me to get the mentioned variables in a data.frame format now? Thanks!


chainsawriot commented on June 16, 2024

@andersolarsson Is this what you want?

require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
random_path <- academictwitteR:::.gen_random_dir()
test <- get_all_tweets(users = "katypearce", data_path = random_path, 
                       start_tweets = "2021-06-21T00:00:00Z", 
                       end_tweets = "2021-06-21T10:00:00Z", verbose = FALSE)

test
#>   author_id possibly_sensitive    entities.mentions
#> 1  93782410              FALSE 3, 17, Livingstone_S
#> 2  93782410              FALSE                 NULL
#> 3  93782410              FALSE   0, 12, jeanburgess
#> 4  93782410              FALSE    0, 11, profknowak
#> 5  93782410              FALSE             0, 3, UW
#>           entities.annotations               referenced_tweets          source
#> 1                         NULL  retweeted, 1406633438266941445 Twitter Web App
#> 2                         NULL                            NULL Twitter Web App
#> 3 26, 29, 0.5926, Person, Burr replied_to, 1406793556572852227 Twitter Web App
#> 4                         NULL replied_to, 1406785045608677381 Twitter Web App
#> 5                         NULL replied_to, 1406763823802036230 Twitter Web App
#>   public_metrics.retweet_count public_metrics.reply_count
#> 1                           31                          0
#> 2                            0                          0
#> 3                            0                          2
#> 4                            0                          0
#> 5                            0                          0
#>   public_metrics.like_count public_metrics.quote_count               created_at
#> 1                         0                          0 2021-06-21T06:20:46.000Z
#> 2                         4                          0 2021-06-21T05:04:43.000Z
#> 3                         0                          0 2021-06-21T03:50:17.000Z
#> 4                         0                          0 2021-06-21T01:33:33.000Z
#> 5                         0                          0 2021-06-21T00:02:33.000Z
#>                    id
#> 1 1406859620744777728
#> 2 1406840484048216064
#> 3 1406821751820931072
#> 4 1406787341365039105
#> 5 1406764440473665536
#>                                                                                                                                           text
#> 1 RT @Livingstone_S: “This is the first time we’ve had a society in which almost by default, everything is recorded and shared and aggregated…
#> 2      The best part about being allergic to insect stings is being able to have a good reason to make someone else remove a nest or whatever.
#> 3                                                                         @jeanburgess Incorrect is Burr-Guess right? It is a smoother Burjis?
#> 4                                                                                                                            @profknowak True.
#> 5                                                                                                          @UW Thanks but isn't it closed now?
#>   lang     conversation_id in_reply_to_user_id
#> 1   en 1406859620744777728                <NA>
#> 2   en 1406840484048216064                <NA>
#> 3   en 1406793556572852227            18202677
#> 4   en 1406759806526840835          3903374964
#> 5   en 1406763823802036230            27103822

test %>% select(created_at, text) %>% bind_cols(test$public_metrics)
#>                 created_at
#> 1 2021-06-21T06:20:46.000Z
#> 2 2021-06-21T05:04:43.000Z
#> 3 2021-06-21T03:50:17.000Z
#> 4 2021-06-21T01:33:33.000Z
#> 5 2021-06-21T00:02:33.000Z
#>                                                                                                                                           text
#> 1 RT @Livingstone_S: “This is the first time we’ve had a society in which almost by default, everything is recorded and shared and aggregated…
#> 2      The best part about being allergic to insect stings is being able to have a good reason to make someone else remove a nest or whatever.
#> 3                                                                         @jeanburgess Incorrect is Burr-Guess right? It is a smoother Burjis?
#> 4                                                                                                                            @profknowak True.
#> 5                                                                                                          @UW Thanks but isn't it closed now?
#>   retweet_count reply_count like_count quote_count
#> 1            31           0          0           0
#> 2             0           0          4           0
#> 3             0           2          0           0
#> 4             0           0          0           0
#> 5             0           0          0           0

Created on 2021-06-21 by the reprex package (v2.0.0)
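
The last step of the reprex above works because `public_metrics` is itself a data frame stored as a single column. A minimal base-R sketch of the same flattening, using made-up toy data rather than real API output (the name `flat` is hypothetical), might look like this:

```r
# Toy data mimicking the nested shape shown above; all values are made up.
tweets <- data.frame(
  created_at = c("2021-06-21T06:20:46.000Z", "2021-06-21T05:04:43.000Z"),
  text = c("first tweet", "second tweet"),
  stringsAsFactors = FALSE
)
# A data frame assigned as a column stays nested, which is how jsonlite
# represents nested JSON objects when simplifying.
tweets$public_metrics <- data.frame(
  retweet_count = c(31L, 0L),
  like_count = c(0L, 4L)
)

# Base-R counterpart of select() + bind_cols(): keep two columns and
# append the nested metrics as ordinary top-level columns.
flat <- cbind(tweets[c("created_at", "text")], tweets$public_metrics)
names(flat)
```

The flattened columns lose the `public_metrics.` prefix, as in the `bind_cols()` output above.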


andersolarsson commented on June 16, 2024

Hello again,

In my installation of academictwitteR, get_all_tweets() does not seem to have a users argument, so I tried get_user_tweets() instead:

test <- get_user_tweets(users = "sdriks", # actual user name tried
                        start_tweets = "2020-06-21T00:00:00Z",
                        end_tweets = "2021-06-21T10:00:00Z",
                        verbose = FALSE,
                        bearer_token = bearer_token)

… but this does not solve the issue with the truncated text. If I understand you correctly, a solution for getting the full text of tweets is in the works? Thank you very much for your assistance so far!


chainsawriot commented on June 16, 2024

@andersolarsson's question and part of @JijoC-13's question (usernames and full text) are solved in v0.2 with bind_tweets(output_format = "tidy"). The language filtering problem is actually a deficiency of the Twitter API, so we can't solve it on our side.
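
If the goal is simply to keep tweets labelled with one language, one possible client-side workaround is to filter on the `lang` column that appears in the output earlier in this thread. This is only a sketch with made-up toy data, and it rests on an assumption about what the original language question was. It cannot correct language labels that Twitter itself assigned.

```r
# Toy data; the `lang` column mirrors the one in the output shown earlier.
tweets <- data.frame(
  text = c("Hello world", "Hej allihopa", "Hello again"),
  lang = c("en", "sv", "en"),
  stringsAsFactors = FALSE
)

# Keep rows labelled English; the is.na() guard drops rows without a label
# instead of returning all-NA rows.
english <- tweets[!is.na(tweets$lang) & tweets$lang == "en", , drop = FALSE]
nrow(english)
```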
