

chainsawriot commented on June 16, 2024

@andersolarsson Those are actually verbatim retweets. In the v2 API their text is truncated by default, and there is currently no way to change this. However, the full text of the source tweets is available in the saved JSON files (if you have specified data_path). We are still deciding how to deliver the so-called "tidy" data frame (#112), so please stay tuned. Below is a little preview.

require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
random_path <- academictwitteR:::.gen_random_dir()
test <- get_all_tweets("#standwithhongkong", data_path = random_path, 
                       start_tweets = "2021-06-21T00:00:00Z", 
                       end_tweets = "2021-06-21T10:00:00Z", verbose = FALSE)

test$text[1:3]
#> [1] "#StandWithHongKong https://t.co/pGonWTd3We"                                                                                                      
#> [2] "RT @GordonGChang: The people in #HongKong are resisting. Let’s pitch in against our common enemy, #China’s regime. #StandWithHongKong"           
#> [3] "RT @chjackie797: From 2013. Still applicable in 2021. How democracy has progress in Hong Kong\n\n#StandWithHongKong \n#AppleDaily \n#PressFreed…"
## files containing includes (one users_*.json file per page of results)
files <- list.files(random_path, pattern = "^users_", full.names = TRUE)
includes_section <- purrr::map(files, jsonlite::read_json, simplifyVector = TRUE)
source_tweets <- purrr::map_dfr(includes_section, function(inc) {
  data.frame(source_id = inc$tweets$id, source_text = inc$tweets$text, stringsAsFactors = FALSE)
})

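## ID of the first referenced tweet (the retweeted source for retweets);
## NA_character_ keeps purrr::map_chr() type-stable for tweets without references
get_id <- function(x) {
  if (is.null(x)) {
    return(NA_character_)
  }
  x$id[1]
}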

source_id <- purrr::map_chr(test$referenced_tweets, get_id)

tibble(text = test$text, source_id = source_id) %>% 
  left_join(source_tweets, by = "source_id")
#> # A tibble: 100 x 3
#>    text                           source_id    source_text                      
#>    <chr>                          <chr>        <chr>                            
#>  1 "#StandWithHongKong https://t… 14068912572… "One of the largest circulating …
#>  2 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#>  3 "RT @chjackie797: From 2013. … 14068453938… "From 2013. Still applicable in …
#>  4 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#>  5 "RT @chjackie797: The latest … 14068839531… "The latest news from Next Media…
#>  6 "RT @the852spirit: 1100: 🚨🚨… 14068086786… "1100: 🚨🚨🚨 BRB . Handling som…
#>  7 "RT @KokdamonLam: #AppleDaily… 14069121634… "#AppleDaily #StandWithHongKong …
#>  8 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#>  9 "RT @GordonGChang: The people… 14067963147… "The people in #HongKong are res…
#> 10 "#StandWithHongKong #AppleDai… <NA>          <NA>                            
#> # … with 90 more rows

Created on 2021-06-21 by the reprex package (v2.0.0)
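The preview above attaches the text of each tweet's first referenced tweet. A minimal offline sketch of the next step, replacing truncated retweet text with the full source text, might look like the following. It uses only base R (match() instead of left_join()) so it runs without an API call. The column names ref_type and source_id, and all IDs and texts, are hypothetical stand-ins for referenced_tweets$type, referenced_tweets$id and the includes section, not actual academictwitteR output. Only rows whose reference type is "retweeted" take the source text, because replies and quotes also carry referenced tweets whose text is not the tweet's own.

```r
# Offline sketch: recover the full text of truncated retweets by looking up
# the referenced tweet in the "includes" section. All IDs and texts are
# hypothetical placeholders, not real API output.

# Main "data" section: one retweet, one original tweet, one reply
tweets <- data.frame(
  id        = c("1", "2", "3"),
  text      = c("RT @someone: This source tweet is long and gets cut off\u2026",
                "An original tweet, never truncated",
                "@someone I agree"),
  ref_type  = c("retweeted", NA, "replied_to"),  # stands in for referenced_tweets$type
  source_id = c("100", NA, "100"),               # stands in for referenced_tweets$id
  stringsAsFactors = FALSE
)

# "includes$tweets" section: full text of the referenced tweets
source_tweets <- data.frame(
  source_id   = "100",
  source_text = "This source tweet is long and gets cut off in the retweet text.",
  stringsAsFactors = FALSE
)

# Look up each referenced ID; missing or unmatched IDs give NA
source_text <- source_tweets$source_text[match(tweets$source_id, source_tweets$source_id)]

# Only retweets take the source text; replies and originals keep their own text
is_rt <- tweets$ref_type %in% "retweeted" & !is.na(source_text)
tweets$full_text <- ifelse(is_rt, source_text, tweets$text)

tweets$full_text
```

Note that the "RT @user: " prefix is not part of source_text; paste it back onto those rows if you need it.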


chainsawriot commented on June 16, 2024

ref ropensci/rtweet#575


andersolarsson commented on June 16, 2024

Hello,

First of all, thank you for a very good package! As for my question, a problem similar to the one JijoC-13 mentioned above also occurs with get_user_tweets(): the tweet text appears to be shortened, and I can't figure out whether there is any logic to this abbreviation. The link above (ropensci/rtweet#575) suggests that tweet_mode = "extended" could be used somehow, but academictwitteR does not seem to recognize this argument.

Am I missing something here, or is getting the full tweet text with academictwitteR currently an unresolved issue?
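One rough way to see which rows are affected is sketched below. It assumes, based on the examples in this thread rather than any documented rule, that a truncated retweet starts with "RT @" and ends with the ellipsis character "…" (U+2026). is_truncated_rt() is a hypothetical helper, not part of academictwitteR, and the sample texts are copied from the first preview in this thread.

```r
# Heuristic sketch (an assumption, not documented API behaviour): flag texts
# that look like truncated retweets, i.e. start with "RT @" and end with "…".
is_truncated_rt <- function(text) {
  # !is.na() first so that NA texts yield FALSE rather than NA
  !is.na(text) & startsWith(text, "RT @") & endsWith(text, "\u2026")
}

texts <- c(
  "#StandWithHongKong https://t.co/pGonWTd3We",
  "RT @GordonGChang: The people in #HongKong are resisting. Let\u2019s pitch in against our common enemy, #China\u2019s regime. #StandWithHongKong",
  "RT @chjackie797: From 2013. Still applicable in 2021. How democracy has progress in Hong Kong\n\n#StandWithHongKong \n#AppleDaily \n#PressFreed\u2026",
  NA
)

is_truncated_rt(texts)
```

Short retweets that fit within the limit (like the second example) are not cut off, which is why truncation only appears on some retweets.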


andersolarsson commented on June 16, 2024

Hello again,

Thank you for your reply. Perhaps I should clarify what I am trying to do. The code below:

get_user_tweets(users = "account_name_here",
                start_tweets = "2009-01-01T00:00:00Z",
                end_tweets = "2014-12-31T00:00:00Z",
                bearer_token = bearer_token,
                data_path = "../output/",
                bind_tweets = TRUE)

… is my attempt to get all posts from a specified account. If I could get just the created_at column along with the full text and the various public_metrics columns (e.g. public_metrics.like_count, public_metrics.reply_count, etc.), that would solve what I'm working on now. I take it from your reply that a tidy format for academictwitteR data is being worked on (which sounds amazing), but is there a way for me to get these variables in a data.frame format now? Thanks


chainsawriot commented on June 16, 2024

@andersolarsson Is this what you want?

require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
random_path <- academictwitteR:::.gen_random_dir()
test <- get_all_tweets(users = "katypearce", data_path = random_path, 
                       start_tweets = "2021-06-21T00:00:00Z", 
                       end_tweets = "2021-06-21T10:00:00Z", verbose = FALSE)

test
#>   author_id possibly_sensitive    entities.mentions
#> 1  93782410              FALSE 3, 17, Livingstone_S
#> 2  93782410              FALSE                 NULL
#> 3  93782410              FALSE   0, 12, jeanburgess
#> 4  93782410              FALSE    0, 11, profknowak
#> 5  93782410              FALSE             0, 3, UW
#>           entities.annotations               referenced_tweets          source
#> 1                         NULL  retweeted, 1406633438266941445 Twitter Web App
#> 2                         NULL                            NULL Twitter Web App
#> 3 26, 29, 0.5926, Person, Burr replied_to, 1406793556572852227 Twitter Web App
#> 4                         NULL replied_to, 1406785045608677381 Twitter Web App
#> 5                         NULL replied_to, 1406763823802036230 Twitter Web App
#>   public_metrics.retweet_count public_metrics.reply_count
#> 1                           31                          0
#> 2                            0                          0
#> 3                            0                          2
#> 4                            0                          0
#> 5                            0                          0
#>   public_metrics.like_count public_metrics.quote_count               created_at
#> 1                         0                          0 2021-06-21T06:20:46.000Z
#> 2                         4                          0 2021-06-21T05:04:43.000Z
#> 3                         0                          0 2021-06-21T03:50:17.000Z
#> 4                         0                          0 2021-06-21T01:33:33.000Z
#> 5                         0                          0 2021-06-21T00:02:33.000Z
#>                    id
#> 1 1406859620744777728
#> 2 1406840484048216064
#> 3 1406821751820931072
#> 4 1406787341365039105
#> 5 1406764440473665536
#>                                                                                                                                           text
#> 1 RT @Livingstone_S: “This is the first time we’ve had a society in which almost by default, everything is recorded and shared and aggregated…
#> 2      The best part about being allergic to insect stings is being able to have a good reason to make someone else remove a nest or whatever.
#> 3                                                                         @jeanburgess Incorrect is Burr-Guess right? It is a smoother Burjis?
#> 4                                                                                                                            @profknowak True.
#> 5                                                                                                          @UW Thanks but isn't it closed now?
#>   lang     conversation_id in_reply_to_user_id
#> 1   en 1406859620744777728                <NA>
#> 2   en 1406840484048216064                <NA>
#> 3   en 1406793556572852227            18202677
#> 4   en 1406759806526840835          3903374964
#> 5   en 1406763823802036230            27103822

test %>% select(created_at, text) %>% bind_cols(test$public_metrics)
#>                 created_at
#> 1 2021-06-21T06:20:46.000Z
#> 2 2021-06-21T05:04:43.000Z
#> 3 2021-06-21T03:50:17.000Z
#> 4 2021-06-21T01:33:33.000Z
#> 5 2021-06-21T00:02:33.000Z
#>                                                                                                                                           text
#> 1 RT @Livingstone_S: “This is the first time we’ve had a society in which almost by default, everything is recorded and shared and aggregated…
#> 2      The best part about being allergic to insect stings is being able to have a good reason to make someone else remove a nest or whatever.
#> 3                                                                         @jeanburgess Incorrect is Burr-Guess right? It is a smoother Burjis?
#> 4                                                                                                                            @profknowak True.
#> 5                                                                                                          @UW Thanks but isn't it closed now?
#>   retweet_count reply_count like_count quote_count
#> 1            31           0          0           0
#> 2             0           0          4           0
#> 3             0           2          0           0
#> 4             0           0          0           0
#> 5             0           0          0           0

Created on 2021-06-21 by the reprex package (v2.0.0)
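The created_at column comes back as an ISO 8601 string in UTC (for example "2021-06-21T06:20:46.000Z"). If you need it as a date-time for sorting or time-based aggregation, a minimal base R sketch follows; parse_created_at() is a hypothetical helper, and the format string assumes the timestamp layout shown in the output above.

```r
# Sketch: convert ISO 8601 created_at strings (UTC, e.g.
# "2021-06-21T06:20:46.000Z") into POSIXct date-times.
parse_created_at <- function(x) {
  # %OS reads fractional seconds on input; strptime ignores the trailing "Z"
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
}

created <- parse_created_at(c("2021-06-21T06:20:46.000Z",
                              "2021-06-21T00:02:33.000Z"))

format(created, "%Y-%m-%d %H:%M:%S")
# Time between the two tweets, in seconds
as.numeric(difftime(created[1], created[2], units = "secs"))
```

With the data frame above, test$created_at <- parse_created_at(test$created_at) would convert the column in place.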


andersolarsson commented on June 16, 2024

Hello again,

In my installation of academictwitteR, get_all_tweets() does not seem to have a users argument, so I tried:

test <- get_user_tweets(users = "sdriks", # actual user name tried
                        start_tweets = "2020-06-21T00:00:00Z",
                        end_tweets = "2021-06-21T10:00:00Z",
                        verbose = FALSE,
                        bearer_token = bearer_token)

… but this does not solve the issue with the truncated text. If I understand you correctly, a solution to get the full text of the tweet is in the works? Thank you very much for your assistance so far!


chainsawriot commented on June 16, 2024

@andersolarsson's question and part of @JijoC-13's question (usernames and full text) are solved in v0.2 with bind_tweets(output_format = "tidy"). The language-filtering problem is actually a limitation of the Twitter API, which we cannot solve.

