
comparing-sentiment

Using {targets}

This project uses {targets} for workflow management. To install {targets}:

library(remotes)
install_github("wlandau/targets")

To run the project pipeline, run the following:

targets::tar_make()

Dependencies

The pipeline expects the following tidytext sentiment lexicons to be available locally (a first-run sketch follows the list):

  • tidytext::get_sentiments("bing")
  • tidytext::get_sentiments("loughran")
  • tidytext::get_sentiments("nrc")
  • tidytext::get_sentiments("afinn")
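
Putting the pieces together, a minimal first-run sketch; it assumes {tidytext} is already installed, and the lexicon names are the four listed above. Some lexicons prompt for a download the first time they are requested, so run this once in an interactive session before relying on targets::tar_make() in scripts.

# One-time interactive setup (sketch)
install.packages("remotes")
remotes::install_github("wlandau/targets")

for (lexicon in c("bing", "loughran", "nrc", "afinn")) {
  tidytext::get_sentiments(lexicon)  # may prompt to download on first use
}

# Then build the pipeline
targets::tar_make()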

Directory structure

  • Raw data: data-raw
  • Aggregated data and data from external sentiment software: data
    • For more information, refer to create-study-data.R
  • Functions: r/functions.R

Study data

Please contact the authors for access to the study data.

Contributors

conradborchers, jrosen48


Issues

make create_raw_data faster?

I think this pattern may be making this function slower as more files are read and merged:

d <- rbind(d, tweets_dl)

I think the issue is copy-on-modify: each time rbind() is called, everything accumulated so far is copied into a new object, so each iteration takes longer than the last and total runtime grows roughly quadratically with the number of files (see copy-on-modify).

I wonder if an apply-family approach that reads each file into a list and binds once at the end would be faster here. I have a slight preference for the {purrr} functions because of their documentation and consistency across related functions, but lapply() or a related base function would work just fine. A sketch is below.
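
A minimal sketch of what I mean; the data-raw path and the read_rds() call are assumptions standing in for whatever create_raw_data() actually does to read each file:

library(purrr)
library(dplyr)
library(readr)

files <- list.files("data-raw", full.names = TRUE)

d <- files %>%
  map(read_rds) %>%  # one data frame per file, collected in a list
  bind_rows()        # single bind at the end avoids repeated copying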

is the status_id variable behaving correctly?

I renamed josh4liwc.rds to data_for_liwc.rds

> data_for_liwc %>% mutate(nchar_status_id = nchar(status_id)) %>% count(nchar_status_id)
# A tibble: 7 x 2
  nchar_status_id      n
            <int>  <int>
1               9      2
2              10     75
3              11    343
4              16     28
5              17    653
6              18 383774
7              19 196607

Just checking: what's up with the varying lengths of the status_id variable? It may not be a problem in any case, since we could also join on text, I think (see the sketch below). I labeled this as a bug, but it may not be one.
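
If the IDs do turn out to be unreliable, here is a rough sketch of the text-based join I had in mind; liwc_results is a hypothetical name for the table LIWC returns, and we would want to confirm that tweet texts are unique before relying on this:

library(dplyr)

# Join the LIWC output back onto the tweets by the tweet text instead of
# status_id, assuming both tables keep the original text column.
tweets_with_liwc <- data_for_liwc %>%
  left_join(liwc_results, by = "text")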

identifying common threads

Unlike #10, which seems to work fine, I am not yet confident that the code to identify common threads is working correctly. Things on my mind as I work on this:

  • There seem to be some original tweets that are not in the dataset with the thread IDs
  • I am not sure how tweets that belong to multiple threads are handled

Generally, the code is hard to read and debug (my fault). I will work on this more, but it can stay in the codebase for now.

Error when accessing dictionaries

I think all of the dictionaries have to be downloaded interactively the first time they are used; otherwise these calls return an error:

tidytext::get_sentiments("afinn")
tidytext::get_sentiments("loughran")
tidytext::get_sentiments("nrc")
tidytext::get_sentiments("bing") # I seem to have had this one installed already; or is it by default?

Is there a way to make this non-interactive, or could we add it to the README? One possible guard is sketched below.
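
A sketch, not a tested fix; get_lexicon_safely() is a hypothetical helper we could put in r/functions.R so that a fresh clone fails with an actionable message instead of hitting an interactive prompt inside tar_make():

get_lexicon_safely <- function(lexicon) {
  tryCatch(
    tidytext::get_sentiments(lexicon),
    error = function(e) {
      stop(
        "Lexicon '", lexicon, "' is not cached yet. Run ",
        "tidytext::get_sentiments(\"", lexicon, "\") once in an interactive ",
        "session (see the README), then re-run targets::tar_make().",
        call. = FALSE
      )
    }
  )
}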

master branch after merge

Things got weird after the merge - is it because of how the merge happened recursively? There seem to be a bunch of duplicate targets. Do you have a sense for which ones we can delete? I can also just dive in and give it a whirl. Sorry for working on master and making this merge hard.

recursively adding replies

I believe these lines work correctly to recursively search and add replies to the original dataset:

  tar_target(file_name_for_sample_of_tweets, here::here("data", "sample-of-tweets.rds"), format = "file"),
  tar_target(sample_of_tweets_for_thread_finding, read_rds(file_name_for_sample_of_tweets)),
  tar_target(extracted_status_ids, extract_status_ids(sample_of_tweets_for_thread_finding)),
  tar_target(replies_that_were_recursively_searched, get_replies_recursive(extracted_status_ids)),
  tar_target(original_tweets_with_replies_added, combine_original_with_reply_tweets(sample_of_tweets_for_thread_finding, replies_that_were_recursively_searched)),

For the sample used - sample-of-tweets.rds, which I uploaded to the data directory of our new OneDrive folder - around 200 tweets are added.

Just flagging this here, as I am going to reference it from a related issue about something that is not yet working reliably - returning an ID for the thread a tweet belongs to.

Just tagging you here @conradborchers, nothing to do.

can we combine two functions?

Wondering if we can combine the two steps in cases where we pass the file name in one target and then read the file in the next, e.g.:

Presently:

tar_target(ss_scale_file, here::here("data-sentiment", "sentistrength_scale.txt"), format="file"),
tar_target(ss_scale_data, read.table(ss_scale_file, sep="\t", header = T, quote="")),

change to:

tar_target(ss_scale_file, read.table(here::here("data-sentiment", "sentistrength_scale.txt"), sep="\t", header = T, quote=""), format="file"),

This part of the manual seems to suggest this is possible (that we can still tell {targets} that this target loads a file). Thanks for input when you get a chance! A sketch of an alternative is below.
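
For discussion, a sketch of the alternative I would reach for; this reflects my understanding (worth double-checking against the manual) that format = "file" requires the target to return the file path itself, so a single target that returns the data frame may not be allowed. read_ss_scale() is a hypothetical helper, not currently in r/functions.R, and {tarchetypes} provides tar_file_read() for bundling this kind of file/read pair if we want to take on that dependency.

# Sketch only: keep the file target so {targets} tracks the file on disk, and
# move the read.table() call into a small named function to keep the pipeline
# declaration short.
read_ss_scale <- function(path) {
  read.table(path, sep = "\t", header = TRUE, quote = "")
}

list(
  tar_target(ss_scale_file, here::here("data-sentiment", "sentistrength_scale.txt"), format = "file"),
  tar_target(ss_scale_data, read_ss_scale(ss_scale_file))
)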
