
schrute's Introduction

schrute


Analyze and have fun with the text from the best series of all time

Companion App

The Quotable Office lets you search for your favorite Office quotes.

Installation

You can install the released version of schrute from CRAN with:

install.packages("schrute")

Usage

The schrute package has one and only one purpose: share the complete script transcription for The Office (US) television show. Users are encouraged to use the tidy text data for exploration, learning and fun.

Check out the data like so:

library(schrute)
library(tibble)

tibble::glimpse(schrute::theoffice)
#> Rows: 55,130
#> Columns: 12
#> $ index            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
#> $ season           <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ episode          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ episode_name     <chr> "Pilot", "Pilot", "Pilot", "Pilot", "Pilot", "Pilot",…
#> $ director         <chr> "Ken Kwapis", "Ken Kwapis", "Ken Kwapis", "Ken Kwapis…
#> $ writer           <chr> "Ricky Gervais;Stephen Merchant;Greg Daniels", "Ricky…
#> $ character        <chr> "Michael", "Jim", "Michael", "Jim", "Michael", "Micha…
#> $ text             <chr> "All right Jim. Your quarterlies look very good. How …
#> $ text_w_direction <chr> "All right Jim. Your quarterlies look very good. How …
#> $ imdb_rating      <dbl> 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6…
#> $ total_votes      <int> 3706, 3706, 3706, 3706, 3706, 3706, 3706, 3706, 3706,…
#> $ air_date         <chr> "2005-03-24", "2005-03-24", "2005-03-24", "2005-03-24…

Or view the short vignette with:

vignette("theoffice")
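As a small illustration of the kind of exploration the data supports, the sketch below counts lines per character. To stay self-contained it runs on a tiny hand-made data frame that borrows the `character` and `text` column names from the output above; the rows themselves and the helper name `lines_per_character` are made up for illustration. Passing `schrute::theoffice` instead should work the same way, assuming the package is installed.

```r
# Hypothetical helper: count how many lines each character speaks.
# Works on any data frame that has a `character` column.
lines_per_character <- function(df) {
  counts <- table(df$character)
  sort(counts, decreasing = TRUE)
}

# Toy rows for illustration only (not real transcript data).
toy <- data.frame(
  character = c("Michael", "Jim", "Michael", "Pam", "Michael", "Jim"),
  text = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

top <- lines_per_character(toy)
print(top)
```

Base R is used here only to avoid extra dependencies; the same count can be written with dplyr if that is already loaded.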

Watch and learn

Julia Silge and David Robinson, creators of the tidytext package, both used the {schrute} package for a #tidyTuesday analysis. Watch their videos and learn from the masters.

Other languages

This dataset is also available in Python and Julia.

schrute's People

Contributors

bradlindblad, hypercompetent, skent259

schrute's Issues

Release schrute 0.2.0

Prepare for release:

  • devtools::check()
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Polish NEWS
  • Polish pkgdown reference index
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • Update cran-comments.md
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Keep all the curse words

The scripts that underlie {schrute} have certain words censored. For example:

Hey, send me that link to the monkey s*x video. I'm going to forward it like it's hot.

or

This is bull****!

We're all adults here, so please remove the censoring.
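To get a sense of how many lines are affected, a rough check like the one below could help. It is only a sketch: the first two strings are the examples quoted in this issue, the third is an uncensored line for contrast, and the pattern (an asterisk next to a letter) is an assumption about how the censoring looks, not a rule taken from the package.

```r
# Sample lines: two censored examples from this issue, one clean line.
lines <- c(
  "This is bull****!",
  "Hey, send me that link to the monkey s*x video. I'm going to forward it like it's hot.",
  "All right Jim. Your quarterlies look very good."
)

# Flag any line where an asterisk touches a letter.
# Assumption: censored words always keep at least one letter.
is_censored <- grepl("[[:alpha:]]\\*|\\*[[:alpha:]]", lines)

print(lines[is_censored])
```

Running the same `grepl()` call against the `text` column of the dataset would give a count of candidate lines to review.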

Release schrute 1.0.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Check if any deprecation processes should be advanced, as described in Gradual deprecation
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Release schrute 0.1.0

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • usethis::use_cran_comments()
  • devtools::check()
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish pkgdown reference index
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • Update cran-comments.md
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • usethis::use_news_md()
  • Update install instructions in README
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Finale seems incorrect (not sure about the others)

Hi, I haven't gotten around to looking through all the episodes, but I went through the last episode and it seems jumbled up, incomplete, or both. The last line in the data cuts off during Jim and Pam's conversation with Dwight about leaving Dunder Mifflin.

Add Directors and Writers?

Hi Brad,

Thanks for assembling this really fun dataset! Something I was curious about after exploring the data in the schrute R package was how directors and writers contribute to the similarities or differences in word use per episode.

So, I went through IMDB and grabbed the credited directors and writers there for each episode in this Gist: office_writers.csv. Where multiple people are credited, they're semicolon-separated within the associated column.

I'll leave it up to you to determine if/how you'd like to incorporate these additional columns into your data structure.

Cheers,
-Lucas
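For anyone working with semicolon-separated credits like those described above, splitting them into individual names is a one-liner in base R. The sketch below uses one value copied from the `writer` column shown in the README output, plus a made-up single-writer value ("Some Writer") for contrast.

```r
# First value is from the README's glimpse() output; second is hypothetical.
writer <- c("Ricky Gervais;Stephen Merchant;Greg Daniels", "Some Writer")

# strsplit() returns one character vector of names per input element.
writers_list <- strsplit(writer, ";", fixed = TRUE)

# Number of credited writers per row.
n_writers <- lengths(writers_list)

print(writers_list)
print(n_writers)
```

`fixed = TRUE` treats the semicolon as a literal string rather than a regular expression, which is all that is needed here.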

air_date (and imdb_rating) information appears incorrect

Hi,

I noticed that some of the air_date and imdb_rating information in the theoffice data set in this package is incorrect. I just downloaded version 0.2.0 from CRAN and was digging in.

You can see this clearly from Season 4, but it occurs elsewhere. For example, S4 E03 says it was published on October 11:

library(dplyr)
library(schrute)

theoffice %>%
  filter(season == "04", episode == "03") %>%
  select(season, episode, episode_name, air_date, imdb_rating) %>%
  unique()

## # A tibble: 1 x 5
##   season episode episode_name                        air_date   imdb_rating
##   <chr>  <chr>   <chr>                               <fct>            <dbl>
## 1 04     03      Dunder Mifflin Infinity (Parts 1&2) 2007-10-11         8.5

But IMDB reports a publish date of October 4 (https://www.imdb.com/title/tt0386676/episodes?season=4).

I think the issue is that multi-part episodes increment the episode count by 2 in this dataset, whereas pulling the IMDB data directly would only increment the episode count by 1. This creates a mismatch on the join and would also explain the many episodes with NA at the end of each season.

I found a (fairly manual) way to correct this that I'm willing to share if needed. I also have a 'corrected' version of the data at https://github.com/skent259/tidy-tuesday/blob/master/2020/2020-03-17/theoffice-data.rds. Note this is my first GitHub issue, so I'm not sure of the best way to help with this. Please let me know if you need more information.
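The suspected mismatch can be reproduced on toy data. The sketch below is not the correction mentioned above, just one possible approach under two assumptions: that the IMDB listing counts a multi-part episode once, and that no episodes are missing from the transcript. All names, dates, and values are placeholders.

```r
# Toy transcript index: episode 1 is a two-parter occupying numbers 1 and 2,
# so the following episodes are numbered 3 and 4.
transcript <- data.frame(
  season = c(4, 4, 4),
  episode = c(1, 3, 4),
  episode_name = c("A (Parts 1&2)", "B", "C"),
  stringsAsFactors = FALSE
)

# Toy IMDB-style listing: every episode increments the count by 1.
imdb <- data.frame(
  season = c(4, 4, 4),
  imdb_episode = c(1, 2, 3),
  air_date = c("date-1", "date-2", "date-3"),
  stringsAsFactors = FALSE
)

# Naive join on the raw episode number: "B" picks up the wrong date
# and "C" gets NA, matching the symptoms described in this issue.
naive <- merge(transcript, imdb,
               by.x = c("season", "episode"),
               by.y = c("season", "imdb_episode"),
               all.x = TRUE)

# Possible fix: re-number episodes by their position within each season,
# then join on that position instead.
transcript$imdb_episode <- ave(
  transcript$episode, transcript$season,
  FUN = function(e) match(e, sort(unique(e)))
)
fixed <- merge(transcript, imdb, by = c("season", "imdb_episode"))

print(naive)
print(fixed)
```

If the transcript is itself missing episodes, the positional re-numbering would shift everything after the gap, so the result would still need a spot check against the episode names.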

Release schrute 1.0.1

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push

Missing data for some episodes

Hi! I just came across this package and have been using it in R. Thanks for curating this!

However, it seems like the dataset is missing data for some episodes?

For example: Season 3, Episode 11

subset(theoffice, theoffice$season == 3 & theoffice$episode == 11)

returns a 0 x 12 tibble (no rows).

Same for these other episodes:

  • S4 E2
  • S4 E4
  • S4 E6
  • S4 E8
  • S5 E2
  • S5 E15
  • S6 E5
  • S6 E18
  • S7 E12
  • S9 E23
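A list like the one above can be generated rather than found by hand. The sketch below is a hypothetical helper (`find_episode_gaps` is not part of the package) that reports, per season, episode numbers between 1 and the highest observed number that never appear. It is shown on made-up rows; it cannot detect episodes missing from the very end of a season, and some gaps may simply be the second half of a multi-part episode, as the air_date issue suggests.

```r
# Hypothetical helper: for each season, list episode numbers in
# 1..max(episode) that do not occur in the data.
find_episode_gaps <- function(df) {
  by_season <- split(df$episode, df$season)
  gaps <- lapply(by_season, function(e) setdiff(seq_len(max(e)), unique(e)))
  # Keep only seasons that actually have a gap.
  gaps[lengths(gaps) > 0]
}

# Toy data: season 3 skips episode 3, season 4 is complete.
toy <- data.frame(
  season = c(3, 3, 3, 3, 4, 4, 4),
  episode = c(1, 2, 2, 4, 1, 2, 3)
)

gaps <- find_episode_gaps(toy)
print(gaps)
```

Calling `find_episode_gaps(theoffice)` should work the same way, assuming `season` and `episode` are numeric in the installed version.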

Release schrute 0.2.2

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
