data-edu / data-science-in-education

Repository for the second edition of 'Data Science in Education Using R' by Emily A. Bovee, Ryan A. Estrellado, Joshua M. Rosenberg, and Isabella C. Velásquez

Home Page: http://www.datascienceineducation.com/

R 0.14% HTML 95.91% CSS 1.58% Shell 0.01% TeX 1.31% JavaScript 1.05%
r rstats education education-data bookdown

data-science-in-education's Introduction

Data Science in Education Using R

Note from Our Publisher

The authors of this text and the publisher Taylor and Francis are pleased to make Data Science in Education Using R available via bookdown at datascienceineducation.com. They request that readers access the book via the website or in print form only and do not download or reproduce copies in any other form. Any attempt to do so will be considered a contravention of the publisher’s terms of availability.

Reading the Book

We wrote this book for you and are excited to share it! You can read the current version at datascienceineducation.com. The print version is available now through Routledge.

The Aims of This Book

School districts, government agencies, and education businesses are generating data at a dizzying pace. They're serving it to teachers, administrators, and education consultants in a mind-boggling variety of formats. Educators and educational data practitioners wanting to use data to improve the lives of students know the questions they want to ask, but the available data is often not ready to be analyzed. Sometimes educators need to use high-cost proprietary systems to access and prepare data before using it to answer their questions.

Educational data rarely comes in a “ready-to-analyze” format. As a result, it's hard for enthusiastic practitioners to feel a connection between their questions and the data needed to answer them. To get value from the data deluge, some educational data practitioners are adopting data science tools, like R. R is an open source programming language for data analysis. When data science meets education, the numbers confined to websites and PDF reports are set free. Teachers, administrators, and consultants apply programming and statistics to prepare data, transform it, visualize it, and analyze it to answer questions that make a difference for their students.

Our book focuses on data science in education, which we define as using data science techniques like preparing, exploring, visualizing, and modeling data, in order to support schooling at all levels. We want to make a case for learning about data science through field-specific examples. Understanding the unique challenges and starting to use a common field-specific language is important for mastering data science in education. We feel that discussing data science using education-specific scenarios more effectively speaks to the needs of educators.

Technology is transforming both the administrative and student-facing sides of education. It's becoming increasingly important for educators - not just people hired to analyze data - to understand what stories this new data tells them about their students. Our book empowers educators from elementary school to higher education to transform educational data into actionable insights so it helps them serve their students and institutions. We wrote our book to be used as a main textbook in a graduate data science in education course. We also wrote it as a practical reference for data scientists working with education data.

By the end of this book the reader will understand:

  • The diversity of data analysis skills and applications in the education field
  • Special considerations that come with analyzing education data
  • That good data analysis has a basic workflow
  • The wonderful opportunity we have to shape the usefulness of data science in our education jobs

And, the reader will be able to:

  • Reflect on and define their role as a data analyst and educator
  • Identify and apply solutions to education data’s unique challenges, such as cleaning datasets and working with aggregate student data
  • Apply a basic analytic workflow through practice with education datasets
  • Be thoughtful, empathetic, and effective when introducing data science techniques in their education jobs

Chapters

  1. Introduction: Data Science in Education - You’re Invited to the Party!

  2. How to Use This Book

  3. What Does Data Science in Education Look Like?

  4. Special Considerations

  5. Getting Started with R and R Studio

  6. Foundational Skills

  7. Walkthrough 1: The Education Data Science Pipeline With Online Science Class Data

  8. Walkthrough 2: Approaching Gradebook Data From a Data Science Perspective

  9. Walkthrough 3: Using School-Level Aggregate Data to Illuminate Educational Inequities

  10. Walkthrough 4: Longitudinal Analysis With Federal Students With Disabilities Data

  11. Walkthrough 5: Text Analysis With Social Media Data

  12. Walkthrough 6: Exploring Relationships Using Social Network Analysis With Social Media Data

  13. Walkthrough 7: The Role (and Usefulness) of Multi-Level Models

  14. Walkthrough 8: Predicting Students’ Final Grades Using Machine Learning Methods with Online Course Data

  15. Introducing Data Science Tools To Your Education Job

  16. Teaching Data Science

  17. Learning More

  18. Additional Resources

  19. Conclusion: Where to Next?

  20. Appendices

Contributing

This project started in the #dataedu Slack channel. You can join the workspace here.

Community members can contribute by making changes through a pull request. We encourage community members to do their pull requests on separate branches. This helps us keep all the changes synced up.

Git Issue Labels

To help contributors participate, we're using labels so community members can identify tasks they want to help with. When working on an issue, assign yourself to the issue. This helps us keep track of the work and lets us know who to contact for more collaboration. The labels are:

  • good first issue: These are requests for changes that we think would be fun and achievable if you're new to git and GitHub.

  • discussion: Sometimes we need help talking through a topic to help us make a good design choice for our readers. These issues won't always result in a change, but they help us clarify what's best for the final product.

  • test code: These issues are for running code and giving feedback about how it went. If there were problems, you can help us by letting us know what happened.

  • bug: The code isn't running as expected and needs fixing.

  • help wanted: Need help getting code to run or writing a section. We'll make sure the problem we're working on is clearly described in the issue.

  • writing: New content needed. At least one author will be assigned to writing issues, but we welcome collaboration! Feel free to message the author on Slack or in the issue comments to coordinate.

  • review draft: These are requests to read through a draft chapter and provide feedback on the experience, including readability.

Contact Us

If you have questions, comments, or ideas you can reach the authors by email at [email protected] or on Twitter:

Citation

Bovee, E. A., Estrellado, R. A., Mostipak, J., Rosenberg, J. M., & Velásquez, I. C. (under contract). Data science in education using R. London, England: Routledge. Nb. All authors contributed equally. http://www.datascienceineducation.com/

data-science-in-education's People

Contributors

bdgibbo, bretsw, duttashi, efreer20, federicomarini, ivelasq, jkaupp, jmostipak, joshuarosenberg, jrosen48, jsonbecker, kierisi, menakak, nkenner, pursuitofdatascience, ramorel, restrellado, williambork33

data-science-in-education's Issues

Data Science in Education or Educational Data Science?

I saw that some chapters refer to "data science in education" (Chapter 2) and some chapters refer to "educational data science" (Chapter 4: Unique Challenges). Are they the same thing? If yes, should we norm on one term? If no, should we have definitions for them early on?

Create a style guide

This was an idea we discussed during our call on August 7, 2018. The idea was to create a style guide to make our code consistent. We should also use the style guide to select one package when there are multiple packages that have similar purposes.

error in random forest chapter: missing package

in compiling for bookdown the following error is showing up for chunks on lines:

  • 171-175
  • 185-195
  • 201-211
  • 278-285
  • 320-332
Error: Required package is missing

I've set the code chunk to error = TRUE in order to facilitate rendering the text, but these will need to be addressed.
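
Before rendering, it can help to check which packages are actually missing rather than waiting for each chunk to fail. The sketch below is one way to do that in base R. The issue does not say which package is missing, so the package names here are placeholders.

```r
# Hypothetical package list: the issue does not name the missing package,
# so these names are placeholders for whatever the chapter loads.
required_pkgs <- c("stats", "utils", "aPackageThatIsNotInstalled")

# requireNamespace() returns FALSE (instead of stopping) when a package
# cannot be loaded, which makes it safe to call for each name in turn.
is_installed <- vapply(
  required_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before rendering: ", paste(missing_pkgs, collapse = ", "))
}
```

Once the missing packages are installed, the error = TRUE setting on those chunks could be removed again.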

drafting walkthrough using data from students in an online science class

I'd like to draft a walkthrough using some data from students in an online science class.

The data, which I collected with a colleague, is anonymized, or at least I think it is, though I'd like feedback on this (see this PR). It comes from around 650 students in around 20 science classes at a large, state-wide, public high school. The high school provides individual courses to students who cannot otherwise take them; it's not a full high school, if that makes sense. The data includes "pre" and "post" survey data on students' interest in science, their value of science, and their perceptions of their self-confidence/competence. It also includes a few "trace" measures, namely, how much time they spent in the LMS.

I would like to use this for one of the three planned walkthroughs and possibly for the section of the advanced uses chapter on multi-level or mixed effects models

Question about ideas for chapter 11 (students doing data science)

This chapter is an odd duck: every other chapter focuses upon teachers/administrators/researchers/policymakers using data science techniques (and R), whereas this focuses more upon teaching individuals at the K-12 age.

I wonder if in some ways we can suggest that some of the ideas and techniques used in this book could be used to teach youth/students, i.e. some of the potential ideas for chapter 10 discussed in #36. Thoughts?

In addition to possibly recapitulating (and adapting for younger R learners), some possible distinctions that may make this chapter different could be:

  • K-12 students often need to meet standards
  • There aren't stand-alone data science standards (though there are many elements in the science, math, and even computer science standards, i.e. modeling and information literacy)

Discuss and develop narrative structure

During yesterday's call we noticed the narrative structure of the book coming together. We talked about how groups of chapters could serve the main points. These are the basic ideas:

Part 1: Chapters 1-4

  • Educators who use data science techniques face unique challenges
  • It takes a lot of people in different roles to create a good learning environment. Different education roles mean different datasets, techniques, and challenges

Part 2: Chapters 5-9

  • Readers can learn and practice steps in the analysis process
  • The walk-throughs use education datasets that feel familiar to education workers. Readers can work through examples that feel more practical than commonly used datasets like mtcars or iris
  • Resources like R for Data Science, blog posts, and data science forums are part of the ongoing learning process

A note on the tone of this section: This section can serve the same purpose as blog posts like this one. Readers can get practical instruction, then deepen their learning by reading more in-depth materials.

Part 3: Chapters 10-12

  • Now that educators are familiar with some data science techniques, it's time to tackle applying and sharing those techniques at work. This is its own skill.
  • Connect the skills taught in Part 2 to the unique challenges discussed in Part 1
  • Explore how to use programming, statistics, and content knowledge in an education workplace that is new to data science
  • Explore how educators can teach today's students to prepare them for using data in future work places

We can use this issue thread to discuss and develop these ideas. I'll follow the conversation here and will eventually move the narrative structure into the readme. Then writers can use the structure to focus their work on the main point of a chapter. They can also zoom out and see how well the chapter contributes to its section of the book.

Feel free to discuss!

addition of who works with data content

From Slack:

wonder if we could explicitly discuss who works with data and how they do so (maybe in the intro.), i.e. what types of data do administrators OR data analysts OR researchers primarily work with. And maybe we can suggest specific chapters that will be particularly relevant. Part of this could be delimiting who works with data but who is not a focus of our audience (i.e., parents, and, to a lesser extent cuz we touch on this, teachers and students)

Maybe this can be in chapter 2 under “pick a chapter that’s useful for you”?

possible next steps for walkthrough 1

Walkthrough 1 (06-walkthrough-1.Rmd) has a foundation, but has some room for improvement, especially with respect to the newer parts on visualizations and models.

Right now, these focus on a fairly obvious relationship between the time students spent on the course and the percentage of possible points earned. There are other relationships that could be more interesting. A few ideas:

  • Look at differences by the five subjects (anatomy and physiology, physics, biology, chemistry, and forensic science)
    • To do this analysis, the code for the subject would have to be identified from the course-ID; this would take a little regex work
  • Look at some of the relations of the self-report variables for interest, value, and self-efficacy (perhaps with time spent on course)
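
The regex step for pulling the subject out of the course ID could look something like the sketch below. It assumes the course ID follows a "subject-semester-section" layout separated by hyphens, which is a guess based on the separate() call quoted in the walkthrough 1 typo issue. The example IDs and the code-to-subject lookup table are hypothetical.

```r
# Hypothetical course IDs: the "subject-semester-section" layout is an
# assumption, and these specific values are made up for illustration.
course_ids <- c("AnPhA-S116-01", "BioA-S216-02", "FrScA-T116-01")

# Capture everything before the first hyphen as the subject code.
subject_code <- sub("^([^-]+)-.*$", "\\1", course_ids)

# Hypothetical lookup from subject code to a readable subject name.
subject_names <- c(
  AnPhA = "anatomy and physiology",
  PhysA = "physics",
  BioA  = "biology",
  ChemA = "chemistry",
  FrScA = "forensic science"
)

subject <- unname(subject_names[subject_code])
```

With a subject column in place, the time-spent and points-earned relationship could be compared across the five subjects.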

We could also look at gender, though the variable really records sex, and I'm concerned about treating it as a token dichotomous (dummy-coded) variable in any case (but am open to feedback on this!).

synthesize data set for random forest chapter

  • Review the file for chapter 8 (08-walkthrough-3.Rmd) to see which variables are included - in any way at all - in the analysis
  • Record what those variables are
  • Subset the online science motivation dataset, used in that walkthrough and stored in data/online-science-motivation/processed/online-science-motivation.csv, to include only those variables used in the analysis
  • Use synthpop::syn() on the subset data frame to synthesize a new data frame
  • Replace the online science motivation dataset with the new, synthesized dataset
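
The steps above can be sketched roughly as follows. The data frame and the variable names are stand-ins, not the walkthrough's real variables, and the synthpop call is guarded so the sketch still runs where the package is not installed.

```r
# Hypothetical stand-in for the processed dataset; the real file lives at
# data/online-science-motivation/processed/online-science-motivation.csv.
set.seed(2018)
motivation <- data.frame(
  student_id  = 1:100,
  time_spent  = round(runif(100, 0, 3000)),
  final_grade = round(runif(100, 40, 100), 1),
  unused_col  = rnorm(100)
)

# Keep only the variables used in the analysis
# (placeholder names, not the walkthrough's real variables).
analysis_vars <- c("time_spent", "final_grade")
motivation_subset <- motivation[, analysis_vars]

# Synthesize a new data frame. syn() returns a "synds" object whose
# $syn element holds the synthetic data.
if (requireNamespace("synthpop", quietly = TRUE)) {
  synthesized <- synthpop::syn(motivation_subset, seed = 2018)
  motivation_synth <- synthesized$syn
}
```

The last step, replacing the stored dataset, would then be a matter of writing motivation_synth back to the processed CSV path.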

Can't source theme_dataedu.R and palette_dataedu.R in 10a-walkthrough-6.Rmd

The RMarkdown file gets stuck when executing this

source("r/theme_dataedu.R")
source("r/palette_dataedu.R")

I see this in the console and I type y
Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system. Continue? [y/n] y

Then it returns this:

input string 4 is invalid in this locale
input string 4 is invalid in this locale
input string 4 is invalid in this locale
input string 4 is invalid in this locale

chapter on Bayesian methods?

From my friend @markubsch via Twitter:

I recently looked at the GitHub page for your data science book. Have you thought of having a chapter on Bayesian methods as well? I think there are some arguments that make Bayes especially interesting in education.

style guide for plots?

Hi all! I've started making plots on a chapter. Would we be interested in having a style guide for ggplots? Happy to lead on creating one!

Connect datascienceineducation.com with our Netlify-hosted gitbook?

At some point, we need to connect datascienceineducation.com with our Netlify-hosted gitbook. I think this would involve changing the nameservers to point to netlify - and then changing some Netlify setting to use our custom URL.

Separately, would it make any sense to also purchase dsieur.com?

Ideas for potential data sets?

Hi all, curious what thoughts you all have on potential data sets to use in examples. I added a file to the planning folder with just a few bare-bones ideas (see here), but we can edit that in conjunction with discussing this here or elsewhere.

coding convention suggestions for walkthrough 6

in reading through walkthrough 6 I noticed that the code switched back and forth between base R and the tidyverse styles in cases where tidyverse solutions are available. depending on the intended audience for the text it may be beneficial to choose a single syntax structure and use it consistently throughout the text.
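
To make the contrast concrete, here is a small sketch of the same filter-and-select step written in both styles. The data frame is made up for illustration and is not taken from walkthrough 6.

```r
library(dplyr)

# Hypothetical data for illustration only.
grades <- data.frame(
  student = c("a", "b", "c", "d"),
  subject = c("bio", "bio", "chem", "chem"),
  score   = c(88, 72, 95, 61)
)

# Base R style: bracket subsetting on rows and columns.
base_result <- grades[grades$score >= 70, c("student", "score")]

# Tidyverse style: the same two steps written as a pipeline.
tidy_result <- grades %>%
  filter(score >= 70) %>%
  select(student, score)
```

Both return the same three rows; the argument for picking one style is mainly that readers only have to learn one way of reading the code.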

I'm happy to take a pass at some of the code formatting if that would be helpful!

publicly available data

Hi again,

I can most contribute on topics regarding wrangling public datasets. I've worked with NAEP and would be glad to take on that chapter. Another thing I was thinking was showing imputation with EdFacts graduation data, which is suppressed when there are small cell sizes. Let me know if this is applicable to your work and how I can get started!

Analysis of Gradebooks -- In scope?

It might be useful to include a section on EDA within the context of gradebooks. I suspect thousands of teachers and department chairs have exported a gradebook, opened the spreadsheet, and wondered what to do next. I myself have wondered what to look for when combing through summative and formative assessments.

Please ignore the vulgarity of the code, but I ventured a small tutorial for the Michigan Virtual Learning Research Institute (MVLRI) here. I would be interested to read a smarter person's take on something like this.

Examples of Data Science Roles Needed

Can you please send me a DM on Slack (RyanEs) with a short description about the data science work you do in education?

I'm working on chapter 3, which is about the role of a data scientist in education. I want to get some examples of what y'all do in your day-to-day work and what your responsibilities are as someone who does data science in the education field. I think sharing different examples would tell the story about data science in education better than trying to nail it down to a single definition.

After I collect a few, I'll write them up then try and draw some conclusions from the collection of activities.

Ideas for chapter 10 (solutions for adopting data science techniques in education)

Just a few ideas for this chapter--open to (and requiring) more ideas & input:

  • Have a project/problem to work on
  • Start where people are (with the tools they use and the data they have - as well as the problems they are trying to solve with data science/R)
  • Recognize that using data science tools such as R can take a long time and an investment of effort

Discussing this burgeoning book in an AERA conference presentation?

Hi all, I have an opportunity to discuss 'open science' as part of a proposal to present at the American Educational Research Association (AERA) conference.

My first thought was to discuss some of my experiences writing this book. I've found it rewarding and perspective-broadening to write this collaboratively and in the open. Though it's not finished (and is very much still in-progress), given the chance, I'd like to share what we've done, but I'd like your thumbs up to do so. If you have thoughts (positive, negative, or other!) about this, please let me know!

Also, if folks would like to join in the presentation (helping with the proposal and, if you like and if it is accepted, presenting at the conference in Toronto in April), I would welcome that. It's not necessary to be at the conference to be involved; at least for AERA, many authors aren't able to attend.

Typo in 06-walkthrough-1.Rmd

This book is awesome. Found a typo in walkthrough 1: in the code chunk on lines 205-213:

# split course section into components
s12_course_data <- s12_course_data %>%
  separate(col = CourseSectionOrigId,
           into = c('subject', 'semester', 'section'),
           sep = '-',
           remove = FALSE)

CourseSectionOrigId should be CourseSectionOrigID, otherwise it's not found.
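
A corrected version of the chunk might look like the sketch below. Only the column name CourseSectionOrigID and the '-' separator come from this issue; the example rows are hypothetical. The case-insensitive grep() is an optional extra for finding the spelling a data frame actually uses, since R names are case-sensitive.

```r
library(tidyr)

# Hypothetical rows; only the column name and separator come from the issue.
s12_course_data <- data.frame(
  CourseSectionOrigID = c("AnPhA-S116-01", "BioA-S216-02"),
  stringsAsFactors = FALSE
)

# Find the exact spelling of the column, ignoring case.
matching_cols <- grep("coursesectionorigid", names(s12_course_data),
                      ignore.case = TRUE, value = TRUE)

# split course section into components, using the corrected column name
s12_course_data <- separate(
  s12_course_data,
  col = CourseSectionOrigID,
  into = c("subject", "semester", "section"),
  sep = "-",
  remove = FALSE
)
```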

What to do with chapter 6?

Does anyone else find it sort of strange that Advanced Uses (06-advanced-uses.Rmd) comes before our walkthroughs? Can we move this--or integrate the content that would be in this into the walkthroughs?

fix simulated data for walkthrough 4

For walkthrough 4 (09-walkthrough-4.Rmd), there is a problem with the simulated data. The key when generating the data is that nominators with higher values of yvar2 need to have relations with - have nominated - nominees with higher levels of yvar1. This isn't the case yet.
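
One possible way to build that dependency into the simulation is to let each nominator choose a nominee with a probability that favors nominees whose yvar1 is close to the nominator's yvar2. This is only a sketch: yvar1 and yvar2 are the names from this issue, while the sample sizes, the normal distributions, and the weighting function are assumptions, not the walkthrough's actual code.

```r
set.seed(42)

n_nominees   <- 50
n_nominators <- 200

# yvar1 and yvar2 come from the issue; the distributions are assumed.
nominees <- data.frame(
  nominee_id = seq_len(n_nominees),
  yvar1 = rnorm(n_nominees)
)
nominators <- data.frame(
  nominator_id = seq_len(n_nominators),
  yvar2 = rnorm(n_nominators)
)

# For each nominator, pick one nominee with a probability that decays as
# the nominee's yvar1 moves away from the nominator's yvar2.
pick_nominee <- function(y2) {
  weights <- exp(-(nominees$yvar1 - y2)^2)
  sample(nominees$nominee_id, size = 1, prob = weights)
}

nominators$nominee_id <- vapply(nominators$yvar2, pick_nominee, integer(1))
nominators$nominee_yvar1 <- nominees$yvar1[nominators$nominee_id]

# A positive correlation indicates the intended relation is present.
relation_strength <- cor(nominators$yvar2, nominators$nominee_yvar1)
```

Checking relation_strength after generating the data gives a quick test of whether the simulated nominations carry the relation the models are meant to recover.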

Changes to #28 (foundational skills draft)

Thank you @ivelasq, @kierisi, and @restrellado for your thoughts on #28. Based on the feedback, here are the things I think need to be done:

  1. The idea of having two tracks seems to have some traction: I like this and forgot that some of our readers will have experience using R - or at least may have used it a few times, attended a workshop, or been involved with a collaborator who used it. I think the following sounds great:
    • one can be 'installation and setup of R and R Studio', which covers installation, projects, and packages
    • the second can be 'data loading and manipulation using the tidyverse', which covers reading/saving files, pipes, selecting, filtering, etc.
  2. I concur that there are a lot of ways to load different data but no description of what to use when; I think we can and should address this.
  3. Finally, I think we should mention more about what purpose swirl or DataCamp may serve to our audience.

I'll leave this issue open for a little while (is one week okay?) and then will plan to make these changes.

error in multilevel models part 1: missing column name

for now I've set the code chunk to ignore the error when compiling using

Quitting from lines 177-184 (06-walkthrough-1.Rmd) 
Error in eval_tidy(enquo(var), var_env) : 
  object 'CourseSectionOrigId' not found
Calls: local ... separate -> separate.data.frame -> <Anonymous> -> eval_tidy
In addition: Warning message:
funs() is soft deprecated as of dplyr 0.8.0
please use list() instead
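
The funs() warning in the log above is separate from the column-name error and can be addressed on its own. As a rough sketch, with made-up data rather than the walkthrough's, the replacement the warning suggests looks like this:

```r
library(dplyr)

# Hypothetical data for illustration only.
scores <- data.frame(pre = c(2, 4, 6), post = c(3, 5, 10))

# Soft-deprecated since dplyr 0.8.0:
#   summarise_all(scores, funs(mean, max))
# Replacement suggested by the warning: pass a named list of functions.
summary_list <- summarise_all(scores, list(mean = mean, max = max))

# In dplyr 1.0.0 and later, across() expresses the same summary.
summary_across <- summarise(
  scores,
  across(everything(), list(mean = mean, max = max))
)
```

Both forms produce one column per variable and function, named like pre_mean and post_max.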
