2015 NCME Training Session

Leveraging Open Source Software and Tools for Statistics/Measurement Research

Join the chat at https://gitter.im/CenterForAssessment/NCME_Training_Session_2015

Welcome to the GitHub repository for the 2015 NCME Training Session: Leveraging Open Source Software and Tools for Statistics/Measurement Research. This version controlled repository contains or links to all the resources associated with the training session. During this training session you'll be introduced to ideas and technologies associated with open science and reproducible research.

Traditional work-flows associated with statistics/measurement research currently run counter to many of the principles associated with open science and reproducible research. In this training session, we introduce participants to some widely used open source tools that overcome many of the limitations of the closed, traditional statistics/measurement research work-flow. Modern development tools and practices can be utilized as part of statistics/measurement research.

As the title of the training session suggests, participants will need a laptop computer with some open source programs installed to participate in the training session. The information that follows introduces the software/tools and provides information on how to install them. The programs/tools we'll use include GitHub, RStudio, and pandoc, each described below.

One of the most difficult parts of using modern software/tools for this type of research is just getting your work environment (i.e., your laptop) set up with the tools. But once set up, the benefits are huge. Welcome to the bleeding edge. 😄

GitHub

Version control of content is fundamental to reproducible results. Software developers have used version control for decades as a way of collaborating and managing the chaos that ensues when multiple programmers work on the same codebase. Git is a modern distributed version control system created by Linus Torvalds, the creator of the Linux operating system. GitHub is a web-based hosting service for Git repositories, with many other bells and whistles, that is extremely popular for open source development. Netflix uses GitHub for its source code development. Beyond source code development, the whole idea of “version control” has been applied to German law, where laws are hosted on GitHub in a version controlled fashion so people can examine the law and its development.

For this training session, we'll be using GitHub for version control. You'll need to do two things:

  • Create a GitHub account. Public repositories are free.
  • Install a GitHub client on your computer (Mac, Windows, or Linux). Git is a command line application, but it's easier to start with a GUI client.

RStudio

R is an open source statistical analysis/computing environment. R's rapid growth in use places it among the most popular statistical analysis software. Much of R's popularity stems from it being open source (and free!) together with its extensible nature. As a programming language, R has become prominent in recent years as a tool for performing high level analytics, and it has begun borrowing many tools from the programming world. One of the most prominent among them is RStudio, a free and open source integrated development environment (IDE). An integrated development environment is a tool designed specifically for developing software in a computer language. In the case of RStudio, the environment is specifically designed for developing "data products" using R.

For this training session, we'll be using RStudio for creating open and re-usable code for statistical analysis. You'll need to install R and RStudio on your computer.

pandoc

pandoc is a universal document/format converter that can utilize LaTeX for some of its output formats (e.g., PDF). Open research in the 21st century requires distribution of results across multiple media and platforms. "Create once and distribute everywhere" is the dream of content creators. Making this dream a reality requires conversion from a base content document to multiple other formats. pandoc allows for such conversions and is used extensively.

For this training session, we'll be using pandoc for export to multiple formats. pandoc itself requires LaTeX for some conversions. You'll need to:

  • Install pandoc for your operating system.
  • Using the instructions provided on the pandoc page, install LaTeX for your operating system.
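Once pandoc is installed, the "create once and distribute everywhere" idea can be sketched as follows. This is a minimal illustration, not the session's actual template: the file name `report.md` and its contents are hypothetical, and the conversions are skipped if pandoc is not on your path.

```shell
#!/bin/sh
set -eu

# Hypothetical source document, created in a scratch directory.
out=$(mktemp -d)
cat > "$out/report.md" <<'EOF'
---
title: Example Report
---

# Summary

One source document, several output formats.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Standalone HTML page.
  pandoc --standalone "$out/report.md" -o "$out/report.html"
  # Word document.
  pandoc "$out/report.md" -o "$out/report.docx"
  # PDF output is the conversion that needs a LaTeX installation:
  # pandoc "$out/report.md" -o "$out/report.pdf"
else
  echo "pandoc not found; install it first" >&2
fi

# List whatever was produced.
ls "$out"
```

pandoc infers the output format from the file extension given to `-o`, so the same source file drives every target format.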

Training Session Schedule

8:00 to 9:15 Overview of open research/dissemination and the tools/platforms that support it. The first hour will be a presentation/overview, and in the last 15 minutes we’ll confirm everyone is set up with the appropriate software on their computers. Advance instructions will be sent out regarding software needed for the training session.

9:15 to 9:30 Break

9:30 to 10:30 Introduction to GitHub and version control. This part of the training session will introduce users to version control via GitHub. Users will learn how to fork a repository, modify that fork, and make and accept pull requests.
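The Git side of the fork, modify, and pull request cycle can be sketched on the command line. This is an illustrative sketch under stated assumptions: a local bare repository stands in for your fork on GitHub so the commands run offline, and the branch name, file name, and user details are hypothetical. Forking itself and opening or accepting a pull request happen on the GitHub website.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox: a local bare repository stands in for your fork on GitHub.
work=$(mktemp -d)
git init --quiet --bare "$work/fork.git"

# Clone the "fork" to get a working copy, as you would with your fork's URL.
git clone --quiet "$work/fork.git" "$work/clone"
cd "$work/clone"
git config user.name "Workshop Participant"
git config user.email "participant@example.com"

# Modify the working copy on a topic branch and record the change.
git checkout --quiet -b add-notes
printf 'Notes from the version control session.\n' > notes.md
git add notes.md
git commit --quiet -m "Add session notes"

# Publish the branch to the fork; a pull request is then opened from it on GitHub.
git push --quiet origin add-notes

# Show what the fork now contains.
git --git-dir="$work/fork.git" log --oneline add-notes
```

Working on a topic branch rather than the default branch keeps each pull request limited to one set of related changes.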

10:30 to 12:00 An application of GitHub to document production. Document production and dissemination in the 21st century requires that the document be available to users in multiple formats. Users access documents using multiple media (e.g., paper and digital) and on multiple devices (e.g., laptop, phone, e-reader), so modern document production requires the flexibility to create content once and distribute it everywhere. This part of the training session will introduce users to templates that allow them to realize the "write once, publish everywhere" goal.

12:00 to 1:00 Lunch

1:00 to 2:00 Introduction to GitHub Wikis and GitHub Pages. GitHub includes two great options for helping users understand the nature of the project you're hosting in your repository. GitHub Wikis is a Markdown-based wiki that allows for collaborative wiki building around a project. GitHub Pages is the static HTML website component available to all GitHub repositories. With a basic distributed version control repository as its foundation, GitHub Pages allows projects to produce associated websites that can include highly sophisticated components including interactive graphics, blogs, and vector math fonts.

2:00 to 3:00 Introduction to R and its system of user created packages. R is an open source environment for statistical analysis/computing whose use has skyrocketed over the last decade. A major part of its success is its extensible nature: users can create their own packages that extend the functionality of R. These packages can be distributed via R's CRAN-based package installation or by using GitHub as the package repository. One is standing on the shoulders of giants!
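The two distribution routes can be sketched as follows. This is a hedged illustration rather than the session's own instructions: the CRAN mirror URL, the use of the devtools package for GitHub installs, and the repository name `CenterForAssessment/SGP` are assumptions. Because installation needs R and a network connection, the script only prints the R commands unless the hypothetical `RUN_INSTALL` variable is set to 1.

```shell
#!/bin/sh
set -eu

# Write the R commands to a script file so they can be reviewed before running.
script=$(mktemp)
cat > "$script" <<'EOF'
# Install a released package from CRAN (the mirror URL is an assumption).
install.packages("SGP", repos = "https://cloud.r-project.org")

# Install a development version from a GitHub repository.
# Assumes the devtools package; the repository name is an assumption.
install.packages("devtools", repos = "https://cloud.r-project.org")
devtools::install_github("CenterForAssessment/SGP")
EOF

# Installation needs R and a network connection, so it only runs on request.
if [ "${RUN_INSTALL:-0}" = "1" ] && command -v Rscript >/dev/null 2>&1; then
  Rscript "$script"
else
  cat "$script"
fi
```

CRAN typically carries the released version of a package, while a GitHub repository may hold a newer development version.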

3:00 to 3:15 Break

3:15 to 4:30 Introduction to creating a source code repository and package on GitHub. Reproducible and open research allows those using the research to reproduce the results. Version control repositories allow one to provide the source code used and, if using software like R, package that source code in a fashion allowing others to utilize it.

4:30 to 5:00 For the final half hour, we will discuss how open source software tools can be combined to create an open source project that can be used as the basis for multi-state/national analytics, reporting and visualization work. The presentation will be based upon the SGP package which is used by over 25 states for large scale state growth analyses. As part of the project, state project websites are set up on GitHub for hosting analyses and documentation associated with work performed so that it is reproducible using cloud based computing resources like AWS/EC2.
