genetic-data-analysis-1's Introduction

Analysis of Genetic Data 1: Inferring Population Structure

During this short workshop, we will apply simple numerical techniques such as principal components analysis (PCA) to investigate human genetic diversity and population structure in large-scale genetic data sets. We will look at how large genetic data sets are commonly represented in computer files, and we will use popular command-line tools such as PLINK to prepare the "raw" genetic data for analysis. We will then use R to compute principal components from the genetic data and visualize the results. The workshop is mainly intended to develop practical computing skills for researchers working with genetic data; concepts such as "genotype" and "allele frequency" will not be explained. This is a hands-on workshop with "live coding" throughout, so please bring your laptop!

Attendees will: (1) work through the steps of a basic population structure analysis in human genetics, starting with the "raw" source data and ending with a visualization of population structure estimated from the genetic data; (2) understand how large genetic data sets are commonly represented in computer files; (3) use command-line tools (e.g., PLINK) to manipulate genetic data; (4) use R to compute principal components and visualize the results of PCA; (5) learn through "live coding."
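As a rough illustration of what computing principal components from genotype data involves, the sketch below finds first-principal-component scores for a tiny, made-up genotype matrix. The workshop itself uses R and PLINK; Python is used here only to keep the example self-contained. The function names (`center_columns`, `first_pc_scores`), the toy genotypes, and the choice of power iteration are all assumptions made for illustration, not the workshop's code.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's (column's) mean from its genotype counts."""
    n = len(genotypes)
    p = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200, seed=1):
    """Return each sample's score on the first principal component.

    Runs power iteration on X^T X, where X is the column-centered
    genotype matrix (rows are samples, columns are SNPs).
    """
    x = center_columns(genotypes)
    n, p = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # random starting direction
    for _ in range(iterations):
        scores = [sum(x[i][j] * v[j] for j in range(p)) for i in range(n)]  # X v
        w = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return [0.0] * n  # no variation in the data
        v = [c / norm for c in w]
    return [sum(x[i][j] * v[j] for j in range(p)) for i in range(n)]


# Hypothetical toy data: 6 samples x 4 SNPs, coded as allele counts (0, 1, 2).
# The first three rows and the last three rows mimic two distinct populations.
toy_genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
]

scores = first_pc_scores(toy_genotypes)
for i, s in enumerate(scores):
    print(f"sample {i}: PC1 = {s:.3f}")
```

In this toy example the two groups of samples fall on opposite sides of zero along the first component. The overall sign of a principal component is arbitrary, so only the separation between groups is meaningful. For data sets of realistic size, a dedicated routine (for example, one of R's PCA functions) would be used instead of a hand-written loop.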

Prerequisites

This hands-on workshop assumes participants are already familiar with R and a UNIX-like shell environment. An RCC user account is recommended, but not required; guest access to the RCC cluster will be available in class for those without an account. All participants must bring a laptop running macOS, Linux, or Windows on which they have administrative privileges.

Notes on data files

  • 1kg.pop describes the population labels used in the 1000 Genomes data. This information comes from Supplementary Table 1 of the 2015 1000 Genomes paper (Nature, 2015, doi:10.1038/nature15393).

  • omni_samples.20141118.panel was downloaded from this FTP location: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip

  • 20140625_related_individuals.txt was downloaded from this FTP location: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. This file gives information about the 31 genotype samples that were found to be closely related. The columns in the file from left to right are: (1) sample; (2) population; (3) gender; and (4) reason for exclusion.
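The four-column layout described for 20140625_related_individuals.txt suggests a simple way to exclude related samples before analysis. The sketch below is a minimal illustration in Python, assuming the file is tab-delimited. Whether the real file has a header row is not stated here, so a `skip_header` flag is included. The sample IDs, population codes, and exclusion reasons are made up, and the helper names (`read_related_samples`, `drop_related`) are hypothetical.

```python
import csv
import io

# Made-up rows in the assumed layout:
# sample, population, gender, reason for exclusion (tab-delimited).
example_text = (
    "SAMPLE_A\tPOP1\tfemale\texample reason 1\n"
    "SAMPLE_C\tPOP2\tmale\texample reason 2\n"
)


def read_related_samples(handle, skip_header=False):
    """Collect the sample IDs (first column) from a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)  # discard the header row if there is one
    related = set()
    for row in reader:
        if row and row[0].strip():  # ignore blank lines
            related.add(row[0].strip())
    return related


def drop_related(sample_ids, related):
    """Keep only samples that are not listed as related, preserving order."""
    return [s for s in sample_ids if s not in related]


related = read_related_samples(io.StringIO(example_text))
all_samples = ["SAMPLE_A", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D"]
unrelated = drop_related(all_samples, related)
print(unrelated)
```

To read the real file, the `io.StringIO` object would be replaced by an opened file handle, for example `open("20140625_related_individuals.txt", newline="")`.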

Other information

Credits

These materials were developed by Peter Carbonetto at the University of Chicago. Thank you to Matthew Stephens for his support and guidance.
