
pandasintro

Workshop for CBioVikings

Title: Interactive Data Analysis in Python with Pandas using Jupyter Notebook

Presented by: David Lyon, Researcher @ Novo Nordisk Fonden Center for Protein Research, University of Copenhagen

email: [email protected]

Introduction

Data comes in many forms, shapes, and flavors. As tasty and free-spirited as this may sound, the diligent data analyst often spends most of her or his time preparing and wrangling the data itself rather than running or coding a particular model or statistical test. This is where Python and Pandas come into play, providing high-level, flexible, and efficient tools for manipulating your data as needed.
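As a minimal sketch of the kind of wrangling meant here, assuming a small made-up table of replicate measurements (the column names `sample`, `rep1`, `rep2` and `intensity` are hypothetical), the following reshapes "wide" data into "long" format with `melt`, drops missing values, and summarises per sample:

```python
import numpy as np
import pandas as pd

# Hypothetical "wide" table: one row per sample, one column per replicate.
raw = pd.DataFrame({
    "sample": ["A", "B", "C"],
    "rep1": [1.0, 2.5, np.nan],
    "rep2": [1.2, np.nan, 3.1],
})

# Reshape wide -> long ("tidy"): one row per (sample, replicate) measurement.
tidy = raw.melt(id_vars="sample", var_name="replicate", value_name="intensity")

# Drop missing measurements, then compute the mean intensity per sample.
tidy = tidy.dropna(subset=["intensity"])
summary = tidy.groupby("sample")["intensity"].mean()
print(summary)
```

Reshaping to long format first means that later steps (filtering, grouping, plotting) can work on a single `intensity` column instead of one column per replicate.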

Program

CBioVikings will get a short introduction to Jupyter Notebook (formerly IPython Notebook), an interactive computational environment that combines code execution, rich text, mathematics, plots, and media. Then we'll delve right into data analysis using Pandas, a Python library providing easy-to-use data structures and data analysis tools.
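The two core Pandas data structures are the Series (a one-dimensional labelled array) and the DataFrame (a two-dimensional table of columns that share one index). The toy values below are made up; the sketch shows how arithmetic between Series aligns on index labels rather than on positions:

```python
import pandas as pd

# A Series is a one-dimensional array of values plus an index of labels.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Arithmetic aligns on index labels, not positions; labels present in
# only one of the Series produce NaN (missing value).
total = s1 + s2

# A DataFrame is a table whose columns are Series sharing a single index
# (here the union of both indexes: a, b, c, d).
df = pd.DataFrame({"s1": s1, "s2": s2})

print(total)
print(df)
```

Typed into a Jupyter cell, `df` on the last line would be rendered as a table without needing `print`.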

Structure

Introduction: 30-45 min
Break: 7.5 min
Exercises: 30-60 min

Prerequisites

This evening workshop is aimed at people with basic Python skills, but all levels are welcome and encouraged to attend. Please install the following software before the workshop and check that it runs (or at least download it before coming).

1.) Git https://git-scm.com/

2.) Python (2.x or 3.x), either on its own or bundled with other commonly used packages in a distribution such as Enthought Canopy or Anaconda: https://www.python.org/ https://www.enthought.com/canopy-subscriptions/ (Canopy Express is free and very easy to set up; recommended if you are new to Python/programming) https://www.continuum.io/downloads

The following Python packages can be installed using pip (a Python package manager) or downloaded from PyPI (the Python Package Index) or from their individual websites: https://pip.pypa.io/en/stable/installing/ https://pypi.python.org/

EASY INSTALLATION using pip: enter the following in the terminal to install several packages at once, including all their dependencies: "pip install ipython jupyter numpy pandas matplotlib seaborn". (N.B. If pip is not available, first run "easy_install pip". Depending on your installation, you might also need to add Python and pip to your PATH environment variable.)

3.) IPython and Jupyter http://jupyter.readthedocs.org/en/latest/install.html

4.) NumPy http://www.numpy.org/

5.) Pandas http://pandas.pydata.org/

Optional:

6.) Matplotlib http://matplotlib.org/

7.) xlrd http://www.python-excel.org/
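To check that the packages listed above are actually installed, a short script like the following can be run before the workshop. It is a minimal sketch, not part of the workshop material: `check_packages` is a hypothetical helper, and it assumes each package exposes a `__version__` attribute (falling back to "unknown" otherwise). It is written to run under both Python 2 and 3.

```python
import importlib


def check_packages(names):
    """Try to import each module; map its name to its version, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed, or not importable
        else:
            # Most scientific packages define __version__; fall back otherwise.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Import names can differ from pip names (e.g. "ipython" installs "IPython").
    # Jupyter itself is easiest to check by running "jupyter notebook" in a terminal.
    packages = ["IPython", "numpy", "pandas", "matplotlib", "seaborn", "xlrd"]
    report = check_packages(packages)
    for name in packages:
        version = report[name]
        status = version if version is not None else "NOT INSTALLED"
        print("%-12s %s" % (name, status))
```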

RESOURCES used for this workshop

Pandas website

http://pandas.pydata.org/

Very good (and long) tutorial.

https://github.com/fonnesbeck/statistical-analysis-python-tutorial

https://www.youtube.com/watch?v=DXPwSiRTxYY

Book by Wes McKinney

http://shop.oreilly.com/product/0636920023784.do

Pandas cheat sheet

https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf

Exercises

https://github.com/fonnesbeck/statistical-analysis-python-tutorial

https://github.com/guipsamora/pandas_exercises

https://github.com/ajcr/100-pandas-puzzles

http://gregreda.com/2013/10/26/working-with-pandas-dataframes/

http://pandas.pydata.org/

Jupyter

http://jupyter.org/

Exploratory Computing with Python

http://mbakker7.github.io/exploratory_computing_with_python/

Intro to pandas data structures

http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

PyCon 2017: Optimizing Pandas Code for Performance (talk by Sofia Heisler)

https://www.youtube.com/watch?v=HN5d490_KKk

https://github.com/sversh/pycon2017-optimizing-pandas
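A commonly recommended way to speed up Pandas code is to replace row-by-row `DataFrame.apply(..., axis=1)` with vectorized operations on whole columns. The sketch below illustrates that general idea on made-up random data. It is not taken from the talk: the function names and the Euclidean-norm calculation are hypothetical, and it assumes Python 3 and NumPy 1.17 or later (for `default_rng`).

```python
import time

import numpy as np
import pandas as pd

# Hypothetical toy data: 20,000 rows with two numeric columns.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"x": rng.random(20_000), "y": rng.random(20_000)})


def norm_row_wise(frame):
    # apply with axis=1 calls the Python lambda once per row: flexible but slow.
    return frame.apply(lambda row: np.sqrt(row["x"] ** 2 + row["y"] ** 2), axis=1)


def norm_vectorized(frame):
    # Whole-column arithmetic: the loop over rows runs inside NumPy's compiled code.
    return np.sqrt(frame["x"] ** 2 + frame["y"] ** 2)


for func in (norm_row_wise, norm_vectorized):
    start = time.perf_counter()
    func(df)
    elapsed = time.perf_counter() - start
    print("%-16s %.4f s" % (func.__name__, elapsed))

# Both versions compute the same values; only the speed differs.
print(np.allclose(norm_row_wise(df), norm_vectorized(df)))
```

Exact timings depend on the machine, but the vectorized version is typically faster by orders of magnitude.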

