## Data Science Resources
Data Science is a multidisciplinary field covering, at the very minimum, statistics, programming and machine learning.
![Data Science Pipeline](http://machinelearningmastery.com/wp-content/uploads/2014/05/Overview-of-the-Applied-Machine-Learning-Process.png)
- Intro to IPython - A curated collection of IPython Notebooks, great for introductory-level Python, programming, computer science, data science and other topics.
- How do I Become a Data Scientist? - Some more great starting points from William Chen.
- Coursera - Data Science Specialization at Coursera; many other courses are available as well.
- Udacity - Online MOOCs, including data science related courses.
- Data Science Bootcamps - A collection of all bootcamps on the market as of April 5, 2014, by Ikechukwu Okonkwo.
- Coursera Machine Learning Course - Andrew Ng's flagship Machine Learning course.
- edX - edX courses related to data science.
### Python
- Python @ Codecademy - If you have never used Python, right this way.
- The Python Wiki - Good resource with lots of info about Python.
- Python for Data Science Tutorial - Kaggle - Stepping into Data Science with Kaggle and installing some libraries.
- Introduction to Data Processing with Python - Just as the name says: some introductory level information and exercises.
- Git tutorial - Git for version control. Simple tutorial for Git from GitHub.
- Anyone Can Code - Languages, tutorials, cheat sheets, algorithms and data structures
There are many other languages that could be used in machine learning: Julia, R, Cython, Pig, Scala, Java, etc.
- [Bad data guide](https://github.com/Quartz/bad-data-guide)
- Algorithms & Data Structures - Binary trees, hash tables, linked lists, Big-O notation and more.
- Algorithm & Data Structures - Well-organized, detailed and digestible site full of content covering data structures, algorithms, recursion and assignments!
- Big O Notation - Detailed explanations and visuals of Big-O notation.
- Visualizations of Data Structures - Collection of different algorithms (graph problems) and data structures (queues, heaps, hashes), with step-by-step visualizations for a more intuitive understanding.
- Data Structures CheatSheet & Big Oh Notation
- Data Structures CheatSheet - Smaller, more readable.
- Coursera: Stanford Algorithms Design & Analysis - Course on algorithm design & analysis.
#### Statistics
Some primers on understanding statistics and other resources to get a deeper understanding.
- Statistics Without the Agonizing Pain - John Rauser's really great video on statistics; funny and engaging with a good message.
- [Think Stats 2](http://greenteapress.com/thinkstats2) - A statistics book that is a great read.
- Probabilistic Programming and Bayesian Methods for Hackers - Full book, all online through IPython notebooks.
- Probabilistic Programming and Bayesian Methods for Hackers - GitHub repo for the book above.
- Khan Academy: Statistics - Tons of videos to help learn statistics concepts.
- Statistical Distributions in iPython Notebook - Discrete, Bernoulli, Poisson, Binomial, Alpha, Beta, etc. The descriptions are mathematical; I will find another resource to explain them.
- [Advanced Data Analysis from an Elementary Point of View](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf) - Cosma Rohilla Shalizi - Veteran
- [An Introduction to R](http://cran.r-project.org/doc/manuals/R-intro.pdf) - W. N. Venables, D. M. Smith, and the R Core Team - Beginner
- [Analyzing Linguistic Data: a practical introduction to statistics](http://www.ualberta.ca/~baayen/publications/baayenCUPstats.pdf) - R. H. Baayen - Beginner
- [Applied Data Science](http://columbia-applied-data-science.github.io/appdatasci.pdf) - Ian Langmore and Daniel Krasner - Intermediate
- [Concepts and Applications of Inferential Statistics](http://vassarstats.net/textbook/) - Richard Lowry - Beginner
- [Forecasting: Principles and Practice](https://www.otexts.org/fpp/) - Rob J. Hyndman and George Athanasopoulos - Intermediate
- [Introduction to Probability](http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/pdf.html) - Charles M. Grinstead and J. Laurie Snell - Beginner
- [Introduction to Statistical Thought](http://www.math.umass.edu/~lavine/Book/book.pdf) - Michael Lavine - Beginner
- [OpenIntro Statistics - Second Edition](http://www.openintro.org/stat/textbook.php) - David M. Diez, Christopher D. Barr, and Mine Cetinkaya-Rundel - Beginner
- [simpleR - Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf) - John Verzani - Beginner
- [Statistics](http://upload.wikimedia.org/wikipedia/commons/8/82/Statistics.pdf) - Beginner
- [Think Stats: Probability and Statistics for Programmers](http://www.greenteapress.com/thinkstats/thinkstats.pdf) - Allen B. Downey - Beginner
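The Bernoulli, binomial and Poisson distributions covered in the Statistical Distributions notebook above all have short closed-form probability mass functions. As a minimal sketch using only the Python standard library (Python 3.8+ for `math.comb`; the function names are illustrative, not taken from any of the linked resources), they can be computed directly:

```python
import math


def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for one trial with success probability p; k must be 0 or 1."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent Bernoulli(p) trials."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k): k events in an interval with average rate lam."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


if __name__ == "__main__":
    # Exactly 3 heads in 10 fair coin flips: C(10, 3) / 2**10
    print(binomial_pmf(3, 10, 0.5))  # 0.1171875
    # Exactly 2 events when 4 are expected on average
    print(poisson_pmf(2, 4.0))
```

Libraries such as SciPy expose the same quantities (and many more distributions); the hand-written versions are only meant to make the formulas concrete.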
#### Stats/Engineering Libraries
A collection of workhorse libraries that are essential for any Python data scientist.
- Pandas - Wes McKinney's pandas library for EDA on small to medium-sized data sets, when you don't want to set up SQL infrastructure or when it isn't necessary. It has many other great applications beyond being a better fit than SQL for small to medium data sets.
- Numpy/Pandas/Scipy Cheatsheet - Self-explanatory.
- SciPy - Open-source software for mathematics, science and engineering.
- NumPy - Fundamental package for scientific computing with Python.
- StatsModels - Module that allows users to explore data, estimate statistical models and perform statistical tests.
- PyMC - Bayesian estimation, useful for Markov chain Monte Carlo analysis (among other things).
- [Theano](http://deeplearning.net/software/theano/) - Python library for defining and evaluating mathematical expressions, widely used for deep learning.
- [Keras](http://keras.io/) - Minimalist, highly modular neural network library on top of Theano.
#### Network Analysis
- [Introduction to Social Network Methods](http://faculty.ucr.edu/~hanneman/nettext/) - Robert A. Hanneman and Mark Riddle - Intermediate
- [Networks, Crowds, and Markets: Reasoning About a Highly Connected World](http://www.cs.cornell.edu/home/kleinber/networks-book/) - David Easley and Jon Kleinberg - Intermediate
- [Network Science](http://barabasilab.neu.edu/networksciencebook/downlPDF.html) - Sarah Morrison - Beginner
- [The Wealth of Networks](http://www.benkler.org/Benkler_Wealth_Of_Networks.pdf) - Yochai Benkler - Beginner
### Data Analysis
#### Fundamentals
- [Fundamental Numerical Methods and Data Analysis](http://ads.harvard.edu/books/1990fnmd.book/) - George W. Collins - Beginner
- [Introduction to Metadata](http://www.getty.edu/research/publications/electronic_publications/intrometadata/index.html) - Murtha Baca - Beginner
- [Introduction to R - Notes on R: A Programming Environment for Data Analysis and Graphics](http://cran.r-project.org/doc/manuals/R-intro.pdf) - W. N. Venables, D. M. Smith, and the R Core Team - Beginner
- [Modeling with Data: Tools and Techniques for Scientific Computing](http://modelingwithdata.org/about_the_book.html) - Ben Klemens - Beginner
### Data Science Introduction
- [Data Science: An Introduction](http://en.wikibooks.org/wiki/Data_Science:_An_Introduction) - Wikibook - Beginner
- [Disruptive Possibilities: How Big Data Changes Everything](http://www.amazon.com/Disruptive-Possibilities-Data-Changes-Everything-ebook/dp/B00CLH387W) - Jeffrey Needham - Beginner
- Introduction to Data Science - Jeffrey Stanton - Beginner
- [Real-Time Big Data Analytics: Emerging Architecture](http://www.amazon.com/Real-Time-Big-Data-Analytics-Architecture-ebook/dp/B00DO33RSW) - Mike Barlow - Beginner
- [The Evolution of Data Products](http://www.amazon.com/The-Evolution-Data-Products-ebook/dp/B005QEKQUY/ref=sr_1_63?s=digital-text&ie=UTF8&qid=1351898530&sr=1-63) - Mike Loukides - Beginner
- [The Promise and Peril of Big Data](http://www.aspeninstitute.org/sites/default/files/content/docs/pubs/The_Promise_and_Peril_of_Big_Data.pdf) - David Bollier - Beginner
### Data Processing
- [Data-Intensive Text Processing with MapReduce](http://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf) - Jimmy Lin and Chris Dyer - Intermediate
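MapReduce, the model the book above is built around, splits a job into a map step that emits key/value pairs and a reduce step that aggregates all values sharing a key, with a shuffle/sort phase in between. A minimal single-process sketch of the classic word-count example follows; it only illustrates the model (the shuffle is simulated with `sorted` plus `itertools.groupby`) and is not Hadoop or MRjob code:

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for the same word."""
    return word, sum(counts)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    # Map phase: apply the mapper to every input record.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort phase: bring all pairs with the same key together.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(map_reduce(docs))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster the map and reduce calls run on different machines and the framework performs the shuffle, but the mapper/reducer contract is the same idea.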
#### Data Acquisition
Libraries that are very helpful for abstracting away some of the complications of scraping or working with HTTP.
- BeautifulSoup - A Python library that makes web-scraping HTML easier.
- Requests - HTTP for Humans - A Python library that makes working with HTTP and APIs effortless.
#### Data Mining
- [Data Mining and Analysis: Fundamental Concepts and Algorithms](http://www2.dcc.ufmg.br/livros/miningalgorithms/files/pdf/dmafca.pdf) - Mohammed J. Zaki and Wagner Meira Jr. - Intermediate
- [Data Mining and Knowledge Discovery in Real Life Applications](http://www.intechopen.com/books/data_mining_and_knowledge_discovery_in_real_life_applications) - Julio Ponce and Adem Karahoca - Beginner
- [Data Mining for Social Network Data](http://link.springer.com/book/10.1007%2F978-1-4419-6287-4) - Springer - Veteran
- [Mining of Massive Datasets](http://infolab.stanford.edu/~ullman/mmds/book.pdf) - Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman - Intermediate
- [Knowledge-Oriented Applications in Data Mining](http://www.intechopen.com/books/knowledge-oriented-applications-in-data-mining) - Kimito Funatsu - Intermediate
- [New Fundamental Technologies in Data Mining](http://www.intechopen.com/books/new-fundamental-technologies-in-data-mining) - Kimito Funatsu - Intermediate
- [R and Data Mining: Examples and Case Studies](http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf) - Yanchang Zhao - Beginner
- [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) - Trevor Hastie, Robert Tibshirani, and Jerome Friedman - Intermediate
- [Theory and Applications for Advanced Text Mining](http://www.intechopen.com/books/theory-and-applications-for-advanced-text-mining) - Shigeaki Sakurai - Intermediate
#### Processing & Exploratory Data Analysis
A collection of documents explaining some of the ways to do processing & EDA.
- Unix for Processing - sed & awk for data processing.
- Pandas - Already mentioned; great for data processing: cleaning, filtering, removing NaNs, normalizing, scaling, replacing values, etc.
- SciKit Learn for Preprocessing - Documentation on scikit-learn's preprocessing methods.
- Regular Expressions - Regex explained.
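Normalizing and scaling, mentioned under Pandas and scikit-learn preprocessing above, usually means one of two transforms: standardization (subtract the mean, divide by the standard deviation) or min-max scaling onto [0, 1]. A dependency-free sketch of both on a single column follows. These hand-rolled functions are illustrative stand-ins that compute roughly what scikit-learn's `StandardScaler` and `MinMaxScaler` do per feature, not their actual implementation; the handling of constant columns is an assumption of this sketch:

```python
import statistics
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Z-score each value: (x - mean) / std, using the population std (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Same constant-column convention as above.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


if __name__ == "__main__":
    print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
    print(standardize([1.0, 2.0, 3.0, 4.0]))
```

In practice, fit the mean/std (or min/max) on the training data only and reuse those statistics on the test data, which is what the scikit-learn `fit`/`transform` split is for.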
### Databases/Frameworks
A collection of databases & frameworks that are helpful for data management and are industry standards.
- SQL - SQL database. I linked to Postgres since that is the version I use.
- Psycopg - Python <> Postgres. A PostgreSQL adapter for Python.
- SQL Cheat Sheet
- SQLZoo - Develop your skills.
- SQLSchool - Develop your skills.
- MongoDB - NoSQL database.
- PyMongo - Python Mongo Driver.
- MongoDB - cheatsheet - Cheat sheet for MongoDB.
- Apache Hive - Uses Hive Query Language (HQL), which is similar to SQL, for data at scale.
- Hive Cheatsheet - Self-explanatory.
- ElasticSearch - For scalable, fast text search/analysis.
- Neo4j - Leading graph database.
- Redis - Open-source key-value data structure server.
- Redshift - AWS petabyte-scale data warehouse solution.
- Hadoop - the definitive guide - Hadoop ecosystem.
- Spark - Lightning-fast cluster computing.
- MRjob - Run MapReduce jobs on Hadoop or AWS.
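Psycopg follows Python's DB-API 2.0 interface: open a connection, get a cursor, `execute` a query with parameters passed separately, then fetch rows. To keep the sketch runnable without a Postgres server, it uses the standard library's `sqlite3` module, which implements the same interface; with Psycopg the `connect` arguments differ and the placeholder style is `%s` instead of sqlite3's `?`. The `orders` table, its columns and the `top_customers` helper are hypothetical:

```python
import sqlite3
from typing import List, Tuple


def top_customers(conn: sqlite3.Connection, min_total: float) -> List[Tuple[str, float]]:
    """Return (customer, total) rows whose summed order amount is at least min_total."""
    cur = conn.cursor()
    # Parameters are passed separately, never formatted into the SQL string,
    # which protects against SQL injection.
    cur.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        HAVING SUM(amount) >= ?
        ORDER BY total DESC
        """,
        (min_total,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # In-memory database with a hypothetical orders table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ann", 30.0), ("bob", 5.0), ("ann", 20.0), ("cy", 15.0)],
    )
    conn.commit()
    print(top_customers(conn, 10.0))  # [('ann', 50.0), ('cy', 15.0)]
    conn.close()
```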
### Machine Learning
There is a lot of information available online about the theory, mathematical intuition and tuning for this discipline. Here are some tools that are currently available.
- A visual introduction to Machine Learning - Awesome d3 visualization to help understand machine learning.
- SciKit-Learn - Simple and efficient machine learning tools for data mining and data analysis.
- NLTK - Natural Language Toolkit for working with human language data.
- Tour of Machine Learning Algorithms - Blog post about some of the high-level ML methods.
- VIDEO - How to get started with ML - Melanie Warrick @ PyCon 2014.
- Some ML methods classified - Classification for some sample ML algorithms by Melanie Warrick.
- SciKit-image - Algorithms for image processing.
- Machine Learning CheatSheet - I would actually say this is more than just a cheat sheet given that there are > 100 pages of notes.
- Awesome Machine Learning - List of machine learning libraries in all languages and also Kaggle competition source code by Joseph Misiti.
- [A Course in Machine Learning](http://ciml.info/) - Hal Daume - Beginner
- [A First Encounter with Machine Learning](https://www.ics.uci.edu/~welling/teaching/273ASpring10/IntroMLBook.pdf) - Max Welling - Beginner
- [Bayesian Reasoning and Machine Learning](http://web4.cs.ucl.ac.uk/staff/D.Barber/textbook/031013.pdf) - David Barber - Veteran
- [Gaussian Processes for Machine Learning](http://www.gaussianprocess.org/gpml/chapters/) - Carl Edward Rasmussen and Christopher K. I. Williams - Veteran
- [Introduction to Machine Learning](http://alex.smola.org/drafts/thebook.pdf) - Alex Smola and S.V.N. Vishwanathan - Intermediate
- [Probabilistic Programming & Bayesian Methods for Hackers](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/) - Cam Davidson-Pilon (main author) - Intermediate
- [The LION Way: Machine Learning plus Intelligent Optimization](http://www.lionsolver.com/LIONbook/) - Roberto Battiti and Mauro Brunato - Intermediate
- [Think Bayes](http://www.greenteapress.com/thinkbayes/) - Allen B. Downey - Beginner
- [Sklearn Basics](http://nbviewer.ipython.org/github/jakevdp/sklearn_scipy2013/tree/master/notebooks/) - Beginner
### Machine Learning Theory
- MathematicalMonk ML videos - Amazingly concise and digestible videos detailing how different machine learning algorithms function (e.g. logistic regression, SVMs, kNN, Bayes, etc.)
- Logistic Regression Explained - Detailed explanation of how logistic regression works.
- Video explaining how Random Forests Algorithm works - Random Forests Algorithm explained.
- Random Forest Explained - Write up about Random Forest in layman's terms.
- Machine Learning 101 - Large set of ML resources for beginners.
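To make the logistic regression write-ups above concrete: the model passes a linear score through the sigmoid, σ(z) = 1 / (1 + e^(-z)), and the weights are fit by minimizing log loss, whose gradient with respect to the score is simply (prediction - label). A minimal batch gradient-descent sketch for a single feature follows. The learning rate and epoch count are arbitrary illustrative choices; real libraries such as scikit-learn use more sophisticated solvers and regularization:

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: List[float], ys: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Derivative of the log loss with respect to the linear score.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w: float, b: float, x: float) -> int:
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


if __name__ == "__main__":
    xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
    ys = [0, 0, 0, 1, 1, 1]
    w, b = fit_logistic(xs, ys)
    print(w, b, [predict(w, b, x) for x in xs])
```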
### Deep Learning
Deep learning is getting a lot of media traction. Get your feet wet with some of these resources:
- HackerNews for Deep Learning - As the name says, a Hacker News for deep learning.
- Deeplearning4j - Deep Learning in Java.
- Neural Networks Explained - Video - High-level and intuitive explanation of how neural networks (deep learning) work.
- Deep Learning Tutorial
- What is Deep Learning
- Free Online Deep Learning Book - In-depth book about NN & deep learning.
- The Brain vs Deep Learning - Blog Post
- Deep Learning Summer School 2015 Git repo
### Data Science Application
#### Information Retrieval
- [Introduction to Information Retrieval](http://nlp.stanford.edu/IR-book/) - Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze - Intermediate
#### Time-Series
#### Model Selection
Resources about how to decide on your model.
- SciKit Learn Flow Chart for Model Selection - A helpful starting point for selecting scikit-learn algorithms.
#### Model Evaluation
Resources to help with understanding model evaluation.
- Evaluating ML Algorithms - Blog post from MachineLearningMastery about how to evaluate your model's performance.
- Cross-Validation - Critical concept to evaluate the performance of your models.
- K-fold & Grid Search in Scikitlearn - Demo on how to implement k-fold cross-validation and grid search using scikit-learn.
- Scikit-learn Cross Validation doc - Self-explanatory title.
- Cross Validation - how to select your final Kaggle Model - The importance of cross-validation, described specifically in terms of how it affects Kaggle competition scores.
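The core of k-fold cross-validation, as described in the resources above: split the samples into k folds, train on k-1 of them, evaluate on the held-out fold, and average the k scores. A minimal sketch of the index bookkeeping in plain Python follows; scikit-learn's `KFold` does this for you, and `train_and_score` is a hypothetical callback standing in for whatever model you fit and score:

```python
import random
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n: int, k: int, shuffle: bool = True,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds over n samples."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The first n % k folds get one extra sample, so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def mean_cv_score(n: int, k: int,
                  train_and_score: Callable[[List[int], List[int]], float]) -> float:
    """Average the score returned by a (hypothetical) train_and_score callback over k folds."""
    scores = [train_and_score(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)
```

For example, `mean_cv_score(10, 5, lambda train, test: len(test))` returns 2.0, since each of the five test folds holds two samples. Every sample lands in exactly one test fold, which is what lets the averaged score use all of the data for evaluation.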
#### Feature Engineering
A critical element of data science for improving performance, yet one that is rarely talked about.
- IPython Notebook for Feature engineering - Some discussion about feature engineering.
- CS Princeton Course - Course content on Feature Engineering.
- Blog Post about Feature Engineering / Data Exploration - Blog post about topic.
Resources on other topics that are very helpful for data scientists and product teams.
- A/B Testing - Blog about A/B testing.
- A/B Testing - And how you are screwing it up.
- Bloom Filters - Python notebook about bloom filters.
- Bloom filters - Bloom Filters.
- Reservoir Sampling - A primer on Reservoir Sampling.
- Reservoir Sampling Again
- Monte Carlo for the Monty Hall Problem - Hyon Chu gives a good explanation of Monte Carlo simulation for the Monty Hall Problem.
- Markov Chain Monte Carlo - Opening the black box of MCMC.
- Multithreading and Queues - How to use multithreading and queues.
- Basics of Multithreading and queues - More about multithreading.
- Multithreading & Queuing - Another great resource for multithreading & queuing.
- Building a Recommender System - Quora answer to this question. Helpful starting point.
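Reservoir sampling, linked above, draws a uniform random sample of k items from a stream of unknown length in a single pass with O(k) memory. A short sketch of the classic variant usually called Algorithm R follows (the function name and the optional seeded `rng` parameter are illustrative choices):

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items chosen uniformly at random from stream (all items if fewer)."""
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by overwriting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends: i + 1 possible values
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 10, random.Random(42))` returns 10 distinct values without ever holding the full range in memory. Each item ends up in the sample with probability k/n, where n is the final stream length.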
Collection of the best libraries that I know for easy and powerful data visualizations.
- ggplot - ggplot for python ported by the team at yhat.
- matplotlib - Awesome plotting library for python.
- d3 - Mike Bostock's viz library; the de facto gold standard for polished visualization. Written in JavaScript, with a steep learning curve but beautiful outcomes.
- bokeh - Interactive visualization library.
- d3py - Another library for data viz.
- vincent - Help with python for d3.
- seaborn - Clean statistical data visualization library.
- [Interactive Data Visualization for the Web](http://chimera.labs.oreilly.com/books/1230000000345/index.html) - Scott Murray - Beginner
- [Plotting and Visualization in Python](http://nbviewer.ipython.org/urls/gist.github.com/fonnesbeck/5850463/raw/a29d9ffb863bfab09ff6c1fc853e1d5bf69fe3e4/3.+Plotting+and+Visualization.ipynb) - Beginner
Other available Visualization Resources.
- Scott Murray's D3 Tutorials - Tutorials from Interactive Data Visualization for the Web.
- tributary.io - live code visualization platform designed specifically for D3.js
- plot.ly - A web visualization and data processing platform
- blockspring - Share code and visualizations through a single platform
- dot.append - Ian Johnson (enjalot) goes through several live-coding examples using D3
- Text Visualization Plots - Interactive site with different types of text visualization for different problems.
The importance of design theory in data visualization, storytelling and presentations cannot be overstated. Poor design can take great content and make it confusing or virtually unusable, while good design can make content sing and connect with the audience. With a better understanding of design theory and UI principles, a data scientist (or anyone) can convey information more clearly to the intended audience and give their content a strong story.
- Slidedeck on Data Storytelling & Visualization - Overview of different story structures and how to tell a story with data.
- Accelerating Understanding Through Data Visualization - Accenture White paper on Data Visualization
Collection of IPython notebooks that are helpful as examples of using tools or as explanations of certain topics.
- Pandas Tutorial - Basic intro to Pandas in notebook form.
- Pandas / Stats Tutorial - Intermediate tutorial by Christopher Fonnesbeck Feb 2014.
- Scipy Tutorial - Basic Scipy Tutorial.
- Numpy Tutorial - Basic Numpy Tutorial.
- Multiple Regressions using Statsmodels - Using statsmodels for regression.
- Intro to PyMC - Intro to PyMC.
- More on PyMC - More PyMC.
- Kaggle Titanic Comp Tutorial - Kaggle Titanic Tutorial using RandomForests.
- Psycopg2 tutorial in Python - How to use Psycopg2.
- SQL in iPython - SQL in Python.
- Mongo in Python - Mongo in Python.
- Beautiful Soup Tutorial - Beautiful Soup!
- Sci-Kit Learn Basics - Machine Learning Basics with scikit-learn.
- MatPlotLib - Some of the possibilities of data-viz with MatPlotLib.
- Choosing the right priors - Bayesian - Bayesian statistics and prior selection.
- Some Basic Data Analysis in Python - Basic data analysis with python.
- Crash Course in Python for Scientists - Ipython Notebook for Scientists!
- Regular Expressions - Regex to match patterns in strings - very powerful.
- MapReduce - Classes, inheritance and map-reduce exercises.
- Recursion Notebook - Visualizing recursion, "The single most powerful idea in algorithms".
- Recursion - More about recursion and functional programming.
- Hash Table, Bloom Filter, HyperLogLog - Explaining and demoing some of these concepts.
- Hash tables, Binary Trees
- Time Series - ARIMA & ARMA
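The Hash Table, Bloom Filter, HyperLogLog notebook above covers the idea behind Bloom filters: a bit array plus several hash functions. Adding an item sets k bits; a query answers "possibly present" only if all k bits are set, so the filter can return false positives but never false negatives. A small sketch follows that derives the k bit positions from one SHA-256 digest via double hashing; the default sizes and the hashing scheme are illustrative assumptions, not taken from the linked notebooks:

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set membership: false positives possible, no false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, all zeros

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # An odd stride keeps the k positions distinct when num_bits is a power of two.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter()
    for word in ["apple", "banana", "cherry"]:
        bf.add(word)
    print("apple" in bf)  # True
    print("durian" in bf)  # almost certainly False, but a false positive is possible
```

The false-positive rate grows as more items are added; for n items, m bits and k hash functions it is approximately (1 - e^(-kn/m))^k.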
Collection of sites to access data if you want to build out a project or just use some of the tools for EDA.
- Data.Gov - The US government portal to open data.
- California Water Resources - California's water resource data.
- Data for Cool DS projects
- Academic Torrents - Sharing Data is hard, torrents make it easier for academics.
- Data Basin - Science based mapping and analytics platform.
- Open Energy Data Initiative - Over 800 data sets covering energy issues.
- UCI Machine Learning Datasets - Data for machine learning - lots of labeled data and description of the problem types.
- London Data Store - Lots of datasets on London, UK
- Stanford Large Network Dataset Collection - The SNAP library has been actively developed since 2004; the collection contains various large social and information networks.
This section aims to keep track of developing trends and new tech that are helpful for the practicing Data Scientist. "New" might be a misnomer.
- BigML - machine learning for the everyday user, also useful for EDA.
- GraphLab - graph-based, high performance, distributed computation framework. They recently added deep learning to their platform.
- ModeAnalytics - platform to share analysis/data science.
- Apache Mahout - Scalable machine learning library. Not in python.
- Apache Hadoop - Open-source software for reliable, scalable, distributed computing. Not really new (10 years old at this point)
- Spinning up EC2 instances - Drew Conway's scripts to easily spin up AWS EC2 instances.
### Product Metrics
Understanding product, user behavior, and product metrics is helpful for data scientists in industry. Being able to help your product manager and team execute on strategies, by understanding the problem, the metrics and what they already understand, leads to a more fruitful relationship.
- Actionable Metrics - Funnel reports, cohort analysis, actionable metrics.
- Analytics for Product Managers - Everything a PM needs to know about analytics - or the minimum amount your PM should know about analytics as a Data Scientist.
- Startups, you are doing data science wrong! - High level explanation about how to use data science in a start-up company.
- Product Psychology - Understanding user behavior.
- Understanding Cohort Analysis - Blog about cohort analysis, conversions, customer lifetime value, etc. Great starting point understanding product metrics.
- Tech Product Management - More product focused than data science, but can provide a good perspective on product management.
- Mind The Product - Another solid PM blog.
There are some very innovative new companies producing very effective tools that minimize and abstract away inefficient processes at companies. While they aren't strictly data science related, these products can be very helpful to integrate with your teams to improve overall productivity.
- Aha! - Clean product roadmapping software for PMs.
- Slack - Amazing team communication tool - abstracting away unnecessary e-mails.
- Harvest - Effortless time tracking for business.
- Trello - Helping organize everything - great for project management.
- Zapier - Bringing together Harvest + Slack + Trello and a lot more...
- Thoughtbot Playbook - A detailed account of how thoughtbot runs its software consulting company, covering guiding principles, design sprints and code reviews through to sales and operations. A content-packed post.
- IFTTT - 'Putting the internet to work for you'. Great for small companies to automate social media, marketing or to have your own personal recipes set up.
- Github - Clearly a great product - 'Build software better, together'.
- Web Analytics & Reporting Software:
    - Google Analytics - In-depth real-time analytics.
    - Mixpanel - Provides real-time analytics and solid cohort analysis.
    - Clicky - Prides itself on ease of use.
- Evernote - Great for keeping notes
Source control and keeping accurate documentation so that you and your colleagues can follow and reproduce your work is very important. I will add some best coding practices & data science practices.
- Python Code Style - Allows for better understanding for everyone involved on the project.
- Slide Deck for BMPs - Slide deck about best practices for coding or the repo.
- Engineering Practices in Data Science - A blog post about the lack of source control in data science. It's a challenging topic; I believe Mode Analytics is trying to solve it.
- Data Science @ Google - Quora answer about Data Science career trajectory @ google.
Not all Data Scientists are the same and it's critical for organizations to understand what it is they need, and how best to fill those roles and/or complement the skills of their team. Finding the organizational structure that enables the data scientists/data engineers within the organization and generates better results is also crucial. It should be given thorough consideration.
- Kinds of Data Scientists - O'Reilly's classification of 4 different types of data scientists.
- Data Science For Startups - Which of the Five Types of DS does your startup need? A different classification from O'Reilly's.
- Building Data Science Teams - Post from 2011 about how to build data science teams.
- Data Science Team Building - The Power of Collaborative Analytics - Post about different team org structures and the difference between DS & BI.
Data Science has so many different applications and use cases within industry, and new ones are continuously being discovered. These resources provide some potential ideas.
- Kaggle Data Science Use Cases - Helpful to generate ideas for new uses in different industries
- Data Science for each Industry - Description of uses for different industries.
- Big Data Analytics News - use Cases - For Big Data but that's almost synonymous with Data Science.
More resources for community based information or hard copy books.
- Data Science Handbook - Not yet released but should be interesting providing stories from academia and industry about data science - go read the post for a better description!
- CrossValidated - A question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.
- StackOverflow - Language-independent collaboratively edited question and answer site for programmers.
- Kaggle - Model building competition and great resources for training and data.
- O'Reilly Media - A lot of content rich books available and tutorials on using the tools.
- Quora - Question and answer site - lots of data science content and career content.
- Data Science @ StackExchange - Still in beta.
### MOOCs about Data Science
- [Data Mining with Weka](http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/) - Ian H. Witten - Intermediate
- [Introduction to Data Science](https://class.coursera.org/datasci-001/class) - Bill Howe (Coursera) - Beginner
- [Introduction to Hadoop and MapReduce](https://www.udacity.com/course/ud617) - Udacity - Beginner
- [Machine Learning](https://class.coursera.org/ml-003/class) - Andrew Ng (Coursera) - Beginner
- [Machine Learning Foundations (taught in Chinese)](https://class.coursera.org/ntumlone-001) - Hsuan-Tien Lin - Beginner
- [Machine Learning Video Library](http://work.caltech.edu/library/#!?goback=.gde_35222_member_5810981726511443971) - Yaser Abu-Mostafa - Intermediate
- [Natural Language Processing](https://class.coursera.org/nlp/lecture/preview) - Dan Jurafsky and Christopher Manning (Coursera) - Intermediate
- [Social and Economic Networks: Models and Analysis](https://class.coursera.org/networksonline-001/class) - Matthew O. Jackson (Coursera) - Intermediate
- [Social Network Analysis](https://class.coursera.org/sna-003/class) - Lada Adamic (Coursera) - Intermediate
- Data Stories @ Quora - William Chen's (DS@Quora) blog about data science.
- FastML
- FiveThirtyEight Blog - Nate Silver's blog.
- Data Science Handbook - Data Science Handbook Project (not quite a blog but it fits here).
- Simply Statistics Blog
- All The Things Tech
- Musings in Data Science
- Zipfian Data Science Blog - Zipfian Academy DS Blog.
- Machine Learning Mastery
- DataTau - Hackernews for Data Science.
- HackerNews
- Quora - Q&A site with lots of information about Data Science.
- ThreeStoryBlog - Design blog
- Strata - Conference and a lot of videos from previous conferences - great resource.
- GraphLab - Another great conference.
- PyData
- Strata Collection of Presentations - Most of their conference presentations available online.
- KDD Keynotes - collection of keynote presentations from the NYC conference
- All of PyData Conference Talks
- Lean Startup - A method to develop product and businesses.
- Agile Development - group of software development methods to optimize for self-organizational and cross-functional teams.
- Scrum - an iterative and incremental agile software development framework for managing product development.
- How to Start a Start-up - Series of lectures from successful entrepreneurs (e.g. Y Combinator, SV Angel, etc.) on how to start a startup.
## Open Source Data Science Resources
While the name might sound redundant, this section represents other sites or repos that have aggregated information covering similar topics. Tons of great content on these sites; definitely go check them out.
There are some really great resources linked within this section covering all of Data Science, the entire data pipeline, machine-learning, statistics, python, etc. Go check them out.
- Open Data Science Masters - Clare Corthell's Open Source online blog/github with lots of resources available for data science.
- A Practical Intro to Data Science - Zipfian Academy's collection of excellent resources available.
- LearnDataScience - Nitin Borwankar's collection of IpythonNotebooks for Linear Regression, Logistic Regression, Random Forests, K-Means Clustering
- FreeDataScienceBooks - Yu Wu's free open sourced online data science books.
- Gallery of Ipython Notebooks - iPython's introduction to Python, Data Science, Economics, Comp Sci, Linguistics, and much more.
- Data Science 45 Min Intros - The team @ Gnip have a collection of repos to introduce data science topics in roughly 45 minutes per topic.
- Awesome Data Science - Collection of bloggers, twitter accounts, facebook accounts, MOOC's, datasets, tools.
- Awesome Big Data - Onur Akpolat's curated list of awesome big data frameworks, resources and papers.
- Mining the Social Web - Matthew Russell's repo related to his book, which focuses on working with Twitter, Facebook, etc.
- Harvard CS109 Github Repo
- Pete Warden's Data Science Toolkit - Collection of open data sets and open-source tools for data science, written in Ruby but with Python support.
- Course Materials for Data Science Specialization - Coursera course materials.
- iPython Cookbook Materials - Excellent resources for high performance scientific computing and data science in python.
- Markable - Lets me visualize Markdown.
- Markdown Cheatsheet - Self explanatory.
- LightPaper - Markdown editor that I use.
- iterm2 - Terminal application for Mac.
- Oh My Zsh - Framework for managing your ZSH config. Awesome.
### Uncategorized
- [Data Journalism Handbook](http://datajournalismhandbook.org/1.0/en/) - Jonathan Gray, Liliana Bounegru, and Lucy Chambers - Beginner
- [Building Data Science Teams](http://assets.en.oreilly.com/1/eventseries/23/Building-Data-Science-Teams.pdf) - DJ Patil - Beginner
- [Information Theory, Inference, and Learning Algorithms](http://www.inference.phy.cam.ac.uk/itprnn/book.html) - David MacKay - Intermediate
- [Mathematics for Computer Science](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/readings/MIT6_042JF10_notes.pdf) - Eric Lehman, Thomas Leighton, and Albert R. Meyer - Beginner
- [The Field Guide to Data Science](http://www.boozallen.com/media/file/The-Field-Guide-to-Data-Science.pdf) - Beginner
- [JuergenSchmidhuber links](https://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_am_j%C3%BCrgen_schmidhuber_ama/cp5c0py)
### Deep Learning
- [Framework comparison](https://github.com/zer0n/deepframeworks)
- [NN theory](http://cs224d.stanford.edu/lecture_notes/LectureNotes3.pdf)
- awesome RNN
- Fastest DN python go javascript
- RNN
### Benchmark
- [deep learning library benchmark](https://github.com/soumith/convnet-benchmarks)