Free ways to dive into machine learning with Python and Jupyter Notebook. Notebooks, courses, and other links. (First posted in 2016.)

Home Page: http://hangtwenty.github.io/dive-into-machine-learning/

License: Creative Commons Attribution 4.0 International

dive-into-machine-learning's Introduction

SupportUkraineNow.org — "Real ways you can help Ukraine"

Initiatives

Before we dive in, here are some notable projects and initiatives that might interest you as well.

Related to machine learning

AlgorithmWatch — newsletter — "a non-profit research and advocacy organization that is committed to watch, unpack and analyze automated decision-making (ADM) systems and their impact on society."
daviddao/awful-ai — "Awful AI is a curated list to track current scary usages of AI — hoping to raise awareness"
humanetech-community/awesome-humane-tech — "Promoting solutions that improve wellbeing, freedom and society"

Code against climate change

ProjectDrawdown/solutions — Project Drawdown — "Project Drawdown entered the climate conversation with the publication of the 2017 book. With The Drawdown Review in 2020, the project continues its mission to inspire and communicate solutions." Python and Jupyter Notebooks.
philsturgeon/awesome-earth
daviddao/code-against-climate-change
protontypes/open-sustainable-technology

Dive into Machine Learning

Hi there! You might find this resource helpful if:

You know Python or you're learning it 🐍
You're new to Machine Learning
You care about the ethics of ML
- 8 Responsible Machine Learning Principles
- Open Ethics Canvas
You learn by doing

For some great alternatives, jump to the end or check out Nam Vu's guide, Machine Learning for Software Engineers.

Of course, there is no easy path to expertise. Also, I'm not an expert! I just want to connect you with some great resources from experts. Applications of ML are all around us. I think it's in the public interest for more people to learn more about ML, especially hands-on, because there are many different ways to learn.

Whatever motivates you to dive into machine learning, if you know a bit of Python, these days you can get hands-on with a machine learning "Hello World!" in minutes.

Let's get started

Tools you'll need

If you prefer local installation

Python. Python 3 is the best option.
Jupyter Notebook. (Formerly known as IPython Notebook.)
Some scientific computing packages:
- numpy
- pandas
- scikit-learn
- matplotlib

You can install Python 3 and all of these packages in a few clicks with the Anaconda Python distribution. Anaconda is popular in Data Science and Machine Learning communities. (Use whichever tool works for you. If you're unsure or need more context about using conda/virtualenv/poetry/pipenv, here's a very helpful guide)

Cloud-based options

Some options you can use from your browser:

Binder is Jupyter Notebook's official choice to try JupyterLab
Deepnote allows for real-time collaboration
Google Colab provides "free" GPUs

For other options, see:

Let's go!

Learn how to use Jupyter Notebook (5-10 minutes). (You can learn by screencast instead.)

Now, follow along with this brief exercise: An introduction to machine learning with scikit-learn. Do it in ipython or a Jupyter Notebook, coding along and executing the code in a notebook.

What just happened?

You just classified some hand-written digits using scikit-learn. Neat huh?

Dive in

A Visual Introduction to Machine Learning

Let's learn a bit more about Machine Learning, and a couple of common ideas and concerns. Read "A Visual Introduction to Machine Learning, Part 1" by Stephanie Yee and Tony Chu.

It won't take long. It's a beautiful introduction ... Try not to drool too much!

"A Few Useful Things to Know about Machine Learning"

OK. Let's dive deeper.

Read "A Few Useful Things to Know about Machine Learning" by Prof. Pedro Domingos. It's densely packed with valuable information, but not opaque. (Don't worry if you don't understand it all yet.) Take some time with this one.

Jargon note

What is the difference between Data Analytics, Data Analysis, Data Mining, Data Science, Machine Learning, and Big Data?
Another handy term: "Data Engineering."
- "MLOps" overlaps with Data Eng, and there's an introductory MLOps section later in this guide.

Explore another notebook

Next, code along with one or more of these notebooks.

Series of notebooks:
- 2022: rasbt/machine-learning-book — notebooks from Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili
Dr. Randal Olson's Example Machine Learning notebook: "let's pretend we're working for a startup that just got funded to create a smartphone app that automatically identifies species of flowers from pictures taken on the smartphone. We've been tasked by our head of data science to create a demo machine learning model that takes four measurements from the flowers (sepal length, sepal width, petal length, and petal width) and identifies the species based on those measurements alone."
- Launch in Binder, no installation steps required
Various topical notebooks:
- trekhleb/machine-learning-experiments
- trekhleb/homemade-machine-learning

Find more great Jupyter Notebooks when you're ready:

Jupyter's official Gallery of Interesting Jupyter Notebooks: Statistics, Machine Learning and Data Science (permalink)

Immerse yourself

Pick one of the courses below and start on your way.

Prof. Andrew Ng's Machine Learning on Coursera

Prof. Andrew Ng's Machine Learning is a popular and esteemed free online course. I've seen it recommended often. And emphatically.

It's recommended to grab a textbook to use as an in-depth reference. The two I saw recommended most often were Understanding Machine Learning and Elements of Statistical Learning. You only need to use one of the two options as your main reference; here's some context/comparison to help you pick which one is right for you.

Public datasets and pet projects

You might like to have a pet project to play with, on the side. When you are ready for that, you could explore one of these: Awesome Public Datasets, paperswithcode.com/datasets, datasetlist.com, KKulma/climate-change-data

Tips for this course

Study tips for Prof. Andrew Ng's course, by Ray Li
If you're wondering, Is it still a relevant course? or trying to figure out if it fits for you personally, check out these reviews:
- Review: Andrew Ng's Machine Learning Course
- The user reviews on Coursera

Tips for studying on a busy schedule

It's hard to make time available every week. So, you can try to study more effectively within the time you have available. Here are some ways to do that:

"Learning How to Learn" by Barbara Oakley by Barbara Oakley, a free video course on Coursera.
Prefer book/audiobook? These are great options:
- Barbara Oakley's book A Mind for Numbers: How to Excel at Math and Science (reviews) — "We all have what it takes to excel in areas that don't seem to come naturally to us at first"
- Make It Stick: the Science of Successful Learning (reviews)

Take my tips with a grain of salt

I am not a machine learning expert. I'm just a software developer and these resources/tips were useful to me as I learned some ML on the side.

Other courses

Data science courses as Jupyter Notebooks:
- Practical Data Science
- Python Data Science Handbook, as Jupyter Notebooks
microsoft/Data-Science-For-Beginners — added in 2021 — "10-week, 20-lesson curriculum all about Data Science. Each lesson includes pre-lesson and post-lesson quizzes, written instructions to complete the lesson, a solution, and an assignment. Our project-based pedagogy allows you to learn while building, a proven way for new skills to 'stick'."
See also microsoft/ML-For-Beginners

More free online courses I've seen recommended. (Machine Learning, Data Science, and related topics.)

Coursera's Data Science Specialization
Prof. Pedro Domingos's introductory video series. Prof. Pedro Domingos wrote the paper "A Few Useful Things to Know About Machine Learning", which you may remember from earlier in the guide.
ossu/data-science (see also ossu/computer-science)
Stanford CS229: Machine Learning
Harvard CS109: Data Science
Advanced Statistical Computing (Vanderbilt BIOS8366). Interactive.
Kevin Markham's video series, Intro to Machine Learning with scikit-learn, starts with what we've already covered, then continues on at a comfortable place.
UC Berkeley's Data 8: The Foundations of Data Science course and the textbook Computational and Inferential Thinking teaches critical concepts in Data Science.
Prof. Mark A. Girolami's Machine Learning Module (GitHub Mirror). "Good for people with a strong mathematics background."
An epic Quora thread: How can I become a data scientist?
ujjwalkarn/Machine-Learning-Tutorials
There are more alternatives linked at the bottom of this guide

Getting Help: Questions, Answers, Chats

Start with the support forums and chats related to the course(s) you're taking.

Check out datascience.stackexchange.com and stats.stackexchange.com – such as the tag, machine-learning. There are some subreddits, like /r/LearningMachineLearning and /r/MachineLearning.

Don't forget about meetups. Also look for chat invitations on project pages and so on.

Some communities to know about!

Supplement: Learning Pandas well

You'll want to get more familiar with Pandas.

Essential: Things in Pandas I Wish I'd Had Known Earlier (as a Jupyter Notebook)
Essential: 10 Minutes to Pandas
Another helpful tutorial: Real World Data Cleanup with Python and Pandas
Video series from Data School, about Pandas. "Reference guide to 30 common pandas tasks (plus 6 hours of supporting video)."
Here are some docs I found especially helpful as I continued learning:
Bookmarks for scaling pandas and alternatives
- dask: A Pandas-like interface, but for larger-than-memory data and "under the hood" parallelism.
- vaex: "Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and explore big tabular data at a billion rows per second"

Supplement: Troubleshooting

These debugging tools can be used inside (or outside) a Jupyter notebook:

There are many more tools than that, but those might get you started, or might be especially useful while you're learning. Beyond learning, troubleshooting is more than just logs or debuggers, of course... there's also some MLOps links, later in this guide.

Assorted Tips and Resources

Risks - some starting points

"Machine learning systems automatically learn programs from data." Pedro Domingos, in "A Few Useful Things to Know about Machine Learning." The programs you generate will require maintenance. Like any way of creating programs faster, you can rack up technical debt.

Here is the abstract of Machine Learning: The High-Interest Credit Card of Technical Debt:

Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.

If you're reading this guide, you should read that paper. You can also listen to a podcast episode interviewing one of the authors of this paper.

Awesome Production Machine Learning, "a curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning." It includes a section about privacy-preserving ML, by the way!
"Rules of Machine Learning: Best Practices for [Reliable] ML Engineering," by Martin Zinkevich, regarding ML engineering practices.
The High Cost of Maintaining Machine Learning Systems
Overfitting vs. Underfitting: A Conceptual Explanation
11 Clever Methods of Overfitting and How to Avoid Them
"So, you want to build an ethical algorithm?" An interactive tool to prompt discussions (source)

That's not a comprehensive list, of course! They are just some gateways and starting-points. Know some other resources? Please share them, pull requests are welcome!

Peer review

OpenReview.net "aims to promote openness in scientific communication, particularly the peer review process."

More about OpenReview.net

Open Peer Review: We provide a configurable platform for peer review that generalizes over many subtle gradations of openness, allowing conference organizers, journals, and other "reviewing entities" to configure the specific policy of their choice. We intend to act as a testbed for different policies, to help scientific communities experiment with open scholarship while addressing legitimate concerns regarding confidentiality, attribution, and bias.

Open Publishing: Track submissions, coordinate the efforts of editors, reviewers and authors, and host… Sharded and distributed for speed and reliability.

Open Access: Free access to papers for all, free paper submissions. No fees.

Open Discussion: Hosting of accepted papers, with their reviews, comments. Continued discussion forum associated with the paper post acceptance. Publication venue chairs/editors can control structure of review/comment forms, read/write access, and its timing.

Open Directory: Collection of people, with conflict-of-interest information, including institutions and relations, such as co-authors, co-PIs, co-workers, advisors/advisees, and family connections.

Open Recommendations: Models of scientific topics and expertise. Directory of people includes scientific expertise. Reviewer-paper matching for conferences with thousands of submissions, incorporating expertise, bidding, constraints, and reviewer balancing of various sorts. Paper recommendation to users.

Open API: We provide a simple REST API [...]

Open Source: We are committed to open source. Many parts of OpenReview are already in the OpenReview organization on GitHub. Some further releases are pending a professional security review of the codebase.

OpenReview.net is created by Andrew McCallum’s Information Extraction and Synthesis Laboratory in the College of Information and Computer Sciences at University of Massachusetts Amherst

OpenReview.net is built over an earlier version described in the paper Open Scholarship and Peer Review: a Time for Experimentation published in the ICML 2013 Peer Review Workshop.

OpenReview is a long-term project to advance science through improved peer review, with legal nonprofit status through Code for Science & Society. We gratefully acknowledge the support of the great diversity of OpenReview Sponsors––scientific peer review is sacrosanct, and should not be owned by any one sponsor.

Production, Deployment, MLOps

If you are learning about MLOps but find it overwhelming, these resources might help you get your bearings:

MLOps Stack Template by Henrik Skogström
Lessons on ML Platforms from Netflix, DoorDash, Spotify, and more by Ernest Chan in Towards Data Science
MLOps Stack Canvas at ml-ops.org

Recommended awesomelists to save/star/watch:

Easier sharing of deep learning models and demos

🐣 Replicate "makes it easy to share a running machine learning model"
- Easily try out deep learning models from your browser
- The demos link to papers/code on GitHub, if you want to dig in and see how something works
- The models run in containers built by cog, "containers for machine learning."
  - It's an open-source tool for putting models into reproducible Docker containers.
  - You can put models in containers with just Python and YAML.
- There's an API for Replicate to run predictions for you

Deep Learning

Take note: some experts warn us not to get too far ahead of ourselves, and encourage learning ML fundamentals before moving onto deep learning. That's paraphrasing from some of the linked coursework in this guide — for example, Prof. Andrew Ng encourages building foundations in ML before studying DL. Perhaps you're ready for that now, or perhaps you'd like to get started soon and learn some DL in parallel to your other ML learnings.

When you're ready to dive into Deep Learning, here are some helpful resources.

Dive into Deep Learning - An interactive book about deep learning (view on GitHub)
- Quickstart:
  - Run this book locally, using Jupyter Notebooks
  - Run this book in your browser, using Google Colab
- "The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code."
- "You can modify the code and tune hyperparameters to get instant feedback to accumulate practical experiences in deep learning."

Collaborate with Domain Experts

Machine Learning can be powerful, but it is not magic.

Whenever you apply Machine Learning to solve a problem, you are going to be working in some specific problem domain. To get good results, you or your team will need "substantive expertise" / "domain knowledge." Learn what you can, for yourself... But you should also collaborate with experts. You'll have better results if you collaborate with subject-matter experts and domain experts.

Machine Learning and User Experience (UX)

I couldn't say it better:

Machine learning won’t figure out what problems to solve. If you aren’t aligned with a human need, you’re just going to build a very powerful system to address a very small—or perhaps nonexistent—problem.

That quote is from "The UX of AI" by Josh Lovejoy. In other words, You Are Not The User. Suggested reading: Martin Zinkevich's "Rules of ML Engineering", Rule #23: "You are not a typical end user"

Skilling up

What are some ways one can practice?

One way: competitions and challenges

You need practice. On Hacker News, user olympus commented to say you could use competitions to practice and evaluate yourself. Kaggle and ChaLearn are hubs for Machine Learning competitions. (You can find more competitions here or here.)

You also need understanding. You should review what Kaggle competition winners say about their solutions, for example, the "No Free Hunch" blog. These might be over your head at first but once you're starting to understand and appreciate these, you know you're getting somewhere.

Competitions and challenges are just one way to practice! Machine Learning isn't just about Kaggle competitions.

Another way: try doing some practice studies

Here's a complementary way to practice: do practice studies.

Ask a question. Start exploring some data. The "most important thing in data science is the question" (Dr. Jeff T. Leek). So start with a question. Then, find real data. Analyze it. Then ...
Communicate results. When you think you have a novel finding, ask for review. When you're still learning, ask in informal communities (some are linked below).
Learn from feedback. Consider learning in public, it works great for some folks. (Don't pressure yourself yet though! Everybody is different, and it's good to know your learning style.)

How can you come up with interesting questions? Here's one way. Pick a day each week to look for public datasets and write down some questions that come to mind. Also, sign up for Data is Plural, a newsletter of interesting datasets. When a question inspires you, try exploring it with the skills you're learning.

This advice, to do practice studies and learn from review, is based on a conversation with Dr. Randal S. Olson. Here's more advice from Olson, quoted with permission:

I think the best advice is to tell people to always present their methods clearly and to avoid over-interpreting their results. Part of being an expert is knowing that there's rarely a clear answer, especially when you're working with real data.

As you repeat this process, your practice studies will become more scientific, interesting, and focused. Also, here's a video about the scientific method in data science.)

More machine learning career-related links

"Advice on building a machine learning career and reading research papers by Prof. Andrew Ng"
Some links for finding/following interesting papers/code:
- Papers With Code is a popular site to follow, and it can lead you to other resources. github.com/paperswithcode
- MIT: Papers + Code — "Peer-review is the lifeblood of scientific validation and a guardrail against runaway hype in AI. Our commitment to publishing in the top venues reflects our grounding in what is real, reproducible, and truly innovative."
- papers.labml.ai/papers/weekly, monthly
Pull requests welcome!

More Data Science materials

Here are some additional Data Science resources:

Python Data Science Handbook, as Jupyter Notebooks
r0f1/datascience — "A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks."

Aside: Bayesian Statistics and Machine Learning

From the "Bayesian Machine Learning" overview on Metacademy:

... Bayesian ideas have had a big impact in machine learning in the past 20 years or so because of the flexibility they provide in building structured models of real world phenomena. Algorithmic advances and increasing computational resources have made it possible to fit rich, highly structured models which were previously considered intractable.

Here are some awesome resources for learning Bayesian methods.

The free book Probabilistic Programming and Bayesian Methods for Hackers. Made with a "computation/understanding-first, mathematics-second point of view." Uses PyMC. It's available in print too!
Like learning by playing? Me too. Try 19 Questions, "a machine learning game which asks you questions and guesses an object you are thinking about," and explains which Bayesian statistics techniques it's using!
Time Series Forecasting with Bayesian Modeling by Michael Grogan, a 5-project series - paid but the first project is free.
Bayesian Modelling in Python. Uses PyMC as well.

(↑ Back to top)

More ways to "Dive into Machine Learning"

Here are some other guides to learning Machine Learning.

Machine Learning for Software Engineers, by Nam Vu. In their words, it's a "top-down and results-first approach designed for software engineers." Definitely bookmark and use it. It can answer many questions and connect you with great resources.
ujjwalkarn/Machine-Learning-Tutorials
josephmisiti/awesome-machine-learning
Courses by cloud vendors. These are usually high quality content but steer you heavily to use vendor-specific tools/services. To avoid getting locked into vendor specifics, you can make sure you're learning from other resources as well.
2022: Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili

(↑ Back to top)

dive-into-machine-learning's People

Contributors

Stargazers

Watchers

Forkers

rolandmueller gurvindersingh schmich mahesh-singh luca saveforks pbamotra ucaksiz lyrl uditmo2006 librari qhfgva aboutaaron ashumeow zemily brspurri mattallty markvaughan webcoding wwf5067 xunyou ranjanmanish wware rssr25 yinhuagang easonlv coddinglxf louiekang eavie simplelovecs yanjiegao genliu777 rhiever indy9000 robert-ko kod3r mukulrawat1986 seewhy-chen dribnet sharpchicity nandigama vishfrnds ruthvik92 luong-vinh xguse dtiholden yutarochan davidlowjw luojianp better1900 andrubrown eric-guanwy atarax1 madjelan jayhetee amr2393 potis xuq shantanukumar patrickjonesdotca malisetti pragnesh sgzsh269 michaellady tomkinsc aksiksi koffisam wavelets jwcmitchell yosoyubik fcschmidt lordnacho tomhebbron madirey badarahmed applecool svijaykr datrick oilgasdataanalyst gracecur123 vaishaksuresh dt3ch djvita googoljqk swaroopgj benjamesbabala epocolis ulricgan koolkt luojiahuli ranganaths supernova15 bssrdf binderwang akansal1 wxdublin necatikartal saurabhdhupar leosartaj gotoc

dive-into-machine-learning's Issues

Add link to Learning How to Learn (at the right moment in the guide)

https://www.coursera.org/learn/learning-how-to-learn/

Add link to Sonnet Python library in the right place

should go in with the other Deep Learning and Tensor Flow links

https://github.com/deepmind/sonnet

Link to distill.pub

https://distill.pub/about/

Link to Data is Plural somewhere

Link to that great article on Machine Learning Mastery, but with caveats/framing

Submitted by @vishwajeetv

This guide sets pragmatic philosophy towards becoming a machine learning developer.

But I need to add some framing or caveats. From #38 ... I said:

"Dive into Machine Learning" is very carefully curated so far ... While the Brownlee article has a similar hack-first focus, I have issues with Brownlee's tone. I'm trying to put myself in the shoes of someone who has more experience with ML, and did get a PhD in it.

Maybe this is my issue: this article provides advice that may help some developers get into machine learning ... yet it has a lot of attitude, and that attitude is discouraging towards inquiry and curiosity. If everyone had the attitude of this article (at least, at some of its moments) ... there would be no machine learning field for developers to get into; nobody would get into math or research let alone something as "un-pragmatic" as machine learning research!

I will mull this over a bit. Sorry to delay :) Appreciate the PR, for sure

Add link to course Data 8: The Foundations of Data Science from UC Berkeley

I agree that hacking is the way you get started. But after going through the same process I struggled for some foundation later.

The Data 8 course and the inferential thinking textbook cover the foundations of data science. The course was designed for undergrads because students were struggling with concepts once taking advance topics at the grad level.

Also, the idea behind the course came from Prof. Micheal Jordon who is also linked to AMP Lab (the birth place for Spark).

PS: This is the only textbook with atleast a definition of Data Science compared to the Venn diagram which now has potentially 7-8 versions.

Wanted to confirm if you want to include this information:

Atleast a link
Or link with some description highlighting the importance of this course
Using the definition of data science as mentioned in the text book (separate PR)

I can raise a PR based on your response.

Validate pull requests with Travis

Hello, I wrote a tool that can validate README links (valid URLs, not duplicate). It can be run when someone submits a pull request.

It is currently being used by

Examples

https://travis-ci.org/matteocrippa/awesome-swift/builds/96526196 ok ✅
https://travis-ci.org/matteocrippa/awesome-swift/builds/96722421 link redirected / rename 🔴
https://travis-ci.org/dkhamsing/open-source-ios-apps/builds/96763135 bad link / project deleted 🔴
https://travis-ci.org/dkhamsing/open-source-ios-apps/builds/95754715 dupe 🔴

If you are interested, connect this repo to https://travis-ci.org/ and add a .travis.yml file to the project.

See https://github.com/dkhamsing/awesome_bot for options, more information
Feel free to leave a comment 😄

TPOT

Need to try out TPOT and probably add it to the guide.

Link to Google's new "crash course" using TensorFlow. (Deep Learning/ Other Guides)

https://developers.google.com/machine-learning/crash-course/

add link in the Deep Learning section.
add link in the Other Guides section.

Broken link - fsecurify

The fsecurify link seems to be broken in section "A note about Machine Learning and Security (InfoSec, AppSec)"

edx course on Data 8

UC Berkeley now offering their Data 8 course via edX(MOOC) platform. Update the Data 8 section accordingly.

add links to @rabst's books

Image linked from Photo Bucket not showing in readme

Specifically in this section:
https://github.com/hangtwenty/dive-into-machine-learning#a-few-useful-things-to-know-about-machine-learning
The image at the end of this section (is from)/(hosted on) Photo Bucket and is not showing up. If possible we could use an alternative host for the image.

Add about the numpy arrays

The numpy array takes 13 milliseconds for its execution in an expression while a normal array takes about
several seconds for its operation

Add link to book: Python Machine Learning

Python Machine Learning

(and a blog post reflecting on the writing of this book)

Encourage Python 3 adoption, but welcome all backgrounds.

See Pull Request #4 .

edX course on Data8

UC Berkeley now offering their Data 8 course via edX(MOOC) platform. Update the Data 8 section accordingly.

Note: The previous issue #105 with the same name was not resolved, so opening again.

Add Table of Contents at the Beginning of README

I think it's better for README to have a table of contents, with clickables to jump directly to each of the high-level section headline.
I feel it's a good first task for me as a rookie open-source contributor. Kindly remind me of do's and dont's regarding making changes and pull requests.

Add TensorFlow link somewhere ...

Consider adding link to HackerRank for practice problems.

May want to add link to HackerRank: Statistics and Machine Learning, but want to vet it a bit. The site has many categories, so I would like to check if these are (indisputably) worthwhile exercises before adding link.

Video series about scikit-learn designed for machine learning beginners

Hello! I have a suggestion for the repo, and thought an issue might be the best place to post it.

I created a video series about scikit-learn, designed specifically for Python users with no background in machine learning or scikit-learn. I thought it might be useful for your readers. Here are the relevant links:

GitHub repo containing the notebooks - https://github.com/justmarkham/scikit-learn-videos
each notebook is related to a video - https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
each notebook is also related to a blog post - http://blog.kaggle.com/author/kevin-markham/

If you'd like a quick overview of the series, it can be found here:
http://blog.kaggle.com/2015/04/08/new-video-series-introduction-to-machine-learning-with-scikit-learn/

As a side note, the first video in the series isn't really about scikit-learn, it's just an introduction to how machine learning "works". Thus, it could be worth a separate link in your repo:
https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1

Anyway, I'm happy to answer any questions and appreciate your consideration!

UX section += reference to, You Are Not the User

Very start of the section should link to a page mentioning, "You Are Not the User" (a generally useful principal for developers to remember).
And link/reference to "Rule #23: You are not a typical end user" from Martin Z's Rules of ML Engineering doc http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

"Starting Simple" episode outdated link

http://www.thetalkingmachines.com/blog/2015/4/23/starting-simple-and-machine-learning-in-meds link doesn't work anymore.

This may be the correct one.

Add YN2 Deep Learning tutorial.

http://yerevann.com/a-guide-to-deep-learning/

fix broken links

Fix broken links as part of Travis CI build:

Issues :-(

Links

[L026] 522 https://www.continuum.io/downloads
[L121] http://nb.bianp.net/sort/views/ Connection timed out - connect(2) for "nb.bianp.net" port 80

add link to 'Dive into Deep Learning'

https://github.com/hangtwenty/dive-into-machine-learning#deep-learning

should have link to this in its list:

https://d2l.ai/

Recommend Anaconda as a viable alternative to pip.

See Pull Request #4.

Jupyter: go through guide, ensure compatibility

Prodded by this comment on DataTau.

I replied ...

When I first started putting this together, it was still just -- ipython notebook, and other notebooks. The unifying effort of Jupyter hadn't been released yet. I need to catch up here... After I know the full picture about how Jupyter is different from ipython notebook, any compatibility issues, etc., I will make a full pass through the guide to update anything needing updates.

Link to lore and a tutorial for it

Need a section about Deep Learning intro/way-out/next-steps if people are interested ... from someone who knows what they are talking about !

(braindump)

I've largely avoided having much deep learning links or neural nets links ... because it seems like it can be dangerous for beginners to jump ahead to those when they are not ready. I don't know enough about deep learning to really situate this beside just a couple links in a list ... let alone situate it in a sober, smart way :)

Recently, this MIT book about Deep Learning has been published, and judging by reactions it seems like a worthwhile resource to link to. I still don't want "everything and the kitchen sink" but this would be a good "essential resource" on the subject. Or maybe an expert knows a better choice.

So. Might want a small subsection to link to these two things, with the appropriate one or two sentence caveat that these are advanced topics, refer back to the pyramid showing that algorithm is least important (for many problems), etc etc. Maybe link to a Talking Machines episode too to give the appropriate context.

Add some more links to ML in Security problem domains

in notebooks section: http://clicksecurity.github.io/data_hacking/
by security section: https://github.com/jivoi/awesome-ml-for-cybersecurity/blob/master/README.md

I want the same caution as before, that people should proceed carefully here. But still want some good resources :-)

Can someone revise/ add color to the big data section ...?

There is already a section about Big Data but it needs some revision.

I've accumulated a couple links I want to throw in, but I may need assistance from an expert to keep this section tiny.

Preview of the revised links (add onto the links to Apache Spark etc. -- those will remain there)

10 things statistics taught us about big data analysis (and some more food for thought: "What Statisticians think about Data Scientists")
Talking Machines #12: Interviews Prof. Andrew Ng (from our main course, which has its own module on big data); the show covers some problems relevant to high-dimensional data
Talking Machines #15: "Really Really Big Data and Machine Learning in Business":
This preview edition of this book, Getting Data Right ... - http://jdp491bprdv1ar3uk2puw37i.wpengine.netdna-cdn.com/wp-content/uploads/2015/11/Getting_Data_Right_Preview_Edition_Nov2015.pdf
the relevant module in Prof. Ng's class should be mentioned.
anything else?

Need advice about how to evaluate your proficiency

Please don't sell yourself as a Machine Learning expert while you're still in the Danger Zone. Don't build bad products or publish junk science. This guide can't tell you how you'll know you've "made it" into Machine Learning competence ... let alone expertise. It's hard to evaluate proficiency without schools or other institutions. This is a common problem for self-taught people. Your best bet may be: expert peers.

If you know a good way to evaluate Machine Learning proficiency, please submit a Pull Request to share it with us.

Need to tell people how they know they're out of the Danger Zone or how they know they are hire-able.

Link to more than just one document for "Learn Python" tip

Right at the beginning I like to Dive into Python ... and while I would still recommend that as a #1 choice, I should link to my document with more options.

add link to Python Data Science Handbook

https://github.com/jakevdp/PythonDataScienceHandbook

Add info about Dask

section on timeseries DBs and tsfresh (automated feature extraction)

or perhaps just a link to someone else's. or highlighting that one handy OSS timeseries-db-comparison, plus personal favorites like timescaledb

this is a link that would be added to that section

https://github.com/blue-yonder/tsfresh

[notes to self/ thinking out loud] regarding human learning styles - "impasse-driven learning"

in this ticket, 'learning' refers to humans learning a topic, in general - it is not machine-learning hehe

I learned Python by hacking first, and getting serious later. [...] If this is your style, join me in getting a bit ahead of yourself

I've recently learned the term/concept of "impasse-driven learning" and it was a 💡 for me. Realized this is my preferred learning style. Some interesting papers exist (though I'm surprised how few): https://scholar.google.com/scholar?q=impasse-driven+learning

this guide is oriented towards that learning style. so I had the thought of adding a small note or link that points that out.

[Contributions Welcome!] Look into book about "Test-Driven Approach" to learning Machine Learning; add a note in the guide

Thoughtful Machine Learning with Python: A Test-Driven Approach 1st Edition
by Matthew Kirk. Looks very interesting. I want to look into the book and/or early reviews of it, and most likely find a place to link it in the guide.

Link to Julia Evans blogpost, "Machine Learning isn't just Kaggle competitions"

Machine Learning isn't just Kaggle competitions

will you open source the site's d3 code?

👍

broken URL in README

https://fivethirtyeight.blogs.nytimes.com/fivethirtyeights-2012-forecast/ is invalid and was captured by CI in #90 pull request

Include Orange in tools

I suggest to add Orange to the tools references.
Orange is an open source machine learning and data visualization for novice and expert.
Provides an interactive data analysis workflows with a large toolbox.

machine-learning-module -- add somewhere ?

This is a machine learning module I found here:

http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_2/Lectures/wk_2_lect_2.pdf

None of this material is mine, it has all been created by Professor. M. A .Girolami.
This is hands down the best machine-learning tutorials I have found on the web, and I was afraid
the university link would be taken down, so now its on github.

i hope you enjoy this as much as i did.

https://github.com/josephmisiti/machine-learning-module

Bayesian methods

PULL REQUESTS WELCOME!

We already have this ...

Here's an IPython Notebook book about Probabilistic Programming and Bayesian Methods for Hackers: "An intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view."

I've come across markdregan/Bayesian-Modelling-in-Python and it looks great. Should probably add a link to this, right around the quote above. But then it might be good to have a word or two link to some explanation of the exact relationship of Bayesian Modeling to ML ...

the inline link should link to 1+ of these:

Deep Frameworks/ TensorFlow vs. Theano vs. Torch

I should get wise to TensorFlow vs. Theano vs. Torch then update this snip of verbiage...

TensorFlow seems like a really big deal. [...]

maybe to something like

TensorFlow seems like a really big deal. It has to have its own bullet point. Now, it's still not magic. And it's not the only Deep Learning framework. But. You can bet people will do exciting things with TensorFlow, Theano, Torch and other machine learning frameworks that make complex algorithms more successful. Just remember: "More data beats a cleverer algorithm" (Domingos).

https://github.com/humphd/have-fun-with-machine-learning

Elements of Statistical Learning vs. Understanding Machine Learning?

Elements of Statistical Learning and Understanding Machine Learning are both free books, and frequently recommended as reference textbooks. As you continue towards expertise you can also go deep into these books.

Issue: need to add link to UML in guide. Also need to link to context/comparison if possible.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

dive-into-machine-learning / dive-into-machine-learning Goto Github PK