microsoft / ml-for-beginners
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Home Page: https://microsoft.github.io/ML-For-Beginners/
License: MIT License
Is your feature request related to a problem? Please describe.
As of now, we are thanking contributors like this:
I think we can do it in a better and more personalized way!
Describe the solution you'd like
We can move this section to the bottom of the README file and add a card for each contributor, so that anyone who wants to reach out to them can visit their GitHub profile. Each card will contain the contributor's GitHub avatar and name, linked to their profile.
Describe alternatives you've considered
Or we can just add links to their profiles in the existing places. This would also help, but the cards would look more attractive and be a small tribute to great people!
I would like to add this feature.
Let's translate this content! Here are instructions: https://github.com/microsoft/ML-For-Beginners/blob/main/TRANSLATIONS.md
We can focus on the following for now, but please propose yours and start a draft PR:
I have taken most of the lessons. Why not add an introductory GCP AutoML Vision lesson?
It will be very interesting as a practical project!
The course is described as being "12 weeks, 24 lessons", but it doesn't give an idea as to how much time is required to complete each lesson.
I'd like to do the course, but I need to figure out if I've got time in my current schedule.
Build all PAT Rubrics in discussion board for each lesson grouping
@dasani-madipalli and @girliemac
I'd love to include one sketchnote (WebDev style if possible!) for each of the big topics in this course, to be added to each introductory lesson:
Do you think it's possible?
the last lesson, finishing NLTK for sentiment analysis
Hi @jlooper,
I really love the way these learning materials and the curriculum are designed. However, I have some review comments which might help make them better. I don't have any comments on Sections 1 and 3 and feel they are really great in their current form. Do let me know if I should make a PR addressing any of these comments.
This section looks pretty good to me and I don't have any reviews which might help make this better.
Tokenization: Probably the first thing most NLP algorithms have to do is split the text into tokens, or words. While this sounds simple, having to account for punctuation and different languages' word and sentence delimiters can make it tricky.
Maybe we could give some intuition on why tokenization is not only a matter of splitting a sentence wherever there is whitespace, by adding:
Though it might seem very straightforward to simply split your sentence into words, you might have to use some other methods or add on top of this too.
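To make the point above concrete, here is a minimal sketch comparing a plain whitespace split with a small regex-based tokenizer. The function names and the sample sentence are made up for illustration, and the regex is only one simple choice, not what any particular NLP library does.

```python
import re


def whitespace_tokenize(text):
    """Naive approach: split on runs of whitespace only."""
    return text.split()


def regex_tokenize(text):
    """Return word tokens (keeping internal apostrophes) and
    each punctuation mark as its own token."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


# Hypothetical sample sentence, chosen to include punctuation and contractions.
sentence = "Don't panic, it's simple!"

# Punctuation stays glued to the neighbouring words here.
print(whitespace_tokenize(sentence))

# Punctuation is separated, contractions are kept whole.
print(regex_tokenize(sentence))
```

With the whitespace split, "panic," and "simple!" keep their punctuation attached, so they would not match the plain words "panic" and "simple" later on.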
Embeddings: Embeddings are a way to meaningfully convert your text data numerically. This is done in a way so that words with a similar meaning or words used together cluster together in a high dimensional space.
Optionally we could also add:
Try playing around with word embeddings from a quite popular model (Word2Vec) here. Can you see how clicking on one word shows the words with similar meaning clustering around it? E.g., if you inspect the word 'toy' you see it clusters with words such as 'disney', 'lego', 'playstation', 'console', etc.
However, I do understand this might go a bit too deep at this stage. What do you think?
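The idea of similar words sitting close together can be sketched with cosine similarity. The vectors below are invented three-dimensional toy values, not real Word2Vec output, and the helper names are hypothetical; real embeddings have many more dimensions and are learned from text.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
toy_embeddings = {
    "toy": [0.9, 0.8, 0.1],
    "lego": [0.8, 0.9, 0.2],
    "tax": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings):
    """Return the other word whose vector is closest in direction to `word`."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


print(most_similar("toy", toy_embeddings))
```

In this made-up data, 'toy' and 'lego' point in almost the same direction, while 'tax' points elsewhere, which is the clustering behaviour the suggested text describes.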
This section looks pretty good to me and I don't have any reviews which might help make this better.
To complete the lessons, we want to add an overview of ML as it is used in the real world. This will focus on classic ML only! So don't worry about Neural Networks for this curriculum. Pick a domain and write a paragraph about it as a reply to this issue, and I'll add it to the lesson and credit you!
Finance
Credit card fraud detection
Wealth management
Education
Predicting student behavior
Preventing plagiarism
Course recommendations
Retail
Personalizing the customer journey
Inventory management
Health Care
Optimizing drug delivery
Hospital re-entry management
Disease management
Ecology and Green Tech
Forest management
Motion sensing of animals
Energy Management
Insurance
Actuarial tasks
Arts, Culture, and Literature
Fake news detection
Classifying artifacts
Marketing
'Ad words'
Hi @jlooper ,
I reviewed the solution notebook 5-Clustering/2-K-Means. Here are my experiments on top of which I make the below comments: https://colab.research.google.com/drive/1oIXPkQZzvJClRaCoEcpznKw3RhXOTPay?usp=sharing
It turns out we are trying to cluster on the other features to get 3 categories (artist genres), but there is almost no correlation between our features and our expected cluster bases (see this cell: https://colab.research.google.com/drive/1oIXPkQZzvJClRaCoEcpznKw3RhXOTPay#scrollTo=cpfSV8Yem1H9&line=5&uniqifier=1). The only reasonably good correlation we get is between loudness and energy, which again wouldn't be a problem to be solved by a clustering algorithm.
I think clustering would not be very useful for this problem in its entirety. What do you think?
Issue in the markdown, says no repo found on android etc...
File attached: screenshot.png
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Smartphone (please complete the following information):
Additional context
Add any other context about the problem here.
Task: move all translations to their proper folder
Please create an outline of how you'd like to write this lesson
Hi @jlooper,
Here are my reviews for the Clustering module. Note that I have only reviewed Part 1 for the moment, as you asked.
Derived from mathematical terminology, non-flat vs. flat geometry refers to the measure of distances between points by either 'flat' (non-Euclidean) or 'non-flat' (Euclidean) geometrical methods.
I think there might be a typo here which could be a bit confusing for readers. By flat geometry (not a flat object) we mean measuring distances, areas, and volumes using Euclidean distance, i.e. following Euclidean geometry. Thus the sentence should be rewritten as:
Derived from mathematical terminology, non-flat vs. flat geometry refers to the measure of distances between points by either 'flat' (Euclidean) or 'non-flat' (non-Euclidean) geometrical methods.
Quote from the sklearn docs to support my suggestion:
"Non-flat geometry clustering is useful when the clusters have a specific shape, i.e. a non-flat manifold, and the standard euclidean distance is not the right metric."
'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and non-flat refers to non-Euclidean geometry
I think adding a mention of the difference in how distances are measured would be quite helpful, since most readers might already know how to calculate Euclidean distance, and this would allow them to build their intuition on top of it. What do you think?
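As a rough illustration of the difference in how distances are measured, the sketch below compares the straight-line (Euclidean) distance between two points with the distance measured along the surface of a sphere. The sphere is just one familiar example of a non-flat space chosen here for illustration; it is not taken from the lesson or from the sklearn docs, and the function names are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points in flat space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def great_circle_distance(lat1, lon1, lat2, lon2, radius=1.0):
    """Distance along the surface of a sphere (haversine formula).
    Latitudes and longitudes are given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Two points on the equator of a unit sphere, 90 degrees of longitude apart.
# In 3D coordinates they are (1, 0, 0) and (0, 1, 0).
straight_line = euclidean_distance((1, 0, 0), (0, 1, 0))
along_surface = great_circle_distance(0, 0, 0, 90)

print(straight_line)   # cuts through the sphere
print(along_surface)   # follows the curved surface, so it is longer
```

The same pair of points gets two different distances depending on the geometry, which is why the choice of metric matters when clusters lie on a curved shape.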
The only strong correlation is between energy and loudness, which is not too surprising, given that loud music is usually pretty energetic. Otherwise, the correlations are relatively weak. It will be interesting to see what a clustering algorithm can make of this data.
This sentence is quite simple, but I think it might lead beginner readers to make wrong assumptions. How do I know? Well, I had this wrong intuition in mind when starting with ML too!
Anyway, here we say that there is a good correlation between the energy of a song and its loudness. I personally think it is very important to specify here:
Correlation does not imply causation, and the two should not be confused: we have proof of correlation but no proof of causation.
Optionally we could link readers to Tyler Vigen's famous spurious correlations blog/book. I think this is quite an important and highly misunderstood aspect, which would be best suited here.
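For readers who want to see what the correlation number actually measures, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The energy and loudness values are invented for illustration and are not taken from the lesson's dataset; the function name is hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical values, invented for illustration only.
energy = [0.2, 0.4, 0.6, 0.8]
loudness = [-12.0, -9.0, -7.0, -4.0]

# A value close to 1 says the two columns rise together.
# It says nothing about whether one causes the other.
print(pearson_correlation(energy, loudness))
```

The coefficient only summarizes how the two columns move together; any causal claim needs evidence from outside this calculation.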
You can discover concentric circles around a general point of convergence, showing the distribution of points. In general, the three genres align loosely in terms of their popularity and danceability. Determining clusters in this loosely-aligned data will be interesting:
Do you think we should tell readers a bit more about these graphs, maybe something like:
Here we use a KDE (Kernel Density Estimate) graph, which represents the data using a continuous probability density curve. This allows us to easily interpret data, especially when working with multiple distributions.
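A short sketch may help show where such a continuous curve comes from. It places a Gaussian bump on each data point and averages them. This is a simplified one-dimensional version written for illustration, not the plotting library's implementation; the sample values, the bandwidth, and the function name are assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x by averaging
    Gaussian bumps centred on each sample."""
    n = len(samples)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical values: three points close together and one further away.
samples = [0.2, 0.25, 0.3, 0.8]
density = gaussian_kde(samples, bandwidth=0.1)

# The curve is high where points are crowded and low in the gap between groups.
print(density(0.25))
print(density(0.55))
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that can hide separate groups.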