
mlfromscratch's Introduction

ML algorithms from Scratch!

Machine Learning algorithm implementations from scratch.

You can find tutorials with the math and code explanations on my channel: Here

Algorithms Implemented

  • KNN
  • Linear Regression
  • Logistic Regression
  • Naive Bayes
  • Perceptron
  • SVM
  • Decision Tree
  • Random Forest
  • Principal Component Analysis (PCA)
  • K-Means
  • AdaBoost
  • Linear Discriminant Analysis (LDA)

Installation and usage

This project has 4 dependencies.

  • numpy for the math and the algorithm implementations
  • scikit-learn for data generation and testing
  • matplotlib for plotting
  • pandas for loading data

NOTE: Only numpy is used for the implementations themselves. The other packages handle data generation, testing, and plotting, so we don't have to write those from scratch too.

You can install these using the commands below.

# Linux or MacOS
pip3 install -r requirements.txt

# Windows
pip install -r requirements.txt

You can run the files as follows:

python -m mlfromscratch.<algorithm-file>

where <algorithm-file> is the filename of the algorithm module without the extension.

For example, to run the linear regression example: python -m mlfromscratch.linear_regression

Watch the Playlist


mlfromscratch's People

Contributors: dependabot[bot], janasunrise, patrickloeber


mlfromscratch's Issues

Explanation of `get_hyperplane_value`

Hi, I find your tutorials on SVM very helpful, but I do not understand the get_hyperplane_value method in svm_tests.py. Could you explain it? Thank you!
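For anyone else wondering: the helper just solves the hyperplane's line equation for the second coordinate so the boundary can be drawn in 2D. A minimal sketch, assuming the decision function is w·x - b and that offset selects the boundary (0) or one of the margins (±1); the repo's exact code may differ slightly:

def get_hyperplane_value(x0, w, b, offset):
    # Solve w[0]*x0 + w[1]*x1 - b = offset for x1, so the boundary
    # (offset = 0) and the margins (offset = +/-1) can be plotted
    # by evaluating the resulting line at two x0 values.
    return (-w[0] * x0 + b + offset) / w[1]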

Project dependencies may have API risk issues

Hi, in MLfromscratch, overly strict dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project is using:

numpy==1.22.0
scikit-learn==0.24.2
matplotlib==3.4.2
pandas==1.2.4

The == constraint introduces a risk of dependency conflicts, because it pins each dependency to a single exact version.
Constraints with no upper bound (or *) introduce a risk of missing-API errors, because the latest version of a dependency may remove APIs the project calls.

After further analysis, the constraints in this project could be relaxed as follows:
The version constraint of dependency numpy can be changed to >=1.8.0,<=1.23.0rc3.
The version constraint of dependency matplotlib can be changed to >=1.3.0,<=3.0.3.
The version constraint of dependency pandas can be changed to >=0.4.0,<=1.2.5.

These suggested ranges reduce the chance of dependency conflicts as much as possible,
while still allowing versions as recent as possible without triggering API errors in the project.
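For illustration, a requirements.txt applying the suggested ranges (scikit-learn stays pinned, since no range was suggested for it) would look like:

numpy>=1.8.0,<=1.23.0rc3
scikit-learn==0.24.2
matplotlib>=1.3.0,<=3.0.3
pandas>=0.4.0,<=1.2.5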

The current project invokes all of the following methods.

Methods called from numpy:
numpy.linalg.inv
numpy.linalg.eig

Methods called from matplotlib:
matplotlib.colors.ListedColormap

Methods called from pandas:
pandas.read_csv

All called methods across the project:
numpy.argwhere
self._grow_tree
self._best_criteria
self.plot
numpy.unique
numpy.amin
LDA.transform
RandomForest.predict
numpy.mean
range
numpy.exp
numpy.argsort
numpy.dot
sklearn.datasets.make_blobs
df.fillna.fillna
self._create_clusters
self._traverse_tree
numpy.log
self._approximation
numpy.sign
matplotlib.pyplot.figure
self._is_converged
numpy.linalg.eig
numpy.where
NaiveBayes
matplotlib.pyplot.show
numpy.sum
DecisionTree
mean_overall.mean_c.reshape.dot
SVM
matplotlib.colors.ListedColormap
SW.np.linalg.inv.dot
numpy.empty
csv.reader
centroid_idx.clusters.append
most_common_label
numpy.argmax
sklearn.datasets.make_classification
ax.scatter
matplotlib.pyplot.cm.get_cmap
matplotlib.pyplot.figure.add_subplot
KNN.predict
numpy.genfromtxt
bootstrap_sample
Node
LinearRegression
self._predict
fig.add_subplot.plot
Adaboost.fit
LinearRegression.predict
Perceptron.predict
enumerate
list
SVM.fit
Adaboost.predict
KMeans.predict
node.is_leaf_node
numpy.sqrt
self.trees.append
sum
matplotlib.pyplot.plot
numpy.swapaxes
self._pdf
DecisionTree.predict
numpy.random.seed
self._information_gain
matplotlib.pyplot.xlabel
KNN.fit
numpy.amax
DecisionStump
Perceptron
len
posteriors.append
numpy.log2
numpy.argmin
numpy.linalg.inv
self.clfs.append
self._get_cluster_labels
Perceptron.fit
numpy.cov
abs
accuracy
LogisticRegression.predict
numpy.array
mean_c.X_c.T.dot
visualize_svm
numpy.bincount
decision_tree.DecisionTree.fit
float
entropy
RandomForest.fit
sklearn.datasets.make_regression
mean_overall.mean_c.reshape
sklearn.datasets.load_iris
LinearRegression.fit
mean_squared_error
NaiveBayes.fit
KMeans.plot
PCA.transform
k_neighbor_labels.Counter.most_common
numpy.loadtxt
cmap
self._sigmoid
RandomForest
decision_tree.DecisionTree
numpy.zeros
sklearn.model_selection.train_test_split
self._split
pandas.read_csv
X_c.mean
X_c.var
self._get_centroids
df.fillna.to_numpy
LDA
fig.add_subplot.set_ylim
split_thresh.X_column.np.argwhere.flatten
collections.Counter.most_common
numpy.full
euclidean_distance
decision_tree.DecisionTree.predict
min
matplotlib.pyplot.scatter
self._most_common_label
print
get_hyperplane_value
matplotlib.pyplot.ylabel
PCA
Adaboost
numpy.corrcoef
self.activation_func
matplotlib.pyplot.subplots
numpy.ones
r2_score
matplotlib.pyplot.get_cmap
LogisticRegression.fit
KNN
open
sklearn.datasets.load_breast_cancer
NaiveBayes.predict
numpy.random.choice
DecisionTree.fit
self._closest_centroid
matplotlib.pyplot.colorbar
collections.Counter
KMeans
LDA.fit
PCA.fit
LogisticRegression

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

Repository has no License

Thanks for the wonderful work on this repository! Unfortunately, your code does not have a license. Could you please add one, so it's clear whether we are allowed to reuse it, e.g. for teaching students?

Thank you!

AdaBoost suggestions

Thanks much for putting this material together!

Looking at Lucky 13: AdaBoost. A few items are a bit unclear for us newbies.

First, in the fit() method there is just a single pass over the data X, while the original Freund & Schapire (1995) paper suggests looping for T iterations, refitting the classifiers on each pass based on the evolving weights. It looks like the version here is based on Zhu et al. (2009). It might be worth a few words explaining where the algorithm comes from, and why this version needs only one pass over the samples. (A sketch of the classic loop appears after the stump listing below.)

Second, just from a learning perspective, it would be great to provide a data set that mimics the illustrations in the video, so we can verify that things work as expected. For extra credit, use Matplotlib to recreate the decision boundary visualization from the video.

Third, it might be worthwhile pointing out refinements a real design would need. For example, here are the decision stumps created from the test code. Notice that feature 23 is used twice: same polarity, just a different threshold. Is this a limitation of this simple example, or actually a useful quirk of AdaBoost?

0: {'polarity': -1, 'feature_idx': 27, 'threshold': 0.1424, 'alpha': 1.2271759901553476}
1: {'polarity': -1, 'feature_idx': 23, 'threshold': 728.3, 'alpha': 0.9273811402788633}
2: {'polarity': -1, 'feature_idx': 1, 'threshold': 19.98, 'alpha': 0.7916733128875748}
3: {'polarity': -1, 'feature_idx': 23, 'threshold': 876.5, 'alpha': 0.6099992009200025}
4: {'polarity': -1, 'feature_idx': 26, 'threshold': 0.2177, 'alpha': 0.5775069918855832}
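For reference, here is a minimal sketch of the classic multi-round training loop discussed in the first point, assuming y takes values in {-1, +1} and that fit_weighted_stump is a hypothetical helper that fits a decision stump to weighted samples; this is a generic illustration, not the repo's exact code:

import numpy as np

def adaboost_fit(X, y, n_clf, fit_weighted_stump):
    # start with uniform sample weights
    n_samples = X.shape[0]
    w = np.full(n_samples, 1.0 / n_samples)
    clfs = []
    for _ in range(n_clf):                      # T boosting rounds
        clf = fit_weighted_stump(X, y, w)       # refit on the evolving weights
        preds = clf.predict(X)
        error = np.sum(w[y != preds])           # weighted misclassification rate
        clf.alpha = 0.5 * np.log((1.0 - error) / (error + 1e-10))
        w *= np.exp(-clf.alpha * y * preds)     # up-weight the mistakes
        w /= np.sum(w)                          # renormalize to a distribution
        clfs.append(clf)
    return clfs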

IndexError: index 6 is out of bounds for axis 0 with size 6

Hi!

I tried your naive Bayes classifier on
https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data

n_classes = len(self._classes) returns class labels like [1, 2, 3, 4, 5, 6], unlike iris's [0, 1, 2].

In the loop

for c in self._classes:
    X_c = X[y == c]
    self._mean[c, :] = X_c.mean(axis=0)
    self._var[c, :] = X_c.var(axis=0)
    self._priors[c] = X_c.shape[0] / float(n_samples)

the code will try to access self._mean[6, :], which is out of bounds.
Shouldn't it be

for index, c in enumerate(self._classes):

with index instead of c in the array indexing?
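In other words, the suggested fix would look like this (a sketch; the prediction path would then also need to index mean/var/priors by position rather than by raw label):

for idx, c in enumerate(self._classes):
    X_c = X[y == c]
    self._mean[idx, :] = X_c.mean(axis=0)
    self._var[idx, :] = X_c.var(axis=0)
    self._priors[idx] = X_c.shape[0] / float(n_samples)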

Multiclass SVM classifier

Hello,
Could you please provide an example implementation of a multiclass SVM classifier from scratch?
Thanks!
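Not the author, but a common way to get multiclass behavior out of a binary SVM like the one in this repo is one-vs-rest: train one binary SVM per class and predict the class with the largest raw score. A minimal sketch, assuming a binary SVM class with fit(X, y) taking labels in {-1, +1} and learned attributes w and b (the attribute names are assumptions):

import numpy as np

class OneVsRestSVM:
    def __init__(self, svm_factory):
        self.svm_factory = svm_factory          # callable returning a fresh binary SVM

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.clfs_ = []
        for c in self.classes_:
            y_bin = np.where(y == c, 1, -1)     # current class vs. the rest
            clf = self.svm_factory()
            clf.fit(X, y_bin)
            self.clfs_.append(clf)
        return self

    def predict(self, X):
        # pick the class whose binary SVM produces the largest raw score
        scores = np.array([np.dot(X, clf.w) - clf.b for clf in self.clfs_])
        return self.classes_[np.argmax(scores, axis=0)]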

Clustering Predict a single tuple

Hey guys, I understand how the clustering works, but how do I save this model and predict on a new tuple? I want to put this model into production, hence the hassle.
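Not the author, but once training is done the fitted state of k-means is just the centroids, so you can persist those and assign any new point to the nearest one. A minimal sketch, assuming the fitted object exposes a centroids attribute (the attribute name is an assumption):

import pickle
import numpy as np

# after training: persist just the fitted centroids
with open("kmeans_centroids.pkl", "wb") as f:
    pickle.dump(kmeans.centroids, f)

# later, in production: load them and assign a single new point
with open("kmeans_centroids.pkl", "rb") as f:
    centroids = np.array(pickle.load(f))

x_new = np.array([1.5, -0.3])
distances = np.sqrt(((centroids - x_new) ** 2).sum(axis=1))  # distance to each centroid
cluster = int(np.argmin(distances))                          # index of the nearest centroid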

Euclidean distance should be sqrt((x1-x2)**2 + (y1-y2)**2)

The Euclidean distance should be sqrt((x1-x2)**2 + (y1-y2)**2), or, written more concisely with numpy:

euclid_dist = np.linalg.norm(np.array(feature) - np.array(predict))

import warnings
from collections import Counter
import numpy as np

def knn(data, predict, k=3):
    # data: dict mapping group label -> list of feature vectors
    if len(data) >= k:
        warnings.warn("k is set to a value less than the total number of voting groups")
    distances = []
    for group in data:
        for feature in data[group]:
            euclid = np.linalg.norm(np.array(feature) - np.array(predict))
            distances.append([euclid, group])
    # labels of the k nearest points, then majority vote
    votes = [d[1] for d in sorted(distances)[:k]]
    vote_result = Counter(votes).most_common(1)[0][0]
    return vote_result
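A quick usage sketch with a toy dataset in the dict-of-lists format the function above expects:

data = {"r": [[1, 2], [2, 3], [3, 1]], "b": [[6, 5], [7, 7], [8, 6]]}
print(knn(data, predict=[5, 7], k=3))   # the 3 nearest points are all "b"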

Regression tree

Hey,
Could you please provide an implementation of a regression tree from scratch?
Thanks!
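Not the author, but for anyone attempting this: the main changes relative to the classification tree in this repo are the split criterion (variance reduction instead of information gain) and leaves that store the mean target value instead of the most common label. A minimal sketch of such a criterion, as an assumption about how the existing code could be adapted:

import numpy as np

def variance_reduction(y, left_idxs, right_idxs):
    # regression analogue of information gain:
    # how much a candidate split reduces the variance of the targets
    if len(left_idxs) == 0 or len(right_idxs) == 0:
        return 0.0
    n = len(y)
    var_parent = np.var(y)
    var_children = (len(left_idxs) / n) * np.var(y[left_idxs]) \
                 + (len(right_idxs) / n) * np.var(y[right_idxs])
    return var_parent - var_children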
