roboticsclubiitj / ml-dl-implementation


46.0 4.0 68.0 12.12 MB

An implementation of ML and DL algorithms from scratch in python using nothing but NumPy and Matplotlib.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

python numpy machine-learning deep-learning nwoc woc matplotlib statistics hacktoberfest

ml-dl-implementation's Issues

Implement "Divisive Hierarchical Clustering"

This is one of the least commonly implemented models. Reference literature for it:

https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/ (focus on the divisive model and on the components of each hierarchical clustering model)

https://www.geeksforgeeks.org/ml-hierarchical-clustering-agglomerative-and-divisive-clustering/ (focus on the insight about the divisive model)

Video links are also attached below for better understanding. If research papers are still needed for clarity after all this, I will post them here.

https://youtu.be/MIWVfCcHzM4
https://youtu.be/Fm01pqWLqzU

For program structure, refer to the K-Means Clustering implementation in models.py and its utilities in the Utils folder inside the MLlib folder.
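One common divisive strategy is bisecting 2-means: start with every point in a single cluster and repeatedly split the cluster with the largest within-cluster sum of squares into two. The NumPy sketch below assumes that strategy (the references above also describe other splitting criteria); `two_means` and `divisive_clustering` are hypothetical names, not the MLlib API.

```python
import numpy as np


def two_means(X, n_iter=20, seed=0):
    """Split X into two groups with plain 2-means; returns a boolean mask."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to each of the two centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    mask = labels == 0
    if mask.all() or not mask.any():  # e.g. duplicate points: split arbitrarily
        mask = np.arange(len(X)) < len(X) // 2
    return mask


def divisive_clustering(X, n_clusters):
    """Top-down clustering: start with one cluster, keep splitting the worst one."""
    X = np.asarray(X, dtype=float)
    if n_clusters > len(X):
        raise ValueError("n_clusters cannot exceed the number of points")
    clusters = [np.arange(len(X))]  # each cluster is an array of row indices
    while len(clusters) < n_clusters:
        # Within-cluster sum of squares; singletons cannot be split any further.
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() if len(idx) > 1 else -1.0
               for idx in clusters]
        worst = clusters.pop(int(np.argmax(sse)))
        mask = two_means(X[worst])
        clusters.extend([worst[mask], worst[~mask]])
    labels = np.empty(len(X), dtype=int)
    for k, idx in enumerate(clusters):
        labels[idx] = k
    return labels
```

Choosing the cluster with the largest SSE is one of several possible rules; splitting the largest cluster by size is another common choice.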

Requirement of Graphs, corresponding to each of the Implemented Machine-Learning Models

To understand how parameters, hyperparameters and the input dataset influence a machine-learning model, the following graphs are important:

Loss vs. number of iterations.
Bias-variance tradeoff.
Visual graphs that show how a model works on an example input dataset.
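The data behind a bias-variance graph can be estimated by refitting a model on many freshly sampled noisy training sets and measuring how far the average prediction is from the true function (squared bias) and how much individual predictions scatter around that average (variance). The sketch below assumes a synthetic target `sin(2*pi*x)` with Gaussian noise and uses `np.polyfit` polynomials as the models; `bias_variance` is a hypothetical helper.

```python
import numpy as np


def bias_variance(degrees, n_trials=200, n_points=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of polynomial fits of each degree.

    The data is synthetic: y = sin(2*pi*x) + Gaussian noise on fixed x values.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n_points)
    f = np.sin(2 * np.pi * x)  # true (noise-free) function
    results = []
    for d in degrees:
        preds = np.empty((n_trials, n_points))
        for t in range(n_trials):
            y = f + rng.normal(0, noise, n_points)  # fresh noisy training set
            preds[t] = np.polyval(np.polyfit(x, y, d), x)
        mean_pred = preds.mean(axis=0)
        bias_sq = np.mean((mean_pred - f) ** 2)
        variance = np.mean(preds.var(axis=0))
        results.append((d, bias_sq, variance))
    return results
```

Plotting the second and third columns against the degree with `matplotlib.pyplot.plot` gives the familiar picture: bias falls and variance rises as model complexity grows.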

Reference literature for visualizing machine-learning models:

https://towardsdatascience.com/machine-learning-visualization-fcc39a1e376a

https://towardsdatascience.com/data-visualization-for-machine-learning-and-data-science-a45178970be7

https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_understanding_data_with_visualization.htm

Principal component analysis (PCA)

Can I work on this, @agrawalshubham01 @kwanit1142?

Move all examples into an examples directory for more managed file structure.

Currently, all the examples are placed in the root directory of the repository. There are only 2 examples for now, but as more are added they will clutter up the root directory. We want to move these examples into a root/examples directory for better file management.

Implementation of Bernoulli Naive Bayes classifier

This is similar to multinomial Naive Bayes, but the predictors are boolean variables: each feature used to predict the class takes only the values yes or no, for example whether a word occurs in the text or not.

Resources:
https://kenzotakahashi.github.io/naive-bayes-from-scratch-in-python.html
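A minimal NumPy sketch of Bernoulli Naive Bayes, assuming 0/1 feature matrices and Laplace smoothing with parameter `alpha`. Unlike the multinomial model, an absent feature (x_j = 0) also contributes evidence through log(1 - p_j). The class name and method names are illustrative, not the MLlib API.

```python
import numpy as np


class BernoulliNaiveBayes:
    """Naive Bayes for binary (0/1) features, with Laplace smoothing alpha."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        # log P(y) for each class
        self.log_prior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        # P(x_j = 1 | y), smoothed so no probability is exactly 0 or 1
        self.feature_prob = np.array([
            (X[y == c].sum(axis=0) + self.alpha) / ((y == c).sum() + 2 * self.alpha)
            for c in self.classes
        ])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        p = self.feature_prob
        # log P(x | y) = sum_j [x_j log p_j + (1 - x_j) log(1 - p_j)]
        log_likelihood = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
        return self.classes[np.argmax(log_likelihood + self.log_prior, axis=1)]
```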

Implement Polynomial regression

You can use the following resource for reference https://towardsdatascience.com/polynomial-regression-bbe8b9d97491.

The resource uses sklearn, but we have to implement a similar API in our library.
Implement polynomial regression in the models.py file in the MLlib directory.
Also, add an example showcasing your additions in the examples folder. If you need a dataset for it, feel free to add one to the datasets directory.

You may also inherit from the LinearRegression class for your implementation, just as Logistic Regression inherits from it.
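Polynomial regression is linear regression on expanded features [1, x, x², ..., x^d]. The standalone sketch below builds those features with `np.vander` and solves the least-squares problem with `np.linalg.lstsq`; inside MLlib, the same feature matrix could instead be passed to the existing LinearRegression training loop. `PolynomialRegression` and `polynomial_features` are hypothetical names.

```python
import numpy as np


def polynomial_features(x, degree):
    """Map a 1-D input x to columns [1, x, x^2, ..., x^degree]."""
    x = np.asarray(x, dtype=float).ravel()
    return np.vander(x, degree + 1, increasing=True)


class PolynomialRegression:
    """Least-squares polynomial fit (hypothetical class, not the MLlib API)."""

    def __init__(self, degree=2):
        self.degree = degree

    def fit(self, x, y):
        X = polynomial_features(x, self.degree)
        # Solve min ||X w - y||^2; lstsq is more stable than inverting X^T X.
        self.weights, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, x):
        return polynomial_features(x, self.degree) @ self.weights
```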

Implement "Agglomerative Hierarchical Clustering"

Reference literature for how it works and its specific details:

https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/

https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/ (focus only on the agglomerative model)

For program structure, refer to the K-Means Clustering implementation in models.py and its utilities in the Utils folder inside the MLlib folder.
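A naive sketch of agglomerative (bottom-up) clustering: every point starts as its own cluster and the two closest clusters are merged until `n_clusters` remain. The linkage options (single, complete, average) are the standard ones; the function name is hypothetical, and the O(n³) loop is meant for clarity rather than speed.

```python
import numpy as np


def agglomerative_clustering(X, n_clusters, linkage="single"):
    """Bottom-up clustering: start with singletons, merge the two closest clusters."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    link = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into a
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```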

Adding more Activation Functions

Hello,
I have checked out the code and found that you are using only Sigmoid as the activation function.
I would like to add other activations like Softmax, ReLU, Leaky ReLU, Tanh, etc.

Other variants of ReLU can be added in the script activation.py.

I'm willing to add other variants of the ReLU activation function to widen the range of applications of this package.
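Sketches of a few of the proposed activations as plain NumPy functions. The function names are assumptions rather than the existing activation.py API; softmax subtracts the row maximum before exponentiating, which leaves the result unchanged but avoids overflow for large inputs.

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def tanh(x):
    return np.tanh(x)


def softmax(x):
    """Softmax over the last axis; subtracting the max avoids overflow in exp."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)
```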

Add PCA algorithm

Can I work on this, @agrawalshubham01?

Create a proper README file for this module. Explain the implemented functions and the code structure properly.

Implementation of Multinomial Naive Bayes classifier

Resources:
https://towardsdatascience.com/multinomial-naive-bayes-classifier-for-text-analysis-python-8dd6825ece67
https://medium.com/@johnm.kovachi/implementing-a-multinomial-naive-bayes-classifier-from-scratch-with-python-e70de6a3b92e

PS: The resources might use pandas; however, the implementation should use pure Python with NumPy.
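A minimal NumPy sketch of multinomial Naive Bayes on count features (for example, word counts per document), assuming Laplace smoothing with parameter `alpha`. Working in log space turns the product of word probabilities into a matrix product. The class is illustrative, not the MLlib API.

```python
import numpy as np


class MultinomialNaiveBayes:
    """Naive Bayes for count features (e.g. word counts), with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        self.log_prior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        # Smoothed word counts per class -> log P(word | class)
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes]) + self.alpha
        self.log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # log P(y | x) up to a constant: log P(y) + sum_j x_j log P(word_j | y)
        scores = X @ self.log_likelihood.T + self.log_prior
        return self.classes[np.argmax(scores, axis=1)]
```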

Get Started with Neural Networks

This issue is for later stages of development. Once we have implemented all the basic algorithms, we need to build our own neural network from our module, without any library other than NumPy and Matplotlib.

Adding docstrings.

There are many classes and functions where docstrings haven't been added.

Inclusion of parameter α in leakyRelu

The current implementation of the leakyRelu activation function does not take the value of α (the slope of the activation function for X < 0) as a parameter and simply assumes it to be 0.01, which is pretty much the standard. However, an option to choose a custom α (defaulting to 0.01) would be more convenient for experimenting with models.
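A sketch of the requested signature: α becomes a keyword argument that defaults to 0.01, so existing calls keep their behaviour. The name `leaky_relu` is an assumption; the repository's function may be spelled differently.

```python
import numpy as np


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x >= 0 and alpha * x for x < 0 (alpha defaults to 0.01)."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, alpha * x)
```

For example, `leaky_relu([-2.0, 3.0])` gives `[-0.02, 3.0]`, while `leaky_relu([-2.0, 3.0], alpha=0.2)` gives `[-0.4, 3.0]`.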

Support vector machine (SVM) with kernels (Linear, Poly, RBF)

Can I work on this, @agrawalshubham01?
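The three kernels named in the title, written as functions that return the full Gram matrix between the rows of `X` and `Y`. The parameter names `degree`, `gamma` and `coef0` follow a common convention (also used by scikit-learn) and are assumptions here.

```python
import numpy as np


def linear_kernel(X, Y):
    return X @ Y.T


def polynomial_kernel(X, Y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * (X @ Y.T) + coef0) ** degree


def rbf_kernel(X, Y, gamma=1.0):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0))  # clip tiny negatives from rounding
```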

Adaboost Classifier and Regressor implementation from scratch

AdaBoost is an important boosting algorithm in machine learning, as it helps enhance the performance of decision trees on classification and regression problems. It would be really helpful to have it implemented in this repo.

Resources:

https://medium.com/analytics-vidhya/implementing-an-adaboost-classifier-from-scratch-e30ef86e9f1b
https://www.kdnuggets.com/2020/12/implementing-adaboost-algorithm-from-scratch.html
https://towardsdatascience.com/adaboost-from-scratch-37a936da3d50
https://geoffruddock.com/adaboost-from-scratch-in-python/

Note: Although most of the logic is the same, some details will differ for the regressor configuration.
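A sketch of discrete AdaBoost for binary classification with decision stumps as weak learners, assuming labels in {-1, +1}. Each round fits the stump with the lowest weighted error, gives it a vote weight of ½·ln((1 − err)/err), and increases the weight of the samples it misclassified. The regressor variant (for example AdaBoost.R2) is not covered, and all names are hypothetical.

```python
import numpy as np


class DecisionStump:
    """One-feature threshold classifier that outputs +1 or -1."""

    def fit(self, X, y, w):
        best_err = np.inf
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()  # weighted error
                    if err < best_err:
                        best_err = err
                        self.j, self.thr, self.polarity = j, thr, polarity
        return best_err

    def predict(self, X):
        return np.where(self.polarity * (X[:, self.j] - self.thr) >= 0, 1, -1)


class AdaBoostClassifier:
    """Discrete AdaBoost for labels in {-1, +1}."""

    def __init__(self, n_estimators=20):
        self.n_estimators = n_estimators

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        w = np.full(len(y), 1 / len(y))  # start with uniform sample weights
        self.stumps, self.alphas = [], []
        for _ in range(self.n_estimators):
            stump = DecisionStump()
            err = np.clip(stump.fit(X, y, w), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
            # Increase the weight of misclassified samples, then renormalise.
            w = w * np.exp(-alpha * y * stump.predict(X))
            w /= w.sum()
            self.stumps.append(stump)
            self.alphas.append(alpha)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        score = sum(a * s.predict(X) for a, s in zip(self.alphas, self.stumps))
        return np.where(score >= 0, 1, -1)
```

On one-dimensional data where the negative class sits in the middle (no single stump can separate it), a few boosting rounds are enough to combine stumps into an interval.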

Incorporate Tensor class in every model.

NOTE: Can only be worked on when issue #62 is solved and closed.

Once the Tensor class has been implemented, we need to incorporate it throughout the repository code.

Since the Tensor class should have all the same methods as a NumPy array, duck typing will let everything work together. However, we need to use Tensors wherever new NumPy arrays are created and used.

Add visualization through graphs to plot loss/epoch live while training.

Right now we don't have any way to see how our loss changes per epoch or how our parameters affect the loss of our algorithm. We need some visualization through graphs, which can be done using matplotlib and cv2. Write some functions that do this.
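A sketch of a live loss plot that uses only matplotlib (no cv2): the loss is appended after every epoch and the figure is redrawn in interactive mode. `LiveLossPlot` is a hypothetical helper; matplotlib is imported lazily so the loss history can still be recorded where it is not installed.

```python
class LiveLossPlot:
    """Collects the loss after every epoch and optionally redraws a live plot."""

    def __init__(self, draw=True):
        self.draw = draw
        self.losses = []
        self._fig = None

    def update(self, loss):
        self.losses.append(float(loss))
        if self.draw:
            self._redraw()

    def _redraw(self):
        import matplotlib.pyplot as plt  # imported only when plotting is requested

        if self._fig is None:
            plt.ion()  # interactive mode: drawing does not block training
            self._fig, self._ax = plt.subplots()
        self._ax.clear()
        self._ax.plot(range(1, len(self.losses) + 1), self.losses, marker="o")
        self._ax.set_xlabel("epoch")
        self._ax.set_ylabel("loss")
        plt.pause(0.01)  # give the GUI event loop time to repaint
```

Inside a training loop like the one in `LinearRegression.fit`, one would call `tracker.update(optimizer.loss_func.loss(X, Y, self.weights))` once per epoch.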

[Bug report] Test error

Describe the bug
After the functions in activations.py were updated, other files fail with an ImportError when importing from it.

----------- coverage: platform linux, python 3.6.9-final-0 -----------
Coverage HTML written to dir htmlcov

=========================== short test summary info ============================
ERROR Examples/gaussian_naive_bayes_example.py - ImportError: cannot import n...
ERROR Examples/k_means_clustering_example.py - ImportError: cannot import nam...
ERROR Examples/knn_example.py - ImportError: cannot import name 'sigmoid'
ERROR Examples/linear_example.py - ImportError: cannot import name 'sigmoid'
ERROR Examples/logistic_example.py - ImportError: cannot import name 'sigmoid'
ERROR Examples/naive_bayes_example.py - ImportError: cannot import name 'sigm...
ERROR MLlib/loss_func.py - ImportError: cannot import name 'sigmoid'
ERROR MLlib/models.py - ImportError: cannot import name 'sigmoid'
ERROR MLlib/optimizers.py - ImportError: cannot import name 'sigmoid'
!!!!!!!!!!!!!!!!!!! Interrupted: 9 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 9 errors in 0.56s ===============================

To Reproduce
Steps to reproduce the behavior:
1. python3 -m pytest --doctest-modules --cov=./ --cov-report=html

Convert Activations from simple functions to classes and add gradient method.

Currently, all the activations in activations.py are simple functions.

However, for the future implementation of neural networks, we will also need derivative methods for each of these functions. You can look at loss_func.py for reference: there, each class represents a loss function and has both loss and derivative methods. You have to implement something similar for activations.
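A sketch of the class-based layout, mirroring the loss/derivative pairing described for loss_func.py. The method names `activation` and `derivative` are assumptions, not taken from the repository. The sigmoid derivative reuses the forward value through σ'(x) = σ(x)(1 − σ(x)), and tanh uses 1 − tanh²(x).

```python
import numpy as np


class Sigmoid:
    """Sigmoid activation together with its derivative."""

    @staticmethod
    def activation(X):
        return 1 / (1 + np.exp(-X))

    @staticmethod
    def derivative(X):
        s = Sigmoid.activation(X)
        return s * (1 - s)


class Tanh:
    """Hyperbolic tangent activation together with its derivative."""

    @staticmethod
    def activation(X):
        return np.tanh(X)

    @staticmethod
    def derivative(X):
        return 1 - np.tanh(X) ** 2
```

A central finite difference, (f(x + h) − f(x − h)) / 2h, is a quick way to check each `derivative` in a unit test.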

Adding Tests for the repo

Currently the repo is missing an important part, i.e. the tests. I believe this can't be done in one PR, so multiple small PRs could be made. The tests could be run with pytest and then integrated with Travis CI.
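A sketch of a small pytest module. pytest collects functions whose names start with `test_`, and plain `assert` statements are enough. The `sigmoid` below is a stand-in defined inline; a real test file would import the function under test from MLlib instead.

```python
import numpy as np


def sigmoid(X):
    # Stand-in for the function under test; a real test would import it from MLlib.
    return 1 / (1 + np.exp(-X))


def test_sigmoid_at_zero():
    assert sigmoid(0.0) == 0.5


def test_sigmoid_is_symmetric():
    x = np.linspace(-5, 5, 11)
    np.testing.assert_allclose(sigmoid(x) + sigmoid(-x), 1.0)


def test_sigmoid_range():
    out = sigmoid(np.array([-100.0, 100.0]))
    assert np.all((out >= 0) & (out <= 1))
```

Saved as, for example, `tests/test_activations.py`, it runs with `python3 -m pytest`, the same command used in the bug report above.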

Add Linear_Regression_using_PyTorch

Can I work on this, @agrawalshubham01?

Implement Random Forests (After Decision trees)

A pull request for this issue will be merged only after decision trees are implemented.
Related resources one can follow:

Update Algorithms implemented table in Readme

Currently, some of the links in the algorithms table point to a separate branch instead of the master branch, so they do not reflect updates to the code. Can I fix this?

Improve Readme

We need to list and link every algorithm and function implemented in our library in the README.
Currently, there is no way to know what functions and algorithms have been implemented, so it is very hard for newcomers to understand the repository. We want to fix that by providing links in the Readme to all the functions and algorithms that are currently implemented :)

Add issue template

Can I make issue templates to enhance the workflow of the repository?

Implement K-Means Clustering

Related resources one can follow:
https://towardsdatascience.com/k-means-clustering-from-scratch-6a9d19cafc25
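A compact NumPy sketch of Lloyd's algorithm for K-Means: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when the centroids no longer change. Initialising from `k` random data points is one simple choice (k-means++ is a common alternative); `kmeans` is a hypothetical function, not the MLlib class.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```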

Implementing Huber loss function

Resources for the same:

https://towardsdatascience.com/importance-of-loss-function-in-machine-learning-eddaaec69519
https://towardsdatascience.com/understanding-the-3-most-common-loss-functions-for-machine-learning-regression-23e0ef3e14d3
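The Huber loss is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than MSE. The sketch below uses the usual definition with threshold `delta`, averaged over samples, plus its gradient with respect to the predictions; the function names and the `(y_true, y_pred)` signature are assumptions rather than the loss_func.py interface.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it; averaged over samples."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.mean(np.where(abs_err <= delta, quadratic, linear))


def huber_derivative(y_true, y_pred, delta=1.0):
    """Gradient of huber_loss with respect to y_pred."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.clip(err, -delta, delta) / err.size
```

With `delta=1`, errors of 0.5 and 3 contribute 0.125 and 2.5, so the mean loss is 1.3125.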

Implement Principal Component Analysis

PCA is a flexible unsupervised method for reducing the dimensionality of an input dataset while preserving as much of its information (variance) as possible, so that the reconstruction loss is minimized.

Reference literature for understanding:

https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html
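A minimal PCA sketch using the SVD of the centred data: the right singular vectors are the principal axes, and the squared singular values divided by (n − 1) are the variances along them. The class layout (`fit`, `transform`, `inverse_transform`) is an assumption modelled on common APIs, not existing MLlib code.

```python
import numpy as np


class PCA:
    """Project data onto the directions of largest variance (via SVD)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        # Rows of Vt are the principal axes, sorted by decreasing singular value.
        _, S, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = Vt[: self.n_components]
        variance = S ** 2 / (len(X) - 1)
        self.explained_variance_ratio = variance[: self.n_components] / variance.sum()
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean) @ self.components.T

    def inverse_transform(self, Z):
        return Z @ self.components + self.mean
```

For points lying exactly on a line in 3-D, one component explains all the variance and the round trip `inverse_transform(transform(X))` reproduces the data.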

Implementation of Bayesian optimization

Resources:
https://machinelearningmastery.com/what-is-bayesian-optimization/
https://towardsdatascience.com/bayesian-optimization-concept-explained-in-layman-terms-1d2bcdeaf12f

Implement Support Vector Machine as a Classifier and Regressor

Related resources one can follow:

https://towardsdatascience.com/svm-implementation-from-scratch-python-2db2fc52e5c2
https://www.python-engineer.com/courses/mlfromscratch/07_svm/
https://pythonprogramming.net/svm-in-python-machine-learning-tutorial/
https://fordcombs.medium.com/svm-from-scratch-step-by-step-in-python-f1e2d5b9c5be
http://madhugnadig.com/articles/machine-learning/2017/07/29/implementing-svm-support-vector-machines-from-scratch-in-python.html
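A sketch of a soft-margin linear SVM classifier trained by full-batch sub-gradient descent on the regularised hinge loss, assuming labels in {-1, +1}. This is one of the simpler from-scratch approaches (dual or SMO solvers are needed for kernels), and the hyperparameter names `lr`, `lam` and `epochs` are illustrative.

```python
import numpy as np


class LinearSVM:
    """Soft-margin linear SVM trained by sub-gradient descent on the hinge loss.

    Minimises  lam * ||w||^2 + mean(max(0, 1 - y * (X w + b)))  for y in {-1, +1}.
    """

    def __init__(self, lr=0.01, lam=0.01, epochs=1000):
        self.lr, self.lam, self.epochs = lr, lam, epochs

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        n, d = X.shape
        self.w, self.b = np.zeros(d), 0.0
        for _ in range(self.epochs):
            margins = y * (X @ self.w + self.b)
            active = margins < 1  # samples inside the margin or misclassified
            grad_w = 2 * self.lam * self.w - (y[active] @ X[active]) / n
            grad_b = -y[active].sum() / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict(self, X):
        return np.where(np.asarray(X, dtype=float) @ self.w + self.b >= 0, 1, -1)
```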

Add contributing.md file

I think it is important to have contributing guidelines so that open-source contributors know how to contribute properly.
Can I work on this?

Implement Naive Bayes classification

Related resources to follow:

Add more loss functions like cross-entropy...

Right now we have implemented only Mean Squared Error and Logarithmic Error as loss functions in our module. As we implement more and more machine learning algorithms, we will need more loss functions. In this issue, you can add one or more loss functions, such as cross-entropy.

Resources:

https://keras.io/api/losses/ (most of the well-known loss functions)

https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
https://www.analyticsvidhya.com/blog/2019/08/detailed-guide-7-loss-functions-machine-learning-python-code/
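A sketch of categorical cross-entropy with a loss/derivative pair, in the spirit of the classes in loss_func.py. Note the assumed signature: it takes one-hot targets and predicted probabilities, whereas the repository's losses are called as `loss(X, Y, weights)`, so an MLlib version would first compute the predictions from `X` and the weights. Probabilities are clipped to avoid `log(0)`.

```python
import numpy as np


class CrossEntropy:
    """Categorical cross-entropy on predicted probabilities (rows sum to 1)."""

    @staticmethod
    def loss(y_true, y_prob, eps=1e-12):
        # y_true is one-hot with shape (n_samples, n_classes)
        y_prob = np.clip(y_prob, eps, 1.0)
        return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

    @staticmethod
    def derivative(y_true, y_prob, eps=1e-12):
        # Gradient with respect to y_prob (not with respect to the logits)
        y_prob = np.clip(y_prob, eps, 1.0)
        return -y_true / y_prob / y_true.shape[0]
```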

Implement Decision Trees

Related resources one can follow:
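A minimal sketch of a classification tree (CART-style) that chooses splits by weighted Gini impurity and stores nodes as plain dicts. The splitting criterion, depth limit and function names are assumptions; entropy/information gain is an equally common criterion.

```python
import numpy as np


def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Return (feature, threshold) minimising the weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # the max value would leave one side empty
            left = X[:, j] <= thr
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best, best_score = (j, thr), score
    return best


def build_tree(X, y, depth=0, max_depth=5):
    values, counts = np.unique(y, return_counts=True)
    majority = values[np.argmax(counts)]
    split = best_split(X, y) if depth < max_depth else None
    if split is None:  # pure node, no useful split, or depth limit reached
        return {"leaf": majority}
    j, thr = split
    left = X[:, j] <= thr
    return {"feature": j, "threshold": thr,
            "left": build_tree(X[left], y[left], depth + 1, max_depth),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth)}


def predict_one(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```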

Implement One hot encoding

Hello,
I would like to add one-hot encoding to utils.
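A sketch of one-hot encoding with NumPy: `np.unique(..., return_inverse=True)` maps each label to the index of its category, and fancy indexing sets a single 1 per row. The function name and the returned `(encoded, categories)` pair are assumptions for illustration.

```python
import numpy as np


def one_hot_encode(labels):
    """Return (encoded, categories) where encoded[i, k] == 1 iff labels[i] == categories[k]."""
    labels = np.asarray(labels).ravel()
    categories, indices = np.unique(labels, return_inverse=True)
    encoded = np.zeros((len(indices), len(categories)))
    encoded[np.arange(len(indices)), indices] = 1
    return encoded, categories
```

Categories come out sorted, so `one_hot_encode(["cat", "dog", "cat", "bird"])` uses the column order `bird, cat, dog`.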

Wrong function in script activation.py

Describe the bug
The unit_step function in the script activation.py was implemented incorrectly by me and was merged anyway. I realized my mistake while writing its test script. Allow me to correct it.
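For reference, one common definition of the unit (Heaviside) step returns 1 for inputs at or above zero and 0 otherwise. Whether the value at exactly 0 should be 1 (as here), 0 or 0.5 is a convention the fix needs to pick explicitly. This is a sketch, not the repository's corrected code.

```python
import numpy as np


def unit_step(X):
    """Heaviside step: 1 where X >= 0, else 0 (the value at 0 is a convention)."""
    return np.where(np.asarray(X) >= 0, 1, 0)
```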

Implement the "Numerical outlier method" to detect anomalies/outliers in a dataset
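Assuming the "numerical outlier method" here means the interquartile-range rule (Tukey's fences), a value is flagged when it falls below Q1 − k·IQR or above Q3 + k·IQR, with k = 1.5 as the usual default. `iqr_outliers` is a hypothetical name.

```python
import numpy as np


def iqr_outliers(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (x < lower) | (x > upper)
```

For `[10, 11, 12, 13, 14, 15, 100]`, Q1 = 11.5 and Q3 = 14.5, so the fences are 7 and 19 and only 100 is flagged.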

Changes in loss functions

As we discussed in the Gitter channel, the dimensions of the input matrices are documented incorrectly in loss_func.py and need to be changed. The dimensions of the W and Y vectors are not documented at all. Apart from the documentation changes, the mean squared error also divides by an extra 2 that is not part of the standard loss function (compare sklearn's mean_squared_error). Allow me to apply these changes and show you.
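The difference between the two conventions in a short sketch: the standard MSE (as in sklearn's `mean_squared_error`) averages the squared errors, while the "half" variant multiplies by ½ so that its gradient, (ŷ − y)/n, has no factor of 2. Both have the same minimiser; only the scale changes. The function names are illustrative.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, as in sklearn.metrics.mean_squared_error."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)


def half_mse(y_true, y_pred):
    """Variant with an extra 1/2, often used so the gradient has no factor 2."""
    return 0.5 * mse(y_true, y_pred)
```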

Code of Conduct

The project is currently missing a Code of Conduct file, which is important for an open-source project.
Can I work on it, @rohansingh9001?

Implementing Absolute Error loss function

Resources for the same:
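A sketch of the absolute error loss averaged over samples (MAE), together with its sub-gradient with respect to the predictions. The absolute value is not differentiable at 0, and this sketch uses 0 there (what `np.sign` returns). The names and signature are assumptions, not the loss_func.py interface.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)))


def mae_derivative(y_true, y_pred):
    """Sub-gradient w.r.t. y_pred: sign(error) / n (0 is used where the error is 0)."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.sign(diff) / diff.size
```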

Add pull request template

Can I add pull request template to this project to enhance the workflow of the repository?

Need for score metrics to evaluate ML classifier models

Reference literature for this issue:

https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234

https://towardsdatascience.com/20-popular-machine-learning-metrics-part-1-classification-regression-evaluation-metrics-1ca3e282a2ce (refer to this first and foremost)
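A sketch of the most common binary classification metrics computed from true positives, false positives and false negatives. The function name and the `positive` parameter are assumptions; returning 0.0 when a denominator is zero is one convention among several.

```python
import numpy as np


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]` there are 2 true positives, 1 false positive and 1 false negative, giving accuracy 0.6 and precision, recall and F1 all equal to 2/3.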

Implementation of Gaussian Naive Bayes

This continues my previous #43, where we predict the class given the features, P(y|x). Gaussian Naive Bayes extends this to continuous features by modelling the likelihood of each feature given the class, P(x_i|y), as a normal distribution.
I will start working on it.
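A NumPy sketch of Gaussian Naive Bayes: per class, store the mean and variance of every feature, then score a sample by the class log-prior plus the sum of Gaussian log-densities. The small `var_smoothing` term, which keeps variances positive, is an assumption borrowed from common practice; the class is illustrative, not the MLlib code.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with a per-class Gaussian for each continuous feature."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes = np.unique(y)
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small constant keeps every variance strictly positive.
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + self.var_smoothing
        self.log_prior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # log N(x | mean, var) summed over features, for every (sample, class) pair
        diff_sq = (X[:, None, :] - self.mean[None, :, :]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None] + diff_sq / self.var[None]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior, axis=1)]
```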

Implement Ridge regression a.k.a L2 Regularization

Resources:
https://www.geeksforgeeks.org/implementation-of-ridge-regression-from-scratch-using-python/
https://machinelearningmastery.com/ridge-regression-with-python/
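A closed-form ridge regression sketch: w = (XᵀX + λI)⁻¹Xᵀy, computed with `np.linalg.solve` rather than an explicit inverse. Centring X and y first keeps the intercept out of the penalty, which is a common choice but an assumption here; `ridge_fit` is a hypothetical name. With λ = 0 it reduces to ordinary least squares.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y.

    The intercept is left unpenalised by centring X and y first.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b
```

Increasing `lam` shrinks the weight vector towards zero, trading a little bias for lower variance.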

Change All Functions and Matrices to accepted standards in the DL community.

Implement autograd.

NOTE: Can only be worked on after issue #62 is completed and closed.

Autograd lets us backpropagate by automatically computing the gradients of the operations performed on a Tensor.

Such features come in very handy when implementing deep neural networks.

You can use the following resource to implement autograd - https://medium.com/@a.nikishaev/making-backpropagation-autograd-mnist-classifier-from-scratch-in-python-bec4f05ce09f
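A scalar-level sketch of reverse-mode automatic differentiation: each operation records its inputs and a closure that applies the chain rule, and `backward()` runs those closures in reverse topological order. A Tensor-level autograd would apply the same idea to whole arrays; the `Value` class here is hypothetical and supports only `+` and `*`.

```python
class Value:
    """A scalar that records the operations producing it, for reverse-mode autodiff."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Visit nodes in topological order, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()
```

For z = x·y + x with x = 3 and y = 4, `z.backward()` gives dz/dx = y + 1 = 5 and dz/dy = x = 3; gradients accumulate with `+=` because x is used twice.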

Wrap all NumPy arrays in a Tensor class.

Currently, we are using all raw NumPy arrays in the directory.

However, we might need more functionality from these arrays specific to our class when we implement
features like Autogradient.

For now, we want to wrap all NumPy arrays in a simple Tensor class that inherits from the NumPy array class (ndarray).
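A sketch of the standard NumPy subclassing pattern: `__new__` views existing data as the subclass, and `__array_finalize__` copies custom attributes onto arrays created from it (views, slices and similar). The `requires_grad` attribute is an illustrative assumption, chosen with the autograd issue in mind.

```python
import numpy as np


class Tensor(np.ndarray):
    """A NumPy array subclass that can carry extra attributes (here: requires_grad)."""

    def __new__(cls, data, requires_grad=False):
        obj = np.asarray(data).view(cls)
        obj.requires_grad = requires_grad
        return obj

    def __array_finalize__(self, obj):
        # Called whenever a new Tensor is derived from another array (views, slices, ...).
        if obj is None:
            return
        self.requires_grad = getattr(obj, "requires_grad", False)
```

Because Tensor is still an ndarray, existing NumPy code keeps working on it unchanged, which is the duck typing the Tensor integration issue relies on.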

Option to save best weights during training

ML-DL-implementation/MLlib/models.py

Lines 8 to 29 in e30d8d4

```python
class LinearRegression():

    def fit(self, X, Y, optimizer=GradientDescent, epochs=25, zeros=False):
        self.weights = generate_weights(X.shape[1], 1, zeros=zeros)
        print("Starting training with loss:",
              optimizer.loss_func.loss(X, Y, self.weights))
        for epoch in range(1, epochs+1):
            print("======================================")
            self.weights = optimizer.iterate(X, Y, self.weights)
            print("epoch:", epoch)
            print("Loss in this step: ",
                  optimizer.loss_func.loss(X, Y, self.weights))
            print("======================================\n")
        print("Finished training with final loss:",
              optimizer.loss_func.loss(X, Y, self.weights))
        print("=====================================================\n")

    def predict(self, X):
        return np.dot(X, self.weights)
```

In the code above, only the weights from the last epoch are kept. Instead, the user could be given the option to choose which weights to use (the best or the last):

```python
import numpy as np
# GradientDescent and generate_weights are the same MLlib helpers that
# models.py already imports; their exact import paths are not shown here.


class LinearRegression():

    def fit(self, X, Y, optimizer=GradientDescent, epochs=25, zeros=False):
        self.weights = generate_weights(X.shape[1], 1, zeros=zeros)
        # Track the weights with the lowest loss seen so far.
        self.best_weights = self.weights.copy()
        self.best_loss = optimizer.loss_func.loss(X, Y, self.weights)
        print("Starting training with loss:", self.best_loss)
        for epoch in range(1, epochs+1):
            print("======================================")
            self.weights = optimizer.iterate(X, Y, self.weights)
            loss = optimizer.loss_func.loss(X, Y, self.weights)
            print("epoch:", epoch)
            print("Loss in this step: ", loss)
            if loss < self.best_loss:
                self.best_loss = loss
                self.best_weights = self.weights.copy()
            print("======================================\n")
        print("Finished training with final loss:",
              optimizer.loss_func.loss(X, Y, self.weights))
        print("=====================================================\n")

    def predict(self, X, weights='best'):
        # 'best' uses the lowest-loss weights, 'last' the final-epoch weights.
        if weights == 'best':
            return np.dot(X, self.best_weights)
        elif weights == 'last':
            return np.dot(X, self.weights)
        raise ValueError("weights must be 'best' or 'last'")
```

This way, both the best and the last weights are saved.

