roboticsclubiitj / ml-dl-implementation
An implementation of ML and DL algorithms from scratch in Python using nothing but NumPy and Matplotlib.
License: BSD 3-Clause "New" or "Revised" License
This is certainly one of the least implemented models. Still, reference literature associated with it is listed below:
https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/ <---- (Focus on Divisive Model and Components for each Hierarchical Clustering Model)
https://www.geeksforgeeks.org/ml-hierarchical-clustering-agglomerative-and-divisive-clustering/ <---(Focus on Insight about Divisive Model)
Video links are also attached for better understanding. If research papers are still required for clarity after all these, I will post them here.
https://youtu.be/MIWVfCcHzM4
https://youtu.be/Fm01pqWLqzU
For program structure, refer to K-Means Clustering in models.py and its utilities in the Utils folder, both contained in the MLlib folder.
To understand more about the influence of parameters, hyper-parameters and the input dataset on a machine learning model, the following graphs are important:
Reference literature for visualizations related to machine learning models is listed below:
https://towardsdatascience.com/machine-learning-visualization-fcc39a1e376a
https://towardsdatascience.com/data-visualization-for-machine-learning-and-data-science-a45178970be7
Can I work on this? @agrawalshubham01 @kwanit1142
Currently, all the examples are placed in the root directory of the repository. For now, there are only 2 examples. However, later this may become an issue and clutter up the root directory. We wish to move these examples into a root/examples directory for better file management.
This is similar to Multinomial Naive Bayes, but the predictors are boolean variables. The parameters that we use to predict the class variable take only the values yes or no, for example whether a word occurs in the text or not.
Resources:
https://kenzotakahashi.github.io/naive-bayes-from-scratch-in-python.html
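The description above can be illustrated with a minimal sketch. The class name, method names and the `alpha` smoothing parameter are assumptions made for illustration and are not part of the library:

```python
import numpy as np


class BernoulliNaiveBayesSketch:
    """Hypothetical sketch of Bernoulli Naive Bayes for 0/1 features."""

    def fit(self, X, y, alpha=1.0):
        X = np.asarray(X)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n_classes, n_features = len(self.classes_), X.shape[1]
        self.log_prior_ = np.zeros(n_classes)
        self.feature_prob_ = np.zeros((n_classes, n_features))
        for i, c in enumerate(self.classes_):
            X_c = X[y == c]
            # Log of the fraction of samples that belong to class c.
            self.log_prior_[i] = np.log(X_c.shape[0] / X.shape[0])
            # Smoothed P(feature = 1 | class c); alpha=1 is Laplace smoothing.
            self.feature_prob_[i] = (
                (X_c.sum(axis=0) + alpha) / (X_c.shape[0] + 2 * alpha)
            )
        return self

    def predict(self, X):
        X = np.asarray(X)
        log_p = np.log(self.feature_prob_)
        log_not_p = np.log(1.0 - self.feature_prob_)
        # A present feature contributes log p, an absent one log(1 - p).
        scores = self.log_prior_ + X @ log_p.T + (1 - X) @ log_not_p.T
        return self.classes_[np.argmax(scores, axis=1)]
```

Unlike the multinomial variant, an absent feature also contributes to the score, through the `log(1 - p)` term.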
You can use the following resource for reference: https://towardsdatascience.com/polynomial-regression-bbe8b9d97491
The resource uses sklearn, but we have to implement a similar API in our library.
Implement polynomial regression in the models.py file in the MLlib directory.
Also, add an example showcasing your additions in the examples folder. If you need to add a dataset for that, feel free to add it in the datasets directory.
You may also inherit from the LinearRegression class for your implementation, just as Logistic Regression does.
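As a rough sketch of the idea, polynomial regression is linear regression on the expanded features 1, x, x^2, and so on. The names below are hypothetical, and the fit uses NumPy's closed-form least squares rather than the library's LinearRegression and optimizer, which an actual implementation would presumably reuse:

```python
import numpy as np


def polynomial_features(x, degree):
    """Return the columns [1, x, x^2, ..., x^degree] for a 1-D input."""
    x = np.asarray(x, dtype=float).reshape(-1)
    return np.vander(x, degree + 1, increasing=True)


class PolynomialRegressionSketch:
    """Hypothetical sketch: linear least squares on polynomial features."""

    def __init__(self, degree=2):
        self.degree = degree

    def fit(self, x, y):
        features = polynomial_features(x, self.degree)
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        # Closed-form least squares stands in for gradient descent here.
        self.weights, *_ = np.linalg.lstsq(features, y, rcond=None)
        return self

    def predict(self, x):
        return polynomial_features(x, self.degree) @ self.weights
```

Because only the feature expansion is new, inheriting from a linear model and transforming X before calling its fit and predict methods would likely be enough.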
Reference literature for the working and specific details:
https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/
https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/ <-----(Focus only on Agglomerative Model)
For program structure, refer to K-Means Clustering in models.py and its utilities in the Utils folder, both contained in the MLlib folder.
Hello,
I have checked out the code and found that you are using only Sigmoid as your activation function.
I would like to add other activations such as Softmax, ReLU, Leaky ReLU, Tanh, etc.
I'm also willing to add other variants of the ReLU activation function, so that this package covers a wider range of applications.
Can I work on this? @agrawalshubham01
Resources:
https://towardsdatascience.com/multinomial-naive-bayes-classifier-for-text-analysis-python-8dd6825ece67
https://medium.com/@johnm.kovachi/implementing-a-multinomial-naive-bayes-classifier-from-scratch-with-python-e70de6a3b92e
PS: The resources might use pandas; however, the implementation should use pure Python with NumPy.
This issue is for later stages of development. Once we have implemented all the basic algorithms, we need to build our own neural network from our module without any library other than NumPy and Matplotlib.
There are many classes and functions where docstrings haven't been added.
The current implementation of the leakyRelu activation function does not take α, the slope of the activation function for X<0, as a parameter and directly assumes it to be 0.01, which is pretty much the standard. However, an option to choose a custom α (defaulting to 0.01) would be more convenient for experimenting with models.
Can I work on this? @agrawalshubham01
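A minimal sketch of the proposed change is shown here. The function name `leaky_relu` and the parameter name `alpha` are assumptions and may differ from what activations.py actually uses:

```python
import numpy as np


def leaky_relu(X, alpha=0.01):
    """Leaky ReLU: X where X > 0, alpha * X otherwise.

    alpha is the slope for negative inputs and defaults to 0.01.
    """
    X = np.asarray(X, dtype=float)
    return np.where(X > 0, X, alpha * X)
```

Existing callers keep the old behaviour through the default, while experiments can pass a different slope, for example `leaky_relu(X, alpha=0.1)`.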
AdaBoost is an important boosting algorithm in machine learning, as it helps to enhance the performance of decision trees on classification and regression problems. It would be really helpful if it were implemented and added to this repo.
Resources :-
https://medium.com/analytics-vidhya/implementing-an-adaboost-classifier-from-scratch-e30ef86e9f1b
https://www.kdnuggets.com/2020/12/implementing-adaboost-algorithm-from-scratch.html
https://towardsdatascience.com/adaboost-from-scratch-37a936da3d50
https://geoffruddock.com/adaboost-from-scratch-in-python/
Note: Although most of the logic would be the same, some things can vary for the Regressor configuration.
NOTE: Can only be worked on when issue #62 is solved and closed.
If the Tensor class has been implemented, we now need to incorporate it throughout our repository code.
Since the Tensor class should have all the same methods as a NumPy array, duck typing will come into play and everything should work together. However, we need to use Tensors wherever new NumPy arrays are generated and used.
Right now we don't have any way to see how our loss changes per epoch or how our parameters affect the loss of our algorithm. We need some visualisation through graphs, which can be done using matplotlib and cv2. Make some functions that can do this.
Describe the bug
Import errors occur in other files after updating the functions in activations.py
----------- coverage: platform linux, python 3.6.9-final-0 -----------
Coverage HTML written to dir htmlcov
=========================== short test summary info ============================
ERROR Examples/gaussian_naive_bayes_example.py - ImportError: cannot import n...
ERROR Examples/k_means_clustering_example.py - ImportError: cannot import nam...
ERROR Examples/knn_example.py - ImportError: cannot import name 'sigmoid'
ERROR Examples/linear_example.py - ImportError: cannot import name 'sigmoid'
ERROR Examples/logistic_example.py - ImportError: cannot import name 'sigmoid'
ERROR Examples/naive_bayes_example.py - ImportError: cannot import name 'sigm...
ERROR MLlib/loss_func.py - ImportError: cannot import name 'sigmoid'
ERROR MLlib/models.py - ImportError: cannot import name 'sigmoid'
ERROR MLlib/optimizers.py - ImportError: cannot import name 'sigmoid'
!!!!!!!!!!!!!!!!!!! Interrupted: 9 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 9 errors in 0.56s ===============================
To Reproduce
Steps to reproduce the behavior:
1. python3 -m pytest --doctest-modules --cov=./ --cov-report=html
Currently, all the activations in activations.py are simple functions.
However, for the future implementation of Neural Networks, we will also need a derivative method for each of these functions. You can look at loss_func.py for reference: there, each class represents a loss function and has both loss and derivative methods. You have to implement something similar for activations.
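One possible shape for this, sketched under assumptions: each activation becomes a class with a function method and a derivative method. The class names and the method names `activation` and `derivative` are hypothetical; the real ones should follow whatever convention loss_func.py uses:

```python
import numpy as np


class Sigmoid:
    @staticmethod
    def activation(X):
        return 1.0 / (1.0 + np.exp(-np.asarray(X, dtype=float)))

    @staticmethod
    def derivative(X):
        # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
        s = Sigmoid.activation(X)
        return s * (1.0 - s)


class Tanh:
    @staticmethod
    def activation(X):
        return np.tanh(np.asarray(X, dtype=float))

    @staticmethod
    def derivative(X):
        # d/dx tanh(x) = 1 - tanh(x)^2
        return 1.0 - np.tanh(np.asarray(X, dtype=float)) ** 2


class Relu:
    @staticmethod
    def activation(X):
        return np.maximum(0.0, np.asarray(X, dtype=float))

    @staticmethod
    def derivative(X):
        # Slope is 1 for positive inputs; 0 is used at and below zero.
        return (np.asarray(X, dtype=float) > 0).astype(float)
```

Keeping the function and its derivative on the same class lets a future backpropagation routine accept any activation object without knowing which one it is.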
Currently we are missing an important part of the repo, i.e. the tests. I believe this can't be done in one PR, so multiple small PRs could be made. The tests could be run with pytest and then integrated with travis-ci.
Can I work on this? @agrawalshubham01
This issue will be merged only when Decision trees are implemented.
Related Resources one can follow :-
Currently, some of the links in the algorithms table point to a separate branch, not the master branch, so updates will not show up in the code those links refer to. Can I fix this?
We need to write and link every algorithm and function implemented in our library in the readme.
Currently, there is no way to know what functions and algorithms have been implemented, so it is very hard for newcomers to understand the repository. We want to fix that by providing links in the Readme to all the functions and algorithms that are currently implemented :)
Can I make issue templates to enhance the workflow of the repository?
Related resources one can follow:
https://towardsdatascience.com/k-means-clustering-from-scratch-6a9d19cafc25
A flexible unsupervised method to reduce the dimensionality of the input dataset while preserving as much information and minimizing as much loss as possible.
Reference literature for understanding:
https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html
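Assuming the method meant here is principal component analysis, as the linked reference suggests, a minimal NumPy sketch could look like the following. The function name and return values are hypothetical choices, not an existing API in the library:

```python
import numpy as np


def pca_sketch(X, n_components):
    """Project X of shape (samples, features) onto its top principal axes.

    Returns the projected data, the principal axes (one per row),
    the variance explained by each axis, and the feature means.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal axes, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance, mean
```

The information lost by the reduction can be inspected by reconstructing the data with `projected @ components + mean` and comparing it to the original X.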
Related Resource one can follow :-
https://towardsdatascience.com/svm-implementation-from-scratch-python-2db2fc52e5c2
https://www.python-engineer.com/courses/mlfromscratch/07_svm/
https://pythonprogramming.net/svm-in-python-machine-learning-tutorial/
https://fordcombs.medium.com/svm-from-scratch-step-by-step-in-python-f1e2d5b9c5be
http://madhugnadig.com/articles/machine-learning/2017/07/29/implementing-svm-support-vector-machines-from-scratch-in-python.html
I think it is important to have contributing guidelines so that open-source contributors know how to contribute properly.
Can I work on this?
Right now we have implemented only Mean Squared Error and Logarithmic Error as loss functions in our module. As we implement more and more machine learning algorithms, we will need more loss functions. In this issue you can add one or more loss functions, such as cross-entropy.
Resources:-
https://keras.io/api/losses/ <--------(Most of the known loss functions)
https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
https://www.analyticsvidhya.com/blog/2019/08/detailed-guide-7-loss-functions-machine-learning-python-code/
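As an illustration of what an added loss could look like, here is a sketch of Mean Absolute Error for a linear model. It assumes the `loss(X, Y, W)` call shape seen elsewhere in the library and a matching `derivative(X, Y, W)`; the class name and the exact signatures are assumptions:

```python
import numpy as np


class MeanAbsoluteErrorSketch:
    """Hypothetical loss class for predictions of the form X @ W."""

    @staticmethod
    def loss(X, Y, W):
        # Average absolute difference between predictions and targets.
        return float(np.mean(np.abs(np.dot(X, W) - Y)))

    @staticmethod
    def derivative(X, Y, W):
        # Subgradient with respect to W; sign(0) = 0 is used at zero error.
        M = X.shape[0]
        return np.dot(X.T, np.sign(np.dot(X, W) - Y)) / M
```

Mean Absolute Error is used here only as a simple example; cross-entropy and the other losses in the linked resources would follow the same two-method pattern.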
Hello,
I would like to add one-hot encoding to utils.
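A possible starting point, with a hypothetical function name and return format, is sketched below:

```python
import numpy as np


def one_hot_encode(labels):
    """Return (encoded, classes) for a 1-D sequence of labels.

    encoded has one row per label and one column per distinct class,
    with columns ordered like the sorted unique labels in classes.
    """
    labels = np.asarray(labels).reshape(-1)
    classes, indices = np.unique(labels, return_inverse=True)
    encoded = np.zeros((labels.size, classes.size), dtype=int)
    # Set the column matching each label's class index to 1.
    encoded[np.arange(labels.size), indices] = 1
    return encoded, classes
```

Returning the class order alongside the matrix makes it possible to map predictions back to the original labels later.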
Wrong Function
The unit_step function in activations.py was implemented incorrectly by me and was merged anyway. I realized my mistake while writing its test script. Please allow me to correct it.
As discussed in the Gitter channel, the dimensions of the input matrices are stated incorrectly in loss_function.py and need to be changed. The dimensions of the W and Y vectors are also not mentioned. Apart from the documentation change, the mean squared error is divided by an additional 2, which is not part of the original loss function (compared to sklearn's method). Please allow me to make these changes and show you.
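The two conventions can be compared in a short sketch. The shapes in the docstring are assumptions based on weights being generated with one row per feature, and the `halved` flag is hypothetical, added only to show the difference:

```python
import numpy as np


def mean_squared_error(X, Y, W, halved=False):
    """Mean squared error of the linear predictions X @ W.

    Assumed shapes: X is (M, N), W is (N, 1), Y is (M, 1),
    where M is the number of samples and N the number of features.
    With halved=True the result is divided by an additional 2.
    """
    M = X.shape[0]
    residual = np.dot(X, W) - Y
    scale = 2 * M if halved else M
    return float(np.sum(residual ** 2) / scale)
```

The halved form only rescales the loss and its gradient by a constant, so it does not move the minimum, but the reported loss values will not match sklearn's unless the extra 2 is removed.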
The project is currently missing a Code of Conduct file, which is important for an open-source project.
So can I go for it? @rohansingh9001
Can I add a pull request template to this project to enhance the workflow of the repository?
Reference Literatures for this issue are as follows : -
https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234
https://towardsdatascience.com/20-popular-machine-learning-metrics-part-1-classification-regression-evaluation-metrics-1ca3e282a2ce <---------------------(Refer this first and foremost)
In continuation of my previous issue #43, where we predict the class for a given input, P(y|x), Gaussian Naive Bayes can further be used to model each feature given the class, P(x_i|y).
I will start working on it.
NOTE: Can only be worked on after issue #62 is completed and closed.
Autograd is a feature which lets us backpropagate gradients by automatically computing the gradients on a Tensor.
Such features come in very handy when implementing deep neural networks.
You can use the following resource to implement autograd - https://medium.com/@a.nikishaev/making-backpropagation-autograd-mnist-classifier-from-scratch-in-python-bec4f05ce09f
Currently, we are using all raw NumPy arrays in the directory.
However, we might need more functionality from these arrays specific to our class when we implement
features like Autogradient.
For now, we want to wrap all Numpy arrays in a simple Tensor class which inherits from the NumPy array class.
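A minimal sketch of such a wrapper, following NumPy's documented pattern for subclassing ndarray, is given below. The `requires_grad` attribute is a hypothetical placeholder for the kind of extra state a later autograd feature might need:

```python
import numpy as np


class Tensor(np.ndarray):
    """Thin ndarray subclass; behaves like a NumPy array."""

    def __new__(cls, data, requires_grad=False):
        # View the input data as a Tensor without changing its values.
        obj = np.asarray(data, dtype=float).view(cls)
        obj.requires_grad = requires_grad  # hypothetical extra attribute
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices too, so copy the attribute over.
        if obj is None:
            return
        self.requires_grad = getattr(obj, "requires_grad", False)
```

Because a Tensor is still an ndarray, existing code that calls functions such as `np.dot` on it should keep working unchanged.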
ML-DL-implementation/MLlib/models.py
Lines 8 to 29 in e30d8d4
In the above code only the last weights are saved. Instead, the user could be given the option to choose which weights to use:
import numpy as np

# GradientDescent and generate_weights are the library's own helpers,
# imported as in the existing MLlib/models.py.


class LinearRegression():

    def fit(self, X, Y, optimizer=GradientDescent, epochs=25, zeros=False):
        # Track the lowest-loss weights seen during training.
        self._weight = {'weight': None, 'loss': None}
        self.weights = generate_weights(X.shape[1], 1, zeros=zeros)
        print("Starting training with loss:",
              optimizer.loss_func.loss(X, Y, self.weights))
        for epoch in range(1, epochs + 1):
            print("======================================")
            self.weights = optimizer.iterate(X, Y, self.weights)
            loss = optimizer.loss_func.loss(X, Y, self.weights)
            print("epoch:", epoch)
            print("Loss in this step: ", loss)
            # Keep a copy of these weights if they are the best so far.
            if self._weight['loss'] is None or loss < self._weight['loss']:
                self._weight['loss'] = loss
                self._weight['weight'] = np.copy(self.weights)
        print("======================================\n")
        print("Finished training with final loss:",
              optimizer.loss_func.loss(X, Y, self.weights))
        print("=====================================================\n")

    def predict(self, X, weights='best'):
        # 'best' uses the lowest-loss weights, 'last' the final ones.
        if weights == 'best':
            return np.dot(X, self._weight['weight'])
        elif weights == 'last':
            return np.dot(X, self.weights)
        raise ValueError("weights must be 'best' or 'last'")
In this way we can save best/last weight.