Day1 of 66DaysOfData!
Logistic Regression:
- Logistic Regression is the appropriate regression analysis to conduct when the dependent variable is binary. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
- Binary (or Binomial) Logistic Regression handles cases where the dependent variable can take only two possible values.
- Multinomial Logistic Regression handles cases where the outcome can take more than two possible values (type A vs. type B vs. type C) that have no particular order.
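To make the binary case concrete, here is a minimal sketch of how a logistic regression model turns inputs into a probability and then into one of two labels. The weights, bias, and 0.5 threshold are hypothetical, hand-picked values for illustration; they are not fitted to any data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict(features, weights, bias, threshold=0.5):
    """Binary label: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters chosen by hand (assumption, not learned values).
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)   # z = 1.0, so p is about 0.73
label = predict([2.0, 1.0], weights, bias)     # 1, because p >= 0.5
```

In practice the weights and bias would be learned from labeled data rather than typed in.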
Day2 of 66DaysOfData!
Gradient Descent:
- Gradient descent is an iterative optimization algorithm for finding the minimum of a function; on a convex function, any minimum it reaches is the global one. It tries to obtain minimal loss in a model by tuning the weights/parameters of the objective function. It is used to train models such as linear regression and is the basis for many other optimization techniques.
- There are three types of Gradient Descent: (i) Batch Gradient Descent, (ii) Stochastic Gradient Descent, and (iii) Mini-Batch Gradient Descent.
Steps to achieve minimal loss:
1. Decide on your cost function.
2. Choose random initial values for the parameters θ.
3. Find the derivative of your cost function with respect to θ.
4. Choose an appropriate learning rate.
5. Update your parameters until you converge. At that point you have found the optimal θ values, where your cost function is at its minimum.
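The steps above can be sketched as batch gradient descent on a tiny linear regression problem. This is only an illustration under stated assumptions: the cost is mean squared error, the data are made up (generated from y = 1 + 2x), and the parameters start at zero instead of random values so the run is reproducible.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iters=2000):
    """Batch gradient descent for y ~ theta0 + theta1 * x with an MSE cost."""
    # Step 2: initial parameter values (zeros here, an assumption for reproducibility).
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        # Step 3: partial derivatives of J = (1/n) * sum((prediction - y)^2).
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = (2.0 / n) * sum(errors)
        grad1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Steps 4 and 5: step against the gradient, scaled by the learning rate.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Made-up data that follow y = 1 + 2x exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)  # approaches (1.0, 2.0)
```

Because every update uses all samples, this is the batch variant; the stochastic and mini-batch variants would compute the same gradients on one sample or a small subset per update.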
Day3 of 66DaysOfData!
Perceptron Algorithm:
- The Perceptron is one of the simplest ANN architectures, invented by Frank Rosenblatt. It is based on a slightly different artificial neuron called a threshold logic unit (TLU).
- The Perceptron algorithm is a simple classification method that played an important role in the development of the much more flexible neural networks. Perceptrons can be trained using the stochastic gradient descent optimization algorithm.
- It consists of a single node, or neuron, that takes a row of data as input and predicts a class label. This is achieved by calculating the weighted sum of the inputs plus a bias term (an extra input fixed to 1 with its own weight). This weighted sum is called the activation, and the class label is obtained by thresholding it.
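As a rough sketch of the description above, the following trains a single perceptron on the logical AND function. The dataset, the learning rate of 1.0, the number of epochs, and the choice to output 1 when the activation is non-negative are all assumptions made for illustration. The bias is kept as a separate term, which is equivalent to a weight on an input fixed to 1.

```python
def step(activation):
    """Threshold unit: output 1 when the weighted sum is non-negative."""
    return 1 if activation >= 0 else 0

def predict(row, weights, bias):
    """Weighted sum of the inputs plus bias (the activation), then threshold."""
    activation = sum(w * x for w, x in zip(weights, row)) + bias
    return step(activation)

def train_perceptron(rows, labels, learning_rate=1.0, n_epochs=20):
    """Update the weights one sample at a time, only when a prediction is wrong."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(n_epochs):
        for row, target in zip(rows, labels):
            error = target - predict(row, weights, bias)  # -1, 0, or 1
            weights = [w + learning_rate * error * x for w, x in zip(weights, row)]
            bias += learning_rate * error
    return weights, bias

# Logical AND: linearly separable, so the perceptron rule can learn it.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(rows, labels)
predictions = [predict(r, weights, bias) for r in rows]  # [0, 0, 0, 1]
```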
Day4 of 66DaysOfData!
K Nearest Neighbor:
- K-Nearest Neighbor (KNN) is a supervised machine learning algorithm that is used to solve classification as well as regression problems.
- It is among the earliest machine learning algorithms developed and, due to its simple nature, it is still widely used in solving many industrial problems.
- Whenever a new test sample arrives, it measures the similarity of the test sample to the training samples.
Properties which might define KNN well:
1. Lazy learning algorithm: KNN does not have a specialized training phase; it stores the training data and uses all of it at classification time.
2. Non-parametric learning algorithm: KNN doesn't assume anything about the underlying data.
Steps to be carried out during the KNN algorithm are as follows:
1. Select the number of neighbors, K, that we want to consider.
2. Find the K nearest neighbors based on a distance metric, which can be Euclidean, Manhattan, or a custom distance metric. (Euclidean is the most commonly used.)
3. Among the selected K neighbors, count how many belong to each class.
4. Assign the test sample to the class with the highest neighbor count.
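The four steps above can be sketched in a few lines of plain Python. The training points, the labels "A" and "B", and k=3 are made-up values for illustration, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the query to every training point, closest first.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Steps 3 and 4: count the labels among the neighbors and take the majority.
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up two-cluster data (assumption for illustration).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(train_points, train_labels, (2.0, 2.0), k=3)  # "A"
near_b = knn_predict(train_points, train_labels, (6.0, 6.5), k=3)  # "B"
```

Note that there is no training function at all, which is what makes KNN a lazy learner: all the work happens at prediction time.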
Day5 of 66DaysOfData!
Decision Tree:
- A decision tree is a powerful and popular tool for classification and regression. It splits the data on feature values into branches at decision nodes (e.g., if a feature is a color, each possible color becomes a new branch) until a final decision output is made.
- Generally, a decision tree is nothing but a giant structure of nested if-else conditions. Geometrically, a decision tree uses hyperplanes that run parallel to one of the axes to cut the coordinate system into hypercuboids.
- Also, I learned about entropy, Gini impurity, information gain, hyperparameters, overfitting, and underfitting in decision trees.
- For regression, purity means the first child should have observations with high values of the target variable and the second should have observations with low values. Similarly, for classification, purity means the first child should have observations primarily of one class and the second should have observations primarily of another.
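The entropy, Gini impurity, and information gain mentioned above can be computed directly from class labels. The sketch below uses a made-up "yes"/"no" split to show how a purer pair of children yields a positive information gain; the function names and data are illustrative assumptions.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Made-up split: a perfectly mixed parent and two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

parent_gini = gini(parent)                    # 0.5, the maximum for two classes
gain = information_gain(parent, left, right)  # about 0.278 bits
```

A tree-building algorithm would evaluate candidate splits this way and keep the one with the lowest impurity or highest gain.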
Day6 of 66DaysOfData!
Ensemble Voting Classifier:
- A voting ensemble is an ensemble machine learning model that combines the predictions from multiple other models. A voting classifier supports hard and soft voting: hard voting picks the class label with the highest number of votes, whereas soft voting classifies the input based on the combined predicted probabilities from the different classifiers. Weights assigned to each classifier are applied accordingly when the votes or probabilities are combined. I have presented the implementation of the Voting Classifier using the Iris dataset here in the snapshot. Excited about the days ahead!
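To illustrate the difference between hard and soft voting, here is a small plain-Python sketch. It is not the implementation from the snapshot: the three classifiers' probability outputs are hypothetical numbers, and soft voting is assumed to be a (optionally weighted) average of the class probabilities.

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(probabilities, weights=None):
    """Average per-class probabilities (optionally weighted); return best class and averages."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged

# Hypothetical outputs of three classifiers for one sample with classes 0 and 1.
probabilities = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
labels = [max(range(2), key=lambda c: p[c]) for p in probabilities]  # [0, 0, 1]

hard_result = hard_vote(labels)               # 0: two of three classifiers vote for class 0
soft_result, _ = soft_vote(probabilities)     # 1: the confident third classifier tips the average
```

The two schemes disagree on this sample because hard voting ignores how confident each classifier is, while soft voting does not.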