"Machine Learning" project (Artificial Intelligence, UniGe)
A detailed report and a presentation on this project can be found here:
This repository contains an implementation of the Random Forest algorithm, a supervised machine learning method for classification and regression tasks. It combines the outputs of multiple decision trees into a single result: the majority vote for classification, or the average for regression. Random Forest extends the bagging method by adding feature randomness, and its effectiveness comes largely from the resulting low correlation between its models (trees).
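As a minimal sketch of the aggregation step described above, the snippet below combines hypothetical per-tree predictions by majority vote (classification) and by averaging (regression). The helper names `majority_vote` and `average`, and the sample predictions, are illustrative assumptions and are not taken from the notebook.

```python
from collections import Counter
from statistics import fmean


def majority_vote(labels):
    """Return the most common label among the per-tree predictions."""
    return Counter(labels).most_common(1)[0][0]


def average(values):
    """Return the arithmetic mean of the per-tree predictions."""
    return fmean(values)


# Hypothetical outputs of individual trees for a single input.
tree_class_predictions = ["cat", "dog", "cat", "cat", "dog"]
tree_regression_predictions = [2.0, 3.0, 4.0]

forest_class = majority_vote(tree_class_predictions)
forest_value = average(tree_regression_predictions)

print(forest_class)  # cat
print(forest_value)  # 3.0
```

Only the standard library is used here so that the aggregation rule itself stays visible; a real forest would collect these predictions from its fitted trees.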
Here's a more schematic representation of the Random Forest algorithm:
- Data Generation:
- For each tree, a training set is drawn from the original dataset with replacement; this is called the bootstrap sample.
- On average, about one-third of the original observations are not drawn into a given bootstrap sample. These form that tree's out-of-bag (oob) sample and can serve as test data.
- Construction and Training of the Model:
- Each tree in the ensemble is trained on its own bootstrap sample.
- Feature randomness is introduced by considering only a random subset of features when splitting each node, which increases diversity among the trees and reduces their correlation.
- Cross-Validation and Prediction:
- The oob sample is used to estimate the generalization error, playing a role similar to cross-validation.
- The algorithm's output is the majority vote (classification) or the average (regression) of the trees' predictions, which reduces variance and helps limit overfitting.
- The Random Forest algorithm is stable: a new data point may affect some individual trees, but it is unlikely to affect all of them, so the overall prediction changes little.
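The steps above can be sketched in a few lines. The first part draws one bootstrap sample by hand with NumPy and measures the fraction of rows left out-of-bag; the second part uses scikit-learn's `RandomForestClassifier` with `bootstrap`, `max_features`, and `oob_score` to illustrate the same ideas. The Iris dataset, the seed, the sample size, and the hyperparameter values are assumptions chosen for illustration and may differ from what the notebook uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# --- Bootstrap sampling and the out-of-bag (oob) set, by hand ---
rng = np.random.default_rng(0)   # fixed seed, only for repeatability
n_samples = 1000                 # illustrative dataset size
# Draw n_samples row indices with replacement: one bootstrap sample.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)
oob_mask = np.ones(n_samples, dtype=bool)
oob_mask[bootstrap_idx] = False  # rows never drawn are out-of-bag
oob_fraction = oob_mask.mean()   # roughly one-third of the rows

# --- The same ideas through scikit-learn ---
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    bootstrap=True,        # each tree is trained on its own bootstrap sample
    max_features="sqrt",   # random subset of features considered at each split
    oob_score=True,        # score each row using only trees that did not see it
    random_state=0,
)
forest.fit(X, y)

print(f"OOB fraction of one bootstrap sample: {oob_fraction:.3f}")
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")
```

The oob accuracy comes at no extra training cost, because each row is evaluated only by the trees whose bootstrap sample did not contain it.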
Advantages:
- It can be used in classification and regression problems.
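- It is typically more accurate than a single decision tree.
- It reduces the overfitting that individual decision trees are prone to.
- It is stable: small changes in the training data have little effect on the overall prediction.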
Disadvantages:
- It is considerably more complex than a single decision tree.
- Training takes longer than for a single decision tree, because many trees must be built.
To run the Random_Forest_algorithm.ipynb file, follow these steps:
- Download the Random_Forest_algorithm.ipynb file from the provided link.
- Install the required libraries by running the following commands in your terminal or command prompt:
pip install numpy
pip install pandas
pip install scikit-learn
pip install matplotlib
- Open the Random_Forest_algorithm.ipynb file using Jupyter Notebook or any other compatible IDE.
- Run the cells in the notebook sequentially, following the instructions and comments provided in the notebook.