Forest Cover Type Prediction

Capstone project (RS School ML Course 2022)

This project uses the Forest Cover Type Prediction dataset.

Project structure

This package was developed on WSL (Ubuntu 20.04) with Python 3.9.2.

|- .github/workflows                   <---- GitHub Actions configuration
|- assets                              <---- Screenshots of the code checks
|- models                              <---- Model predictions
|- src
|   |- cover_type_classifier           <---- Source code for the project
|       |- models                      <---- Scripts for training and tuning models and making predictions
|       |   |- knn_train.py
|       |   |- random_forest_train.py
|       |- data                        <---- Scripts for data processing (EDA report)
|       |   |- generate_eda.py
|       |   |- get_dataset.py
|       |- __init__.py                 <---- Makes cover_type_classifier a Python package
|- tests                               <---- Tests for model and data functions
|   |- conftest.py
|   |- test_data.py
|   |- test_models.py
|- .gitignore
|- LICENSE
|- README.md                           <---- Project description
|- mypy.ini                            <---- mypy configuration
|- noxfile.py                          <---- nox configuration
|- poetry.lock                         <---- Locked dependency versions
|- pyproject.toml                      <---- Project dependencies

Configuring local environment

This package allows you to train a model for forest cover type prediction.

Before using this package, check your Python version with the following command:

python --version

This project requires Python 3.9 or later.
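The same check can be done from inside Python, for example at the top of a setup script. The sketch below is only an illustration: the helper name `python_is_supported` is hypothetical, and the minimum of 3.9 is an assumption based on the requirement stated above and on the 3.9.2 interpreter used during development.

```python
import sys

# Assumed minimum, taken from the requirement stated in this README.
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```

Comparing tuples keeps the check correct for versions such as 3.10, where a string comparison would sort "3.10" before "3.9".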

Also check whether Poetry is installed.

poetry --version

If both are installed, proceed with the setup:

1. Clone the repository.
2. Download the dataset from the Forest Cover Type Prediction website and save the CSV files locally (the default paths are data/external/train.csv and data/external/test.csv in the repository's root).
3. Install the necessary dependencies by running one of the following commands in a terminal.

Possible options:

To only use the package, run

poetry install --no-dev

If you want to use this package for development, run

poetry install

Usage

This project provides the following capabilities:

Generate an EDA report using pandas-profiler or sweetviz and save the report on the local machine.

poetry run generate-eda \
--profiler <pandas-profiler or sweetviz> \
--dataset-path <path to csv file> \
--report-path <directory, to save report>

Example of the command output

Train classifiers
NOTE (for reviewers): Making predictions for the data in test.csv takes a long time, so to make the homework check faster I added the nrows parameter, which reads only part of the data. By default, all rows of the dataset are read.
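The effect of a row limit like nrows can be sketched as follows. This is not the project's code: `read_rows` is a hypothetical helper that uses only the standard library, and it assumes the parameter counts data rows after the header. If the scripts load data with pandas, `pandas.read_csv(path, nrows=n)` would be the closest equivalent.

```python
import csv
from itertools import islice


def read_rows(path, nrows=None):
    """Read a CSV file into a list of dicts, stopping after `nrows` data rows.

    With nrows=None every row is read, mirroring the default described above.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        # islice(reader, None) yields every row; islice(reader, n) stops early.
        return list(islice(reader, nrows))
```

For a quick check one might call `read_rows("data/external/test.csv", nrows=1000)`, which avoids parsing the rest of the file.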

kNN

poetry run knn-train \
--dataset-path <path to train data> \
--test-path <path to test data> \
--report-path <path, where to save predictions> \
--nrows <number of rows to read from file> \
--n-neighbors <knn param: number of neighbors> \
--weights <knn param: distance weights> \
--min-max-scaler <use feature scaling> \
--remove-irrelevant-features <removes irrelevant features>
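To show what the kNN options plausibly control, here is a minimal sketch that assumes the script is built on scikit-learn. The helper `build_knn_pipeline` and the synthetic data are hypothetical stand-ins, not the project's implementation or the real dataset; the option names are mapped to `KNeighborsClassifier(n_neighbors=..., weights=...)` and `MinMaxScaler` by analogy only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_knn_pipeline(n_neighbors=5, weights="uniform", use_scaler=True):
    """Hypothetical helper: an optional scaler followed by a kNN classifier."""
    steps = []
    if use_scaler:
        # kNN is distance-based, so unscaled features with large ranges dominate.
        steps.append(("scaler", MinMaxScaler()))
    steps.append(
        ("knn", KNeighborsClassifier(n_neighbors=n_neighbors, weights=weights))
    )
    return Pipeline(steps)


# Synthetic stand-in for the training data (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0
)
pipeline = build_knn_pipeline(n_neighbors=7, weights="distance")
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f}")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into the scaling.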

Random Forest

poetry run rf-train \
--dataset-path <path to train data> \
--test-path <path to test data> \
--report-path <path, where to save predictions> \
--nrows <number of rows to read from file> \
--max-features <random forest param: number of features, used in each tree> \
--n-estimators <random forest param: the number of trees in the forest> \
--min-samples-leaf <random forest param: minimum number of samples required to be at a leaf node> \
--min-max-scaler <use feature scaling> \
--remove-irrelevant-features <removes irrelevant features>
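The random forest options read like the parameters of scikit-learn's `RandomForestClassifier`, so the following sketch assumes that mapping. It is an illustration on synthetic data, not the project's training script. Note that in scikit-learn `max_features` is the number of features considered when searching for each split, which may differ from how the option is described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training data (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(
    n_estimators=50,      # assumed counterpart of --n-estimators
    max_features="sqrt",  # assumed counterpart of --max-features
    min_samples_leaf=2,   # assumed counterpart of --min-samples-leaf
    random_state=0,
)
model.fit(X_train, y_train)
accuracy = model.score(X_valid, y_valid)
print(f"validation accuracy: {accuracy:.3f}")
```

Larger `min_samples_leaf` values produce shallower trees, which trades some training accuracy for less overfitting.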

Example of command output

Experiments

The manual parameter tuning experiments can be viewed in the MLflow UI. Run MLflow using the following command:

  poetry run mlflow ui

Then follow the link listed under Listening at (for example, http://127.0.0.1:<port>). After clicking on the link you will see the scoreboard with the results of the experiments. For example, the following screenshot shows the results of the experiment with kNN and Random Forest Classifier.

Development

Formatting and linting project

To format the code, run the following command:
```
poetry run black src
```
To lint the code, run the following command:
```
poetry run flake8
```
You will then get output similar to this:

Checking type annotations

To check the static type annotations, run the following command:
```
poetry run mypy src
```
If there are no errors, you will get the following report:

Testing the code

To run the tests, type the following command in the terminal:
```
poetry run pytest
```
If all tests pass, you will get the following report:

Running multiple sessions

All the steps above can be combined into one command using nox:
```
poetry run nox
```
If all sessions complete successfully, you will get the following report:

This repository also uses GitHub Actions to lint and test the code when you commit changes. If there are no errors, your Actions tab will look like the following:
