This project uses the Forest Cover Type Prediction dataset.
This package was developed using WSL (Ubuntu 20.04) with Python 3.9.2.
|- .github/workflows <---- GitHub Actions configuration
|- assets <---- Screenshots of the code checks
|- models <---- Model predictions
|- src
| |- cover_type_classifier <---- Source code for the project
| | |- models <---- Scripts for training and tuning models and making predictions
| | | |- knn_train.py
| | | |- random_forest_train.py
| | |- data <---- Scripts for data processing (EDA report)
| | | |- generate_eda.py
| | | |- get_dataset.py
| | |- __init__.py <---- Marks the directory as a Python package
|- tests <---- Tests for model and data functions
| |- conftest.py
| |- test_data.py
| |- test_models.py
|- .gitignore
|- LICENSE
|- README.md <---- Project description
|- mypy.ini <---- mypy configuration
|- noxfile.py <---- nox configuration
|- poetry.lock <---- Locked dependency versions
|- pyproject.toml <---- Project metadata and dependencies
This package lets you train models for forest cover type prediction.
Before using this package, check your Python version with the following command:
python --version
This project requires Python 3.9 or later.
Also check whether Poetry is installed:
poetry --version
If everything is installed, move on to the usage instructions.
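The two checks above can also be scripted. The following is a minimal sketch, not part of the package: the helper name check_environment is hypothetical, and the minimum version is taken from the requirement stated above.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this README


def check_environment(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all is fine."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("poetry") is None:
        problems.append("Poetry not found on PATH")
    return problems


for issue in check_environment():
    print(issue)
```

An empty output means both prerequisites were found.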
- Clone the repository.
- Download the dataset from the Forest Cover Type Prediction website and save the csv files locally (the default paths are data/external/train.csv and data/external/test.csv in the repository's root).
- Install the necessary dependencies by running one of the following commands in the terminal.
Possible options:
To only use the package, run the command below (Poetry 1.2 and later deprecate --no-dev in favor of --without dev):
poetry install --no-dev
If you want to use this package for development, run
poetry install
This project provides the following features:
- Generate an EDA report using pandas-profiler or sweetviz and save the report locally.
poetry run generate-eda \
  --profiler <pandas-profiler or sweetviz> \
  --dataset-path <path to csv file> \
  --report-path <directory, to save report>
- Train classifiers
NOTE (for reviewers): Making predictions for the data in test.csv takes a long time, so to make the homework check faster I added the nrows parameter. It allows reading only part of the data. By default, all rows of the dataset are read.
kNN
poetry run knn-train \
  --dataset-path <path to train data> \
  --test-path <path to test data> \
  --report-path <path, where to save predictions> \
  --nrows <number of rows to read from file> \
  --n-neighbors <knn param: number of neighbors> \
  --weights <knn param: distance weights> \
  --min-max-scaler <use feature scaling> \
  --remove-irrelevant-features <removes irrelevant features>
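To illustrate what the kNN options control, here is a minimal sketch using scikit-learn. It is an assumption that the training script is built on scikit-learn, and build_knn is a hypothetical helper rather than the project's own code; the data is synthetic, not the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def build_knn(n_neighbors=5, weights="uniform", min_max_scaler=True):
    """Assemble a kNN classifier, optionally preceded by min-max scaling."""
    steps = []
    if min_max_scaler:
        # Scale every feature to [0, 1] so no single feature dominates distances
        steps.append(MinMaxScaler())
    steps.append(KNeighborsClassifier(n_neighbors=n_neighbors, weights=weights))
    return make_pipeline(*steps)


# Synthetic stand-in for the training data
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = build_knn(n_neighbors=7, weights="distance")
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(f"accuracy: {model.score(X_test, y_test):.3f}")
```

Here n_neighbors and weights are passed straight to KNeighborsClassifier, which accepts "uniform" or "distance" for weights.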
Random Forest
poetry run rf-train \
  --dataset-path <path to train data> \
  --test-path <path to test data> \
  --report-path <path, where to save predictions> \
  --nrows <number of rows to read from file> \
  --max-features <random forest param: number of features, used in each tree> \
  --n-estimators <random forest param: the number of trees in the forest> \
  --min-samples-leaf <random forest param: minimum number of samples required to be at a leaf node> \
  --min-samples-leaf <random forest param: minimum number of samples required to be at a leaf node> \
  --min-max-scaler <use feature scaling> \
  --remove-irrelevant-features <removes irrelevant features>
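Both training commands accept --nrows. The sketch below shows the intended behaviour with the standard library only: read at most nrows data rows, or everything when no limit is given. The helper read_rows and the sample values are hypothetical; pandas.read_csv offers an nrows parameter with the same meaning, but whether the project uses it is an assumption.

```python
import csv
import io
from itertools import islice


def read_rows(file_obj, nrows=None):
    """Read at most `nrows` data rows from a CSV file; None reads them all."""
    reader = csv.DictReader(file_obj)
    # islice(reader, None) yields every row, so None keeps the default behaviour
    return list(islice(reader, nrows))


# Made-up sample rows; the header line is not counted as a data row
sample = io.StringIO("Id,Elevation,Slope\n1,2596,3\n2,2590,2\n3,2804,9\n")
print(len(read_rows(sample, nrows=2)))
sample.seek(0)
print(len(read_rows(sample)))
```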
The results of manual parameter tuning experiments can be viewed in the MLflow UI. Start MLflow with the following command:
poetry run mlflow ui
Then follow the link listed under Listening at (for example, http://127.0.0.1:<port>). After clicking the link, you will see a scoreboard with the results of the experiments. For example, the following screenshot shows the results of the experiments with the kNN and Random Forest classifiers.
- Formatting and linting the project
To format the code, run the following command:
poetry run black src
To lint the code, run the following command:
poetry run flake8
- Checking type annotations
To check static type annotations, run the following command:
poetry run mypy src
- Testing the code
To run the tests, type the following command in the terminal:
poetry run pytest
- Running multiple sessions
All of the above steps can be combined into one command using nox:
poetry run nox
If all sessions complete successfully, you will get the following report:
This repository also uses GitHub Actions to lint and test the code when you commit changes. If there are no errors, your Actions tab will look like the following:
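For orientation, a noxfile that chains the formatting, linting, type-checking, and testing commands listed above could look roughly like this. This is a sketch under assumptions, not the contents of this repository's noxfile.py: the session names are hypothetical, and external=True assumes the tools are available from the Poetry environment (as they are when nox is started with poetry run nox).

```python
import nox

# Sessions that run by default when nox is called without -s (names are hypothetical)
nox.options.sessions = ["black", "lint", "mypy", "tests"]


@nox.session
def black(session):
    """Format the source code."""
    session.run("black", "src", external=True)


@nox.session
def lint(session):
    """Lint the code with flake8."""
    session.run("flake8", external=True)


@nox.session
def mypy(session):
    """Check static type annotations."""
    session.run("mypy", "src", external=True)


@nox.session
def tests(session):
    """Run the test suite."""
    session.run("pytest", external=True)
```

A single session can be run on its own with the -s option, for example poetry run nox -s tests.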