model_selection_evaluation's Introduction

Forest Cover Type Prediction

Capstone project (RS School ML Course 2022)

This project uses the Forest Cover Type Prediction dataset.

Project structure

This package was developed on WSL (Ubuntu 20.04) with Python 3.9.2.

|- .github/workflows                   <---- GitHub Actions configuration
|- assets                              <---- Screenshots of the code checks
|- models                              <---- Model predictions
|- src
|   |- cover_type_classifier           <---- Source code for the project
|       |- models                      <---- Scripts for training and tuning models and making predictions
|       |   |- knn_train.py
|       |   |- random_forest_train.py
|       |- data                        <---- Scripts for data processing (EDA report)
|       |   |- generate_eda.py
|       |   |- get_dataset.py
|       |- __init__.py                 <---- Makes src a Python module
|- tests                               <---- Tests for model and data functions
|   |- conftest.py
|   |- test_data.py
|   |- test_models.py
|- .gitignore
|- LICENSE
|- README.md                           <---- Project description
|- mypy.ini                            <---- mypy configuration
|- noxfile.py                          <---- nox configuration
|- poetry.lock                         <---- Locked dependency versions
|- pyproject.toml                      <---- Project metadata and dependencies

Configuring local environment

This package allows you to train a model for forest cover type prediction.

Before using this package, check which version of Python is installed:

python --version

This project requires Python 3.9 or higher.
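The same check can be made from inside Python. The following is a minimal sketch, assuming the minimum supported version is 3.9; the helper name `python_is_supported` is hypothetical and not part of this package.

```python
import sys

# Assumed minimum version, taken from the requirement stated above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```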

Also check whether Poetry is installed.

poetry --version

If both are installed, proceed with the following steps.

  1. Clone the repository.
  2. Download the dataset from the Forest Cover Type Prediction website and save the csv files locally (the default paths are data/external/train.csv and data/external/test.csv in the repository's root).
  3. Install the necessary dependencies by running one of the following commands in the terminal.

Possible options:

To only use the package, run

poetry install --no-dev

To develop the package, run

poetry install 
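Before training, it may help to confirm that the dataset files from step 2 are where the commands expect them. The sketch below only checks the default paths mentioned above; the function name `missing_dataset_files` is hypothetical and is not provided by this package.

```python
from pathlib import Path

# Default locations named in step 2, relative to the repository root.
DEFAULT_FILES = ("data/external/train.csv", "data/external/test.csv")


def missing_dataset_files(repo_root=".", files=DEFAULT_FILES):
    """Return the expected dataset files that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [name for name in files if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("Dataset files are in place.")
```

Run it from the repository's root, or pass the root path explicitly.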

Usage

This project provides the following features:

  • Generate an EDA report using pandas-profiler or sweetviz and save the report on the local machine.
    poetry run generate-eda \
    --profiler <pandas-profiler or sweetviz> \
    --dataset-path <path to csv file> \
    --report-path <directory, to save report>

Example of the command output report

  • Train classifiers
    NOTE (for reviewers): Making predictions for the data in test.csv takes a long time, so to speed up the homework check I added the nrows parameter, which reads only part of the data. By default, all rows of the dataset are read.

    kNN

    poetry run knn-train \
    --dataset-path <path to train data> \
    --test-path <path to test data> \
    --report-path <path, where to save predictions> \
    --nrows <number of rows to read from file> \
    --n-neighbors <knn param: number of neighbors> \
    --weights <knn param: distance weights> \
    --min-max-scaler <use feature scaling> \
    --remove-irrelevant-features <removes irrelevant features> 

    Random Forest

    poetry run rf-train \
    --dataset-path <path to train data> \
    --test-path <path to test data> \
    --report-path <path, where to save predictions> \
    --nrows <number of rows to read from file> \
    --max-features <random forest param: number of features considered at each split> \
    --n-estimators <random forest param: the number of trees in the forest> \
    --min-samples-leaf <random forest param: minimum number of samples required to be at a leaf node> \
    --min-max-scaler <use feature scaling> \
    --remove-irrelevant-features <removes irrelevant features> 

    Example of command output rf
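The effect of the nrows parameter described in the note above can be illustrated with the standard library alone. This is a sketch of the idea, not the project's code: the helper `read_rows` is hypothetical, and the real scripts may read the data differently (for instance, pandas' `read_csv` accepts an `nrows` argument with the same meaning).

```python
import csv
from itertools import islice


def read_rows(path, nrows=None):
    """Read at most `nrows` data rows from a CSV file; None reads every row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        # islice stops after `nrows` rows; with None it consumes the whole file.
        return list(islice(reader, nrows))
```

Calling `read_rows("data/external/test.csv", nrows=1000)` would load only the first 1000 rows, which keeps a quick check of the prediction step short.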

Experiments

The manual parameter tuning experiments can be viewed in the MLflow UI. Run MLflow using the following command:

  poetry run mlflow ui

Then follow the link listed under Listening at (for example, http://127.0.0.1:<port>). The link opens a scoreboard with the results of the experiments. For example, the mlflow_experiments screenshot shows the results of the experiments with kNN and Random Forest classifiers.

Development

  • Formatting and linting project

    To format the code, run the following command:

    poetry run black src

    And for linting code, use the command

    poetry run flake8

    You will then get a report similar to the linting_code screenshot.

  • Checking type annotation

    To check static type annotation, run the following command:

    poetry run mypy src

    If there are no errors, you will get a report like the one in the linting_code screenshot.

  • Testing the code

    To run the tests, type the following command in the terminal:

    poetry run pytest

    If all tests pass, you will get a report like the one in the pytest_report screenshot.

  • Running multiple sessions

    All the steps above can be combined into one command using nox:

    poetry run nox

    If all sessions complete successfully, you will get a report like the one in the nox_sessions screenshot.
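To show how nox combines these checks, here is a minimal sketch of what a noxfile.py could look like. The session names and the way dependencies are installed are assumptions for illustration; the project's actual noxfile.py may be organized differently.

```python
import nox

# Hypothetical session names; the project's real noxfile.py may differ.
nox.options.sessions = ["lint", "mypy", "tests"]


@nox.session
def lint(session):
    """Check formatting with black and style with flake8."""
    session.install("black", "flake8")
    session.run("black", "--check", "src")
    session.run("flake8")


@nox.session
def mypy(session):
    """Run static type checks on the source code."""
    session.install("mypy")
    session.run("mypy", "src")


@nox.session
def tests(session):
    """Run the test suite."""
    # Install the project itself so the tests can import it.
    session.install("pytest", ".")
    session.run("pytest")
```

Each decorated function becomes a session that nox runs in its own virtual environment, so a single `nox` invocation covers linting, type checking, and testing.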

This repository also supports GitHub Actions, which lint and test the code when you commit changes. If there are no errors, your Actions tab will look like the github_action screenshot.

model_selection_evaluation's People

Contributors

victorialebedeva
