
scs-3253-067-capstone-project

Introduction

This repo contains the capstone project for the University of Toronto School of Continuing Studies (SCS) machine learning course (3253-067), which runs from Sept 21, 2023 to Dec 7, 2023. This README documents the project and will be updated as the project progresses.

Project Members

  • Will Huang
  • Anmole Bajwa

Project Description

Purpose / Objective Statement

  • What is the main problem to address? Why does it matter?
  • Clearly define the population / phenomenon that this project is trying to model / predict / forecast.
  • What kind of modeling problem is it (regression, classification, clustering, etc.)?
  • How does ML address this problem and add value that other solutions cannot?

Project Scope

Here is a list of activities to be accomplished during this project.

  • Problem identification and statement
  • Initial planning and identifying potential solutions
    • Literature research
    • Relevant regulations (if applicable)
  • Data collection, exploration and analysis
  • Data selection, transformation and feature engineering
  • Model assumptions
  • Model training, tuning and selection
  • Measure and analyze model performance
  • Benchmarking
  • Identify model limitations
  • State the final model(s) developed
  • Design a model monitoring plan
  • Model deployment
  • Discussions on future enhancements
  • Model risk assessment

Initial Planning and Identifying Potential Solutions

  • Briefly re-state the problem here.
  • Do a brief literature review to identify the most prominent solutions to the problem that are currently available in academia and / or industry.
  • If applicable and time allows, identify all regulatory requirements that place constraints on how the model can be developed and used.

Technological Composition

Describe the technological tools that are used to develop and implement the model(s). For example, describe:

  • The computer hardware and / or cloud platform(s) used.
  • The programming languages (such as Python) used.
  • The databases for hosting the datasets used to train the model.
  • The model's infrastructure and / or pipeline, such as how the model is connected to the database.

Data Composition

Clearly describe the following about the dataset.

  • Provide a short sentence describing what this dataset represents.
  • State the source(s) of this dataset.
    • Is it sourced from one place or multiple places?
    • How reliable / reputable is this source?
    • How was this data collected? Is there any potential for bias in the data collection process?
  • Data representativeness: Given the intended target population defined above, how representative is the dataset of the target population?
  • What is the dataset format (e.g., tabular, unstructured, etc.)? If it consists of multiple datasets, how are these datasets related to each other (e.g., by primary and foreign keys)?
  • Data composition
    • How many records are in the dataset?
    • What are the variables available? What are their data types (e.g., integer, float, string, date, etc.)?
    • For each variable, how many records are available, and how many are missing?
    • Are there outliers in the dataset? Do they warrant removal?

Given the information above, how reliable will the model be?
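The composition questions above (record count, variable types, missing values) can be answered with a short profiling pass. The following is a minimal sketch only, using the Python standard library so it runs without dependencies. The `records` list, its column names, and the `profile` helper are hypothetical placeholders, not part of this project's dataset or code.

```python
# Hypothetical toy records standing in for the project's dataset.
# None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "city": "Toronto"},
    {"age": 41, "income": None, "city": "Ottawa"},
    {"age": None, "income": 61000.0, "city": "Toronto"},
    {"age": 29, "income": 48000.0, "city": None},
]


def profile(rows):
    """Return, per variable, the present count, missing count and observed types."""
    # Collect every variable name that appears in at least one record.
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "present": len(present),
            "missing": len(values) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return summary


summary = profile(records)
print(f"records: {len(records)}")
for col, stats in summary.items():
    print(col, stats)
```

Outlier checks are deliberately left out of this sketch, since what counts as an outlier depends on the variable and the modeling problem.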

Data Analysis

Inspect the data both numerically and visually. Refer to chapter 2 of the textbook for examples.

Be sure to isolate a training set from the full dataset before taking the data analysis too far; this helps prevent data snooping.
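One way to isolate the training set is a seeded random split done once, before any detailed exploration. This is a sketch under assumptions: the `split_train_test` helper, the 20% test ratio, the seed, and the placeholder `data` list are all illustrative choices, not the project's actual setup. A stratified split may be more appropriate if the target is imbalanced.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a test set."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))  # placeholder for the project's records
train_set, test_set = split_train_test(data)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible across runs, so the test set stays unseen while the training set is explored.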

Data Transformation and Feature Engineering

Benchmarking

Develop a simple model that represents the "base-level" performance against which to compare the developed model.
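As an illustration of such a base-level model, the sketch below assumes a regression problem and always predicts the mean of the training targets, scored with root mean squared error. The problem type, the `MeanBaseline` class, the `rmse` helper, and the toy target values are all assumptions made for this example. For a classification problem, predicting the most frequent class would play the same role.

```python
import math
from statistics import mean


class MeanBaseline:
    """Baseline regressor that ignores features and predicts the training mean."""

    def fit(self, targets):
        self.prediction_ = mean(targets)
        return self

    def predict(self, n):
        # Return the same constant prediction for each of n records.
        return [self.prediction_] * n


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Hypothetical target values, for illustration only.
y_train = [10.0, 12.0, 14.0]
y_test = [11.0, 13.0]

baseline = MeanBaseline().fit(y_train)
baseline_rmse = rmse(y_test, baseline.predict(len(y_test)))
print(f"baseline RMSE: {baseline_rmse:.3f}")
```

Any developed model would then need to beat this score on the same test set to justify its added complexity.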
