Introduction

This repository contains the capstone project for the University of Toronto School of Continuing Studies (SCS) machine learning course 3253-067, which runs from Sept 21, 2023 to Dec 7, 2023. This README documents the project's information and will be updated as the project progresses.

Project Members

Will Huang
Anmole Bajwa

Project Description

Purpose / Objective Statement

What is the main problem to address? Why does it matter?
Clearly define the population / phenomenon that this project is trying to model / predict / forecast.
What kind of modeling problem is it (regression; classification / clustering; etc.)?
How does ML address this problem and add value that other solutions cannot?

Project Scope

Here is a list of activities to be accomplished during this project.

Problem identification and statement
Initial planning and identifying potential solutions
- Literature research
- Relevant regulations (if applicable)
Data collection, exploration and analysis
Data selection, transformation and feature engineering
Model assumptions
Model training, tuning and selection
Measure and analyze model performance
Benchmarking
Identify model limitations
State the final model(s) developed
Design a model monitoring plan
Model deployment
Discussions on future enhancements
Model risk assessment

Initial Planning and Identifying Potential Solutions

Briefly re-state the problem here.
Conduct a brief literature review to identify the most prominent solutions to the problem that are currently available in academia and / or industry.
If applicable and time allows, research all regulatory requirements that place constraints on how the model can be developed and used.

Technological Composition

Describe the technological tools that are used to develop and implement the model(s). For example, describe:

The computer hardware and / or cloud platform(s) used.
The programming languages (such as Python) used.
The databases for hosting the datasets used to train the model.
The model's infrastructure and / or pipeline, such as how the model is connected to the database.
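As one possible illustration of connecting the model to a database, the sketch below loads training records with Python's built-in `sqlite3` module. It assumes a SQLite database and a hypothetical table named `listings` with hypothetical columns `id`, `feature_a`, `feature_b` and `target`; the project's actual platform and schema are not yet specified in this README.

```python
import sqlite3


def load_training_rows(conn: sqlite3.Connection) -> list[dict]:
    """Fetch every row of the hypothetical `listings` table as a dict."""
    conn.row_factory = sqlite3.Row  # lets each row be read by column name
    cursor = conn.execute(
        "SELECT id, feature_a, feature_b, target FROM listings ORDER BY id"
    )
    return [dict(row) for row in cursor.fetchall()]


if __name__ == "__main__":
    # A throwaway in-memory database standing in for the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listings ("
        "id INTEGER PRIMARY KEY, feature_a REAL, feature_b REAL, target REAL)"
    )
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?, ?)",
        [(1, 0.5, 1.2, 10.0), (2, 0.7, None, 12.5)],
    )
    conn.commit()
    print(load_training_rows(conn))
    conn.close()
```

Returning plain dicts keeps the rest of the pipeline independent of the database driver; missing values (SQL `NULL`) arrive as `None`.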

Data Composition

Clearly describe the following about the dataset.

Provide a short sentence describing what this dataset represents.
State the source(s) of this dataset.
- Is it sourced from one place or multiple places?
- How reliable / reputable is this source?
- How was this data collected? Is there any potential of bias in the data collection process?
Data representativeness: Given the intended target population defined above, how representative is the dataset to the target population?
What is the dataset format (e.g., tabular; unstructured; etc.)? If it is comprised of multiple datasets, how are these datasets related to each other (e.g., related by primary and foreign key)?
Data composition
- How many records are in the dataset?
- What are the variables available? What are their data types (e.g., integer, float, string, date, etc.)?
- For each variable, how many records are available, and how many are missing?
- Are there outliers in the dataset? Do they warrant removal?
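Several of the data composition questions above (record counts, available versus missing values per variable, outliers) can be answered with a few lines of standard-library Python. This sketch assumes the data has been loaded as a list of dicts with `None` marking a missing value, and uses Tukey's 1.5 * IQR fences as one common, but not the only, outlier criterion; the records shown are made-up placeholders.

```python
import statistics


def missing_counts(records: list[dict], columns: list[str]) -> dict[str, tuple[int, int]]:
    """Return {column: (available, missing)}, treating None or an absent key as missing."""
    counts = {}
    for col in columns:
        missing = sum(1 for r in records if r.get(col) is None)
        counts[col] = (len(records) - missing, missing)
    return counts


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Return non-missing values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


if __name__ == "__main__":
    records = [  # made-up placeholder records
        {"age": 34, "income": 52000},
        {"age": None, "income": 61000},
        {"age": 29, "income": None},
    ]
    print("records:", len(records))
    print(missing_counts(records, ["age", "income"]))
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 100]))
```

Whether a flagged value warrants removal is a domain judgment: a data-entry error should usually be fixed or dropped, while a genuine extreme value may carry real signal.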

Given the information above, how reliable will the model be?

Data Analysis

Inspect the data both numerically and visually. Refer to chapter 2 of the textbook for examples.
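For the numerical side of this inspection, a per-column summary similar to what `pandas.DataFrame.describe()` reports is a good starting point; for the visual side, histograms and scatter plots (for example with Matplotlib) are the usual next step. The sketch below sticks to the standard library, assumes missing entries are `None`, and is meant to be run on the training set only.

```python
import statistics


def describe(values: list) -> dict[str, float]:
    """Summary statistics for one numeric column, skipping missing (None) entries."""
    present = [v for v in values if v is not None]
    # "inclusive" quartiles interpolate linearly between data points,
    # which matches the default used by pandas' describe().
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),
        "mean": statistics.fmean(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(present),
    }


if __name__ == "__main__":
    ages = [34, None, 29, 41, 52, 38]  # placeholder column values
    for name, stat in describe(ages).items():
        print(f"{name:>5}: {stat}")
```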

Be sure to set aside a test set from the full dataset before going too far with the data analysis; exploring only the training set helps to prevent data snooping bias.
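One way to set aside a test set that stays the same across reruns and dataset refreshes is to assign each record by hashing a stable identifier instead of reshuffling every time. The sketch below assumes each record has a unique, never-reused ID stored under a hypothetical key `id`. When the dataset will not change, a random split with a fixed seed (for example scikit-learn's `train_test_split(..., random_state=42)`) is a simpler alternative.

```python
import zlib


def is_in_test_set(identifier, test_ratio: float) -> bool:
    """Decide test-set membership from a CRC32 hash of the record's stable ID."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3, so comparing it
    # with test_ratio * 2**32 selects roughly test_ratio of all IDs.
    checksum = zlib.crc32(str(identifier).encode("utf-8"))
    return checksum < test_ratio * 2**32


def split_by_id(records: list[dict], id_key: str = "id", test_ratio: float = 0.2):
    """Split records into (train, test) lists; an ID's assignment never changes."""
    train, test = [], []
    for record in records:
        target = test if is_in_test_set(record[id_key], test_ratio) else train
        target.append(record)
    return train, test


if __name__ == "__main__":
    data = [{"id": i, "value": i * 0.5} for i in range(1000)]  # placeholder data
    train_set, test_set = split_by_id(data, "id", 0.2)
    print(len(train_set), len(test_set))
```

Because membership depends only on the ID, records appended later never move an existing record from the test set into the training set.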

Data Transformation and Feature Engineering

Benchmarking

Develop a simple baseline model to serve as the "base level" against which the developed model(s) can be compared.
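Which baseline fits depends on the modeling problem identified in the objective statement. As a minimal sketch, the two most common baselines are shown below: predicting the training-set mean for regression and the most frequent training label for classification, each with a matching metric (RMSE and accuracy). The function names are illustrative; scikit-learn offers the same idea as `DummyRegressor(strategy="mean")` and `DummyClassifier(strategy="most_frequent")` in `sklearn.dummy`.

```python
import math
import statistics
from collections import Counter


def mean_baseline_predict(y_train: list[float], n_test: int) -> list[float]:
    """Regression baseline: predict the training-set mean for every test record."""
    return [statistics.fmean(y_train)] * n_test


def majority_baseline_predict(y_train: list, n_test: int) -> list:
    """Classification baseline: predict the most frequent training label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n_test


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between true and predicted values."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_test = [2.5, 3.0, 4.5]  # placeholder targets
    preds = mean_baseline_predict([1.0, 2.0, 3.0, 4.0], len(y_test))
    print("baseline RMSE:", rmse(y_test, preds))
```

The developed model should be scored on the same held-out test set with the same metric; if it cannot clearly beat the baseline, its extra complexity is hard to justify.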
