
code-312 / rescue-chicago

Repository for work related to an interactive data dashboard for analyzing how different dog characteristics may correlate with average length of stay (LOS) in a shelter prior to adoption.

Home Page: https://code312-rescue-trends-2659be78e6b4.herokuapp.com/

Shell 0.02% Procfile 0.01% Python 20.70% Jupyter Notebook 79.27%
data-visualization pandas python

rescue-chicago's People

Contributors

ecooperman, fiasco071, irmaarios, jared-kunhart, jaydrojas, jjd129, kaylarobinson077, theechris

rescue-chicago's Issues

Data Pipeline Script

Do all of the data work in one script, parameterized by city. It should run everything from calling the PetFinder API, to cleaning, to syncing the data up to Postgres.
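A minimal sketch of how such a single entry point could be organized, assuming each stage is a plain function that can be passed in. The names run_pipeline, parse_args, and the three in-memory stand-in steps are hypothetical and not taken from the repository; real steps would call the PetFinder API and write to Postgres.

```python
import argparse
from typing import Callable

Records = list[dict]


def run_pipeline(
    city: str,
    fetch: Callable[[str], Records],
    clean: Callable[[Records], Records],
    sync: Callable[[Records, str], None],
) -> int:
    """Run fetch -> clean -> sync for one city and return the number of rows synced."""
    raw = fetch(city)
    cleaned = clean(raw)
    sync(cleaned, city)
    return len(cleaned)


def parse_args(argv=None) -> argparse.Namespace:
    """Read the city from the command line (hypothetical --city flag)."""
    parser = argparse.ArgumentParser(description="Run the data pipeline for one city.")
    parser.add_argument("--city", required=True, help="City to pull data for")
    return parser.parse_args(argv)


# In-memory stand-ins so the sketch runs without network or database access.
def fake_fetch(city: str) -> Records:
    return [{"name": "Rex", "los": 12}, {"name": "Bo", "los": None}]


def fake_clean(rows: Records) -> Records:
    # Drop rows with no length of stay recorded.
    return [row for row in rows if row["los"] is not None]


synced: Records = []


def fake_sync(rows: Records, city: str) -> None:
    synced.extend(rows)


args = parse_args(["--city", "Chicago"])
rows_synced = run_pipeline(args.city, fake_fetch, fake_clean, fake_sync)
```

Passing the steps in as arguments keeps the orchestration testable without touching the API or the database.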

Plot Trends Over Time

Right now we show aggregate counts / LOS across all dogs over all time. Plotting trends over time would let us see how LOS and counts have varied.

  • Create a new "Trends Over Time" page(s)
  • Add plot with x-axis of month - year, y-axis of length of stay
  • Add plot with x-axis of month - year, y-axis of count of dogs
  • Extra Credit: Let users add different colored lines to the plot corresponding to "groups". There are several valuable ways we could do this; some ideas / options below:
    • "group by" an attribute (age, size, etc) and make a colored line on the plots per unique value in that category
    • "group by" breed, but since there's a huge number of breeds, we'd probably want to do something like we have in the sidebar where you can select a number of random breeds, or can specifically type in a list of breeds
    • "filter by" where you can choose from a bunch of attributes to define each group, sort of like the two columns we have on the other pages
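The monthly aggregation behind both plots, including the optional "group by" variant, could be sketched with pandas as follows. The column names published_at, los, and size are assumptions made for illustration; the actual database columns may differ.

```python
import pandas as pd


def monthly_trends(dogs, date_col="published_at", los_col="los", group_col=None):
    """Return mean length of stay and dog count per calendar month.

    If group_col is given, one row per (month, group value) is returned,
    which maps to one colored line per group on the plot.
    """
    df = dogs.copy()
    # Bucket each posting into its calendar month, e.g. 2022-10.
    df["month"] = pd.to_datetime(df[date_col]).dt.to_period("M")
    keys = ["month"] + ([group_col] if group_col else [])
    return (
        df.groupby(keys)[los_col]
        .agg(mean_los="mean", dog_count="count")
        .reset_index()
    )


dogs = pd.DataFrame(
    {
        "published_at": ["2022-01-05", "2022-01-20", "2022-02-11", "2022-02-15"],
        "los": [10, 30, 7, 21],
        "size": ["small", "large", "small", "small"],
    }
)
overall = monthly_trends(dogs)
by_size = monthly_trends(dogs, group_col="size")
```

Note that dog_count here counts rows with a non-null LOS, so dogs without a recorded LOS would be excluded from both series.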

Parameterize Data Pipeline by City

The exact approach is open, but the goal is to create data files by city. Also, when the data goes into the Postgres table, add a column with that city's name.
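One possible shape for this, as a sketch only: tag each row with the city before it is written, and derive the file name from the city. The helper names add_city and city_filename and the "_dogs.csv" suffix are hypothetical.

```python
import pandas as pd


def add_city(df: pd.DataFrame, city: str) -> pd.DataFrame:
    """Return a copy of df with a constant 'city' column."""
    return df.assign(city=city)


def city_filename(city: str, suffix: str = "dogs.csv") -> str:
    """Build a per-city file name, e.g. 'new_york_dogs.csv'."""
    slug = "_".join(city.lower().split())
    return f"{slug}_{suffix}"


dogs = pd.DataFrame({"name": ["Rex", "Bo"], "los": [12, 40]})
tagged = add_city(dogs, "Chicago")
filename = city_filename("Chicago")
```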

Heroku Uses Our Repo

Right now Heroku is looking at @ecooperman's personal GitHub repo clone, which seems a little fragile. We should figure out a way to either point Heroku at our C4C repo or set up some kind of manual sync process.

More Trend Options

Give more options for trends to look at!

Current state:
[Screenshot of the current trend options, 2022-10-21]

Ideas of some things to add:

  • organization (once @Jared-Kunhart's PR is merged and the db is updated)
  • color (primary)
  • location (once we add it to the db)

Automated Feature Selection

The values of the features "breed_primary", "gender", "coat", "color_primary", "color_secondary", and "color_tertiary" vary significantly. The model features are currently typed out by hand, but the set of values in each feature changes with every data pull.

To reduce manual work and stay consistent, the feature columns for X should be derived automatically from the values present in the data.
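As a sketch of one way to do this, pandas can derive the indicator columns from whatever values appear in a pull, and a later pull can be aligned to the columns the model was trained on. The helper name build_feature_matrix is hypothetical, and using pandas.get_dummies here is an assumption about the approach, not the project's existing method.

```python
import pandas as pd

CATEGORICAL = [
    "breed_primary", "gender", "coat",
    "color_primary", "color_secondary", "color_tertiary",
]


def build_feature_matrix(df: pd.DataFrame, categorical=CATEGORICAL) -> pd.DataFrame:
    """One-hot encode whichever of the categorical columns exist in df."""
    present = [col for col in categorical if col in df.columns]
    return pd.get_dummies(df[present], columns=present, dtype=int)


train_pull = pd.DataFrame(
    {"breed_primary": ["Beagle", "Pug"], "gender": ["Male", "Female"]}
)
new_pull = pd.DataFrame(
    {"breed_primary": ["Pug", "Husky"], "gender": ["Male", "Male"]}
)

X_train = build_feature_matrix(train_pull)
# Align a later pull to the training columns: values unseen in training are
# dropped, and training values missing from the new pull are filled with 0.
X_new = build_feature_matrix(new_pull).reindex(columns=X_train.columns, fill_value=0)
```

The reindex step is what keeps the model input consistent between pulls, since the column set no longer depends on which values happen to be present.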

Adoptability Model Improvements

Feature Selection

  • Descriptive stats on color, breed to learn which are correlated to LOS
  • Test for multicollinearity between features

Data Quality

  • Remove outliers (based on LOS)

Model Evaluation

  • Add additional evaluation metrics

Research PetFinder Resources

Some datasets from PetFinder have already been curated and used for other applications like Kaggle competitions.

This ticket is to research prior competitions or community projects that have used PetFinder data, and see if anything from this could be useful. For example, there might be an existing dataset we could pull from, ideas of how to engineer features, etc.

One starting place could be this Kaggle competition, or this corresponding dataset in TensorFlow.

You can document your findings either in this GitHub repo, or our shared Google Drive

Remove Outliers

There's some crazy data that we ingest from PetFinder, like dogs that have apparently been up for adoption for 10+ years. We'll want to remove any obviously wrong data from our database so that it doesn't lead to misleading conclusions in the dashboards.

Some ideas for analysis to help inform outlier removal:

  • Plot of typical LOS by posted date (thought being that maybe older postings have less reliable data)
  • Histogram of LOS data (thought being that maybe there's some process to automatically remove dogs after some amount of time, e.g. maybe their postings "expire")
  • Boxplot of LOS by organization (thought being that maybe some organizations are better or worse than others at being diligent about their data)
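Once the analysis above suggests a cutoff, the removal step itself could look something like the sketch below. The IQR rule, the optional max_days cap, and the los column name are all assumptions chosen for illustration; the right rule should come out of the plots.

```python
import pandas as pd


def drop_los_outliers(df, los_col="los", iqr_factor=1.5, max_days=None):
    """Drop rows whose LOS is negative or above an upper bound.

    The upper bound is Q3 + iqr_factor * IQR, optionally capped at max_days.
    """
    los = df[los_col]
    q1, q3 = los.quantile([0.25, 0.75])
    upper = q3 + iqr_factor * (q3 - q1)
    if max_days is not None:
        upper = min(upper, max_days)
    return df[(los >= 0) & (los <= upper)]


dogs = pd.DataFrame({"los": [5, 10, 12, 20, 30, 4000, -3]})
cleaned = drop_los_outliers(dogs)
```

With heavily skewed LOS data, a fixed cap such as max_days may be easier to explain to dashboard users than a statistical rule.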

Forecasting Intake V2

Christine from Rescue Chicago expressed that having forecasts for intake would both help them plan internally, and communicate out their anticipated needs to transfer facilities. For example, this can help transfer facilities plan their long-distance transfers in anticipation of CACC needs.

This ticket is to iterate on the initial forecasting POC that @TheeChris completed, and see if we can improve it, and also explore any of the driving factors for intake rates.

Some ideas to get started might include exploring forecasting methods, accounting for known trends like seasonality, handling the crazy data from 2020, etc.
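As a starting point for comparing methods, a seasonal-naive baseline repeats the value from the same month one year earlier. The sketch below is not the existing POC; the function name and the synthetic intake numbers are made up for illustration.

```python
import pandas as pd


def seasonal_naive_forecast(monthly: pd.Series, horizon: int = 12, season: int = 12) -> pd.Series:
    """Forecast each future month as the value from the same month last season.

    `monthly` must be indexed by a monthly PeriodIndex and cover at least
    one full season.
    """
    last_season = monthly.iloc[-season:]
    future_index = pd.period_range(monthly.index[-1] + 1, periods=horizon, freq="M")
    values = [last_season.iloc[i % season] for i in range(horizon)]
    return pd.Series(values, index=future_index)


# Synthetic monthly intake counts, for illustration only.
intake = pd.Series(
    range(100, 124),
    index=pd.period_range("2021-01", periods=24, freq="M"),
)
forecast = seasonal_naive_forecast(intake, horizon=3)
```

Any more elaborate method should beat this baseline on held-out months before it is worth adopting.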

Plotly Comparison Visualization

Currently, Breed Trends by Length of Stay has a comparison chart using Plotly. However, Other Trends by LoS, Breed Trends by Count, and Other Trends by Count don't use Plotly and have two separate charts instead. Combine each pair into one comparison chart.

Organization Trends by Length of Stay

There are pages for Breed Trends by Length of Stay and Breed Trends by Count. Kayla and I thought it would be great to compare by organization as well. This could come as a separate page or just as select boxes on the existing pages; either would be great.

Preprocess Features for a Model

Most machine learning models expect exclusively numeric input features. Some (most?) of our features are categories (puppy, young, adult... or breed names for example).

Let's use pandas.DataFrame as the data structure for preparing our dataset for modeling. Scikit-learn, a widely used ML package, supports this data type as model input.

Preprocessing ideas:

  • Any True/False feature can be converted to 0/1 values
  • Any ordinal features (like age category, or size category) should be mapped to numbers (e.g. baby: 0, young: 1, adult: 2, senior: 3)
  • Categorical variables without ordering should be one-hot encoded. Scikit-learn has a helpful class for this (OneHotEncoder). For dog breeds, since there's a huge number of possibilities, I'd recommend only keeping the most popular breeds. So, you might play around with the parameter min_frequency or max_categories so we don't end up with more than about 50 breeds
  • Drop any columns that we don't want included in the model. For example, name or ID number should probably be dropped

I think it would make the most sense to organize this as a new step that runs on the output from data_cleaner. We could call it data_preprocessor?

Unit Tests for Data Pipeline

To follow best practices, we should write unit tests 😄 This ticket is to add unit tests for our data pipeline functions.
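For illustration, tests written in the pytest style (plain functions named test_*) might look like the sketch below. The function add_length_of_stay and its column names are hypothetical stand-ins for a real pipeline function.

```python
import pandas as pd


def add_length_of_stay(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: whole days between posting and adoption."""
    out = df.copy()
    out["los"] = (
        pd.to_datetime(out["adopted_at"]) - pd.to_datetime(out["published_at"])
    ).dt.days
    return out


def test_add_length_of_stay_counts_whole_days():
    df = pd.DataFrame({"published_at": ["2022-01-01"], "adopted_at": ["2022-01-11"]})
    assert add_length_of_stay(df)["los"].tolist() == [10]


def test_add_length_of_stay_does_not_mutate_input():
    df = pd.DataFrame({"published_at": ["2022-01-01"], "adopted_at": ["2022-01-02"]})
    add_length_of_stay(df)
    assert "los" not in df.columns
```

Small tests like these, run on tiny hand-built DataFrames, need no API or database access and so can run on every commit.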

More Features!

Not all the data we get from the API makes it through the data cleaner and into our database. Let's change that!

Some ideas of features to add:

  • description
  • photo urls
  • tags
  • location
