ubc-mds / 532_group_22

The Criminality in Canada: Fighting Anecdotes with Data app allows users to explore Canadian crime data and trends by location.

Home Page: https://canadian-crime.herokuapp.com/

License: MIT License

Python 100.00%

leaflet choropleth barplot time-series crime-statistics slider dropdown tabs multiselect-dropdown python

532_group_22's Introduction

DSCI 532 - Group 22

The project housed in this repository has been created in partial fulfillment of the requirements of DSCI 532: Visualization II of the University of British Columbia's Master of Data Science program (2020/2021). The purpose of this project is to create an interactive dashboard that allows users to explore and interact with a data set. Click here to view our proposal for Criminality in Canada: Fighting Anecdotes with Data.

The dashboard for our Milestone 2 release is deployed here: https://canadian-crime.herokuapp.com/

The data source is the Incident-based crime statistics, by detailed violations, Canada, provinces, territories and Census Metropolitan Areas table released by Statistics Canada.

Welcome 🎉

Hi everyone, thanks for visiting the Criminality in Canada: Fighting Anecdotes with Data app project repository.

The document you are currently reading (README) provides an overview of our project. Use the links below to jump to the section you're interested in, or just scroll down to find out more.

What are we doing? (And why?)
Who are we?
Installation
Team Members
Teamwork Contract
License

What are we doing? (And why?)

The general public's perception and understanding of crime is extremely skewed, because the main source of information about crime is often commercial mass media, which relies extensively on scary stories and scaremongering tactics to attract and hold viewers' attention.

Our Solution

To address this issue, our team has decided to create a dashboard that allows the public to easily view and explore Canadian crime data, making it easier for the general public to make informed decisions.

Who are we?

The founders of this app, Cal Schafer, Ifeanyi Anene, Sasha Babicki, and Steffen Pentelow, are lovely Master of Data Science (MDS) students at the University of British Columbia.

The development of this app is overseen by our wonderful DSCI 532: Data Visualizations II instructor and the teaching assistants (TAs) Analise, Andy, Chris, and Afshin. A pictorial visualization of our proposed dashboard can be seen below.

Installation

From the root folder, run the following commands to create and activate the conda environment:

conda env create -f group22env.yaml

conda activate group22env

To run the app locally, run the following command from the root of this repository:

python src/app.py

Contributions

Feedback and suggestions are always welcome! We have included some issues labeled as enhancements, as suggestions for anyone interested in contributing to the project. Please read the contributing guidelines to get started.

Team Members

Cal Schafer
Ifeanyi Anene
Sasha Babicki
Steffen Pentelow

License

The Incident-based crime statistics, by detailed violations, Canada, provinces, territories and Census Metropolitan Areas data contains information licensed under the Open Government License – Canada (version 2.0).

532_group_22's People

Contributors

Watchers

Forkers

calschafer sbabicki spentelow calsvein ifyanene7

532_group_22's Issues

Improve the README (Optional)

5. Improve the README (Optional)

rubric={reasoning:2}

Expand on the README file to be a welcoming place for anyone coming
to your project for the first time.
For your project,
your README should cater to at least two groups of people
(on bigger projects these can be separated and put in different files):

Those potentially interested in using your dashboard
- Include motivation behind your project and clearly explain
  what problem you are solving and why it is important.
- You do not have to include detailed usage instructions,
  just a high-level overview of what they can do in your dashboard and the deployed link.
- This is a good example
Those interested in helping you develop your dashboard
- Potential contributors are interested in the above as well,
  but also need to know how they can install your app and how to run it locally
  (maybe they are great in Altair but have never used Dash).
- Suggestions for what you would like help with and how to work in your project,
  some of this can go in contributing also.
- This is an example of a program I made as part of my thesis.

Including a table of contents can be useful,
as well as a short GIF of your dashboard doing something impressive.
No matter how many nice words you put down,
seeing the functionality right when they land on your GH page
is very useful to evoke interest.

Milestone 1 Feedback

Hi Group 22,

I enjoyed reading your proposal. It was well written and included the necessary details at an appropriate level.
I like how there is an order to how the user would use the dashboard and a reason behind how you set it up. The dashboard concept is clear and well designed.
Tab 1: I am worried the text in the bar graph might be too small to read if you show all the data. Subsetting or a scroll bar might help the user look at the data more easily. Where will you put the legend for the choropleth map?
Tab 2: will there be a max number of CMAs Sarah can select to compare?
Will the abbreviation CMA be explained anywhere in the dashboard?

Reflection Document

4. Reflection

rubric={reasoning:6}

Reflect on what you think your dashboard does well,
what its limitations are,
and what good future improvements and additions would be.
This section should not be more than 500 words
and the reflection-milestone2.md document should live in your GitHub.com repo
in the doc folder.

Interactive Dash app in Python and Altair

3. Interactive Dash app in Python and Altair

rubric={accuracy:10, quality:5, viz:15}

Implement the dashboard you outlined in your proposal.
Keep your usage scenario and target audience in mind when designing your interface.
Aim to implement most of your dashboard's functionality, but not everything.
- Since the complexity varies between proposals,
  the rough goal here is to have around 3 plots
  and most of their widgets
  and interactivity implemented
- The app should be clearly usable,
  so focus on the most important things first.
- In the upcoming milestones you will have time to improve your app
  based on your proposal and the feedback you have received.
- The TAs will give you feedback on how to adjust the overall complexity
  of your final app for milestone 3 and 4 (if needed). For this milestone,
  use the above directions.
Your interface should be as self-documenting as possible,
with appropriate labels for panes and widgets,
legends documenting the meaning of visual encodings,
and a meaningful title for the app.
Note that TAs will be grading your app on Heroku in a full-screen window.

It can be easy to get sucked into a rabbit hole when trying to implement a stubborn feature
(I know this all too well myself =p).
While it is important to build your troubleshooting skills,
it is often even more important to build your time management skills
and we do not want one annoying bug to prevent you from completing your app.
Compromises may need to be made - this is a short project.
You can add the bells and whistles at the later milestones and
if you're struggling with a particularly tough problem,
save it for later and ask a TA for help!

Code of Conduct

Remove clearing dropdowns option

Missing dependency - alt.data_transformers

The line alt.data_transformers.enable("data_server") in tab1.py causes an error when I try to run it with the current environment file. I'm not sure what the dependency is called, but it needs to be added to the yaml file if we keep that line in.
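A minimal sketch of one way to guard that line, assuming the "data_server" transformer is provided by the separate altair_data_server package (which would then need to be listed in group22env.yaml). The helper name pick_data_transformer is hypothetical; it only checks whether that package is importable and otherwise falls back to Altair's "default" transformer.

```python
import importlib.util


def pick_data_transformer() -> str:
    """Return the name of the Altair data transformer to enable.

    Assumes "data_server" is registered by the optional altair_data_server
    package; if that package is not installed, fall back to "default".
    """
    if importlib.util.find_spec("altair_data_server") is not None:
        return "data_server"
    return "default"


# In tab1.py this would replace the hard-coded call:
# alt.data_transformers.enable(pick_data_transformer())
```

Keep in mind that the "default" transformer embeds the data in each chart and refuses datasets over 5,000 rows, so the fallback mainly helps with small local tests; adding the package to the environment file is still the real fix.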

Update About on repo

Please modify your GitHub repo description in the top right corner where it says "About" to include

A short description of your app (might already be there)
The link to your deployed dashboard (it is often useful to still keep this in the README as well)
A few keywords describing which plots, widgets, and interactions you have used in your dashboards, like I have done in my demo app.
If I have time during the break, I will use these to make a resource where you can easily find each other's dashboards without searching through the public GitHub repos directly. I think this will be useful to reference back to for capstone and later. You can DM me on Slack if your group does not want to be part of this for some reason.

Add link to heroku app to README

https://canadian-crime.herokuapp.com/

Deployment on Heroku

rubric={accuracy:3}

Deploy your app on Heroku and include the link to your deployed dashboard clearly visible near the top of your README.
Don't push to Heroku after the milestone deadline.
We will compare the milestone release commits with the deployed app so updating it after the deadline will give a late penalty. If you want your newest changes deployed online, you can create a new heroku repo.
This week, you're also going to setup Heroku's GitHub integration to automate your deploys, so that you have a branch in your github repo that is automatically deployed to Heroku.
Create a new branch for this on GitHub that you name deployment.
Don't wait to deploy until Saturday night, you will not have time to solve potential issues.
Deploy early and check that things are working, then redeploy every now and then.
After making the milestone release, make a final push to Heroku to redeploy the milestone app.
Make sure to take away debug=True when you are deploying to Heroku, there should not be a blue debug button on the page your target audience will visit!
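One way to make this harder to forget is to read the debug flag from an environment variable instead of hard-coding debug=True. This is only a sketch: the variable name APP_DEBUG and the helper debug_enabled are hypothetical, not something Dash or Heroku define, and it assumes the Dash app object in src/app.py is called app.

```python
import os
from typing import Mapping


def debug_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only if the hypothetical APP_DEBUG variable is truthy.

    A Heroku dyno would leave APP_DEBUG unset, so debug mode stays off there.
    """
    return environ.get("APP_DEBUG", "").strip().lower() in {"1", "true", "yes"}


# At the bottom of src/app.py (assuming the Dash object is named `app`):
# if __name__ == "__main__":
#     app.run_server(debug=debug_enabled())
```

Locally, running `APP_DEBUG=true python src/app.py` would turn debugging on. If Heroku launches the app through gunicorn, the `__main__` block is skipped entirely, so the flag then mainly governs local runs.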

Submit Milestone 2 to Canvas

Once you have finished the work for this milestone
you must create a release on GitHub.com before the submission deadline.
- Please read the GitHub documentation on how to create a release via the online interface. Name your release with the respective milestone name.
- We will grade all files in the repo at the state they were in when you created the release.
  This means that you can continue to make changes in the repo without worrying about messing up your grading for the previous milestone.
The only file you need to submit to Canvas is the one called canvas-submission.html.
- I changed the file ending to HTML so that it renders on Canvas.
- This file is in your github.ubc.ca repo.
- Submit this file manually on Canvas and only once per group.
- Make sure to add a link to your milestone release in this file and leave the rest as is (this facilitates grading).

Create environment.yaml (or/and requirements.txt)

Add any packages you use here

Tab 1 - implement violation subcategory

Not implemented yet. It needs to be a subset of the Violations column, based on filtering the "Crime Level" column = 1.
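A minimal sketch of that filter over plain row dictionaries, assuming the column names "Violations" and "Crime Level" match the processed data and that Crime Level is stored as an integer. The function name is hypothetical and the sample rows are made up; in a pandas-based app the equivalent would be df.loc[df["Crime Level"] == 1, "Violations"].unique().

```python
from typing import Iterable, List, Mapping


def violation_subcategories(rows: Iterable[Mapping]) -> List[str]:
    """Return the sorted, de-duplicated violations whose "Crime Level" is 1."""
    return sorted({row["Violations"] for row in rows if row["Crime Level"] == 1})


# Made-up rows for illustration only.
rows = [
    {"Violations": "Total, all violations", "Crime Level": 0},
    {"Violations": "Theft", "Crime Level": 1},
    {"Violations": "Assault", "Crime Level": 1},
    {"Violations": "Theft", "Crime Level": 1},
]
print(violation_subcategories(rows))  # ['Assault', 'Theft']
```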

Add text to app

I think it would be useful to add some text explaining how to use the app, what some of the key terms in the data mean, and where the data is from.

Tab 2

#26
This milestone

4 Plots
CMA vs Province radio button
CMA/Province selection (based on radio button response)

Future milestone

Move legend to menu on left
Metric dropdown

Tab 2 Updates

Troubleshoot: Plots not showing up
Location dropdown: have list of locations remain after switching from Province to CMA and back.
Set default plots on Tab 2
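For the location dropdown, one possible approach is to remember the last selection for each geography level and restore it when the radio button flips back. In Dash this state would typically live in a dcc.Store component; the sketch below shows only the pure-Python bookkeeping, and the function names and level labels are hypothetical.

```python
from typing import Dict, List


def remember_selection(store: Dict[str, List[str]], level: str,
                       selected: List[str]) -> Dict[str, List[str]]:
    """Return a copy of the store with the current selection saved under `level`."""
    updated = dict(store)
    updated[level] = list(selected)
    return updated


def restore_selection(store: Dict[str, List[str]], level: str,
                      options: List[str]) -> List[str]:
    """Return the saved selection for `level`, keeping only still-valid options."""
    valid = set(options)
    return [loc for loc in store.get(level, []) if loc in valid]


store = remember_selection({}, "Province", ["Ontario", "Quebec"])
store = remember_selection(store, "CMA", ["Toronto"])
print(restore_selection(store, "Province", ["Ontario", "Quebec", "Alberta"]))
# ['Ontario', 'Quebec']
```

Returning a copy rather than mutating the store mirrors how a Dash callback would output a fresh value for the store instead of editing it in place.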

Tab 2- get rid of year commas

Select categories and subcategories to include

Peer feedback from Group21

Hi, Group 22,
Hope you are doing great. This is Frank from Group 21. After looking at your proposal and the dashboard you have been working on, I am really interested in this project. Here is my feedback on your milestone2 release.

`README.md` file

As a potential user of your app, I really like the way you wrote the readme file. It has a clear structure and sections are well explained and organized.
For the screenshot that you included in the readme file, I think it would be great to include the screenshot of your real app interface instead of the prototype given that it was milestone2.
For the convenience of potential contributors who are interested in contributing to your project, it would be great to include the instructions on how they can run the app, and suggestions on what aspects they can contribute to for the project.

`proposal.md` file

This file looks great to me.

dashboard interface

It's great to see so much interactivity across all the plots on your dashboard interface.
However, regarding usability, I think the appearance can be improved. For example,
- the font of the two tabs could be bigger & bolded.
- You can also add more white space between filters.
- It would be great if you added a side note to explain what the empty plots mean based on user selections (e.g. no data).

Thank you! Let me know if you have any questions or want to have a further discussion with me.

Writeup for Section 2

Section 2: Description of the data
rubric={reasoning:8,writing:2}

You are allowed to select any dataset you want for this project, as long as you have the license to use it publicly. Warning: finding a good data set can take a lot of time and effort. We therefore recommend that you select one that you have worked with in a previous lab in MDS and that you are already familiar with (for example the Gapminder, movie, or language data sets from 531 (all are on OneDrive)).

A few datasets that have been popular in previous years:

https://www.kaggle.com/zynicide/wine-reviews/data
https://www.kaggle.com/osmi/mental-health-in-tech-survey
https://github.com/themarshallproject/city-crime
Good general resources for finding interesting datasets:

https://github.com/fivethirtyeight/data
https://github.com/the-pudding/data
https://www.kaggle.com/datasets
In your proposal, briefly describe the dataset and the variables that you will visualize. If you are planning to visualize a lot of columns, provide a high-level descriptor of the variable types rather than listing every single column. For example, indicate that the dataset contains a variety of categorical variables for demographics and provide a brief list rather than describing every single variable. You may also want to consider visualizing a smaller set of variables given the short duration of this project. This might involve a brief exploratory data analysis (EDA) to help you identify interesting aspects of your data. We will not be grading the EDA aspect, but feel free to include your EDA notebooks in the public GitHub repo, so that you have everything in one place.

Example writeup:

We will be visualizing a dataset of approximately 300,000 missed patient appointments. Each appointment has 15 associated variables that describe the patient who made the appointment (patient_id, gender, age), the health status (health_status) of the patient (Hypertension, Diabetes, Alcohol intake, physical disabilities), information about the appointment itself (appointment_id, appointment_date), whether the patient showed up (status), and if a text message was sent to the patient about the appointment (sms_sent). Using this data we will also derive a new variable, which is the predicted probability that a patient will show up for their appointment (prob_show).

Remember, if your dataset has a lot of columns, stick to summaries and avoid listing out every single column. The example also differentiates columns that come with the dataset (e.g. age) from new variables that you might derive for your visualizations (e.g. prob_show); you should make a similar distinction in your write-up if you can. Another example of a good description of a dataset is the Kaggle world happiness report.

Update repo folder structure

We might want to consider updating the folders. For instance, there are 3 image files for the sketch which should probably be in a folder. Just not sure what to name it :P
If we change the location of the image files we need to remember to update the references in the README and any other document that links to them.

Teamwork Contract

Tab 1 - add a Year Select dropdown

Submit Milestone 4 to Canvas

Once you have finished the work for this milestone you must create a release on GitHub.com before the submission deadline.
Please read the GitHub documentation on how to create a release via the online interface. Name your release with the respective milestone name.
We will grade all files in the repo at the state they were in when you created the release. This means that you can continue to make changes in the repo without worrying about messing up your grading for the previous milestone.
The only file you need to submit to Canvas is the one called canvas-submission.html.
This file is in your github.ubc.ca repo.
Submit this file manually on Canvas and only once per group.
Make sure to add a link to your milestone release in this file and leave the rest as is (this facilitates grading).
Optionally include a link to an auto-deployed Heroku PR (see section 6).

Submit Milestone 1 to Canvas

Colour map enhancements - Tab 1

Colourscale
Add Legend
Hover info on map (add units/format)
Highlight colour

Tab 1

#26

This milestone

Violation per CMA plot
Violation dropdown
Metric dropdown

Future milestone

Violation per province plot (in progress, currently commented out)
Subcategory dropdown
Sorting for CMA plot
Year slider

Update README based on peer feedback

#67
README.md file
As a potential user of your app, I really like the way you wrote the readme file. It has a clear structure and sections are well explained and organized.

For the screenshot that you included in the readme file, I think it would be great to include the screenshot of your real app interface instead of the prototype given that it was milestone2.
For the convenience of potential contributors who are interested in contributing to your project, it would be great to include the instructions on how they can run the app, and suggestions on what aspects they can contribute to for the project.

Title and Tabs

Add Title
Switch to dbc tabs

Misc fixes/improvements (includes Joel feedback)

Add a Title to the tabs (+ perhaps a short description/explanation)
remove the coding numbers that are part of the Geographies and the Violation Descriptions
PEI is being recorded as a CMA. For Geography = "Prince Edward Island [11]", we need to change the value of the Province column to "PROVINCE".
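A minimal cleaning sketch for the last two items, assuming the codes always appear as a trailing bracketed number such as "[11]" and that each row is a dictionary with "Geography" and "Province" keys. The helper names strip_code and fix_pei are hypothetical.

```python
import re
from typing import Dict

# Matches a trailing code like " [11]": optional whitespace, then digits in brackets.
TRAILING_CODE = re.compile(r"\s*\[\d+\]\s*$")


def strip_code(label: str) -> str:
    """Remove a trailing bracketed numeric code from a label."""
    return TRAILING_CODE.sub("", label)


def fix_pei(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the row with PEI marked as a province rather than a CMA."""
    fixed = dict(row)
    # Compare on the stripped name so this works before or after the codes are removed.
    if strip_code(fixed["Geography"]) == "Prince Edward Island":
        fixed["Province"] = "PROVINCE"
    return fixed


print(strip_code("Prince Edward Island [11]"))  # Prince Edward Island
```

If some codes turn out to contain letters as well as digits, the pattern would need widening, for example to \[[^\]]+\].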

def functions documented

Add year slider

Add widget
Connect barplot and choropleth

Peer feedback: the font of the two tabs could be bigger & bolded.

Not sure if we want to do this one or not

Deployment on Heroku

2. Deployment on Heroku

rubric={accuracy:8}

Deploy your app on Heroku
and include the link to your deployed dashboard clearly visible near the top of your README.
Don't push to Heroku after the milestone deadline.
- We will compare the milestone release commits with the deployed app
  so updating it after the deadline will give a late penalty.
  If you want your newest changes online,
  you can create a new heroku repo.
Since your app.py will be inside the src folder,
you need to change the Procfile to web: gunicorn src.app:server
instead of what it is in the dash deployment docs.
I recommend creating requirements.txt manually
and only fix the versions of dash and plotly.
- Don't forget to include gunicorn.
Don't wait to deploy until Saturday night
after you have implemented every single feature you want.
You will not have time to solve potential issues.
- Deploy early and check that things are working,
  then redeploy every now and then,
  especially after adding new package dependencies.
- After making the milestone2 release,
  make a final push to Heroku to redeploy the milestone2 app.

Tab 1 & 2 - Alphabetize the dropdown lists

The dropdown lists are unordered. They should be sorted alphabetically.
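A sketch of building sorted options for a Dash dropdown, which accepts a list of {"label": ..., "value": ...} dictionaries. Using str.casefold as the sort key makes the order case-insensitive; note that plain Python sorting compares code points, so accented names such as "Montréal" are not ordered by French collation rules. The helper name is hypothetical.

```python
from typing import Dict, Iterable, List


def dropdown_options(values: Iterable[str]) -> List[Dict[str, str]]:
    """Return de-duplicated, alphabetically sorted Dash dropdown options."""
    return [{"label": v, "value": v} for v in sorted(set(values), key=str.casefold)]


labels = [o["label"] for o in dropdown_options(["Vancouver", "Calgary", "Montréal", "Vancouver"])]
print(labels)  # ['Calgary', 'Montréal', 'Vancouver']
```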

Sketch and Description

Description of your app & sketch
rubric={viz:10}

Building from your research questions and usage scenarios, give a high-level description of the interface for the app you will build. Remember to be realistic about your expectations and plans since you will actually be implementing this app (but again, you will not be penalized if you need to adjust a bit in later milestones). It is better to design a slightly more limited app that you have time to implement well, instead of a complicated app that you don't have time to finish. At the same time, you cannot just make a single bar chart and call it a day. The app needs to have a few plot panels; use the visualizations from previous students shown in lecture one as a guide to the target complexity of the final app.

In this description you are not required to use terminology specific to Dash apps (e.g. widgets, components, etc.) or make reference to specific Python or R libraries. Your sketch can be hand-drawn or mocked up using a graphics editor. If you can show the app's visual design and interaction design in a single image, that is ideal, but if you need more space to show other planned features of your app, you can include a maximum of three images for this proposal.

The description should be about 200-300 words and live in the README.md file of your GitHub.com repository. The sketch should be linked in the README.md file of your GitHub.com repository underneath the high level description so that the image shows up on GitHub.

Example description

The app contains a landing page that shows the distribution (depending on data type, a bar chart, density chart, etc.) of dataset factors (hypertension, physical disabilities, etc.), color-coded according to whether patients showed up or didn't show up for an appointment. From a dropdown list, users can filter out variables from the distribution display by patient demographics (e.g. only show female patients), by appointment data (e.g. if an SMS was sent), and finally by the date range of appointments. A different dropdown menu will allow users to re-order variables according to the probability of patients being a no-show, or alphabetically by comorbidity. Users can compare the distribution of comorbidities by scrolling down through the app interface.

Example sketch


This sketch was drawn using PowerPoint with icons from the Noun Project. You can use other graphics tools (e.g. Inkscape, GIMP, Photoshop, Illustrator, etc.) or you can even draw your app by hand and upload a scanned version of your drawing. Whatever you choose to do, make sure that the final image in your report is legible.

Implement value sorting in bar chart

- At minimum, sort the bar chart from largest value to smallest value
- Optional: allow the user to flip the sorting order (largest to smallest, smallest to largest)
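A minimal sketch of the ordering logic on (label, value) pairs, with a flag for the optional flip. In Altair the same effect is usually achieved on the chart itself with an encoding sort such as alt.Y("Geography:N", sort="-x"); the helper name sort_bars is hypothetical and the values below are made up.

```python
from typing import List, Tuple


def sort_bars(pairs: List[Tuple[str, float]],
              descending: bool = True) -> List[Tuple[str, float]]:
    """Sort (label, value) pairs by value; largest first unless descending=False."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=descending)


# Made-up values for illustration only.
bars = [("Calgary", 2.0), ("Toronto", 5.0), ("Halifax", 1.0)]
print(sort_bars(bars))                    # [('Toronto', 5.0), ('Calgary', 2.0), ('Halifax', 1.0)]
print(sort_bars(bars, descending=False))  # [('Halifax', 1.0), ('Calgary', 2.0), ('Toronto', 5.0)]
```

The descending flag could be driven by a radio button or toggle, which would cover the optional "flip the order" item.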

Writeup for Section 1

Section 1: Motivation and Purpose
rubric={reasoning:8,writing:2}

In a few sentences, provide motivation for why you are creating a dashboard. Who is your target audience, and what role are you embodying? What problem could your dashboard solve for the intended user? You can read the Project background section for some rough ideas. Be brief and clear.

Example writeup:

Our role: Data scientist consultancy firm

Target audience: Health care administrators

Missed medical appointments cost the healthcare system a lot of money and affect the quality of care. If we could understand what factors lead to missed appointments it may be possible to reduce their frequency. To address this challenge, we propose building a data visualization app that allows health care administrators to visually explore a dataset of missed appointments to identify common factors. Our app will show the distribution of factors contributing to appointment show/no show and allow users to explore different aspects of this data by filtering and re-ordering on different variables in order to compare factors that contribute to absence.

add colors to bar chart by province

I think adding the ability to filter by provinces would be cool. That way the CMA barplot on the right won’t be as long, we would just show CMAs in the provinces selected. It would also help for the choropleth since it will change the scale based on only the province of interest.

This could be implemented as a multi-select dropdown. Let me know in the comments if you like the idea or not!
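A sketch of how the multi-select value could filter the CMA rows before plotting, assuming each row records the province it belongs to. The key name "Parent Province" and the function name are hypothetical; the dataset may not contain such a column directly, in which case it would have to be derived during wrangling. An empty selection is treated as "all provinces".

```python
from typing import Iterable, List, Mapping, Optional, Sequence


def filter_by_provinces(rows: Iterable[Mapping],
                        selected: Optional[Sequence[str]]) -> List[Mapping]:
    """Keep rows whose hypothetical "Parent Province" is in the selection.

    An empty or missing selection returns every row unchanged.
    """
    rows = list(rows)
    if not selected:
        return rows
    wanted = set(selected)
    return [row for row in rows if row["Parent Province"] in wanted]
```

Returning all rows for an empty selection avoids an empty plot when the user clears the dropdown, which ties in with the peer feedback about explaining empty plots.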

Writeup Section 3

Section 3: Research questions and usage scenarios
rubric={reasoning:12,writing:2}

The purpose of this section is to get you to think about how your target audience might use the app you're designing and to account for those needs in the proposal.

For this, it can be helpful to create a brief persona description of a member of your intended target audience and write a small user story for what they might do with your app. User stories are typically written in a narrative style and include the specific context of usage, tasks associated with that use context, and a hypothetical walkthrough of how the user would accomplish those tasks with your app. If you are using a Kaggle dataset, you may use their "Overview (inspiration)" to create your usage scenario.

An example usage scenario with tasks (tasks are indicated in brackets, i.e. [task], and are optional to include)

Mary is a policy maker with the Canadian Ministry of Health and she wants to understand what factors lead to missed appointments in order to devise an intervention that improves attendance numbers. She wants to be able to [explore] a dataset in order to [compare] the effect of different variables on absenteeism and [identify] the most relevant variables around which to frame her intervention policy. When Mary logs on to the "Missed Appointments app", she will see an overview of all the available variables in her dataset, according to the number of people that did or did not show up to their medical appointment. She can filter out variables for head-to-head comparisons, and/or rank patients according to their predicted probability of missing an appointment. When she does so, Mary may notice that "physical disability" appears to be a strong predictor of missing appointments, and in fact patients with a physical disability also have the largest number of missed appointments. She hypothesizes that patients with a physical disability could be having a hard time finding transportation to their appointments, and decides she needs to conduct a follow-on study since transportation information is not captured in her current dataset.

Note that in the above example, "physical disability" being an important variable is fictional. You don't need to conduct an analysis of your data to figure out what is important or not. Instead, estimate what someone might find, and how they may use this information.

Design update

Add app description on sidebar
Peer feedback #67 You can also add more white space between filters.
TA feedback #66 Nice touch adding a graph save option. It would be helpful if you made a note to let the user know they can do this. It was hard to notice.
Peer feedback #67 Would be great if you add a side note to explain what the empty plots mean based on user selections (e.g. no data).
Peer feedback #67 : the font of the two tabs could be bigger & bolded.

Data Wrangling

Clean data
Save in data/processed/

Document your functions' functionality (Optional)

You have all already written good docstrings for your functions, right??? Well then, congrats! Your good habits have been rewarded with free points in this lab. If not, this is your chance to remedy the situation. Write proper docstrings for all functions, including a description of what the function parameters do, as you have learnt in previous courses. Clear comments where needed in the code are also a plus.

Update folder structure

GitHub folder structure

Since we now have a mix of many different file types,
let's tidy things up a bit.
Use a project structure similar to what we learnt in 521:

project/
├── data/            .csv .hdf .pkl .feather
│   ├── processed/
│   └── raw/
├── src/             .py .R
├── reports/         .ipynb .Rmd
├── doc/             .md
├── environment.yaml (or/and requirements.txt)
├── README.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
└── LICENSE.md

The difference between the reports and doc folders
is that the former contains analytic reports often involving code
(such as notebooks)
whereas the latter is more project documentation.
So where should you put your project proposal and reflections?
I would suggest the doc folder,
but remember that these are guidelines and not strict rules,
there are other sensible folder structures too.
You can upload any analysis you do along the way to explore the data in the reports folder,
but the analysis itself will not be reviewed by the TAs.

Fix text formatting for French characters

E.g. Montréal is showing up as Montr?al

It's not super important, just something we can deal with in a future milestone.
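One possible cause is that the raw file is decoded with the wrong encoding. A minimal sketch, assuming the raw bytes are either UTF-8 or Latin-1: try UTF-8 first (the "utf-8-sig" codec also drops a byte-order mark if one is present) and fall back to Latin-1. The function name is hypothetical, and if the "?" is already stored in the file itself, no choice of decoding can recover the original character.

```python
def decode_text(data: bytes) -> str:
    """Decode bytes as UTF-8 (tolerating a BOM), falling back to Latin-1."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this branch never raises.
        return data.decode("latin-1")


print(decode_text("Montréal".encode("utf-8")))    # Montréal
print(decode_text("Montréal".encode("latin-1")))  # Montréal
```

When reading the CSV directly, the same idea applies by passing an explicit encoding, e.g. open(path, encoding="utf-8-sig") or the encoding argument of pandas.read_csv.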

Milestone 2 Feedback

Hi team 22,

I found your reflection document very easy to read; it was organized in a way that made it very easy to understand what you have and haven’t deployed, why or why not, etc. Thank you!
Overall, well done getting most of your components working! The dashboard is clean, and not too cluttered.
In regards to your question about manually cleaning the y-axis of your Tab 1 bar graph, removing the [number] from each label would be more than enough. I don’t think you have to also abbreviate the provinces, not that I think this would actually be too hard with a set of replace commands. I would instead focus on ordering the data in the bar graph, either from highest to lowest or by province. The current order doesn’t make sense and I think is limiting the amount of useful information that someone can get from looking at it. Having the x-axis at the bottom of the graph outside the immediate view is not ideal. Consider putting it at the top of the graph as well, and even more ideal would be for it to float as you scroll. Since region is not a selection on Tab 1 you could even break the bar chart into multiple plots, one for each region, and then order each from highest to lowest.
For tab 2: As I was using this tab, I found it a little frustrating that my location selection was lost when I toggled between the Province and CMA options. - Thank you for defining CMA in your reflection for me! Otherwise, well done on this tab, the selections work, and the graphs are well done.
Nice touch adding a graph save option. It would be helpful if you made a note to let the user know they can do this. It was hard to notice.

Address TA Feedback

#40

Subsetting or a scroll bar for the bar chart on tab 1 might help the user look at the data more easily
Put legend for the choropleth map somewhere
Limit CMAs to select
Display CMA abbreviation
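For limiting the number of CMAs, a sketch that caps the selection inside the callback, assuming the earliest picks should be kept. MAX_CMAS and the value 5 are hypothetical placeholders, not a number taken from the proposal.

```python
from typing import List, Optional

MAX_CMAS = 5  # hypothetical cap; choose whatever keeps the Tab 2 plots readable


def cap_selection(selected: Optional[List[str]], limit: int = MAX_CMAS) -> List[str]:
    """Return at most `limit` selected CMAs, keeping the earliest picks."""
    return list(selected or [])[:limit]
```

Telling the user about the cap (for example in the dropdown's placeholder text) would make the truncation less surprising.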

Reflection

In this section, your group should document what you have implemented in your dashboard so far and explain what is not yet implemented. It is important that you include what you know is not working in your dashboard, so that your TAs can distinguish between features in development and bugs. Since this is the last milestone, you really need to explain clearly why you have chosen not to include a feature that you had previously planned to include.

This week it is suitable to include thoughts on the feedback you received from your peer and/or TA, e.g.

Has it been easy to use your app?
Are there recurring themes in your feedback on what is good and what can be improved?
Is there any feedback (or other insight) that you have found particularly valuable during your dashboard development?
This section should be around 300-500 words and the reflection-milestone4.md document should live in your GitHub.com repo in the doc folder.

Setup app reviews with Heroku (Optional)

Heroku has a neat feature where you can set it up to automatically deploy branches when PRs are opened on GitHub. This way you can test the dashboard live while reviewing a PR without downloading your collaborator's branch and running it locally. Set up your repo accordingly and create at least one PR that triggers an auto-deployment. Link this PR in canvas-submission.html.