orie-4741-proj's Issues

Mid-term Peer Review

This project analyzes US police shootings data to determine whether demographics and location affect the number of shootings, as well as how the frequency of police shootings has trended over time. The midterm report provided descriptive statistics and histograms grouping the shootings by state, victim race, and victim age. The report also included a preliminary polynomial regression analysis using year as the feature space and the cumulative number of shootings as the output space. Further steps are then discussed.
3 things I like:

  1. Missing data is handled in a reasonable way. Deleting columns with mostly missing information is a good way to get a meaningful model without having to delete a large number of observations.
  2. The presentation of the report is clear and follows a strong analytical logic.
  3. The topic is meaningful for society.

3 areas of improvement:

  1. It is mentioned in the report that unknown data exists in many columns. There may be better ways to deal with these values than deleting them, such as imputation (a minimal imputation sketch follows this list).
  2. It is mentioned that different models will be experimented with, and it is worth considering how to do comparison among them.
  3. It would help to explain more about how you plan to use the non-fatal shooting data, e.g. for classification.
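The following is a minimal imputation sketch in Python, assuming a pandas DataFrame with hypothetical columns such as "number_of_shots" and "nature_of_stop" (the names and values are illustrative, not the project's actual schema). It shows one way to fill missing values instead of dropping rows or columns:

```python
# Minimal imputation sketch; column names and values are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "number_of_shots": [1.0, np.nan, 3.0, np.nan, 2.0],
    "nature_of_stop": ["traffic", np.nan, "call", "traffic", np.nan],
})

# Numeric column: fill missing values with the median.
num_imputer = SimpleImputer(strategy="median")
df[["number_of_shots"]] = num_imputer.fit_transform(df[["number_of_shots"]])

# Categorical column: fill missing values with the most frequent category.
cat_imputer = SimpleImputer(strategy="most_frequent")
df[["nature_of_stop"]] = cat_imputer.fit_transform(df[["nature_of_stop"]])

print(df)
```

A median or most-frequent fill is only the simplest option; model-based imputation could be substituted later, but even this avoids throwing away whole columns or rows.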

Midterm Peer Review

The project predicts the number of shootings in the US per year based on a dataset from VICE that includes demographic features of shootings from 2010-2016.

Things I like:

  1. The table displaying the features and their types makes it very clear what dataset we are looking at.
  2. They also include a very detailed description of how they clean the data, such as which columns have the most missing data.
  3. They have thought of many things to work on for the rest of the semester, and these seem reasonable to do.

Things to improve:

  1. I do not think they have mentioned how they will avoid overfitting/underfitting.
  2. For the model they currently have, there are only 7 data points in total, which may be too few to get a valid prediction (a cross-validation sketch on this follows the list).
  3. It would be great to see analysis of other features, such as the race and gender of the officer, since your dataset includes a lot of other information.
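One way to make the overfitting concern concrete with only seven yearly points is leave-one-out cross-validation across polynomial degrees. The sketch below is a minimal illustration with made-up counts, not the project's data or code:

```python
# Leave-one-out CV over polynomial degrees; the yearly counts are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Seven yearly observations (2010-2016), centered so x = 0..6 for numerical stability.
years = np.arange(7).reshape(-1, 1)
counts = np.array([310.0, 330.0, 325.0, 350.0, 360.0, 340.0, 355.0])  # placeholder counts

for degree in range(1, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Each fold trains on six years and tests on the held-out seventh.
    scores = cross_val_score(model, years, counts, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    print(f"degree {degree}: mean held-out MSE = {-scores.mean():.1f}")
```

Whichever degree minimizes the held-out error is the most defensible choice; with only seven points, anything beyond a low-order fit will usually blow up the held-out error.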

Peer Review

This project is about the fatal police shootings that have occurred in America. The group has a dataset containing information on shootings that took place from 2015 to 2020. The goal is to learn from the data how a person’s demographics affect their chances of being fatally shot, as well as to analyze the frequency of fatal shootings over the years. The dataset includes features such as the date of the shooting, the location, an indicator of whether or not the victim had a weapon, etc.

This project proposal is very much to the point, which I liked. I also like their reasoning for the project and the thought that this analysis could be used to help regulate the police force and shine a light on underlying racial issues within policing. Thirdly, I think it’s great that the dataset being used contains not only numbers but also nominal data, such as descriptions of how the victims were shot.

There is definitely some area for improvement. The proposal is not very lengthy, so I do think greater detail could be included. I also think the question posed is slightly limiting, and the group should perhaps be more open to other analyses. Additionally, I would like to see more reasoning about how this model can be applied in the real world to minimize violence.

Peer Review [dwc236]

Peer Review

dwc236

This project seeks to understand who is likely to be shot by the police based on demographic information, and the likelihood of a shooting occurring based on geospatial and temporal information. The group plans to use a dataset of police shootings from the past five years with descriptive information about each shooting. The authors explain that such a project could help advise police forces on decisions to change use-of-force procedures and training to combat bias and illegal use of force.

I like that the authors know the dataset well and explain each of the variables. I think another positive of this project is that it is clear the authors have thought through how it will be directly applicable to different police departments, and show that the issue is important. In addition, I like that the problem question is direct and clear, and that there are multiple parts of the issue that are discussed, showing that the problem is deep enough to be investigated by a four-person team.

However, I am worried that the dataset is not rich enough. It lacks richer information about other details of each incident, such as the police report or any news coverage about it. In addition, I wonder how the authors will use the geospatial information to complete their analysis - this could be coded by zip code, but I am unsure how that would aid their analysis. Maybe they could use a different variable that encodes location by income, or some other proxy variable. Finally, I am unsure exactly how answering the questions links to the information that would be given to stakeholders, i.e. the police departments. Simply knowing the demographic information of individuals who are shot does not necessarily demonstrate much about bias, as there are many intervening factors.

Midterm peer review

This project explores the fatality of shootings in the United States, given features such as the race of the officer and the subject, the city, the number of stops, etc.

Things I like about the project

  • The project is particularly impactful, especially when assessing the nature of police brutality in these times. If the team manages to get a decent analysis out of this project, it can definitely be meaningful.
  • A valid train/test methodology has been adopted.
  • The team made a point of fitting and testing multiple models.

Points for improvement

  • Dropping 'important' fields like the number of shots might not be the best idea. Maybe replace the missing values with the average, or impute them in an unsupervised setting. I imagine such fields are crucial to determining the fatality of an encounter.
  • Some more explanation is needed to accompany the visualizations.
  • It would be helpful to explain the interpretation of the regression coefficients (a coefficient-reporting sketch follows this list).
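One lightweight way to make coefficients interpretable is to standardize the features, fit the model, and print each coefficient next to its feature name. The sketch below is hypothetical: the feature names, labels, and values are placeholders, and logistic regression merely stands in for whatever model the team ultimately uses:

```python
# Coefficient-reporting sketch; features, labels, and values are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

feature_names = ["subject_age", "num_officers", "weapon_indicator"]
X = np.array([[23, 2, 1], [45, 1, 0], [31, 3, 1], [52, 2, 0],
              [19, 4, 1], [38, 1, 0], [27, 2, 1], [60, 3, 0]])
y = np.array([1, 0, 1, 0, 1, 0, 0, 1])  # 1 = fatal, 0 = non-fatal (placeholder)

# Standardizing puts coefficients on a comparable scale across features.
X_scaled = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X_scaled, y)

for name, coef in zip(feature_names, clf.coef_[0]):
    print(f"{name:>18}: {coef:+.3f}  (positive means higher predicted odds of a fatal outcome)")
```

Because the features are standardized, the relative magnitudes of the printed coefficients can be read as a rough ranking of feature influence.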

Final Review - ajs692

The authors have a beautifully written report that clearly expresses the problem and the methods used to solve it. They thoroughly explained the thought process behind each of the methods used. Additionally, they had convincing visualizations that further enhanced their reasoning and discussion of results.

I was surprised that “Number of Officers” was not included in the model, as I would argue that more officers could play a role in either escalating or deescalating a situation. I would have liked to see location (and resulting demographics) included in the analysis, but the authors did say that in the future this is an aspect they would like to include. Race of the officer could also be another area of importance, and would be another factor to include.

This project could be very useful with further data and analysis to show more evidence of racism and help reduce fatal shootings, as discussed in their conclusion. However, as it stands, the models were not able to predict the outcome of a police encounter with high enough accuracy to be used by others immediately. That being said, the model itself is promising!

peer review

The project is about analyzing police shootings and how they relate to certain demographics. The group is using a dataset containing information from the past five years. Their objective is to see how the frequency of police shootings has changed over that period and whether someone’s chances of being shot increase based on their demographics.

I like that this project tackles a very prevalent issue in our society. Additionally, they are using very recent data while also going back five years, which will provide a good sense of changes and trends. Something else I like about the proposal is how they list the features they will be using (race, date, gender), since there is probably a strong correlation among those features.

Areas of improvement would be adding more detail; I think the proposal was not completed correctly, as the second paragraph just stops midsentence. Also, they could look at how shootings change with the demographics of the officer or the popularity of the NRA in a given state. These are obviously not included in the dataset, but it would be interesting to consider merging two different datasets and then performing a larger analysis (a merging sketch follows this paragraph). Another thing to be careful of is not tuning parameters in a biased way. We know from personal experience that certain demographics face higher risk, but we should avoid adjusting hyperparameters until the model confirms our expectations, rather than letting the model show us what is in the data.
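A minimal sketch of the dataset-merging idea, using hypothetical column names and made-up values rather than the project's actual data:

```python
# Merging the shootings data with a second, state-level dataset.
# Column names and values below are hypothetical placeholders.
import pandas as pd

shootings = pd.DataFrame({
    "state": ["NY", "TX", "CA", "TX"],
    "victim_race": ["B", "W", "H", "B"],
    "fatal": [1, 0, 1, 1],
})
state_info = pd.DataFrame({
    "state": ["NY", "TX", "CA"],
    "officers_per_100k": [320, 250, 280],
})

# A left join keeps every shooting record and attaches the state-level feature.
merged = shootings.merge(state_info, on="state", how="left")
print(merged)
```

Using a left join avoids silently dropping shooting records that have no match in the second table, which matters when the merged dataset is already small.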

Final Peer Review - ja497

The project is about predicting whether a police shooting is fatal or not. The dataset used was gathered by the news organization Vice News and contains background on shootings between 2010 and 2016, including information about the victims’ and police officers’ backgrounds, as well as the nature and location of each shooting. A side goal of this project is to verify whether the victims’ race can be used as a significant feature to predict the fatality of police shootings, consistent with claims popular on social media.

What I like about the project:

  1. The visualization section is rather helpful for looking at the overall distribution of the dataset. We can see that the number of police shootings remains somewhat consistent throughout the years, so despite the dataset’s long coverage (2010 - 2016), the data remains rather consistent.
  2. I particularly like the future works section; the authors raise a few suggestions that could be potentially useful, such as incorporating location information and further expanding their dataset. Over the span of their dataset (2010 - 2016), certain locations in the US have seen improvements in their police departments while others have deteriorated, so fitting a single model across the entire country might lead to inaccuracies.
  3. Overall, your ideas and logic were quite well-explained, and easy to follow throughout the report. The question itself is rather interesting, and your final model’s misclassification rate of 0.3372 seems to be rather promising.

Avenues for future improvement:

  1. You mentioned that the goal of the project is to determine which features (after preprocessing) are the best predictors of whether a police shooting is fatal or not. However, there seems to be a bias in your approach, since you start out by manually narrowing down your choice of features to a select few. I was wondering if there was any justification for your choice of features; as we covered in the lecture on EBMs, it might be more useful to include more features such as police officer race, nature of the stop, etc., since eliminating these features causes their explanatory power to be distributed across other correlated features. For instance, if fatal police shootings are more likely to happen when the officer and victim are of different races, using only the subject’s information might lead to misinterpretations of the coefficients in your model.
  2. There might be some room for exploring your selected models more; for your perceptron and polynomial models, you seem to be fitting only a single variable against the predicted variable. It might be preferable to do some form of feature selection given the number of features you have; while you acknowledged that the data plotted on single features isn’t linearly separable, including more features makes the data more likely to be linearly separable in higher-dimensional spaces. An alternative could also be to use a soft-margin SVM as covered in lecture. Additionally, it would be nice if you explained more about how you determined that your models overfit; the 7th-order polynomial model might not necessarily overfit. Checking the difference between train and test accuracies might be a better benchmark.
  3. It seems that you removed quite a lot of instances from the dataset in your data cleaning portion, from 4400 rows down to 1028 rows. That’s more than 76% of the data — it might be useful to check if these removed data have anything in common, as there might be some biased factor that consistently interrupts the data collection.
  4. Lastly, instead of simply looking at the misclassification rate, it might be more useful to look at a weighted accuracy; based on your data visualization section, the race of shooting victims is rather imbalanced, so balancing the misclassification rate might prove useful (a balanced-accuracy sketch follows this list).
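A small sketch of the class-balance point, using placeholder labels rather than the project's predictions; it contrasts the raw misclassification rate with scikit-learn's balanced accuracy and the confusion matrix:

```python
# Comparing raw misclassification with class-balanced metrics on imbalanced labels.
# The labels and predictions below are placeholders.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

y_true = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])  # mostly the majority class
y_pred = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0])  # classifier favors the majority

print("misclassification rate:", 1 - accuracy_score(y_true, y_pred))
print("balanced accuracy:     ", balanced_accuracy_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```

On imbalanced labels, a classifier that mostly predicts the majority class can look accurate while misclassifying much of the minority class; the balanced score and confusion matrix expose exactly that.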

Midterm Peer Review

Summary of the project:
The project aims to explore whether there is a relation between the number of shootings in the US and the year. Descriptive statistics and histograms detailing key demographics are also provided. The dataset they are using is the VICE police shootings database.

Things I like:

  • Descriptive statistics are comprehensive and histograms allow for the reader to better understand the data.
  • The dataset is very complete and comprehensive, and includes data on both fatal and non-fatal shootings.
  • The project is very relevant to current social issues in the United States, and could potentially benefit society.

Areas for improvement:

  • It is stated that the columns “number of shots” and “nature of stop” were dropped due to a high prevalence of missing values. However, features like “number of shots” could be good predictors of fatality. Rather than dropping this column, other alternatives could be considered, such as imputing with the mean instead.
  • I believe that the report could be strengthened if the coefficients of the different polynomial fits were reported and interpreted.
  • It is mentioned that you will experiment with higher-order polynomial fits. How will you avoid overfitting? (A regularized-fit sketch follows this list.)
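One option that touches both points above, reporting coefficients and controlling overfitting, is a ridge-regularized polynomial fit. The sketch below uses made-up yearly counts, not the project's data:

```python
# Ridge-regularized polynomial fit with coefficient reporting; counts are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

years = np.arange(7).reshape(-1, 1)                      # 2010-2016, centered at 2010
counts = np.array([310.0, 330.0, 325.0, 350.0, 360.0, 340.0, 355.0])  # placeholder counts

model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      StandardScaler(),
                      Ridge(alpha=1.0))                  # the penalty shrinks coefficients
model.fit(years, counts)

ridge = model.named_steps["ridge"]
print("intercept:", ridge.intercept_)
print("coefficients (x, x^2, x^3):", ridge.coef_)
```

The ridge penalty keeps higher-order terms from exploding on so few points, and printing the coefficients directly supports the interpretation suggested above.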

Peer review

This project addresses the link between racial bias and police brutality in the United States by looking at victim data from police shootings over the last couple of years. They are using a large dataset from Kaggle which includes data on police shootings across the country. They aim to find a link between victim demographics or shooting location and the chance of being shot, which may help highlight the current issues surrounding police departments in the US.

I like how relevant and important this problem is, as police brutality is a human rights issue that has recently come into the public eye in the United States. Throughout the pandemic, the news has been covering protests around the country that address this issue, and through your analysis, an analytical view on the issue can be brought to light. I also like how you chose to look at victim data, as it will very clearly show if a certain demographic features more in police shootings than another. Tying all this together, I thought that your questions were very clearly defined, helping the overall significance of the project together.

In terms of improvement, there were several things that I found. To help address the problem, you might want to look at police department data and relate that to victim demographics, as I suspect that the demographics and sentiment within a department affect the number of shootings that result and who officers tend to shoot. Additionally, you want to see whether an individual’s demographics affect their chances of being shot by police, but you don’t mention looking at the data in terms of the population. If there is a majority demographic in a certain geographical area, it may appear that the majority is more likely to be shot, but that won’t tell you much about whether the demographic itself affects the chance of being shot. To address this, you may want to look at population demographics and find some way of standardizing your victim demographics against them (a per-capita sketch follows this paragraph). Finally, your “dataset” section appeared to be cut off, so completing it will improve the project proposal as a document.
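A minimal per-capita standardization sketch, with entirely hypothetical group labels and counts, showing one way to normalize victim counts by the underlying population:

```python
# Standardizing victim counts against population size; all numbers are hypothetical.
import pandas as pd

victims = pd.DataFrame({
    "group": ["A", "B", "C"],
    "shooting_victims": [120, 300, 80],
})
population = pd.DataFrame({
    "group": ["A", "B", "C"],
    "population": [2_000_000, 1_500_000, 500_000],
})

# Join the two tables, then express victims per 100,000 residents of each group.
df = victims.merge(population, on="group")
df["victims_per_100k"] = df["shooting_victims"] / df["population"] * 100_000
print(df[["group", "shooting_victims", "victims_per_100k"]])
```

Expressed per capita, a group that looks over-represented in raw counts may or may not remain so, which is exactly the distinction the review raises.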

Midterm Peer Review

Summary

The goal of the project is to predict the number of shootings in a year and what might affect it.

What I liked

  1. You did a good job preprocessing the data from its original form to a usable state.
  2. You are trying to tackle an important issue, and the choice to find a new dataset with both fatal and nonfatal incidents makes a lot of sense.
  3. I liked how you mentioned shuffling/randomizing your training/testing samples, which is a key part of getting accurate, representative metrics.

Areas of Improvement

  1. You could have explained more about why you chose to fit a polynomial regression as opposed to other models. It would also have been nice to see some other accuracy metrics to more easily compare models.
  2. It seems you mentioned splitting the data into training and validation sets, but forgot about a testing set (see the split sketch after this list).
  3. Your dataset seems quite small and prone to overfitting; it is probably a better idea to impute rather than drop the missing values so that you retain enough data to make decent predictions.
  4. I am unsure how valuable it is to try to predict the number of shootings in a time frame. Your choice of dataset seems to point toward classifying whether an incident is fatal or non-fatal based on demographic, location, subject, officer, and other data instead.
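A quick sketch of the train/validation/test split mentioned in point 2, using synthetic placeholder data:

```python
# Shuffled 60/20/20 train / validation / test split; the data here is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 5))                 # placeholder feature matrix
y = rng.integers(0, 2, size=100)         # placeholder fatal / non-fatal labels

# First carve off 20% as a final, untouched test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)
# Then split the remainder into train and validation (0.25 * 0.8 = 0.2 overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, shuffle=True, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60, 20, 20
```

The point of the separate test set is that it is only touched once, after all model and hyperparameter choices have been made on the validation set.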

Final Peer Review

Summary

This group looked at data on police shootings from 2010-2016 to predict whether a specific police shooting would be fatal. They were primarily interested in whether race had a strong effect on the severity of a police shooting, as they were doing the analysis in the context of police brutality and the Black Lives Matter movement.

Positives

  1. It's clear you have a really good understanding of your data. You applied some domain expertise when interpreting the features available.
  2. I liked that you first tried the perceptron method to determine that your data isn't simply linearly separable. With this knowledge, you then confidently fit a polynomial model and other more complex models.
  3. I like that you included a Future Improvements section where you acknowledge some of the limitations of your models and data.

Points of Improvement

  1. Whereas you went into a lot of detail on your perceptron and polynomial models, you then went into a laundry list of models in the Model Selection section. Perhaps you can highlight just a few (2-3) such models and describe them in more depth.
  2. Overall, you spent a lot more time talking about your data (which is good), but I think your models section could be fleshed out more to show understanding of what these models actually do.
  3. There is evidently a lot of detail in this paper, but it would be helpful if you split up your longer paragraphs so it is easier to read and follow. The Weapons of Math Destruction discussion would have been better separated into its own paragraph with a section header, since I almost missed it.

Overall, super interesting paper and very relevant in these times! Great work!

Final Peer Review - yz2772

This project aims to find the most important factors that determine whether a police shooting is fatal or not. The group used a dataset made by Vice News which covers nationwide fatal and non-fatal police shooting cases from 2010-2016.

Things I like

  1. The group really understands their data, and the data cleaning and processing parts are well written.
  2. I like how the group included the perceptron and polynomial models and justified why these models wouldn't work for their dataset or why they are not an ideal choice. People usually just write up what worked in the end, but this group also included what didn't work for them, which provides interesting and valuable insight into their data analysis.
  3. I really like how they drew a connection between their prediction results and real-world data. The group compared their results to those from other studies and gave their interpretation and explanation of results from other real-world police shooting data.

Things can be improved

  1. I am not very convinced by the group's choice to simply drop some of the columns with too many missing entries, because some of those fields seem pretty significant in determining police shooting fatality; the number of officers, the officer's race, and the city could all contribute to whether a shooting is fatal or not.
  2. It seems that after data cleaning and pre-processing there were only 1028 data points left, which is a really small amount. Instead of doing an 80/20 split on the remaining data, I think it would be better if this group used cross-validation on all of their models, given the really small dataset (a cross-validation sketch follows this list).
  3. Besides fitting linear models with regularization, I think the group could try fitting more complex models such as a random forest or an EBM, or other more interpretable models, which would help them identify the most important features.
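A small cross-validation sketch for point 2, with a random forest standing in for whichever model the group fits; the data here is synthetic, sized only to mimic the roughly 1028 cleaned rows:

```python
# 5-fold cross-validation with a random forest instead of a single 80/20 split.
# The feature matrix and labels are synthetic stand-ins for the cleaned dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((1028, 10))               # stand-in for ~1028 cleaned rows
y = rng.integers(0, 2, size=1028)        # stand-in fatal / non-fatal labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print("fold accuracies:", np.round(scores, 3))
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With so few rows, the spread across folds is as informative as the mean: a large spread signals that a single 80/20 split would have given an unstable estimate.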

Overall a good project with interesting results and huge potential in real world applications. Good job guys!

Final Peer Review - rl447

Summary:

This project aims to apply three different data analysis models and techniques to police shooting data scraped from public records in order to identify if certain demographic and situational features can give insight into the extent and nature of racial disparity. This can greatly benefit society, particularly after 4.6 million Americans protested the racially motivated violence behind police brutality, and enable citizens to increase their awareness of primary factors that drive institutional racism and are particularly prominent in the context of police shootings in the United States. The dataset consisted of 4,400 police shootings from 2010-2016 with descriptive data about the victims, shooters, and situations behind the shootings.

Things I Like:

  1. I was extremely interested in the insights you gained from the data visualizations and the larger trends of police shootings that you were able to tease out from the histograms on different demographic and geographical features.
  2. I think you handled the noise and messiness of your data well. After reading how much you had to preprocess and clean your data, it is clear that this dataset included several nominal features and was not easy to draw significant conclusions from (though you were still able to reach a 0.3372 misclassification rate).
  3. In the future improvements section, I liked that this group thought so deeply about the potential causes of the limitations of the project, and what sorts of additional data could be gathered going forward in order to extract more robust conclusions from police shooting data.

Areas for Improvement:

  1. I see that you decided to select and look only at the top 20 most common car model types. I was wondering what the results might look like with a different number of car models.
  2. If the city of the shooting proved to be an interesting and somewhat relevant feature in the initial data, it may be worth figuring out a way to geographically encode the location of the shooting, maybe with a graph network of some sort, in order to cluster shootings in areas that tend to have more racially charged shootings and potentially gain insight from the location.
  3. You acknowledged that EBM models are equipped to handle bias in predictions. Given that the EBM model graphs the importance of individual features on a continuous scale, this may have allowed you to take a more nuanced look at the effect of each individual feature, since any spikes, as Rich mentioned, are almost always due to human/social/policy effects, and your data is strongly related to these effects (an EBM sketch follows this list).
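A minimal EBM sketch, assuming the interpret package's ExplainableBoostingClassifier is available; the feature names and data below are synthetic placeholders, not the project's dataset:

```python
# Fitting an EBM and viewing per-feature shape functions; data is synthetic,
# and the `interpret` package is assumed to be installed.
import numpy as np
import pandas as pd
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "subject_age": rng.integers(16, 80, size=500),
    "num_officers": rng.integers(1, 6, size=500),
    "weapon_indicator": rng.integers(0, 2, size=500),
})
y = rng.integers(0, 2, size=500)  # placeholder fatal / non-fatal labels

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# The global explanation contains one shape function per feature, showing how
# that feature moves the predicted risk across its range (renders in a notebook).
show(ebm.explain_global())
```

The per-feature shape functions are where the "spikes" mentioned above would appear, so they give the nuanced, feature-by-feature view the review suggests.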
