covid19-attitude's Introduction

JSC270_A4

Final project for JSC270 Winter 2021

Authors: Rui Miao (Amy), Xinpeng Shan (Joy), Shiyuan Zhou (Eric)

File descriptions:

JSC270_A4_Report.pdf: The report of A4

JSC270_A4Q1.ipynb: the Colab notebook containing all the code for question 1

JSC270_A4Q2.ipynb: the Colab notebook containing all the code for question 2

JSC270_A4Q2_presentation_slide.pdf: the slides for the presentation of question 2

JSC270_A4Q2_dataset_countryplot_neg.csv: dataset for the A4Q2 pie chart showing the proportion of locations of people who support #maskoff

JSC270_A4Q2_dataset_countryplot_pos.csv: dataset for the A4Q2 pie chart showing the proportion of locations of people who support #maskon

JSC270_A4Q2_dataset_mask.csv: dataset for the mask model in A4Q2

JSC270_A4Q2_dataset_vaccine.csv: dataset for the vaccine model in A4Q2

question_2_proportion.csv: dataset for calculating the proportion of people who support wearing masks

Evaluation Comments

Q1

  • Be careful about using the max_features parameter of CountVectorizer. It restricts your vocabulary. This is fine as a hyperparameter to tune, but the true vocabulary without the constraint should be above 50,000 tokens. This will affect your accuracies later in the question, but you'll only lose a mark once.
  • Instead of fitting a separate count vectorizer for each class, you could just use np.unique on your data directly to get token counts.
  • The top 5 words for each class should be mostly COVID-related, with counts in the high thousands. Your top 5 seem to have lower counts. You may have some encoding problems that are affecting this.
  • Note that ROC curves are designed to measure true-positive and false-positive rates for binary classification. In this multiclass case a single ROC curve has no meaning; instead, the usual approach with scikit-learn is to compute multiple one-vs-rest or pairwise curves and plot them together, or average them.
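
The np.unique suggestion above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the toy `docs` list, the class labels, and the `top_tokens` helper are all hypothetical, and it assumes the tweets have already been tokenized.

```python
import numpy as np

# Hypothetical toy data: (tokenized tweet, class label) pairs.
docs = [
    (["covid", "cases", "rise"], "negative"),
    (["covid", "vaccine", "hope"], "positive"),
    (["covid", "lockdown", "covid"], "negative"),
    (["vaccine", "works", "covid"], "positive"),
]

def top_tokens(docs, label, k=5):
    """Return the k most frequent (token, count) pairs for one class."""
    # Flatten every token from the documents that carry this label.
    tokens = np.array([tok for toks, lab in docs if lab == label for tok in toks])
    # np.unique returns the sorted vocabulary and, with return_counts=True,
    # how often each entry occurs. No per-class vectorizer is needed.
    vocab, counts = np.unique(tokens, return_counts=True)
    # Sort by descending count; the stable sort keeps ties in alphabetical order.
    order = np.argsort(-counts, kind="stable")[:k]
    return [(str(vocab[i]), int(counts[i])) for i in order]

top_negative = top_tokens(docs, "negative")
top_positive = top_tokens(docs, "positive")
```

Because no vocabulary cap is applied here, the counts reflect the full token set, which makes it easier to check whether low top-5 counts come from the data itself or from a restricted vocabulary.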

Q2

  • Great motivation of the problem and statement of questions
  • It's okay if your study is limited. You do a great job of addressing possible pitfalls in your data before analyzing it
  • Please don't include code in your report (only text and visuals)
  • Are there any related works predicting tags generally (not necessarily COVID-related)?
  • Excellent data description
  • Which stopword list did you use, and how many stopwords did you remove?
  • Are there any interesting patterns in your dataset? What are some descriptive statistics in terms of length, other tags, attitude, etc.?
  • NLP is not itself a machine learning model. It is a collection of techniques for processing text as data (which may or may not use ML)
  • What is the balance in attitudes, and how does it compare to your test accuracy?
  • Precision and Recall would be useful metrics here
  • Do your results change with a different train/test split?
  • What can you conclude from this experiment?
  • Excellent work
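
Several of the Q2 comments (class balance versus test accuracy, precision and recall, and sensitivity to the train/test split) can be checked with a few lines of code. The sketch below uses only the standard library; the labels, predictions, and helper names are hypothetical illustrations, not results from the report.

```python
import random
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def split_indices(n, test_fraction, seed):
    """Shuffle indices with a given seed and return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

# Hypothetical attitude labels and model predictions.
y_true = ["support", "support", "support", "oppose", "oppose", "support"]
y_pred = ["support", "support", "oppose", "oppose", "support", "support"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
baseline = majority_baseline(y_true)
precision, recall = precision_recall(y_true, y_pred, positive="support")
```

In this toy example the accuracy equals the majority-class baseline, which is exactly the situation the class-balance question is probing. Calling `split_indices` with several seeds and re-evaluating on each split gives a rough sense of how stable the reported numbers are.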
