Giter VIP home page Giter VIP logo

2018-big-data-contest's Introduction

2018-Yonsei Bigdata Analysis Competition

Team Aronamin-ing

**1. Data Info **

We were provided a year SNS crawling data which were mainly about vitamin reviews SNS involved Facebook, Instagram, Naver Blog, Naver Cafe, Daum Blog, Daum Cafe, and Youtube

Data Count per Brands

- Aronamin Gold : 3,238 | Aronamin C+ : 1,118건 | Urusa : 12,330 | Impactamin : 3,099 | Centrum : 11,359

- Vitamins : 29,736, Nutrient Supplements : 56,959

Data Sample

SNS Title Contents Address Date
naver_blog 찬찬약사.. 아로나민 골드, 아로나민 씨플러스 많이 들어보셨죠?? ... https://blog.naver.com/... 2017####

2. Preprocessing

We removed Youtube because the data only contained URL and title of video, which are insufficient to analyze solid reviews.

We put the most priority on the subjectivity of reviews. We classified reviews based on subjectivity of contents.

1) Official Advertisement
   Contents that the compnay voluntarily advertise their products through Facebook or official blogs
   
2) Unofficial Advertisement
   Contents that certain customer or middleman to advertise
   Removed ambiguous ones by visiting direct URL and reviewing real contents
   ex) 'Hot-deal', 'Coupang', 'Group purchase'
   
3) Duplicates
   Instagram Repost or Twitter Retweet

2-1. Simple analysis of brand impages after preprocessing We summarized the analysis below (in Korean) : https://docs.google.com/document/d/1j9fo8MiQO1yc5b-gLkDuwctuijfbbv4xHG0MZNV4y2g/edit?usp=sharing

3. Analysis

  1. Frequency analysis

1-1) Wordcloud: visualize based on simple token counts

1-2) TFIDF

  1. Association Rules

2-1) Apriori Association Rules

   utilized _nims jupyter_ and _Comoran Tokenizer_ on preprocessed data to tokenize nouns
   
   held minimum threshold of _support_ as 0.05
   
   held minimum thresholdl of _lift_ as 0.08
  1. Similarity Analysis

3-1) word2vec: word embedding

   analyzed similarity between keywords of product name, important features (ex) side effects, smells) and advertisement actors
  1. Sentiment Analysis

4-1) KOSAC sentiment dictionary

   utilized _Polarity_ dictionary of KOSAC sentiment dictionary

4. Results

4-1. Identify brand images and brand awareness

produced price score and image score
if several brands are correlated each other, we grouped them and analyzed differences

4-2. Identify customers' reactions on past advertisements

extracted the data that are related to advertisements 
conducted similarity analysis by using word2vec on several keywords

4-3. Strategy of customization

Categorized prospective customer into 6 groups (ex) Kids, Pregnant women, Students...)

There were differences of highly-related nutrients between groups, which are important factors when selecting products.
Also, we examined familiar keywords of each group such as multivitamin and general vitamins

Based on these results, we suggested marketing strategies to a target company

2018-big-data-contest's People

Contributors

dial0116 avatar ehgusqkr avatar myevertime avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

Forkers

dial0116

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.