fb_reactions_crawler's Introduction

Reaction Prediction

This project aims to predict the most frequently used Facebook reaction for a given text.

0. Installation

a. Prerequisites

Virtualenv can be used to simplify the process.

  1. python3

b. Setup

  1. Clone the repository
  2. Run python3 setup.py

1. Example Usage

Imagine you want to train your model with Facebook posts from CNN. This is the standard procedure:

  1. Find the id of the page you want to crawl. The fastest way to retrieve a page id is https://findmyfbid.com/. (E.g. for CNN it is 5550296508.)
  2. Get yourself a Facebook Graph API access token using the Graph API Explorer https://developers.facebook.com/tools/explorer/.
  3. Alternatively, crawl pages. (Don't forget to create the /data/datasets folder.)
    • python3 crawlpagepreferences.py YOUR_FB_ACCESS_TOKEN -c 500
  4. Crawl the posts.
    • python3 crawl.py -i 5550296508 YOUR_FB_ACCESS_TOKEN
  5. Filter the crawled data using the filter script.
    • python3 filter.py cnn.json
  6. Normalize the filtered data using the normalize script.
    • python3 normalize.py cnn_filtered.json
  7. Train the model using the train script.
    • python3 train.py cnn_filtered_normalized.json
  8. Query the trained model using the requestmodel script.
    • python3 requestmodel.py "Your newest FB post!"
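
The steps above can be chained in a small driver script. The following is only a minimal sketch and not part of the repository: the helper names pipeline_commands and run_pipeline are hypothetical, and the intermediate file names are simply the ones used in the CNN example (cnn.json, cnn_filtered.json, cnn_filtered_normalized.json).

```python
import subprocess


def pipeline_commands(page_id, access_token, name, text):
    """Build the command lines for steps 4-8 of the example above.

    `name` is the base name of the crawled file (assumption: "cnn" -> cnn.json).
    """
    return [
        ["python3", "crawl.py", "-i", str(page_id), access_token],
        ["python3", "filter.py", f"{name}.json"],
        ["python3", "normalize.py", f"{name}_filtered.json"],
        ["python3", "train.py", f"{name}_filtered_normalized.json"],
        ["python3", "requestmodel.py", text],
    ]


def run_pipeline(commands):
    """Run each command in order and stop at the first one that fails."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_pipeline(pipeline_commands(5550296508, "YOUR_FB_ACCESS_TOKEN", "cnn", "Your newest FB post!")) from the repository root would then run the CNN example end to end, assuming each script writes its output under the file name the next step expects.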

2. Documentation

a. Crawling Data

The usage of the script crawlpagepreferences.py is as follows:

usage: crawlpagepreferences.py [-h] [-c, --count page_count]
                               [-l, --limit rate_limit] [-e, --erase erase]
                               [-f, --file FILE] [-sp, --specific SPECIFIC]
                               [-s, --skip] [-nj, --nojoy] [-v, --value value]
                               access_token

Crawl facebook and represent page preferences.

positional arguments:
  access_token          a facebook access token

optional arguments:
  -h, --help            show this help message and exit
  -c, --count page_count
                        amount of page to be fetched
  -l, --limit rate_limit
                        limit of API requests per hour
  -e, --erase erase     overwrite existing files
  -f, --file FILE       a json file [{"id": xxxx, "name": "page_name"}, ...]
  -sp, --specific SPECIFIC
                        only crawl specific pages from category list
  -s, --skip            skip steps 0-4 (Crawling datas)
  -nj, --nojoy          show or not joy reaction
  -v, --value value     how many difference between main and other reactions
                        in percent

You can also provide a file in order to only plot the preference figures. To crawl only specific pages from the category list, provide a list of categories (one per line) in an external file.

The usage of the script crawl.py is as follows:

usage: crawl.py [-h] [-c, --count post_count] [-l, --limit rate_limit]
                (-f, --file FILE | -i, --id PAGE_ID)
                access_token

Crawl facebook reactions from pages.

positional arguments:
access_token              a facebook access token

optional arguments:
-h, --help                show this help message and exit
-c, --count post_count    amount of posts to be fetched from each page
-l, --limit rate_limit    limit of API requests per hour
-f, --file FILE           a json file [{"id": xxxx, "name": "page_name"}]
-i, --id PAGE_ID          a facebook page id

You have to provide your Facebook access token, as well as either a page id or a file containing page ids.

If you choose to provide a file, e.g. for crawling multiple pages at once, use the following schema:

[{
    "id": 12345678,
    "name": "a facebook page"
},{
    "id": 87654321,
    "name": "another facebook page"
}]
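
Such a file can also be generated rather than written by hand. The snippet below is a minimal sketch: the function names and the output path passed to write_pages_file are hypothetical, and only the "id" and "name" keys are taken from the schema above.

```python
import json


def pages_to_json(pages):
    """Serialize (page_id, name) pairs using the schema shown above."""
    entries = [{"id": int(page_id), "name": name} for page_id, name in pages]
    return json.dumps(entries, indent=4)


def write_pages_file(pages, path):
    """Write the serialized page list to `path` (the path is up to you)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(pages_to_json(pages))
```

For example, write_pages_file([(5550296508, "CNN")], "pages.json") produces a file that could then be passed to crawl.py with -f instead of -i.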

The output is written to a file for each provided page individually.
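
Both crawl scripts accept -l, --limit to cap the number of API requests per hour. How the scripts enforce this limit is not documented here; one simple way to respect such a limit, shown purely as an illustrative sketch with hypothetical function names, is to space requests evenly over the hour.

```python
import time


def seconds_between_requests(rate_limit):
    """Return the pause that spreads `rate_limit` requests evenly over one hour."""
    if rate_limit <= 0:
        raise ValueError("rate_limit must be positive")
    return 3600.0 / rate_limit


def throttled(items, rate_limit, sleep=time.sleep):
    """Yield items one at a time, pausing between them to stay under the limit."""
    pause = seconds_between_requests(rate_limit)
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause)  # no pause is needed before the very first request
        yield item
```

With a limit of 200 requests per hour this waits 18 seconds between requests. The sleep parameter exists only so that the pause can be replaced in tests.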

b. Filtering Data

The usage of the script filter.py is as follows:

usage: filter.py [-h] [-u, --filter-urls filter_urls]
                [-c, --min-char min_char] [-r, --min-reactions min_reactions]
                [-g, --reaction-gap reaction_gap]
                filename

Filter crawled facebook reactions.

positional arguments:
filename                             a crawled json file

optional arguments:
-h, --help                           show this help message and exit
-u, --filter-urls filter_urls        whether to filter URLs
-c, --min-char min_char              a minimal character count
-r, --min-reactions min_reactions    a minimal reaction count
-g, --reaction-gap reaction_gap      a percentage value the dominant reaction has 
                                        to be above the secondary reaction
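
To make the options concrete, here is a rough sketch of the kind of check they describe. It is not the code of filter.py: the post layout (a "message" string and a "reactions" mapping from reaction name to count) is an assumption, as is measuring the gap in percentage points of the total reaction count. URL filtering (-u) is left out.

```python
def passes_filter(post, min_char=0, min_reactions=0, reaction_gap=0.0):
    """Return True if a post satisfies the three thresholds.

    Assumed post layout (hypothetical):
        {"message": "...", "reactions": {"LOVE": 60, "HAHA": 30}}
    """
    message = post.get("message", "")
    reactions = post.get("reactions", {})
    total = sum(reactions.values())

    if len(message) < min_char or total < min_reactions:
        return False
    if total == 0:
        # Without any reactions there is no dominant reaction to predict.
        return False

    counts = sorted(reactions.values(), reverse=True)
    dominant = counts[0]
    secondary = counts[1] if len(counts) > 1 else 0
    # Gap between the two strongest reactions, as a share of all reactions.
    gap_percent = 100.0 * (dominant - secondary) / total
    return gap_percent >= reaction_gap
```

A post with 60 LOVE, 30 HAHA and 10 SAD reactions has a gap of 30 percentage points under this definition, so it would pass with a reaction gap of 25 but be dropped with a reaction gap of 40.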

c. Normalizing Data

The usage of the script normalize.py is as follows:

usage: normalize.py [-h] filename

Normalize crawled and filtered facebook reactions.

positional arguments:
filename      a filtered json file

optional arguments:
-h, --help    show this help message and exit

d. Training a Model

The usage of the script train.py is as follows:

usage: train.py [-h] filename

Train model based on normalized facebook reactions.

positional arguments:
filename      a normalized json file

optional arguments:
-h, --help    show this help message and exit

e. Querying the Model

The usage of the script requestmodel.py is as follows:

usage: requestmodel.py [-h]

Load a trained model and place requests.

optional arguments:
-h, --help    show this help message and exit

3. FAQ

How do I get a Facebook access token?

Go to the Graph API Explorer and, in the top right corner, request an access token with your Facebook user.

How do I get a page id?

Go to Facebook and navigate to the desired page. Open the page source (e.g. Ctrl+U) and search for "uid":. The number that follows is the page id.
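
If you have saved the page source to a string, the same search can be scripted. This is a minimal sketch under the assumption that the markup contains a key of the form "uid":<digits> (optionally quoted); the function name find_page_id is hypothetical, and Facebook's markup may differ or change.

```python
import re

# Matches "uid":12345 as well as "uid": "12345".
UID_PATTERN = re.compile(r'"uid"\s*:\s*"?(\d+)"?')


def find_page_id(page_source):
    """Return the first uid found in the page source, or None if absent."""
    match = UID_PATTERN.search(page_source)
    return int(match.group(1)) if match else None
```

The function returns None when no uid is present, so the caller can fall back to looking up the id manually.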
