
fbcomments-test

Prerequisites

  • Debian-like OS (tested on Ubuntu 15.10)
  • Python 2.7
  • virtualenvwrapper 4.5.1
  • FB Graph API version 2.5
  • Internet connection

Load virtualenvwrapper into your shell:

$ source virtualenvwrapper.sh

Project Setup

$ mkvirtualenv fbcomments-test
$ git clone git@github.com:twil/fbcomments-test.git
$ cd fbcomments-test
./fbcomments-test$ workon fbcomments-test
(fbcomments-test)./fbcomments-test$ pip install -r requirements.txt

The Basic Thoughts

  1. FB requests are simple, so we can use the requests library.
  2. Cursor-based pagination. Time-based and offset-based pagination don't work with the /comments edge! (see the Graph API Explorer: https://developers.facebook.com/tools/explorer/)
  3. Batch requests? Maybe we can fabricate paged URLs with offset and since?
  4. Error processing. If the response (JSON) has an error property, the request failed.
  5. Rate limiting. App Level Throttling: 200 calls/person/hour (Error Code 4).
  6. Don't request unneeded fields (we need only created_time to calculate the frequency of comments).
  7. Timeouts.
  8. Use multiprocessing
  9. Use Google Charts
  10. pandas has a very neat way to calculate the needed frequencies: Series.resample('3Min', how='sum', label='right') (in newer pandas, which no longer accepts how=, this is Series.resample('3Min', label='right').sum()). But how do we parallelize the whole thing?
  11. The last index of one page's Series.resample() result will usually be the same as the first index of the next page's result. In that case we sum these two values to merge the two sequences. If the first index of the second page happens to be different, we simply concatenate the two sequences and that's it!
  12. How to parallelize? We get pages from FB sequentially, so we can only parallelize the processing of the received data :(
  13. The FB docs are not great here: the order on the /comments edge can be chronological or reverse_chronological. This means we can "eat" comments from both ends in parallel!
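Items 10 to 12 can also be sketched without pandas. The following is a minimal sketch, assuming naive UTC datetime objects, 5-minute buckets aligned to the Unix epoch, and pages fetched in chronological order; bucket_counts and merge_pages are hypothetical names. It counts comments per bucket the way resample(..., label='right') would, and merges two consecutive pages by summing the shared boundary bucket.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)   # assumption: bucket edges are aligned to the Unix epoch
BUCKET = timedelta(minutes=5)  # 5-minute buckets, as in the Zero Approximation


def bucket_counts(timestamps, bucket=BUCKET):
    """Count timestamps per bucket, labelling each bucket by its right edge
    (like resample(..., label='right')). Buckets are closed on the left.
    Empty buckets never appear, which is the equivalent of dropping the
    empty/NA buckets after resampling."""
    step = int(bucket.total_seconds())
    counts = {}
    for ts in timestamps:
        secs = int((ts - EPOCH).total_seconds())
        right_edge = secs - secs % step + step
        counts[right_edge] = counts.get(right_edge, 0) + 1
    return [(EPOCH + timedelta(seconds=edge), n) for edge, n in sorted(counts.items())]


def merge_pages(first, second):
    """Merge the (label, count) lists of two consecutive pages fetched in
    chronological order. Only the boundary bucket can be shared: if it is,
    sum the two counts; otherwise just concatenate."""
    if first and second and first[-1][0] == second[0][0]:
        label = first[-1][0]
        return first[:-1] + [(label, first[-1][1] + second[0][1])] + second[1:]
    return first + second
```

Because each page can be bucketed independently, bucket_counts is the part that could in principle be handed to a multiprocessing.Pool, while merge_pages folds the per-page results together in page order.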

So we have ~52k comments on the given post (10151775534413086), a budget of 200 requests per hour per user, and a limit of 5k comments per request (possibly less?).

That means we need 11 requests (52k / 5k, rounded up) to fetch the data.
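The request estimate is plain ceiling division. A small sketch using the figures from these notes (the ~52k total and the 5k page cap are observations, not guarantees; requests_needed is a hypothetical helper):

```python
import math

COMMENTS = 52000      # ~52k comments on post 10151775534413086
PAGE_LIMIT = 5000     # observed per-request cap on the /comments edge
HOURLY_BUDGET = 200   # App Level Throttling: calls per person per hour


def requests_needed(total, page_limit):
    """Number of paged requests needed to read `total` items, `page_limit` per page."""
    return int(math.ceil(float(total) / page_limit))


n = requests_needed(COMMENTS, PAGE_LIMIT)
print(n)                   # 11
print(n <= HOURLY_BUDGET)  # True: the whole post fits well within one hour's budget
```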

Resampling 52k comments into 5-minute buckets gives ~257k timestamps (buckets)! Most of them are empty, so we can drop the NA values, which leaves ~4k values.
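The 257k figure depends only on the time span between the first and last comment, not on the number of comments: 257k five-minute buckets correspond to roughly 892 days (about 2.4 years). A sketch assuming naive UTC datetimes and buckets aligned to the Unix epoch; bucket_count and span_of_buckets are hypothetical helpers.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: bucket edges aligned to the Unix epoch


def floor_seconds(ts, step):
    """Seconds since the epoch, rounded down to a multiple of `step`."""
    secs = int((ts - EPOCH).total_seconds())
    return secs - secs % step


def bucket_count(first, last, bucket=timedelta(minutes=5)):
    """How many buckets a resample produces between two timestamps (inclusive),
    before any empty buckets are dropped."""
    step = int(bucket.total_seconds())
    return (floor_seconds(last, step) - floor_seconds(first, step)) // step + 1


def span_of_buckets(n, bucket=timedelta(minutes=5)):
    """Time span covered by `n` consecutive buckets."""
    return bucket * n
```

For example, span_of_buckets(257000).days is 892, and a single hour of comments already yields 12 five-minute buckets.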

Zero Approximation

  1. Get an Access Token somehow (out of scope at this moment)
  2. Get all the comment timestamps using cursor-based pagination, with a 10k limit and only the created_time field selected
  3. Calculate the frequencies for 5 min intervals
  4. Create a report folder
  5. Save the data to data.js
  6. Copy the template report.html into the report folder
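Step 2 could look roughly like this with the requests library. This is a sketch under several assumptions: the https://graph.facebook.com/v2.5/{post-id}/comments URL form, that the response's paging.next is a complete URL (cursor and token included) that is absent on the last page, and that created_time comes back as e.g. 2013-06-04T12:00:00+0000. comment_timestamps, fetch_json and parse_created_time are hypothetical names. The fetcher is injectable so the paging logic can be exercised without network access.

```python
from datetime import datetime

# Assumption: Graph API v2.5 URL layout for the /comments edge of a post.
GRAPH_COMMENTS_URL = "https://graph.facebook.com/v2.5/{}/comments"


def fetch_json(url, params=None, timeout=30):
    """Default fetcher: a plain GET with a timeout, using the requests library."""
    import requests  # imported lazily so the paging logic can be tested without it
    return requests.get(url, params=params, timeout=timeout).json()


def comment_timestamps(post_id, access_token, fetch=fetch_json, limit=10000):
    """Yield the created_time string of every comment, following cursor-based paging."""
    url = GRAPH_COMMENTS_URL.format(post_id)
    params = {
        "fields": "created_time",  # request only the field we need
        "limit": limit,            # FB caps a page at 5k anyway
        "order": "chronological",
        "filter": "stream",        # as in the FB.api limit check
        "access_token": access_token,
    }
    while url:
        page = fetch(url, params)
        if "error" in page:  # a failed request carries an `error` property
            raise RuntimeError(page["error"].get("message", "Graph API error"))
        for comment in page.get("data", []):
            yield comment["created_time"]
        # Assumption: `paging.next` is a complete URL, so no extra params are
        # sent with it; when it is missing we have reached the last page.
        url = page.get("paging", {}).get("next")
        params = None


def parse_created_time(value):
    """Parse e.g. '2013-06-04T12:00:00+0000'; the '+0000' (UTC) suffix is assumed."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S+0000")
```

With a real token, list(comment_timestamps('10151775534413086', token)) would walk every page; passing a fake fetch function makes the same loop testable offline.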

Testing

Tests are in tests.py. To run the test suite, issue:

(fbcomments-test)./fbcomments-test$ nosetests

Error Processing

TODO:

Codes to wait and retry:

  • 1 - API Unknown. Retry and forget if not successful.
  • 2 - API Service.
  • 4 - API Too Many Calls. Examine your API request volume?
  • 17 - API User Too Many Calls. Examine your API request volume?
  • 341 - Application limit reached. Examine your API request volume?
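A minimal sketch of wait-and-retry for the codes listed above, assuming each call returns the decoded JSON response and that an error response looks like {"error": {"code": ..., "message": ...}}. call_with_retry and GraphAPIError are hypothetical names, and the backoff delays are placeholder values.

```python
import time

# Error codes from the list above that are worth waiting on and retrying.
RETRYABLE_CODES = {1, 2, 4, 17, 341}


class GraphAPIError(Exception):
    def __init__(self, code, message):
        super(GraphAPIError, self).__init__("(#%s) %s" % (code, message))
        self.code = code


def call_with_retry(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()` and retry retryable errors with exponential backoff.
    Raise GraphAPIError for other codes, or once the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = call()
        error = response.get("error")
        if error is None:
            return response
        code = error.get("code")
        if code not in RETRYABLE_CODES or attempt == max_attempts:
            raise GraphAPIError(code, error.get("message", ""))
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... with the defaults
```

The throttling limit in these notes is per hour, so for codes 4, 17 and 341 a real run would likely need far longer waits than these defaults. For code 1 ("retry and forget"), the caller can catch GraphAPIError and skip that request.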

How To Get Access Token

With JS

In Chrome DevTools (the app is configured for the test.domain domain):

// test.domain
window.fbAsyncInit = function() {
  FB.init({
    appId      : '645041415635369',
    xfbml      : true,
    version    : 'v2.5'
  });
};

(function(d, s, id){
 var js, fjs = d.getElementsByTagName(s)[0];
 if (d.getElementById(id)) {return;}
 js = d.createElement(s); js.id = id;
 js.src = "//connect.facebook.net/en_US/sdk.js";
 fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));

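// Once the SDK has loaded, log in (no extra permissions requested)
FB.login(function(){}, {scope: ''});

// Returns the auth response; its accessToken field holds the token
FB.getAuthResponse();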

Desktop (without JS)

https://www.facebook.com/dialog/oauth?client_id=645041415635369&redirect_uri=https://www.facebook.com/connect/login_success.html&response_type=token

After you confirm the permissions, you'll be redirected to a new URL with the access_token in its fragment (the part after #).
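Pulling the token out of that redirect URL can be done with the standard library alone. A small sketch that runs on Python 2.7 and 3; token_from_redirect is a hypothetical helper, and the sample URL in the test is made up.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2.7


def token_from_redirect(url):
    """Return the access_token from the URL fragment, or None if it is missing."""
    fragment = urlparse(url).fragment           # e.g. "access_token=...&expires_in=..."
    values = parse_qs(fragment).get("access_token")
    return values[0] if values else None
```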

Device (a.k.a. TV)

https://developers.facebook.com/docs/facebook-login/for-devices

Server-to-Server Request

Somehow with an App Secret or Client Token.

Check The Limit For The /comments Edge Of The /post Node

FB.api(
  '/10151775534413086/comments',
  'GET',
  {"fields":"created_time","limit":"100000","pretty":"0","summary":"1","filter":"stream"},
  function(response) {
      console.log(response.data.length);
  }
);

This prints 5000: a single page is capped at 5k comments even though a limit of 100000 was requested.
