facebook-multi-scraper's Introduction

Facebook Scraper For Multiple Pages

Scrape multiple public Facebook pages en masse to gather social analytics. Automatically fetch more detailed insights for owned pages' posts and videos to see which content performs well. Go all the way back in time, or specify two dates, and watch the goodness happen at lightning speed via multi-threading!
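As a rough illustration of scraping several pages in parallel, here is a minimal sketch using Python's standard thread pool. The names `scrape_page` and `scrape_all` are hypothetical and are not taken from get_fb_data.py; the stub stands in for the real per-page network requests.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_page(page_id):
    """Hypothetical stand-in for the per-page work (HTTP requests in practice)."""
    return {"page": page_id, "posts": []}


def scrape_all(page_ids, max_workers=8):
    """Run one scraping task per page; threads suit this because the work is I/O-bound."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as page_ids
        results = list(pool.map(scrape_page, page_ids))
    return dict(zip(page_ids, results))


if __name__ == "__main__":
    print(scrape_all(["vicenews", "MyPage1"]))
```

Threads (rather than processes) are a reasonable choice here because each task spends most of its time waiting on the network.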

Distinguishing Features

  • Multi-threaded for rapid data collection from multiple pages simultaneously (as many as you want!)
  • Collect detailed performance metrics on multiple owned business pages automatically via the insights and video insights endpoints
  • Retrieve number of public shares of a link by anyone across Facebook via the URL Object
  • Custom metrics computed:
    • Impression Rate Non-Likers (%): explore virality of your posts outside your typical audience
    • Engagement Rate: (Shares + Reactions + Comments) / Total Unique Impressions
    • Adjusted Engagement Rate (%) and Adjusted CTR (%): normalise rates across pages of different audience sizes and account for uncertainty in small numbers, e.g. a CTR of 5/10 ranks below a CTR of 100/200, as detailed by Evan Miller
  • Proper timezone handling
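The custom rates listed above can be sketched as follows. The engagement rate follows the formula given in the list. For the adjusted rates, this sketch assumes the lower bound of the Wilson score interval, which Evan Miller describes in "How Not To Sort By Average Rating"; the project itself may use a different variant, and the function names are illustrative only.

```python
import math


def engagement_rate(shares, reactions, comments, unique_impressions):
    """(Shares + Reactions + Comments) / Total Unique Impressions."""
    if unique_impressions == 0:
        return 0.0
    return (shares + reactions + comments) / unique_impressions


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 is roughly 95% confidence).

    Assumption: this is one way to 'adjust' a rate for small samples;
    it is not confirmed to be the exact method used by the scraper.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denominator = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * trials)) / trials)
    return (centre - margin) / denominator


if __name__ == "__main__":
    # Same raw CTR of 50%, but the larger sample earns the higher adjusted rate
    print(wilson_lower_bound(5, 10))
    print(wilson_lower_bound(100, 200))
```

Both inputs have a raw rate of 0.5, yet the adjusted value for 5/10 comes out lower than for 100/200 because fewer observations leave more uncertainty.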

Sample Output

What can be collected from public page posts?

Post ID, Publish Date, Post Type, Headline, Shares, Reactions, Comments, Caption, Link

... and optionally with a performance cost:
Public Shares, Likes, Loves, Wows, Hahas, Sads, Angrys

What is additionally collected from owned page posts?

Posts

Video Views, Unique Impressions, Impression Rate Non-Likers (%), Unique Link Clicks, CTR (%), Adjusted CTR (%), Engagement Rate (%), Adjusted Engagement Rate (%), Hide Rate (%), Hide Clicks, Hide All Clicks, Paid Unique Impressions, Organic Unique Impressions

Videos

Live Video, Crossposted Video, 3s Views, 10s Views, Complete Views, Total Paid Views, 10s/3s Views (%), Complete/3s Views (%), Impressions, Impression Rate Non-Likers (%), Avg View Time

Setup

1) Add the page names you want to scrape inside PAGE_IDS_TO_SCRAPE

Use each page's @handle, or the page name as it appears in its URL (e.g. 'vicenews').

2) Grab your own temporary user token here and place inside OWNED_PAGES_TOKENS:
Get Token -> Get User Token -> Get Access Token

OWNED_PAGES_TOKENS is the dictionary that stores the token(s) needed to scrape public data. If a token is a permanent token for a business page, it is also used to scrape that page's private data, provided that the page is listed in PAGE_IDS_TO_SCRAPE and its key in this dictionary has exactly the same name.
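A minimal sketch of how the two settings relate, assuming a list of page names and a dictionary keyed by page name. 'MyPage1' and the placeholder token are illustrative values, and `split_pages` is a hypothetical helper that only demonstrates the "identically named key" rule; it is not part of the script.

```python
import os

# Pages to scrape, identified by handle ('MyPage1' is a placeholder for an owned page)
PAGE_IDS_TO_SCRAPE = ["vicenews", "MyPage1"]

# Tokens keyed by page name; the key must match the entry in PAGE_IDS_TO_SCRAPE exactly
OWNED_PAGES_TOKENS = {
    "MyPage1": os.environ.get("MY_TOKEN", "my-placeholder-token"),
}


def split_pages(page_ids, tokens):
    """Separate pages that have a matching token key from pages that do not."""
    owned = [page for page in page_ids if page in tokens]
    public_only = [page for page in page_ids if page not in tokens]
    return owned, public_only


if __name__ == "__main__":
    print(split_pages(PAGE_IDS_TO_SCRAPE, OWNED_PAGES_TOKENS))
```

Because the match is an exact string comparison in this sketch, 'mypage1' and 'MyPage1' would be treated as different pages.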

3) Install the Python dependencies with pip install requests scipy pandas

N.B. OSX users should first install Homebrew, then install Python with brew install python

Execution

Specify number of days back from present:

python get_fb_data.py post 5 (Public & owned pages)
python get_fb_data.py video 5 (Owned pages only for video-specific data)

Specify two dates (inclusive) in yyyy-mm-dd format:

python get_fb_data.py post yyyy-mm-dd yyyy-mm-dd
python get_fb_data.py video yyyy-mm-dd yyyy-mm-dd
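The two invocation styles above can be read as one inclusive date range. The sketch below shows one plausible interpretation of the trailing arguments; `parse_date_range` is a hypothetical helper, and the way get_fb_data.py actually parses its arguments is not confirmed here.

```python
from datetime import date, datetime, timedelta


def parse_date_range(args, today=None):
    """Turn ['5'] or ['yyyy-mm-dd', 'yyyy-mm-dd'] into an inclusive (since, until) pair."""
    today = today or date.today()
    if len(args) == 1:
        # A single number means "this many days back from the present"
        return today - timedelta(days=int(args[0])), today
    if len(args) == 2:
        since, until = (datetime.strptime(a, "%Y-%m-%d").date() for a in args)
        if since > until:
            raise ValueError("start date must not be after end date")
        return since, until
    raise ValueError("expected a number of days or two yyyy-mm-dd dates")


if __name__ == "__main__":
    print(parse_date_range(["5"]))
    print(parse_date_range(["2017-01-01", "2017-01-31"]))
```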

The CSV file is placed in the facebook_output folder by default.

Credit

Thanks to minimaxir and his project for showing me the ropes

FYI

An additional script, social_elastic.py, is used to scrape data and push it to Elastic instance(s) via their bulk API.

facebook-multi-scraper's People

Contributors

jpryda


facebook-multi-scraper's Issues

Pages Public Content Access requires either app secret proof or an app token

Thank you very much for your work. I feel like it used to work, but when I enter my User Token, I get this response:

{u'message': u'(#100) Pages Public Content Access requires either app secret proof or an app token', u'code': 100, u'type': u'OAuthException', u'fbtrace_id': u'AuwdgxokRLwtt9dVKOsH06x'}

Well, when I plug in the App Token, the response follows:

{u'message': u"(#10) This endpoint requires the 'manage_pages' permission or the 'Page Public Content Access' feature. Refer to https://developers.facebook.com/docs/apps/review/login-permissions#manage-pages and https://developers.facebook.com/docs/apps/review/feature#reference-PAGES_ACCESS for details.", u'code': 10, u'type': u'OAuthException', u'fbtrace_id': u'ARh7FmUS7tq5k9sl2Vh4vU4'}

... which basically means that Facebook does not allow you to scrape posts of public pages you don't own? Really?

Or am I misusing it? If so, can you please help me figure this out? Thank you.
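For readers unfamiliar with the terms in the first error message: Facebook documents the app secret proof as an HMAC-SHA256 of the access token keyed with the app secret, and an app access token can be written as the app ID and app secret joined by a pipe. The sketch below shows both, with hypothetical function names and placeholder credentials. Note that neither grants the 'Page Public Content Access' feature named in the second error; that still has to be approved for the app.

```python
import hashlib
import hmac


def appsecret_proof(access_token, app_secret):
    """HMAC-SHA256 of the access token, keyed with the app secret, as a hex string."""
    return hmac.new(
        app_secret.encode("utf-8"),
        msg=access_token.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).hexdigest()


def app_access_token(app_id, app_secret):
    """App access token in the 'app_id|app_secret' form."""
    return f"{app_id}|{app_secret}"


if __name__ == "__main__":
    # Placeholder credentials for illustration only
    proof = appsecret_proof("abc-my-token", "my-app-secret")
    print(len(proof))
    print(app_access_token("1234567890", "my-app-secret"))
```

The proof would be sent as the appsecret_proof request parameter alongside access_token.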

execution

OWNED_PAGES_TOKENS = {
    'jpryda': os.environ['MY_TOKEN'],  # Token as an environment variable: export MY_TOKEN='abc-my-token'
    # 'MyPage1': 'my-hardcoded-token'  # Hardcoded token
}

To execute, do we have to write 'jpryda' in the second line, or something else?

Doubt: what must be done here?

OWNED_PAGES_TOKENS = {
    'jpryda': os.environ['MY_TOKEN'],  # Token as an environment variable: export MY_TOKEN='abc-my-token'
    # 'MyPage1': 'my-hardcoded-token'  # Hardcoded token
}

This code raises the following error:
File "C:\Users\Anaconda3\lib\os.py", line 678, in __getitem__
    raise KeyError(key) from None
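A KeyError raised from os.py like this is what Python produces when os.environ['MY_TOKEN'] is read while no MY_TOKEN variable exists in the environment. The traceback path suggests Windows, where Command Prompt uses set MY_TOKEN=abc-my-token rather than export. A minimal sketch of a friendlier lookup follows; `read_token` is a hypothetical helper, not part of the script.

```python
import os


def read_token(name="MY_TOKEN", environ=None):
    """Return the token from the environment, or fail with an explicit message."""
    environ = os.environ if environ is None else environ
    token = environ.get(name)  # .get returns None instead of raising KeyError
    if not token:
        raise RuntimeError(
            f"Environment variable {name} is not set; set it before running "
            "or hardcode the token in OWNED_PAGES_TOKENS"
        )
    return token


if __name__ == "__main__":
    try:
        print(read_token())
    except RuntimeError as err:
        print(err)
```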
