Facebook Page Post Scraper

A tool for gathering all the posts of a Facebook Page (or Open Facebook Group) and related metadata, including the post message, post links, and counts of each reaction on the post. All of this data is exported as a CSV file that can be imported into any data analysis program, such as Excel.
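
As a rough illustration of the export step, the sketch below writes one CSV row per post with a separate column for each reaction count. It is a minimal sketch, not the script's actual code: the column names, the `posts_to_csv` helper, and the sample post are all hypothetical, and it targets Python 3 rather than the Python 2.7 the script itself uses.

```python
import csv
import io

# Hypothetical column names; the real script's CSV header may differ.
COLUMNS = [
    "status_id", "status_message", "link_name", "status_link",
    "num_likes", "num_loves", "num_wows", "num_hahas",
    "num_sads", "num_angrys",
]


def posts_to_csv(posts):
    """Serialize a list of post dicts to CSV text, one row per post."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for post in posts:
        # Fields missing from a post are written as empty cells.
        writer.writerow({col: post.get(col, "") for col in COLUMNS})
    return buffer.getvalue()


# Made-up sample data, only to show the shape of the output.
sample_posts = [
    {"status_id": "1_1", "status_message": "Hello, world",
     "num_likes": 10, "num_loves": 2},
]
csv_text = posts_to_csv(sample_posts)
```

Because the `csv` module quotes any field containing a comma, post messages with punctuation still load cleanly in Excel or a data analysis library.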

The purpose of the script is to gather Facebook data for semantic analysis, which is greatly helped by the presence of high-quality Reaction data. Here's a quick example of a potential Facebook Reaction data visualization using data from CNN's Facebook page:

Usage

The Page data scraper is implemented as a Python 2.7 script in get_fb_posts_fb_page.py. At the beginning of the file, fill in the App ID and App Secret of a Facebook app you control (I strongly recommend creating an app just for this purpose) and the Page ID of the Facebook Page you want to scrape. Then run the script.
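
The configuration step might look roughly like the sketch below, which combines the App ID and App Secret into an app access token of the form `app_id|app_secret` and builds a Graph API URL for a Page's posts. The `build_posts_url` helper, the `v2.6` API version, and the `limit` default are assumptions made for illustration; check the script itself for the values it actually uses. The sketch targets Python 3 and makes no network request.

```python
from urllib.parse import urlencode

# Placeholders: replace with the credentials of a Facebook app you control.
app_id = "<FILL IN>"
app_secret = "<FILL IN>"
page_id = "cnn"


def build_posts_url(page_id, app_id, app_secret, limit=100, api_version="v2.6"):
    """Build a Graph API URL for a Page's posts (hypothetical helper)."""
    # An app access token can be written as "<app id>|<app secret>".
    access_token = app_id + "|" + app_secret
    query = urlencode({"limit": limit, "access_token": access_token})
    return "https://graph.facebook.com/{}/{}/posts?{}".format(
        api_version, page_id, query
    )


url = build_posts_url(page_id, app_id, app_secret)
```

Keeping the credentials in three variables at the top of the file mirrors the setup described above; avoid committing a file that contains a real App Secret.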

Example CSVs for CNN, NYTimes, and BuzzFeed data are not included in this repository due to size, but you can download CNN data here [2.7MB ZIP], NYTimes data here [4.9MB ZIP], and BuzzFeed data here [2.1MB ZIP].

To get data from an Open Group, use the get_fb_posts_fb_group.py script with the App ID and App Secret filled in the same way. However, the group_id is a numeric ID: to get the ID, do a View Source on the Group Page, search for "entity_id", and use the number to the right of that field. For example, the group_id of Hackathon Hackers is 759985267390294.
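
The View Source lookup can also be done with a short script. The sketch below searches saved page source for "entity_id" and returns the number that follows it. The exact markup around the field is an assumption (the pattern accepts either `:` or `=` as a separator, with optional quotes), and `find_entity_id` is a hypothetical helper, not part of the repository.

```python
import re


def find_entity_id(page_source):
    """Return the first numeric entity_id found in the page source, or None."""
    # Assumed layout: entity_id, an optional quote, ':' or '=', an optional
    # quote, then the digits of the ID.
    match = re.search(r'entity_id"?\s*[:=]\s*"?(\d+)', page_source)
    return match.group(1) if match else None


# Made-up snippet standing in for the real page source.
sample_source = '{"entity_id":"759985267390294","other_field":1}'
group_id = find_entity_id(sample_source)
```

If the function returns None, the page layout probably differs from the assumed pattern and the manual search described above is the fallback.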

You can download example data for Hackathon Hackers here [4.7MB ZIP].

Privacy

This scraper can only scrape public Facebook data which is available to anyone, even those who are not logged into Facebook. No personally-identifiable data is collected in the Page variant; the Group variant does collect the name of the author of the post, but that data is also public to non-logged-in users. Additionally, the script only uses officially-documented Facebook API endpoints without circumventing any rate-limits.

Note that this script, and any variant of this script, cannot be used to scrape data from user profiles (and the Facebook API specifically disallows this use case!).

Maintainer

For more information on how the script was originally created, see my blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis.

Credits

Peeter Tintis, whose fork of this repo implements code for finding separate reaction counts per this Stack Overflow answer.
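
For context, separate reaction counts are typically requested by aliasing one `reactions` summary per reaction type in the Graph API `fields` parameter. The sketch below builds such a field string; it is an illustration of that approach, not a copy of the fork's code, and the `build_reaction_fields` helper and lowercase alias names are assumptions.

```python
# Reaction types exposed by Facebook at the time the README was written.
REACTION_TYPES = ["LIKE", "LOVE", "WOW", "HAHA", "SAD", "ANGRY"]


def build_reaction_fields(reaction_types=REACTION_TYPES):
    """Build a comma-separated `fields` value with one aliased count per type."""
    return ",".join(
        # limit(0) skips the individual reactions; summary(total_count)
        # asks only for the total, which is aliased to a lowercase name.
        "reactions.type({0}).limit(0).summary(total_count).as({1})".format(
            reaction, reaction.lower()
        )
        for reaction in reaction_types
    )


fields = build_reaction_fields()
```

The resulting string would be passed as the `fields` query parameter when requesting posts, so that each post comes back with one total per reaction type.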

License

MIT

If you do find this script useful, a link back to this repository would be appreciated. Thanks!

Contributors

minimaxir
