
facebook-profile-pictures-downloader's Introduction

Mining into Facebook public profiles with Deep Learning

Applying Deep Learning to Facebook public information to extract interesting patterns

Nothing very precise yet. We're just going to have fun and build a big Facebook dataset in the short term!



How to use it?

Install the latest facebook-sdk.

cd /tmp/
git clone git@github.com:mobolic/facebook-sdk.git
cd facebook-sdk
sudo pip3 install .

Then clone this repository and follow the instructions below.

# For Python 3.x
git clone git@github.com:philipperemy/Facebook-Profile-Deep-Learning.git facebook-explorer
cd facebook-explorer
sudo pip3 install -r requirements.txt
cp credentials.json.example credentials.json
vim credentials.json # Get your Token ID here https://developers.facebook.com/tools/explorer/
python3 profile_miner.py 10 # to start mining facebook profiles. Here we use 10 threads to query Facebook.

Facebook Token ID

Manual update

Get your Facebook Token ID from https://developers.facebook.com/tools/explorer/ and save it in your credentials.json file.

Automatic update (much more useful)

Before using automatic updates, make sure the manual procedure above has worked at least once: browse to https://developers.facebook.com/tools/explorer/ and request a Token ID. The automatic update relies on web scraping, so it is very likely to fail if this setup has not been completed beforehand.

Once that is done, start the server that automatically asks Facebook for a new token. The main script, profile_miner.py, detects when the token expires and then calls the server started by auto_token_generator.py.

Start the server with this command:

export FB_EMAIL=your_email@example.com FB_PASS='i_love_apple'; python3 auto_token_generator.py

Here FB_EMAIL is your Facebook email address and FB_PASS is your Facebook password. I advise you to create a dedicated Facebook account just for these tasks.

You can check if the server is responding by running this command:

curl http://localhost:5000/

Or just open http://localhost:5000/ in your favorite browser. Be patient: it can take up to one minute to query the Facebook servers. The procedure is deliberately slow to avoid bot detection.
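The refresh flow described in this section can be pictured with a minimal sketch. It is not the code of profile_miner.py: the helper names (fetch_new_token, is_token_expired, call_with_token_refresh), the idea that the server answers with the token as plain text, and detecting expiry by matching the error message are all assumptions made for illustration.

```python
import urllib.request

# URL of the local token server, as used in the curl check of this README.
TOKEN_SERVER_URL = "http://localhost:5000/"


def fetch_new_token(url=TOKEN_SERVER_URL, timeout=120):
    """Ask the local server for a fresh token.

    Assumption: the response body is the token as plain text.
    The timeout is generous because the server can take up to a minute.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def is_token_expired(error_message):
    """Heuristic (assumption): recognise the expiry error by its message."""
    return "Session has expired" in error_message


def call_with_token_refresh(query, token, fetch_token=fetch_new_token):
    """Run query(token); on an expiry error, fetch a new token and retry once.

    `query` is a hypothetical callable that performs one Graph API request.
    A real script would probably catch facebook.GraphAPIError instead of
    the generic Exception used here to keep the sketch self-contained.
    Returns (result, token_in_use).
    """
    try:
        return query(token), token
    except Exception as exc:
        if not is_token_expired(str(exc)):
            raise
        new_token = fetch_token()
        return query(new_token), new_token
```

The caller keeps the returned token and uses it for subsequent requests, so the server is only contacted when an expiry is actually observed.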

Scan data

python3 scan_data.py
This scripts refreshes every 10 seconds.
--------------------------------------------------------------------------------
Number of Facebook descriptions : 15097 (+15097)
Number of Facebook images       : 15088 (+15088)
--------------------------------------------------------------------------------
Number of Facebook descriptions : 15104 (+7)
Number of Facebook images       : 15096 (+8)
--------------------------------------------------------------------------------
Number of Facebook descriptions : 15115 (+11)
Number of Facebook images       : 15107 (+11)
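A report like the one above could be produced along the following lines. This is only a sketch, not the code of scan_data.py: it assumes that all ###.pkl and ###.jpg files sit in a single directory, and the function names count_profiles and format_report are hypothetical.

```python
from pathlib import Path


def count_profiles(data_dir):
    """Count description files (*.pkl) and image files (*.jpg) in data_dir."""
    data_path = Path(data_dir)
    n_descriptions = len(list(data_path.glob("*.pkl")))
    n_images = len(list(data_path.glob("*.jpg")))
    return n_descriptions, n_images


def format_report(counts, previous=(0, 0)):
    """Build the two report lines, with the change since the previous scan."""
    labels = ("Number of Facebook descriptions", "Number of Facebook images")
    return [
        f"{label:<32}: {count} (+{count - before})"
        for label, count, before in zip(labels, counts, previous)
    ]
```

Calling count_profiles in a loop with a 10 second sleep, and passing the previous counts to format_report, would reproduce the refresh behaviour shown above.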

Example of a public profile, stored in ###.pkl, where ### is the user ID (undisclosed here for privacy reasons):

{
  'first_name': 'Susan',
  'updated_time': '2016-12-28T16:26:46+0000',
  'last_name': 'Cothran',
  'link': 'https://www.facebook.com/app_scoped_user_id/###/',
  'name': 'Susan Cothran',
  'id': '###'
}

The corresponding profile picture is located in ###.jpg.
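Reading a stored profile back could look like the sketch below. It assumes the .pkl file is a standard Python pickle of the dictionary shown above and that both files live in the same directory; load_profile is a hypothetical helper, not part of this repository. Only unpickle files you created yourself, since pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_profile(data_dir, user_id):
    """Load the profile dict for user_id and locate its picture.

    Returns (profile, picture_path), where picture_path is None
    when no matching .jpg file exists.
    """
    data_path = Path(data_dir)
    with open(data_path / f"{user_id}.pkl", "rb") as handle:
        profile = pickle.load(handle)
    picture_path = data_path / f"{user_id}.jpg"
    return profile, (picture_path if picture_path.exists() else None)
```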

Common errors

Sometimes the profile is there but it's not available in the Graph API. Most of the time, the profile is inactive and it's better to move on, rather than raising an exception that would block the script:

INFO:facebook-deep-learning:Unsupported get request. Object with ID '827435111' does not exist, cannot be loaded due to missing permissions, or does not support this operation. Please read the Graph API documentation at https://developers.facebook.com/docs/graph-api
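The "log and move on" behaviour can be sketched as follows. The function fetch_profiles and its fetch_one argument are hypothetical, and matching on the message text is an assumption; a real script would more likely catch facebook.GraphAPIError. The logger name is taken from the log line above.

```python
import logging

logger = logging.getLogger("facebook-deep-learning")

# Text that identifies the "profile not available" error (assumption).
UNAVAILABLE_MARKER = "Unsupported get request"


def fetch_profiles(user_ids, fetch_one):
    """Fetch each profile, skipping the ones the Graph API cannot return.

    `fetch_one` is a hypothetical callable taking a user ID and returning
    a profile dict. Any other error is re-raised so it is not hidden.
    """
    profiles = {}
    for user_id in user_ids:
        try:
            profiles[user_id] = fetch_one(user_id)
        except Exception as exc:
            if UNAVAILABLE_MARKER not in str(exc):
                raise
            logger.info(str(exc))  # record the problem and move on
    return profiles
```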

The token is only valid for one hour. If you know a better way to extend the expiration date, I'll be happy to hear about it!

facebook.GraphAPIError: Error validating access token: Session has expired on Saturday, 08-Apr-17 23:00:00 PDT. The current time is Saturday, 08-Apr-17 23:01:30 PDT.

The Graph API enforces user request limits. In my experience the limit is around 10,000 calls per hour, but it seems to depend on the application, so treat this as a very rough rule of thumb. When the limit is reached, the script pauses for one hour before resuming.

INFO:facebook-deep-learning:(#17) User request limit reached
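The one-hour pause can be sketched as a small retry wrapper. This is an illustration under assumptions, not the script's actual code: query_with_rate_limit_pause is a hypothetical name, the limit is recognised by the message text shown above, and the sleep function is injectable so the pause can be shortened when experimenting.

```python
import time

# Message fragment taken from the log line above.
RATE_LIMIT_MARKER = "(#17) User request limit reached"
ONE_HOUR_SECONDS = 3600


def query_with_rate_limit_pause(query, pause_seconds=ONE_HOUR_SECONDS,
                                max_attempts=3, sleep=time.sleep):
    """Run query(); if the rate limit is hit, wait and try again.

    `query` is a hypothetical zero-argument callable performing one request.
    Errors other than the rate limit are re-raised immediately, and the
    rate limit error itself is re-raised after max_attempts tries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except Exception as exc:
            if RATE_LIMIT_MARKER not in str(exc) or attempt == max_attempts:
                raise
            sleep(pause_seconds)  # put the script on hold before resuming
```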
