
facebook-archive's Introduction

facebook-archive


Analyse everything facebook knows about you, through their own archive.

In light of Facebook's recent data breach, Mark Zuckerberg made all of each user's data available to them via Facebook. You're going to need to download it; we'll get to that shortly. Some analyses would take a lot of time online (too many costly API calls) but can easily be done on the archived data.

Note to KWoC contributors: All issues are available for KWoC; feel free to work on any issue after being assigned.


Getting the data

  1. Go to Facebook > Settings > General Settings > Your Facebook information.
  2. Select the JSON data format and click on download archive. Preparing the archive can take up to 10-15 minutes. NOTE: The download can run to hundreds of MB. Disable the photo and video download options to save bandwidth. (My archive was ~300MB.)
  3. If possible, also download the same data in HTML format. The HTML format makes it much easier to browse your archive and spot interesting patterns, but it is not necessary: the JSON format suffices for processing (see step 2).


Usage

Install requirements with pip install -r requirements.txt

Friends

python plot_friends.py [path] [--from date] [--to date]
  • path is the path to the extracted facebook data archive
  • --from date specifies the first day of the plot
  • --to date specifies the last day of the plot
  • dates are in the format YYYY-MM-DD
> python plot_friends.py
Enter facebook archive extracted location: <location of extracted data folder,  e.g.: "facebook-kaustubhhiware">

You can also run the script on sample data included in the examples folder:

> python plot_friends.py
Enter facebook archive extracted location: ./examples
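The optional --from / --to filtering could be handled roughly as in the sketch below. It is not the script's actual code: it assumes Python 3.7+ (for date.fromisoformat), and the function names parse_args and in_range and the attribute names start and end are hypothetical. The values cannot be stored under the name "from" because that is a Python keyword.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Parse [path] [--from date] [--to date]; dates are YYYY-MM-DD."""
    parser = argparse.ArgumentParser(description="Plot friends over time.")
    parser.add_argument("path", nargs="?", help="extracted facebook archive folder")
    # "from" is a Python keyword, so the values are stored as `start` and `end`.
    parser.add_argument("--from", dest="start", type=date.fromisoformat,
                        help="first day to plot, YYYY-MM-DD")
    parser.add_argument("--to", dest="end", type=date.fromisoformat,
                        help="last day to plot, YYYY-MM-DD")
    return parser.parse_args(argv)


def in_range(day, start=None, end=None):
    """True if `day` falls within [start, end]; a missing bound is left open."""
    return (start is None or day >= start) and (end is None or day <= end)
```

Each friendship date would then be kept only if in_range(day, args.start, args.end) is true; a malformed date makes argparse exit with an error message.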


Messages

Will be updated soon

  • Plot messages across all conversations.
> python plot_messages.py
Enter facebook archive extracted location: "location of extracted, downloaded zip: like facebook-kaustubhhiware" 

Locations

Will be updated soon

  • Plot your location history.
> python where_have_you_been.py 
Enter facebook archive extracted location: "location of extracted, downloaded zip: like facebook-kaustubhhiware" 


Contributing

Your contributions are always welcome 😄 ! Please have a look at the contribution guidelines first.

Before working on an issue / feature, it is crucial that you're assigned the task on a GitHub issue.

  • If a relevant issue already exists, discuss on the issue and get yourself assigned on GitHub.
  • If no relevant issue exists, open a new issue and get it assigned to yourself on GitHub. Please proceed with a Pull Request only after you're assigned. Working on a feature / issue without contacting us beforehand would waste your time as well as ours.

If you are here for GirlScript's Summer of Code and wish to seek assistance, feel free to contact any of the mentors on Slack - @kaustubhhiware, @techytushar, @Anubhav, @fhackdroid, @Roopal.


Features


Your friends

Plot the friends you make every day (blue), and the friends so far (orange).

Plot exclusively the friends you make each day.
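A minimal sketch of the per-day and running totals behind these plots. It assumes the friends JSON file has the layout {"friends": [{"name": ..., "timestamp": ...}]} with Unix timestamps in seconds; file names and keys may differ between archive versions, and days are computed in UTC.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate


def load_friend_dates(path):
    """Return the day (UTC) each friendship started, read from a friends JSON file.

    Assumes {"friends": [{"name": ..., "timestamp": <Unix seconds>}]};
    other archive versions may use different keys.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [
        datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc).date()
        for entry in data["friends"]
    ]


def daily_and_cumulative(friend_dates):
    """Return (sorted days, new friends on each day, running total of friends)."""
    per_day = Counter(friend_dates)
    days = sorted(per_day)
    daily = [per_day[d] for d in days]
    return days, daily, list(accumulate(daily))
```

The three lists could then be drawn with matplotlib, e.g. plt.bar(days, daily) for the friends made each day and plt.plot(days, cumulative) for the friends so far.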

Plot messages as a function of month.


Your Messages

The following is available for either a specific chat (person / group) or for all messages.

Plot all messages so far.

Plot daily message frequency

Plot monthly message frequency

Plot yearly message frequency
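The daily, monthly and yearly frequencies are the same grouping at different granularities. A sketch, assuming each message is a dict with a timestamp_ms key (milliseconds since the epoch) as in the archive's message JSON files; pass the messages of one chat or of all chats. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def message_frequency(messages, period="month", tz=timezone.utc):
    """Count messages per "day", "month" or "year", returned in chronological order."""
    formats = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}
    fmt = formats[period]  # raises KeyError for an unknown period
    counts = Counter(
        datetime.fromtimestamp(m["timestamp_ms"] / 1000, tz=tz).strftime(fmt)
        for m in messages
    )
    # The zero-padded keys sort correctly as strings.
    return dict(sorted(counts.items()))
```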


Top_10_friends_whom_I_message and Top_10_friends_who_message_me

Find the top ten friends whom you message and plot each friend's number of messages as a function of time: https://github.com/hadesanirban/facebook-archive/tree/master/images/Top_10_Friends_whom_I_message

  • Plot Top_ten_Friends (here for 7 friends, as an example).
> python plot Top_ten_Friends.py --num_friends 7
enter your official facebook name: "your name as in facebook i.e. Anirban Panda"
Enter facebook archive extracted location: "location of extracted, downloaded zip: like facebook-kaustubhhiware"

The --num_friends command-line argument lets you plot as many friends as you want; it defaults to 10.
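One way to rank friends, sketched under the assumption that each parsed message file has a participants list of {"name": ...} entries and a messages list with sender_name fields. The parameter names my_name and direction are illustrative, not the script's actual interface; only one-to-one chats are counted.

```python
from collections import Counter


def top_friends(conversations, my_name, num_friends=10, direction="all"):
    """Return [(friend, message_count), ...] for your busiest one-to-one chats.

    direction: "all" counts every message, "sent" only yours, "received" only theirs.
    Assumes each conversation looks like
    {"participants": [{"name": ...}], "messages": [{"sender_name": ...}, ...]}.
    """
    counts = Counter()
    for conv in conversations:
        others = [p["name"] for p in conv.get("participants", []) if p["name"] != my_name]
        if len(others) != 1:
            continue  # skip group chats, which would otherwise inflate the totals
        friend = others[0]
        for msg in conv.get("messages", []):
            sent_by_me = msg.get("sender_name") == my_name
            if (direction == "all"
                    or (direction == "sent" and sent_by_me)
                    or (direction == "received" and not sent_by_me)):
                counts[friend] += 1
    return counts.most_common(num_friends)
```

For the time plot, each returned friend's messages could then be grouped by month.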

Friend Request

Plot the friends you make every day (red), friend requests sent every day (green) and friend requests received every day (blue).

Compare the number of friend requests sent vs. received, month by month.
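A sketch of the month-by-month comparison. It works on plain lists of Unix timestamps (seconds), so it does not depend on the exact file names or keys of the sent / received request files, which vary between archive versions; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_requests(sent_timestamps, received_timestamps, tz=timezone.utc):
    """Return {"YYYY-MM": (sent, received)} for every month with any request."""
    def by_month(timestamps):
        return Counter(datetime.fromtimestamp(t, tz=tz).strftime("%Y-%m") for t in timestamps)

    sent, received = by_month(sent_timestamps), by_month(received_timestamps)
    months = sorted(set(sent) | set(received))
    # A Counter returns 0 for months with no requests in one direction.
    return {month: (sent[month], received[month]) for month in months}
```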

Your reactions

Plot count of different reactions to posts

Plot the 10 friends whose posts you react to the most

Plot reactions as a function of month.

Plot cumulative count of different reactions on a single plot


Your posts and comments

Wordcloud of common words in your posts and comments

Most tagged friends in your posts

Your Locations

Plot all locations so far.


Observations

  1. There is a spike in friends made in March (Election season) and July (new juniors, much higher spike).

  2. I tend to message less during exams (Feb, Apr, Sep, Nov).

  3. The highest number of messages is sent at 9 pm and 11 pm, consistent with calls from home coming at 10 pm. Almost no messages are sent between 3 am and 7 am.

  4. I used to send more friend requests than I received.

  5. I tend to receive more friend requests in July and August (new juniors).
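Observation 3 comes from grouping sent messages by hour of the day. A sketch, assuming archive-style messages with sender_name and timestamp_ms keys. Because the timestamps are UTC, the hours only match your own clock if you pass your time zone; the UTC+5:30 offset below is just an example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Example offset only (UTC+5:30); substitute your own time zone.
LOCAL_TZ = timezone(timedelta(hours=5, minutes=30))


def messages_by_hour(messages, my_name, tz=LOCAL_TZ):
    """Return a 24-item list: how many messages my_name sent in each hour (0-23)."""
    counts = Counter(
        datetime.fromtimestamp(m["timestamp_ms"] / 1000, tz=tz).hour
        for m in messages
        if m.get("sender_name") == my_name
    )
    return [counts[hour] for hour in range(24)]
```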


Why

I always wanted to know how many friends I make every month. It would have been infeasible to make a webapp out of this because the many API calls needed would be too slow, and who wants to work with Facebook's Graph API anyway?

Plus it was raining and I couldn't go to MS's Hall Day till after the rain stopped.

Have a feature request? See an interesting avenue not yet explored with Facebook's archive? Let me know by opening a new issue.


License

The MIT License (MIT) 2018 - Kaustubh Hiware. Please have a look at the LICENSE for more details.

facebook-archive's People

Contributors

animesh-chouhan, farhaanbukhsh, galenwong, hadesanirban, kaustubhhiware, parimatrix, r00pal, techytushar


facebook-archive's Issues

Where have you been?

  • What are the places frequently visited by you?
  • How often do you visit the same place?
  • How does it correlate with the location of your house / college?
    Did you notice a pattern here?

EXIF Data for archived photos

Description

As a facebook-archive user,
I need to have EXIF Data (at least Date and Time (original)) from my photos archived,
so that I can upload / store these photos in an organized manner.

Explanation

Facebook archive doesn't provide EXIF data for your photos.
There is a way to recover at least the dates by going to the your_archive/photos_and_videos/album folder.
This folder contains all your albums in .html format.
Each .html file lists its photos, each with a link showing the photo's timestamp.

Acceptance Criteria

Update [Required]

  • All photos in your_archive_/photos_and_videos/* should have EXIF data (at least the Date and Time (original) attribute) extracted from the your_archive_/photos_and_videos/album/* html files.
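A sketch of the timestamp-to-EXIF step. EXIF stores DateTimeOriginal as "YYYY:MM:DD HH:MM:SS". The album layout used here ({"photos": [{"uri": ..., "creation_timestamp": ...}]}) is an assumption based on the JSON export, whereas the issue reads the timestamps from the HTML album pages; the function names are hypothetical.

```python
from datetime import datetime, timezone


def exif_datetime(timestamp, tz=timezone.utc):
    """Format a Unix timestamp (seconds) as an EXIF DateTimeOriginal string."""
    return datetime.fromtimestamp(timestamp, tz=tz).strftime("%Y:%m:%d %H:%M:%S")


def exif_dates_from_album(album):
    """Map each photo's uri to its EXIF date string.

    Assumes {"photos": [{"uri": ..., "creation_timestamp": ...}]}.
    """
    return {p["uri"]: exif_datetime(p["creation_timestamp"]) for p in album.get("photos", [])}
```

Writing each value into the image file is left to an EXIF library (piexif is one option). The times are in UTC unless another time zone is passed, since EXIF date strings carry no offset.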

Definition of Done

  • All of the required items are completed.
  • Approval by 1 mentor.

Estimation

1 Day

How do you manage events around you?

What events have you created so far?

  • Do they correspond to some particular page that you're handling?
  • Are they independent events?

How do you respond to events you are invited to?

  • Do you generally accept the invite and never go?
  • Do you mark it as interested?
  • Did you observe a pattern here?

Patterns in search history

  • How often do you search for yourself?
  • How often does a user search for a person, vs a group?
  • Which friends do you search again and again?

Extensions to this could be:

  • Which people do you search for and then immediately send a friend request to? (Possible to find, since timestamps are logged.)
  • Bonus: Did you react / comment on any of their posts recently before sending a friend request?
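A sketch of the search-then-request check, assuming searches and sent friend requests have already been extracted from the archive as (name, datetime) pairs. The function name is hypothetical and the 10-minute window is an arbitrary choice.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[str, datetime]  # (person's name, when it happened)


def searched_then_requested(searches: List[Event], requests: List[Event],
                            window: timedelta = timedelta(minutes=10)) -> List[str]:
    """Names you sent a friend request to within `window` after searching for them."""
    requested_at = {}
    for name, when in requests:
        requested_at.setdefault(name, []).append(when)
    return sorted({
        name
        for name, searched in searches
        if any(searched <= t <= searched + window for t in requested_at.get(name, []))
    })
```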

Friends > Did you send them a request or they did?

This is an addition over the current feature.
How many friends you make based on received friend requests vs. requests you sent. For each day there are 2 entries: friends you made who sent you requests, and friends you sent a request to.

Reacts > How do you react?

  • How often do you use love reacts?
  • What friends' posts are you most likely to react to?
  • Is there a friend for whose posts you only ever use one type of react?
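The three questions above can be answered in one pass over your reactions. The sketch assumes they have already been parsed into (friend, reaction_type) pairs, where friend is the author of the post you reacted to; labels such as "LOVE" and the function name are illustrative.

```python
from collections import Counter, defaultdict


def reaction_stats(reactions, top=10):
    """Summarise (friend, reaction_type) pairs.

    Returns (count per reaction type, the `top` friends you react to most,
    sorted list of friends whose posts always get the same reaction type).
    """
    by_type = Counter(kind for _, kind in reactions)
    by_friend = Counter(friend for friend, _ in reactions)
    kinds_per_friend = defaultdict(set)
    for friend, kind in reactions:
        kinds_per_friend[friend].add(kind)
    single_react = sorted(f for f, kinds in kinds_per_friend.items() if len(kinds) == 1)
    return by_type, by_friend.most_common(top), single_react
```

by_type["LOVE"] answers how often you use love reacts.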

Friends > Plot friends only since the last year

Description

Extend plot_friends.py so that a from and to period can be specified. Currently it plots for all time.
Something like python plot_friends.py --from 2018-12-01; it should support both --from and --to flags, both optional.

Mocks

Steps to reproduce (in case of a bug)

NA

Acceptance Criteria

Update [Required]

  • Support --from and --to flags in plot_friends.py

Enhancement to Update [Optional]

  • [LIST ITEMS]

Definition of Done

  • All of the required items are completed.
  • Approval by 1 mentor.

Estimation

4-5 hours after having an idea of original code

Wordcloud of user's comments and posts

This task could be a bit harder, since you have to process both posts and comments, and then aggregate words from both, to form a wordcloud.

  • What hashtags do you frequently use?
  • What language do you often use?
  • Skip proper nouns, esp. all your friends' names.
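A sketch of the counting step, assuming the post and comment texts have already been extracted as strings. The stop-word list is deliberately tiny, friends' names are skipped word by word, and hashtags are counted separately; the function name is hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
             "i", "you", "for", "on", "this", "that"}


def word_frequencies(texts, skip_names=()):
    """Return (word counts, hashtag counts) over posts and comments."""
    skip = STOPWORDS | {part.lower() for name in skip_names for part in name.split()}
    words, hashtags = Counter(), Counter()
    for text in texts:
        # Words may keep an apostrophe ("don't"); a leading "#" marks a hashtag.
        for token in re.findall(r"#?[\w']+", text.lower()):
            if token.startswith("#"):
                hashtags[token] += 1
            elif token not in skip:
                words[token] += 1
    return words, hashtags
```

The words counter could then be passed to WordCloud().generate_from_frequencies(words) from the wordcloud package.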

Classify ad interests

Classify a facebook user's ad interests into categories, say sports, engineering, humor, developing, etc.

This issue has a lot of potential, and is currently only in ideation phase. Please discuss your vision about how / what can be done here.

Messages > Top 10 friends that message you

Plot number of messages with each friend as a function of time. Since this could get messy, plot only top 10-20 friends.

This could include treating any message exchange as a message, or
it could have two plots:

  • top 10 friends who message you
  • top 10 friends who you message

The case of the missing docstrings

We have a few scripts, but they are not documented enough: many functions are missing docstrings. It would be a good addition to describe what each of them does.

message.json filenames trigger fatal errors in plot_messages.py

Description

There appears to be a bug in the plot_messages.py script due to a discrepancy between the naming convention currently used by Facebook vs. the name expected by the script for .json files inside the messages directory. The script expects those files to be named "message.json". However, in a download that I did this week of my Facebook archives, the files were named "message_1.json". This naming discrepancy causes the plot_messages.py script to fail to recognize the files and subsequently throw a fatal error.

I also noticed some documentation issues in the project's README.md file related to plot_messages.py. Currently the README.md file says there is an "-a" option for generating plots of all messages, but no such option exists. Also, the instructions for specifying a user's ID when plotting messages for a specific user don't work.

Acceptance Criteria

Update [Required]

  • Running plot_messages.py will not throw fatal errors.
  • The script can successfully find and process message files that match either a pattern of "message.json" or "message_\d+.json".
  • The README.md file does not give incorrect information about how to run the plot_messages command.
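A sketch of a file lookup that accepts both naming schemes from the acceptance criteria; the names MESSAGE_FILE and find_message_files are hypothetical.

```python
import re
from pathlib import Path

# Matches the old "message.json" and the newer "message_1.json", "message_2.json", ...
MESSAGE_FILE = re.compile(r"message(_\d+)?\.json")


def find_message_files(messages_dir):
    """Return every message file anywhere under messages_dir, sorted by path."""
    return sorted(
        p for p in Path(messages_dir).rglob("*.json")
        if MESSAGE_FILE.fullmatch(p.name)
    )
```

Paths are sorted as strings, so message_10.json comes before message_2.json; sort on the number if the order of a thread's files matters.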

Definition of Done

  • All of the required items are completed.
  • Approval by 1 mentor.

Estimation

1 hour

README python version issue

Description

The README file indicates the code uses python 2.7.

However, when I tried to run plot_friends.py, the given example gives an error.

$ python plot_friends.py
Enter facebook archive extracted location: ./examples
Traceback (most recent call last):
  File "plot_friends.py", line 87, in <module>
    friends(loc)
  File "plot_friends.py", line 22, in friends
    loc = input('Enter facebook archive extracted location: ')
  File "<string>", line 1
    ./examples
    ^
SyntaxError: invalid syntax

This is because Python 2.7's input() function evaluates whatever is typed as a Python expression, and ./examples is not a valid one.

Mocks

N/A

Steps to reproduce (in case of a bug)

  • Use python 2.7
  • Install required dependency pip install -r requirements.txt
  • run python plot_friends.py
  • input ./examples

Acceptance Criteria

Update [Required]

Choose one of the following solutions:

  • Change the README to say Python 3.x
  • Change to raw_input instead of input
  • Use command line argument instead of having the user input it manually (recommended)
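A sketch of the recommended option: take the location as a command-line argument and only prompt when it is missing. It assumes Python 3, where input() returns the typed text unchanged; code that must also run on 2.7 would prompt with raw_input() instead. The function name is hypothetical.

```python
import argparse


def get_archive_location(argv=None):
    """Return the archive folder from the command line, prompting only if it is missing."""
    parser = argparse.ArgumentParser()
    parser.add_argument("path", nargs="?", help="extracted facebook archive folder")
    args = parser.parse_args(argv)
    if args.path:
        return args.path
    return input("Enter facebook archive extracted location: ")
```

With this, python plot_friends.py ./examples runs without any prompt.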

Definition of Done

  • All of the required items are completed.
  • Approval by 1 mentor.

Estimation

2-3 hours

Timeline of your posts

  • What time does the user post?
  • In which months is the user most active?
  • Do posts with images get more likes? (ML-esque)
  • At what times do other people post on your wall? (Expect an obvious spike for birthday posts.)

More additions can be suggested.

Location > Cluster map of all your locations

Create a map which shows how often you checked in at each place,
with each circle's size proportional to the number of times you were in that city.
Example: If I was 5 times in Bangalore, 2 times in Kolkata, 1 time in Pune; I would get a map of India (world?) that shows these places, and the circle over Bangalore would obviously be the biggest.

Example: onemilliontweetmap.com
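A sketch of the circle sizing, assuming check-ins have already been reduced to one city name per visit. It makes each circle's area, rather than its radius, proportional to the visit count. This is a design choice: scaling the radius would make a city with 5 visits look 25 times larger than one with a single visit. The function name and max_radius are hypothetical.

```python
import math
from collections import Counter


def circle_sizes(checkins, max_radius=30.0):
    """Map each city to a radius whose circle area is proportional to its visits."""
    counts = Counter(checkins)
    if not counts:
        return {}
    most = max(counts.values())
    # The most visited city gets max_radius; the others shrink with sqrt(count).
    return {city: max_radius * math.sqrt(n / most) for city, n in counts.items()}
```

For the example above (5 visits to Bangalore, 2 to Kolkata, 1 to Pune), Bangalore gets the full radius and Pune about 45% of it.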

Analyse user posted comments

  • At what time is the user most likely to comment?
  • How frequently do you comment on a particular friend's post? Which friends' posts are you most likely to comment on?

None of these are straight answers, all of them involve graphs.

Webapp to analyse without running scripts

The goal is to develop a webapp so that the user does not have to use command line at all.

However, this webapp has to explicitly state that no data is uploaded anywhere; users who want to be sure can download the code and run it on their local machines.

I'm not sure how many people would be willing to use a webapp, but this could open up an avenue of opportunities in terms of visualizations.

Work on this will only start once sufficient features are supported in the command line, until then this issue will be marked wontfix.

Messages> Who talks more?

  • For a given chat thread, what percentage of messages do you send?
  • How many conversations does a person initiate?
  • Are there any friends who never reply?

You can use this to find which friend is most needy.

  • How many chats consist only of "You're now connected on Messenger."?
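A sketch for the first two questions, assuming archive-style messages with sender_name and timestamp_ms keys. The archive does not mark where a conversation starts; here a new one is assumed to begin with any message after more than 6 hours of silence. The function name and threshold are hypothetical.

```python
from collections import Counter

SIX_HOURS_MS = 6 * 60 * 60 * 1000  # assumed silence that separates two conversations


def thread_stats(messages, my_name, gap_ms=SIX_HOURS_MS):
    """Return (fraction of the thread's messages sent by my_name, conversations started per sender)."""
    # Sort in case the file is not in chronological order.
    ordered = sorted(messages, key=lambda m: m["timestamp_ms"])
    if not ordered:
        return 0.0, Counter()
    mine = sum(1 for m in ordered if m.get("sender_name") == my_name)
    starters = Counter()
    previous = None
    for m in ordered:
        if previous is None or m["timestamp_ms"] - previous > gap_ms:
            starters[m.get("sender_name")] += 1  # first message after a long silence
        previous = m["timestamp_ms"]
    return mine / len(ordered), starters
```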

Accommodate for different archive versions

The naming pattern for files and organisation for current archive is radically different as compared to the archive available at the time of writing this code first.

Since facebook now allows you to download data in HTML OR JSON format, we'll be using JSON for the following reasons, as also suggested by @xprilion:

  • JSON is easier to work with.
  • JSON is more accurate, as HTML tag search (the current approach) might go wrong.
  • JSON is less susceptible to change.

This issue is more of a guideline, so won't be closed any time soon.
