whatsapp-analyzer's Introduction

WhatsApp-Analyzer

Analyze WhatsApp chat

The script reads an exported WhatsApp chat and then extracts the data. You may need to install some packages before running it.

Supported Analysis

  • Chat Count
  • Chat Average
  • Member/Sender Rank
  • Website/URL/Link Domain Rank
  • Word Count and Rank
  • Most Used Word by Sender
  • Emoji Usage Rank
  • Most Used Emoji by Sender
  • Timestamp Heatmap
  • Attachment Classification (Android exports use the same pattern for every attachment, so they cannot be told apart. iOS exports can be classified as Image, Video, Audio, GIF, Sticker, Document, or Contact Card)
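
As a rough illustration of the attachment classification above, the sketch below maps iOS-style placeholder text to a category. Only `image omitted` appears in this README; the other placeholder strings and the `classify_attachment` helper are assumptions made for illustration, not the script's actual code.

```python
# Hypothetical placeholder suffixes. Only "image omitted" is confirmed by
# this README; the rest are assumed to follow the same wording.
IOS_PLACEHOLDERS = {
    "image omitted": "Image",
    "video omitted": "Video",
    "audio omitted": "Audio",
    "gif omitted": "GIF",
    "sticker omitted": "Sticker",
    "document omitted": "Document",
    "contact card omitted": "Contact Card",
}


def classify_attachment(body):
    """Return an attachment category, or None for a regular message."""
    # iOS exports may prefix the placeholder with an invisible
    # left-to-right mark (U+200E), so strip it along with angle brackets.
    text = body.strip().strip("\u200e<>").lower()
    for suffix, category in IOS_PLACEHOLDERS.items():
        if text.endswith(suffix):
            return category
    return None
```

A message body that does not end with one of the placeholders is treated as regular chat text.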

Preview


  • Sender rank
  • Domain rank
  • Word rank
  • Most used word by sender
  • Emoji rank
  • Most used emoji by sender
  • Chat activity heatmap

Requirements


  • Python 3.6+
pip install -r requirements.txt

Usage


$ git clone https://github.com/PetengDedet/WhatsApp-Analyzer.git

$ cd WhatsApp-Analyzer
$ python whatsapp_analyzer.py chat_example.txt --stopword indonesian 
usage: python whatsapp_analyzer.py FILE [-h] [-d] [-s] [-c]

Read and analyze whatsapp chat

positional arguments:
  FILE                  Chat file path

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           Debug mode. Shows details for every parsed line.
  -s , --stopword       Stop Words: A stop word is a commonly used word (such
                        as 'the', 'a', 'an', 'in'). In order to get insightful
                        most common word mentioned in the chat, we need to
                        skip these type of word. The Allowed values are:
                        arabic, bulgarian, catalan, czech, danish, dutch,
                        english, finnish, french, german, hebrew, hindi,
                        hungarian, indonesian, italian, malaysian, norwegian,
                        polish, portuguese, romanian, russian, slovak,
                        spanish, swedish, turkish, ukrainian, vietnamese
  -c , --customstopword 
                        Custom Stop Words. File path to stop word. File must a
                        raw text. One word for every line

Stop Words


I've included stop words for several languages from https://github.com/Alir3z4/stop-words. You can also use your own stop word file: pass the -c argument followed by the file path. The file must contain one word per line, like below:

able
ableabout
about
above
abroad
abst
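
A minimal sketch of how a one-word-per-line stop word file could be read and applied. The helper names `parse_stop_words` and `remove_stop_words` are hypothetical and are not taken from the script.

```python
def parse_stop_words(lines):
    """Build a stop word set from lines holding one word each."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(words, stop_words):
    """Keep only the words that are not in the stop word set."""
    return [word for word in words if word.lower() not in stop_words]


# The same words as the sample file above, given inline so the sketch
# runs without a file on disk.
stop_words = parse_stop_words(["able\n", "about\n", "above\n", "\n"])
kept = remove_stop_words(["Talk", "about", "chat"], stop_words)
```

With a real file, the lines would come from `open(path, encoding="utf-8")` instead of the inline list.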

Notes


  • This script uses regex to extract the data.
  • Currently supports the chat pattern below:
   "14/10/18, 11:16 - Contact Name: this is a message"
   "2/30/18, 2:07 AM - Contact Name:  Test👌"
   "[30/12/18 4.59.25 PM] Nama User: 🙏test"
   "[06/07/17 13.23.30] ‪+62 123-456-78910‬: image omitted"
  • Some date formats may not be supported
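
For illustration, one regular expression that matches the four sample lines above could look like the sketch below. This is not the script's actual pattern; `LINE_RE` and `split_line` are names made up for this example.

```python
import re

LINE_RE = re.compile(
    r"^\[?"                                 # optional opening bracket (iOS style)
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})"    # day/month/year or month/day/year
    r",?\s"
    r"(?P<time>\d{1,2}[:.]\d{2}(?:[:.]\d{2})?(?:\s?[AP]M)?)"
    r"(?:\]\s|\s-\s)"                       # "] " or " - " separator
    r"(?P<sender>[^:]+):\s"
    r"(?P<body>.*)$"
)


def split_line(line):
    """Return date, time, sender and body as a dict, or None if no match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


parsed = split_line("[30/12/18 4.59.25 PM] Nama User: test")
```

A line that returns `None` is either a continuation of the previous message or an event log entry, and a date format with other separators needs its own pattern.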

Flowchart


Describes how the script identifies and classifies each line of the chat

           +------------------+
      +----+    Empty line?   +----+
      |    +------------------+    |
      |                            |
      |                            |
  +---v---+                   +----v---+
  |  Yes  | +-----------------+   No   |
  +-------+ |                 +---+----+
            |                     |
  +---------+-+             +-----v-----+
  | Event Log |        +----+    Chat   +----+
  +-----------+        |    +-----------+    |
                       |                     |
                +------v-----+         +-----v------+   +--------------------+
          +-----+Regular Chat+----+    | Attachment +-->+Classify Attachment |
          |     +------------+    |    +------------+   +-------+------------+
          v                       v                             |
+---------+---------+   +---------+----------+                  |
|   Starting Line   |   |   Following Line   |                  |
+------+------------+   +-+------------------+                  |
       |                  |                                     |
       |                  |                                     |
       |           +------v-------+                             |
       |           | COUNTER      |                             |
       |           | 1 Chat       |                             |
       +---------->+ 2 Timestamp  +<----------------------------+
                   | 3 Sender     |
                   | 4 Domain     |
                   | 5 Words      |
                   | 6 Attachment |
                   | 7 Emoji      |
                   +-----+--------+
                         |
                         |
                         |
                         v
              +----------+----------------+
              |          Visualize        |
              +---------------------------+
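
The COUNTER step in the flowchart can be sketched roughly as below, assuming each chat line has already been parsed into a (sender, body) pair. `count_chat`, `URL_RE` and `WORD_RE` are illustrative names, and the word and URL patterns are simplifications rather than the script's own.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware


def count_chat(messages, stop_words=frozenset()):
    """Tally senders, link domains and words from (sender, body) pairs."""
    senders, domains, words = Counter(), Counter(), Counter()
    for sender, body in messages:
        senders[sender] += 1
        for url in URL_RE.findall(body):
            domains[urlparse(url).netloc.lower()] += 1
        # Remove links first so their fragments are not counted as words.
        text = URL_RE.sub(" ", body).lower()
        words.update(w for w in WORD_RE.findall(text) if w not in stop_words)
    return senders, domains, words


messages = [
    ("Ana", "See https://github.com/PetengDedet/WhatsApp-Analyzer now"),
    ("Budi", "see you"),
    ("Ana", "ok"),
]
senders, domains, words = count_chat(messages, stop_words={"you"})
```

Each `Counter` has a `most_common(n)` method, which is one straightforward way to produce the ranks listed under Supported Analysis.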

Getting chat source

Android:

  • Open a chat/group chat
  • Tap the three dots at the top right
  • Tap "More"
  • Choose "Export chat"
  • Choose "Without Media"

iOS:

  • Open a chat/group chat
  • Tap the contact name/group name at the top to see the details
  • Scroll down to find the "Export Chat" menu
  • Choose "Without Media"

Other Tech Port


  • Web: Coming soon
  • Jupyter Notebook: Coming soon
  • NodeJS: Coming soon

Help Needed


  • Contributors needed to rearrange the directory structure to follow Python best practices.
  • iOS exported example needed

whatsapp-analyzer's People

Contributors

albertopasqualetto, carfmeza, carlosffm, cvzi, finchmeister, kybek, petengdedet, staticf0x, yafp

whatsapp-analyzer's Issues

Date format flipping

I'm using your chatline.py script and have come across a strange bug. For most of the month of July in one of my chats, the timestamp format flips from YYYY-MM-DD to YYYY-DD-MM. An example is shown below.

Code: [screenshot: error0]
Output: [screenshot: error1]

You can see here the date format is different between these 2 messages and I can't work out why.

I'll update if I find the source of the bug myself
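
One plausible cause, which this issue does not confirm, is that the day/month order is guessed separately for each line, so a date such as 06/07 can be read either way. A hedged sketch of deciding the order once for the whole chat is shown below; `infer_day_first` is a hypothetical helper and is not part of chatline.py.

```python
def infer_day_first(date_strings, sep="/"):
    """Guess the field order by scanning every date in the chat once.

    Returns True for day-first, False for month-first, and None when
    every date is ambiguous (both leading fields are 12 or less).
    """
    for text in date_strings:
        first, second = (int(part) for part in text.split(sep)[:2])
        if first > 12:
            return True   # the first field cannot be a month
        if second > 12:
            return False  # the second field cannot be a month
    return None


order = infer_day_first(["06/07/17", "07/07/17", "14/07/17"])
```

The result can then be applied to every line, so that an ambiguous date is read the same way as the unambiguous dates around it.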

Unable to run the script

Hi, I have been using this script for a long time on my previous MacBook. I recently switched to a MacBook Air M2, but now I can't run the script and keep getting these errors.

python3 whatsapp_analyzer.py _chat.txt

Traceback (most recent call last):
  File "whatsapp_analyzer.py", line 103, in <module>
    chatline = Chatline(line=line, previous_line=previous_line, debug=args.debug)
  File "/Users/ahmadhammouri/Desktop/WhatsApp-Analyzer-master/chatline.py", line 25, in __init__
    self.parse_line(line)
  File "/Users/ahmadhammouri/Desktop/WhatsApp-Analyzer-master/chatline.py", line 169, in parse_line
    self.parse_body(body)
  File "/Users/ahmadhammouri/Desktop/WhatsApp-Analyzer-master/chatline.py", line 226, in parse_body
    emjs = self.extract_emojis(message_body)
  File "/Users/ahmadhammouri/Desktop/WhatsApp-Analyzer-master/chatline.py", line 119, in extract_emojis
    if c in emoji.UNICODE_EMOJI:
AttributeError: module 'emoji' has no attribute 'UNICODE_EMOJI'

Any help is greatly appreciated
Thanks
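
The `emoji` package removed `UNICODE_EMOJI` in version 2.0, which is the likely reason for this error on a fresh install. A possible workaround, sketched below under that assumption, is to probe for whichever lookup the installed version provides. `make_emoji_checker` is a hypothetical helper, not code from chatline.py, and the stand-in module exists only so the example runs without the package installed.

```python
from types import SimpleNamespace


def make_emoji_checker(emoji_module):
    """Return a function that tells whether one character is an emoji."""
    if hasattr(emoji_module, "is_emoji"):        # emoji 2.x
        return emoji_module.is_emoji
    if hasattr(emoji_module, "EMOJI_DATA"):      # dict keyed by emoji
        return lambda char: char in emoji_module.EMOJI_DATA
    table = emoji_module.UNICODE_EMOJI           # older releases
    if "en" in table:                            # 1.x nests the table by language
        table = table["en"]
    return lambda char: char in table


# Stand-in for an old release of the package (illustration only).
old_style = SimpleNamespace(UNICODE_EMOJI={"👌": ":OK_hand:"})
is_emoji = make_emoji_checker(old_style)
```

In real use the argument would be the imported package itself (`import emoji`, then `make_emoji_checker(emoji)`), and the check in `extract_emojis` would call the returned function instead of testing membership in `emoji.UNICODE_EMOJI`.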

Duplicated message count

Messages containing newlines are treated as separate messages, which inflates the message count.
Grouping lines by timestamp and sender would resolve this problem.
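
A different route to the same goal, sketched here as an assumption rather than as the project's fix, is to attach every line that lacks a timestamp to the message before it. Unlike grouping by timestamp and sender, this does not merge two separate messages sent by the same person within the same minute. `START_RE` only covers the Android-style pattern from the Notes section.

```python
import re

# Assumed start-of-message pattern, e.g. "14/10/18, 11:16 - ".
START_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?: [AP]M)? - ")


def merge_continuations(lines):
    """Join lines without a timestamp onto the message before them."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        if START_RE.match(line) or not messages:
            messages.append(line)
        else:
            messages[-1] += "\n" + line
    return messages


merged = merge_continuations([
    "14/10/18, 11:16 - Ana: first line\n",
    "second line of the same message\n",
    "14/10/18, 11:17 - Budi: hi\n",
])
```

Counting the entries of the merged list then gives one count per message instead of one per line.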

emojis missing?

Hi there,

I'm trying to get this running on Ubuntu. When I try to start it up I get the following error message in the terminal:

ModuleNotFoundError: No module named 'emoji'

Is this missing, or is it something I need to add myself? If I have to add it myself, can you please let me know how?

This chat has no member & This chat does not contain any datetime

I get the following output

Extracting data. Please wait....
Generating dataframe...
Generating plot...
This chat has no member
This chat does not contain any datetime
...

I assume it has to do with the timestamp format, which looks typically German. Example:

16.06.18, 22:16

which represents:

DD.MM.YY, HH:MM

Any chance you can add support for this kind of timestamp?
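
One way such a timestamp could be supported, shown as a sketch and not as the script's implementation, is to try a list of candidate formats with `datetime.strptime`. The `FORMATS` tuple and `parse_timestamp` are names chosen for this example; the last format also covers a 12-hour clock with AM/PM.

```python
from datetime import datetime

# Candidate formats, tried in order. Ambiguous dates such as 06/07/18
# are read by whichever matching format comes first.
FORMATS = (
    "%d.%m.%y, %H:%M",     # 16.06.18, 22:16 (the format from this issue)
    "%d/%m/%y, %H:%M",     # 14/10/18, 11:16
    "%m/%d/%y, %I:%M %p",  # month first, 12-hour clock with AM/PM
)


def parse_timestamp(text):
    """Try each candidate format in order; return None if none fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


german = parse_timestamp("16.06.18, 22:16")
```

Note that `%p` matches the AM/PM markers of the current locale, so this sketch assumes an English or C locale.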

Date and Time Format

A lot of chats use a 12-hour clock in their date and time pattern, and the code does not take that into account.

Ignore chatlines with no-media references

I exported my chat log with the option "without media".

This results in lines like the following in the exported .txt:

TIMESTAMP - MEMBER: <Medien ausgeschlossen>

I assume those lines should be ignored during data collection, but right now they are included (e.g. in the word list).
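
A small sketch of how such placeholder lines could be skipped before counting words. The German string comes from this issue and the Italian one from another report in this tracker; the English string is an assumption, and `is_media_placeholder` is a hypothetical helper.

```python
# Known no-media placeholders; extend this set per export language.
MEDIA_PLACEHOLDERS = {
    "<Medien ausgeschlossen>",  # German, from this issue
    "<Media omessi>",           # Italian, from another report
    "<Media omitted>",          # English (assumed)
}


def is_media_placeholder(body):
    """True when the whole message body is just a no-media placeholder."""
    return body.strip() in MEDIA_PLACEHOLDERS


bodies = ["Hallo!", "<Medien ausgeschlossen>", "<Media omessi>"]
text_only = [body for body in bodies if not is_media_placeholder(body)]
```

Matching the whole body, rather than adding the individual words to a stop word list, keeps ordinary uses of a word like "media" in the word rank.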

Random misalignment on the heatmap sometimes.

When testing the code I noticed that the vertical axis of the heatmap is sometimes misaligned. It is probably a resolution or sizing problem. I am not versed in matplotlib, so I wouldn't know.

Omitted media

When loading a chat file exported without media, the .txt contains a placeholder string in place of each media item.

For example, this line is from an Italian export:
08/11/21, 00:29 - Alberto: <Media omessi>

I think those placeholders should be added to each language-specific stop word list.
Otherwise "media" and "omessi" will be the most frequent words.

unable to see the smileys in Top 20 Emoji

Hi PetengDedet,

Good work!
After running the script, I don't see smileys in my command prompt. Is there anything else I need to take care of?

Please find the following output.

#Top 20 Emoji

          char_count
emj_char
?                 50
?                 44
?                 16
?                 14
?                 13
?                  9
?                  4
?                  3
?                  2
?                  2
?                  1
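
Question marks like these usually mean the console encoding cannot represent the characters. One thing that may help, assuming Python 3.7 or newer, is switching standard output to UTF-8 before printing. `force_utf8_stdout` below is a hypothetical helper, and whether the emoji render afterwards still depends on the terminal and its font.

```python
import sys


def force_utf8_stdout():
    """Try to switch stdout to UTF-8; return True if the call succeeded."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:  # Python older than 3.7, or a replaced stdout
        return False
    try:
        reconfigure(encoding="utf-8", errors="replace")
    except (ValueError, OSError):
        return False
    return True


switched = force_utf8_stdout()
```

If the console still shows question marks, redirecting the output to a file and opening that file in a UTF-8 aware editor is another way to read the emoji rank.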

Getting Error in Line 183

While running the code, I get an error on line 183:

File "whatsapp_analyzer.py", line 183
print("\r{} |{} {}".format(label, bar, Color.bold(str(value))), end = printEnd)
^
SyntaxError: invalid syntax

Error "import pandas as pd"

Problem: When running the script exactly as described in this repository, some users get an error suggesting that pandas is not installed. If it really were missing, the fix would be to install it, but in my case and in others I know of, pandas was installed and the error below still appeared.

python whatsapp_analyzer.py

Please input the chat filepath: ../chat.txt
Please select common word file or leave it blank to escape:
1: Indonesian (id_cw.py)
2: English (en_cw.py)
3: Custom file
4: Skip common word
2
You wanna print the verbose mode? y/[N]: y
Traceback (most recent call last):
  File "whatsapp_analyzer.py", line 67, in <module>
    import pandas as pd
ImportError: No module named pandas
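
A common cause of this symptom is that the `python` command starts a different interpreter from the one pandas was installed into. The unquoted module name in the message above is the form Python 2 prints, which would fit that explanation, although the report does not confirm it. The sketch below prints which interpreter is running and which modules it cannot import. `missing_modules` and the `REQUIRED` tuple are illustrative names, and the sketch is written to run on old interpreters as well.

```python
import importlib
import sys

REQUIRED = ("pandas", "emoji")  # modules named in the errors reported here


def missing_modules(names=REQUIRED):
    """Return the modules that the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("interpreter: %s" % sys.executable)
    print("version: %s" % sys.version.split()[0])
    print("missing: %s" % ", ".join(missing_modules()))
```

Running it with the same command used for the analyzer shows whether that command points at Python 3.6+ and at the environment where the requirements were installed.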
