whatsapp-chat-parser's Introduction

WhatsApp Chat Parser

A package to parse WhatsApp chats with Node.js or in the browser 💬

Important notice

🚨 v4.0.0 brings some BREAKING CHANGES. Check out the release page for more info.

Introduction

This library parses WhatsApp chat logs from text format into JavaScript objects, making it easier to manipulate the data, create statistics, export it in different formats, etc.

You can test the package online with this example website:
whatsapp-chat-parser.netlify.app (Source code)

Install

$ npm install whatsapp-chat-parser

Usage

Node

import fs from 'node:fs';
import * as whatsapp from 'whatsapp-chat-parser';

const text = fs.readFileSync('path/to/_chat.txt', 'utf8');
const messages = whatsapp.parseString(text);

console.log(messages);

Browser

Add the script to your HTML file (usually just before the closing </body> tag).
Then use it in your JavaScript code; the whatsappChatParser variable will be globally available.

<script src="path/to/index.global.js"></script>
<script>
  const messages = whatsappChatParser.parseString(
    '06/03/2017, 00:45 - Sample User: This is a test message',
  );

  console.log(messages);
</script>

Or with type="module" loading the ESM version:

<script type="module">
  import * as whatsapp from 'path/to/index.js';

  const messages = whatsapp.parseString(
    '06/03/2017, 00:45 - Sample User: This is a test message',
  );

  console.log(messages);
</script>

You can also use the jsDelivr CDN.

<script src="https://cdn.jsdelivr.net/npm/whatsapp-chat-parser/dist/index.global.js"></script>
<!-- Or use a specific version -->
<script src="https://cdn.jsdelivr.net/npm/whatsapp-chat-parser@x.y.z/dist/index.global.js"></script>

Message structure

The messages variable is an array of objects like this:

[
  {
    date: '2018-06-02T22:45:00.000Z', // Date object
    author: 'Luke',
    message: 'Hey how are you?',
  },
  {
    date: '2018-06-02T23:48:00.000Z', // Date object
    author: 'Joe',
    message: 'All good, thanks',
  },
];

When using the option parseAttachments, the message may contain an additional property attachment:

[
  {
    date: '2018-06-02T23:50:00.000Z', // Date object
    author: 'Joe',
    message: '<attached: 00000042-PHOTO-2020-06-07-15-13-20.jpg>',
    attachment: {
      fileName: '00000042-PHOTO-2020-06-07-15-13-20.jpg',
    },
  },
];
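
As a small illustration, attachment file names could be collected from a parsed array as sketched below. The sample data mirrors the structure shown above; listAttachmentFileNames is a hypothetical helper written for this example and is not part of the library.

```javascript
// Hypothetical helper (not part of whatsapp-chat-parser): collects the
// file names of all messages that carry an `attachment` property.
function listAttachmentFileNames(messages) {
  return messages
    .filter((m) => m.attachment && m.attachment.fileName)
    .map((m) => m.attachment.fileName);
}

// Sample data shaped like the output shown above.
const sampleWithAttachment = [
  {
    date: new Date('2018-06-02T23:48:00.000Z'),
    author: 'Joe',
    message: 'All good, thanks',
  },
  {
    date: new Date('2018-06-02T23:50:00.000Z'),
    author: 'Joe',
    message: '<attached: 00000042-PHOTO-2020-06-07-15-13-20.jpg>',
    attachment: { fileName: '00000042-PHOTO-2020-06-07-15-13-20.jpg' },
  },
];

console.log(listAttachmentFileNames(sampleWithAttachment));
```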

In the case of a system message, the author will be null:

[
  {
    date: '2018-06-02T22:45:00.000Z', // Date object
    author: null,
    message: 'You created group "Party 🎉"',
  },
];
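
Because system messages have a null author, statistics code usually needs to skip them. The following is a minimal sketch of a per-author message count, assuming messages shaped like the examples above; countMessagesByAuthor is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: counts messages per author, skipping system
// messages (those whose author is null).
function countMessagesByAuthor(messages) {
  const counts = new Map();
  for (const { author } of messages) {
    if (author === null) continue; // system message, no author to credit
    counts.set(author, (counts.get(author) || 0) + 1);
  }
  return counts;
}

// Sample data shaped like the parser output described above.
const sampleChat = [
  { date: new Date('2018-06-02T22:45:00.000Z'), author: null, message: 'You created group "Party"' },
  { date: new Date('2018-06-02T22:45:00.000Z'), author: 'Luke', message: 'Hey how are you?' },
  { date: new Date('2018-06-02T23:48:00.000Z'), author: 'Joe', message: 'All good, thanks' },
  { date: new Date('2018-06-02T23:49:00.000Z'), author: 'Luke', message: 'Great' },
];

console.log(countMessagesByAuthor(sampleChat));
```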

API

parseString(string, [options]) → Array

string

Type: string

Raw string of the WhatsApp conversation

options

Type: object

A configuration object, more details below

Options

daysFirst

Type: Boolean
Default: undefined

Specify whether the dates in your log file start with a day (true) or a month (false). Manually specifying this may improve performance. By default the program tries to infer this information using 3 different methods (see date.ts for the implementation); if all of them fail, it defaults to days first.

parseAttachments

Type: Boolean
Default: false

Specify whether attachments should be parsed. If set to true, messages with attachments will include an attachment property with information about the attachment.
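
To give a feel for how the day/month order might be inferred, here is one simple heuristic: a field greater than 12 cannot be a month. This is an illustrative sketch only and is not taken from date.ts; guessDaysFirst and its input format (pairs of the first two numeric date fields) are assumptions made for this example.

```javascript
// Illustrative heuristic only; NOT the library's date.ts implementation.
// Takes [first, second] numeric pairs read from the start of each date and
// returns true (days first), false (months first) or null (ambiguous).
function guessDaysFirst(datePairs) {
  for (const [first, second] of datePairs) {
    if (first > 12) return true; // first field cannot be a month
    if (second > 12) return false; // second field cannot be a month
  }
  return null; // every date is ambiguous, so the caller must pick a default
}

console.log(guessDaysFirst([[6, 3], [30, 12]]));
console.log(guessDaysFirst([[12, 21]]));
console.log(guessDaysFirst([[1, 2]]));
```

When a check like this cannot decide, passing daysFirst explicitly is the safer choice.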

A note about messages order

Sometimes, likely due to connection issues, WhatsApp exports contain messages that are not chronologically ordered.
This library won't change the order of the messages, but if your application expects a certain order make sure to sort the array of messages accordingly before use.

See #247 for more info.
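
If your application needs chronological order, a sort along these lines may be enough. It is a minimal sketch that assumes each message has a date property holding a Date object, as described in the message structure section; sortMessagesChronologically is a hypothetical helper name.

```javascript
// Returns a new array sorted by date, oldest first, without mutating the
// input. Array.prototype.sort is stable in ES2019+ engines, so messages
// with identical timestamps keep their original relative order.
function sortMessagesChronologically(messages) {
  return [...messages].sort((a, b) => a.date.getTime() - b.date.getTime());
}

// Sample data with one message out of order.
const unordered = [
  { date: new Date('2020-12-21T23:50:00.000Z'), author: 'A', message: 'first 23:50' },
  { date: new Date('2020-12-21T23:49:00.000Z'), author: 'B', message: '23:49' },
  { date: new Date('2020-12-21T23:50:00.000Z'), author: 'B', message: 'second 23:50' },
];

console.log(sortMessagesChronologically(unordered).map((m) => m.message));
```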

How to export WhatsApp chats

Technologies used

Requirements

Node

Node.js >= 8.0.0

Browser

This package is written in TypeScript with target compilation to ES6.
It should work in all relevant browsers from ~2017 onwards.

Changelog

CHANGELOG

License

MIT

whatsapp-chat-parser's People

Contributors

dependabot[bot], greenkeeper[bot], mintonne, pustur, renovate-bot

whatsapp-chat-parser's Issues

media attachments

Hello, congratulations on the fantastic project. Would it be possible to add support for WhatsApp attachments (images, videos, audio messages, etc.) in the form of links to the corresponding files? The media files can be exported together with the chat.txt file. Thank you.

Improve parsing speed

Hey,

Is there a way to improve the parsing times?

Parsing duration (Using a web worker)

Firefox 67.0 (64-bit)

Parsed 8129 messages in 721 ms
Parsed 20251 messages in 4720 ms
Parsed 40645 messages in 20498 ms

Edge Chromium Version 76.0.167.1 (Official build)

Parsed 8129 messages in 569 ms
Parsed 20251 messages in 2705 ms
Parsed 40645 messages in 12423 ms

The duration is roughly 2x to 3x longer on low- to mid-range smartphones.

parseAttachments stopped working with the new WhatsApp exports

I tried to parse chats from my WhatsApp group and parseAttachments was not working. I think WhatsApp has changed the format for including media.

Even after using parseAttachments, I am getting this as output:

{
  "date": "2021-04-28T02:00:00.000Z",
  "author": "+91 77606 66008",
  "message": "IMG-20210428-WA0001.jpg (file attached)"
}
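
For illustration, the two attachment notations that appear on this page, "<attached: FILE>" and "FILE (file attached)", could both be matched with a pair of regular expressions. This is a sketch under the assumption that these are the only two notations of interest; extractAttachmentFileName is a hypothetical helper and does not reflect how the library itself detects attachments.

```javascript
// Hypothetical patterns for the two attachment notations seen on this page:
//   "<attached: FILE>"  and  "FILE (file attached)"
const ATTACHED_PREFIX = /^<attached: (.+)>$/;
const ATTACHED_SUFFIX = /^(.+) \(file attached\)$/;

// Returns the attachment file name, or null if the message text does not
// match either notation.
function extractAttachmentFileName(message) {
  const text = message.trim();
  const match = ATTACHED_PREFIX.exec(text) || ATTACHED_SUFFIX.exec(text);
  return match ? match[1] : null;
}

console.log(extractAttachmentFileName('IMG-20210428-WA0001.jpg (file attached)'));
```

Real exports may also contain invisible direction marks around the file name, which this sketch does not strip.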

How to go about testing both the src/ code and dist/ code

In index.test.js we are not currently testing index.js but instead we are requiring the built and minified file whatsapp-chat-parser.min.js:

const whatsappParser = require('../dist/whatsapp-chat-parser.min.js');

The reason behind this is that I want to make sure that even after going through rollup the file that is shipped to the users still works properly.

But there are a few problems with this approach:

  • The coverage generated by npm run test:coverage is not accurate and thus becomes useless. In the future I'd like to have a badge with the coverage report so we need it to work properly
  • Running npm run test:watch is not reliable because it doesn't rebuild the files after every save
  • Testing the dist/ file is more of an integration test than a unit test

So I think the correct thing to do is test index.js directly inside index.test.js (as we used to) and then add an integration test that tests the distribution file whatsapp-chat-parser.min.js.

This however has another little problem:

  • The contents of the integration test would be the same as index.test.js so it would be a repetition. Edits made to one file should be mirrored to the other and it's easy to forget it.

Also I think that this kind of test should only run before publishing to npm (prepublishOnly) and on circleci, not after every commit since building the files is a bit slow.

How can we solve these problems elegantly?
I'd like to hear some ideas

Add type definitions for TypeScript

I'd like to add type definitions but I don't have any experience with TypeScript yet.
The definition would be for the public API, so just this function:

function parseString(string, options) {

The API is pretty well documented in the readme.

I'd like to have a types folder in the root of the project, with an index.d.ts file inside.
When done, a PR can be opened against the develop branch.

Anyone interested in taking this? Would be very appreciated.

Sync parseString

Is it possible to have a non-Promise version of parseString? Maybe parseStringSync?
Reading index.ts, it seems the input string is passed to makeArrayOfMessages and the result to parseMessages, but neither of them returns a Promise. So, is there a reason why the function should return a Promise?
I'm using the library in a context where I don't want async. Making it async forces me to use async everywhere.

Styling in WhatsApp messages

WhatsApp messages can contain styling within the text, such as bold, underline and strikethrough. The given format only allows for plain text messages.

I'm new to open source and would be happy to help build this feature, but I can't think of a way to get it working without breaking the API contract.

Messages order

Sometimes messages are not chronologically ordered. For example, WhatsApp may export this (due to connection issues, I think):

12/21/20, 23:50 - A: blah
12/21/20, 23:49 - B: blah
12/21/20, 23:50 - B: blah

This leads whatsapp-chat-parser to generate the messages array in the same order.

This may be by design, one may want to preserve the exact order messages were found in the text file.
I am working on a project that assumes messages come in chronological order and found out that some WhatsApp exports were breaking because of this. I can just sort the messages after the parseString call and everything works fine on my side. I just wanted to let you know that it can happen, and that you may want to add a warning somewhere or add a sort at the end.


While we are at it, I wanted to thank you for making and maintaining whatsapp-chat-parser. I use it in chat-analytics, and WhatsApp is by far the most complicated platform to deal with.

The package can't be installed

Hey, I was debugging a problem (#117) in chat-analytics (which uses whatsapp-chat-parser) and found out that the package can't be installed due to the postinstall script.

> npm i whatsapp-chat-parser
npm error code 1
npm error path C:\Users\Lombi\node_modules\whatsapp-chat-parser
npm error command failed
npm error command C:\Windows\system32\cmd.exe /d /s /c husky install
npm error 'husky' is not recognized as an internal or external command,
npm error operable program or batch file.

npm error A complete log of this run can be found in: C:\Users\Lombi\AppData\Local\npm-cache\_logs\2024-08-13T17_30_41_373Z-debug-0.log

The problem persists when trying to install chat-analytics.

npm i -g chat-analytics
npm error code 1
npm error path C:\Users\Lombi\AppData\Roaming\npm\node_modules\chat-analytics\node_modules\whatsapp-chat-parser
npm error command failed
npm error command C:\Windows\system32\cmd.exe /d /s /c husky install
npm error 'husky' is not recognized as an internal or external command,
npm error operable program or batch file.

npm error A complete log of this run can be found in: C:\Users\Lombi\AppData\Local\npm-cache\_logs\2024-08-13T17_33_06_953Z-debug-0.log

I recommend removing the postinstall script. I usually have those disabled, but forgot to do so on this machine.

😃

what about new format?

Hi, thanks for your great work.
I just tried it and the WhatsApp export format seems to be different. May I ask how to deal with it?

 [7/11/2017, 8:20:41 PM] someone: ‎bla bla bla

thanks!

New format includes a special character between the time and PM / AM

From an email I received:

Hey, you're amazing for creating this tool, so thanks from New York.

But although the example file works perfectly, none of my exports work :-(
I dragged and dropped, and even used file explorer to choose it directly. What am I doing wrong?

I edited the text file only to replace names, didn't touch anything else.

The text chat attached:

1/4/23, 7:02 PM - Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.
2/2/23, 2:51 AM - Person 2: <Media omitted>
2/4/23, 10:02 PM - Person 2: <Media omitted>
2/4/23, 10:35 PM - Person 1: Got it. So is it 4:30 or 6?
2/4/23, 10:35 PM - Person 2: It was changed
2/4/23, 10:35 PM - Person 2: Time change!
2/4/23, 10:35 PM - Person 1: K, I'll be there around 6 bz"h
2/4/23, 10:35 PM - Person 2: Great see you then!

The problem is a Narrow No-Break Space used between the time and AM / PM.
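
As a possible workaround for versions that do not handle this character, the input could be normalized before parsing. The sketch below replaces the Narrow No-Break Space (U+202F) and the regular No-Break Space (U+00A0) with a plain space; normalizeSpaces is a hypothetical helper, and whether a given release of the library still needs it is an assumption to verify.

```javascript
// Replaces U+202F (narrow no-break space) and U+00A0 (no-break space) with
// a regular space, so "7:02 PM" matches patterns expecting a plain space.
function normalizeSpaces(text) {
  return text.replace(/[\u202F\u00A0]/g, ' ');
}

// A line like the ones in the export above, with U+202F before "PM".
const rawLine = '2/4/23, 10:35\u202FPM - Person 1: Got it. So is it 4:30 or 6?';

console.log(normalizeSpaces(rawLine));
```

The cleaned string can then be passed to parseString as usual.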

Wrong dates when export has format dd/MM/YYYY

Parsing the following export:

30/12/2020 13:00 - User: text
13/1/2021 13:00 - User: text

Produces the following output:

[
    {
        "date": "2020-12-30T16:00:00.000Z",
        "author": "User",
        "message": "text"
    },
    {
        "date": "2022-01-01T16:00:00.000Z",
        "author": "User",
        "message": "text"
    }
]

Note the second date.

I didn't check the root of the problem, but it seems to be a dd/MM vs. MM/dd problem.

And it seems that after that date is broken, every following date is broken too 😢
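
One plausible explanation, not confirmed in this thread, is that "13/1/2021" was read as month 13, day 1. JavaScript's Date does not reject an out-of-range month; it rolls over into the next year, which would match the 2022-01-01 output. The sketch below demonstrates only that rollover behavior, using UTC so the result does not depend on the local time zone (the reporter's output is shifted by their own offset).

```javascript
// JavaScript Date months are zero-based and out-of-range values roll over.
// Reading "13/1/2021" as month 13, day 1 therefore yields January 1st of
// the following year instead of an invalid date.
const misread = new Date(Date.UTC(2021, 13 - 1, 1, 13, 0));

// Reading it as day 13, month 1 gives the intended date.
const intended = new Date(Date.UTC(2021, 1 - 1, 13, 13, 0));

console.log(misread.toISOString()); // 2022-01-01T13:00:00.000Z
console.log(intended.toISOString()); // 2021-01-13T13:00:00.000Z
```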

Right-to-left languages, media limitations and fixed chat constants

As I'm not a programmer, I'm not sure if I should post the issues here or on whatsapp-chat-parser-website.
I wanted to use the program to display a single fixed chat on my website, and ran into the following issues:

  1. When Hebrew or Arabic text is mixed with English words, the text becomes garbled. This is because the parser cannot handle right-to-left languages. It also does not align them to the right.
  2. I wanted to display my chat with the original full-sized images and not those reduced by WhatsApp (or omitted altogether). I found that the parser does not recognize png images as media, cannot handle large movies (as they are converted to base64 instead of being downloaded directly from the file) and does not understand different media syntax in the filename (if you want to give the images more descriptive names).
  3. I'm missing an option to have the chat's filename and parameters defined as fixed constants without user intervention.

Thank you for your great project!

Different languages format support

I like your library!
Unfortunately, the export format differs depending on the phone's language. For example, a chat history file generated on a German phone looks like this:

[19.04.17, 20:43:09] John Doe: ‎message 123

Is there any way this could be supported?

WhatsApp format problem

Hey, my WhatsApp text log uses this format:

20/6/2017 8:28 p. m. - Michelle: test

It doesn't work; the output is:

{
  author: "p. m. - Michelle",
  date: Tue Jun 20 2017 08:28:00 GMT-0500 (hora estándar oriental),
  message: "test"
}
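
A possible workaround for this locale is to rewrite the "a. m." / "p. m." markers to AM / PM before parsing. This is a rough sketch: normalizeSpanishMeridiem is a hypothetical helper, and it assumes the parser accepts the resulting "8:28 PM" form. Note that it rewrites every occurrence in the text, including any inside message bodies.

```javascript
// Rewrites Spanish-style meridiem markers ("a. m.", "p. m.", with or
// without the inner space) to "AM" / "PM". Applies to the whole text,
// so matching strings inside message bodies are rewritten too.
function normalizeSpanishMeridiem(text) {
  return text
    .replace(/\ba\.\s?m\./gi, 'AM')
    .replace(/\bp\.\s?m\./gi, 'PM');
}

console.log(normalizeSpanishMeridiem('20/6/2017 8:28 p. m. - Michelle: test'));
```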

Voice messages not playing

Hi, first of all, thanks for this tool, I really needed something like this.

The issue I'm having is that it doesn't seem to be loading the voice message files correctly. All my voice messages appear as an audio player with the message "Error".

I'm uploading the chat from a zip file, and I've verified that the audio files for the voice messages are present in .opus format. I don't have any errors or messages on my console. Other media (at least photos and videos) seems to work fine.
